Design It Right, Do the Math, Check the %

Cloud is gaining traction. Customers want to retire their existing on-premises servers because they see the cloud as the way forward for managing uptime, upgrades and elasticity, and for reducing cost. Hold on, uptime? Yes, there are SLAs for uptime. But does that mean the customer has nothing to do? Every OEM, and every service it offers, carries its own uptime commitment backed by its own SLA. Designing a highly available infrastructure architecture still depends entirely on the customer's architects.

Reading through the SLA documents beforehand is highly recommended. Microsoft Azure in particular has made a great effort to make it easy for customers to review the SLA for every service, and the SLAs for all Azure components are published on the Azure SLA page. Some simple mathematics can then help set the right expectations, as below:

  • Azure Virtual Machines: a single-instance VM gives 99.9% uptime (VM connectivity) if it uses premium storage for all of its OS and data disks. This becomes 99.95% when two or more VMs are deployed in an Availability Set, for any type of disk and any type of OS. Similar SLAs exist for other components, so spend some time reading through them, as there will be lots of ifs and buts.
  • What do the '9's mean? Two, three or more 9's matter to us only if we know how many hours or minutes a system may be unavailable, so it helps to translate each level into the hours and minutes of allowed downtime per year, month, week and day.
  • A simple example can clarify this further for infrastructure design on Azure. Concepts such as Availability Sets, regional replication over hybrid connectivity, and database replication mechanisms, together with solutions such as an External Load Balancer (with its rules), can reduce the risk of downtime as much as possible. The diagram is a simple representation of an architecture design on Azure; it does not cover every detail, as that is not the objective of this blog post.
    [Figure: Architecture Design Simple]
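To make the 9's concrete, here is a minimal Python sketch that converts an availability percentage into the minutes of downtime it permits per year, month, week and day. It assumes a 365-day year and an average month of 365/12 days; Azure SLAs measure availability per billing month, so the real monthly allowance varies slightly with month length. The function and constant names are hypothetical, chosen for illustration only.

```python
# Allowed downtime for a given availability percentage.
# Assumption: a 365-day year and an average month of 365/12 days.
HOURS_PER_DAY = 24

PERIOD_DAYS = {
    "year": 365,
    "month": 365 / 12,
    "week": 7,
    "day": 1,
}


def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Minutes of unavailability an SLA permits over a period of `period_days` days."""
    total_minutes = period_days * HOURS_PER_DAY * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Print one row per availability level: downtime minutes for each period.
    for sla in (99.0, 99.9, 99.95, 99.99, 99.999):
        row = ", ".join(
            f"{name}: {allowed_downtime_minutes(sla, days):8.2f} min"
            for name, days in PERIOD_DAYS.items()
        )
        print(f"{sla:>7}% -> {row}")
```

For example, 99.9% allows about 525.6 minutes (roughly 8.8 hours) of downtime a year, while the 99.95% Availability Set figure allows about 21.9 minutes in an average month.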

 

  • The architecture example mentioned above consists of a simple application with Web, DB and Authentication tiers. The infrastructure is designed across two regions: (1) Region A and (2) Region B.
    • Scenario 1: If we use a single-region HA approach, we carry the following amount of risk:
      [Figure: Scenario 1]
    • Scenario 2: If we use a single-region HA approach backed by a geo-redundant (non-HA) site, we carry the following amount of risk:
      [Figure: Scenario 2]
    • Scenario 3:
      [Figure: Scenario 3]
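The risk in scenarios like these can be estimated with composite SLA arithmetic. Components that must all be up (in series) multiply their availabilities, while independent redundant deployments (in parallel) are down only when all of them are down, giving 1 - Π(1 - aᵢ). The Python sketch below applies this to Scenarios 1 and 2. The VM figures follow the VM SLAs quoted above; the load balancer value is a hypothetical placeholder, so check the published SLA page for real numbers. It also assumes independent failures and instant failover, and it ignores the routing and replication mechanisms between regions, so treat the result as an optimistic upper bound.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Every component in the chain must be up, so availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one redundant deployment must be up (assumes independent failures)."""
    return 1 - prod(1 - a for a in availabilities)


# Availabilities as fractions. LB is a HYPOTHETICAL placeholder; the VM values
# follow the VM SLA figures quoted in this post.
LB = 0.9999        # hypothetical external load balancer SLA
VM_AVSET = 0.9995  # VMs deployed in an Availability Set
VM_SINGLE = 0.999  # single-instance VM on premium storage

# Scenario 1: Web, Auth and DB tiers each in an Availability Set behind a load balancer, Region A only.
region_a = serial(LB, VM_AVSET, VM_AVSET, VM_AVSET)

# Scenario 2: Region A as above, plus a non-HA copy in Region B (one VM per tier).
region_b = serial(LB, VM_SINGLE, VM_SINGLE, VM_SINGLE)
scenario_2 = parallel(region_a, region_b)

print(f"Scenario 1: {region_a:.4%}")
print(f"Scenario 2: {scenario_2:.4%}")
```

Under these hypothetical inputs, Scenario 1 comes out at roughly 99.84% (about 14 hours of allowed downtime a year), even though each individual component is at 99.95% or better, while adding the non-HA site in Region B lifts the theoretical figure to roughly 99.9995%.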

Therefore, if you are thinking of moving a mission-critical application where a single minute of downtime can cost millions of dollars, think of a highly available architecture design that does not depend on a single geography but instead leverages every relevant component of a true hyperscale cloud like Azure.