Virtual Machine Availability, Downtime and Scale

1. Virtual Machine Availability and Downtime

Azure VMs can be affected by:

  1. Unplanned Hardware Maintenance

    • Triggered when the Azure platform predicts that the host hardware or a platform component is about to fail.

    • Azure uses Live Migration to move affected VMs to healthy hardware. The VM is only paused briefly and is not rebooted.

  2. Unexpected Downtime

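    • Caused by unexpected failures of the host hardware or physical infrastructure, such as local network, local disk, or rack-level failures.

    • Azure automatically migrates (heals) the VM to healthy hardware in the same datacenter, but the VM is rebooted and, in some cases, data on the temporary disk is lost.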

  3. Planned Maintenance

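    • Periodic updates Microsoft makes to the underlying Azure platform to improve reliability, performance, and security.

    • Most of these updates have no impact on running VMs.

    • Microsoft does not update your VM's guest OS or installed software; that is the administrator's responsibility.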


2. Availability Sets

Purpose: Minimize the impact of downtime by avoiding single points of failure.

  • VMs in an availability set are distributed across:

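    • Fault Domains (FDs): Groups of hardware that share a common power source and network switch, giving physical separation across racks. At least 2 FDs per set (up to 3, depending on the region).

    • Update Domains (UDs): Logical groups of VMs that can be rebooted together during planned maintenance; only one UD is rebooted at a time. 5 UDs by default, configurable up to 20.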
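To make the FD/UD split concrete, here is a minimal sketch, not Azure's actual placement logic, which is platform-managed. It assumes round-robin assignment: Microsoft documents that with 5 UDs the sixth VM lands in the same UD as the first, and the sketch assumes FDs are filled the same way. The names `assign_domains` and `worst_case_loss` are hypothetical helpers for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    vm_index: int
    fault_domain: int
    update_domain: int


def assign_domains(vm_count: int, fault_domains: int = 2, update_domains: int = 5) -> list[Placement]:
    """Assumed round-robin: VM i goes to FD i % fault_domains and UD i % update_domains."""
    if fault_domains < 1 or update_domains < 1:
        raise ValueError("domain counts must be positive")
    return [Placement(i, i % fault_domains, i % update_domains) for i in range(vm_count)]


def worst_case_loss(placements: list[Placement], attr: str) -> int:
    """Most VMs sharing a single domain, i.e. the most VMs down at once when one
    FD fails (attr="fault_domain") or one UD is rebooted (attr="update_domain")."""
    counts = Counter(getattr(p, attr) for p in placements)
    return max(counts.values(), default=0)


if __name__ == "__main__":
    vms = assign_domains(6)
    print([p.update_domain for p in vms])          # [0, 1, 2, 3, 4, 0]
    print(worst_case_loss(vms, "update_domain"))   # 2
    print(worst_case_loss(vms, "fault_domain"))    # 3
```

With six VMs, a UD reboot takes down at most two VMs, while losing one of two FDs takes down three, which is why redundancy works best when the VM count is planned with both domain counts in mind.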

Best Practices:

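  • Place two or more VMs in an Availability Set for redundancy.

  • Separate application tiers into different Availability Sets.

  • Combine with Azure Load Balancer to distribute traffic across the VMs, and use Managed Disks so each VM's disks are aligned with its fault domain.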

SLA Guarantees:

  • Two or more VMs across two or more Availability Zones in the same region: 99.99% VM connectivity.

  • Two or more VMs in the same Availability Set: 99.95% VM connectivity.

  • Single VM using Premium SSD or Ultra Disk for all OS and data disks: 99.9% VM connectivity.
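The SLA percentages are easier to compare as a downtime budget. The sketch below converts each percentage into the maximum downtime per month, assuming a 30-day month; actual SLA accounting uses the billing month and Microsoft's own downtime definitions, so treat the numbers as approximate. The function name `allowed_downtime_minutes` is hypothetical.

```python
SLA_TARGETS = [
    ("2+ VMs across Availability Zones", 99.99),
    ("2+ VMs in an Availability Set", 99.95),
    ("Single VM on premium disks", 99.9),
]


def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Most downtime (minutes) per month that still meets the SLA percentage."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for label, sla in SLA_TARGETS:
        print(f"{label}: {sla}% allows about {allowed_downtime_minutes(sla):.1f} min per 30-day month")
```

Over 43,200 minutes, the three tiers allow roughly 4.3, 21.6 and 43.2 minutes of downtime respectively, so each step up in redundancy cuts the budget by a factor of about two to five.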


3. Availability Zones

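  • Physically separate locations within an Azure region that provide high availability.

  • Each zone consists of one or more datacenters with independent power, cooling, and networking.

  • Every zone-enabled region has a minimum of 3 zones; not all regions support zones.

  • Use zonal services (each resource pinned to a specific zone) or zone-redundant services (the platform replicates resources across zones automatically).

  • SLA: 99.99% VM connectivity when two or more VMs are deployed across two or more zones.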
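One way to see why zone spreading matters is to count how many instances survive the loss of a whole zone. This is a minimal sketch assuming zonal VMs are pinned round-robin to logical zones "1", "2" and "3"; the helper names are hypothetical and do not correspond to any Azure API.

```python
ZONES = ("1", "2", "3")  # logical zone names as exposed in a zone-enabled region


def spread_across_zones(instance_count: int, zones: tuple[str, ...] = ZONES) -> list[str]:
    """Assumed placement: pin instance i to zone i % len(zones)."""
    return [zones[i % len(zones)] for i in range(instance_count)]


def survivors_after_zone_outage(placement: list[str], failed_zone: str) -> int:
    """Instances still running when every instance in failed_zone is lost."""
    return sum(1 for zone in placement if zone != failed_zone)


def worst_case_survivors(placement: list[str], zones: tuple[str, ...] = ZONES) -> int:
    """Fewest survivors over every possible single-zone outage."""
    return min(survivors_after_zone_outage(placement, z) for z in zones)


if __name__ == "__main__":
    for count in (1, 2, 4):
        placement = spread_across_zones(count)
        print(count, placement, worst_case_survivors(placement))
```

A single VM can always be taken out by one zone outage, while two or more VMs in different zones keep at least one instance running, which matches the requirement behind the 99.99% SLA.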


4. Scaling Concepts

Vertical Scaling (Scale Up/Down)

  • Change the size of a VM (CPU, memory).

  • Scale up resource-intensive VMs; scale down under-utilized VMs to reduce cost.

  • Limited by the VM sizes available on the underlying hardware; resizing usually requires a VM restart.

Horizontal Scaling (Scale Out/In)

  • Change the number of VM instances.

  • More flexible in cloud environments, because capacity is not capped by the largest single VM size.

  • Supports autoscale, which adjusts the instance count dynamically based on workload.


5. Virtual Machine Scale Sets

Purpose: Manage a group of identical VMs with auto-scaling capability.

Benefits:

  • Simplifies management of hundreds of VMs.

  • Supports Azure Load Balancer (Layer 4) and Application Gateway (Layer 7).

  • Autoscale adjusts VM count dynamically to meet demand.

  • Supports up to 1,000 VMs (600 when the scale set is created from a managed custom image).

Configuration Considerations:

  • Initial instance count, VM size, managed disks, Azure Spot instances.

  • Fault domain spreading algorithm (max spreading is recommended).

  • A single placement group holds up to 100 instances; scaling beyond 100 requires the scale set to use multiple placement groups.
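The placement group rule reduces to simple arithmetic. This sketch computes the minimum number of 100-instance placement groups a target instance count implies, capped at the 1,000-VM scale set limit; Azure manages the actual groups itself, so the result is only a lower bound, and `placement_groups_needed` is a hypothetical helper.

```python
import math

PLACEMENT_GROUP_SIZE = 100  # maximum instances in a single placement group
SCALE_SET_MAX = 1000        # overall scale set limit for standard images


def placement_groups_needed(instance_count: int) -> int:
    """Minimum number of placement groups for the requested instance count."""
    if instance_count < 0:
        raise ValueError("instance count cannot be negative")
    if instance_count > SCALE_SET_MAX:
        raise ValueError(f"scale sets are limited to {SCALE_SET_MAX} instances")
    return max(1, math.ceil(instance_count / PLACEMENT_GROUP_SIZE))


def needs_multiple_placement_groups(instance_count: int) -> bool:
    return placement_groups_needed(instance_count) > 1


if __name__ == "__main__":
    for n in (50, 100, 101, 1000):
        print(n, placement_groups_needed(n), needs_multiple_placement_groups(n))
```

A scale set that must grow past 100 instances therefore has to be configured for multiple placement groups up front, even if it starts small.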


6. Autoscale

Features:

  • Automatically adjusts VM count based on performance metrics.

  • Scale Out: Increase VM instances when demand rises.

  • Scale In: Decrease VM instances when demand falls.

  • Supports scheduled scaling (changing the instance count at fixed times).

Configuration Parameters:

  • Minimum/Maximum number of VMs

  • Default number of VMs (used when autoscale cannot read metrics)

  • CPU thresholds for scale-out and scale-in

  • Number of VMs to add/remove per scaling event

Benefits: Reduces cost and management overhead while ensuring application performance.