Microsoft Azure Support Team Blog

Avoiding allocation failure error in Microsoft Azure


In some cases you may encounter an allocation failure error when you try to perform one of the following actions:

  • Deleting and recreating a VM
  • Creating a new VM
  • Deploying new packages
  • Deploying updates to existing deployments
  • Starting a VM from stopped de-allocated state

Below we look at the different versions of this error message that you may see depending on the operation you are performing, explain what an affinity group is and how it maps onto a datacenter, and then cover the reason for the failure and recommendations to avoid it.

 

Different versions of allocation failure errors

"Allocation failed; unable to satisfy constraints in request. The requested new service deployment is bound to an Affinity Group, or it targets a Virtual Network, or there is an existing deployment under this hosted service. Any of these conditions constrains the new deployment to specific Azure resources. Please retry later or try reducing the VM size or number of role instances. Alternatively, if possible, remove the aforementioned constraints or try deploying to a different region."

"Service allocation failure"

“The server encountered an internal error. Please retry the request.”


Understanding affinity groups

Affinity groups are a way you can group your cloud services by proximity to each other in the Azure datacenter in order to achieve optimal performance. When you create an affinity group, it lets Azure know to keep all of the services that belong to your affinity group as physically close to each other as possible. For example, if you want to keep the services running your data and your code together, you would specify the same affinity group for those cloud services. They would then run on hardware that is located close together in the datacenter. This can reduce latency and increase performance.
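
As a minimal sketch using the classic Azure PowerShell cmdlets (the group name, location, and service name below are examples, not values from this post), you can create an affinity group and then place a cloud service in it:

New-AzureAffinityGroup -Name 'contoso-affinity' -Location 'West US' -Label 'Contoso web and data tier'

# Cloud services created with -AffinityGroup are then placed close together in that datacenter
New-AzureService -ServiceName 'contoso-web' -AffinityGroup 'contoso-affinity'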

 

Affinity group from a datacenter perspective

Windows Azure partitions the nodes in a datacenter into clusters, with each cluster containing about a thousand nodes (blades). When you create an affinity group, it is pinned to a particular cluster. All services created within an affinity group are housed within a single cluster and currently do not span clusters or datacenters.

 

 

Reason behind this error

In some cases, as the availability of free blades (or free space within a blade) in a cluster fluctuates, the constraints required to find an allocation for the service may not be met. When the constraints cannot be satisfied, an allocation failure error is returned.

 

Recommendations to avoid this error

To avoid this error, create multiple affinity groups to house services that do not need to be placed together, spin up smaller numbers of VMs at a time, ensure retry logic is in place for operations that fail (as sketched below), and spread your deployments across multiple datacenters.
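
For example, here is a minimal retry sketch using the classic Azure PowerShell cmdlets; the service name, VM name, retry count, and back-off interval are illustrative only:

$attempt = 0
do {
    $result = Start-AzureVM -ServiceName 'contoso-svc' -Name 'contoso-vm1' -ErrorAction SilentlyContinue
    if ($result.OperationStatus -eq 'Succeeded') { break }
    $attempt++
    Start-Sleep -Seconds (60 * $attempt)   # back off before retrying the allocation
} while ($attempt -lt 5)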


Use port pings instead of ICMP to test Azure VM connectivity


Because the ICMP protocol is not permitted through the Azure load balancer, you will notice that you are unable to ping an Azure VM from the internet, and from within the Azure VM, you are unable to ping internet locations.

Note that while this applies to network traffic going through the external IP (VIP) through configured endpoints, ICMP is not blocked when connecting through an Azure virtual network gateway or ExpressRoute. Also, ICMP will work between internal IPs of VMs in the same virtual network or in the same cloud service.

Also note that while an instance-level public IP lets you communicate directly to a specific VM instead of through the cloud service VIP that can be used for multiple VMs, ICMP is not permitted in that scenario either.

To test connectivity, we instead recommend that you do a port ping. While Ping.exe uses ICMP, other tools such as PsPing, Nmap, or Telnet allow you to test connectivity to a specific TCP port.

For example, trying to ping yahoo.com from within an Azure VM fails as expected with request timed out because ICMP is blocked at the Azure load balancer:

C:\>ping yahoo.com

Pinging yahoo.com [206.190.36.45] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 206.190.36.45:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

But using the Sysinternals PsPing tool, which allows you to test connectivity to a specific TCP port, you can successfully test connectivity from within the Azure VM to port 80 on an internet site.

C:\Users\craig\Downloads\PSTools>psping yahoo.com:80

PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 206.190.36.45:80:
5 iterations (warmup 1) connecting test:
Connecting to 206.190.36.45:80 (warmup): 53.25ms
Connecting to 206.190.36.45:80: 52.26ms
Connecting to 206.190.36.45:80: 52.14ms
Connecting to 206.190.36.45:80: 52.32ms
Connecting to 206.190.36.45:80: 51.48ms

TCP connect statistics for 206.190.36.45:80:
  Sent = 4, Received = 4, Lost = 0 (0% loss),
  Minimum = 51.48ms, Maximum = 52.32ms, Average = 52.05ms

Note that one exception to this is that ICMP pings will work to bing.com because Azure and Bing are both Microsoft properties.

C:\Users\craig\Downloads\PSTools>psping bing.com

PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

Pinging 204.79.197.200 with 32 bytes of data:
5 iterations (warmup 1) ping test:
Reply from 204.79.197.200: 6.85ms
Reply from 204.79.197.200: 2.47ms
Reply from 204.79.197.200: 2.30ms
Reply from 204.79.197.200: 2.95ms
Reply from 204.79.197.200: 2.39ms

Ping statistics for 204.79.197.200:
  Sent = 4, Received = 4, Lost = 0 (0% loss),
  Minimum = 2.30ms, Maximum = 2.95ms, Average = 2.53ms

Testing from on-premises to the Azure VM shows the same behavior: the ICMP traffic is blocked by the Azure load balancer and the ping requests time out. But if you instead do a port ping, it will succeed (assuming the VM is running, the guest firewall isn't blocking the port, and the port has a configured endpoint for the VM).

To confirm which ports are opened to the VM with Azure endpoints, in the Azure management portal, go to Virtual Machines, select the VM, then select Endpoints.
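
You can also list the configured endpoints with Azure PowerShell. A minimal sketch (the VM name below is assumed to match the cloud service name, as in this example):

Get-AzureVM -ServiceName 'CLJun21WS12R2A' -Name 'CLJun21WS12R2A' | Get-AzureEndpoint | Select-Object Name, Protocol, Port, LocalPort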

C:\>ping CLJun21WS12R2A.cloudapp.net

Pinging CLJun21WS12R2A.cloudapp.net [23.100.76.67] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 23.100.76.67:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

C:\>psping CLJun21WS12R2A.cloudapp.net:56972

PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 23.100.76.67:56972:
5 iterations (warmup 1) connecting test:
Connecting to 23.100.76.67:56972 (warmup): 60.44ms
Connecting to 23.100.76.67:56972: 61.28ms
Connecting to 23.100.76.67:56972: 63.41ms
Connecting to 23.100.76.67:56972: 63.69ms
Connecting to 23.100.76.67:56972: 60.41ms

TCP connect statistics for 23.100.76.67:56972:
  Sent = 4, Received = 4, Lost = 0 (0% loss),
  Minimum = 60.41ms, Maximum = 63.69ms, Average = 62.20ms

C:\>psping CLJun21WS12R2A.cloudapp.net:5986

PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 23.100.76.67:5986:
5 iterations (warmup 1) connecting test:
Connecting to 23.100.76.67:5986 (warmup): 61.49ms
Connecting to 23.100.76.67:5986: 65.29ms
Connecting to 23.100.76.67:5986: 67.08ms
Connecting to 23.100.76.67:5986: 62.70ms
Connecting to 23.100.76.67:5986: 60.99ms

TCP connect statistics for 23.100.76.67:5986:
  Sent = 4, Received = 4, Lost = 0 (0% loss),
  Minimum = 60.99ms, Maximum = 67.08ms, Average = 64.02ms

 

How to Copy files to/from Azure Storage


If you are looking for a reliable way to copy files to or from Azure storage then this blog post might just have the right answer.

Note: In this post I am not writing anything new but merely pointing to posts or tools that already exist.

 

Copying VHDs (Blobs) between Storage Accounts using Azure PowerShell

This is an excellent blog post which details the steps of copying VHDs from one account to another along with PowerShell examples.

http://michaelwasham.com/windows-azure-powershell-reference-guide/copying-vhds-blobs-between-storage-accounts/

This is the MSDN link for Start-AzureStorageBlobCopy, the cmdlet used in the blog post above.
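
As a hedged sketch of a cross-account copy (the account names, keys, container, and blob names are placeholders):

$srcCtx  = New-AzureStorageContext -StorageAccountName 'sourceaccount' -StorageAccountKey '<source-key>'
$destCtx = New-AzureStorageContext -StorageAccountName 'destaccount' -StorageAccountKey '<destination-key>'

# The copy runs server side; Get-AzureStorageBlobCopyState polls it until it finishes
$copy = Start-AzureStorageBlobCopy -SrcContainer 'vhds' -SrcBlob 'myvm-os.vhd' -Context $srcCtx -DestContainer 'vhds' -DestBlob 'myvm-os.vhd' -DestContext $destCtx
$copy | Get-AzureStorageBlobCopyState -WaitForComplete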

 

Downloading a VHD locally from Azure using PowerShell

The Save-AzureVhd cmdlet allows you to download VHD images stored in a blob to a local file. It has parameters to configure the download process, such as specifying the number of downloader threads to use or overwriting a file that already exists at the specified path. Save-AzureVhd does not perform any VHD format conversion; the blob is downloaded as-is.
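
A hedged example (the blob URL, local path, and thread count are placeholders):

Save-AzureVhd -Source 'https://mystorageaccount.blob.core.windows.net/vhds/myvm-os.vhd' -LocalFilePath 'D:\vhds\myvm-os.vhd' -NumberOfThreads 8 -OverWrite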

 

Uploading a local VHD to Azure using PowerShell

The Add-AzureVhd cmdlet allows you to upload on-premises VHD images to a blob storage account as fixed VHD images. It has parameters to configure the upload process, such as specifying the number of uploader threads to use or overwriting a blob that already exists at the specified destination URI. For on-premises VHD images, a patching scenario is also supported, so differencing disk images can be uploaded without having to re-upload base images that are already in storage. A SAS URI is supported as well.
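
A hedged example (the local path, destination URI, and thread count are placeholders):

Add-AzureVhd -LocalFilePath 'D:\vhds\myvm-os.vhd' -Destination 'https://mystorageaccount.blob.core.windows.net/vhds/myvm-os.vhd' -NumberOfUploaderThreads 8 -OverWrite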

 

AzCopy – Uploading/Downloading files for Azure Blobs

The AzCopy utility is designed to simplify the task of transferring data into and out of a Windows Azure storage account. You can use it as a standalone tool or incorporate it into an existing application.

You can download the latest version of AzCopy from aka.ms/AzCopy.
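
For example, a hedged sketch of uploading a local folder to a blob container with the classic AzCopy syntax (the paths, account name, and key are placeholders, and switches can differ between AzCopy versions):

AzCopy /Source:C:\backup /Dest:https://mystorageaccount.blob.core.windows.net/backup /DestKey:<storage-account-key> /S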

 

Review these two blog posts for more information, syntax, and examples for the AzCopy tool.

http://blogs.msdn.com/b/windowsazurestorage/archive/2012/12/03/azcopy-uploading-downloading-files-for-windows-azure-blobs.aspx#comments

http://blogs.msdn.com/b/windowsazurestorage/archive/2013/04/01/azcopy-using-cross-account-copy-blob.aspx

 

Hope this blog post is of assistance to you when you want to transfer/copy files to or from Azure Storage.

Unable to add VM agent extension to Azure VM


[Update 7/17/2014] Azure PowerShell 0.8.5 has released which resolves the inability to use VM agent extensions if the VM is in an availability set.

Links to install it via Web Platform Installer or Windows Standalone (MSI) are available on GitHub:

https://github.com/Azure/azure-sdk-tools/releases

When you try to add a VM agent extension such as BGInfo or VM Access to an Azure VM, the extension may not be enabled even though the command completes successfully. In the case of BGInfo this means the desktop background will not be updated, and in the case of VM Access, this means that attempts to enable RDP or reset your password will not work.

Both the VM agent itself and the BGInfo extension are added by default when creating a VM from an image if you leave Install VM Agent selected, which is the default. But for a VM where the agent was installed later, you would use Azure PowerShell to configure extensions. A link to the agent MSI package is available on the Manage Extensions page on MSDN.

For example, you would use this Azure PowerShell command to enable the BGInfo extension:

Get-AzureVM -ServiceName 'CLJun27WS12R2A' -Name 'CLJun27WS12R2A' | Set-AzureVMBGInfoExtension | Update-AzureVM

Though the command completes successfully, C:\WindowsAzure\Logs\WaAppAgent.log in the VM shows the agent did not install the extension (including the typo):

Skiping additional plugin installation because there is no specified plugin.

This issue can occur when the VM is part of an availability set. There is currently an issue with the Update-AzureVM cmdlet where it incorrectly puts AvailabilitySetName before ResourceExtensionReferences instead of after in the REST body of the API call. This issue is documented on GitHub:

2679 Update-AzureVM cannot add extension when the vm has an availability set. 

This issue occurs with 0.8.4 or earlier versions of Azure PowerShell. A fix is planned for the next version. You can check the version by running Get-Module azure and looking at the Version column.

You can add -Debug to Update-AzureVM to see the incorrect ordering:

Get-AzureVM -ServiceName 'CLJun27WS12R2A' -Name 'CLJun27WS12R2A' | Set-AzureVMBGInfoExtension | Update-AzureVM -Debug

First, make sure you are looking at the output after the Update-AzureVM operation begins, since the Get-AzureVM output earlier in the debug stream looks similar but shows the correct ordering:

VERBOSE: 11:21:29 AM - Begin Operation: Update-AzureVM

Then below that you can see that AvailabilitySetName is incorrectly ordered to be before ResourceExtensionReferences, when it should be after.

<AvailabilitySetName>clwestus</AvailabilitySetName>
<ResourceExtensionReferences>
 <ResourceExtensionReference>
   <ReferenceName>BGInfo</ReferenceName>
   <Publisher>Microsoft.Compute</Publisher>
   <Name>BGInfo</Name>
   <Version>1.*</Version>
   <ResourceExtensionParameterValues />
   <State>Enable</State>
 </ResourceExtensionReference>
</ResourceExtensionReferences>

 

Workaround

As a workaround, you can temporarily remove the VM from the availability set, then run the PowerShell command again to add the extension. After you confirm the extension has been added, you can then add the VM back to the availability set.

To remove the VM from the availability set, in the management portal, select Virtual Machines, select the VM, select the Configure tab, and change the Availability Set drop-down menu to Remove from Availability Set, then click Save at the bottom of the page. Before the change is made, the portal will prompt you saying that if the VM is currently running, it will need to be restarted to make the availability set change.

You can also script that mitigation with PowerShell:

Get the VM object and the current availability set name:

$servicename = '<service name>'
$name = '<VM name>'

$vm = Get-AzureVM -ServiceName $servicename -Name $name
$oldname = $vm.VM.AvailabilitySetName

Remove the VM from availability set:

$vm.VM.AvailabilitySetName = $null
$vm | Update-AzureVM

Add the BGInfo extension:

$vm | Set-AzureVMBGInfoExtension | Update-AzureVM

Add the VM back to the original availability set:

$vm.VM.AvailabilitySetName = $oldname
$vm | Update-AzureVM

In the case of the BGInfo extension, you'll need to sign out and sign back into the VM to see the wallpaper change.

C:\WindowsAzure\Logs\WaAppAgent.log will show the following when the BGInfo extension was successfully enabled:

Plugin enabled (name: Microsoft.Compute.BGInfo, version: 1.1)., Code: 0

 

Event ID 2: Session "WindowsAzure-GuestAgent-Metrics" failed to start with the following error: 0xC0000035


You may notice the following error on an Azure VM in the event log under Applications and Services Logs, Microsoft, Windows, Kernel-EventTracing, Admin:

Log Name: Microsoft-Windows-Kernel-EventTracing/Admin
Source: Microsoft-Windows-Kernel-EventTracing
Event ID: 2
Task Category: Session
Level: Error
Keywords: Session
User: SYSTEM
Computer: CLJul8WS08R2A
Description:
Session "WindowsAzure-GuestAgent-Metrics" failed to start with the following error: 0xC0000035

This error can be safely ignored. In scenarios where you are monitoring for event log errors and want to prevent the error from being logged, you can use the following steps.

Run the following commands from an elevated PowerShell prompt.

  1. Set the MetricsSelfSelectionSelected value in the registry to 2.
     
    New-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\GuestAgent -Name 'MetricsSelfSelectionSelected' -Value 2 -Type DWord -Force
      
  2. Restart the Windows Azure Telemetry Service.

    Restart-Service WindowsAzureTelemetryService
     
  3. Stop the WindowsAzure-GuestAgent-Metrics event trace session.
     
    logman stop 'WindowsAzure-GuestAgent-Metrics' -ets
      

Note that setting MetricsSelfSelectionSelected to 2 in the registry and stopping the WindowsAzure-GuestAgent-Metrics event trace session has no impact on the normal functioning of the VM agent.

The logman command achieves the same result as clicking Stop on the WindowsAzure-GuestAgent-Metrics session under Event Trace Sessions in Performance Monitor.

Azure Import/Export Service Error: The provided file doesn’t contain valid drive information


The Windows Azure Import/Export Service enables you to move large amounts of data into or out of your Microsoft Azure storage accounts by letting you securely ship hard disk drives directly to the Microsoft Azure datacenters. Once we receive the drives, we transfer the data to or from your Microsoft Azure storage account.

For more information on Windows Azure Import/Export Service please refer to this article: http://azure.microsoft.com/en-us/documentation/articles/storage-import-export-service/

For each hard drive that you prepare with the Azure Import/Export tool, the tool will create a single journal file. You will need the journal files from all of your drives to create the import job. The journal file can also be used to resume drive preparation if the tool is interrupted.

You might come across the following error when you are adding a journal file while creating a new import job using the Azure Management Portal.

“Error: The provided file doesn’t contain valid drive information”.

 

 

This issue is caused by an incorrect date format in the journal file. When the .jrn file is opened in a text editor, you will find that the date format differs from the expected format: the journal file contains dates such as “2014.07.06 11:52:53.704”, while the expected format is “2014/07/06 11:52:53.704”. By default, Windows uses “/” as the date separator, but this can be changed in the regional settings. If you have changed the date format in the regional settings, you may see this error.

We are aware of this issue and are working on a fix. As a workaround, edit the journal file in a text editor and replace all dates in the incorrect format (for example, 2014.07.06) with the slash-separated format (2014/07/06).
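
A hedged PowerShell sketch of that find-and-replace (the journal file path is a placeholder; review the result before uploading, since the pattern simply rewrites any dotted date to slashes):

$journal = 'C:\ImportExport\FirstDrive.jrn'
(Get-Content $journal) -replace '(\d{4})\.(\d{2})\.(\d{2})', '$1/$2/$3' | Set-Content $journal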

 

 

Unable to setup AutoScale for Virtual Machines from the Azure Management Portal


You can set up AutoScale for your Cloud Services, Virtual Machines, and Web Sites from the Azure Management Portal. For Virtual Machines, AutoScale is configured per availability set. You can set up AutoScale for your availability set only if:

  • You have only ‘Standard’ VMs in your availability set. Basic VMs cannot participate in AutoScale.
  • Your ‘Standard’ VMs are all the same size.

Currently the Portal doesn’t allow you to configure AutoScale for the availability set if you have a ‘Basic’ VM in the cloud service, even if that ‘Basic’ VM is not in the availability set for which you are configuring AutoScale. This has already been identified as a problem, and our teams are working to fix it in the Portal soon. I’ll update this blog post once this is resolved.

However, you can easily configure AutoScale for Virtual Machines using the Service Management REST APIs. You can call the REST APIs programmatically and pass in the necessary parameters, or you can use any tool that can create a simple HTTP(S) request and attach a client certificate. To use the REST API, you need to pass a client certificate that has already been uploaded to your Azure subscription.

Here are the brief steps:

Create a new self-signed Certificate, and upload to the Azure Management Portal

You can follow this article to create a new certificate. Once you execute the makecert command mentioned there, the certificate is created in the user’s personal store. You then need to export the certificate as a .cer file and upload it to the Azure Management Portal (Settings –> Management Certificates).

Use Fiddler to generate the REST API Request, and pass the right parameters

The AutoScale REST API is documented here. To set up AutoScale, we need to use the ‘Update Autoscale Settings’ operation. This is a PUT request with the right parameters. The URI parameter for the endpoint is resourceId=/virtualmachines/<cloudservice>/availabilitySets/<availability-set>. The documentation page has the structure of the request body.

Let’s use Fiddler to craft this HTTPS request, with the client certificate.

  1. Place the exported .cer file under C:\Users\<username>\Documents\Fiddler2, naming it ‘ClientCertificate.cer’.
  2. Run Fiddler and stop the capture (File –> Capture Traffic); otherwise you will see all outgoing HTTP(S) traffic from the machine in Fiddler.
  3. On the right-hand side, choose the “Composer” tab.
  4. Use the following settings:
Request Verb: PUT

URL: https://management.core.windows.net/<subscription-ID>/services/monitoring/autoscalesettings?resourceId=/virtualmachines/<cloud-service-name>/availabilitySets/<availabilitySetName>

Headers:
x-ms-version: 2013-10-01
Content-Type: application/json

Body:
{"Profiles":[{"Name":"Day Time","Capacity":{"Minimum":"2","Maximum":"2","Default":"2"},"Rules":[],"Recurrence":{"Frequency":"Week","Schedule":{"TimeZone":"Pacific Standard Time","Days":["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"],"Hours":[9],"Minutes":[0]}}},{"Name":"Night Time","Capacity":{"Minimum":"1","Maximum":"1","Default":"1"},"Rules":[],"Recurrence":{"Frequency":"Week","Schedule":{"TimeZone":"Pacific Standard Time","Days":["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"],"Hours":[18],"Minutes":[0]}}}],"Enabled":true}

The above is an example setting that scales based on time of day: during the day (9 AM to 6 PM PST) it runs 2 instances, and at night it reduces the instance count to 1. You can use the other parameters documented on that page to alter the request body and choose the right settings.

Tip: You can also do a GET operation against the same endpoint to retrieve the current AutoScale settings. For example, since you cannot configure it via the portal if you have a Basic VM in your cloud service, you could create a test cloud service, add Standard VMs, and configure AutoScale from the Portal. Then do a GET against that resource using Fiddler, or any other tool, and reuse the JSON output in the PUT operation.
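
If you prefer PowerShell over Fiddler, here is a hedged sketch of the same PUT using Invoke-RestMethod; the subscription ID, cloud service and availability set names, certificate thumbprint, and JSON file path are placeholders:

$subId   = '<subscription-ID>'
$uri     = "https://management.core.windows.net/$subId/services/monitoring/autoscalesettings?resourceId=/virtualmachines/<cloud-service-name>/availabilitySets/<availabilitySetName>"
$cert    = Get-Item 'Cert:\CurrentUser\My\<certificate-thumbprint>'
$headers = @{ 'x-ms-version' = '2013-10-01' }
$body    = Get-Content .\autoscale-settings.json -Raw   # the JSON body shown above

Invoke-RestMethod -Uri $uri -Method Put -Certificate $cert -Headers $headers -Body $body -ContentType 'application/json'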

Hope this helps!

Learning Azure Service Management REST API through Powershell, and Azure CLI tools


The Azure Service Management REST API is used by all the public tools Microsoft provides to manage your Azure subscription – the Portal, the new Portal, PowerShell, and the cross-platform (X-Plat) CLI. You can use the same APIs to build your own tools to manage your Azure subscription. Sometimes it is a little difficult to bring together all the information that needs to be passed to a REST API call. The MSDN documentation for these REST APIs has all the information you need, but sometimes you might want more help.

Azure PowerShell and the Azure xplat CLI tools both use the same REST API endpoints to manage the subscription, so you can learn from them as well. Each has its own debug switch that shows the REST API endpoint and the parameters that need to be passed.

For example, to list all the VMs from Azure PowerShell, you run Get-AzureVM. You can pass the -Debug switch to any Azure PowerShell cmdlet, and it will emit debug information that includes the REST API endpoint and the parameters passed in.

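For instance (the service and VM names below are placeholders), the following prints the REST request and response in the debug stream:

Get-AzureVM -ServiceName 'contoso-svc' -Name 'contoso-vm1' -Debug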

Typically, GET operations are straightforward, since you only have to pass in the right URI. Create/update operations are trickier, since they typically involve sending a request body as XML or JSON. Try -Debug with any command, and you will be able to see the entire request payload that is sent.

And for the Azure Cross-Platform CLI, the equivalent is the -vv switch.

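For example, with the classic xplat CLI (assuming service management mode), listing VMs with full request/response debug output looks like this:

azure vm list -vv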

Hope this helps!


How to Monitor for Storage Account Throttling


For Azure Virtual Machines, we document storage limits of 500 IOPS per disk, and the guidance not to place more than 40 highly used VHDs in any single storage account, since there is a 20,000 IOPS limit per storage account.

By default, each subscription can have up to 20 storage accounts containing up to 500 TB each. The limit of 20 storage accounts per subscription can be increased to up to 50 by contacting Microsoft support.

The risk of storage account throttling having a negative performance impact on your VMs exists once you have more than 40 highly used disks in a single storage account. And with more than 40 disks in a single storage account, each disk wouldn't need to be hitting the 500 IOPS limit for the entire storage account to be hitting the 20,000 IOPS limit.

Determining the Number of Disks in a Storage Account

You can get a count of disks per storage account manually by looking in the portal under Virtual Machines, then select Disks. Disks in the same storage account will have the same first label of the DNS name under the Location column. For example, the location of https://clnorthcentralusstorage.blob.core.windows.net/... indicates the disk resides in the storage account named clnorthcentralusstorage.

A simpler way to get a count of disks per storage account is with Azure PowerShell using the Get-AzureDisk cmdlet. In the example below, the clnorthcentralusstorage account has 46 disks. If all of those disks were highly utilized at the same time, the storage account would be at risk for being throttled due to the 20,000 IOPS limit for a single storage account, and some of the disks would see performance lower than the 500 IOPS per-disk limit.

Get-AzureDisk | Where-Object { $_.AttachedTo } | Group-Object {$_.MediaLink.Host.Split('.')[0]} -NoElement

Count Name
----- ----
46 clnorthcentralusstorage

Monitoring for Storage Account Throttling

To monitor for storage account throttling, in the Azure management portal, select Storage on the left, select the storage account you want to monitor, then select the Configure tab, and under the Monitoring section change Blobs from Off to Minimal (Verbose will also work, but is not necessary just to monitor for storage account throttling).

Now switch to the Monitoring tab, select Add Metrics at the bottom of the page, and under Select Metrics to Monitor, select Throttling Error and Throttling Error Percentage, then click OK.

Now you can view minimum, maximum, average, and total values for those metrics for either 6-hour, 24-hour, or 7-day timeframes. 

Additionally you can configure alerts to get email notification when those metrics hit a defined threshold.

To configure an alert, on the Monitor tab for a storage account, select the metric you want to alert on, then click Add Rule at the bottom of the page. Provide a display name for the alert rule, and then define the alert threshold.

Make sure to select Send an email to the service administrator and co-administrators so you are notified via email when the alert threshold is hit. Additionally, you can select Specify the email address for another administrator if you want to send to an email address other than one for the service admin and co-admins for the subscription.

Once the rule is created, you can click through on 1 rule configured on the far right column, or go to Management Services in the left navigation pane of the portal and select Alert, then select the rule to view the alert details and recent alert history (last 20 occurrences).

When the metric hits the defined threshold and triggers the alert, if email notification is enabled on the alert, you will get an email with a subject line similar to this: 

[ALERT ACTIVATED] - PercentThrottlingError GreaterThan 0 (percentage) in the last 60 minutes

And when the metric falls below the threshold, you will get an email indicating the alert was resolved:

[ALERT RESOLVED] - PercentThrottlingError GreaterThan 0 (percentage) in the last 60 minutes

Here is an example of the mail when an alert is activated:


 

Configuring Azure Virtual Machines for Optimal Storage Performance


In support, one of the most common questions we get is: How do I achieve the best disk performance for Azure virtual machines?

Platform Planning:

In the standard tier of virtual machines in Azure, the maximum is 500 IOPS per disk. When planning for a high-I/O virtual machine, you also need to take into consideration the storage account’s 20,000 total request rate limit. Therefore, you should not place more than 40 highly utilized VHDs in a storage account (20,000 / 500 = 40 VHDs). Azure storage is charged on a per-usage basis; you only pay for what you use. Therefore, you should always attach the maximum number of data disks allowed for the virtual machine size (Get-AzureRoleSize shows a MaxDataDiskCount property for each available size, as shown below), and the data disks should always be the maximum allowable size: one terabyte. Locally Redundant Storage is the cheapest and highest-performing choice for your storage account.
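
For example, a quick way to see the maximum data disk count for each size with Azure PowerShell:

Get-AzureRoleSize | Select-Object InstanceSize, Cores, MemoryInMb, MaxDataDiskCount | Format-Table -AutoSize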

Summary of Microsoft Support’s recommendation for the Azure platform:

  • Plan for a separate single storage account for each high I/O Azure virtual machine
    • While you can safely have up to 40 VHDs per storage account, breaking each virtual machine into its own storage account allows for easier administration. You can have a maximum of 100 storage accounts per subscription
  • Locally Redundant Storage account setting
  • Attach the maximum data disks allowed for the virtual machine size
  • Data disks should always be the maximum allowable size (one terabyte)
  • Setup monitoring for storage account throttling on Azure platform

Virtual Machine Planning:

First, you need to use Storage Spaces inside your virtual machine to combine the disks and overcome the 500 IOPS per-disk limit of the Azure platform. The best Storage Spaces configuration for performance is a simple (striped) storage space. Next, set the number of columns: the number of columns specifies how many physical disks data is striped across, and it affects performance regardless of stripe and block size. For optimum performance, the number of columns must equal the number of attached data disks in Azure.

Second, consider the interleave of the storage space. To maximize performance, ensure that the interleave value used by the storage space is at least as large as the I/Os of your workload. I/Os that exceed the interleave size are split into multiple stripes, turning one write into multiple writes and reducing performance. For database management systems, set the interleave to 65536 (64 KB).

Finally, consider the cluster size of the Storage Spaces virtual disk. For most installations, the appropriate value is 65536 (64 KB) for partitions on which databases or log files reside.
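
Here is a minimal sketch of that configuration, run inside the VM from an elevated PowerShell prompt; it assumes all attached data disks are still unpooled, and the pool, disk, and volume names are examples only:

$disks = Get-PhysicalDisk -CanPool $true

# One pool across all attached data disks, a simple (striped) space with one column per disk,
# a 64 KB interleave, and a 64 KB allocation unit size on the NTFS volume
New-StoragePool -FriendlyName 'DataPool' -StorageSubSystemFriendlyName 'Storage Spaces*' -PhysicalDisks $disks
New-VirtualDisk -StoragePoolFriendlyName 'DataPool' -FriendlyName 'DataDisk' -ResiliencySettingName Simple -NumberOfColumns $disks.Count -Interleave 65536 -UseMaximumSize
Get-VirtualDisk -FriendlyName 'DataDisk' | Get-Disk | Initialize-Disk -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel 'Data' -Confirm:$false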

Summary of Microsoft Support’s recommendation for an Azure virtual machine:

  • Plan your IaaS virtual machine to use Storage Spaces
  • Configure the Storage Space to use all attached data disks
  • Set the number of columns in the Storage Space to match the number of attached data disks in Azure
  • Set the Storage Space interleave at least as large as the I/Os of your workload; for SQL Server, use 65536 (64 KB)
  • Format the partition with a 65536 (64 KB) cluster size

Testing the I/O of disk:

A great blog to read for this question is by my co-worker, Jose Barreto. He covers this question in great depth: Using file copy to measure storage performance – Why it’s not a good idea and what you should do instead

  • Run the actual workload
  • Run a workload simulator. To simulate running SQL Server databases, you should try the SQLIO tool

Want a PowerShell script to do all of this work for you?

Automate the creation of an Azure VM preconfigured for max storage performance


Recommended further reading:

Want to reconfigure your VMs into their own storage account?
https://gallery.technet.microsoft.com/scriptcenter/Azure-Virtual-Machine-Copy-1041199c

Count VHDs Per Storage Account
https://gallery.technet.microsoft.com/scriptcenter/Azure-Count-VHDs-Per-6a44ecbd

Performance Best Practices for SQL Server in Azure Virtual Machines
http://msdn.microsoft.com/en-us/library/azure/dn133149.aspx

Recommendations and Best Practices When Deploying SQL Server AlwaysOn Availability Groups in Microsoft Azure (IaaS)
http://blogs.msdn.com/b/alwaysonpro/archive/2014/08/29/recommendations-and-best-practices-when-deploying-sql-server-alwayson-availability-groups-in-windows-azure-iaas.aspx

 

 

Recover Azure VM by attaching OS disk to another Azure VM


If you are unable to connect to an Azure VM with RDP or SSH even after restarting and resizing the VM, you can use the following steps to make it accessible again.

Scripted Recovery Steps

[Updated 2/5/2015] The scripted recovery method is now included as part of the Microsoft Azure IaaS (Windows) diagnostics package.

When you run that package, on the Which of the following issues are you experiencing with your Azure VM? dialog, select Attempt system recovery using a recovery VM to perform this recovery method.

The Attempt system recovery using a recovery VM option in the diagnostics package automates the manual steps that are provided later in this article. This option is for Windows VMs only. For steps to recover Linux VMs, see the Manual Recovery Steps section below.

Overview of "Attempt system recovery using a recovery VM" option in the Microsoft Azure IaaS (Windows) diagnostic package

  • A backup is created of the problem VM’s OS disk in the same storage account. The backup will be located in a new container called <cloud service name>-<VM name>-backupvhds. The name of the file will be <VM name>-osdisk.vhd. If any errors are encountered, the script will exit.
      
  • A recovery VM is created in the same cloud service as the problem VM. This ensures the external IP address (virtual IP or VIP) does not change. The recovery VM will be named recoveryvm#####. If the subscription has sufficient available cores, it will be created as a large VM so the Chkdsk operation completes more quickly. If there are not sufficient cores available in the subscription for a large VM, a small VM is used instead.
      
  • The problem VM’s metadata is exported locally to be used later to recreate the VM with the same configuration. This includes configuration for data disks, subnet, endpoint (including load-balanced endpoints), static virtual network IP address (if configured) and VM agent extensions.
      
  • The problem VM is removed, keeping the OS and data disks as-is for recovery and recreation of the VM later in the script.
      
  • The problem VM’s OS disk is attached as a data disk to the recovery VM.
      
  • A remote PowerShell connection is made to the recovery VM. Chkdsk /F is run on the problem VM’s OS disk. The \Windows\System32\Config\SYSTEM registry hive is moved to SYSTEM_ORG, and \Windows\System32\Config\Regback\SYSTEM is copied to \Windows\System32\Config\SYSTEM. Windows automatically creates registry hive backups to \Regback every 10 days, so that step is restoring just the System hive of the registry with a version that is up to 10 days old. It then uses Bcdedit to make sure the default setting is configured so the VM will not boot to the Windows Recovery Environment (WinRE) since interacting with an Azure VM is not possible at that stage of the boot process. Also, the Windows Boot Loader section of the Bcdedit /enum output is checked, and if either device or osdevice properties show as unknown, they are set back to boot.
      
  • The problem VM’s OS disk is detached from the recovery VM, then the problem VM is recreated using the metadata that was exported earlier.
      
  • The recovery VM is automatically removed. 

Manual Recovery Steps

Be aware of the following when using these steps to recover your VM:

  • It is recommended that you first backup the problem VM’s OS disk and data disk as a precautionary step. You can use one of the available Azure Storage Explorer tools, the AzCopy tool, or Azure PowerShell’s Start-AzureStorageBlobCopy cmdlet to create backups of the VHD files.
      
  • The D: drive, by default, is the temporary storage drive and is reset when an Azure VM is resized, put in Stopped Deallocated (via portal Shutdown), or recreated from the same disk as in the steps below. See also About Virtual Machine Disks in Azure.
     
  • The external IP (Virtual IP or VIP) will change if you recreate the VM without keeping another VM running in the same cloud service. To prevent this, keep another VM running in the cloud service while performing the steps.
     
  • The internal IP address will change when recreating the VM if it is not in a virtual network. And even for a VM in a virtual network, the internal IP address may change after recreating the VM if the IP address it previously had, has been taken by another VM.
     
  • Write down or copy and paste the disk names in the Disk column at the bottom of the Dashboard for the VM before recreating it as in the steps below. You will need to remember which disks belong to that VM so you recreate the VM with the same disks.


     

You can troubleshoot the VM by attaching the OS disk as a data disk to another Azure VM using the steps below.

  1. Create a new VM in the same cloud service as the problem VM, using a gallery image of the same OS version as the problem VM. You will use this VM temporarily for troubleshooting.

    For example if the problem VM is Windows Server 2012 R2, create the troubleshooting VM from the Windows Server 2012 R2 gallery image. Similarly, if the problem VM is Ubuntu 14.04 LTS, create a troubleshooting VM from the 14.04 LTS gallery image.

    To verify the cloud service of the problem VM, select Virtual Machines in the management portal, select the problem VM, select Dashboard, then under Quick Glance on the right, the first part of the DNS Name is the name of the cloud service.

    For example, in the screenshot below, the DNS Name is clnov8ws12r2a.cloudapp.net, so the cloud service name is clnov8ws12r2a.


     
  2. After creating a new VM in the same cloud service as the problem VM, select Virtual Machines on the left, click the problem VM on the right, then click Dashboard.


       
  3. Make note of the OS disk name in the Disks section at the bottom of the dashboard, since you will be using it later to recreate the VM. The disk name is under the Disk column on the far left. 
     

     
  4. Click Delete at the bottom right of the page, then click Keep the attached disks. This is necessary so the OS disk is not in use and can be attached to another VM.



    Click Yes on the prompt asking if you want to continue, which explains that The attached disks and their VHD files won't be deleted from your storage account.



  5. Click Virtual Machines on the left, and click the Disks tab at the top right. 
     
    Find the disk name from Step 3, and wait for the Attached To column to be blank. This can take up to 10 minutes after deleting the VM, though usually it will be much faster than that.


     
  6. Click Virtual Machines on the left, and select the troubleshooting VM that you will use to attach the OS disk of the problem VM. Select Dashboard, then select Attach and then Attach Disk at the bottom of the dashboard. 
     

     
  7. In the Attach a disk to the virtual machine dialog, select Available Disks and choose the disk from the problem VM (you made note of the disk name in Step 3). Leave the Host Cache Preference at the default setting of None, and click OK
     

     
    If you do not see the disk here, either this troubleshooting VM is in a different location than the problem VM (for example, this VM is in West US and the problem VM is in East US), or the disk has not yet been freed up for reuse and the Attached To column still shows the problem VM name instead of being blank.
     
  8. When the disk is attached to the second VM you will see a message in the portal Successfully attached disk <disk name> to virtual machine <name of troubleshooting VM>
      
  9. Click Connect to make an RDP connection to the troubleshooting VM. Or if the troubleshooting VM is Linux, create an SSH connection to it. 
      
  10. For Linux VMs, skip to Step 14. For Windows, in the troubleshooting VM, go to Start, Search, type diskmgmt.msc and press Enter to open the Disk Management tool.
     
  11. If the disk you just added shows up as Offline, right-click it and select Online. Most Azure VMs will be configured to automatically online new disks so this may not be necessary.


     
  12. The examples below reference F: for the drive letter. Please replace that with the appropriate drive letter if the largest (or only) partition on the OS disk from the problem VM (now attached as a data disk to the recovery VM) was assigned a different drive letter.

    Note: Make sure you are running the commands from an elevated CMD prompt, not a PowerShell prompt, else the first Bcdedit command will fail.

    Run Chkdsk to resolve possible file system consistency issues:

    chkdsk F: /F

    Run Bcdedit to make sure Windows will not boot to the Windows Recovery Environment (WinRE) since interacting with an Azure VM at that stage of the boot process is not possible.

    bcdedit /store F:\boot\bcd /set {default} recoveryenabled Off

    bcdedit /store F:\boot\bcd /enum

    Important: If the above bcdedit /enum command shows device and osdevice to be unknown under the  Windows Boot Loader section, run the following additional Bcdedit commands.

    bcdedit /store F:\boot\bcd  /set {default} osdevice boot

    bcdedit /store F:\boot\bcd  /set {default} device boot

    Use the following commands to backup the existing SYSTEM registry hive and then revert it to the \Regback\SYSTEM copy. Windows automatically creates registry hive backups to \Regback every 10 days, so the steps below are restoring just the SYSTEM hive of the registry with a version that is up to 10 days old:

    dir F:\Windows\System32\Config\RegBack\SYSTEM

    Important: if the file size for F:\Windows\System32\Config\RegBack\SYSTEM is 0 bytes, do not run the move and copy commands below, but continue with the remaining steps.
     
    move F:\windows\system32\config\system F:\windows\system32\config\system_org
     
    copy F:\windows\system32\config\Regback\system F:\windows\system32\config\system
      
     
  13. Back in Disk Management, right-click the disk from the problem VM and select Offline
     

      
  14. For Windows VMs, skip to Step 19. For Linux, in the troubleshooting VM, run the following commands:

    sudo fdisk -l

    ls /dev/sdc*

    A series of items will be returned that say /dev/sdcX, where X will be a number.
      
  15. Run mount | grep sdc
      
  16. If any lines are returned by Step 15, run sudo umount /dev/sdcX where X is the number shown on the line from Step 15.
      
  17. Run mount | grep sda which will return output similar to this:

    /dev/sda1 on / type ext4 (rw,discard)

    The highlighted portion (ext4 in this example) is the file system that is in use by this OS.
     
  18. For each of the items returned in Step 14, run sudo fsck -t X /dev/sdcY where X is the file system type from Step 17 and Y is the partition number from Step 14.

    If you had additional data disks attached you may need to recover them individually. They may have different file systems depending on how they were created.
     
  19. The remaining steps apply both to Windows and Linux.

    Back in the portal, on the Dashboard for the troubleshooting VM, select Detach to detach the disk.
     

     
  20. Now recreate the problem VM by creating a new VM using that same OS disk that you just repaired. Start the VM and RDP or SSH to the VM. If any data disks were previously attached to the problem VM, you can attach them again now.  
     
  21. If your VM is still not recovered, you will have to rebuild your VM and import your application-specific data (if any) into the new VM by following these steps:

    Follow Steps 1-11 to get the old OS disk attached as a data disk to a new VM.

    Copy your application-specific data from the old OS disk to the new VM.

Troubleshooting Windows activation failures on Azure VMs


[Update 12/07/2015] Windows VMs should be configured with the KMS client setup key for the version of Windows being used, and have connectivity to port 1688 at kms.core.windows.net in order to activate successfully.

If you are using site-to-site VPN with forced tunneling, please see Use Azure custom routes to enable KMS activation with forced tunneling.

If you are using ExpressRoute with a default route published, please see Azure VM may fail to activate over ExpressRoute.

If your Azure VM experiences Windows activation failures, please try the following steps to resolve the issue. An example of an error message you may see is:

Error(s):  Activating Windows(R), ServerDatacenter edition
Error: 0xC004F074 The Software Licensing Service reported that the computer could not be activated. No Key Management Service (KMS) could be contacted. Please see the Application Event Log for additional information.

Note that when the grace period has expired and Windows is still not activated, Windows Server 2008 R2 and later versions of Windows will show additional notifications about activating, the desktop wallpaper remains black, and Windows Update will install security and critical updates only, but not optional updates. See also the Notifications section at the bottom of the Licensing Conditions TechNet page.

Steps to Troubleshoot Activation

  1. Download and extract the Psping tool to a local folder in the VM that is failing to activate.

    http://technet.microsoft.com/en-us/sysinternals/jj729731.aspx

    To download the file, first go to Server Manager, Configure this local server, select IE Enhanced Security Configuration, and select Off under Administrators.
     
  2. Go to Start, search on Windows PowerShell, right-click Windows PowerShell and select Run as administrator.
       
  3. Make sure the VM is configured to use the Azure KMS server by running the following command. This is set at VM creation, so running this command is just a troubleshooting step to make sure the proper configuration is set.

    iex "$env:windir\system32\cscript.exe $env:windir\system32\slmgr.vbs /skms kms.core.windows.net:1688"

    The command should return:

    Key Management Service machine name set to kms.core.windows.net:1688 successfully.
       
  4. Verify with Psping that you have connectivity to the KMS server. Switch into the folder where you extracted the Pstools.zip download, then run:

    .\psping.exe kms.core.windows.net:1688

    In the second-to-last line of the output, make sure you see:

    Sent = 4, Received = 4, Lost = 0 (0% loss)

    If Lost is greater than 0, the VM does not have connectivity to the KMS server. In that case, if the VM is in a virtual network and has a custom DNS server specified, you must make sure that DNS server is able to resolve kms.core.windows.net. Or, change the DNS server to one that does resolve kms.core.windows.net. Note that if you remove all DNS servers from a virtual network, VMs will then use Azure's internal DNS service, which is able to resolve kms.core.windows.net.

    Aside from DNS issues, verify the guest firewall has not been configured in such a way that would block activation attempts.
     
  5. After verifying successful connectivity to kms.core.windows.net, run the following command from that elevated PowerShell prompt. This command attempts activation multiple times in a row.

    1..12 | % { iex "$env:windir\system32\cscript.exe $env:windir\system32\slmgr.vbs /ato" ; start-sleep 5 }

    Successful activation will return:

    Activating Windows(R), ServerDatacenter edition (12345678-1234-1234-1234-12345678) ...
    Product activated successfully.
     
  6. If activation still failed, and the VM is running Windows Server 2012 R2 Datacenter, Standard, or Essentials, try the command below for the specific SKU. You can verify the OS version by going to Start, searching on Msinfo32, double-clicking Msinfo32.exe, and looking at the OS Name in the right pane.

    For Windows Server 2012 R2 Datacenter, run the following from the elevated PowerShell prompt:

    iex "$env:windir\system32\cscript.exe $env:windir\system32\slmgr.vbs /ipk W3GGN-FT8W3-Y4M27-J84CP-Q3VJ9"

    For Windows Server 2012 R2 Standard, run the following from the elevated PowerShell prompt:

    iex "$env:windir\system32\cscript.exe $env:windir\system32\slmgr.vbs /ipk D2N9P-3P6X9-2R39C-7RTCD-MDVJX"

    For Windows Server 2012 R2 Essentials, run the following from the elevated PowerShell prompt:

    iex "$env:windir\system32\cscript.exe $env:windir\system32\slmgr.vbs /ipk KNC87-3J2TX-XB4WP-VCPJV-M4FWM"

    After entering the specific command above for the SKU of Windows Server 2012 R2 the VM is using, try activating again:

    iex "$env:windir\system32\cscript.exe $env:windir\system32\slmgr.vbs /ato"
     
  7. At this point if you are still unable to activate the VM, check the Application event log for events from source Microsoft-Windows-Security-SPP to help understand why activation is failing.

If you are still unable to activate after attempting the above steps, let us know in the comments or contact Azure support.

Unable to connect to multinic Linux VM


When creating an Azure VM with multiple NICs and subnets, the guest's default gateway is not automatically set. This may cause loss of SSH connectivity as the default gateway is not assigned to the NIC that is listening for incoming traffic.

Troubleshooting

To access the VM, create a second VM in the same cloud service. Connect to the second VM, and from the second VM, connect to the internal IP address of the problem VM. Once connected to the problem VM you can view the dmesg, kern.log, messages or boot.log file to verify the default gateway configuration. 

Ubuntu

The example below shows an Ubuntu VM with four NICs. The default gateway is denoted by UG and is assigned to eth2 and not eth0.

Restarting the VM may only temporarily resolve this issue, if the default gateway happens to get assigned to the correct interface after the restart.

To resolve the issue on Ubuntu 14.x, create a script on the VM to configure the default gateway correctly. The script is executed at the next restart and assigns the default gateway to the correct interface. In this example, eth0 is used for the default gateway, which is 10.0.0.1 on that network.

vi /etc/network/if-up.d/defaultgw

#!/bin/sh

route delete default gw 10.0.1.1

route delete default gw 10.0.2.1

route delete default gw 10.0.3.1

route add default gw 10.0.0.1

To set the correct privileges on the file, run the following command:

chmod 755 /etc/network/if-up.d/defaultgw

After restarting the VM the default gateway is assigned to the correct interface and SSH connections should succeed when connecting through the external (VIP) endpoint.

You can verify the default gateway configuration with netstat -rn or route -n.

CentOS 6.5

On a CentOS 6.5 VM, add the GATEWAY directive to the /etc/sysconfig/network file.

For example:

HOSTNAME=myvmname
NETWORKING=yes
GATEWAY=10.0.0.1

See the following articles for more information on the multinic feature:

Multiple VM NICs and Network Virtual Appliances in Azure
http://azure.microsoft.com/blog/2014/10/30/multiple-vm-nics-and-network-virtual-appliances-in-azure/ 

Create a VM with Multiple NICs
https://msdn.microsoft.com/en-us/library/azure/dn848315.aspx 

 

For additional resources for Linux on Azure topics please read here

 

Traffic Manager and Azure SLB - Better Together!


This post was contributed by Pedro Perez

Can Traffic Manager coexist with Azure Load Balancer? And how do we keep session affinity with them? Yes, they can coexist, and in fact it is a good idea to use both together, because Traffic Manager (TM) is a global load balancer (i.e. DNS load balancing) and the Azure Load Balancer (Azure LB) is a local load balancer.

Traffic Manager will failover or balance (whichever you choose) between different endpoints that would ideally be located in different datacenters or regions. Azure LB, on the other hand, balances between instances in the same cloud service, handling the traffic itself. Consider them two different tiers of infrastructure/application logic. Traffic Manager can be used to reduce latency (e.g. resolving to the nearest datacenter) or for disaster recovery (DR)/failover, and Azure Load Balancer can be used for redundancy and scalability. Traffic Manager monitors the cloud service endpoints, and Azure Load Balancer monitors the instances inside the cloud services. This way we make sure we’re sending traffic to a healthy server at any time.

As an example, TM will resolve the queries for www.contoso.com to the Cloud Service IP in the appropriate datacenter, so when the client contacts the Cloud Service on the endpoint’s port (e.g. 80) it’ll be effectively hitting Azure Load Balancer, which will make the decision to send the request to one of the VMs/instances and will keep a “source IP affinity” entry in a table.

In this scenario we have the same application deployed in two different datacenters, with one Cloud Service per datacenter (DC) containing two VMs (instances). The application is deployed to all 4 servers. The application needs to keep track of clients’ sessions, so we need to use affinity. We decide to use the North Europe DC as our primary datacenter, so we will direct all of our users to the endpoint in that DC, while we keep the Western Europe DC deployment as a DR solution. There’s no need for affinity between DCs because we will only be using one at a time.

You can choose the Traffic Manager’s load balancing method while creating the profile:

PS C:\>New-AzureTrafficManagerProfile -Name "ContosoTMProfile" -DomainName "wwwcontoso.trafficmanager.net" -LoadBalancingMethod "Failover"

Or change it at any time:

PS C:\> Get-AzureTrafficManagerProfile -Name "ContosoTMProfile" | Set-AzureTrafficManagerProfile -LoadBalancingMethod "Failover"

In order to enable session affinity on an endpoint you can also use PowerShell, specifying either sourceIP for a 2 tuple, sourceIPProtocol for a 3 tuple or none if you want to disable session affinity:

PS C:\> Set-AzureLoadBalancedEndpoint -ServiceName "ContosoWebSvc" -LBSetName "LBContosoWeb" -Protocol tcp -LocalPort 80 -ProbeProtocolTCP -ProbePort 8080 -LoadBalancerDistribution "sourceIPProtocol"

The traffic flow of a request to our application will go like this:

  1. A client resolves www.contoso.com through their DNS server, which is a CNAME pointing to wwwcontoso.trafficmanager.net, so it has to ultimately talk to Microsoft’s root DNS servers to get the underlying IP
  2. Meanwhile, Traffic Manager constantly probes your configured endpoints to determine their health and updates Microsoft’s DNS servers accordingly. More information about this procedure can be found at http://blogs.msdn.com/b/kwill/archive/2013/09/17/windows-azure-traffic-manager-performance-impact.aspx
  3. The client gets the resolved IP address from Microsoft’s root DNS servers.
  4. The client sends a request to the resolved IP address, which happens to be in the North Europe datacenter.
  5. The request hits the Azure Load Balancer. As it is a new connection it has to check the affinity table, but there’s nothing for this 2-tuple or 3-tuple. It then decides to send the traffic to one of the servers and populates the affinity table with the client’s IP address, the endpoint’s IP and the endpoint’s port.

During the whole TCP session, incoming packets won’t be checked against the affinity table because they belong to the same connection. However, any new incoming TCP session will be checked against the table in order to be able to provide a session layer for the application. If the client is caching the DNS resolution, any subsequent connections will skip steps 1-3 and go straight to step 4. The Azure Load Balancer will match the connection in the table and choose the destination server based on that match.

Azure Load Balancer’s affinity feature has a couple of characteristics to take into account when architecting your application:

a)     Affinity is configurable using a 2 tuple (Source IP, Destination IP) or 3 tuple (Source IP, Destination IP, Destination Port), but as you can see it is always dependent on the Source IP in order to identify the client. If the users of the application are behind a proxy (not an uncommon scenario for an enterprise app) then we’re basically sending all our traffic to only one server.

b)     As of today, when a new server is added or a server is removed from the Cloud Service, the affinity table gets wiped out and any affinity is lost. That restricts scalability changes to out of hours / low traffic hours.

It may be a good idea to consider building a common session cache layer (or using Session State Providers) accessible by all the web servers. This way your application doesn’t depend on affinity, the LB actually balances equally among servers (because there is no affinity!), and you can scale whenever you need to. Another option is to use a load-balancing Network Virtual Appliance from among those available in Azure and configure it to use cookie affinity. There are various vendors in the Gallery in the Azure Portal; please check them out. Keep in mind there can be caveats with this approach (e.g., source NAT is probably needed to make it work inside Azure), so make sure you understand these and how they affect your application.

Keeping Azure PowerShell Current


I have been supporting Azure for 7 years now and one of the constants is the rapid pace of change of the services offered. This means that not only do you have to continually stay abreast of the most recent changes, but you also have to make sure that your tools stay updated, too. The Visual Studio team handles this by having the Visual Studio IDE check for updates on start-up. Since PowerShell doesn't have an IDE per se, it isn't quite that simple. One of my Windows on Azure teammates, Adam Conkle, worked up a simple script that scraped the GitHub site and then pulled down the release package. It worked OK, but its reliance on scraping the page and the method it used to pull down the installer weren't overly robust. I was playing around one day and ran across some code showing that you could call into WebPI programmatically, so I decided to see if I could build something more robust.

First, I had to figure out how to track the releases. As it turned out, GitHub's general structure made this pretty easy. All I had to do was issue a GET against https://api.github.com/repos/Azure/azure-powershell/releases and then look at the name value in the JSON that was returned to get the version number:

#GitHub URL for Azure PowerShell
$url = "https://api.github.com/repos/Azure/azure-powershell/releases"

#go get the current version on GitHub
$results= (Invoke-RestMethod $url)
$gitVer = $results[0].name
$gitVerSimple=$gitVer.Replace(".","")

I should also note that PowerShell makes this really easy because it automatically converts the JSON result into an object with properties. If you really wanted to get fancy, you could also grab the "body" property to get the release notes, but I decided not to do that in this case.
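
For example, something like this would capture them from the same response object (a small sketch; the published script does not do this):

#grab the release notes from the first (latest) release returned by the GitHub API
$releaseNotes = $results[0].body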

Once I had the version number from GitHub, the next step was to grab the version number from the locally installed version of PowerShell. Again, PowerShell makes this very easy:

#get the local version
$AzurePowerShell = "C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\ServiceManagement\Azure\Azure.psd1"

#if the module isn't installed, we will end up with the default of 0
if (Test-Path $AzurePowerShell)
{
    Import-Module $AzurePowerShell
    $currVer = (Get-Module Azure).Version.ToString()
    Write-Host "Current version is $currVer"
    "Current version is $currVer" | Out-File -FilePath "$TARGETDIR\History.txt" -Append
}
else
{
    Write-Host "Azure PowerShell not currently installed"
    "Azure PowerShell not currently installed" | Out-File -FilePath "$TARGETDIR\History.txt" -Append
}

Doing some simple manipulation of the text strings, I could then do a simple comparison between the two versions. The next step at this point was taking advantage of WebPI to prevent me from having to figure out the installer file, manually pull it down, and then install it. WebPI is nice in that you can search against it to find the particular products you want. In this case, I only wanted the Azure PowerShell cmdlets, so I filtered my results on the product ID (originally "WindowsAzurePowerShellOnly", changed to "WindowsAzurePowershell" in May 2015 and to "WindowsAzurePowerShellGet" in October 2015).
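
The comparison itself is just a string check, along these lines (a minimal sketch; the published script at the Script Center has the exact logic):

#normalize the local version the same way and only continue if GitHub has something newer (sketch)
$currVerSimple = "$currVer".Replace(".","")
if ($gitVerSimple -ne $currVerSimple)
{
    Write-Host "A newer Azure PowerShell release ($gitVer) is available"
}

With that check in place, setting up WebPI for the filtered product looks like this: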

#load up the assembly needed to do the WebPI work
[reflection.assembly]::LoadWithPartialName("Microsoft.Web.PlatformInstaller") | Out-Null
$ProductManager = New-Object Microsoft.Web.PlatformInstaller.ProductManager
$ProductManager.Load()

#NOTE: Here's a handy way to visually see all the possibilities
#$product = $ProductManager.Products | Where-Object { $_.ProductId -like "*PowerShell*" } | Select Title, ProductID | Out-GridView
$product=$ProductManager.Products | Where { $_.ProductId -eq "WindowsAzurePowerShell" }
$InstallManager = New-Object Microsoft.Web.PlatformInstaller.InstallManager
$c = get-culture
$Language = $ProductManager.GetLanguage($c.TwoLetterISOLanguageName)
$installertouse = $product.GetInstaller($Language)
$installer = New-Object 'System.Collections.Generic.List[Microsoft.Web.PlatformInstaller.Installer]'
$installer.Add($installertouse)
$InstallManager.Load($installer)

Each product has an installer that you can then pass to the WebPI installer engine and tell it to download and run the installer:

#now that we have WebPI set up, go ahead and do the actual install
$failureReason = $null
foreach ($installerContext in $InstallManager.InstallerContexts) {
    $downloadresult = $InstallManager.DownloadInstallerFile($installerContext, [ref]$failureReason)
    Write-Host "Download result for $($installerContext.ProductName) : $downloadresult"
    "Download result for $($installerContext.ProductName) : $downloadresult" | Out-File -FilePath "$TARGETDIR\History.txt" -Append
    $InstallManager.StartSynchronousInstallation()
    if ($installerContext.ReturnCode.Status -eq "Success")
    {
        Write-Host "Upgrade complete. Log can be found at $($installerContext.LogFileDirectory)" -ForegroundColor Yellow
        "Upgrade complete. Log can be found at $($installerContext.LogFileDirectory)" | Out-File -FilePath "$TARGETDIR\History.txt" -Append
    }
    else
    {
        Write-Host "Install failed with error $($installerContext.ReturnCode.Status). Log can be found at $($installerContext.LogFileDirectory)" -ForegroundColor Red
        "Install failed with error $($installerContext.ReturnCode.Status). Log can be found at $($installerContext.LogFileDirectory)" | Out-File -FilePath "$TARGETDIR\History.txt" -Append
    }
}

I did run into some trouble here at first: the WebPI installer engine needs to be run as an administrator, so I had to add some logic to the script to re-launch under an administrative instance of PowerShell if the initial shell wasn't run in that context. I should mention that when I first wrote this script, I defaulted to checking for administrative context when starting the script. However, I quickly got tired of having to acknowledge the elevation prompt every day, so I moved the administrative context check inside the version-checking IF statement. This means that I only ever have to interact with the script when there is actually an update to install.
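
The elevation check itself is a fairly standard pattern. Here is a minimal sketch of the idea (the published script wraps this inside the version-check IF statement, and $PSCommandPath requires PowerShell 3.0 or later):

#re-launch elevated if we are not already running as administrator (sketch)
$identity  = [Security.Principal.WindowsIdentity]::GetCurrent()
$principal = New-Object Security.Principal.WindowsPrincipal($identity)
if (-not $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator))
{
    Start-Process powershell.exe -Verb RunAs -ArgumentList "-File `"$PSCommandPath`""
    exit
}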

Lastly, I added some basic logic to create a folder under MyDocuments to log the daily check, plus a Scheduled Task to run the script every day. With all this in place, I can rest assured that I will always be running the latest version of Azure PowerShell. In fact, there are times now when my script detects the updated release on GitHub before I get the internal email notifying me about the new release!
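
If you want to wire up a similar daily check yourself, the Scheduled Task piece can be as simple as the following (a hedged sketch; the script path, task name, and time are placeholders, and the ScheduledTasks cmdlets require Windows 8/Windows Server 2012 or later):

#register a daily task that runs the updater script (sketch - adjust the path and time to your setup)
$action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-File `"$env:USERPROFILE\Documents\Update-AzurePowerShell.ps1`""
$trigger = New-ScheduledTaskTrigger -Daily -At 9am
Register-ScheduledTask -TaskName "Check Azure PowerShell Version" -Action $action -Trigger $trigger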

To make it easy to share, I published the whole script at the Script Center (https://gallery.technet.microsoft.com/scriptcenter/Keep-Azure-PowerShell-478ab7ed). Feel free to borrow or improve on the script as you see fit.


What is the IP address 168.63.129.16?


The IP address 168.63.129.16 is a virtual public IP address used to facilitate a communication channel to internal platform resources for the bring-your-own-IP Virtual Network scenario.  Because the Azure platform allows customers to define any private or customer address space, this resource must be a unique public IP address; it cannot be a private IP address, as that could duplicate address space the customer defines.  This virtual public IP address facilitates the following:

  • Enables the VM Agent to communicate with the platform to signal that it is in a “Ready” state 
  • Enables communication with the DNS virtual server to provide filtered name resolution to customers that do not define custom DNS servers.  This filtering ensures that customers can only resolve the hostnames of their deployment.
  • Enables monitoring probes from the load balancer to determine health state for VMs in a load balanced set
  • Enables PaaS role Guest Agent heartbeat messages 

The virtual public IP address 168.63.129.16 is used in all regions and will not change.  Therefore, it is recommended that this IP be allowed in any local firewall policies; not doing so will result in unexpected behavior in a variety of scenarios.  It should not be considered a security risk, as only the internal Azure platform can source a message from that address.
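
For the Windows Firewall inside a VM, that can be as simple as the following (a minimal sketch using the built-in NetSecurity cmdlets available on Windows Server 2012 and later; adapt it to whatever firewall product or policy mechanism you actually use):

PS C:\> New-NetFirewallRule -DisplayName "Allow Azure platform 168.63.129.16" -Direction Inbound -RemoteAddress 168.63.129.16 -Action Allow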

Additionally, traffic from virtual public IP address 168.63.129.16 that is communicating to the endpoint configured for a load balanced set monitor probe should not be considered attack traffic.  In a non-virtual network scenario, the monitor probe is sourced from a private IP. 

Azure DNS Server Redundancy


Customers may observe that their PaaS role instances and IaaS virtual machines are only issued one DNS server IP address by DHCP.  However, this does not mean that name resolution in Azure has a single point of failure.

The Azure DNS infrastructure is highly redundant.  The IP address that is exposed to the customer virtual machine is a virtual IP address in the Azure platform.  That virtual IP address maps to a cluster of DNS servers in the same region that are behind a load balanced IP so a failure of any particular server is not a concern.  In the event a DNS server cluster in the region fails, the virtual IP address exposed to customers will fail over to a DNS server cluster in a nearby region.  The only impact of such a failure to customers will be a slight increase in latency. 
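
You can see the DNS server address handed out to a VM with a quick check from inside the guest (a small sketch; Get-DnsClientServerAddress requires Windows Server 2012 or later, so use ipconfig /all on older versions). On a VM without custom DNS servers configured you would typically see the Azure-provided address 168.63.129.16 discussed above:

PS C:\> Get-DnsClientServerAddress -AddressFamily IPv4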

Use Azure custom routes to enable KMS activation with forced tunneling


Previously, if customers enabled forced tunneling on their subnets, OS activation for versions prior to Windows Server 2012 R2 would fail because the VMs could not connect to the Azure KMS server from their cloud service VIP.  However, thanks to the newly released Azure custom route feature, this is no longer the case.  The custom route feature can be used to route activation traffic to the Azure KMS server via the cloud service public VIP, which enables activation to succeed.

In order to configure this, use the Set-AzureRoute command (this requires Azure PowerShell version 0.9.1 or higher) to add an entry for the prefix 23.102.135.246/32 and specify the next hop type as “Internet”. 

Example:

PS C:\> $r = Get-AzureRouteTable -Name "WUSForcedTunnelRouteTable"
PS C:\> Set-AzureRoute -RouteName "To KMS" -AddressPrefix 23.102.135.246/32 -NextHopType Internet -RouteTable $r

Name     : WUSForcedTunnelRouteTable
Location : West US
Label    : Routing Table for Forced Tunneling
Routes   :
          Name                 Address Prefix    Next hop type        Next hop IP address
          ----                 --------------    -------------        -------------------
          defaultroute         0.0.0.0/0         VPNGateway
          to kms               23.102.135.246/32 Internet

 

The “to kms” entry is the new route that was added. 

 As you can see in my example VM, the Guest OS activates successfully.
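
If you want to verify activation yourself from inside the VM, slmgr is a quick way to trigger and inspect it (a small sketch; the path shown is the Windows default):

C:\Users\dave>cscript //nologo C:\Windows\System32\slmgr.vbs /ato
C:\Users\dave>cscript //nologo C:\Windows\System32\slmgr.vbs /dlv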

  

We can also create TCP connections to the KMS server:

C:\Users\dave>psping kms.core.windows.net:1688

PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
 
TCP connect to 23.102.135.246:1688:
5 iterations (warmup 1) connecting test:
Connecting to 23.102.135.246:1688 (warmup): 39.32ms
Connecting to 23.102.135.246:1688: 35.94ms
Connecting to 23.102.135.246:1688: 36.21ms
Connecting to 23.102.135.246:1688: 36.23ms
Connecting to 23.102.135.246:1688: 38.57ms
 
TCP connect statistics for 23.102.135.246:1688:
  Sent = 4, Received = 4, Lost = 0 (0% loss),
  Minimum = 35.94ms, Maximum = 38.57ms, Average = 36.74ms 

As expected, attempting to connect to other Internet resources fails due to forced tunneling (or may go out via an on-prem gateway):

C:\Users\dave>psping www.msn.com:80
PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
 
TCP connect to 204.79.197.203:80:
5 iterations (warmup 1) connecting test:
Connecting to 204.79.197.203:80 (warmup): This operation returned because the timeout period expired.
Connecting to 204.79.197.203:80: This operation returned because the timeout period expired.
Connecting to 204.79.197.203:80: This operation returned because the timeout period expired.
Connecting to 204.79.197.203:80: This operation returned because the timeout period expired.
Connecting to 204.79.197.203:80: This operation returned because the timeout period expired.
 
TCP connect statistics for 204.79.197.203:80:
  Sent = 4, Received = 0, Lost = 4 (100% loss),
  Minimum = 0.00ms, Maximum = 0.00ms, Average = 0.00ms

Windows Server Failover Cluster on Azure IAAS VM – Part 1 (Storage)


Hello, cluster fans. This is Mario Liu and I am a Support Escalation Engineer on the Windows High Availability team in Microsoft CSS Americas. I have good news for you: starting in April 2015, Microsoft supports Windows Server Failover Cluster (WSFC) on Azure IAAS Virtual Machines. Here is the supportability announcement for Windows Server on Azure VMs:

Microsoft server software support for Microsoft Azure virtual machines
https://support.microsoft.com/en-us/kb/2721672

The Failover Cluster feature is part of that announcement. The above knowledge base article is subject to change as more improvements for WSFC on Azure IAAS VMs are made, so please check the link for the latest updates.

Today, I’d like to share the main differences between deploying WSFC on-premises and deploying it within Azure. First, the Azure VM operating system must be Windows Server 2008 R2, Windows Server 2012, or Windows Server 2012 R2.  Please note that Windows Server 2008 R2 and Windows Server 2012 both require this hotfix to be installed.

At a higher level, the Failover Cluster feature does not change inside the VM and is still a standard Server OS feature. The challenges are outside and relate to Storage and Network. In this blog, I will be discussing Storage.

The biggest challenge to implementing Failover Clustering in Azure is that Azure does not provide native shared block storage to VMs, unlike on-premises options such as Fibre Channel SAN, shared SAS, or iSCSI. That makes SQL Server AlwaysOn Availability Groups (AG) the primary use case scenario in Azure, because SQL AG does not require shared storage. Instead, it leverages its own replication at the application layer to replicate the SQL data across the Azure IaaS VMs.

 

 

That said, we have a few more options to work around the shared storage limitation, and that is how we can expand the scenarios beyond SQL AlwaysOn.

Option 1: Application-level replication for non-shared storage

Some applications leverage replication through their own means at the application layer.  SQL Server AlwaysOn Availability Groups uses this method.

Option 2: Volume-level replication for non-shared storage

In other words, 3rd-party storage replication.

 

A common 3rd-party solution is SIOS DataKeeper Cluster Edition. There are other solutions on the market, but this is just one example. For more details, please check SIOS’s website:

DataKeeper Cluster Edition: Real-Time Replication of Windows Server Environments
http://us.sios.com/products/datakeeper-cluster/

Option 3: Leverage ExpressRoute to present remote iSCSI Target shared block storage to Azure IaaS VMs

ExpressRoute is an Azure exclusive feature. It enables you to create dedicated private connections between Azure datacenters and infrastructure that’s on your premises. It has high throughput network connectivity to guarantee that the disk performance won’t be degraded.

One of the existing examples is NetApp Private Storage (NPS).  NPS exposes an iSCSI Target via ExpressRoute with Equinix to Azure IaaS VMs.

Availability on Demand - ASR with NetApp Private Storage
http://channel9.msdn.com/Blogs/Windows-Azure/Availability-on-Demand-ASR-with-NetApp-Private-Storage

 

For more details about ExpressRoute, please see

ExpressRoute
http://azure.microsoft.com/en-us/services/expressroute/

 

Option 4 (Not supported): Use an Azure VM as iSCSI Target to provide shared storage to cluster nodes

The fourth option is similar in concept to Option 3, but much simpler and easier: you just move the iSCSI target into Azure.

We do not recommend or support this option mainly due to the performance hit. However, if you'd like to set up a cluster in Azure VMs as a proof of concept, you are welcome to do so.

Note: Please limit this option to development and lab purposes and do not use it in production.

 

There will be more options for presenting “shared storage” to Failover Clusters as new scenarios appear in the future. We will update this blog along with the KB as new announcements become available. Once you have sorted out the storage, you’ve built the foundation of the cluster.

In my next blog, Part 2, I’ll go through the networking part and the creation of a cluster.

Stay tuned and enjoy Clustering in Azure!

Mario Liu

Support Escalation Engineer

CSS Americas | WINDOWS | HIGH AVAILABILITY                                          

Building Windows Server Failover Cluster on Azure IAAS VM – Part 2 (Network)


Hello, cluster fans!

In my previous blog, Part 1, I talked about how to work around the storage blocker in order to implement Windows Server Failover Cluster on Azure IAAS VM. Now let’s discuss another important part – Network in cluster on Azure.

Before that, you should know some basic concepts of Azure networking. Here are a few Azure terms we need to use to set up the Cluster.

VIP (Virtual IP address): A public IP address that belongs to the cloud service. It is also the address of the Azure Load Balancer, which determines how network traffic is directed before being routed to a VM.

DIP (Dynamic IP address): An internal IP assigned by Microsoft Azure DHCP to the VM.

Internal Load Balancer: It is configured to port-forward or load-balance traffic inside a VNET or cloud service to different VMs.

Endpoint: It associates a VIP/DIP + port combination on a VM with a port on either the Azure Load Balancer for public-facing traffic or the Internal Load Balancer for traffic inside a VNET (or cloud service).

You can refer to this blog for more details about those terms for Azure network:

VIPs, DIPs and PIPs in Microsoft Azure
http://blogs.msdn.com/b/cloud_solution_architect/archive/2014/11/08/vips-dips-and-pips-in-microsoft-azure.aspx

OK, enough reading. Storage is ready and we know the basics of Azure networking, so can we start building the Cluster? Yes!

Instead of using Failover Cluster Manager, the preferred method is to use the New-Cluster PowerShell cmdlet and specify a static IP during Cluster creation. When doing it this way, you can add all the nodes and use the proper IP Address from the get-go, and avoid the extra steps required through Failover Cluster Manager.

Take the above environment as an example:

New-Cluster -Name DEMOCLUSTER -Node node1,node2 -StaticAddress 10.0.0.7

Note: The static IP Address that you assign to the CNO is not for network communication. Its only purpose is to bring the CNO online to satisfy the dependency requirement. Therefore, you cannot ping that IP, cannot resolve its DNS name, and cannot use the CNO for management, since its IP is an unusable IP.

If for some reason you do not want to use PowerShell, or you used Failover Cluster Manager instead, there are additional steps that you must take.  The difference with FCM versus PowerShell is that you need to create the Cluster with one node and add the other nodes afterwards. This is because the Cluster Name Object (CNO) cannot come online, since it cannot acquire a unique IP Address from the Azure DHCP service. Instead, the IP Address assigned to the CNO is a duplicate of the address of the node that owns the CNO. That IP fails as a duplicate and can never be brought online. This eventually causes the Cluster to lose quorum because the nodes cannot properly connect to each other. To prevent the Cluster from losing quorum, you start with a one-node Cluster, let the CNO’s IP Address fail, and then manually set the IP address.

Example:

The CNO DEMOCLUSTER is offline because the IP Address it depends on has failed. 10.0.0.4 is the VM’s DIP, which is the address the CNO’s IP duplicated.

 

 

In order to fix this, we will need to go into the properties of the IP Address resource and change the address to another address in the same subnet that is not currently in use, for example, 10.0.0.7.

To change the IP address, right mouse click on the resource, choose the Properties of the IP Address, and specify the new 10.0.0.7 address.
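
If you prefer PowerShell over the GUI, the same change can be made with the FailoverClusters cmdlets (a sketch; the IP Address resource name may differ in your cluster, so check Get-ClusterResource first):

PS C:\> Get-ClusterResource "Cluster IP Address" | Set-ClusterParameter -Name Address -Value 10.0.0.7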

 

Once the address is changed, right mouse click on the Cluster Name resource and tell it to come online.

 

 

Now that these two resources are online, you can add more nodes to the Cluster.

Now you’ve successfully created a Cluster. Let’s add a highly available role inside it. For demo purposes, I’ll use the File Server role as an example, since it is a common role that a lot of us understand.

Note: In a production environment, we do not recommend a File Server Cluster in Azure because of cost and performance. Take this example as a proof of concept.

Unlike a Cluster on-premises, I recommend you pause all other nodes and keep only one node up. This is to prevent the new File Server role from moving among the nodes, since the file server’s VCO (Virtual Computer Object) will automatically be assigned a duplicate of the IP of the node that owns the VCO. That IP Address fails and prevents the VCO from coming online on any node. This is a similar scenario to the CNO we just talked about.

Screenshots are more intuitive.

The VCO DEMOFS won’t come online because of the failed IP Address resource. This is expected because the dynamic IP address duplicates the IP of the owner node.

  

After manually editing the IP to an unused static address (10.0.0.8 in this example), the whole resource group comes online.

 

But remember, that IP Address is the same kind of unusable IP address as the CNO’s IP. You can use it to bring the resource online, but it is not a real IP for network communication. If this is a File Server, none of the VMs except the owner node of this VCO can access the File Share. The way Azure networking works, it will loop the traffic back to the node it originated from.

Show time starts. We need to utilize the Load Balancer in Azure so this IP Address is able to communicate with other machines in order to achieve client-server traffic.

Load Balancer is an Azure IP resource that can route network traffic to different Azure VMs. The IP can be a public-facing VIP, or internal only, like a DIP. Each VM needs to have the endpoint(s) so the Load Balancer knows where the traffic should go. In an endpoint, there are two kinds of ports. The first is a regular port, used for normal client-server communications; for example, port 445 is for SMB file sharing, port 80 is HTTP, port 1433 is for MSSQL, etc. The other is a probe port; the default port number for this is 59999. The probe port’s job is to find out which node actively hosts the VCO in the Cluster. The Load Balancer sends probe pings over TCP port 59999 to every node in the cluster, by default every 10 seconds. When you configure a role in a Cluster on an Azure VM, you need to know what port(s) the application uses, because you will need to add the port(s) to the endpoint. Then, you add the probe port to the same endpoint. After that, you need to update the parameters of the VCO’s IP address with that probe port. Finally, the Load Balancer performs a similar port-forwarding task and routes the traffic to the VM that owns the VCO. All of the above settings needed to be completed using PowerShell at the time this blog was written.

Note: At the time this blog was written and posted, Microsoft supports only one resource group in a cluster on Azure, in an Active/Passive model only. This is because the VCO’s IP can only use the Cloud Service IP address (VIP) or the IP address of the Internal Load Balancer. This limitation is still in effect even though Azure now supports the creation of multiple VIP addresses in a given Cloud Service.

Here is the diagram for Internal Load Balancer (ILB) in a Cluster which can explain the above theory better:

 

 

The application in this Cluster is a File Server. That’s why we use port 445 and the same IP for the VCO (10.0.0.8) as for the ILB. There are three steps to configure this:

Step 1: Add the ILB to the Azure cloud service.

Run the following PowerShell commands on your on-premises machine that can manage your Azure subscription.

# Define variables.
$ServiceName = "demovm1-3va468p3" # the name of the cloud service that contains the VM nodes. Your cloud service name is unique. Use the Azure portal or Get-AzureVM to find the service name.
$ILBName = "DEMOILB" # newly chosen name for the new ILB
$SubnetName = "Subnet-1" # subnet name that the VMs use in the VNet
$ILBStaticIP = "10.0.0.8" # static IP address for the ILB in the subnet

# Add the Azure ILB using the above variables.
Add-AzureInternalLoadBalancer -InternalLoadBalancerName $ILBName -SubnetName $SubnetName -ServiceName $ServiceName -StaticVNetIPAddress $ILBStaticIP

# Check the settings.
Get-AzureInternalLoadBalancer -ServiceName $ServiceName

  

Step 2: Configure the load balanced endpoint for each node using ILB.

Run the following PowerShell commands on your on-premises machine that can manage your Azure subscription.

# Define variables.
$VMNodes = "DEMOVM1", "DEMOVM2" # cluster nodes' names, separated by commas. Your nodes' names will be different.
$EndpointName = "SMB" # newly chosen name of the endpoint
$EndpointPort = "445" # public port to use for the endpoint for SMB file sharing. If the cluster is used for another purpose, e.g. HTTP, the port number needs to change to 80.

# Add an endpoint with port 445 and probe port 59999 to each node. It will take a few minutes to complete.
# Pay attention to the ProbeIntervalInSeconds parameter. This controls how often the probe port detects which node is active.
ForEach ($node in $VMNodes)
{
    Get-AzureVM -ServiceName $ServiceName -Name $node | Add-AzureEndpoint -Name $EndpointName -LBSetName "$EndpointName-LB" -Protocol tcp -LocalPort $EndpointPort -PublicPort $EndpointPort -ProbePort 59999 -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName $ILBName -DirectServerReturn $true | Update-AzureVM
}

# Check the settings.
ForEach ($node in $VMNodes)
{
    Get-AzureVM -ServiceName $ServiceName -Name $node | Get-AzureEndpoint | Where-Object {$_.Name -eq "smb"}
}

 

Step 3: Update the parameters of VCO’s IP address with Probe Port.

Run the following PowerShell commands inside one of the cluster nodes.

# Define variables.
$ClusterNetworkName = "Cluster Network 1" # the cluster network name (use Get-ClusterNetwork or the GUI to find the name)
$IPResourceName = "IP Address 10.0.0.0" # the IP Address resource name (use get-clusterresource | where-object {$_.resourcetype -eq "IP Address"} or the GUI to find the name)
$ILBIP = "10.0.0.8" # the IP Address of the Internal Load Balancer (ILB)

# Update the cluster resource parameters of the VCO's IP address to work with the ILB.
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{"Address"="$ILBIP";"ProbePort"="59999";"SubnetMask"="255.255.255.255";"Network"="$ClusterNetworkName";"OverrideAddressMatch"=1;"EnableDhcp"=0}

You should see this window:

 

Take the IP Address resource offline and bring it online again. Start the clustered role.

Now you have an Internal Load Balancer working with the VCO’s IP. One last task involves the Windows Firewall: you need to at least open port 59999 on all nodes for probe port detection, or turn the firewall off. Then you should be all set. It may take about 10 seconds to establish the connection to the VCO the first time, or after you fail over the resource group to another node, because of the ProbeIntervalInSeconds we set up previously.
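
On Windows Server 2012 and 2012 R2 nodes, opening the probe port can be a one-liner (a sketch; on Windows Server 2008 R2 use netsh advfirewall to create the equivalent rule):

PS C:\> New-NetFirewallRule -DisplayName "Azure LB probe port 59999" -Direction Inbound -Protocol TCP -LocalPort 59999 -Action Allow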

In this example, the VCO has an internal IP of 10.0.0.8. If you want to make your VCO public-facing, you can use the Cloud Service’s IP Address (VIP). The steps are similar and easier, because you can skip Step 1 since this VIP is already an Azure Load Balancer. You just need to add the endpoint with a regular port plus the probe port to each VM (Step 2), and then update the VCO’s IP in the Cluster (Step 3). Please be aware that your clustered resource group will be exposed to the Internet since the VCO has a public IP, so you may want to protect it with additional security measures.

Great! Now you’ve completed all the steps of building a Windows Server Failover Cluster on Azure IAAS VMs. It is a bit of a longer journey; however, you’ll find it useful and worthwhile. Please leave me comments if you have questions.

Happy Clustering!

Mario Liu

Support Escalation Engineer

CSS Americas | WINDOWS | HIGH AVAILABILITY     
