Configuring diagnostics in Cloud Services

An Azure Cloud Service might comprise multiple instances of multiple roles. These instances all run in a remote Azure data center, typically 24/7. The ability to monitor these instances nonintrusively is essential both in detecting failure and in capacity planning.

Diagnostic data can be used to identify problems with a Cloud Service. The ability to view the data from several sources and across different instances eases the task of identifying a problem. Azure Diagnostics is configured at the role level, but the configuration is applied per instance. For each instance, a configuration file is stored in an XML blob in a container named wad-control-container, located in the storage account configured for Azure Diagnostics.

Tip

From both security and performance perspectives, it is a best practice to host application data and diagnostic data in separate storage accounts; there is no requirement for them to share one.

Azure Diagnostics supports the following diagnostic data:

  • Application logs: Information written to a trace listener
  • Event logs: Events from any configured Windows Event Log
  • Performance counters: Data from any configured performance counters
  • Infrastructure logs: Diagnostic data produced by the Diagnostics process itself

Azure Diagnostics also supports file-based data sources: it copies new files from a specified directory to blobs in a specified container in the Azure Blob service. The IIS Logs, IIS Failed Request Logs, and Crash Dumps data sources capture exactly what their names suggest. With the custom directories data source, Azure Diagnostics can monitor any directory on the instance, which allows for the coherent integration of third-party logs.

The Diagnostics Agent service is enabled by default for each new Visual Studio Azure Cloud Service project.

Tip

The Diagnostics Agent collects and transfers a user-defined set of logs. The process adds little overhead to normal operations, but the more data collected, the greater the impact on the running instances.

The agent then starts automatically when a role instance starts, provided the Diagnostics module has been imported into the role. This requires a file named diagnostics.wadcfg to be placed in a specific location in the role package. When an instance starts for the first time, the Diagnostics Agent reads the file and uses it to initialize the diagnostic configuration for the instance in wad-control-container. Initial configuration typically occurs in Visual Studio at design time; further changes can be made from Visual Studio, through the Management API, or manually.
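A minimal diagnostics.wadcfg might look like the following sketch. The element names follow the Azure SDK schema, but the quota, transfer period, and counter values here are purely illustrative:

```xml
<DiagnosticMonitorConfiguration
    xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration"
    configurationChangePollInterval="PT1M"
    overallQuotaInMB="4096">
  <!-- Application (trace) logs: transfer verbose entries every minute -->
  <Logs bufferQuotaInMB="1024"
        scheduledTransferLogLevelFilter="Verbose"
        scheduledTransferPeriod="PT1M" />
  <!-- One sampled performance counter -->
  <PerformanceCounters bufferQuotaInMB="1" scheduledTransferPeriod="PT1M">
    <PerformanceCounterConfiguration
        counterSpecifier="\Processor(_Total)\% Processor Time"
        sampleRate="PT5S" />
  </PerformanceCounters>
  <!-- Windows Event Log channels to capture -->
  <WindowsEventLog bufferQuotaInMB="1024"
                   scheduledTransferLogLevelFilter="Verbose"
                   scheduledTransferPeriod="PT1M">
    <DataSource name="Application!*" />
    <DataSource name="System!*" />
  </WindowsEventLog>
</DiagnosticMonitorConfiguration>
```

The scheduledTransferPeriod attributes use ISO 8601 durations (PT1M is one minute) and control how often the local buffer is persisted to the storage account.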

Tip

In the past, Diagnostics initialization was often performed in user code. This is not recommended because it requires a large number of hardcoded directives. If needed, the class responsible for this is DiagnosticMonitorConfiguration.
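For reference, such code-based initialization (again, not recommended) would look roughly like this sketch, based on the Microsoft.WindowsAzure.Diagnostics namespace of the classic Azure SDK; exact member names can vary between SDK versions:

```csharp
using System;
using Microsoft.WindowsAzure.Diagnostics;

// Typically called from OnStart() in the role entry point.
// Every setting below is hardcoded in user code, which is
// exactly why the file-based approach is preferred.
DiagnosticMonitorConfiguration config =
    DiagnosticMonitor.GetDefaultInitialConfiguration();
config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);
DiagnosticMonitor.Start(
    "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString",
    config);
```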

Azure Diagnostics supports the use of Trace to log messages. Methods of the System.Diagnostics.Trace class can be used to write error, warning, and informational messages. (The Compute Emulator in the development environment adds an additional trace listener so that trace messages can be displayed in the Compute Emulator UI.)

Azure Diagnostics captures diagnostic information for an instance, keeps it in a local buffer, and periodically persists this data to the Azure Storage service. The Azure Diagnostics tables can be queried just like any other table in the Table service. The Diagnostics Agent persists the data described earlier to the following tables:

  • Application logs: WADLogsTable
  • Event logs: WADWindowsEventLogsTable
  • Performance counters: WADPerformanceCountersTable
  • Infrastructure logs: WADDiagnosticInfrastructureLogsTable

As the only index on a table is on PartitionKey and RowKey, it is important that PartitionKey rather than Timestamp or EventTickCount be used for time-dependent queries.

In this recipe, we see how to configure and use the Diagnostics features in the role environment, collecting every available log data source and tracing information.

Getting ready

This recipe assumes that we have an empty Cloud Service and an empty Storage account. To create the first one, go to the Azure Portal and follow the wizards without deploying anything in it. To create the second one, follow the instructions of the Managing the Azure Storage Service recipe in Chapter 3, Getting Storage with Blobs in Azure.

Tip

We apologize, but as Azure building blocks are interconnected to provide complex services, it is hard to explain a topic in isolation. This is the case with Diagnostics for Cloud Services, which requires knowledge of Storage basics.

How to do it…

We are going to create a simple worker role that triggers diagnostics collection, using the following steps:

  1. In Visual Studio, create a new Azure Cloud Service with a worker role named Worker.
  2. Right-click on the Worker item in the Roles folder of the created project and select Properties.
  3. In the Configuration tab, perform the following actions:
    • Verify that the Enable Diagnostics checkbox is checked
    • Select Custom plan
    • In the Specify the storage account credentials for the Diagnostics results field, enter the connection string of the Diagnostics storage account (clicking the … (more) button opens a wizard that helps build this string)
  4. To customize the data collected by the Diagnostic service, click on the Edit button of the Custom plan option selected earlier.
  5. In the Diagnostics configuration window, select the logging mix, for example:
    • Application logs: Verbose level, 1-minute transfer, 1024 MB buffer size
    • Event logs: Application + System + Security with verbose level, 1-minute transfer, 1024 MB buffer size
    • Performance counters: 1-minute transfer, 1 MB buffer size, and "% processor time" metric
    • Infrastructure logs: Verbose level, 1-minute transfer, no buffer size
  6. Close the window and save the configuration.
  7. In the WorkerRole.cs file, in the Run() method, write this code (it requires using System.Diagnostics; and using System.Threading; directives at the top of the file):
    while (true)
    {
        // Write one trace entry per severity level, once per second
        Thread.Sleep(1000);
        DateTime now = DateTime.Now;
        Trace.TraceInformation("Information: " + now);
        Trace.TraceError("Error: " + now);
        Trace.TraceWarning("Warning: " + now);
    }
  8. Right-click on the Cloud Service project and Publish it using the following steps:
    1. Select the proper subscription.
    2. In Common Settings, select the previously created empty Cloud Service.
    3. In Advanced Settings, select the previously created Storage Account.
    4. Confirm, click on Publish, and wait for a few minutes.

    Note

    The following steps show how to read the collected data.

  9. In the Server Explorer window of Visual Studio, expand the Azure node and locate the proper storage account of the storage subnode.
  10. Expand the tables node, and for each table found, right-click on it and select View Table.

How it works...

From steps 1 to 3, we prepared the wrapper project to hold the worker role and configured it. By enabling the Diagnostics feature, an Import directive was placed into the service definition file. By selecting Custom plan, we told Azure to collect user-defined data into the storage account we then specified.

In steps 4 and 5, we customized the collected data, telling the platform what to log and when to transfer it to storage.

Tip

Azure instances are stateless, meaning that an instance can be taken down and replaced by a new one seamlessly. Storing logs on the VM therefore leads to design issues: what happens if the instance is recycled? How do you read service-wide logs centrally? This is why a transfer phase is involved in log capturing.

After saving in step 6, we added some tracing code in step 7 and published the Cloud Service as shown in step 8.

Tip

A more sophisticated way to trace messages is to use trace sources and trace switches to control the capture of messages. Typically, this control can be configured through the app.config file for an application.
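As a sketch, such a trace switch can be declared in app.config as follows (the switch name TraceLevelSwitch is an arbitrary example, not a name required by Azure):

```xml
<configuration>
  <system.diagnostics>
    <switches>
      <!-- 0=Off, 1=Error, 2=Warning, 3=Info, 4=Verbose -->
      <add name="TraceLevelSwitch" value="3" />
    </switches>
  </system.diagnostics>
</configuration>
```

Code can then create a System.Diagnostics.TraceSwitch with the same name and test its TraceError, TraceWarning, or TraceInfo properties before writing a message, so that verbosity can be changed without recompiling.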

In steps 9 and 10, we used the built-in features of the Azure SDK integration for Visual Studio to browse through the storage account elected for the Diagnostics collection.

While each table contains some properties specific to the data being logged, all of them contain the following properties:

  • EventTickCount
  • DeploymentId
  • Role
  • RoleInstance

The EventTickCount property is an Int64 that represents the time at which the event was generated, to an accuracy of 100 nanoseconds. The DeploymentId property identifies the specific deployment, while the Role and RoleInstance properties specify the role instance that generated the event.

The WADPerformanceCountersTable table, for example, contains the following additional properties:

  • CounterName
  • CounterValue

    Tip

    When browsing through the collected data, note that the tables are partitioned by minute. Specifically, when a record is inserted in a table, PartitionKey is set to the tick count of the current UTC time with the seconds discarded, and the entire value is prefixed with a 0. Discarding the seconds has the effect of setting the last eight characters of PartitionKey to 0. The RowKey property combines the deployment ID, the role name, and the instance ID, along with a key to ensure uniqueness. Timestamp represents the time the event was inserted in the table.
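The PartitionKey scheme just described can be sketched in Python. Here, wad_partition_key is a hypothetical helper name, and .NET ticks are 100-nanosecond intervals counted from 0001-01-01 00:00:00:

```python
from datetime import datetime

def wad_partition_key(utc_time: datetime) -> str:
    """Build a WAD-style PartitionKey: '0' + tick count of the
    UTC time with the seconds (and fractions) discarded."""
    minute = utc_time.replace(second=0, microsecond=0)
    delta = minute - datetime(1, 1, 1)
    ticks = (delta.days * 86400 + delta.seconds) * 10_000_000
    return "0" + str(ticks)

# A time-dependent Table service query can then filter on PartitionKey
# (the indexed property) instead of Timestamp or EventTickCount:
start = wad_partition_key(datetime(2015, 3, 10, 14, 30))
end = wad_partition_key(datetime(2015, 3, 10, 14, 45))
filter_string = "PartitionKey ge '{0}' and PartitionKey lt '{1}'".format(start, end)
```

Because one minute is 600,000,000 ticks, every key produced this way ends in eight zeros, matching the behavior noted above.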

There's more…

Once the deployment has been made, the Diagnostics configuration can be edited easily from Visual Studio as follows:

  1. In the Cloud Services node of the Azure main node in the Server Explorer window, select the previously created Cloud Service.
  2. Expand the node and select the desired role in the desired slot (staging or production).
  3. By selecting the Update Diagnostics Settings option, we can change the Diagnostics configuration at runtime.

Log directories

As mentioned earlier, we can transfer entire directories to the selected storage account, for example, to integrate a third-party tool that logs directly to the filesystem. To do this, open the diagnostics.wadcfg file and add this code inside the <Directories> tag:

<DataSources>
  <DirectoryConfiguration container="wad-mylog" directoryQuotaInMB="128">
    <Absolute expandEnvironment="true" path="%SystemRoot%\myTool\logs" />
  </DirectoryConfiguration>
</DataSources>

Alerts

Azure has an integrated alerting system to notify users of particular events. Although it is not specific to Cloud Services, the following steps enable it for the one created previously:

  • In the Azure Portal, go to the Management Services section and click on the Alerts tab
  • Add a new rule, specifying the following:
    • The name of the rule
    • Service type: Cloud Service
    • Service name: the one previously created
    • Cloud Service deployment: Production
    • Cloud Service role: Worker
  • In the second step, choose CPU Percentage as the metric to monitor
  • Set the greater than condition to 70% with the remaining default values

This alert will notify the user who created it and, optionally, the service administrator and co-administrators.

See also

Have a look at the following MSDN links to get additional information: