Windows Azure Role Startup Life Cycle

As you begin to deploy applications to Windows Azure you may find that the application being deployed needs more functionality than is provided on the Windows Azure Guest OS Base Image [Read: How to Change the Guest OS on Windows Azure]. This is possible to do as of Windows Azure SDK 1.3, this is called a Startup Task, also introduced with SDK 1.3 was the concept of Full IIS Support which now takes priority over the legacy [but still functional] Hosted Web Core Deployment Model [Read: New Full IIS Capabilities: Differences from Hosted Web Core].

Much like an ASP.NET Web Forms Developer Learns the ASP.NET Page Life Cycle, it’s important for a Developer that is building applications for Windows Azure to know the Role Startup Life Cycle when they are deciding between creating Startup scripts and leveraging the Windows Azure VM Role.

Windows Azure Role Startup Players

There are a few players in the Role Startup process on Windows Azure [SDK 1.3]. Please note that I am only calling out the players in the Full IIS Support Process [This is not Hosted Web Core].

Startup Tasks

In Windows Azure a startup task is a Command-Line Executable file that gets referenced in the Cloud Service Definition file (.csdef). When a deployment is uploaded to Windows Azure a component of the Windows Azure Platform called the Fabric Controller reads the Definition file to allocate the necessary resources for your Deployment. Once the environment is set up the Fabric Controller initializes the Role Startup Process.

To best understand startup tasks lets dive into the “Task” Element of the Cloud Service Definition file.

<ServiceDefinition name="MyService" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
   <WebRole name="WebRole1">
      <Startup>
         <Task commandLine="Startup.cmd" executionContext="limited" taskType="simple">
         </Task>
      </Startup>
   </WebRole>
</ServiceDefinition>

CommandLine Attribute

This is the name of the file to be executed when the startup task runs. How this script or executable process is run in the role is left up to the executionContext and the taskType.

ExecutionContext Attribute

The executionContext for a startup task defines the level of permissions that the script or executable file is able to run at within a particular role. There are two types of Execution Contexts:

  1. Limited: Runs the script/executable at the same permission level of the role runs at.
  2. Elevated: Runs the script/executable with Administrative privileges.

TaskType Attribute

The taskType for a startup task defines how a startup task process is executed. The TaskType is a key contributor to the length of your startup process. The reasoning behind this is the different task types execute either Synchronously or Asynchronously. Obviously the Synchronous task execution will take longer to execute so be cautious of the length of your installations as startup tasks should not execute any longer than 5 minutes. Here are the three different types of Startup tasks:

  • Simple (Default): (Synchronous) The Task is launched and the Instance is Blocked until the task completes. [Note: If the task fails, the instance is blocked and will fail to start]
  • Background: (Asynchronous) This starts the Task process and continues the Role startup. [Note: Fire and Forget. Be sure to Log these processes for debugging]
  • Foreground: (Asynchronous) The task is executed but doesn’t allow the Role to shutdown until the foreground tasks exit.

IISConfigurator.exe

The IISConfigurator process is an executable file that you’ll find in both the Compute Emulator as well as the Windows Azure Environment in the Cloud. IISConfigurator.exe is responsible for iterating through the Sites Node to add and configure the necessary Websites [, Virtual Directories & Applications], Ports and Host Headers within IIS. I wrote a post previously that talked about IISConfigurator.exe in the context of the Compute Emulator.

[Web/Worker]Role.cs : RoleEntryPoint

Within a Web or Worker Role you will find a class that inherits from RoleEntryPoint. RoleEntryPoint exposes 3 events OnStart, Run and OnStop, these methods are used to perform tasks over the lifecycle of the Role. I will explain these events in more detail later on in this post.

Windows Azure Startup Life Cycle

Let’s take a look at the Process that Windows Azure takes when it’s deploying our Hosted Service.

The Entire process is kicked off when you upload your Cloud Service Package and Cloud Service Configuration to Windows Azure. [It is a good practice to upload your deployments into Blob Storage in order to Deploy your applications. This allows you keep versions of your Deployments and keep copies on hand incase the Hosted Service needs to be rolled back to a previous version].

Next the Fabric Controller goes to work searching out the necessary Hardware for your Application as described in the Cloud Service Definition. Once the Fabric Controller has found and allocated the necessary resources for your Hosted Service, the appropriate Guest OS Image is deployed to the Server and initializes the Guest OS Boot Process.

Flow Chart: Windows Azure Startup Life Cycle

This is where things begin to get interesting.

If the current Role contains a Startup node in the Cloud Service Definition the Role begins executing each Task in the same Sequence in which they are ordered in your Cloud Service Definition. Depending on the TaskType [explained above] these startup tasks may run Synchronously [simple] or Asynchronously [background, foreground].

When the ‘Simple’ [Synchronous] startup tasks complete execution the role begins its next phase of the Role Startup Life Cycle. In this phase of the Life Cycle the IISConfigurator Process is executed Asynchronously, at the same time [or not too long after] the OnStart event is fired.

Once the OnStart event Completes, the Role should be in the Ready State and will begin accepting requests from the Load Balancer. At this point the Run event is fired. Even though the Run event isn’t present in the WebRole.cs code template you are able to override the event and provide some tasks to be executed. The Run event is most used in a worker role which is implemented as an infinite loop with a  periodic call to Thread.Sleep if there is no work to be done.

After a long period of servicing requests there will most likely come a time when the role will be recycled, or may no longer be needed and shut down. This will fire the OnStop event, to handle any Role clean up that may be required such as moving Diagnostics data or persisting files from the scratch disk to Blob Storage. Be aware that the OnStop method does have a time restriction, so it is wise to periodically transfer critical data while your role is still healthy and not attempting to shut down.

After the OnStop Event has exited [either code execution completes, or times out] the Role begins the stopping process and terminates the Job Object. Once the Job Object is terminated the role is restarted and once again follows the process explained above of the Startup Life Cycle. This is important to know as it is possible for your Startup Tasks to be executed multiple times on the same Role, for this reason it is important that your startup tasks be Idempotent.

Issues that may Arise during the Startup Life Cycle

As Software Developers we are all aware of what some call the “Happy Path” this is a term that is used to describe a process that goes exactly as planned. However, in the real world, no process is iron clad and we need to be aware of situations that may go awry.

Here is a quick list of potential issues during the Startup Life Cycle:

  • Startup Tasks may fail to Execute
  • Startup Tasks may never Complete
  • IISConfigurator.exe may overwrite Startup Task based modifications to IIS
  • IISConfigurator.exe and OnStart might end up in a race condition
  • Startup Tasks and OnStart may end up in a race condition
  • Startup Process might not be idempotent

How to Avoid Race Conditions in Role Startup

There are a number of things you can do to help avoid Race Conditions within your startup Process. This is typically done by placing checks and balances around the resources that will be affected by the specific item in the startup process. The other thing that could be done is to create a VM Role.

The VM Role is a Windows Server 2008 R2 Image on a VHD which you configure with all the components you need installed on your Role. You will also need to install the Windows Azure Integration Components and perform a generalization with sysprep.

The VM Role is a good approach if:

  1. Your startup tasks are starting to effect your deployment time
  2. Application installations are Error-prone
  3. Application installation requires manual interaction.

I will cover more on the VM Role in other blog post, as it’s a large enough topic on it’s own.

Once you have configured your VM Role instance it is uploaded to the cloud by using the csupload command-line tool, or using the Cloud Tools in Visual Studio. This new VM Role image you created can now be Provisioned in place of one of Microsoft’s Guest OS Images. It is good to note that startup tasks are not supported in the VM Role as it is assumed the image is uploaded as required.

Conclusion

Knowing the Life Cycle of what you’re working on is very important. It’s that one piece of context that you can always come back to in order to regain a little more clarity in a situation.

Happy Clouding!

Special Thanks to: Steve Marx, David Murray, Daniel Wang, Terri Schmidt, Adam Sampson & [my failover twin] Corey Sanders [from the Windows Azure Team] for taking the time out of their busy schedules to ensure this information is both accurate and complete.

[Update: May 31, 2011]

Kevin Williamson, an Engineer on the Windows Azure Cloud Engineering Team, has posted some additional guidance on the Architecture of a Windows Azure Role. This Guidance is quite detailed an may provide you with some more in-depth information on the different components of a Windows Azure Role.

[/Update]