By Michael Woloszynowicz

Tuesday, March 27, 2012

Up and Running with Elastic Beanstalk - Part 1


Amazon’s Elastic Beanstalk has been around for some time now, but strangely few articles have been written about it. Over the course of several articles I’d like to share some of my experiences to date, as well as advice on the challenges you’ll encounter along the way. Unlike most of Amazon’s service offerings, Elastic Beanstalk isn’t a service per se; rather, it is a collection of existing Amazon services strung together to yield a low-maintenance, easy-to-set-up, and highly scalable web hosting environment for JVM-based applications, and recently Amazon has added support for PHP and Git deployments. Beanstalk is geared at new AWS users, as well as companies that have not yet reached a scale requiring very fine tuning of their infrastructure.

Although Elastic Beanstalk is relatively simple to use, a good deal of the challenges you’ll encounter stem from the immediate requirement to make your application work in a distributed environment. For those who have been writing AWS-based applications for some time, this is second nature, but for those new to the task, it can pose a number of challenges. The great thing about adhering to the restrictions imposed by Elastic Beanstalk is that it forces you to think about scalability from the outset, making your applications more robust and decoupled. This can also be a deterrent, as few of us want to invest precious idea-validation time writing highly scalable code, but with a good game plan it proves to be less onerous than expected.

Getting Started
Before we dive into the details of implementing Elastic Beanstalk applications, there are a few setup steps we need to get out of the way. I’ll only cover these at a high level, as they are described in detail on Amazon’s website. If you’re an Eclipse IDE user, the first thing you’ll want to do is download the AWS plugin. This will help you greatly with application deployments and with managing various Amazon services. In addition, it provides a sample AWS application that serves as an excellent starting point for your new project. With the Eclipse plugin set up, enable the Elastic Beanstalk service via the Amazon Management Console and create a new environment using the wizard. Typically you will set up two environments, staging and production. During the initial setup, select some reasonable defaults that you can tune later. Each environment is given a domain name of the form something.elasticbeanstalk.com, which you can map to your own domain by creating a new CNAME record.

With your environments set up, create a new AWS application and deploy it to an environment as described in Amazon’s documentation. You’ll notice that deployment is extremely simple via the Eclipse plugin, but should you wish, you can also deploy your application via the Management Console. To do this, export your web project to a WAR file, proceed to the Elastic Beanstalk tab in the Management Console, select your environment, and in the actions dropdown select “Deploy a Different Version”. Here you can upload your WAR file and give your deployment version a name. Each WAR file is stored in an S3 bucket provisioned by Elastic Beanstalk, so you can go back to a different version whenever you wish. Now that you have a feel for deploying a basic application, let’s explore the finer details of writing one.

Your Server is Fleeting
The most important thing to understand about Elastic Beanstalk is that you cannot rely on the web server to remain constant. At any point, the auto-scaling infrastructure can provision additional servers and decommission existing ones. The implication is that local storage is ephemeral: you cannot rely on any data or content stored on the web server to exist across multiple requests. In general, Amazon recommends that you not use any form of local storage when using Elastic Beanstalk. The most immediate consequence of this is for your application’s configuration files.

Configuration Files
The standard practice for configuration files is to create them as XML files, store them on the local disk, and read them when the application starts. Because local storage is ephemeral, all configuration data must instead be written to another storage medium. Your two main options are SimpleDB and S3. SimpleDB seems like a logical choice, but the schemaless, attribute-value structure of non-relational databases is not always the easiest to modify and maintain. Ultimately, S3 proves to be the best choice, requiring only a few helper methods to mimic what you’re used to with local storage.

The first thing you’ll realize is that you’ll need to store the names of any S3 buckets, SQS queues, SimpleDB domains, etc. in your config file, as opposed to hard-coding them. The reason for this is that with local, staging, and production environments, you don’t want your data cross-contaminating. For example, if you use the same SQS queue for all three environments, pushing something onto the queue from your local machine can cause it to be popped by the staging or production environment, with potentially disastrous results. Similarly, if you use just one S3 bucket across the three environments, cleanup becomes more complicated because test data gets mixed with production data. To get around this, you’ll want to start with three separate configuration files, one for each environment. In my case I tend to name them foo.bar.local, foo.bar.staging, and foo.bar.production. These files all end up in a bucket called appname.config, or something along those lines.
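As a minimal sketch of this naming scheme, the helper below derives the S3 key from an environment suffix; the bucket and key names are just the illustrative ones above, and the helper itself is hypothetical rather than part of the original code:

//A sketch of the per-environment naming scheme described above;
//"appname.config" and the foo.bar.* keys are illustrative only
object ConfigNaming {
  val configBucket = "appname.config"

  //env is "local", "staging", or "production"
  def configKey(env: String): String = "foo.bar." + env
}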

Now that we have three sets of configuration files, the next step is to read and write the correct file within the application. To load the appropriate config file in a given environment, Amazon provides a number of environment parameters that can be passed into the application. The parameters are of the form “PARAM1”, “PARAM2”, and so forth, and can be set by going into the Management Console’s Elastic Beanstalk tab, finding the desired environment, and clicking “Edit Configuration” in the dropdown. Once in the configuration dialog, proceed to the “Container” tab, where you’ll find the “PARAM” parameter boxes. In our case we used PARAM1 as the setting for the environment type and set it to ENV_STAGING, ENV_PROD, or ENV_LOCAL depending on the environment. The same value can be set on your local machine as a JVM system property with -DPARAM1=ENV_LOCAL. With the values set, we can read them inside our application with a quick call to System.getProperty("PARAM1"), which tells us exactly which file the application should load from S3. A sample Scala code snippet for reading and writing String values from/to S3 can be found below; once the data is read, simply use your preferred method to parse the XML.

The last issue to deal with is the frequency at which the files should be read. I recommend caching the configuration information in application scope with a 2-5 minute timeout. The reason we don’t cache the information permanently is that by re-reading the file periodically, we can change the configuration file externally and have the application pick up the new settings after each timeout. This is very important because multiple servers are running at any one time, so configuration changes need a way of propagating to all of them without restarting each server. A common scenario where we use this is to dynamically configure the interval of polling threads, or to enable or disable them on the fly. To change a configuration file, all we have to do is go to the Management Console, download the file, change it, and re-upload it.

//This is how we read a string from Amazon S3
def readFromS3(bucket: String, key: String, encoding: String = "ANSI_X3.4-1968"): String = {
  val writer = new java.io.StringWriter()
  val s3 = AmazonService.createS3Client //Our wrapper around new AmazonS3Client(createCredentials())
  val obj = s3.getObject(bucket, key)
  try {
    org.apache.commons.io.IOUtils.copy(obj.getObjectContent(), writer, encoding)
  } finally {
    obj.getObjectContent().close() //Release the underlying HTTP connection
  }
  writer.toString
}


//This is how we write a string to an Amazon S3 file
//(the content-type parameter cannot be called "type", which is a reserved word in Scala)
def storeString(bucket: String, key: String, value: String, contentType: String = "text/plain"): PutObjectResult = {
  val s3 = AmazonService.createS3Client
  val metaMap = new java.util.HashMap[String, String]()
  metaMap.put("name", key + ".txt")
  metaMap.put("lastModified", "" + System.currentTimeMillis())

  val bytes = value.getBytes(java.nio.charset.Charset.defaultCharset())
  val omd = new ObjectMetadata()
  omd.setContentType(contentType)
  omd.setContentLength(bytes.length) //Content length is the byte count, not the character count
  omd.setUserMetadata(metaMap)

  val is = new java.io.ByteArrayInputStream(bytes)
  s3.putObject(new PutObjectRequest(bucket, key, is, omd))
}
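To tie these helpers together, here is a rough sketch of the application-scope caching with a timeout described earlier. The ConfigCache object is purely illustrative: only readFromS3 comes from the snippet above, and parsing with scala.xml is just one way to handle the file contents.

//A sketch of caching the configuration in application scope (hypothetical;
//only readFromS3 above is from this post)
object ConfigCache {
  private val timeoutMillis = 5 * 60 * 1000L //Re-read the file after 5 minutes
  @volatile private var cached: Option[(Long, scala.xml.Elem)] = None

  //Returns the parsed configuration, re-reading it from S3 once the
  //cached copy is older than the timeout
  def config(bucket: String, key: String): scala.xml.Elem = {
    val now = System.currentTimeMillis
    cached match {
      case Some((loadedAt, xml)) if now - loadedAt < timeoutMillis => xml
      case _ =>
        val xml = scala.xml.XML.loadString(readFromS3(bucket, key))
        cached = Some((now, xml))
        xml
    }
  }
}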

Although the topics we’ve looked at today are fairly dry, they’ll serve as the foundation for the Elastic Beanstalk application. The next post will look at more interesting topics like writing long running services and storing session data across servers. 

If you liked this post, please follow me on Twitter and upvote it on Hacker News.
