Amazon’s Elastic Beanstalk has been around for some time
now, but strangely few articles have been written about it. Over the course of
several articles I’d like to share my experiences to date, as well as advice on
the challenges you’ll encounter along the way. Unlike most of
Amazon’s service offerings, Elastic Beanstalk isn’t a service per se. Rather, it
is a collection of existing Amazon services strung together to yield a
low-maintenance, easy-to-set-up and highly scalable web hosting environment for
JVM-based applications. Recently, Amazon has also added support for PHP and Git
deployments. Beanstalk is geared toward new AWS users as well as companies that
have not yet reached a scale that requires very fine tuning of their
infrastructure.
Although Elastic Beanstalk is relatively simple to use, many of the challenges
you’ll encounter stem from the immediate requirement to make your application
work in a distributed environment. For those who have been writing AWS-based
applications for some time, this is second nature, but for those new to the
task, it can pose a number of challenges. The great thing about adhering to the
restrictions imposed by Elastic Beanstalk is that it forces you to think about
scalability from the outset, making your applications more robust and
decoupled. This can also be a deterrent, as few of us want to spend precious
idea-validation time writing highly scalable code, but with a good game plan it
proves less onerous than expected.
Getting Started
Before we dive into the details of implementing Elastic
Beanstalk applications, there are a few setup steps to get out of the way. I’ll
only cover these at a high level, as they are described in detail on Amazon’s
website. If you’re an Eclipse IDE user, the first thing you’ll want to do is
download the AWS plugin. It will help you greatly with application deployments
and with managing various Amazon services. In addition, it provides a sample AWS
application that serves as an excellent starting point for a new project. With
the Eclipse plugin set up, enable the Elastic Beanstalk service via the Amazon
Management Console and create a new environment using the wizard. Typically you
will set up two environments: staging and production. During the initial
setup, select reasonable defaults that you will tune later. Each environment is
given a domain name of the form something.elasticbeanstalk.com, which you can
map to your own domain by creating a CNAME record. With your environments set
up, create a new AWS application and deploy it to the environment as described
in Amazon’s documentation.
You’ll notice that deployment is extremely simple via the Eclipse plugin, but
should you wish, you can also deploy your application via the Management
Console. To do this, export your web project to a WAR file, go to the Elastic
Beanstalk tab in the Management Console, select your environment, and choose
“Deploy a Different Version” from the Actions dropdown. Here you can upload your
WAR file and give the deployment version a name. Each WAR file is stored in an
S3 bucket provisioned by Elastic Beanstalk, so you can switch back to an earlier
version whenever you wish. Now that you have a feel for deploying a basic
application, let’s explore the finer details of writing one.
Your Server is Fleeting
The most important thing to understand about Elastic
Beanstalk is that you cannot rely on any individual web server to stick around.
At any point, Auto Scaling can provision additional servers and terminate
existing ones. The implication is that local data is ephemeral: you cannot rely
on any data or content stored on a web server to persist across requests,
especially since consecutive requests can be routed by the load balancer to
different servers. In general, Amazon recommends that you not use any form of
local storage with Elastic Beanstalk. The most immediate consequence of this is
for your application’s configuration files.
Configuration Files
The standard practice for configuration files is to write
them as XML files, store them on the local disk and read them when the
application starts. Because local storage is ephemeral, all configuration data
must instead be kept in another storage medium. Your two main options are
SimpleDB and S3. SimpleDB seems like a logical choice, but its schemaless
attribute-value structure is not always the easiest to modify and maintain by
hand. Ultimately S3 proves to be the best choice, and it requires only a few
helper methods to mimic what you’re used to with local storage.
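One way to keep those helper methods contained is to hide storage behind a small interface, so application code never reaches for java.io.File. The sketch below assumes exactly that; the names TextStore and InMemoryTextStore are hypothetical. A real deployment would implement the same trait on top of S3 (for instance with helpers like the readFromS3 and storeString functions at the end of this post), while the in-memory version is convenient for local runs and unit tests.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical abstraction: code reads and writes text through this trait
// instead of the local disk, which may disappear along with the server.
trait TextStore {
  def read(bucket: String, key: String): Option[String]
  def write(bucket: String, key: String, value: String): Unit
}

// Thread-safe in-memory stand-in for tests; an S3-backed class would
// implement the same two methods.
class InMemoryTextStore extends TextStore {
  private val data = TrieMap.empty[(String, String), String]

  def read(bucket: String, key: String): Option[String] =
    data.get((bucket, key))

  def write(bucket: String, key: String, value: String): Unit =
    data.update((bucket, key), value)
}
```

Because callers depend only on TextStore, swapping the in-memory store for an S3-backed one is a one-line change at construction time.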
The first thing you’ll realize is that you need to store the
names of any S3 buckets, SQS queues, SimpleDB domains, etc. in your config file
rather than hard-coding them. The reason is that with local, staging and
production environments, you don’t want your data cross-contaminating. For
example, if you use the same SQS queue for all three environments, pushing
something into the queue from your local machine can cause it to be consumed on
the staging or production environment, with potentially disastrous results.
Similarly, if you use just one S3 bucket across the three environments, cleanup
becomes more complicated, as test data will be mixed with production data. To
get around this, start with three separate configuration files, one for each
environment. In my case I tend to name them foo.bar.local, foo.bar.staging,
foo.bar.production, etc. These files all end up in a bucket called
appname.config or something along those lines.
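To illustrate pulling per-environment resource names out of such a file instead of hard-coding them, here is a minimal sketch. It assumes a flat XML layout with hypothetical element names s3Bucket and sqsQueue, and a hypothetical Resources case class; your own schema will differ. It uses only the JDK's built-in DOM parser.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical resource names that differ between local, staging and production.
final case class Resources(s3Bucket: String, sqsQueue: String)

object Resources {
  // Parses e.g. <config><s3Bucket>...</s3Bucket><sqsQueue>...</sqsQueue></config>
  def fromXml(xml: String): Resources = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))

    // Trimmed text of the first element with this tag; fails loudly if absent
    // so a broken config never silently falls back to some shared resource.
    def text(tag: String): String = {
      val nodes = doc.getElementsByTagName(tag)
      require(nodes.getLength > 0, s"missing <$tag> in configuration")
      nodes.item(0).getTextContent.trim
    }

    Resources(text("s3Bucket"), text("sqsQueue"))
  }
}
```

With one such file per environment, the staging file might name a queue like app-staging-jobs while production names app-prod-jobs, so a message pushed from one environment can never be consumed by another.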
Now that we have three configuration files, the next
step is to read and write the correct one within the application. To load the
appropriate config file in a given environment, Amazon provides a number of
environment parameters that can be passed into the application. The parameters
are named “PARAM1”, “PARAM2”, and so forth, and can be set by going to the
Elastic Beanstalk tab of the Management Console, finding the desired environment
and clicking “Edit Configuration” in the dropdown. In the configuration dialog,
go to the “Container” tab, where you’ll find the “PARAM” parameter boxes. In our
case we used PARAM1 for the environment type and set it to ENV_STAGING,
ENV_PROD or ENV_LOCAL depending on the environment. On your own machine you can
set the same value by passing the JVM system property -DPARAM1=ENV_LOCAL.
With the values set, we can read them inside our application with a quick call
to System.getProperty("PARAM1"), which tells us exactly which file the
application should load from S3. A sample Scala code snippet for reading and
writing String values from/to S3 can be found below. Once the data is read,
simply use your preferred method to parse the XML.
The last issue to deal with is how often the files should be read. I recommend
caching the configuration information in application scope with a 2-5 minute
timeout. The reason we don’t cache the information permanently is that by
rereading the file periodically, we can change the configuration file
externally and have the application pick up the new settings after each
timeout. This is very important because multiple servers are running at any one
time, so configuration changes need a way to propagate to all of them without
restarting each server. A common scenario where we use this is to dynamically
configure the interval of polling threads, or to disable/enable them on the
fly. To change a configuration file, all we have to do is go to the Management
Console, download the file, edit it and re-upload it.
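The PARAM1 lookup described above can be sketched as follows. This assumes, as in the setup above, that PARAM1 is readable through System.getProperty and that its values map onto the foo.bar.* file names from the previous section; the object name ConfigLocator is hypothetical. Failing fast on a missing or unknown value avoids quietly loading another environment's settings.

```scala
// Hypothetical helper mapping the PARAM1 value to the S3 key of the config file.
object ConfigLocator {
  val ConfigBucket = "appname.config" // bucket name from this article's example

  // Pure mapping, so it can be tested without touching system properties.
  def configKey(param1: Option[String]): String = param1 match {
    case Some("ENV_LOCAL")   => "foo.bar.local"
    case Some("ENV_STAGING") => "foo.bar.staging"
    case Some("ENV_PROD")    => "foo.bar.production"
    case other =>
      // Refuse to guess: the wrong file could mix test and production data.
      throw new IllegalStateException(s"Unrecognized PARAM1 value: $other")
  }

  // Set in the Beanstalk console on servers, or via -DPARAM1=ENV_LOCAL locally.
  def currentConfigKey(): String = configKey(Option(System.getProperty("PARAM1")))
}
```

Option(...) turns the null that System.getProperty returns for an unset property into None, which then hits the failure branch.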
import com.amazonaws.services.s3.model.{ObjectMetadata, PutObjectRequest, PutObjectResult}
import org.apache.commons.io.IOUtils
//This is how we read a string from Amazon S3 (UTF-8 by default, matching storeString below)
def readFromS3(bucket: String, key: String, encoding: String = "UTF-8"): String = {
  val s3 = AmazonService.createS3Client() //Our version of new AmazonS3Client(createCredentials())
  val in = s3.getObject(bucket, key).getObjectContent()
  try IOUtils.toString(in, encoding) finally in.close() //always release the HTTP connection
}

//This is how we write a string to an Amazon S3 file
def storeString(bucket: String, key: String, value: String, contentType: String = "text/plain"): PutObjectResult = {
  val s3 = AmazonService.createS3Client()
  val bytes = value.getBytes("UTF-8") //content length must be measured in bytes, not characters
  val metaMap = new java.util.HashMap[String, String]()
  metaMap.put("name", key + ".txt")
  metaMap.put("lastModified", System.currentTimeMillis().toString)
  val omd = new ObjectMetadata()
  omd.setContentType(contentType) //"type" is a reserved word in Scala, hence the parameter name
  omd.setContentLength(bytes.length.toLong)
  omd.setUserMetadata(metaMap)
  s3.putObject(new PutObjectRequest(bucket, key, new java.io.ByteArrayInputStream(bytes), omd))
}
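The 2-5 minute refresh described earlier needs little more than a timestamped cache. Below is a minimal sketch, assuming the load function wraps something like readFromS3 plus your XML parsing; the class name RefreshingCache and its parameters are hypothetical, and the clock is injectable only so expiry can be tested. A plain synchronized block keeps it simple: at most one thread reloads at a time, which is fine for a value read this rarely.

```scala
import scala.concurrent.duration._

// Hypothetical cache that reloads its value once `ttl` has elapsed since the last load.
final class RefreshingCache[A](
    ttl: FiniteDuration,
    load: () => A,
    nowMillis: () => Long = () => System.currentTimeMillis()) {

  private var cached: Option[(A, Long)] = None // value and the time it was loaded

  def get: A = synchronized {
    val now = nowMillis()
    cached match {
      case Some((value, loadedAt)) if now - loadedAt < ttl.toMillis => value
      case _ =>
        // If load() throws, the exception propagates and `cached` is left unchanged.
        val fresh = load()
        cached = Some((fresh, now))
        fresh
    }
  }
}
```

In the setting of this post, each server might hold something like new RefreshingCache(3.minutes, () => readFromS3("appname.config", "foo.bar.production")), so an edited file reaches every server within about three minutes without a restart.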
Although the topics we’ve looked at today are fairly dry,
they’ll serve as the foundation for any Elastic Beanstalk application. The next
post will look at more interesting topics, like writing long-running services
and storing session data across servers.
If you liked this post please follow me on Twitter and upvote it on Hacker News