By Michael Woloszynowicz

Tuesday, March 27, 2012

Up and Running with Elastic Beanstalk - Part 1


Amazon’s Elastic Beanstalk has been around for some time now, but strangely few articles have been written about it. Over the course of several articles I’d like to share some of my experiences to date, as well as advice on the challenges you’ll encounter along the way. Unlike most of Amazon’s service offerings, Elastic Beanstalk isn’t a service per se. Rather, it is a collection of existing Amazon services strung together to yield a low-maintenance, easy-to-set-up, and highly scalable web hosting environment for JVM-based applications; Amazon has also recently added support for PHP and Git deployments. Beanstalk is geared at new AWS users as well as companies that have not yet reached a scale that requires very fine tuning of their infrastructure.

Although Elastic Beanstalk is relatively simple to use, a good deal of the challenges you’ll encounter stem from the immediate requirement to make your application work in a distributed environment. For those who have been writing AWS-based applications for some time, this is second nature, but for those new to the task, it can pose a number of challenges. The great thing about adhering to the restrictions imposed by Elastic Beanstalk is that it forces you to think about scalability from the outset, making your applications more robust and decoupled. This can also be a deterrent, as few of us want to invest precious idea-validation time writing highly scalable code, but with a good game plan it proves to be less onerous than expected.

Getting Started
Before we dive into the details of implementing Elastic Beanstalk applications, there are a few setup steps that we need to get out of the way. I’ll only cover these at a high level as they are described in detail on Amazon’s website. If you’re an Eclipse IDE user, the first thing you’ll want to do is download the AWS plugin. This will help you greatly with application deployments and managing various Amazon services. In addition, it will provide you with a sample AWS application that serves as an excellent starting point for your new project. With the Eclipse plugin set up, enable the Elastic Beanstalk service via the AWS Management Console and create a new environment using the wizard. Typically you will set up two environments, staging and production. During the initial setup, select some reasonable defaults that you will tune later. Each environment is given a domain name in the form of something.elasticbeanstalk.com, which you can map to your own domain by creating a new CNAME. With your environments set up, create a new AWS application and deploy it to the environment as described here. You’ll notice that deployment is extremely simple via the Eclipse plugin, but should you wish, you can also deploy your application via the Management Console. To do this, export your web project to a WAR file, proceed to the Elastic Beanstalk tab in the Management Console, select your environment, and in the actions dropdown, select “Deploy a Different Version”. Here you can upload your WAR file and give your deployment version a name. Each WAR file is stored in an S3 bucket provisioned by Elastic Beanstalk, so you can go back to a different version whenever you wish. Now that you have a feel for deploying a basic application, let’s explore the finer details of writing one.

Your Server is Fleeting
The most important thing to understand about Elastic Beanstalk is that you cannot rely on the web server to remain constant. At any point, the load balancer can provision additional servers and decommission existing ones. The implication of this is that local data is ephemeral, so you cannot rely on any data or content stored on the web server to exist across multiple requests. In general, Amazon recommends that you not use any form of local storage when using Elastic Beanstalk. The most immediate consequence of this is for your application’s configuration files.

Configuration Files
The standard practice for configuration files is to create them as XML files, store them on the local disk, and read them when the application starts. As a result of the ephemeral nature of the local storage, all configuration data must be written to another storage medium. Your two main options are SimpleDB and S3. SimpleDB seems like a logical choice, but the JSON-style structure of non-relational databases is not always the easiest to modify and maintain. Ultimately S3 proves to be the best choice and requires only a few helper methods to mimic what you’re used to with local storage.

The first thing you’ll realize is that you’ll need to store the names of any S3 buckets, SQS queues, SimpleDB schemas, etc. in your config file as opposed to hard coding them. The reason for this is that with local, staging, and production environments, you don’t want your data cross-contaminating. For example, if you use the same SQS queue for all three environments, pushing something into the queue from your local machine can cause it to be popped on the staging or production environment with potentially disastrous results. Similarly, if you use just one S3 bucket across the three environments, cleanup becomes more complicated as test data will be mixed with production data. To get around this, you’ll want to start with three separate configuration files, one for each environment. In my case I tend to name them foo.bar.local, foo.bar.staging, and foo.bar.production. These files all end up in a bucket called appname.config or something along those lines.

Now that we have three sets of configuration files, the next step is to read and write the correct files within the application. To load the appropriate config file in a given environment, Amazon provides a number of environment parameters that can be passed into the application. The parameters are of the form “PARAM1”, “PARAM2”, and so forth, and can be set by going into the Management Console, opening the Elastic Beanstalk tab, finding the desired environment, and clicking “Edit Configuration” in the dropdown. Once in the configuration dialog, proceed to the “Container” tab, where you’ll find the “PARAM” parameter boxes mentioned above. In our case we used PARAM1 as the setting for the environment type and set it to ENV_STAGING, ENV_PROD, or ENV_LOCAL depending on the type of environment. The same variable can be set on your local machine by passing a VM argument such as -DPARAM1=ENV_LOCAL. With the values set, we can read them inside our application with a quick call to System.getProperty("PARAM1"), which tells us exactly which file the application should load from S3. A sample Scala code snippet for reading and writing String values from/to S3 can be found below. Once the data is read, simply use your preferred method to parse the XML file.

The last issue to deal with is the frequency at which the files should be read. I recommend caching the configuration information in Application scope with a 2-5 minute timeout (a rough sketch of this caching approach follows the S3 snippets below). The reason we don’t cache the information permanently is that by reading the file on a periodic basis, we can externally change the configuration file content and have the application update its settings after each timeout. This is very important because multiple servers are running at any one time, so configuration changes need a way of propagating to all of them without having to restart each server. A common scenario where we use this is to dynamically configure the interval of polling threads, or to disable/enable them on the fly. To change a configuration file, all we have to do is proceed to the Management Console, download the file, change it, and re-upload it.

//This is how we read a String from an S3 object (the default encoding here is US-ASCII)
def readFromS3(bucket: String, key: String, encoding: String = "ANSI_X3.4-1968"): String = {
  val writer = new java.io.StringWriter()
  val s3 = AmazonService.createS3Client() //our wrapper around new AmazonS3Client(createCredentials())
  val obj = s3.getObject(bucket, key)
  org.apache.commons.io.IOUtils.copy(obj.getObjectContent(), writer, encoding)
  writer.toString
}


//This is how we write a String to an S3 object ("type" is a reserved word in Scala, so the parameter is named contentType)
def storeString(bucket: String, key: String, value: String, contentType: String = "text/plain"): PutObjectResult = {
  val s3 = AmazonService.createS3Client()
  val metaMap = new java.util.HashMap[String, String]()

  metaMap.put("name", key + ".txt")
  metaMap.put("lastModified", "" + System.currentTimeMillis())

  val bytes = value.getBytes(java.nio.charset.Charset.defaultCharset())
  val omd = new ObjectMetadata()
  omd.setContentType(contentType)
  omd.setContentLength(bytes.length) //content length is the byte count, not the character count
  omd.setUserMetadata(metaMap)

  val is = new java.io.ByteArrayInputStream(bytes)
  val put = new PutObjectRequest(bucket, key, is, omd)
  s3.putObject(put)
}
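
To tie this together, below is a minimal sketch of the application-scoped cache described above. It assumes the readFromS3 helper above is in scope; the ConfigCache object name, the five-minute timeout, and the fallback to ENV_LOCAL are illustrative choices rather than a definitive implementation.

//Minimal sketch: cache the environment-specific config file read from S3
object ConfigCache {
  private val CacheTtlMillis = 5 * 60 * 1000L   //re-read every 5 minutes
  private val configBucket = "appname.config"   //bucket holding the per-environment files

  @volatile private var cached: Option[(Long, String)] = None   //(loadedAt, raw XML)

  //Pick the config file based on the PARAM1 property (-DPARAM1=ENV_LOCAL locally,
  //PARAM1 set in the Beanstalk container settings); defaulting to ENV_LOCAL is an assumption
  private def configKey: String = System.getProperty("PARAM1", "ENV_LOCAL") match {
    case "ENV_PROD"    => "foo.bar.production"
    case "ENV_STAGING" => "foo.bar.staging"
    case _             => "foo.bar.local"
  }

  //Return the raw config XML, re-reading it from S3 once the timeout expires
  //so that externally edited files propagate to every running instance
  def configXml: String = {
    val now = System.currentTimeMillis()
    cached match {
      case Some((loadedAt, xml)) if now - loadedAt < CacheTtlMillis => xml
      case _ =>
        val xml = readFromS3(configBucket, configKey)   //helper defined above
        cached = Some((now, xml))
        xml
    }
  }
}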

Although the topics we’ve looked at today are fairly dry, they’ll serve as the foundation for the Elastic Beanstalk application. The next post will look at more interesting topics like writing long running services and storing session data across servers. 

If you liked this post please follow me on Twitter  and upvote it on Hacker News

Sunday, January 29, 2012

Your Users Won't Read

As web designers or developers we often have a tendency to fall back on text to convey a message or instructions to an end user. Our thought process is that if we provide the necessary steps as written text, how can anyone mess it up? As useful as text can be, this approach is inherently flawed: text is often ignored or at best scanned, and inference or expected behaviour proves to be a more powerful force. Even if your process is straightforward, text becomes useless if the actions on the page imply something that runs counter to it.

While I've grown to appreciate this fact more and more over time, it never became completely clear until a recent discovery in one of our application's processes. The wireframe below is a rough representation of a fragment of our page for accepting an invitation to a company site. Once a user indicates that they would like to accept the invitation, we ask them whether they are already a user of our application or whether they wish to create a new account. We all thought the process was simple enough, with little room for misinterpretation or error.


After witnessing a few users interacting with this page, we discovered two main problems with this approach. The first was that we relied heavily on the "Do you already have an account" statement being read, which turned out not to be the case. The second was that we didn't expect the "Yes" and "No" buttons to be interpreted the way they were. What ultimately happened can be seen in the pseudo heatmap below.


Users completely ignored the "Do you have an account" message and interpreted the dialog box as a confirmation box. As a result, the "Yes" and "No" text in the buttons was all that was read, and people merely saw "Yes" as a confirmation of their intention to accept. Since most users didn't have an account, upon clicking "Yes" they were asked to provide a username and password, which they read as "select a new username and password" as opposed to "log in with your username and password". This caused a good deal of frustration and confusion, as the system would then tell them that their login or password was invalid when in fact they had no login to begin with. As we can see, the entire process broke down because of the expectation that one particular line of text would be read before an action was taken.

Our revised and so far (we're still testing) more effective process looks something like this:


With the dialog box removed, users no longer interpret the next step as a confirmation. Furthermore, removing the "Yes" and "No" labels from the buttons places the focus on the "Create new" and "Sign in" keywords, which carry greater meaning. Finally, we've made the more common case of not having an account the first option. From this particular example, the key takeaways are:
  1. Association or inference will always overpower instructional copy.
  2. People will read text but only keywords such as the first word or two in a button.
  3. You can't write your way to a good user experience.
  4. Always perform usability testing!
If you liked this post please follow me on Twitter or upvote it on Hacker News

Friday, January 20, 2012

Fast File Zipping in Amazon S3


For the last little while I’ve been working on moving some of our services to Amazon Web Services, with my most recent work focusing on document storage. One of the more interesting problems has been creating zip files for documents stored in Amazon S3. Multi-file download is one of the most commonly used tools in our application, so the aim is to make this process as quick as possible and minimize the time users spend waiting for their download to begin. After experimenting with several approaches and tools, I came across a simple solution that obliterates all others in compression speed, and even in simplicity of implementation.

Before we get to the good stuff, let’s step back and look at the typical approach to zipping files on S3. Given that S3 has no native support for doing this, the selected files must first be downloaded into an EC2 instance, and then compressed using your toolkit of choice. Our initial steps looked something like this:

  1. Submit a POST to a web service to initiate the zip process, along with the corresponding S3 file keys to be compressed
  2. The request is then placed in a queue so that the next available EC2 server can process it. Initially we used SQS to queue the request, but its complete lack of ordered queuing rendered it quite useless, so we instead used a list in Redis to maintain an ordered queue of zip requests. Along with the entry we generate a unique zip identifier that gets returned to the client (a rough sketch of this queue handoff follows the list).
  3. Once the next available EC2 instance pops the entry off the queue, it begins downloading the files from S3 and building the final zip. Once the zip is created, the local files are cleaned up and the new zip file is pushed back to S3. 
  4. The completed request is once again pushed to Redis using the previously generated zip identifier
  5. During this time, the client has been continuously polling to check if the zip has completed (you could use long polling here as well). Once the server finds the zip identifier in Redis, it responds with the S3 download link and the user can proceed to download the file. 
  6. Clean up the zip files in S3 after a day or so using a thread or S3’s new object expiry option
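
To make the handoff in steps 2 through 5 more concrete, here is a rough sketch using the Jedis Redis client. The key names, the pipe-delimited payload, and the buildZip callback are assumptions made for illustration, not our production code.

import redis.clients.jedis.Jedis
import java.util.UUID

//Rough sketch of the Redis handoff: a list as the ordered request queue and a
//simple key per completed zip; use a JedisPool rather than a single connection in practice
object ZipQueue {
  private val redis = new Jedis("localhost")   //replace with your Redis host

  //Step 2: the web tier enqueues the request and returns the zip identifier to the client
  def enqueue(s3Keys: Seq[String]): String = {
    val zipId = UUID.randomUUID().toString
    redis.lpush("zip:requests", zipId + "|" + s3Keys.mkString(","))
    zipId
  }

  //Steps 3 and 4: an EC2 worker pops the oldest request, builds the zip (download,
  //compress, upload to S3) and records the download link under the zip identifier
  def workOnce(buildZip: Seq[String] => String): Unit = {
    Option(redis.rpop("zip:requests")).foreach { entry =>
      val Array(zipId, keys) = entry.split("\\|", 2)
      val downloadUrl = buildZip(keys.split(",").toSeq)
      redis.set("zip:done:" + zipId, downloadUrl)
    }
  }

  //Step 5: the polling endpoint checks whether the zip is ready
  def downloadLink(zipId: String): Option[String] =
    Option(redis.get("zip:done:" + zipId))
}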

First off, the toolkit you use to build your zip is absolutely crucial so invest in a good one. Since our application is written in Scala, I initially started with the native Java zip tools which were painfully slow. I then moved to the Chilkat zip tools which were an order of magnitude faster as they are written in C and accessed in Java over JNI. Chilkat has implementations for nearly every popular programming language and for only $249 for a site-wide license, it’s money well spent and serves as the basis for this solution.

So now that we have a basic framework, how can we improve on this naïve approach? Upon inspecting the Chilkat API, I noticed the existence of a QuickAppend method, which serves to append one zip to another. I began wondering how the compression time would be affected if we pre-zipped each file in S3, in its destination directory structure, and then simply appended them all together to form the final zip. To my amazement, the difference in compression time was astonishing. Small zip files in the 100 KB - 300 KB range saw a 2x-3x speed improvement, while those larger than 10 MB saw a 10x-15x improvement. For example, a 14 MB zip with 25 files varying in size from 100 KB to 8 MB took a mere 120 ms to compress into the final zip, while building the zip from scratch took over 1.5 seconds.

An additional benefit of this approach is that if your users store lots of files that compress well, there’s less data to download from S3 to EC2 to create the zip in the first place. The degree of compression of the original files also affects the speed of the QuickAppend operation: highly compressed files can achieve speed improvements of up to 25x. Most files in my tests were only moderately or lightly compressed, as they consisted of PDF and image documents.

The obvious downside of this approach is that you have to store two copies of each file, one in its original form, and another in compressed form. In our case the speed advantages outweigh the added storage costs.

The final architecture has to change somewhat, as we have to build services that zip each file at upload time and store the zipped copy to S3. In this case SQS is a viable solution, as the uploaded files don’t have to be processed in a strict sequence. If a user happens to download files immediately after uploading them, your compression code will also have to deal with the possibility of a zipped file not being ready yet. The final zip implementation becomes quite trivial (a rough sketch follows the list below):

  1. Download the first pre-zipped file from S3 to your EC2 server
  2. Iterate subsequent pre-zipped files by downloading them and appending them to the first file
  3. If a pre-zipped file is not found, download the original and zip it, then append it
  4. Upload the completed zip back to S3
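
Here is a sketch of those four steps under stated assumptions: downloadFromS3, uploadToS3, zipLocally, and quickAppend are hypothetical helpers standing in for your AWS SDK wrappers and your zip toolkit’s append call (Chilkat’s QuickAppend in our case).

//Hypothetical helpers: thin wrappers you would write around the AWS SDK and your zip toolkit
def downloadFromS3(bucket: String, key: String, target: java.io.File): Boolean = sys.error("stub")   //false if the key doesn't exist
def uploadToS3(bucket: String, key: String, file: java.io.File): Unit = sys.error("stub")
def zipLocally(source: java.io.File, targetZip: java.io.File): Unit = sys.error("stub")
def quickAppend(baseZip: java.io.File, partZip: java.io.File): Unit = sys.error("stub")   //e.g. Chilkat's QuickAppend

//Sketch of steps 1-4: seed the final zip with the first pre-zipped file, append the rest,
//fall back to zipping the original on the fly if a pre-zipped copy isn't ready, then upload
def buildFinalZip(zipBucket: String, originalBucket: String, keys: Seq[String], workDir: java.io.File): java.io.File = {
  val finalZip = new java.io.File(workDir, "download.zip")
  downloadFromS3(zipBucket, keys.head + ".zip", finalZip)               //step 1

  keys.tail.foreach { key =>                                            //steps 2 and 3
    val part = new java.io.File(workDir, key.replace('/', '_') + ".zip")
    if (!downloadFromS3(zipBucket, key + ".zip", part)) {
      val original = new java.io.File(workDir, key.replace('/', '_'))
      downloadFromS3(originalBucket, key, original)
      zipLocally(original, part)
    }
    quickAppend(finalZip, part)
  }

  uploadToS3(zipBucket, "completed/" + finalZip.getName, finalZip)      //step 4
  finalZip
}
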
There you have it, $250 and a few hours of coding later, you have a quick and simple way to build on-demand zip files.

If you have any other techniques for zipping files on AWS, please share them in the comments.


If you liked this post please follow me on Twitter  or upvote it on Hacker News and Reddit

Thursday, January 19, 2012

Be Consistent


I was recently asked to go over a document storage company’s product and offer up some advice on usability, strategy, etc. During my exploration I came across some rather pronounced consistency issues that can prove deadly to customer acquisition. The more I thought about it, the more I realized that such problems are not unique to this company. I therefore implore you to make your application’s copy and actions consistent, or you run the risk of looking foolish. Some examples of contradiction include:
  • Stating that all content is stored securely with the highest level of encryption but at the same time not using HTTPS on your payment collection page
  • Saying that your support is the best and burying your phone number in obscure locations, or not providing a number at all
  • Claiming to be “enterprise ready” and having no API
  • Etc.

When dealing with people, how quickly do you lose respect for someone when their actions contradict their words? The same is true of your product. If you make a bold claim, ensure that everything in your application is not only consistent with it, but serves to solidify it. 

If you liked this post please follow me on Twitter 

Saturday, January 14, 2012

JavaScript Enlightenment - Reviewed


A couple of weeks ago a copy of Cody Lindley's JavaScript Enlightenment book made it onto my desk, and I approached it with a good degree of skepticism. With the plethora of JavaScript books already in print, why would we need another? Helped somewhat by the book's unintimidating length, I dove into the first chapter and before I knew it, I was halfway through. Much to my surprise, the book managed to provide a fresh perspective on a well-covered topic and was actually enjoyable to read.

Although the author states from the outset that the book is not intended for JavaScript neophytes, I somewhat disagree. While I agree that it's not for those new to programming in general, I believe it would serve as a good introduction to the JavaScript language. Most step-by-step guides provide a broad, high-level overview of JavaScript topics in order to accelerate the process of writing something concrete. JS Enlightenment does a good job of presenting often-confusing concepts in a clear and simple manner that makes further reading more effective and increases the depth of learning. It achieves a good deal of its clarity through copious use of well-documented examples that are simple to understand but provide a great deal of insight into the language. For those already familiar with JavaScript, there's still a lot of value to be had from this book, and it will serve as an excellent reference if something should slip your mind.

JS Enlightenment serves up one of the clearest treatments of the prototype chain that I've seen to date, and while it leaves out complex inheritance examples, it leaves you with a strong foundation to pursue more advanced concepts through further reading. Some other topics the book is especially good at explaining include:

  • The various forms of object construction, the constructor property, and the use of typeof and instanceof
  • Primitive values and object conversion during use as well as object literals
  • Complex object storage and comparison
  • Scope principles and prototype chain lookup
  • Function passing and invocation, the arguments property and basic closures
  • How "this" works and how to achieve the desired object context through call and apply

All this being said, it's by no means a complete treatment of writing JavaScript applications, but the author never intended it to be. Much like a good web application, this book is good precisely because of what it leaves out. It never pretends to be more than it is, and it delivers on its mission of providing “a short and digestible summary of the ECMA-262, Edition 3 specification, focused on the nature of objects in JavaScript”. When paired with a more "how-to" style book like JavaScript: The Good Parts or Eloquent JavaScript, a reader will achieve a greater breadth and depth of knowledge than if left to those books alone. Unless you're a JavaScript expert, I would encourage you to pick up a copy of the book here and prepare to be enlightened.

If you liked this post please follow me on Twitter