Amazon EC2 Cloud Computing Cost Savings

This post is a long one and is part of an ongoing series on the benefits we’ve identified in our experience using cloud computing technologies, most notably Amazon Web Services (AWS) and various VMware products.

Overview

“The cloud”, specifically Amazon Web Services, has dramatically changed the landscape of high-performance computing and big data processing in recent years. Computations are now practical that would not have been feasible just a few years ago. An organization can cost-effectively set up, launch and use a seemingly limitless amount of computing resources in minutes.

Most of the media coverage today focuses on using Hadoop for “Big Data”. SLTI has experience with this technology, but what happens if your data set doesn’t fit nicely into that framework? The write-up below describes how we handled one such challenge.

Business Problem

The problem SLTI set out to solve falls into the business intelligence/data mining space in the financial industry: testing different inputs for an algorithm that forms the basis of a quantitative equity trading system.
The algorithm involves complex mathematical calculations and heavy processing across a large and diverse data set. The problem required testing a wide range of input parameters across four dimensions, and the algorithm was tested against sixty-two different data sets. In total, we had to analyze roughly 9.9 billion data points to come up with something actionable.

While the program logic is specific to the financial trading industry, the underlying concepts are shared across many other industries: engineering, legal services, and so on. The question to ask is simple:

How many processing tasks have you had that you wished ran faster? Can they be split into multiple pieces and run in parallel? How can this be done cost-effectively?

Information Technology and Software Solution

Cloud computing has dramatically changed the cost landscape of IT infrastructure, especially for prototype or short-run projects like this one. In general, CPU cycles and RAM are cheap compared to the highly skilled labor required to improve performance by several orders of magnitude.

Our goal was simple: make the program run a lot faster with minimal effort.

We have a long list of projects to complete, so development time is our most precious resource and we didn’t want to rewrite the entire program. We kept the software changes and the technology solution simple: it’s basically an 80/20 approach to setting up the infrastructure and making the code changes, one that still solves the problem, albeit in a less elegant fashion.

To accomplish this, we modified the program to operate on a user-defined subset of the original data set. This allows the problem to be split into many small parts and spread across multiple servers, with each server handling the processing for its own subset.
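
As an illustration, here is a minimal Python sketch of how that partitioning could work. The data set labels, parameter names and ranges below are placeholders, not the actual trading inputs.

    from itertools import product

    def build_work_units(datasets, param_grid, n_nodes):
        """Expand the full test matrix (data set x parameter combination)
        and split it into roughly equal chunks, one per worker node."""
        combos = [
            {"dataset": ds, "params": dict(zip(param_grid, values))}
            for ds in datasets
            for values in product(*param_grid.values())
        ]
        # Round-robin the combinations across the nodes.
        return [combos[i::n_nodes] for i in range(n_nodes)]

    # Placeholder inputs -- the real data sets and parameter ranges are not shown here.
    datasets = [f"dataset_{i:02d}" for i in range(62)]
    param_grid = {
        "dim_a": range(5, 55, 5),
        "dim_b": range(1, 11),
        "dim_c": [0.25, 0.50, 0.75],
        "dim_d": [True, False],
    }

    work_units = build_work_units(datasets, param_grid, n_nodes=16)
    print(len(work_units), "nodes,", len(work_units[0]), "test runs on node 0")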

IT Infrastructure Architecture

Staying with an 80/20, simplest-solution-first approach, we built the solution from the following pieces:

  1. Linux-based application server (a pre-built Amazon EC2 Amazon Machine Image (AMI); alternatively, a VMware image could be created and converted to an AMI)
  2. Highly available, scalable central filestore (Amazon S3)
  3. Master configuration data stored in Amazon S3

The cluster itself comprises sixteen cc2.8xlarge EC2 instances. Each instance has 88 EC2 Compute Units, 2 x Intel Xeon E5-2670 processors (16 cores per instance), 60.5 GB of RAM and 3,370 GB of local storage. In total, the cluster provided 1,408 Compute Units, 256 cores and 968 GB of RAM.
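
A cluster like this can be launched programmatically; below is a rough sketch using boto3, the current AWS SDK for Python (the original work predates boto3, so treat this as illustrative rather than the tooling we actually used). The AMI ID, key pair and security group are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch sixteen identical compute nodes from the pre-configured AMI.
    # The AMI ID, key pair and security group below are placeholders.
    response = ec2.run_instances(
        ImageId="ami-xxxxxxxx",
        InstanceType="cc2.8xlarge",
        MinCount=16,
        MaxCount=16,
        KeyName="cluster-key",
        SecurityGroupIds=["sg-xxxxxxxx"],
        # Terminate (rather than stop) each instance when the node shuts
        # itself down after finishing its share of the work.
        InstanceInitiatedShutdownBehavior="terminate",
    )

    instance_ids = [i["InstanceId"] for i in response["Instances"]]
    print("Launched:", instance_ids)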

The basic logic of the program goes something like this (a rough sketch of a node-side wrapper script follows the list):

  1. Load all required data into Amazon S3
  2. Launch the pre-configured AMI on each node; once the server boots, the program runs automatically and performs these steps:
    • Get the specific subset of the data assigned to this node from the central filestore
    • Update the master configuration data before, during and after each test run so the other nodes know what data still needs to be processed
    • Save the results to the central filestore
    • Shut down the node after the work is completed
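
A minimal sketch of what each node’s wrapper script might look like is below. The bucket name, key layout, NODE_ID mechanism and run_tests stub are hypothetical stand-ins for the actual program, and the simplified read-modify-write of the shared status file ignores concurrent writers.

    import json
    import subprocess

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-cluster-data"   # hypothetical bucket name
    NODE_ID = 0                       # set per node, e.g. via EC2 user data

    def run_tests(unit):
        """Placeholder for the actual algorithm; writes its results to a CSV."""
        out_path = f"/tmp/{unit['id']}.csv"
        # ... the real computation would happen here ...
        open(out_path, "w").close()
        return out_path

    def update_master_config(unit_id, status):
        """Simplified read-modify-write of the shared status file in S3."""
        obj = s3.get_object(Bucket=BUCKET, Key="config/master.json")
        config = json.load(obj["Body"])
        config[unit_id] = status
        s3.put_object(Bucket=BUCKET, Key="config/master.json",
                      Body=json.dumps(config).encode())

    def main():
        # 1. Pull this node's subset of the work from the central filestore.
        s3.download_file(BUCKET, f"work/node_{NODE_ID:02d}.json", "/tmp/work.json")
        with open("/tmp/work.json") as f:
            work_units = json.load(f)

        for unit in work_units:
            # 2. Tell the other nodes this piece is being processed.
            update_master_config(unit["id"], "running")
            result_path = run_tests(unit)
            # 3. Save the results back to the central filestore.
            s3.upload_file(result_path, BUCKET,
                           f"results/node_{NODE_ID:02d}/{unit['id']}.csv")
            update_master_config(unit["id"], "done")

        # 4. Shut the node down once its share of the work is complete.
        subprocess.run(["sudo", "shutdown", "-h", "now"])

    if __name__ == "__main__":
        main()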

Cost Analysis

This is not intended to be a fully comprehensive cost comparison, but rather a quick TCO estimate using some standard costs. To do this quickly, we used the AWS EC2 Cost Comparison Calculator at the bottom of this page.

SLTI’s EC2-based approach is roughly 99.5% cheaper than an in-house solution. There are other, similar examples of the ROI of an EC2-based approach for this type of workload here.
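
For a sense of how a percentage in that range falls out, here is some back-of-the-envelope arithmetic. All of the figures below (hourly rate, run length, in-house hardware and operating costs) are illustrative assumptions, not the numbers from the calculator above.

    # Illustrative figures only -- not the actual numbers from the AWS calculator.
    instances = 16
    hourly_rate = 2.40        # assumed on-demand $/hr for a cc2.8xlarge instance
    run_hours = 24            # assumed length of the full test run

    ec2_cost = instances * hourly_rate * run_hours            # $921.60

    # A comparable in-house cluster: hardware plus three years of power,
    # cooling and administration -- again, purely illustrative numbers.
    in_house_cost = 150_000 + 60_000

    savings = 1 - ec2_cost / in_house_cost
    print(f"EC2 run: ${ec2_cost:,.2f} vs. in-house: ${in_house_cost:,.0f} "
          f"({savings:.1%} cheaper)")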

Key Takeaways

  1. Using the cloud enables a much more adaptive and scalable data processing infrastructure than in-house IT hardware.
  2. If you’re not using AWS (or something similar), you’re overpaying for IT infrastructure, especially for short run or highly variable workloads.

This post is a short overview of some of the ways we’re using advanced cloud computing technology to help our clients improve their IT agility and reduce IT expenses. We’re currently working on a few case studies that describe these concepts in more detail. To receive new research as it’s published, just sign up using the form on the right of this page.

If you’d like to explore a specific use case for your situation, please contact us.


R in Finance Conference Thoughts

Thursday and Friday of this week, our new CIO, Michael Bommarito, and I were in Chicago for the R in Finance conference. The conference was great, with a lot of very well done presentations. The R project has come a long way and is growing quickly; we use the R language, with great results, for some of the more complex analytical projects we work on. While the conference subject matter was very specific to the quantitative finance field, many of the themes carry over into other fields. A couple of the main ones are below:

Business Intelligence Labor Market

The labor pool for people with skills in Big Data, Business Intelligence (BI) and enterprise-level statistics and analytics looks small relative to other IT fields, and it seems to be concentrated in specific geographic areas (NYC, Chicago, Silicon Valley, etc.). Skilled people are in high demand right now and wages are up across the board. If your company is outside of these areas, or is not a large operation with a large BI budget, your options are fairly limited and generally priced accordingly. This tightness is partly due to the rapid growth in the area, but also to the newness of the field and the current pace of innovation.

Commercial Open Source Software

The R project, being open source, has a wide spectrum of users. At the conference there were people from large, leading hedge funds and investment firms alongside students and hobbyists. This breadth of user base seems unique to open source software, and what is even more remarkable is that it all works so well and supports such different requirements. A quick example: a student doesn’t have the money for an enterprise-level BI software package, so he or she uses a free or open source alternative instead. The enterprise customer, on the other hand, generally won’t use an open source product without dedicated technical or community support, unless they have in-house talent to fix issues that may come up. This can leave the software project in a bind because of the gap between the people who use it and the people who would pay money to sustain its development.

One solution to this is commercial open source software; familiar examples are Red Hat Linux and MySQL (now part of Oracle). The commercial company gains a potentially large pool of labor to develop and work on its products, and students gain a job market for their new skills. The end client ends up with additional layers of ‘insurance’ and support (from their chosen consulting company, from the commercial company backing the product, and from the large open source community of free users), generally at a lower cost than in-house-developed or proprietary solutions.

This is likely one of the reasons that we’ve seen large growth in the market share of commercial open source software in recent years.

Image courtesy of http://www.flickr.com/photos/opensourceway/5392982007/sizes/o/in/photostream/


Big Data: Wall Street & Technology Article

Summary:

‘Big Data’ holds big promise for Wall Street (and everyone else), but it also comes with complications, a steep learning curve and a very tight labor market for people with these skills.

SLTI Commentary:

There are a lot of lessons that transfer from Wall Street firms to other industries. Of course the subject matter will differ, but the methods used to extract actionable outcomes or architect a technology solution are not all that different. The challenge we see a lot of companies facing is how to get started down an analytical path. This is daunting because most of the talk in the industry is about functionality and feature sets, and the available tools, while very capable, are generally racing ahead of the buyer’s ability to internalize the changes the tools and metrics recommend.

Source:

http://www.wallstreetandtech.com/articles/232901529