Event: Amazon Web Services AWS Michigan Meetup 12/11/2012

Tomorrow (12/11/2012) we’ll be at the monthly AWS Michigan meetup in Ann Arbor.  Jamie and Michael from RightBrain Networks will be presenting on some of the things they’ve learned from running scalable WordPress sites on Amazon Web Services (AWS). They’ll also share some of what they learned at AWS re:Invent for those of us not lucky enough to attend.

If you’re interested in Cloud Computing, Amazon Web Services (AWS), Public Cloud, Private Cloud, OpenStack, and other related technologies and want to meet other people who are as well, check out the group – http://www.awsmichigan.org.

Event: AWS Michigan Meetup

Event Date: Tuesday, December 11th, 2012 @ 7:00pm

More info: http://www.awsmichigan.org/events/94452352/

Event: AWS Michigan Meetup (Presenting) – 10/09/12

Legal Informatics w/ CloudSearch & High-Performance Financial Market Apps

Solid Logic’s CEO, Eric Detterman, and CIO, Mike Bommarito, will be presenting at the AWS Michigan Meetup at Tech Brewery (map) in Ann Arbor, MI. I’ll be presenting on how we use Amazon Web Services (AWS) in the quantitative financial trading space, including a case study and more.

Mike will be presenting on Legal Informatics using AWS CloudSearch. He will also be demonstrating an early prototype of a private enterprise information search and e-discovery application we’re creating. Mike also has a copy of his presentation available here.

Event Date: Tuesday, October 9th, 2012 @ 6:30pm

Event: AWS Michigan Meetup

More info: http://www.awsmichigan.org/events/85530922/

Below is a copy of my presentation so you can view it at your convenience.

Subscribe to the Solid Logic Blog

Amazon EC2 Cloud Computing Cost Savings

This post is a long one and is part of an ongoing series on benefits we’ve identified in our experience using Cloud Computing technologies, most notably Amazon Web Services (AWS) and various VMware products.

Overview

“The cloud”, specifically Amazon Web Services, has dramatically changed the landscape of High Performance Computing and Big Data Processing in recent years. Many things are computationally possible today that would not have been a few short years ago. An organization can cost-effectively set up, launch, and use a seemingly limitless amount of computing resources in minutes.

Most news media coverage today focuses on using Hadoop on “Big Data”. SLTI has experience with this technology, but what happens if your task or data set doesn’t fit nicely into that framework? The writeup below describes how we handled one such challenge.

Business Problem

The problem SLTI was trying to solve fits into the Business Intelligence/Data Mining area of the financial industry: testing different inputs for an algorithm that forms the basis of a quantitative equity trading system.
The algorithm involves complex mathematical calculations and heavy processing requirements across a large and diverse data set. The problem required testing a wide range of input parameters across four dimensions, and the algorithm was tested across sixty-two different data sets. In total, we had to analyze roughly 9.9 billion data points to come up with something actionable.
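The size of a parameter sweep like this grows multiplicatively with each dimension. The actual parameter ranges SLTI used are not published, so the sketch below uses purely hypothetical values for the four dimensions, just to show how the number of test runs is counted:

```python
from itertools import product

# Hypothetical parameter ranges -- placeholders for illustration only;
# the real values used in the trading algorithm are not published.
lookback_windows = range(10, 60, 10)       # dimension 1: 5 values
entry_thresholds = [0.5, 1.0, 1.5, 2.0]    # dimension 2: 4 values
exit_thresholds = [0.25, 0.5, 0.75]        # dimension 3: 3 values
holding_periods = [1, 5, 10]               # dimension 4: 3 values

data_sets = 62  # number of data sets tested, per the post

# Every combination of the four dimensions must be tested on every data set.
grid = list(product(lookback_windows, entry_thresholds,
                    exit_thresholds, holding_periods))
total_runs = len(grid) * data_sets
print(f"{len(grid)} parameter combinations x {data_sets} data sets "
      f"= {total_runs} test runs")
```

Even these small illustrative ranges produce thousands of runs; with realistic ranges and many data points per run, the total quickly reaches the billions of data points described above.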

While the program logic is specific to the financial trading industry, the underlying concepts are shared across many industries – engineering, legal services, etc. The questions to ask are simple –

How many processing tasks have you had that you wished ran faster? Can they be split into multiple pieces and run in parallel? How can this be done cost-effectively?

Information Technology and Software Solution

Cloud Computing has dramatically changed the cost landscape of IT Infrastructure, especially for prototype or short-run projects like this one. In a general sense, CPU cycles and RAM are cheap compared to the highly skilled labor required to improve performance by several orders of magnitude.

Our goal was simple – make the program run much faster with minimal effort.

We have a large list of projects to be completed, so development time is our most precious resource and we didn’t want to rewrite the entire program. We kept the software changes and technology solution simple – essentially an 80/20 approach to setting up the infrastructure and handling the code changes that still solves the problem, albeit in a less elegant fashion.

To accomplish our goal, we modified the program to operate on a user-defined subset of the original data set. This allows the problem to be split into many small parts spread across multiple servers, with each server handling the processing for its own subset.
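The splitting step itself is simple. A minimal sketch, assuming a round-robin distribution (the post does not say exactly how SLTI partitioned the work):

```python
def partition(items, n_workers):
    """Split a list of work items into n_workers roughly equal subsets
    via round-robin assignment. Hypothetical helper for illustration;
    the actual program's splitting logic is not shown in the post."""
    chunks = [[] for _ in range(n_workers)]
    for i, item in enumerate(items):
        chunks[i % n_workers].append(item)
    return chunks

# e.g. 62 data sets spread across 16 worker instances
subsets = partition(list(range(62)), 16)
print([len(s) for s in subsets])  # each worker gets 3 or 4 data sets
```

Each server then runs the unmodified algorithm against only its assigned subset, so no deeper restructuring of the program is needed.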

IT Infrastructure Architecture

In keeping with an 80/20, simplest-solution-first approach, we created a solution with the following pieces:

  1. Linux-based application server (an Amazon EC2 Amazon Machine Image (AMI); alternatively, a VMware image could be created and converted to an AMI)
  2. Highly-Available, scalable, central filestore (Amazon S3)
  3. Master configuration data stored in Amazon S3

The cluster itself is comprised of sixteen cc2.8xlarge EC2 instances. Each instance provides 88 EC2 Compute Units from 2 × Intel Xeon E5-2670 processors (16 cores per instance), 60.5 GB of RAM, and 3,370 GB of storage. In total, the cluster provided 1,408 Compute Units, 256 cores, and 968 GB of RAM.
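The cluster totals follow directly from the per-instance specs:

```python
# Per-instance cc2.8xlarge specs as listed above
instances = 16
ecu_per_instance = 88       # EC2 Compute Units
cores_per_instance = 16     # 2 x Intel Xeon E5-2670
ram_gb_per_instance = 60.5

# Whole-cluster totals
total_ecu = instances * ecu_per_instance
total_cores = instances * cores_per_instance
total_ram_gb = instances * ram_gb_per_instance
print(total_ecu, total_cores, total_ram_gb)  # 1408 256 968.0
```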

The basic logic of the program goes something like this:

  1. Load all required data into Amazon S3
  2. Launch the pre-configured AMIs; after each server boots, the program automatically:
    • Fetches that node’s subset of the data from the central filestore
    • Updates the master configuration data before, during, and after each test run so the other nodes know what data still needs to be processed
    • Saves the results to the central filestore
    • Shuts down the node after the work is completed
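The per-node loop above can be sketched as follows. Real nodes would use the Amazon S3 API for the filestore and the master configuration object; here plain dicts stand in so the coordination logic can be shown on its own, and `run_tests` is a hypothetical placeholder for the actual algorithm:

```python
# In-memory stand-ins for the S3 filestore and master configuration data
filestore = {f"subset-{i}": f"data-{i}" for i in range(4)}
config = {"pending": sorted(filestore), "done": []}
results = {}

def run_tests(data):
    """Placeholder for the actual trading-algorithm test run."""
    return f"result({data})"

def worker():
    """Claim pending subsets one at a time until none remain."""
    while config["pending"]:
        key = config["pending"].pop(0)     # claim a subset (update master config)
        data = filestore[key]              # fetch this node's data from the filestore
        results[key] = run_tests(data)     # process the subset
        config["done"].append(key)         # record completion for the other nodes
    # a real node would shut itself down here

worker()
print(f"processed {len(results)} subsets; pending: {config['pending']}")
```

With S3 in place of the dicts, each node repeats this loop independently, which is what lets the cluster scale without any central scheduler.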

Cost Analysis

This is not intended to be a totally comprehensive cost comparison, but rather a quick TCO comparison using some standard costs. To do this quickly, we used the AWS EC2 Cost Comparison Calculator at the bottom of this page.

SLTI’s EC2-based approach is roughly 99.5% cheaper than an in-house solution. There are other similar examples of the ROI of an EC2-based approach for this type of workload here.

Key Takeaways

  1. Using the cloud enables a much more adaptive and scalable data processing infrastructure than in-house IT hardware.
  2. If you’re not using AWS (or something similar), you’re overpaying for IT infrastructure, especially for short run or highly variable workloads.

This post is a short overview of some of the ways we’re using advanced cloud computing technology to help our clients improve their IT agility and reduce IT expenses. We’re currently working on a few case studies that describe these concepts in more detail. To get updates on new research, just sign up using the form on the right of this page.

If you’d like to explore a specific use case for your situation, please contact us.


Cloud Computing Cost Benefits – Cash Flow Impact

We’re heavy users of cloud computing technologies – primarily Amazon Web Services (AWS) and services built on top of it. The decision to embrace this technology has been very important to our client projects. There are many reasons behind this, but here is the first of the major reasons behind the decision (we’ll cover the others in later posts).

Cloud Computing Reduces Costs & Improves the Cash Flow of an Uncertain Project

Many of the projects we work on begin with some kind of prototype to confirm the client’s expectations or project goals. If the project goes well, the client then moves on to a larger roll-out and larger subsequent phases; if it doesn’t, the project ends. Because of this, there is a lot of variability and uncertainty in any estimates used to project infrastructure requirements or pricing models. Our preferred method of managing this risk is to use a variable-cost compute model and leading services (IaaS, PaaS, SaaS, etc.). While doing this, we develop everything to be standards-compliant so we can switch to a better option in the future as the client’s needs change.

Examples:

  • Recently we completed a prototype/proof of concept project with an expected internal user base of around 300 users. About a week after launch we ended up with 750+ internal users due to the popularity of the program. We were able to transparently scale up once there was sufficient demand (not before) with zero downtime. We also did this using a variable-cost model so the ‘estimation risk’ to the client was greatly reduced.
  • We designed the infrastructure for a site that hosted and displayed user-submitted videos for a large internet-based comedy contest. There was no accurate way to predict how popular the contest would be, so we used an entirely variable-cost model to build and deploy the solution. The solution used a few different AWS services. This allowed the site to scale up and down based on actual demand and traffic, which we believed was a reasonable proxy for revenue.
  • We’re currently working on an advanced prototype project that does some complex analysis on ~210GB of structured and unstructured data. Again, the project is a prototype with an uncertain future. Instead of taking a leap and estimating what size hardware to buy or making any other decisions based on limited information, we’re using Amazon Web Services to build a cluster of servers to test our product. Our projections show we’ll be able to build the cluster for about 80-90% less than the cost of a single server. This estimate doesn’t factor in other items necessary to make a complete comparison, but is a good starting point.

Conclusion

There are certainly some situations where ‘the cloud’ doesn’t make sense financially or operationally, but generally, if the pricing model fully factors everything in and takes into account a measure of uncertainty, the benefit is clearly on the side of cloud services like AWS. We’ll go through an example cost model in the future to further clarify this. Until then, the AWS pricing tool should give some good ideas on pricing comparisons – especially for situations needing lots of computational power for a short time. It’s pretty clear that for certain use cases, the cloud is the only way to go from a pricing standpoint.

It’s a whole different animal to estimate the requirements and cost model for a well-defined and well-modeled project. If the project you’re working on has high variability (i.e., a high standard deviation), then it’s almost certainly more cost-effective to go with a cloud solution. Even if the costs are equal or slightly higher, we feel it’s cheap insurance and safety to go with a more adaptive solution.


Great New Amazon Web Services (AWS) Announcement – DynamoDB

Solid Logic has been using Amazon Web Services (AWS) since 2008 with great results.  Today was a big day for the AWS team: they launched a new NoSQL service – DynamoDB – around noon.  Like the other AWS offerings (EC2, S3, etc.), it is a scalable, variable-cost service.  Here is the product listing page and other relevant info: http://aws.amazon.com/dynamodb/

Werner Vogels’ blog: http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html

Announcement Video:

http://www.youtube.com/watch?v=3I5PZv6vmZY

We primarily use EC2, S3, CloudFront, and RDS (MySQL as a service).  AWS allows us to be much more agile and can reduce system complexity.  It removes SLTI from many of the day-to-day demands of setting up and managing physical infrastructure.  By using AWS, we are able to launch an internal or client application in minutes to hours, instead of days to weeks.

There are tons of great reasons to switch many use cases over to an AWS cloud-hosted infrastructure instead of a physical one.  In the future, we will work up our spin on a cost-benefit model for AWS.  There are many good ones available on the web, most notably here – http://aws.amazon.com/economics/.  Unfortunately, many of the ones available do not include any intangible benefits.  In this case, I am defining an intangible benefit as something that is “fuzzy” to put a price figure on and is subjective in nature.  Things like how it can affect focus, hiring practices, time-to-launch, required in-house skill sets, etc. would fit into this category.  These items can have a huge impact on the decision making process, especially for small-to-medium size firms or firms outside of concentrated technology areas with a large pool of qualified candidates.

We are excited about today’s announcement and look forward to using DynamoDB for a product that we have coming to market in 2012.  As we begin to work with it more, we will try to document our findings and include them in the blog.
