Amazon SimpleDB Batched PUTs Usage and Performance

One of the most important factors in getting optimal performance out of Amazon's SimpleDB is keeping the total number of requests to a minimum and making the most out of the ones you make. At one time this was tricky from a write perspective because only a single item could be updated in a PUT operation.

Last spring Amazon eased the pain a little by allowing us to batch PUT operations into a single command.

In order to demonstrate the use of this feature and analyze its performance I'll use C# and Amazon's .Net SimpleDB library.

Single PUTs

To establish a baseline I'm going to run a test with some sample data (Sun Microsystems' stock data from 1/3/2005 to 7/9/2009, obtained from Yahoo! Finance) and write it into a "StockPrices" domain one item at a time. This data is in CSV form and contains 1137 rows.

I conducted three trials locally and three trials on a small EC2 instance.

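/* requires System.IO plus the SimpleDB library's client and model namespaces */

/* assumed: the ticker symbol for the data file (JAVA was Sun's symbol) */
string ticker = "JAVA";

AmazonSimpleDB service = new AmazonSimpleDBClient("ENTER YOUR KEY HERE", 
    "ENTER YOUR SECRET KEY HERE");

using (StreamReader stockStreamReader = new StreamReader(File.OpenRead(@"JAVA.csv")))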
{
    PutAttributesRequest putRequest;
  
    /* read column names */
    stockStreamReader.ReadLine();
      
    while (!stockStreamReader.EndOfStream)
    {
        putRequest = new PutAttributesRequest();
        putRequest.DomainName = "StockPrices";

        string line = stockStreamReader.ReadLine();
        string[] tokens = line.Split(',');

        putRequest.ItemName = string.Format("{0}_{1}", tokens[0], ticker);

        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Ticker", Value = ticker });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Date", Value = tokens[0] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Open", Value = tokens[1] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "High", Value = tokens[2] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Low", Value = tokens[3] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Close", Value = tokens[4] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Volume", Value = tokens[5] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "AdjustedClose", Value = tokens[6] });

        service.PutAttributes(putRequest);
    }  
}

Resulting in the following times (in seconds):

         Local   EC2
Trial 1  54      34
Trial 2  49      32
Trial 3  57      32
Avg      53.3    32.6

As you can see, there was a significant improvement simply from executing the code on Amazon's equipment (minimizing connection latency). Even so, at roughly 29 ms per item on EC2 and 47 ms locally, it's hard to argue the performance was good, even without something to compare it to.
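
The listing above doesn't include any timing code. One way trials like these could be timed is with .NET's Stopwatch class. The sketch below is an assumption, not the harness behind the numbers above; TrialTimer, TimeTrial and AverageSeconds are hypothetical names.

```csharp
using System;
using System.Diagnostics;

static class TrialTimer
{
    /* Hypothetical helper: runs the action once and returns the elapsed seconds. */
    public static double TimeTrial(Action load)
    {
        Stopwatch watch = Stopwatch.StartNew();
        load();
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }

    /* Runs the action the given number of times and returns the mean elapsed seconds. */
    public static double AverageSeconds(Action load, int trials)
    {
        if (trials < 1)
            throw new ArgumentOutOfRangeException("trials");

        double total = 0;
        for (int i = 0; i < trials; i++)
            total += TimeTrial(load);
        return total / trials;
    }
}
```

Wrapping the whole load loop in a method and passing it in, for example TrialTimer.AverageSeconds(() => LoadSinglePuts(), 3) with a hypothetical LoadSinglePuts, would give an average comparable to the Avg row.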

Batched PUTs

Next I ran a similar test using batched PUT operations:

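/* assumed: the ticker symbol for the data file (JAVA was Sun's symbol) */
string ticker = "JAVA";

AmazonSimpleDB service = new AmazonSimpleDBClient("ENTER YOUR KEY HERE", 
    "ENTER YOUR SECRET KEY HERE");

using (StreamReader stockStreamReader = new StreamReader(File.OpenRead(@"JAVA.csv")))
{
  BatchPutAttributesRequest batchPutRequest = new BatchPutAttributesRequest();
  batchPutRequest.DomainName = "StockPrices";

  /* read column names so the header row isn't written as an item */
  stockStreamReader.ReadLine();

  while (!stockStreamReader.EndOfStream)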
  {
    ReplaceableItem item = new ReplaceableItem();

    string line = stockStreamReader.ReadLine();
    string[] tokens = line.Split(',');

    item.ItemName = string.Format("{0}_{1}", tokens[0], ticker);

    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Ticker", Value = ticker });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Date", Value = tokens[0] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Open", Value = tokens[1] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "High", Value = tokens[2] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Low", Value = tokens[3] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Close", Value = tokens[4] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Volume", Value = tokens[5] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "AdjustedClose", Value = tokens[6] });

    batchPutRequest.Item.Add(item);

    /* Amazon limits batches to 25 items */
    if (batchPutRequest.Item.Count == 25)
    {
      service.BatchPutAttributes(batchPutRequest);
      batchPutRequest = new BatchPutAttributesRequest();
      batchPutRequest.DomainName = "StockPrices";
    }
  }

  /* send any that remain */
  if (batchPutRequest.Item.Count > 0)
    service.BatchPutAttributes(batchPutRequest);
}

resulting in:

         Local   EC2
Trial 1  7       6
Trial 2  6       6
Trial 3  6       6
Avg      6.3     6

This was a marked improvement, roughly 8x faster locally and 5x faster on EC2, and it effectively nullified the advantage the single writes got on EC2.
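
A quick back-of-the-envelope check shows where the gain comes from. Assuming all 1137 rows are data rows, the single-PUT run makes 1137 requests while the batched run makes only 46 (45 full batches of 25 plus one of 12). Here is a small sketch of that arithmetic; RequestMath and its method names are hypothetical, and the timings fed in are the averages reported above.

```csharp
using System;

static class RequestMath
{
    /* Number of requests needed to write rowCount items at batchSize items per request
       (integer ceiling division). */
    public static int RequestCount(int rowCount, int batchSize)
    {
        if (rowCount < 0)
            throw new ArgumentOutOfRangeException("rowCount");
        if (batchSize < 1)
            throw new ArgumentOutOfRangeException("batchSize");
        return (rowCount + batchSize - 1) / batchSize;
    }

    /* Average milliseconds spent per request for a run that took totalSeconds. */
    public static double MillisPerRequest(double totalSeconds, int requests)
    {
        if (requests < 1)
            throw new ArgumentOutOfRangeException("requests");
        return totalSeconds * 1000.0 / requests;
    }
}
```

Dividing the local averages by those counts gives about 47 ms per single PUT (53.3 s over 1137 requests) against about 137 ms per batched request (6.3 s over 46 requests). Each batch call costs more than a single PUT, but there are far fewer of them, which is why the total drops so sharply.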

Conclusion

The overall comparison of the two approaches is as follows:

Type     Local   EC2
Single   53.3    32.6
Batched  6.3     6

Batched PUT operations offer a clear performance benefit regardless of where they're executed from. You just have to keep in mind you're limited to batch sizes of 25 items and 1 MB per request.
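
If you batch in more than one place, the flush-at-25 logic can be pulled out of the read loop. This is a minimal sketch that doesn't depend on the SimpleDB library at all; Batcher, InBatches and MaxItemsPerBatch are hypothetical names, and it only enforces the item-count limit, not the 1 MB request size.

```csharp
using System;
using System.Collections.Generic;

static class Batcher
{
    /* The 25-item limit on batched PUTs mentioned above. */
    public const int MaxItemsPerBatch = 25;

    /* Splits source into consecutive lists of at most batchSize items.
       Only the last list can be shorter than batchSize. */
    public static List<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (batchSize < 1)
            throw new ArgumentOutOfRangeException("batchSize");

        List<List<T>> batches = new List<List<T>>();
        List<T> current = new List<T>();

        foreach (T item in source)
        {
            current.Add(item);
            if (current.Count == batchSize)
            {
                batches.Add(current);
                current = new List<T>();
            }
        }

        /* keep any that remain */
        if (current.Count > 0)
            batches.Add(current);

        return batches;
    }
}
```

Each list returned by Batcher.InBatches(items, Batcher.MaxItemsPerBatch) would then become the contents of one batched PUT request. Items with many large attribute values could still push a request past the size limit, so a real loader may need a smaller batch size.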

Created on 2009-07-10 12:29:15