Chris Umbel

Amazon SimpleDB Batched PUTs Usage and Performance

One of the most important factors in getting optimal performance out of Amazon's SimpleDB is keeping the total number of requests to a minimum and making the most out of the ones you make. At one time this was tricky from a write perspective because only a single item could be updated in a PUT operation.

Last spring Amazon eased the pain a little by allowing us to batch PUT operations into a single command.

To demonstrate the use of this feature and analyze its performance I'll use C# and Amazon's .NET SimpleDB library.

Single PUTs

To establish a baseline I'm going to run a test with some sample data (Sun Microsystems' stock data, ticker JAVA, from 1/3/05 to 7/9/2009, obtained from Yahoo! Finance) and write it into a "StockPrices" domain one item at a time. The data is in CSV form and contains 1137 rows.
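For reference, the parsing code below assumes the standard Yahoo! Finance historical-quotes CSV layout. The column order matches the token indexes used in the code; the sample row is illustrative, not actual quote data:

```text
Date,Open,High,Low,Close,Volume,Adj Close
2009-07-09,9.15,9.17,9.13,9.16,1234567,9.16    (illustrative values only)
```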

I conducted three trials locally and three trials on a small EC2 instance.

string ticker = "JAVA";

AmazonSimpleDB service = new AmazonSimpleDBClient("ENTER YOUR KEY HERE", 
    "ENTER YOUR SECRET KEY HERE");

using (StreamReader stockStreamReader = new StreamReader(File.OpenRead(@"JAVA.csv")))
{
    PutAttributesRequest putRequest;
  
    /* read column names */
    stockStreamReader.ReadLine();
      
    while (!stockStreamReader.EndOfStream)
    {
        putRequest = new PutAttributesRequest();
        putRequest.DomainName = "StockPrices";

        string line = stockStreamReader.ReadLine();
        string[] tokens = line.Split(',');

        putRequest.ItemName = string.Format("{0}_{1}", tokens[0], ticker);

        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Ticker", Value = ticker });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Date", Value = tokens[0] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Open", Value = tokens[1] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "High", Value = tokens[2] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Low", Value = tokens[3] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Close", Value = tokens[4] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Volume", Value = tokens[5] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "AdjustedClose", Value = tokens[6] });

        service.PutAttributes(putRequest);
    }  
}

Resulting in the following times (in seconds):

          Local   EC2
Trial 1   54      34
Trial 2   49      32
Trial 3   57      32
Avg       53.3    32.6

As you can see there was a significant improvement simply from executing the code on Amazon's equipment (minimizing connection latency), but even without a point of comparison it's hard to argue that the performance was anything other than qualitatively bad.

Batched PUTs

Then I conducted a similar test using batched PUT operations, as follows:

string ticker = "JAVA";

AmazonSimpleDB service = new AmazonSimpleDBClient("ENTER YOUR KEY HERE", 
    "ENTER YOUR SECRET KEY HERE");

using (StreamReader stockStreamReader = new StreamReader(File.OpenRead(@"JAVA.csv")))
{
  BatchPutAttributesRequest batchPutRequest = new BatchPutAttributesRequest();
  batchPutRequest.DomainName = "StockPrices";

  /* read column names */
  stockStreamReader.ReadLine();

  while (!stockStreamReader.EndOfStream)
  {
    ReplaceableItem item = new ReplaceableItem();

    string line = stockStreamReader.ReadLine();
    string[] tokens = line.Split(',');

    item.ItemName = string.Format("{0}_{1}", tokens[0], ticker);

    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Ticker", Value = ticker });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Date", Value = tokens[0] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Open", Value = tokens[1] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "High", Value = tokens[2] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Low", Value = tokens[3] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Close", Value = tokens[4] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Volume", Value = tokens[5] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "AdjustedClose", Value = tokens[6] });

    batchPutRequest.Item.Add(item);

    /* Amazon limits batches to 25 items */
    if (batchPutRequest.Item.Count == 25)
    {
      service.BatchPutAttributes(batchPutRequest);
      batchPutRequest = new BatchPutAttributesRequest();
      batchPutRequest.DomainName = "StockPrices";
    }
  }

  /* send any that remain */
  if (batchPutRequest.Item.Count > 0)
    service.BatchPutAttributes(batchPutRequest);
}

resulting in:

          Local   EC2
Trial 1   7       6
Trial 2   6       6
Trial 3   6       6
Avg       6.3     6

This was a marked improvement, and one that effectively nullified the latency advantage the single writes got on EC2.

Conclusion

The overall comparison of the two approaches is as follows:

          Local   EC2
Single    53.3    32.6
Batched   6.3     6

Batched PUT operations offer a clear performance benefit regardless of where they're executed from. You just have to keep in mind you're limited to batch sizes of 25 items and 1 MB per request.
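The 25-item limit means any batching code needs the kind of flush-and-reset logic shown above. As a minimal sketch of that pattern in isolation, here is a generic chunking helper (the `BatchHelper` class and `Chunk` method names are my own, not part of the AWS SDK) that splits a list into batches of at most a given size; with SimpleDB you'd send one BatchPutAttributes request per yielded batch:

```csharp
using System;
using System.Collections.Generic;

static class BatchHelper
{
    // Yields successive batches of at most batchSize items.
    // For SimpleDB's BatchPutAttributes, batchSize would be 25.
    public static IEnumerable<List<T>> Chunk<T>(IList<T> items, int batchSize)
    {
        for (int i = 0; i < items.Count; i += batchSize)
        {
            int count = Math.Min(batchSize, items.Count - i);
            List<T> batch = new List<T>(count);
            for (int j = i; j < i + count; j++)
                batch.Add(items[j]);
            yield return batch;
        }
    }

    static void Main()
    {
        // 1137 items (the row count from the test data) in batches of 25
        List<int> items = new List<int>();
        for (int i = 0; i < 1137; i++) items.Add(i);

        int batches = 0, lastSize = 0;
        foreach (List<int> batch in Chunk(items, 25))
        {
            batches++;
            lastSize = batch.Count;
        }

        // 1137 = 45 full batches of 25 plus a final batch of 12
        Console.WriteLine("{0} batches, last has {1} items", batches, lastSize);
        // prints "46 batches, last has 12 items"
    }
}
```

Note this only addresses the 25-item cap; the 1 MB per-request limit would still need to be checked separately if your items carry large attribute values.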

Fri Jul 10 2009 12:29:15 GMT+0000 (UTC)
