One of the most important factors in getting optimal performance out of Amazon's SimpleDB is keeping the total
number of requests to a minimum and making the most out of the ones you make.
At one time this was tricky from a write perspective because only a single item could be updated in a PUT operation.
Last spring Amazon eased the pain a little by allowing us to batch PUT operations into a single command.
To demonstrate this feature and analyze its performance, I'll use C# and Amazon's .NET SimpleDB library.
Single PUTs
To establish a baseline, I'm going to run a test with some sample data (Sun Microsystems' stock data from 1/3/2005 to 7/9/2009, obtained from Yahoo! Finance) and write it into a "StockPrices" domain one item at a time. The data is in CSV form and contains 1137 rows.
I conducted three trials locally and three trials on a small EC2 instance.
    /* Needs System.IO plus the SimpleDB library namespaces
       (assumed here to be Amazon.SimpleDB and Amazon.SimpleDB.Model). */
    /* Assumption: the ticker was not defined in the original snippet;
       "JAVA" matches the CSV file name. */
    string ticker = "JAVA";
    AmazonSimpleDB service = new AmazonSimpleDBClient("ENTER YOUR KEY HERE",
        "ENTER YOUR SECRET KEY HERE");
    using (StreamReader stockStreamReader = new StreamReader(File.OpenRead(@"JAVA.csv")))
    {
        /* skip the header row of column names */
        stockStreamReader.ReadLine();
        while (!stockStreamReader.EndOfStream)
        {
            string line = stockStreamReader.ReadLine();
            string[] tokens = line.Split(',');
            /* one request, and therefore one round trip, per CSV row */
            PutAttributesRequest putRequest = new PutAttributesRequest();
            putRequest.DomainName = "StockPrices";
            /* item name is the date plus the ticker, e.g. "<date>_JAVA" */
            putRequest.ItemName = string.Format("{0}_{1}", tokens[0], ticker);
            putRequest.Attribute.Add(new ReplaceableAttribute()
                { Name = "Ticker", Value = ticker });
            putRequest.Attribute.Add(new ReplaceableAttribute()
                { Name = "Date", Value = tokens[0] });
            putRequest.Attribute.Add(new ReplaceableAttribute()
                { Name = "Open", Value = tokens[1] });
            putRequest.Attribute.Add(new ReplaceableAttribute()
                { Name = "High", Value = tokens[2] });
            putRequest.Attribute.Add(new ReplaceableAttribute()
                { Name = "Low", Value = tokens[3] });
            putRequest.Attribute.Add(new ReplaceableAttribute()
                { Name = "Close", Value = tokens[4] });
            putRequest.Attribute.Add(new ReplaceableAttribute()
                { Name = "Volume", Value = tokens[5] });
            putRequest.Attribute.Add(new ReplaceableAttribute()
                { Name = "AdjustedClose", Value = tokens[6] });
            service.PutAttributes(putRequest);
        }
    }
This resulted in the following times (in seconds):
| | Local | EC2 |
| Trial 1 | 54 | 34 |
| Trial 2 | 49 | 32 |
| Trial 3 | 57 | 32 |
| Avg | 53.3 | 32.6 |
As you can see, simply running the code on Amazon's own equipment (which minimizes connection latency) cut the average time by more than a third. Without something to compare against, though, it's hard to say whether either number is actually good or bad.
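The timing code itself isn't shown in this post. As a minimal sketch of how trials like these could be timed, assuming a hypothetical `TrialTimer` helper (not part of Amazon's library) built on .NET's `Stopwatch`:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper for illustration; not part of the SimpleDB library.
public static class TrialTimer
{
    // Runs `work` `trials` times and returns the wall-clock seconds of each run.
    public static double[] TimeTrials(Action work, int trials)
    {
        double[] seconds = new double[trials];
        for (int i = 0; i < trials; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            work();
            watch.Stop();
            seconds[i] = watch.Elapsed.TotalSeconds;
        }
        return seconds;
    }

    // Arithmetic mean of the trial times, as in the "Avg" row of the tables.
    public static double Average(double[] seconds)
    {
        return seconds.Average();
    }
}
```

Wrapping the whole upload loop in a delegate and calling `TrialTimer.TimeTrials(upload, 3)` would give three timings of the kind shown in the table. Note that a measurement like this covers the file reading as well as the SimpleDB calls.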
Batched PUTs
Then I conducted a similar test using batched PUT operations, as follows:
    /* Assumption: the ticker was not defined in the original snippet;
       "JAVA" matches the CSV file name. */
    string ticker = "JAVA";
    AmazonSimpleDB service = new AmazonSimpleDBClient("ENTER YOUR KEY HERE",
        "ENTER YOUR SECRET KEY HERE");
    using (StreamReader stockStreamReader = new StreamReader(File.OpenRead(@"JAVA.csv")))
    {
        BatchPutAttributesRequest batchPutRequest = new BatchPutAttributesRequest();
        batchPutRequest.DomainName = "StockPrices";
        /* skip the header row of column names */
        stockStreamReader.ReadLine();
        while (!stockStreamReader.EndOfStream)
        {
            ReplaceableItem item = new ReplaceableItem();
            string line = stockStreamReader.ReadLine();
            string[] tokens = line.Split(',');
            item.ItemName = string.Format("{0}_{1}", tokens[0], ticker);
            item.Attribute.Add(new ReplaceableAttribute()
                { Name = "Ticker", Value = ticker });
            item.Attribute.Add(new ReplaceableAttribute()
                { Name = "Date", Value = tokens[0] });
            item.Attribute.Add(new ReplaceableAttribute()
                { Name = "Open", Value = tokens[1] });
            item.Attribute.Add(new ReplaceableAttribute()
                { Name = "High", Value = tokens[2] });
            item.Attribute.Add(new ReplaceableAttribute()
                { Name = "Low", Value = tokens[3] });
            item.Attribute.Add(new ReplaceableAttribute()
                { Name = "Close", Value = tokens[4] });
            item.Attribute.Add(new ReplaceableAttribute()
                { Name = "Volume", Value = tokens[5] });
            item.Attribute.Add(new ReplaceableAttribute()
                { Name = "AdjustedClose", Value = tokens[6] });
            batchPutRequest.Item.Add(item);
            /* Amazon limits batches to 25 items */
            if (batchPutRequest.Item.Count == 25)
            {
                service.BatchPutAttributes(batchPutRequest);
                /* start a fresh request for the next batch */
                batchPutRequest = new BatchPutAttributesRequest();
                batchPutRequest.DomainName = "StockPrices";
            }
        }
        /* send any that remain */
        if (batchPutRequest.Item.Count > 0)
            service.BatchPutAttributes(batchPutRequest);
    }
This resulted in the following times (in seconds):
| | Local | EC2 |
| Trial 1 | 7 | 6 |
| Trial 2 | 6 | 6 |
| Trial 3 | 6 | 6 |
| Avg | 6.3 | 6 |
This was a marked improvement: roughly 8 times faster locally and more than 5 times faster on EC2. It also effectively nullified the advantage the single writes had when run from EC2.
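Most of the gain comes from the drop in request count. A rough sketch of the arithmetic, using a hypothetical `RequestMath` helper and the averages measured above:

```csharp
using System;

// Hypothetical helper for illustration; the figures come from the tables in this post.
public static class RequestMath
{
    // SimpleDB's limit on items per BatchPutAttributes call.
    public const int MaxItemsPerBatch = 25;

    // Number of requests needed to send itemCount items in batches of batchSize
    // (integer ceiling division).
    public static int BatchCount(int itemCount, int batchSize)
    {
        return (itemCount + batchSize - 1) / batchSize;
    }

    // Average wall-clock milliseconds spent per request.
    public static double MillisecondsPerRequest(double totalSeconds, int requests)
    {
        return totalSeconds * 1000.0 / requests;
    }
}
```

With 1137 rows, `RequestMath.BatchCount(1137, RequestMath.MaxItemsPerBatch)` gives 46 requests instead of 1137. Dividing the measured averages by those counts gives about 47 ms (local) and 29 ms (EC2) per single PUT, against about 137 ms (local) and 130 ms (EC2) per batch of up to 25 items. These per-request figures are only approximate, since the measured times presumably include reading the file as well.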
Conclusion
The overall comparison of the two approaches is as follows:
| Type | Local | EC2 |
| Single | 53.3 | 32.6 |
| Batched | 6.3 | 6 |
Batched PUT operations offer a clear performance benefit regardless of where they're executed from. You just have to keep in mind you're limited to batch sizes of 25 items and 1 MB per request.
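If you batch writes in more than one place, the 25-item bookkeeping can be pulled out of the upload loop. Here is a minimal sketch, assuming a hypothetical generic `Batching.Split` helper that is not part of Amazon's library:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the SimpleDB library.
public static class Batching
{
    // Splits source into consecutive lists of at most batchSize items,
    // preserving order. The last list may be shorter than batchSize.
    public static List<List<T>> Split<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        List<List<T>> batches = new List<List<T>>();
        List<T> current = new List<T>(batchSize);
        foreach (T item in source)
        {
            current.Add(item);
            if (current.Count == batchSize)
            {
                batches.Add(current);
                current = new List<T>(batchSize);
            }
        }
        // Keep any partial batch left over at the end.
        if (current.Count > 0)
            batches.Add(current);
        return batches;
    }
}
```

Each list returned by `Batching.Split(items, 25)` would then go into its own `BatchPutAttributesRequest`. This sketch only enforces the item count; it does not check the 1 MB request size limit.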
