Chris Umbel

Accessing SimpleDB from SSRS

Largely for my own amusement, I recently became interested in the idea of accessing Amazon's SimpleDB from SQL Server Reporting Services. After searching around the web for a while I realized the only solution I'd be happy with was to write my own data processing extension for SSRS that can access SimpleDB.

For those of you not familiar with SSRS data processing extensions, they're simply libraries that teach Reporting Services how to talk to data sources. They merely require you to implement a few interfaces, such as IDbCommand, IDbConnection, and IDataReader from the Microsoft.ReportingServices.DataProcessing namespace. A great introduction to developing these extensions can be found here.

After a few days of work I got a decent start on a C# library named SimpleDBExtension that I've published here on CodePlex. At the time of writing it's still in early development, so please be forgiving (Update 2009/12/07: it's now in its first beta). There's certainly a major refactoring on the way and several features I'd like to add.

To develop reports in Visual Studio against SimpleDB with this library there are just a few simple steps, which are outlined below:

Step 1: Build and install SimpleDBExtension library

After building the SimpleDBExtension project, copy the resulting SimpleDBExtension.dll into the directory Visual Studio uses for the Report Designer. On my system that is: C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies

Step 2: Modify RSReportDesigner.config

Add the following node to the data extensions section of RSReportDesigner.config, which is located in the aforementioned directory.

<Extension Name="SIMPLEDB" Type="SimpleDBExtension.SimpleDBConnection,SimpleDBExtension"/>

Step 3: Modify RSPreviewPolicy.config

Add the following node to the code groups section of RSPreviewPolicy.config, which is also located in the aforementioned directory. You'll have to modify the Url attribute to reflect the directory you're working in.

<CodeGroup class="UnionCodeGroup"
   version="1"
   PermissionSetName="FullTrust"
   Name="SimpleDBExtension"
   Description="Code group for SimpleDb data processing extension">
	  <IMembershipCondition class="UrlMembershipCondition"
		 version="1"
		 Url="C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\SimpleDBExtension.dll"
	   />
</CodeGroup>

Step 4: Develop

Since Visual Studio's Report Designer is prepared we can get down to business. Get started in the typical fashion: create a report project and a report.

Here's where it gets different: an additional data source type named "SIMPLEDB" will now be available, which employs the extension you just installed. Choose that and enter your access key and secret key, separated by a semicolon, as the connection string.

Now you can create a DataSet using the newly created Data Source. Supply a SimpleDB select query and you're all set. Your SimpleDB data is now accessible to SSRS.
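For example, assuming you had previously loaded stock data into a hypothetical domain named "StockPrices", the DataSet query could be as simple as the following (SimpleDB's select syntax is deliberately SQL-like):

```
select * from StockPrices where Ticker = 'JAVA'
```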

Next Steps

Thus far we've just installed the extension into Visual Studio, but a similar process can be used to deploy the extension to a Report Server.
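The broad strokes of that deployment, assuming a default installation (paths and instance names will vary): copy SimpleDBExtension.dll into the Report Server's bin directory, register it by adding the same Extension node from Step 2 to the Data section of rsreportserver.config, and grant it FullTrust with a code group in rssrvpolicy.config analogous to Step 3.

```
<Extension Name="SIMPLEDB" Type="SimpleDBExtension.SimpleDBConnection,SimpleDBExtension"/>
```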

Again, at the time of writing this project is just a few days old. Any help, guidance, or comments are appreciated.

Wed Jul 22 2009 19:07:46 GMT+0000 (UTC)

Comments

Easy GNOME Development with the Vala Programming Language

I've generally not had a reason to work with the GObject type system despite appreciating its fruits through GNOME for years. Then the other day I ran across a language called Vala which intrigued me enough to start hacking away.

Vala's claim to fame is that it simplifies GObject development by exposing it through a C#/Java-like language. Unlike C# and Java, Vala is translated to C and then compiled to a native binary. Presumably this leads to faster execution and a tighter memory footprint compared to CLI and Java bytecode.

The GObject type system and Vala are new to me so I'm in no position to kick knowledge, but I'll share some of what I've written early in my learning process.

Example 1 Hello World:

using GLib;

public class HelloWorld : Object {
       public void run() {
              stdout.printf("Hello World\n");
       }

       public static int main(string[] args) {
              HelloWorld hellower = new HelloWorld();
              hellower.run();
              return 0;
       }
}

producing the output:

Hello World

Example 2 Getting Twitter Status XML:

using GLib;

public class Twitter {
  static int main (string[] args) {
    /* get the username from the command line */
    string username = args[1];

    /* format the URL to use the username as the filename */
    string url = "http://twitter.com/users/%s.xml".printf(username);

    stdout.printf("Getting status for %s\n".printf(username));

    /* create an HTTP session to twitter */
    Soup.SessionAsync session = new Soup.SessionAsync();
    Soup.Message message = new Soup.Message ("GET", url);

    /* send the HTTP request */
    session.send_message(message);

    /* output the XML to stdout (as an argument, not a format string) */
    stdout.printf("%s", message.response_body.data);

    return 0;
  }
}
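To compile example 2, valac needs to be told about the libsoup bindings via its --pkg flag (the package name is assumed to be libsoup-2.4, the current version at the time of writing; it may differ on your distribution):

```shell
# translate to C and compile to a native binary in one step
valac --pkg libsoup-2.4 twitter.vala
```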

This example retrieves a Twitter user's status via the REST API and outputs the XML response to the console. The Soup library is employed for the HTTP communication.

Example 3 XML parsing and XPath queries:

/* parse the xml into a document object */
Xml.Doc* status_doc = Parser.parse_memory(
  message.response_body.data,
  (int)message.response_body.length);

/* create the basic plumbing for XPath */
XPathContext* xpath = new XPathContext(status_doc);

/* execute an xpath query */
XPathObject* result = xpath->eval_expression("/user/status/text");

/* slap the result in a string */
string status = result->nodesetval->item(0)->get_content();

stdout.printf("%s\n", status);

The above code could be grafted into example 2 (requires slapping in a "using Xml;" directive) and would actually pull the status out of the XML response.

Conclusion

Vala is a young language, but an interesting one. It certainly seems like it could make native GNOME development a bit more accessible to C#/Java developers.

A few things I haven't covered here that justify its use over C++ are its support for "modern" features such as assisted memory management, the foreach construct, and exception handling.

Thu Jul 16 2009 17:36:47 GMT+0000 (UTC)


HTML Parsing with Ruby and Nokogiri

Parsing HTML is a somewhat annoying task programmers are commissioned with occasionally. Activities such as screen-scraping have become rare since the advent of RSS, but still... there's always content out there that you have to get at that leaves you no choice but to parse it out yourself.

One of the more elegant bits that I've seen for this purpose is Nokogiri, a Ruby library that supports querying HTML content with both XPath and CSS selector syntax.

XPath

First I'll demonstrate how to parse some content out of a page via the XPath syntax. This code uses the Ruby documentation for the Bignum class as a parsing medium and essentially extracts the method names.

require 'nokogiri' 
require 'open-uri'

doc = Nokogiri::HTML(open('http://www.ruby-doc.org/core/classes/Bignum.html'))

doc.xpath('//span[@class="method-name"]').each do | method_span |
	puts method_span.content
	puts method_span.path
	puts
end

The above code simply iterates through a set of Node objects representing every span tag with the CSS class "method-name" applied. It prints out the inner text and absolute XPath via the "content" and "path" properties, respectively. Below is a sample of the output:

power!
/html/body/div[3]/div/div[24]/div[1]/span[1]

big.quo(numeric) => float
/html/body/div[3]/div/div[25]/div[1]/a/span

quo
/html/body/div[3]/div/div[26]/div[1]/a/span[1]

rdiv
/html/body/div[3]/div/div[27]/div[1]/span[1]

big.remainder(numeric)    => number
/html/body/div[3]/div/div[28]/div[1]/a/span

rpower
/html/body/div[3]/div/div[29]/div[1]/a/span[1]

CSS

Nokogiri also supports querying by way of CSS selector syntax. The following example iterates over every link in the Bignum document used above that displays a JavaScript popup, and outputs its absolute CSS selector path and the text of its "onclick" attribute.

doc.css('a[onclick]').each do | popup_link |
  puts popup_link.css_path
  puts popup_link.attributes['onclick']
end

Practical

A real-life use of this library, and HTML parsing in general, is Anemone, a web-spidering framework for Ruby. Like most things in Ruby it's programmer-friendly and delivers quite a bit of power without much work.

The following Anemone example uses Nokogiri under the covers to crawl all links on this site and print out the URLs of articles.

require 'anemone'
require 'open-uri'

# crawl this page
Anemone.crawl("http://www.chrisumbel.com") do | anemone |
  # only process pages in the article directory
  anemone.on_pages_like(/article\/[^?]*$/) do | page |
    puts "#{page.url} indexed."
  end
end

Also, the WebRat DSL (which powers the Cucumber web acceptance testing framework) employs Nokogiri.

Conclusion

While the need for screen-scraping and HTML parsing has diminished over time, it still exists. It's nice to know that when we do have to do it, the process is made simple by libraries like Nokogiri.

Sun Jul 12 2009 11:07:11 GMT+0000 (UTC)


Amazon SimpleDB Batched PUTs Usage and Performance

One of the most important factors in getting optimal performance out of Amazon's SimpleDB is keeping the total number of requests to a minimum and making the most out of the ones you make. At one time this was tricky from a write perspective because only a single item could be updated in a PUT operation.

Last spring Amazon eased the pain a little by allowing us to batch PUT operations into a single command.

In order to demonstrate the use of this feature and analyze its performance I'll use C# and Amazon's .Net SimpleDB library.

Single PUTs

To establish a baseline I'm going to run a test with some sample data (Sun Microsystems' stock data, ticker JAVA, from 1/3/2005 to 7/9/2009, obtained from Yahoo! Finance) and write it into a "StockPrices" domain one item at a time. This data is in CSV form and contains 1137 rows.

I conducted three trials locally and three trials on a small EC2 instance.

AmazonSimpleDB service = new AmazonSimpleDBClient("ENTER YOUR KEY HERE", 
    "ENTER YOUR SECRET KEY HERE");

/* the ticker symbol for the data being loaded (the file is JAVA.csv) */
string ticker = "JAVA";

using (StreamReader stockStreamReader = new StreamReader(File.OpenRead(@"JAVA.csv")))
{
    PutAttributesRequest putRequest;
  
    /* read column names */
    stockStreamReader.ReadLine();
      
    while (!stockStreamReader.EndOfStream)
    {
        putRequest = new PutAttributesRequest();
        putRequest.DomainName = "StockPrices";

        string line = stockStreamReader.ReadLine();
        string[] tokens = line.Split(',');

        putRequest.ItemName = string.Format("{0}_{1}", tokens[0], ticker);

        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Ticker", Value = ticker });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Date", Value = tokens[0] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Open", Value = tokens[1] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "High", Value = tokens[2] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Low", Value = tokens[3] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Close", Value = tokens[4] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "Volume", Value = tokens[5] });
        putRequest.Attribute.Add(new ReplaceableAttribute() 
            { Name = "AdjustedClose", Value = tokens[6] });

        service.PutAttributes(putRequest);
    }  
}

Resulting in the following times (in seconds):

          Local    EC2
Trial 1      54     34
Trial 2      49     32
Trial 3      57     32
Avg        53.3   32.6

As you can see, there was a significant improvement simply by executing the code on Amazon's equipment (minimizing connection latency), but it's hard to argue that the performance was qualitatively bad even without something to compare it to.

Batched PUTs

Then I conducted a similar test using batched PUT operations as such:

AmazonSimpleDB service = new AmazonSimpleDBClient("ENTER YOUR KEY HERE", 
    "ENTER YOUR SECRET KEY HERE");

/* the ticker symbol for the data being loaded (the file is JAVA.csv) */
string ticker = "JAVA";

using (StreamReader stockStreamReader = new StreamReader(File.OpenRead(@"JAVA.csv")))
{
  BatchPutAttributesRequest batchPutRequest = new BatchPutAttributesRequest();
  batchPutRequest.DomainName = "StockPrices";

  while (!stockStreamReader.EndOfStream)
  {
    ReplaceableItem item = new ReplaceableItem();

    string line = stockStreamReader.ReadLine();
    string[] tokens = line.Split(',');

    item.ItemName = string.Format("{0}_{1}", tokens[0], ticker);

    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Ticker", Value = ticker });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Date", Value = tokens[0] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Open", Value = tokens[1] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "High", Value = tokens[2] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Low", Value = tokens[3] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Close", Value = tokens[4] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "Volume", Value = tokens[5] });
    item.Attribute.Add(new ReplaceableAttribute() 
      { Name = "AdjustedClose", Value = tokens[6] });

    batchPutRequest.Item.Add(item);

    /* Amazon limits batches to 25 items */
    if (batchPutRequest.Item.Count == 25)
    {
      service.BatchPutAttributes(batchPutRequest);
      batchPutRequest = new BatchPutAttributesRequest();
      batchPutRequest.DomainName = "StockPrices";
    }
  }

  /* send any that remain */
  if (batchPutRequest.Item.Count > 0)
    service.BatchPutAttributes(batchPutRequest);
}

resulting in:

          Local    EC2
Trial 1       7      6
Trial 2       6      6
Trial 3       6      6
Avg         6.3      6

This was a marked improvement that effectively nullified the advantage the single writes gained from running on EC2.

Conclusion

The overall comparison of the two approaches is as follows:

Type       Local    EC2
Single      53.3   32.6
Batched      6.3      6

Batched PUT operations offer a clear performance benefit regardless of where they're executed from. You just have to keep in mind that you're limited to batch sizes of 25 items and 1 MB per request.
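The batching logic itself is language-agnostic: accumulate rows and flush every 25. Here's a quick sketch of the arithmetic in Ruby, using the 1137-row data set from above, where each slice would correspond to one BatchPutAttributes request:

```ruby
# split 1137 rows into batches of at most 25 items each,
# mirroring the flush-at-25 logic in the C# example above
rows = (1..1137).to_a
batches = rows.each_slice(25).to_a

puts batches.length      # 46 requests instead of 1137
puts batches.last.length # the final, partial batch holds 12 items
```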

Fri Jul 10 2009 12:29:15 GMT+0000 (UTC)
