Chris Umbel

Natural Language Processing in node.js with "natural"

Over the last few years I've developed a bit of an interest in natural-language processing. It's never been the focus of my work, but when you're exposed to as many enterprise-class data storage/search systems as I have been, you can't help but absorb some details. Several hobby projects, sometimes involving home-brewed full-text searching, have also popped up requiring at least a cursory understanding of stemming and phonetic algorithms. Another recurring theme in my hobby projects has been classification, for custom spam filtering and analyzing Twitter sentiment.

In general, accomplishing these goals simply required the use of someone else's hard work, whether it was having Solr/Lucene stem my corpora at the office, using the Ruby classifier gem to analyze tweets about stocks or using the Python Natural Language Toolkit for... well, pretty much anything.

Recent months have brought a new platform into my hobby work: node.js, which, while stable, still has some maturing to do. As with so many things I work with these days, the need for natural-language facilities arose and I found the pickings pretty slim. I have to be honest. That's *exactly* what I was hoping for; an opportunity to sink my teeth into the algorithms themselves.

Thus I began work on "natural", a module of natural language algorithms for node.js. The idea is loosely based on the Python NLTK in that all the algorithms live in the same package; however, it will likely never be anywhere near as complete. I'd be lucky for "natural" to ever do half of what NLTK does without plenty of help. As of version 0.0.17 it has two stemmers (Porter and Lancaster), one classifier (Naive Bayes), two phonetic algorithms (Metaphone and SoundEx) and an inflector.

The strategy was to cast a wide enough net to see how the algorithms might fit together in terms of interface and dependencies first. Making them performant and perfectly accurate is step two, which admittedly will still require some work. At the time of writing "natural" is at version 0.0.17 and everything seems to work, but it's not in an official beta of any kind; until the version ticks over to 0.1.0 it's subject to significant internal change. Hopefully the interfaces will stay the same.

With the exception of the Naive Bayes classifier (to which you can supply tokens you've stemmed yourself), none of these algorithms has any real applicability outside of English. This is a problem I'd like to rectify after solidifying a 0.1.0 release, and I'd love to get some more people involved to accomplish it.

Installing

In order to use "natural" you have to install it... naturally. Like most node modules "natural" is packaged up in an NPM and can be installed from the command line as such:

npm install natural

If you want to install from source (which can be found here on github), pull it and install the npm from the source directory.

git clone git://github.com/NaturalNode/natural.git
cd natural
npm install .

Stemming

The first class of algorithms I'd like to outline is stemming. As stated above the Lancaster and Porter algorithms are supported as of 0.0.17. Here's a basic example of stemming a word with a Porter Stemmer.

var natural = require('natural'),
    stemmer = natural.PorterStemmer;

var stem = stemmer.stem('stems');
console.log(stem);
stem = stemmer.stem('stemming');
console.log(stem);
stem = stemmer.stem('stemmed');
console.log(stem);
stem = stemmer.stem('stem');
console.log(stem);

Above I simply required-up the main "natural" module and grabbed the PorterStemmer sub-module from within. The "stem" function takes an arbitrary string and returns its stem. The above code returns the following output:

stem
stem
stem
stem

For convenience, stemmers can patch String with methods that simplify the process; just call the stemmer's attach method and String objects will then have a stem method.

stemmer.attach();
stem = 'stemming'.stem();
console.log(stem);

Generally you'd be interested in stemming an entire corpus. For that, the attach method also patches in a tokenizeAndStem method. It breaks the owning string up into an array of strings, one for each word, and stems them all. For example:

var stems = 'stems returned'.tokenizeAndStem();
console.log(stems);

produces the output:

[ 'stem', 'return' ]

Note that by default the tokenizeAndStem method omits certain words considered irrelevant (stop words) from the returned array. To instruct the stemmer not to omit stop words, pass true in to tokenizeAndStem for the keepStops parameter. Consider:

console.log('i stemmed words.'.tokenizeAndStem());
console.log('i stemmed words.'.tokenizeAndStem(true));

outputting:

[ 'stem', 'word' ]
[ 'i', 'stem', 'word' ]
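For the curious, the stop-word handling is conceptually just a tokenize-then-filter step. Here's a sketch in plain JavaScript of that part of the pipeline on its own, without the stemming (the stop list here is a tiny made-up sample; natural's real list is far longer):

```javascript
// A sketch of the tokenize-and-filter step, minus the stemming.
// This stop list is a tiny made-up sample for illustration only.
var stopWords = ['i', 'a', 'the', 'and', 'of'];

function tokenize(text, keepStops) {
  // split on runs of non-word characters and drop empty strings
  var tokens = text.toLowerCase().split(/\W+/).filter(Boolean);

  if (keepStops)
    return tokens;

  return tokens.filter(function(token) {
    return stopWords.indexOf(token) < 0;
  });
}

console.log(tokenize('i stemmed words.'));       // [ 'stemmed', 'words' ]
console.log(tokenize('i stemmed words.', true)); // [ 'i', 'stemmed', 'words' ]
```

Run each token through a stemmer afterwards and you have the essence of tokenizeAndStem.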

All of the code above would also work with a Lancaster stemmer by requiring the LancasterStemmer module instead, like:

var natural = require('natural'),
    stemmer = natural.LancasterStemmer;

Of course the actual stems produced could be different depending on the algorithm chosen.
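If you're curious what a rule-based stemmer is doing under the hood, here's a drastically simplified sketch in plain JavaScript. These are not the actual Porter or Lancaster rules (both are considerably more involved); it just shows the general suffix-stripping approach:

```javascript
// A naive suffix-stripper -- NOT the real Porter or Lancaster rules,
// just the flavor of rule-based stemming in miniature.
function naiveStem(word) {
  var stem = word
    .replace(/ies$/, 'y')  // ponies -> pony
    .replace(/ing$/, '')   // stemming -> stemm
    .replace(/ed$/, '')    // stemmed -> stemm
    .replace(/s$/, '');    // stems -> stem

  // collapse a doubled consonant left dangling at the end (stemm -> stem)
  return stem.replace(/([bdfglmnprt])\1$/, '$1');
}

console.log(naiveStem('stemming')); // stem
console.log(naiveStem('stemmed'));  // stem
console.log(naiveStem('stems'));    // stem
```

The real algorithms differ mainly in how many rules they carry and the conditions attached to each one; Lancaster's heavier rule set is why its stems tend to be shorter.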

Phonetics

Phonetic algorithms are also provided to determine what words sound like and compare them accordingly. The old (and I mean old... like 1918 old) SoundEx and the more modern Metaphone algorithm are supported as of 0.0.17.

The following example compares the string "phonetics" and the intentional misspelling "fonetix" and determines they sound alike according to the Metaphone algorithm.

var natural = require('natural'),
    phonetic = natural.Metaphone;

var wordA = 'phonetics';
var wordB = 'fonetix';

if(phonetic.compare(wordA, wordB))
    console.log('they sound alike!');

The raw code a phonetic algorithm produces can be retrieved with the process method:

var phoneticCode = phonetic.process('phonetics');
console.log(phoneticCode);

resulting in:

FNTKS

Like the stemming implementations the phonetic modules have an attach method that patches String with shortcut methods, most notably soundsLike for comparison:

phonetic.attach();

if(wordA.soundsLike(wordB))
    console.log('they sound alike!');

attach also patches in phonetics and tokenizeAndPhoneticize methods to retrieve the phonetic code for a single word and an entire corpus respectively.

console.log('phonetics'.phonetics());
console.log('phonetics rock'.tokenizeAndPhoneticize());

which outputs:

FNTKS
[ 'FNTKS', 'RK' ]

The above code could also use SoundEx by substituting the following in for the require.

var natural = require('natural'),
    phonetic = natural.SoundEx;
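SoundEx itself is simple enough to sketch in a few lines. The following is a plain-JavaScript rendition of the classic American Soundex rules (my own sketch, not natural's implementation): keep the first letter, encode the remaining consonants as digits, collapse adjacent duplicates and pad or truncate to four characters.

```javascript
// A compact American Soundex sketch -- not natural's actual code.
function soundex(word) {
  var codes = {
    b: '1', f: '1', p: '1', v: '1',
    c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
    d: '3', t: '3',
    l: '4',
    m: '5', n: '5',
    r: '6'
  };

  var letters = word.toLowerCase().split('');
  var result = letters[0].toUpperCase();
  var previous = codes[letters[0]] || '';

  for (var i = 1; i < letters.length; i++) {
    var code = codes[letters[i]] || '';

    // adjacent letters with the same code collapse to one digit;
    // 'h' and 'w' are transparent, while vowels reset the previous code
    if (code && code !== previous) result += code;
    if (letters[i] !== 'h' && letters[i] !== 'w') previous = code;
  }

  // pad with zeros and truncate to the canonical four characters
  return (result + '000').substring(0, 4);
}

console.log(soundex('Robert'));  // R163
console.log(soundex('Tymczak')); // T522
```

Comparing two words is then just comparing their codes, which is essentially what a soundsLike-style method does.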

Inflector

Basic inflectors are in place to convert nouns between plural and singular forms and to turn integers into string counters (e.g. '1st', '2nd', '3rd', '4th', etc.).

The following example converts the word "radius" into its plural form "radii".

var natural = require('natural'),
    nounInflector = new natural.NounInflector();

var plural = nounInflector.pluralize('radius');
console.log(plural);

Singularization follows the same pattern, as is illustrated in the following example which converts the word "beers" to its singular form, "beer".

var singular = nounInflector.singularize('beers');
console.log(singular);

Just like the stemming and phonetic modules an attach method is provided to patch String with shortcut methods.

nounInflector.attach();
console.log('radius'.pluralizeNoun());
console.log('beers'.singularizeNoun()); 

A NounInflector instance can do custom conversion if you provide expressions via the addPlural and addSingular methods. Because these conversions aren't always symmetric (sometimes more patterns may be required to singularize forms than to pluralize them) there needn't be a one-to-one relationship between addPlural and addSingular calls.

nounInflector.addPlural(/(code|ware)/i, '$1z');
nounInflector.addSingular(/(code|ware)z/i, '$1');

console.log('code'.pluralizeNoun());
console.log('ware'.pluralizeNoun());

console.log('codez'.singularizeNoun());
console.log('warez'.singularizeNoun());

which would result in:

codez
warez
code
ware
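Under the hood an inflector like this is conceptually just an ordered list of regex rules where the first match wins, which is why specific rules must take precedence over the generic add-an-s fallback. A hypothetical mini-inflector illustrates the idea (this is my own sketch, not natural's source):

```javascript
// A hypothetical mini-inflector: an ordered rule list, first match wins.
// Not natural's source -- just the underlying idea.
var pluralRules = [];

function addPluralRule(pattern, replacement) {
  // newest rules go to the front so they win over earlier, generic ones
  pluralRules.unshift({ pattern: pattern, replacement: replacement });
}

function pluralize(noun) {
  for (var i = 0; i < pluralRules.length; i++) {
    if (pluralRules[i].pattern.test(noun))
      return noun.replace(pluralRules[i].pattern, pluralRules[i].replacement);
  }

  return noun;
}

addPluralRule(/$/, 's');               // generic fallback: beer -> beers
addPluralRule(/us$/, 'i');             // radius -> radii
addPluralRule(/(code|ware)$/, '$1z');  // code -> codez

console.log(pluralize('radius')); // radii
console.log(pluralize('beer'));   // beers
console.log(pluralize('code'));   // codez
```

Singularization would be a second rule list run in the other direction, which is why the two sets of rules needn't pair up one-to-one.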

Here's an example of using the CountInflector module to produce string counters for integers.

var natural = require('natural'),
    countInflector = natural.CountInflector;

console.log(countInflector.nth(1));
console.log(countInflector.nth(2));
console.log(countInflector.nth(3));
console.log(countInflector.nth(4));
console.log(countInflector.nth(10));
console.log(countInflector.nth(11));
console.log(countInflector.nth(12));
console.log(countInflector.nth(13));
console.log(countInflector.nth(100));
console.log(countInflector.nth(101));
console.log(countInflector.nth(102));
console.log(countInflector.nth(103));
console.log(countInflector.nth(110));
console.log(countInflector.nth(111));
console.log(countInflector.nth(112));
console.log(countInflector.nth(113));

producing:

1st
2nd
3rd
4th
10th
11th
12th
13th
100th
101st
102nd
103rd
110th
111th
112th
113th
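The logic behind those suffixes is compact enough to sketch. Here's my guess at the general approach (not CountInflector's actual source): treat anything ending in 11 through 13 as a special case, then let the last digit decide.

```javascript
// Ordinal-suffix logic sketch -- an assumed approach, not CountInflector's
// source. 11-13 always take 'th'; otherwise the last digit picks the suffix.
function nth(n) {
  var lastTwo = n % 100;
  var lastDigit = n % 10;

  if (lastTwo >= 11 && lastTwo <= 13) return n + 'th';
  if (lastDigit === 1) return n + 'st';
  if (lastDigit === 2) return n + 'nd';
  if (lastDigit === 3) return n + 'rd';
  return n + 'th';
}

console.log(nth(1));   // 1st
console.log(nth(12));  // 12th
console.log(nth(101)); // 101st
console.log(nth(113)); // 113th
```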

Classification

At the moment classification is supported only by the Naive Bayes algorithm. There are two basic steps involved in using the classifier: training and classification.

The following example requires-up the classifier and trains it with data. The addDocument method accepts a sample corpus and the name of its classification; the train method then trains the classifier on everything that's been added.

var natural = require('natural'),
    classifier = new natural.BayesClassifier();

classifier.addDocument("my unit-tests failed.", 'software');
classifier.addDocument("tried the program, but it was buggy.", 'software');
classifier.addDocument("the drive has a 2TB capacity.", 'hardware');
classifier.addDocument("i need a new power supply.", 'hardware');
classifier.train();

By default the classifier will tokenize the corpus and stem it with a LancasterStemmer. You can use a PorterStemmer by passing it in to the BayesClassifier constructor as such:

var natural = require('natural'),
    stemmer = natural.PorterStemmer,
    classifier = new natural.BayesClassifier(stemmer);

With the classifier trained it can now classify documents via the classify method:

console.log(classifier.classify('did the tests pass?'));
console.log(classifier.classify('did you buy a new drive?'));

resulting in the output:

software
hardware

Similarly the classifier can be trained on arrays rather than strings, bypassing tokenization and stemming. This allows the consumer to perform custom tokenization and stemming if any at all. This is especially useful in a non-natural language scenario.

classifier.addDocument(['unit', 'test'], 'software');
classifier.addDocument(['bug', 'program'], 'software');
classifier.addDocument(['drive', 'capacity'], 'hardware');
classifier.addDocument(['power', 'supply'], 'hardware');

classifier.train();
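To make the classification step less magical, here's a toy Naive Bayes classifier in plain JavaScript. It's a sketch of the underlying math with add-one (Laplace) smoothing, not natural's implementation, and it consumes pre-tokenized arrays like the addDocument calls above:

```javascript
// A toy Naive Bayes classifier -- a sketch of the math, not natural's code.
// Consumes pre-tokenized arrays; add-one smoothing keeps unseen tokens
// from zeroing out a classification's probability.
function NaiveBayes() {
  this.docCounts = {};   // documents seen per classification
  this.wordCounts = {};  // token frequencies per classification
  this.vocabulary = {};  // every token seen anywhere
  this.totalDocs = 0;
}

NaiveBayes.prototype.addDocument = function(tokens, label) {
  this.docCounts[label] = (this.docCounts[label] || 0) + 1;
  this.totalDocs++;

  var counts = this.wordCounts[label] = this.wordCounts[label] || {};

  tokens.forEach(function(token) {
    counts[token] = (counts[token] || 0) + 1;
    this.vocabulary[token] = true;
  }, this);
};

NaiveBayes.prototype.classify = function(tokens) {
  var vocabSize = Object.keys(this.vocabulary).length;
  var best = null;
  var bestScore = -Infinity;

  for (var label in this.docCounts) {
    var counts = this.wordCounts[label];
    var labelTotal = 0;

    for (var word in counts)
      labelTotal += counts[word];

    // log P(label) plus the sum of log P(token | label), smoothed
    var score = Math.log(this.docCounts[label] / this.totalDocs);

    tokens.forEach(function(token) {
      score += Math.log(((counts[token] || 0) + 1) / (labelTotal + vocabSize));
    });

    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }

  return best;
};

var toyClassifier = new NaiveBayes();
toyClassifier.addDocument(['unit', 'test'], 'software');
toyClassifier.addDocument(['bug', 'program'], 'software');
toyClassifier.addDocument(['drive', 'capacity'], 'hardware');
toyClassifier.addDocument(['power', 'supply'], 'hardware');

console.log(toyClassifier.classify(['test', 'bug']));     // software
console.log(toyClassifier.classify(['drive', 'supply'])); // hardware
```

Working in log space avoids underflow when multiplying many small probabilities, which is the standard trick for this family of classifiers.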

It's possible to persist and recall the results of training via the save method:

var natural = require('natural'),
    classifier = new natural.BayesClassifier();

classifier.addDocument(['unit', 'test'], 'software');
classifier.addDocument(['bug', 'program'], 'software');
classifier.addDocument(['drive', 'capacity'], 'hardware');
classifier.addDocument(['power', 'supply'], 'hardware');

classifier.train();

classifier.save('classifier.json', function(err, classifier) {
    // the classifier is saved to the classifier.json file!
});

The training could then be recalled later with the load method:

var natural = require('natural'),
    classifier = new natural.BayesClassifier();

natural.BayesClassifier.load('classifier.json', null, function(err, classifier) {
    console.log(classifier.classify('did the tests pass?'));
});

Conclusion

This concludes the current state of "natural". Like I said in the introduction, there are certainly potential improvements in terms of both accuracy and performance. Now that 0.0.17 has been released, features are frozen while I focus on improving both for 0.1.0.

Post-0.1.0 I intend to make "natural" more complete; slowly starting to match the NLTK with additional algorithms of all classifications and hopefully for additional languages. For that I humbly ask assistance. :)

Sun May 22 2011 22:29:06 GMT+0000 (UTC)

Comments

Mirah, Ruby Syntax on the JVM

Although languages like Java and C# have soured for me over the last few years, I still believe their runtimes (the JVM and CLR respectively) are sound. A sizable portion of the code I write for my day job is in JRuby, and we get a number of advantages from that. We can use the industrial-strength infrastructure components Java brings to the table and leverage mountains of existing, production-quality Java libraries. We can also benefit from the expressiveness and dynamism of Ruby.

Still, JRuby isn't perfect. While its performance is fine for the average application or script you'd have a hell of a time writing something like, say, Lucene in JRuby as it's impractical from a performance point of view.

I suppose the lesson there is that Java is the systems language of the JVM and JRuby/Jython/Groovy are fine for abstract implementations.

There is a specific alternative to the Java language, though, that's piqued my interest lately. It was not only born from JRuby but employs JRuby in its compilation: Mirah.

Mirah is developed primarily by Charles Nutter, a principal developer of JRuby, and is an attempt to apply Ruby syntax to the raw JVM. It's statically-typed, performs comparably to Java and doesn't require a runtime library a la JRuby.

Basically the idea is that you could use it as a drop-in replacement for the Java language itself.

Here I'll walk you through some examples I wrote while acquainting myself with Mirah.

Example 1: Hello, World!

Ah, the canonical "Hello, World!" example. This is very Ruby-like. No class, no static main method, just good old "puts". The compiled bytecode will include a class and static main method, but Mirah takes care of that for us.

puts "hello, World!"

Example 2: Java Objects

Now this will look a little more Java-like and include familiar classes to a Java developer. The actual iteration looks far more Ruby-like, however.

import java.util.ArrayList

list = ArrayList.new([3, 9, 5])

list.each do |x|
  puts x
end

Example 3: Basic Class

Of course this is the JVM we're talking about so classes are a core component, as is demonstrated here. This is also the first example that shows the static typing in action. The "name" parameter of the constructor is followed by ":String" indicating that it's of type java.lang.String.

class Person
  def name
    @name
  end

  def initialize(name:String)
    @name = name
  end
end

person = Person.new('Gerald McGoo')
puts person.name

Example 4: Inheritance

Inheritance follows the typical Ruby syntax.

class Person
  def name
    @name
  end

  def initialize(name:String)
    @name = name
  end
end

class Programmer < Person
  def favorite_language
    @favorite_language
  end

  def initialize(name:String, favorite_language:String)
    super(name)
    @favorite_language = favorite_language
  end
end

programmer = Programmer.new('Gerald McGoo', 'Mirah')
puts "#{programmer.name} loves #{programmer.favorite_language}"

Example 5: Swing and Blocks

This example, while demonstrating swing, illustrates a much more important point. See the ruby-block-ish implementation of the action listener closure? Now that's clean! Also note that clicked_count is not final.

This is also a good example to demonstrate the type inference of Mirah. Sure, Mirah is statically-typed, but I'm not explicitly declaring the type of "frame". Mirah sees that I'm assigning it a new instance of JFrame and goes with that.

import javax.swing.JFrame
import javax.swing.JButton
import javax.swing.JOptionPane

frame = JFrame.new 'Click Counter'
frame.setSize 200, 100
frame.setVisible true

button = JButton.new 'Click me'
frame.add button
clicked_count = 0

button.addActionListener do |e|
  clicked_count += 1
  JOptionPane.showMessageDialog nil, String.valueOf(clicked_count)
end

frame.show

Performance Comparison

As I stated above, one of the primary intentions of Mirah is to maintain a performance profile identical to Java's. I admit my attempt to benchmark it here isn't overly scientific and is rather rough, but it certainly illustrates the point that the bytecode resulting from Mirah performs similarly to straight Java.

Also here you'll see some of Mirah's more statically-typed characteristics, such as casting [(type)variableName in Java becomes type(variable) in Mirah].

These examples essentially perform internet checksums, quite crudely, on 8-byte chunks of a 10MB data file, storing the results in an ArrayList. A little computation, a little IO (probably too much) and a little usage of typical Java collections.

Java

import java.io.FileInputStream;
import java.util.ArrayList;

public class Main {
    public static int inetChecksum(byte buff[]) {
        long sum = 0;
        int datum = 0;

        for(int i = 0; i < buff.length; i += 2) {
            datum = (0xffff & buff[i] << 8) | (0xff & buff[i + 1]);
            sum += datum;
        }

        while((sum >> 16) > 0)
            sum = (sum >> 16) + (sum & 0xffff);

        return ~(int)sum & 0xffff;
    }

    public static void main(String[] args) {
        byte[] data = new byte[8];
        ArrayList sums = new ArrayList();

        try {
            long start = System.currentTimeMillis();
            FileInputStream fis = new FileInputStream("test.dat");

            while(fis.read(data) > 0) {
                sums.add(new Integer(inetChecksum(data)));
            }
            
            fis.close();
            System.out.println(System.currentTimeMillis() - start);            
        } catch (Exception ex) {            
        }
    }
}

Mirah

import java.io.FileInputStream
import java.util.ArrayList

def inet_checksum(buff:byte[]):int
  sum = long(0)
  datum = 0
  i = 0

  while i < buff.length
    datum = (0xffff & buff[i] << 8) | (0xff & buff[i += 1])
    sum += datum
    i += 1
  end

  sum = (int(sum) >> 16) + (sum & 0xffff) while (int(sum) >> 16) > 0

  ~int(sum) & 0xffff
end

data = byte[8]

begin
  start = System.currentTimeMillis
  sums = ArrayList.new
  fis = FileInputStream.new 'test.dat'
  
  sums.add Integer.new(inet_checksum(data)) while fis.read(data) > 0

  fis.close
  puts System.currentTimeMillis - start
rescue
end

I ran three trials resulting in:

Java

2127 ms
2088 ms
2119 ms
AVG: 2111 ms

Mirah

2231 ms
2031 ms
2043 ms
AVG: 2101 ms

Let's just call that about even.

Doesn't Haves

Thus far I've yammered on about what Mirah does. Now here's some notes on what Mirah doesn't do.

  • Generics - At this point Mirah doesn't do generics like Java does (generics arrived in Java 5, I believe). I spoke with Charles Nutter about this and he believes it'll likely be included in the future, however.
  • Ranges and other Ruby goodness - Mirah isn't Ruby, so a few facilities a Ruby developer might be used to are missing, such as ranges. In other words the following Ruby code isn't valid Mirah: (0..10).step(2) {|i| puts i}

Final Thoughts

Like I mentioned in the introduction I see great value in the modern runtimes but am not particularly thrilled with the typical languages used to program them. I'll admit that a lot of that has to do with just my personal taste but I truly believe languages like Mirah offer some additional conciseness and features that enhance productivity.

I, for one, am excited about it.

Tue Mar 29 2011 02:00:52 GMT+0000 (UTC)


GPU-Based Computing in C with NVIDIA's CUDA

At work we do plenty of video and image manipulation, particularly video encoding. While it's certainly not a specialty of mine, I've had plenty of exposure to enterprise-level transcoding for projects like transcode.it, our free public transcoding system; uEncode, a RESTful encoding service; and our own in-house solutions (we're DVD Empire, BTW).

Of course my exposure is rather high-level and large portions of the process still elude me but I've certainly developed an appreciation for boxes with multiple GPUs chugging away performing complex computation.

For a while now I've been hoping to dedicate some time to peer into the inner workings of GPUs and explore the possibility of using them for general-purpose, highly-parallel computing.

Well, I finally got around to it.

Most of the machines I use on a regular basis have some kind of NVIDIA card so I decided to see what resources they had to offer for general-purpose work. Turns out they set you up quite well!

They offer an architecture called CUDA which does a great job of rather directly exposing the compute resources of GPUs to developers. It supports Windows, Linux and Macs equally well as far as I can tell, and while it has bindings for many higher-level languages it's primarily accessible via a set of C extensions.

Like I said, I'm relatively new to this so I'm in no position to profess, but figured I might as well share one of the first experiments I did while familiarizing myself with CUDA.

Also, I'd like some feedback as I'm just getting my feet wet as well.

Getting Started & Documentation

Before going any further check out the "Getting Started Guide" for your platform on the CUDA download page. It will indicate what you specifically have to download and how to install it. I've only done so on Macs but the process was simple, I assure you.

Example

Ok, here's a little example C program that performs two parallel executions (the advantage of using GPUs is parallelism after all) of the "Internet Checksum" algorithm on some hard-coded sample data.

First I'll blast you with the full source then I'll walk through it piece by piece.

#include <stdio.h>
#include <cuda_runtime.h>

/* "kernel" to compute an "internet checksum" */
__global__ void inet_checksum(unsigned char *buff, size_t pitch,
    int len, unsigned short *checksums) {
  int i;
  long sum = 0;

  /* advance to where this thread's data starts. the pitch
     ensures optimal alignment. */
  buff += threadIdx.x * pitch;
  unsigned short datum;

  for(i = 0; i < len / 2; i++) {
    datum = *buff++ << 8;
    datum |= *buff++;
    sum += datum;
  }

  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);

  sum = ~sum;
  /* write data back for host */
  checksums[threadIdx.x] = (unsigned short)sum;
}

int main (int argc, char **argv) {
  int device_count;
  int size = 8;
  int count = 2;
  unsigned short checksums[count];
  int i; 

  unsigned char data[16] = {
           /* first chunk */
           0xe3, 0x4f, 0x23, 0x96, 0x44, 0x27, 0x99, 0xf3,
           /* second chunk */
           0xe4, 0x50, 0x24, 0x97, 0x45, 0x28, 0x9A, 0xf4};

  /* ask cuda how many devices it can find */
  cudaGetDeviceCount(&device_count);

  if(device_count < 1) {
    /* if it couldn't find any fail out */
    fprintf(stderr, "Unable to find CUDA device\n");
  } else {
    /* for the sake of this example just use the first one */
    cudaSetDevice(0);

    unsigned short *gpu_checksum;
    /* create a place for the results to be stored in the GPU's
       memory space. */
    cudaMalloc((void **)&gpu_checksum, count * sizeof(short));

    unsigned char *gpu_buff;
    size_t gpu_buff_pitch;

    /* create a 2d pointer in the GPUs memory space */
    cudaMallocPitch((void**)&gpu_buff, &gpu_buff_pitch,
      size * sizeof(unsigned char), count);

    /* copy our hard-coded data from above into the GPU's
       memory space, correctly aligned for 2d access. */
    cudaMemcpy2D(gpu_buff, gpu_buff_pitch, &data,
      sizeof(unsigned char) * size,
      size, count, cudaMemcpyHostToDevice);

    /* execute the checksum operation. two threads
       of execution will be executed due to the count param. */
    inet_checksum<<<1, count>>>(gpu_buff, gpu_buff_pitch, size,
      gpu_checksum);

    /* copy the results from the GPU's memory to the host's */
    cudaMemcpy(&checksums, gpu_checksum,
      count * sizeof(short), cudaMemcpyDeviceToHost);

    /* clean up the GPU's memory space */
    cudaFree(gpu_buff);
    cudaFree(gpu_checksum);

    for(i = 0; i < count; i++)
        printf("Checksum #%d 0x%x\n", i + 1, checksums[i]);
  }

  return 0;
}

Dissection

Phew, alright. There wasn't really all that much to it, but I'm sure many of you will appreciate some explanation.

I'm sure you know the first directive. The second is obviously the inclusion of CUDA.

#include <stdio.h>
#include <cuda_runtime.h>

The following is what's referred to as a "kernel" in CUDA. It's basically a function that can execute on a GPU. Note the function is declared __global__ and returns void. The details of the function really aren't the subject of the article. In this case it calculates the "internet checksum" of the incoming buff, but here's where you'd put your highly-parallelizable, computationally intensive code.

The pitch will make more sense later as it helps to deal with memory alignment of multi-dimensional data, which is what buff turns out to be despite being a one-dimensional vector: one row per thread, each with eight bytes.

Also have a look at threadIdx.x. That's how you can determine which thread you are, and you can use it to read/write the correct indexes in vectors, etc.

/* "kernel" to compute an "internet checksum" */
__global__ void inet_checksum(unsigned char *buff, size_t pitch,
    int len, unsigned short *checksums) {
  int i;
  long sum = 0;

  /* advance to where this thread's data starts. the pitch
     ensures optimal alignment. */
  buff += threadIdx.x * pitch;
  unsigned short datum;

  for(i = 0; i < len / 2; i++) {
    datum = *buff++ << 8;
    datum |= *buff++;
    sum += datum;
  }

  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);

  sum = ~sum;
  /* write data back for host */
  checksums[threadIdx.x] = (unsigned short)sum;
}

Getting the party started. Note that the size and count variables indicate we have two elements of eight bytes apiece. They'll carve up our hard-coded data.

int main (int argc, char **argv) {
  int device_count;
  int size = 8;
  int count = 2;
  unsigned short checksums[count];
  int i; 

Now here's the data we're going to checksum. We'll actually be treating this as two distinct values later. The first eight bytes will be checksummed while the second eight bytes are checksummed on another GPU thread.

  unsigned char data[16] = {
           /* first chunk */
           0xe3, 0x4f, 0x23, 0x96, 0x44, 0x27, 0x99, 0xf3,
           /* second chunk */
           0xe4, 0x50, 0x24, 0x97, 0x45, 0x28, 0x9A, 0xf4};

The comment says it all. We're just asking CUDA how many devices it can find. We could then use that information later to distribute load to GPUs.

  /* ask cuda how many devices it can find */
  cudaGetDeviceCount(&device_count);

For the most part we'll ignore it, however. We will make sure at least one device was found, as there's no point to all this if we can't slap our load on a GPU! Assuming a GPU was found, we'll call cudaSetDevice to direct CUDA to run our GPU routines there.

  if(device_count < 1) {
    /* if it couldn't find any fail out */
    fprintf(stderr, "Unable to find CUDA device\n");
  } else {
    /* for the sake of this example just use the first one */
    cudaSetDevice(0);

Now I'll create a vector for the checksums to be written into by our "kernel". Think of cudaMalloc as a typical malloc call except the memory is reserved in the GPU's space. We won't directly access that memory. Instead we'll copy in and out of it. The use of count indicates that it'll have room for two unsigned short values.

    unsigned short *gpu_checksum;
    /* create a place for the results to be stored in the GPU's
       memory space. */
    cudaMalloc((void **)&gpu_checksum, count * sizeof(short));

Here's some more allocation but in this case it's using a pitch. This is for the memory we'll write our workload into. We're using cudaMallocPitch because this data is essentially two dimensional and the pitch facilitates optimal alignment in memory. It's basically allocating two rows of eight byte columns.

    unsigned char *gpu_buff;
    size_t gpu_buff_pitch;

    /* create a 2d pointer in the GPUs memory space */
    cudaMallocPitch((void**)&gpu_buff, &gpu_buff_pitch,
      size * sizeof(unsigned char), count);

Now cudaMemcpy2D will shove the workload into the two-dimensional buffer we allocated above. Think memcpy for the GPU. Care is taken to specify the dimensions of the data with the pitch, size and count. The cudaMemcpyHostToDevice parameter directs the data to the GPU's memory space rather than from it.

    /* copy our hard-coded data from above into the GPU's
       memory space, correctly aligned for 2d access. */
    cudaMemcpy2D(gpu_buff, gpu_buff_pitch, &data,
      sizeof(unsigned char) * size,
      size, count, cudaMemcpyHostToDevice);

Here's the money. See the <<<..., ...>>> business? The first argument is "blocks per grid" but I'll leave NVIDIA to explain that one to you in the CUDA C Programming Guide. The second argument indicates how many threads will be spawned. Like I said, this is all about parallelism. Consider our inet_checksum "kernel" hereby invoked twice in parallel!

    /* execute the checksum operation. two threads
       of execution will be executed due to the count param. */
    inet_checksum<<<1, count>>>(gpu_buff, gpu_buff_pitch, size,
      gpu_checksum);

Now the "kernel" executions are done. We've successfully executed our logic on a GPU! The results are still sitting in the GPU's memory space, however. We'll simply copy them out with cudaMemcpy while specifying cudaMemcpyDeviceToHost for the direction. The results then land in the checksums vector.

    /* copy the results from the GPU's memory to the host's */
    cudaMemcpy(&checksums, gpu_checksum,
      count * sizeof(short), cudaMemcpyDeviceToHost);

CUDA has its own allocation and copying, and of course its own clean-up. We'll be good citizens and use it here.

    /* clean up the GPU's memory space */
    cudaFree(gpu_buff);
    cudaFree(gpu_checksum);

Might as well let the user know the results, no?

    for(i = 0; i < count; i++)
        printf("Checksum #%d 0x%x\n", i + 1, checksums[i]);
  }

  return 0;
}

Compiling and Execution

Assuming you've installed the CUDA SDK according to the documentation you can compile with:

> nvcc -o yourprogram yoursourcefile.cu

and execution produces:

> ./yourprogram
Checksum #1 0x1aff
Checksum #2 0x16fb

.cu is the preferred extension for source to be run through the CUDA pre-processor.

Conclusion

There you have it. Execution of your own logic on a GPU.

Where to go from here? Well, this barely scratched the surface but NVIDIA's CUDA Zone site is the starting point to much more.

GPGPU.org is also a more platform independent source of general-purpose GPU computing.

Sun Jan 09 2011 03:01:45 GMT+0000 (UTC)


Solr Data Access in Ruby with SolrMapper

Recently my employer (The Skunkworx) released our first open source project, SolrMapper. Like the readme says, it's a Ruby Object Document Mapper for the Apache Foundation's Solr search platform. It's loosely patterned after ActiveRecord and MongoMapper so it should feel somewhat familiar.

What differentiates SolrMapper from many other Solr libraries is that it's not necessarily dependent on another form of persistence a la acts_as_solr. It could certainly allow you to use Solr as a stand-alone, general purpose data store if that floats your boat.

Installing

gem install solr_mapper

Examples

I might as well get started with a simple example of a model. This example assumes a Solr index located at http://localhost:8080/solr/article with an abbreviated schema of:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="content" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>

and the model

require 'solr_mapper'

class Article
  include SolrMapper::SolrDocument

  bind_service_url  'http://localhost:8080/solr/article'
end

Not much to it. bind_service_url simply indicates the URL of the Solr instance housing our data. There's no real schema definition here. SolrDocument is just mixed-in and the model is pointed at an index.

Creating

I could then create an article as such:

id = UUID.new().generate()

article = Article.new(
  :_id => id,
  :title => 'This is a sample article',
  :content => 'Here is some sample content for our sample article.'
)

article.save()

Note the field identified by the symbol :_id. This field is somewhat special. SolrMapper prepends the underscore for this field only, as it maps to "id" in Solr. The remainder of the fields map straight to their respective fields in Solr.

Querying

I could then retrieve the article we just saved like:

article = Article.find(id)

#print out the article's title
puts article.title

producing the output

This is a sample article

Of course the whole point of Solr is rich, full-text searches. To demonstrate I'll need some additional sample data.

Article.new(
  :_id => UUID.new().generate(),
  :title => 'Yet another article',
  :content => 'Have another one.'
).save

Article.new(
  :_id => UUID.new().generate(),
  :title => 'Number three',
  :content => 'The third in a string of three.'
).save

The simplest way of accomplishing this in SolrMapper is via a model's query method, which can accept a typical Solr query string.

articles = Article.query('title:article')

articles.each do |article|
  puts article.title
end

producing

This is a sample article
Yet another article

It's also possible to pass in the query in a more structured fashion via a hash.

articles = Article.query({:title => 'article'})

articles.each do |article|
  puts article.title
end

producing the same results as the search example above.

Additional Solr parameters can be passed in as well such as this title sort.

articles = Article.query({:title => 'article'}, {:sort => 'title desc'})

Updating

Updating a portion of a live record can be accomplished with the familiar update_attributes method:

article.update_attributes(:title => 'An updated title')

Pagination

Generally speaking, pretty much any traditional web app that lists or searches entities needs paging. SolrMapper integrates will_paginate to accomplish this.

articles_page = Article.paginate({:title => 'article'})

On to page 2 with:

articles_page = Article.paginate({:title => 'article'}, {:page => 2})

The page size can be modified with:

articles_page = Article.paginate({:title => 'article'}, {:rows => 5, :page => 2})

Relationships

Solr itself isn't relational, and separate indexes are, well, separate. SolrMapper still allows you to support rudimentary ActiveRecord-esque relationships. Remember though, there is no low-level joining going on. This all happens at the Ruby level, so you must be judicious.

class Biography
  include SolrMapper::SolrDocument
  has_many :articles

  bind_service_url 'http://localhost:8080/solr/bio'
end

class Article
  include SolrMapper::SolrDocument
  belongs_to :biography
  
  bind_service_url 'http://localhost:8080/solr/article'
end

Biography.find('SOME UUID').articles.each do |article|
  puts article.title
end

Auto-Generated IDs

As of 0.1.9 solr_mapper can auto-generate the id field with a UUID when you define your model as follows.

class Item
  include SolrDocument

  # tells solr_mapper to generate the id
  auto_generate_id

  bind_service_url 'http://localhost:8080/solr/item'
end

Conclusion

SolrMapper is intended to be simple and familiar, and I hope we've accomplished that. We're looking for help, whether patches or feedback, especially because we're just getting started here. If you have either please contact us at github.

Tue Oct 26 2010 02:08:14 GMT+0000 (UTC)
