Chris Umbel

The Sylvester Matrix Library Comes to Node.js

Before some recent machine learning implementations in node I set out to find a reasonable matrix math/linear algebra library for node.js. The pickings were slim but I managed to dig up a general JavaScript matrix math library written by James Coglan called sylvester. Clearly, sylvester had to be node-ified.

With the help of Rob Ellis (a collaborator on natural) it's been wrapped up into a node project titled node-sylvester and published to NPM. It has also had some features added, such as element-wise multiplication; QR, LU, and SVD decompositions; and basic solving of systems of linear equations.

In this post I'll cover some of the basic structures and operations supported by sylvester, but it will by no means be complete. I'll focus solely on the Matrix and Vector prototypes, but sylvester also supports Line, Plane, and Polygon. Also within the covered prototypes I'll only demonstrate a small, but useful subset of their functionality.

Currently, the only reasonable sources of documentation for functionality existing only in the node port are in the README while general sylvester functionality is covered in its API docs. In time I will (hopefully with the help of the community) provide some more complete documentation.

Installation

Getting ready to use the node port of sylvester is what you'd expect: a standard NPM install.

npm install sylvester

You can then require-up sylvester in your node code and use the prototypes within.

var sylvester = require('sylvester'),
  Matrix = sylvester.Matrix,
  Vector = sylvester.Vector;

Vector and Matrix Prototypes

Matrices and vectors are abstracted by the Matrix and Vector prototypes.

Instances can be created using their create functions and passing in an array of values. Vector.create accepts a one dimensional array of numbers and Matrix.create accepts multiple dimensions.

var x = Vector.create([1, 2, 3]);
var a = Matrix.create([[1, 2], [3, 4]]);

representing x = [1, 2, 3] and a = [[1, 2], [3, 4]].

Global shortcuts exist to clone Vector and Matrix prototypes with $V and $M respectively. The following is the semantic equivalent of the previous example.

var x = $V([1, 2, 3]);
var a = $M([[1, 2], [3, 4]]);

Matrix's Ones function will create a matrix of specified dimensions with all elements set to 1.

var ones = Matrix.Ones(3, 2);

Matrix's Zeros function does the same except with all elements set to 0.

var zeros = Matrix.Zeros(3, 2);

An identity matrix of a given size can be created with Matrix's I function.

var eye = Matrix.I(4);

Data Access

Values can be retrieved from within a Matrix or Vector using the e method.

a.e(2, 1); // returns 3
x.e(3); // returns 3

The entire set of values can be retrieved/manipulated with the elements member of Matrix and Vector. elements is an array-of-arrays-of-numbers in the case of matrices and it is a simple array-of-numbers for vectors.

var a_elements = a.elements; // [[1, 2], [3, 4]]
var x_elements = x.elements; // [1, 2, 3]
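
Note that e is one-based while elements holds ordinary zero-based JavaScript arrays. Here's a rough plain-JavaScript sketch of that mapping. The makeMatrix helper is hypothetical (it is not part of sylvester) and returning null for out-of-range indices is just this sketch's choice.

```javascript
// Hypothetical stand-in for a Matrix, used only to illustrate index mapping.
function makeMatrix(rows) {
  return {
    elements: rows,
    // One-based accessor: e(1, 1) is the top-left element.
    e: function (i, j) {
      if (i < 1 || i > rows.length || j < 1 || j > rows[0].length) {
        return null; // out of range (this sketch's choice)
      }
      return rows[i - 1][j - 1];
    }
  };
}

var sketch = makeMatrix([[1, 2], [3, 4]]);
console.log(sketch.e(2, 1));        // 3
console.log(sketch.elements[1][0]); // 3, the same element via zero-based indices
```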

Basic Math

Many standard mathematical operations are supported in Vector and Matrix clones. In general the arguments can either be scalar values or properly-dimensioned Matrix/Vector clones. While not covered here, element-wise versions of many multiplicative/specialized operations are also available.

Matrices and vectors can be added-to and subtracted-from by scalar values using the add and subtract functions.

var b = a.add(1);
var d = a.subtract(1);

symbolizing b = [[2, 3], [4, 5]] and d = [[0, 1], [2, 3]].

Naturally, like-dimensioned matrices and vectors can be added to and subtracted from each other.

var e = a.add($M([[2, 3], [4, 5]]));
var f = a.subtract($M([[0, 2], [3, 3]]));

meaning e = [[3, 5], [7, 9]] and f = [[1, 0], [0, 1]].

Matrices/vectors can be multiplied with matrices/vectors of appropriate dimensions.

var g = a.x($V([3, 4]));

The dot-product of two vectors can be computed with Vector's dot method.

var y = x.dot($V([4, 5, 6]));

depicting y = 1·4 + 2·5 + 3·6 = 32.
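
For anyone who wants to see the arithmetic behind those two calls, here's a small sketch on plain arrays. The dot and matVec helpers are my own illustrative names, not sylvester functions.

```javascript
// Dot product of two equal-length arrays.
function dot(u, v) {
  if (u.length !== v.length) {
    throw new Error('dimension mismatch');
  }
  return u.reduce(function (sum, value, i) {
    return sum + value * v[i];
  }, 0);
}

// Matrix-vector product: one dot product per matrix row.
function matVec(m, v) {
  return m.map(function (row) {
    return dot(row, v);
  });
}

console.log(matVec([[1, 2], [3, 4]], [3, 4])); // [ 11, 25 ]
console.log(dot([1, 2, 3], [4, 5, 6]));        // 32
```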

Transposing

A Matrix can be transposed with the transpose method.

var at = a.transpose();

where at represents the matrix [[1, 3], [2, 4]].

Segmentation/Augmentation

The first n elements can be removed from a Vector with the chomp method.

var n = 2;
var xa = x.chomp(n);

In contrast the first n elements can be retrieved with the top method.

var xb = x.top(n);

Vectors can have a list of values appended to the end with the augment method.

var xc = x.augment([4, 5]);

Matrices can have a column appended to the right with Matrix's augment method.

var m = a.augment($V([3, 5]));

A sub-block of a Matrix clone can be retrieved with the slice method. slice accepts a starting row, ending row, starting column and ending column as parameters.

var aa = $M([[1, 2, 3], [4, 5, 6], [7, 8, 9]]);
var ab = aa.slice(1, 2, 1, 2);

The code above produces a matrix ab shaped like:

Decompositions/Advanced functions

A small number of decompositions are currently supported. These have all been recently added and at the time of writing are, while functional, not necessarily as computationally efficient as they could be. Also their stability cannot be guaranteed at this time.

Note that these were all implemented in pure JavaScript. My personal hope is to keep node-sylvester 100% functional in pure JavaScript to maintain the best compatibility across all platforms. In the long run, however, I hope to employ time-tested, computationally-efficient native libraries to perform these operations if the host machine allows.

A QR decomposition is made possible via Matrix's qr method. An object is returned containing both the Q and the R matrix.

var qr = a.qr();

resulting in the object:

{ Q: 
   [-0.316227766016838, -0.9486832980505138]
   [-0.9486832980505138, 0.3162277660168381],
  R: 
   [-3.162277660168379, -4.427188724235731]
   [5.551115123125783e-16, -0.6324555320336751] }
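
A quick sanity check on a decomposition is to multiply the factors back together. This sketch does so on plain arrays using the Q and R values printed above; matMul is a helper written for this example, not a sylvester function.

```javascript
// Naive matrix product of two arrays-of-arrays.
function matMul(a, b) {
  return a.map(function (row) {
    return b[0].map(function (_, j) {
      return row.reduce(function (sum, value, k) {
        return sum + value * b[k][j];
      }, 0);
    });
  });
}

var Q = [
  [-0.316227766016838, -0.9486832980505138],
  [-0.9486832980505138, 0.3162277660168381]
];
var R = [
  [-3.162277660168379, -4.427188724235731],
  [5.551115123125783e-16, -0.6324555320336751]
];

// Up to floating point error this is the original matrix [[1, 2], [3, 4]].
var qrProduct = matMul(Q, R);
console.log(qrProduct);
```

The tiny 5.55e-16 entry in R is floating point noise standing in for zero, which is why R is upper triangular only approximately.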

Singular Value Decompositions (SVD) can be produced via Matrix's aptly named svd method. An object is returned containing the U (left singular), S (singular diagonal), and V (right singular) matrices.

var svd = a.svd();

which returns:

{ U: 
   [0.40455358483359943, 0.9145142956773742]
   [0.9145142956773744, -0.40455358483359943],
  S: 
   [5.4649857042190435, 0]
   [0, 0.36596619062625785],
  V: 
   [0.5760484367663301, -0.8174155604703566]
   [0.8174155604703568, 0.5760484367663302] }
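
The three factors are related to the original matrix by a = U · S · Vᵀ. The sketch below rebuilds a from the values printed above using plain arrays. The matMul and transpose helpers are written for this example and are not sylvester functions.

```javascript
// Naive matrix product of two arrays-of-arrays.
function matMul(a, b) {
  return a.map(function (row) {
    return b[0].map(function (_, j) {
      return row.reduce(function (sum, value, k) {
        return sum + value * b[k][j];
      }, 0);
    });
  });
}

// Swap rows and columns.
function transpose(m) {
  return m[0].map(function (_, j) {
    return m.map(function (row) {
      return row[j];
    });
  });
}

var U = [
  [0.40455358483359943, 0.9145142956773742],
  [0.9145142956773744, -0.40455358483359943]
];
var S = [
  [5.4649857042190435, 0],
  [0, 0.36596619062625785]
];
var V = [
  [0.5760484367663301, -0.8174155604703566],
  [0.8174155604703568, 0.5760484367663302]
];

// Up to floating point error this is the original matrix [[1, 2], [3, 4]].
var rebuilt = matMul(matMul(U, S), transpose(V));
console.log(rebuilt);
```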

Principal Component Analysis can be performed (dimensionality reduced) and reversed (dimensionality restored with approximated values) using Matrix's pcaProject and pcaRecover methods respectively.

pcaProject accepts the number of target dimensions as a parameter and returns an object with the reduced data Z and the covariant eigenvectors U.

var pca = a.pcaProject(1);

returning:

{ Z: 
   [2.2108795577070475]
   [4.997807552180416],
  U: 
   [0.5760484367663208, -0.8174155604703633]
   [0.8174155604703633, 0.5760484367663208] }

and pcaRecover approximately recovers the data to the original dimensionality.

pca.Z.pcaRecover(pca.U);   

which produces:
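
The arithmetic behind that round trip can be sketched on plain arrays. The Z values printed earlier match multiplying the data by the first column of U, and recovery multiplies back by that column's transpose. Treat this as my reading of the numbers rather than a description of node-sylvester's internals; matMul, transpose, and firstColumns are helpers written for this example.

```javascript
// Naive matrix product of two arrays-of-arrays.
function matMul(a, b) {
  return a.map(function (row) {
    return b[0].map(function (_, j) {
      return row.reduce(function (sum, value, k) {
        return sum + value * b[k][j];
      }, 0);
    });
  });
}

// Swap rows and columns.
function transpose(m) {
  return m[0].map(function (_, j) {
    return m.map(function (row) {
      return row[j];
    });
  });
}

// Keep only the first k columns of a matrix.
function firstColumns(m, k) {
  return m.map(function (row) {
    return row.slice(0, k);
  });
}

var data = [[1, 2], [3, 4]];
var eigenvectors = [
  [0.5760484367663208, -0.8174155604703633],
  [0.8174155604703633, 0.5760484367663208]
];

var reducedBasis = firstColumns(eigenvectors, 1);
var projected = matMul(data, reducedBasis);                 // 2x1, matches Z above
var recovered = matMul(projected, transpose(reducedBasis)); // 2x2 approximation of data

console.log(projected);
console.log(recovered);
```

Because one dimension was thrown away, the recovered values only approximate the original data.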

Solving Linear Equations

Simple linear equations in the form of Ax = b can be solved using the suitably named solve method.

var A = $M([
    [2, 4],
    [2, 3],
]);

var b = $V([2, 2]);
var x = A.solve(b);

which is representative of the system 2x1 + 4x2 = 2 and 2x1 + 3x2 = 2.
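
To make the idea concrete, here's a minimal Gaussian elimination sketch on plain arrays. It is not how node-sylvester necessarily implements solve; gaussianSolve is a name made up for this example.

```javascript
// Solve A x = b by Gaussian elimination with partial pivoting.
function gaussianSolve(A, b) {
  var n = A.length;
  // Work on an augmented copy [A | b] so the inputs are left untouched.
  var M = A.map(function (row, i) {
    return row.concat([b[i]]);
  });

  for (var col = 0; col < n; col++) {
    // Pick the row with the largest entry in this column as the pivot.
    var pivot = col;
    for (var r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) {
        pivot = r;
      }
    }
    if (Math.abs(M[pivot][col]) < 1e-12) {
      throw new Error('matrix is singular');
    }
    var swap = M[col];
    M[col] = M[pivot];
    M[pivot] = swap;

    // Eliminate this column from every row below the pivot.
    for (r = col + 1; r < n; r++) {
      var factor = M[r][col] / M[col][col];
      for (var c = col; c <= n; c++) {
        M[r][c] -= factor * M[col][c];
      }
    }
  }

  // Back substitution, from the last row up.
  var x = new Array(n);
  for (var i = n - 1; i >= 0; i--) {
    var sum = M[i][n];
    for (var j = i + 1; j < n; j++) {
      sum -= M[i][j] * x[j];
    }
    x[i] = sum / M[i][i];
  }
  return x;
}

// For the system above the solution is x1 = 1, x2 = 0 (up to the sign of zero).
var solution = gaussianSolve([[2, 4], [2, 3]], [2, 2]);
console.log(solution);
```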

Conclusion

Thanks to James Coglan's hard work on sylvester plenty of matrix math functionality has been exposed to JavaScript. With a little extra work we've been able to expose quite a bit more to node. There's still more work to do not only in optimization but also in capabilities. In time hopefully node-sylvester will become fast and complete.

As a reminder, the functionality outlined above is not representative of the full list of current capabilities. Between the README in the source and the sylvester API documentation you should be able to get a reasonably clear picture. Still, if you're looking for ways to help with the project a concerted documentation effort would be wonderful and appreciated.

Sat Dec 17 2011 05:00:00 GMT+0000 (UTC)


Io, A Beautiful, Prototype-Based Language

I recently started reading Seven Languages in Seven Weeks by Bruce A. Tate and intended to ultimately post a review. Something went wrong along the way, however. On the second language covered in the book I became so intrigued that I had to dedicate a single post to it immediately. That language is Io, a prototype-based language created by Steve Dekorte.

Now, you have to keep in mind that I've only been hacking Io for about three days. I'm not an expert. I can't promise any of the examples I'm going to provide are idiomatic, robust or performant. I can't promise any of my advice will be prudent or that this article can even properly tutor you on Io.

I can, however, make the strong recommendation that you give Io a shot especially if you're a novice JavaScript programmer. Being prototype-based and syntactically simple Io can help an aspiring JavaScripter truly understand the patterns without the baggage that comes along with JavaScript. I can only promise an honest attempt at whetting your appetite.

Here I'll just outline some examples rather than deliver thorough instruction. For that Seven Languages in Seven Weeks does a fine job (most of these examples are adapted from its exercises) and the Io Guide can be a wonderful help as well.

Prototype-based

As I mentioned several times already Io is prototype-based. There are no classes yet there are objects. Objects can clone other objects providing an inheritance mechanism. Consider the following example.

Thing := Object clone

That code created a prototype named "Thing" from the super-prototype "Object". Note the ":=" assignment operator, which creates the destination slot if it does not exist yet. Had "Thing" already been assigned, "=" would have sufficed, as it can only assign to an existing slot (slots are the named locations in an object where data members and methods can be stored).

Now let's add a method to the "Thing" prototype. The following code will add a method into the "printMessage" slot of the "Thing" prototype.

Thing printMessage := method(
    writeln("Hello, thing!")
)

We can clone the "Thing" prototype into an instance and send it "printMessage" as such.

thing := Thing clone
thing printMessage

which would output

Hello, thing!

Io thinks of things in terms of message-passing. Rather than saying, "we just called thing's printMessage method" we should say, "we just sent a printMessage message to thing."

Note that we cloned into a lower-case "thing". When you're defining a pure prototype you start the identifier with a capital and instances with lower case.

Now let's add a data member to the instance and a method to the prototype to demonstrate parameters and encapsulation.

Thing calculatePrice := method(markup,
    self price + markup
)

thing price := 10
thing calculatePrice(2) println

which outputs

12

That example also demonstrates chaining messages together. calculatePrice returned a Number and we in turn pass a println message to the number (as an alternative to passing the number itself to writeln).
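
Since I pitched Io as a way for JavaScripters to understand prototypes, here's a rough JavaScript analogue of the Io code above using Object.create. It's only an approximation of Io's clone semantics, and the methods return values instead of printing so they're easier to inspect.

```javascript
// Roughly "Thing := Object clone": a new object whose prototype is Object.prototype.
var Thing = Object.create(Object.prototype);

// Methods live in the prototype's "slots" (plain properties in JavaScript).
Thing.printMessage = function () {
  return 'Hello, thing!';
};
Thing.calculatePrice = function (markup) {
  return this.price + markup;
};

// Roughly "thing := Thing clone": an instance that delegates to Thing.
var thing = Object.create(Thing);
thing.price = 10; // the data member lives on the instance only

console.log(thing.printMessage());                   // Hello, thing!
console.log(thing.calculatePrice(2));                // 12
console.log(Object.getPrototypeOf(thing) === Thing); // true
console.log(thing.hasOwnProperty('calculatePrice')); // false, it is found on Thing
```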

Metaprogramming

One of the exercises covered in Seven Languages in Seven Weeks that I found interesting was changing the behavior of Io itself so that division by zero would result in a 0 not infinity. Of course, you'd probably never want to do that, but it's a wonderful example of how much control you have over the runtime itself.

For instance if you execute the following in Io without any additional intervention

(6 / 0) println

you get

inf

Now let's reach into Io and change how division works.

Number oldDiv := Number getSlot("/")

Number / := method(d,  
  if(d == 0, 0, self oldDiv(d))
)

(6 / 2) println
(6 / 0) println

outputting

3
0

The first output was done just to illustrate that we didn't break division entirely. The second illustrates that we sure did change how division by 0 works; rather than "inf" we got "0".

Let's break it down line by line.

Number oldDiv := Number getSlot("/")

That may look reasonable to a Rubyist who would "alias" in a situation like this. We're basically copying the division operator's (the method in slot "/") logic into another slot named "oldDiv". Since we're going to rewrite "/" we'll want to keep the functionality around for later use and "oldDiv" is a fine place.

Number / := method(d,  
  if(d == 0, 0, self oldDiv(d))
)

Now we've changed the "/" method of Number. If the denominator (the lone parameter) is zero we will return zero. Otherwise we rely on the "oldDiv" to perform normal division.

DSLs/Structured Data Parsing

As a Rubyist by trade I always chuckle when I hear .Net or Java programmers claim that Domain Specific Languages are a waste of time. From their perspective the time investment is far greater than any value that might be yielded. Nine times out of ten they're probably right. They work with tools with a fixed idea of what instructions should look like.

Io gives you a great deal of control over the language's parser itself. Rather than writing a parser or implementing a DSL within the confines of the host language you can teach Io to parse and evaluate your DSL within its own interpreter!

We're going to write a little JSON parser here. Sure, there are probably better ways of parsing JSON with Io but it provides an effective example that's easy to relate to.

Take a file named "test.json" with the following content as given:

{
  "name": "Chris Umbel",
  "lucky_numbers": [6, 13],
  "job"	: {
    "title": "Software Developer"
  }
}

The following code will parse the JSON data, albeit liberally. This is meant to be demonstrative, not robust.

OperatorTable addAssignOperator(":", "atPutNumber")

curlyBrackets := method(
  data := Map clone
  call message arguments foreach(arg,
    data doMessage(arg))
  data
)

squareBrackets := method(
  arr := list()

  call message arguments foreach(arg,
    arr push(call sender doMessage(arg)))

  arr
)

Map atPutNumber := method(
  self atPut(
    # strip off leading and trailing quotes
    call evalArgAt(0) asMutable removePrefix("\"") removeSuffix("\""),
    call evalArgAt(1)
  )
)

s := File with("test.json") openForReading contents
json := doString(s)

json at("name") println
json at("lucky_numbers") println
json at("job") at("title") println

which will output

Chris Umbel
list(6, 13)
Software Developer

Now to break it down.

OperatorTable addAssignOperator(":", "atPutNumber")

Here we told Io to accept a brand spanking new assignment operator with the text ":". It will then pass the argument along to the target object via the "atPutNumber" message.

curlyBrackets := method(
  data := Map clone
  call message arguments foreach(arg,
    data doMessage(arg))
  data
)

That instructs Io what to do when it comes across a curly bracket when parsing code. In our case it creates a Map and begins to fill it. "call" performs reflection on the argument data passed to the method. "call message arguments" accesses the list of all arguments received.

squareBrackets := method(
  arr := list()
  call message arguments foreach(arg,
    arr push(call sender doMessage(arg)))
  arr
)

Here we instructed Io how to deal with JSON arrays. Per the slot name it builds a list of all elements enclosed in square brackets.

Map atPutNumber := method(
  self atPut(
    # strip off leading and trailing quotes
    call evalArgAt(0) asMutable removePrefix("\"") removeSuffix("\""),
    call evalArgAt(1)
  )
)

The Map prototype has been given an atPutNumber method that will strip quotes off of the element names and slap the value specified by the JSON into the corresponding value in the Map. "evalArgAt" grabs the argument data at a specified index.

s := File with("test.json") openForReading contents
json := doString(s)

Now there's the money. We loaded up the JSON document and slapped its contents in a string. We then essentially eval it with "doString" letting the Io interpreter do the dirty work.

Message Forwarding

In Ruby something called "method_missing" is relied upon to handle methods of arbitrary names at runtime. The Io equivalent is "forward". The following example supplies a simple mechanism to build XML like:

<movies>
  <movie>
    <title>
      The Thing
    </title>
    <genre>
      Horror
    </genre>
  </movie>
</movies>

with code like:

builder := Builder clone
builder movies(
  movie(
    title("The Thing"),
    genre("Horror")
  )
)

Here's the code to make it happen.

Builder := Object clone do(
  depth := 0
)

Builder indent := method(
  depth repeat(
    write("  ")
  )
)

Builder emit := method(
  indent

  call message arguments foreach(arg,
    write(call sender doMessage(arg))
  )

  writeln
)

Builder emitStart := method(
  emit("<", call evalArgs join, ">")
)

Builder emitEnd := method(text,
  emitStart("/", text)
)

# handles messages for non-existent methods.
Builder forward := method(
  emitStart(call message name)
  depth = depth + 1

  call message arguments foreach(arg,
      content := self doMessage(arg)
      if(content type == "Sequence",
        emit(content)
      )
  )

  depth = depth - 1
  emitEnd(call message name)
)

builder := Builder clone
builder movies(
  movie(
    title("The Thing"),
    genre("Horror")
  )
)

Now that's a big example, but let me extract the section of most relevance.

# handles messages for non-existent methods.
Builder forward := method(
  emitStart(call message name)
  depth = depth + 1

  call message arguments foreach(arg,
      content := self doMessage(arg)
      if(content type == "Sequence",
        emit(content)
      )
  )

  depth = depth - 1
  emitEnd(call message name)
)

By implementing a method in the "forward" slot we allow the Builder prototype to accept messages of any name. Our "forward" method then obtains the name of the message sent with "call message name" and handles it accordingly (making a node of that name).
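
For the JavaScripters in the audience, a Proxy with a get trap is about the closest thing JavaScript has to "forward". The sketch below is a loose analogue rather than a translation: JavaScript evaluates arguments eagerly, so each call returns its markup as a string instead of writing it as it goes, and nested tags have to be reached through the builder object.

```javascript
// Indent every line of a block by two spaces.
function indent(text) {
  return text.split('\n').map(function (line) {
    return '  ' + line;
  }).join('\n');
}

// Any property read on the proxy yields a function that wraps its
// arguments in a tag named after that property.
var builder = new Proxy({}, {
  get: function (target, name) {
    var tag = String(name);
    return function () {
      var children = Array.prototype.slice.call(arguments).map(function (child) {
        return indent(String(child));
      });
      return ['<' + tag + '>'].concat(children, ['</' + tag + '>']).join('\n');
    };
  }
});

var xml = builder.movies(
  builder.movie(
    builder.title('The Thing'),
    builder.genre('Horror')
  )
);

console.log(xml);
```

Logging xml prints the same ten-line movies document shown at the top of this section.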

Next Steps

One thing not covered here is the concurrency story, and Io really shines there. Easy coroutines, actors and futures are available to provide refreshing simplicity to what's typically a tricky problem.

There are also reasonable libraries available for common tasks like networking and XML parsing.

Check out the Io Guide for information on these and other topics. Of course Seven Languages in Seven Weeks is wonderful as well.

Tue Sep 06 2011 03:00:00 GMT+0000 (UTC)


The node.js Natural Language Story

In early May of 2011 I started work on natural, a general Natural Language Processing module for node.js. I was loosely basing the idea off of the ever-popular Natural Language Toolkit (NLTK) for Python. I wanted to create a one-stop shop for NLP but for the node.js platform.

I'm excited to see that I'm not the only one with an interest in NLP under node.js. Considering there's no way I can be totally comprehensive with natural it's imperative that the community is hacking away, building a great NLP story for node.

Here I'm going to outline the interesting node NLP projects that I've found so far.

Projects

natural - In some shameless self-promotion I'll list myself first:) Like I mentioned above, natural is a general natural language facility for node.js written by yours truly. Stemming, classification, phonetics, n-grams, tf-idf, WordNet, and some inflection are currently supported.

pos-js - Here's an excellent part of speech tagger by Percy Wegmann and Gerad Suyderhoud. It's a port of Mark Watson's FastTag Part of Speech Tagger for Java which in turn uses Eric Brill's POS ruleset.

glossary - Here's an auto tagger written by Heather Arthur which can extract keywords from text.

reds - a Redis Full-text search implementation by the prolific TJ Holowaychuk.

tfidf - an easy to use term frequency-inverse document frequency library for Node.js by Linus G Thiel of Hansson & Larsson.

Lingo - a general linguistics module by TJ Holowaychuk which does inflection, translation, and some casing.

nlp-node - rule-based NLP tools for node including date extraction and inflection by Spencer (not sure he wants his last name given).

Know of any others? Contact Me!

Help Me!

And finally I'd like to ask for help with natural. I'd love to make it as comprehensive as possible and there are a mountain of algorithms to implement for English alone. Also, I'm interested in supporting algorithms for other languages as well. If you have the capacity and interest let me know.

Sat Aug 20 2011 04:00:00 GMT+0000 (UTC)


Natural Language Processing in node.js with "natural"

Over the last few years I've developed a bit of an interest in natural-language processing. It's never been the focus of my work, but when you're exposed to as many enterprise-class data storage/search systems as I have you have no choice but to absorb some details. Several hobby projects, sometimes involving home-brewed full-text searching, have also popped up requiring at least a cursory understanding of stemming and phonetic algorithms. Another recurring theme in my hobby projects has been classification for custom spam filtering and analyzing twitter sentiment.

In general, accomplishing these goals simply required the use of someone else's hard work, whether it be having Solr/Lucene to stem my corpora at the office, using the Ruby classifier gem to analyze tweets about stocks or using the Python Natural Language Toolkit for... Well, pretty much anything.

Recent months have brought a new platform into my hobby work, node.js, which, while stable, still has maturing to do. Like so many things I work with anymore the need for natural-language facilities arose and I found the pickings pretty slim. I have to be honest. That's *exactly* what I was hoping for; an opportunity to sink my teeth into the algorithms themselves.

Thus I began work on "natural", a module of natural language algorithms for node.js. The idea is loosely based on the Python NLTK in that all algorithms are in the same package, however it will likely never be anywhere near as complete. I'd be lucky for "natural" to ever do 1/2 of what NLTK does without plenty of help. As of version 0.0.17 it has two stemmers (Porter and Lancaster), one classifier (Naive Bayes), two phonetic algorithms (Metaphone and SoundEx) and an inflector.

The strategy was to cast a wide enough net to see how the algorithms might fit together in terms of interface and dependencies first. Making them performant and perfectly accurate is step two, which admittedly will still require some work. At the time of writing "natural" is in version 0.0.17 and everything seems to work (not in an official beta of any kind) but until the version ticks 0.1.0 it's subject to significant internal change. Hopefully the interfaces will stay the same.

With the exception of the Naive Bayes classifier (to which you can supply tokens of your own stemming) all of these algorithms have no real applicability outside of English. This is a problem I'd like to rectify after solidifying a 0.1.0 release and would love to get some more people involved to accomplish it.

Installing

In order to use "natural" you have to install it... naturally. Like most node modules "natural" is packaged up for NPM and can be installed from the command line as such:

npm install natural

If you want to install from source (which can be found here on github), pull it and install the npm from the source directory.

git clone git://github.com/NaturalNode/natural.git
cd natural
npm install .

Stemming

The first class of algorithms I'd like to outline is stemming. As stated above the Lancaster and Porter algorithms are supported as of 0.0.17. Here's a basic example of stemming a word with a Porter Stemmer.

var natural = require('natural'),
    stemmer = natural.PorterStemmer;

var stem = stemmer.stem('stems');
console.log(stem);
stem = stemmer.stem('stemming');
console.log(stem);
stem = stemmer.stem('stemmed');
console.log(stem);
stem = stemmer.stem('stem');
console.log(stem);

Above I simply required-up the main "natural" module and grabbed the PorterStemmer sub-module from within. Calling the "stem" function takes an arbitrary string and returns the stem. The above code returns the following output:

stem
stem
stem
stem

For convenience stemmers can patch String with methods to simplify the process by calling the attach method. String objects will then have a stem method.

stemmer.attach();
stem = 'stemming'.stem();
console.log(stem);

Generally you'd be interested in stemming an entire corpus. The attach method provides a tokenizeAndStem method to accomplish this. It breaks the owning string up into an array of strings, one for each word, and stems them all. For example:

var stems = 'stems returned'.tokenizeAndStem();
console.log(stems);

produces the output:

[ 'stem', 'return' ]

Note that the tokenizeAndStem method will omit certain words by default that are considered irrelevant (stop words) from the return array. To instruct the stemmer to not omit stop words pass a true in to tokenizeAndStem for the keepStops parameter. Consider:

console.log('i stemmed words.'.tokenizeAndStem());
console.log('i stemmed words.'.tokenizeAndStem(true));

outputting:

[ 'stem', 'word' ]
[ 'i', 'stem', 'word' ]

All of the code above would also work with a Lancaster stemmer by requiring the LancasterStemmer module instead, like:

var natural = require('natural'),
    stemmer = natural.LancasterStemmer;

Of course the actual stems produced could be different depending on the algorithm chosen.

Phonetics

Phonetic algorithms are also provided to determine what words sound like and compare them accordingly. The old (and I mean old... like 1918 old) SoundEx and the more modern Metaphone algorithm are supported as of 0.0.17.

The following example compares the string "phonetics" and the intentional misspelling "fonetix" and determines they sound alike according to the Metaphone algorithm.

var natural = require('natural'),
    phonetic = natural.Metaphone;

var wordA = 'phonetics';
var wordB = 'fonetix';

if(phonetic.compare(wordA, wordB))
    console.log('they sound alike!');

The raw code the phonetic algorithm produces can be retrieved with the process method:

var phoneticCode = phonetic.process('phonetics');
console.log(phoneticCode);

resulting in:

FNTKS

Like the stemming implementations the phonetic modules have an attach method that patches String with shortcut methods, most notably soundsLike for comparison:

phonetic.attach();

if(wordA.soundsLike(wordB))
    console.log('they sound alike!');

attach also patches in phonetics and tokenizeAndPhoneticize methods to retrieve the phonetic code for a single word and an entire corpus respectively.

console.log('phonetics'.phonetics());
console.log('phonetics rock'.tokenizeAndPhoneticize());

which outputs:

FNTKS
[ 'FNTKS', 'RK' ]

The above code could also use SoundEx by substituting the following in for the require.

var natural = require('natural'),
    phonetic = natural.SoundEx;
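
To give a feel for what a phonetic code is, here's a compact sketch of the classic American Soundex rules in plain JavaScript. The soundex function is written for this post; whether natural's SoundEx module matches it code for code is an assumption on my part.

```javascript
// Classic American Soundex: first letter plus three digits.
function soundex(word) {
  var codes = {
    b: 1, f: 1, p: 1, v: 1,
    c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
    d: 3, t: 3,
    l: 4,
    m: 5, n: 5,
    r: 6
  };
  var letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (!letters) {
    return '';
  }

  var result = letters[0].toUpperCase();
  var previous = codes[letters[0]] || 0;

  for (var i = 1; i < letters.length && result.length < 4; i++) {
    var ch = letters[i];
    // h and w are transparent: they do not separate equal codes.
    if (ch === 'h' || ch === 'w') {
      continue;
    }
    var code = codes[ch] || 0; // vowels and y map to 0
    if (code !== 0 && code !== previous) {
      result += code;
    }
    previous = code; // a vowel resets the run, so a repeated code counts again
  }

  // Pad short codes with zeros.
  return (result + '000').slice(0, 4);
}

console.log(soundex('Robert'));   // R163
console.log(soundex('Rupert'));   // R163
console.log(soundex('Ashcraft')); // A261
console.log(soundex('Tymczak'));  // T522
```

Two words "sound alike" under this scheme when their codes are equal, as with Robert and Rupert.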

Inflector

Basic inflectors are in place to convert nouns between plural and singular forms and to turn integers into string counters (e.g. '1st', '2nd', '3rd', '4th', etc.).

The following example converts the word "radius" into its plural form "radii".

var natural = require('natural'),
    nounInflector = new natural.NounInflector();

var plural = nounInflector.pluralize('radius');
console.log(plural);

Singularization follows the same pattern, as is illustrated in the following example which converts the word "beers" to its singular form, "beer".

var singular = nounInflector.singularize('beers');
console.log(singular);

Just like the stemming and phonetic modules an attach method is provided to patch String with shortcut methods.

nounInflector.attach();
console.log('radius'.pluralizeNoun());
console.log('beers'.singularizeNoun()); 

A NounInflector instance can do custom conversion if you provide expressions via the addPlural and addSingular methods. Because these conversions aren't always symmetric (sometimes more patterns may be required to singularize forms than pluralize) there needn't be a one-to-one relationship between addPlural and addSingular calls.

nounInflector.addPlural(/(code|ware)/i, '$1z');
nounInflector.addSingular(/(code|ware)z/i, '$1');

console.log('code'.pluralizeNoun());
console.log('ware'.pluralizeNoun());

console.log('codez'.singularizeNoun());
console.log('warez'.singularizeNoun());

which would result in:

codez
warez
code
ware
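
Custom rules like these boil down to an ordered list of pattern/replacement pairs where the first match wins. Here's a stripped-down sketch of that idea. TinyInflector and its naive fallbacks are hypothetical and far cruder than what natural actually ships.

```javascript
// Hypothetical rule-based inflector: newest rules are checked first.
function TinyInflector() {
  this.pluralRules = [];
  this.singularRules = [];
}

TinyInflector.prototype.addPlural = function (pattern, replacement) {
  this.pluralRules.unshift({ pattern: pattern, replacement: replacement });
};

TinyInflector.prototype.addSingular = function (pattern, replacement) {
  this.singularRules.unshift({ pattern: pattern, replacement: replacement });
};

// Apply the first matching rule, or the fallback when nothing matches.
function applyRules(rules, word, fallback) {
  for (var i = 0; i < rules.length; i++) {
    if (rules[i].pattern.test(word)) {
      return word.replace(rules[i].pattern, rules[i].replacement);
    }
  }
  return fallback(word);
}

TinyInflector.prototype.pluralize = function (word) {
  return applyRules(this.pluralRules, word, function (w) {
    return w + 's'; // naive default
  });
};

TinyInflector.prototype.singularize = function (word) {
  return applyRules(this.singularRules, word, function (w) {
    return w.replace(/s$/i, ''); // naive default
  });
};

var tiny = new TinyInflector();
tiny.addPlural(/(code|ware)/i, '$1z');
tiny.addSingular(/(code|ware)z/i, '$1');

console.log(tiny.pluralize('code'));    // codez
console.log(tiny.singularize('warez')); // ware
console.log(tiny.pluralize('beer'));    // beers (fallback)
```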

Here's an example of using the CountInflector module to produce string counters for integers.

var natural = require('natural'),
    countInflector = natural.CountInflector;

console.log(countInflector.nth(1));
console.log(countInflector.nth(2));
console.log(countInflector.nth(3));
console.log(countInflector.nth(4));
console.log(countInflector.nth(10));
console.log(countInflector.nth(11));
console.log(countInflector.nth(12));
console.log(countInflector.nth(13));
console.log(countInflector.nth(100));
console.log(countInflector.nth(101));
console.log(countInflector.nth(102));
console.log(countInflector.nth(103));
console.log(countInflector.nth(110));
console.log(countInflector.nth(111));
console.log(countInflector.nth(112));
console.log(countInflector.nth(113));

producing:

1st
2nd
3rd
4th
10th
11th
12th
13th
100th
101st
102nd
103rd
110th
111th
112th
113th
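
The rule behind that output is short enough to sketch: numbers ending in 11, 12 or 13 always take "th", and everything else goes by the last digit. The ordinal function below is my own illustration for non-negative integers, not the CountInflector source.

```javascript
// Ordinal suffix for a non-negative integer.
function ordinal(n) {
  var lastTwo = n % 100;
  var last = n % 10;

  // The "teens" are irregular: 11th, 12th, 13th, 111th, ...
  if (lastTwo >= 11 && lastTwo <= 13) {
    return n + 'th';
  }
  if (last === 1) {
    return n + 'st';
  }
  if (last === 2) {
    return n + 'nd';
  }
  if (last === 3) {
    return n + 'rd';
  }
  return n + 'th';
}

console.log([1, 2, 3, 4, 11, 12, 13, 101, 111].map(ordinal).join(' '));
// 1st 2nd 3rd 4th 11th 12th 13th 101st 111th
```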

Classification

At the moment classification is supported only by the Naive Bayes algorithm. There are two basic steps involved in using the classifier: training and classification.

The following example requires-up the classifier and trains it with data. Each sample is registered with the addDocument method, which accepts the sample corpus and the name of its classification; the train method then trains the classifier on everything that was added.

var natural = require('natural'),
classifier = new natural.BayesClassifier();
classifier.addDocument("my unit-tests failed.", 'software');
classifier.addDocument("tried the program, but it was buggy.", 'software');
classifier.addDocument("the drive has a 2TB capacity.", 'hardware');
classifier.addDocument("i need a new power supply.", 'hardware');
classifier.train();

By default the classifier will tokenize the corpus and stem it with a LancasterStemmer. You can use a PorterStemmer by passing it in to the BayesClassifier constructor as such:

var natural = require('natural'),
stemmer = natural.PorterStemmer,
classifier = new natural.BayesClassifier(stemmer);

With the classifier trained it can now classify documents via the classify method:

console.log(classifier.classify('did the tests pass?'));
console.log(classifier.classify('did you buy a new drive?'));

resulting in the output:

software
hardware

Similarly the classifier can be trained on arrays rather than strings, bypassing tokenization and stemming. This allows the consumer to perform custom tokenization and stemming if any at all. This is especially useful in a non-natural language scenario.

classifier.addDocument( ['unit', 'test'], 'software');
classifier.addDocument( ['bug', 'program'], 'software');
classifier.addDocument(['drive', 'capacity'], 'hardware');
classifier.addDocument(['power', 'supply'], 'hardware');

classifier.train();
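
If you're curious what happens with those token arrays, here's a bare-bones multinomial Naive Bayes with add-one smoothing in plain JavaScript. TinyBayes is a teaching sketch, not natural's implementation, and it has no separate train step because the counts are updated as documents are added.

```javascript
// Hypothetical minimal Naive Bayes over arrays of tokens.
function TinyBayes() {
  this.docCounts = Object.create(null);   // label -> number of documents
  this.tokenCounts = Object.create(null); // label -> token -> occurrences
  this.tokenTotals = Object.create(null); // label -> total tokens seen
  this.vocabulary = Object.create(null);  // every distinct token
  this.totalDocs = 0;
}

TinyBayes.prototype.addDocument = function (tokens, label) {
  if (!(label in this.docCounts)) {
    this.docCounts[label] = 0;
    this.tokenCounts[label] = Object.create(null);
    this.tokenTotals[label] = 0;
  }
  this.docCounts[label] += 1;
  this.totalDocs += 1;
  tokens.forEach(function (token) {
    this.tokenCounts[label][token] = (this.tokenCounts[label][token] || 0) + 1;
    this.tokenTotals[label] += 1;
    this.vocabulary[token] = true;
  }, this);
};

TinyBayes.prototype.classify = function (tokens) {
  var vocabSize = Object.keys(this.vocabulary).length;
  var best = null;
  var bestScore = -Infinity;

  Object.keys(this.docCounts).forEach(function (label) {
    // log P(label) + sum of log P(token | label), with add-one smoothing
    var score = Math.log(this.docCounts[label] / this.totalDocs);
    tokens.forEach(function (token) {
      var count = this.tokenCounts[label][token] || 0;
      score += Math.log((count + 1) / (this.tokenTotals[label] + vocabSize));
    }, this);
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }, this);

  return best;
};

var tinyClassifier = new TinyBayes();
tinyClassifier.addDocument(['unit', 'test'], 'software');
tinyClassifier.addDocument(['bug', 'program'], 'software');
tinyClassifier.addDocument(['drive', 'capacity'], 'hardware');
tinyClassifier.addDocument(['power', 'supply'], 'hardware');

console.log(tinyClassifier.classify(['test', 'bug']));     // software
console.log(tinyClassifier.classify(['drive', 'supply'])); // hardware
```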

It's possible to persist the results of a training via the save method:

var natural = require('natural'),
classifier = new natural.BayesClassifier();

classifier.addDocument( ['unit', 'test'], 'software');
classifier.addDocument( ['bug', 'program'], 'software');
classifier.addDocument(['drive', 'capacity'], 'hardware');
classifier.addDocument(['power', 'supply'], 'hardware');

classifier.train();

classifier.save('classifier.json', function(err, classifier) {
    // the classifier is saved to the classifier.json file!
 });

The training could then be recalled later with the load method:

var natural = require('natural'),
    classifier = new natural.BayesClassifier();

natural.BayesClassifier.load('classifier.json', null, function(err, classifier) {
    console.log(classifier.classify('did the tests pass?'));
});

Conclusion

This concludes the current state of "natural". Like I said in the introduction, there are certainly potential improvements in terms of both accuracy and performance. Now that 0.0.17 has been released features are frozen while I focus on improving both for 0.1.0.

Post-0.1.0 I intend to make "natural" more complete; slowly starting to match the NLTK with additional algorithms of all classifications and hopefully for additional languages. For that I humbly ask assistance:)

Sun May 22 2011 22:29:06 GMT+0000 (UTC)
