Chris Umbel

The node.js Natural Language Story

In early May of 2011 I started work on natural, a general Natural Language Processing module for node.js. I loosely based the idea on the ever-popular Natural Language Toolkit (NLTK) for Python. I wanted to create a one-stop shop for NLP, but for the node.js platform.

I'm excited to see that I'm not the only one with an interest in NLP under node.js. Since there's no way I can be totally comprehensive with natural, it's imperative that the community keeps hacking away, building a great NLP story for node.

Here I'm going to outline the interesting node NLP projects that I've found so far.


natural - In some shameless self-promotion I'll list myself first :) As I mentioned above, natural is a general natural language facility for node.js written by yours truly. Stemming, classification, phonetics, n-grams, tf-idf, WordNet, and some inflection are currently supported.

pos-js - Here's an excellent part-of-speech tagger by Percy Wegmann and Gerad Suyderhoud. It's a port of Mark Watson's FastTag Part of Speech Tagger for Java, which in turn uses Eric Brill's POS ruleset.

glossary - Here's an auto tagger written by Heather Arthur which can extract keywords from text.

reds - a Redis-backed full-text search implementation by the prolific TJ Holowaychuk.

tfidf - an easy-to-use term frequency-inverse document frequency library for node.js by Linus G Thiel of Hansson & Larsson.

Lingo - a general linguistics module by TJ Holowaychuk which does inflection, translation, and some casing.

nlp-node - rule-based NLP tools for node including date extraction and inflection by Spencer (not sure he wants his last name given).
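Since tf-idf shows up in more than one of these projects, here's a rough sketch of the idea in plain JavaScript. To be clear, this is not the API of natural, tfidf, or any other module listed here; the function names (tokenize, termFrequency, inverseDocumentFrequency, tfidf) are made up for illustration, and the weighting used (raw term count times log(N/df)) is just one common variant.

```javascript
// Minimal tf-idf sketch in plain JavaScript (no external modules).
// All names are illustrative assumptions, not any library's real API.

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Count how many times `term` occurs in a token list.
function termFrequency(term, tokens) {
  return tokens.filter(function (t) { return t === term; }).length;
}

// log(N / df), where df is the number of documents containing the term.
// Returns 0 when no document contains the term.
function inverseDocumentFrequency(term, tokenizedDocs) {
  var df = tokenizedDocs.filter(function (tokens) {
    return tokens.indexOf(term) !== -1;
  }).length;
  return df === 0 ? 0 : Math.log(tokenizedDocs.length / df);
}

// tf-idf weight of `term` in the document at `docIndex`.
function tfidf(term, docIndex, docs) {
  var tokenizedDocs = docs.map(tokenize);
  return termFrequency(term, tokenizedDocs[docIndex]) *
    inverseDocumentFrequency(term, tokenizedDocs);
}

var docs = [
  'node is fun and node is fast',
  'python has nltk',
  'node has natural'
];

// 'node' occurs twice in document 0 and appears in 2 of the 3 documents.
console.log(tfidf('node', 0, docs));
// 'nltk' occurs once in document 1 and appears in 1 of the 3 documents.
console.log(tfidf('nltk', 1, docs));
```

A term that is frequent in one document but rare across the collection gets a high weight, which is why this measure is handy for keyword extraction and search ranking. A real library would tokenize once and cache document frequencies rather than recomputing them on every call.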

Know of any others? Contact Me!

Help Me!

And finally, I'd like to ask for help with natural. I'd love to make it as comprehensive as possible, and there is a mountain of algorithms to implement for English alone. I'm also interested in supporting algorithms for other languages. If you have the capacity and interest, let me know.

Sat Aug 20 2011 04:00:00 GMT+0000 (UTC)
