Introduction

wink-pos-tagger

English Part-of-speech (POS) tagger


Perform part-of-speech tagging of English sentences using wink-pos-tagger. It is part of wink, a growing family of high-quality packages for Statistical Analysis, Natural Language Processing and Machine Learning in Node.js.

Installation

Use npm to install:

npm install wink-pos-tagger --save

Getting Started

The code below illustrates the steps required to POS-tag a sentence:

// Load wink-pos-tagger.
var posTagger = require( 'wink-pos-tagger' );

// Create an instance of the pos tagger.
var tagger = posTagger();

// Tag the sentence using the tagSentence() API.
tagger.tagSentence( 'He is trying to fish for fish in the lake.' );
// -> [ { value: 'He', tag: 'word', normal: 'he', pos: 'PRP' },
//      { value: 'is', tag: 'word', normal: 'is', pos: 'VBZ', lemma: 'be' },
//      { value: 'trying', tag: 'word', normal: 'trying', pos: 'VBG', lemma: 'try' },
//      { value: 'to', tag: 'word', normal: 'to', pos: 'TO' },
//      { value: 'fish', tag: 'word', normal: 'fish', pos: 'VB', lemma: 'fish' },
//      { value: 'for', tag: 'word', normal: 'for', pos: 'IN' },
//      { value: 'fish', tag: 'word', normal: 'fish', pos: 'NN', lemma: 'fish' },
//      { value: 'in', tag: 'word', normal: 'in', pos: 'IN' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'lake', tag: 'word', normal: 'lake', pos: 'NN', lemma: 'lake' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]

Notice how the two instances of the word "fish" are tagged: first as a verb (VB), then as a noun (NN).
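Because every tagged token is a plain object with a `pos` field, downstream code can select words by part of speech. The sketch below copies the output shown above and uses a hypothetical helper, `tokensWithPos`, that matches Penn Treebank tag prefixes (for example, `VB` covers `VB`, `VBG` and `VBZ`). It does not call wink-pos-tagger itself.

```javascript
// Tokens as returned by tagSentence( 'He is trying to fish for fish in the lake.' ),
// copied from the example above.
const tokens = [
  { value: 'He', tag: 'word', normal: 'he', pos: 'PRP' },
  { value: 'is', tag: 'word', normal: 'is', pos: 'VBZ', lemma: 'be' },
  { value: 'trying', tag: 'word', normal: 'trying', pos: 'VBG', lemma: 'try' },
  { value: 'to', tag: 'word', normal: 'to', pos: 'TO' },
  { value: 'fish', tag: 'word', normal: 'fish', pos: 'VB', lemma: 'fish' },
  { value: 'for', tag: 'word', normal: 'for', pos: 'IN' },
  { value: 'fish', tag: 'word', normal: 'fish', pos: 'NN', lemma: 'fish' },
  { value: 'in', tag: 'word', normal: 'in', pos: 'IN' },
  { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
  { value: 'lake', tag: 'word', normal: 'lake', pos: 'NN', lemma: 'lake' },
  { value: '.', tag: 'punctuation', normal: '.', pos: '.' }
];

// Hypothetical helper: keep only the tokens whose POS tag starts with `prefix`.
function tokensWithPos( tokens, prefix ) {
  return tokens.filter( ( t ) => t.pos.startsWith( prefix ) );
}

const verbs = tokensWithPos( tokens, 'VB' ).map( ( t ) => t.value );
const nouns = tokensWithPos( tokens, 'NN' ).map( ( t ) => t.value );

console.log( verbs ); // [ 'is', 'trying', 'fish' ]
console.log( nouns ); // [ 'fish', 'lake' ]
```

Note that a plain `NN` prefix also matches proper nouns (`NNP`, `NNPS`); use an exact comparison if you need common nouns only.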

Documentation

Check out the pos tagger API documentation to learn more.

Need Help?

If you spot a bug that has not yet been reported, raise a new issue, or consider fixing it and sending a pull request.

Copyright & License

wink-pos-tagger is copyright 2017-18 GRAYPE Systems Private Limited.

It is licensed under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3 of the License.

Creating an Instance

posTagger

Creates an instance of wink-pos-tagger.

posTagger(): methods
Returns
methods: object containing the set of API methods for POS tagging (tag) and for updating the lexicon (updateLexicon).
Example
// Load wink-pos-tagger.
var tagger = require( 'wink-pos-tagger' );
// Create your instance of wink-pos-tagger.
var myTagger = tagger();

API Methods

defineConfig

This API has no effect; it is maintained only for compatibility purposes. wink-pos-tagger now always adds lemma and normal forms. Note that lemmas are added only for nouns (excluding proper nouns), verbs and adjectives.

defineConfig(): object
Returns
object: always { lemma: true, normal: true }.
Example
// This call has no effect:
myTagger.defineConfig( { lemma: false } );
// -> { lemma: true, normal: true }
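Since lemmas are attached only to some tokens, code that needs a base form for every word can fall back to `normal` when `lemma` is absent. This is a minimal sketch using a hypothetical helper, `baseForm`, applied to the output of the `'I will bear all the expenses.'` example from the tagSentence section; it does not call the library.

```javascript
// Tokens as returned by tagSentence( 'I will bear all the expenses.' ),
// copied from the tagSentence example.
const tokens = [
  { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
  { value: 'will', tag: 'word', normal: 'will', pos: 'MD', lemma: 'will' },
  { value: 'bear', tag: 'word', normal: 'bear', pos: 'VB', lemma: 'bear' },
  { value: 'all', tag: 'word', normal: 'all', pos: 'PDT' },
  { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
  { value: 'expenses', tag: 'word', normal: 'expenses', pos: 'NNS', lemma: 'expense' },
  { value: '.', tag: 'punctuation', normal: '.', pos: '.' }
];

// Hypothetical helper: prefer the lemma, otherwise use the normalized form.
function baseForm( token ) {
  return token.lemma !== undefined ? token.lemma : token.normal;
}

// Keep words only (drop punctuation), then map each to its base form.
const forms = tokens
  .filter( ( t ) => t.tag === 'word' )
  .map( baseForm );

console.log( forms ); // [ 'i', 'will', 'bear', 'all', 'the', 'expense' ]
```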

tagSentence

Tags each token of the input sentence with its POS.

tagSentence(sentence: string): Array<object>
Parameters
sentence (string) — to be pos tagged.
Returns
Array<object>: pos tagged tokens.
Throws
  • Error: if sentence is not a valid string.
Example
myTagger.tagSentence( 'A bear just crossed the road.' );
// -> [ { value: 'A', tag: 'word', normal: 'a', pos: 'DT' },
//      { value: 'bear', tag: 'word', normal: 'bear', pos: 'NN', lemma: 'bear' },
//      { value: 'just', tag: 'word', normal: 'just', pos: 'RB' },
//      { value: 'crossed', tag: 'word', normal: 'crossed', pos: 'VBD', lemma: 'cross' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'road', tag: 'word', normal: 'road', pos: 'NN', lemma: 'road' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]
//
//
myTagger.tagSentence( 'I will bear all the expenses.' );
// -> [ { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'will', tag: 'word', normal: 'will', pos: 'MD', lemma: 'will' },
//      { value: 'bear', tag: 'word', normal: 'bear', pos: 'VB', lemma: 'bear' },
//      { value: 'all', tag: 'word', normal: 'all', pos: 'PDT' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'expenses', tag: 'word', normal: 'expenses', pos: 'NNS', lemma: 'expense' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]
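Because tagSentence throws when its input is not a valid string, callers handling untrusted input may prefer to catch the error. The sketch below wraps any tagger in a hypothetical `tryTagSentence` function; `stubTagger` is a stand-in that only mimics the documented throw behaviour, and its error message is invented for illustration (the library's actual message may differ).

```javascript
// Hypothetical wrapper: report success or failure instead of throwing.
function tryTagSentence( tagger, sentence ) {
  try {
    return { ok: true, tokens: tagger.tagSentence( sentence ) };
  } catch ( err ) {
    return { ok: false, error: err.message };
  }
}

// Stand-in for a real instance: throws on non-strings, returns no tokens.
const stubTagger = {
  tagSentence( sentence ) {
    if ( typeof sentence !== 'string' ) {
      throw new Error( 'sentence must be a string' );
    }
    return [];
  }
};

console.log( tryTagSentence( stubTagger, 'A bear just crossed the road.' ).ok ); // true
console.log( tryTagSentence( stubTagger, 42 ) ); // { ok: false, error: 'sentence must be a string' }
```

With a real instance you would pass `myTagger` instead of `stubTagger`.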

tag

Tags the input tokens with their pos.

tag(tokens: Array<object>): Array<object>
Parameters
tokens (Array<object>) — to be pos tagged. They must be an array of objects that follows the wink-tokenizer standard.
Returns
Array<object>: pos tagged tokens.
Example
// Get the `tokenize` method from an instance of `wink-tokenizer`.
var tokenize = require( 'wink-tokenizer' )().tokenize;
// Tag the tokenized sentence.
myTagger.tag( tokenize( 'I ate the entire pizza as I was feeling hungry.' ) );
// -> [ { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'ate', tag: 'word', normal: 'ate', pos: 'VBD', lemma: 'eat' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'entire', tag: 'word', normal: 'entire', pos: 'JJ', lemma: 'entire' },
//      { value: 'pizza', tag: 'word', normal: 'pizza', pos: 'NN', lemma: 'pizza' },
//      { value: 'as', tag: 'word', normal: 'as', pos: 'IN' },
//      { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'was', tag: 'word', normal: 'was', pos: 'VBD', lemma: 'be' },
//      { value: 'feeling', tag: 'word', normal: 'feeling', pos: 'VBG', lemma: 'feel' },
//      { value: 'hungry', tag: 'word', normal: 'hungry', pos: 'JJ', lemma: 'hungry' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]
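To make the expected input shape concrete, the outputs above suggest that each token carries at least a `value` and a `tag` such as `'word'` or `'punctuation'`. The hypothetical `toyTokenize` below produces objects of that shape. It is only an illustration: wink-tokenizer recognizes many more token types, and whether tag() accepts such minimal objects is an assumption, so prefer wink-tokenizer in practice.

```javascript
// Hypothetical toy tokenizer: splits on whitespace and peels off trailing
// punctuation. It emits only 'word' and 'punctuation' tokens.
function toyTokenize( sentence ) {
  const tokens = [];
  for ( const chunk of sentence.split( /\s+/ ) ) {
    if ( chunk === '' ) continue;
    // Group 1: the word part; group 2: any trailing punctuation marks.
    const match = chunk.match( /^(.*?)([.,!?;:]*)$/ );
    if ( match[ 1 ] !== '' ) tokens.push( { value: match[ 1 ], tag: 'word' } );
    for ( const p of match[ 2 ] ) tokens.push( { value: p, tag: 'punctuation' } );
  }
  return tokens;
}

const tokens = toyTokenize( 'I was feeling hungry.' );
console.log( tokens.length ); // 5
console.log( tokens[ 4 ] );   // { value: '.', tag: 'punctuation' }
```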

updateLexicon

Updates the internal lexicon using the input lexicon. If a word is already present in the internal lexicon, its POS is updated with the new value; otherwise the word/POS pair is added.

updateLexicon(lexicon: object): undefined
Parameters
lexicon (object) — containing word/pos pairs to be added to or replaced in the existing lexicon. The pos should be an array of POS tags, with the most frequently used POS first. The word is normalized before the internal lexicon is updated.
Returns
undefined: Nothing!
Throws
  • Error: if lexicon is not a valid JS object.
Example
myTagger.updateLexicon( { Obama: [ 'NNP' ] } );
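The replace-or-add behaviour described above can be modelled with a `Map`. This is a hypothetical model, not the library's implementation: `updateLexiconModel` and the starting entry for "bear" are invented for illustration, and normalization is assumed to be lower-casing.

```javascript
// Hypothetical model of the documented updateLexicon semantics.
function updateLexiconModel( internal, lexicon ) {
  // Mirror the documented error for input that is not a valid JS object.
  if ( lexicon === null || typeof lexicon !== 'object' || Array.isArray( lexicon ) ) {
    throw new Error( 'lexicon must be a plain object' );
  }
  for ( const [ word, tags ] of Object.entries( lexicon ) ) {
    // Assumption: normalization is modelled as lower-casing.
    // Existing words are replaced; new words are added.
    internal.set( word.toLowerCase(), tags.slice() );
  }
}

const internal = new Map( [ [ 'bear', [ 'NN', 'VB' ] ] ] );
updateLexiconModel( internal, { Bear: [ 'VB', 'NN' ], Obama: [ 'NNP' ] } );

console.log( internal.get( 'bear' ) );  // [ 'VB', 'NN' ]
console.log( internal.get( 'obama' ) ); // [ 'NNP' ]
```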