Tagger

Tagger class

Methods

defineConfig

defineConfig() → {object}

This API has no effect and is retained only for backward compatibility. The wink-tokenizer will now always add lemma and normal forms. Note that lemmas are added only for nouns (excluding proper nouns), verbs, and adjectives.

Example
// This call has no effect:
myTagger.defineConfig( { lemma: false } );
// -> { lemma: true, normal: true }
Returns

Always { lemma: true, normal: true }.

Type
object

tag

tag(tokens) → {Array.<object>}

Tags the input tokens with their part of speech (POS). It is also available under the alias tagTokens().

To POS tag a sentence directly, use the tagSentence API instead.

Example
// Get the `tokenize` method from an instance of `wink-tokenizer`.
var tokenize = require( 'wink-tokenizer' )().tokenize;
// Tag the tokenized sentence.
myTagger.tag( tokenize( 'I ate the entire pizza as I was feeling hungry.' ) );
// -> [ { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'ate', tag: 'word', normal: 'ate', pos: 'VBD', lemma: 'eat' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'entire', tag: 'word', normal: 'entire', pos: 'JJ', lemma: 'entire' },
//      { value: 'pizza', tag: 'word', normal: 'pizza', pos: 'NN', lemma: 'pizza' },
//      { value: 'as', tag: 'word', normal: 'as', pos: 'IN' },
//      { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'was', tag: 'word', normal: 'was', pos: 'VBD', lemma: 'be' },
//      { value: 'feeling', tag: 'word', normal: 'feeling', pos: 'VBG', lemma: 'feel' },
//      { value: 'hungry', tag: 'word', normal: 'hungry', pos: 'JJ', lemma: 'hungry' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]
Parameters
Name Type Description
tokens Array.<object> Tokens to be POS tagged: an array of objects that must follow the wink-tokenizer standard.

Returns

The POS-tagged tokens.

Type
Array.<object>
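
The tagged tokens are plain objects, so they can be post-processed with ordinary array methods. As a minimal sketch, the hypothetical helper `lemmasOf` below (it is not part of wink-pos-tagger) reduces a tagged array to one string per word token, preferring `lemma` and falling back to `normal` when no lemma is present. The sample data is a shortened copy of the example output above, so the snippet runs without the library installed.

```javascript
// Hypothetical helper, not part of wink-pos-tagger: collects the lemma of
// every word token, falling back to the normal form when no lemma exists.
function lemmasOf( taggedTokens ) {
  return taggedTokens
    .filter( function ( t ) { return t.tag === 'word'; } )
    .map( function ( t ) { return t.lemma || t.normal; } );
}

// Shortened copy of the `tag()` example output.
var tagged = [
  { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
  { value: 'ate', tag: 'word', normal: 'ate', pos: 'VBD', lemma: 'eat' },
  { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
  { value: 'pizza', tag: 'word', normal: 'pizza', pos: 'NN', lemma: 'pizza' },
  { value: '.', tag: 'punctuation', normal: '.', pos: '.' }
];

console.log( lemmasOf( tagged ) );
// -> [ 'i', 'eat', 'the', 'pizza' ]
```

Filtering on `tag === 'word'` drops punctuation before the lemma lookup.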

tagRawTokens

tagRawTokens(rawTokens) → {Array.<object>}

Tags the raw tokens with their part of speech (POS). Note that it only categorizes each token into one of three categories: (a) word, (b) punctuation, or (c) number.

To POS tag a sentence directly, use the tagSentence API instead.

Example
var rawTokens = [ 'I', 'ate', 'the', 'entire', 'pizza', 'as', 'I', 'was', 'feeling', 'hungry', '.' ];
// Tag the raw tokens.
myTagger.tagRawTokens( rawTokens );
// -> [ { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'ate', tag: 'word', normal: 'ate', pos: 'VBD', lemma: 'eat' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'entire', tag: 'word', normal: 'entire', pos: 'JJ', lemma: 'entire' },
//      { value: 'pizza', tag: 'word', normal: 'pizza', pos: 'NN', lemma: 'pizza' },
//      { value: 'as', tag: 'word', normal: 'as', pos: 'IN' },
//      { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'was', tag: 'word', normal: 'was', pos: 'VBD', lemma: 'be' },
//      { value: 'feeling', tag: 'word', normal: 'feeling', pos: 'VBG', lemma: 'feel' },
//      { value: 'hungry', tag: 'word', normal: 'hungry', pos: 'JJ', lemma: 'hungry' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]
Parameters
Name Type Description
rawTokens Array.<string> Tokens to be POS tagged: a simple array of strings.

Returns

The POS-tagged tokens.

Type
Array.<object>
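
To make the three-way categorization concrete, here is a rough sketch of how a raw string could be sorted into word, punctuation, or number. The function `categorize` and its regular expressions are assumptions made for illustration only; the rules wink-pos-tagger actually applies are not described here and may differ.

```javascript
// Illustrative only: NOT the library's implementation. The patterns below
// are assumptions chosen to show a word/punctuation/number split.
function categorize( rawToken ) {
  // Optional sign, digits, optional decimal part.
  if ( /^[-+]?\d+(\.\d+)?$/.test( rawToken ) ) return 'number';
  // One or more characters that are neither word characters nor whitespace.
  if ( /^[^\w\s]+$/.test( rawToken ) ) return 'punctuation';
  return 'word';
}

var rawTokens = [ 'I', 'ate', '2', 'pizzas', '.' ];
var categorized = rawTokens.map( function ( t ) {
  return { value: t, tag: categorize( t ) };
} );

console.log( categorized.map( function ( t ) { return t.tag; } ) );
// -> [ 'word', 'word', 'number', 'word', 'punctuation' ]
```

Because only these three categories are available, inputs that a full tokenizer would classify more finely all collapse into one of them.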

tagSentence

tagSentence(sentence) → {Array.<object>}

Tokenizes the input sentence and tags each token with its part of speech (POS).

Example
myTagger.tagSentence( 'A bear just crossed the road.' );
// -> [ { value: 'A', tag: 'word', normal: 'a', pos: 'DT' },
//      { value: 'bear', tag: 'word', normal: 'bear', pos: 'NN', lemma: 'bear' },
//      { value: 'just', tag: 'word', normal: 'just', pos: 'RB' },
//      { value: 'crossed', tag: 'word', normal: 'crossed', pos: 'VBD', lemma: 'cross' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'road', tag: 'word', normal: 'road', pos: 'NN', lemma: 'road' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]
//
//
myTagger.tagSentence( 'I will bear all the expenses.' );
// -> [ { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'will', tag: 'word', normal: 'will', pos: 'MD', lemma: 'will' },
//      { value: 'bear', tag: 'word', normal: 'bear', pos: 'VB', lemma: 'bear' },
//      { value: 'all', tag: 'word', normal: 'all', pos: 'PDT' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'expenses', tag: 'word', normal: 'expenses', pos: 'NNS', lemma: 'expense' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]
Parameters
Name Type Description
sentence string The sentence to be POS tagged.

Throws

if sentence is not a valid string.

Type
Error
Returns

The POS-tagged tokens.

Type
Array.<object>
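
The two examples above tag the same word, "bear", differently depending on context. The sketch below makes that comparison explicit with a hypothetical helper `posOf` (not part of wink-pos-tagger) that looks a word up in tagged output. It assumes `normal` holds the lower-cased form, as in the examples, and uses shortened copies of the outputs above so that it runs on its own.

```javascript
// Hypothetical helper, not part of wink-pos-tagger: returns the POS tags
// assigned to `word` in an array of tagged tokens.
function posOf( taggedTokens, word ) {
  // Assumption: `normal` is the lower-cased token, as in the examples.
  var target = word.toLowerCase();
  return taggedTokens
    .filter( function ( t ) { return t.normal === target; } )
    .map( function ( t ) { return t.pos; } );
}

// Shortened copies of the two `tagSentence()` example outputs.
var nounUse = [
  { value: 'A', tag: 'word', normal: 'a', pos: 'DT' },
  { value: 'bear', tag: 'word', normal: 'bear', pos: 'NN', lemma: 'bear' }
];
var verbUse = [
  { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
  { value: 'will', tag: 'word', normal: 'will', pos: 'MD', lemma: 'will' },
  { value: 'bear', tag: 'word', normal: 'bear', pos: 'VB', lemma: 'bear' }
];

console.log( posOf( nounUse, 'bear' ) ); // -> [ 'NN' ]
console.log( posOf( verbUse, 'Bear' ) ); // -> [ 'VB' ]
```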

updateLexicon

updateLexicon(lexicon) → {undefined}

Updates the internal lexicon using the input lexicon. If a word/pos pair is found in the internal lexicon, its value is updated with the new pos; otherwise it is added.

Example
myTagger.updateLexicon( { Obama: [ 'NNP' ] } );
Parameters
Name Type Description
lexicon object An object containing word/pos pairs to be added to or replaced in the existing lexicon. Each pos should be an array of POS tags, with the most frequently used tag first. The word is normalized before the internal lexicon is updated.

Throws

if lexicon is not a valid JS object.

Type
Error
Returns

Nothing!

Type
undefined
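
The add-or-replace behaviour described above can be pictured with a plain object standing in for the internal lexicon. Everything in this sketch is hypothetical: `internalLexicon` and `updateLexiconSketch` are illustrative names, normalization is approximated by lower-casing, and the validity check is a guess at what "a valid JS object" means. The library's real data structure and rules may differ.

```javascript
// Hypothetical stand-in for the tagger's internal lexicon.
var internalLexicon = { bear: [ 'NN', 'VB' ] };

// Illustrative sketch of add-or-replace semantics; not the library's code.
function updateLexiconSketch( target, lexicon ) {
  if ( lexicon === null || typeof lexicon !== 'object' || Array.isArray( lexicon ) ) {
    throw new Error( 'lexicon must be a plain JS object' );
  }
  Object.keys( lexicon ).forEach( function ( word ) {
    // Assumption: normalization is approximated here by lower-casing.
    target[ word.toLowerCase() ] = lexicon[ word ].slice();
  } );
}

updateLexiconSketch( internalLexicon, { Obama: [ 'NNP' ], bear: [ 'VB', 'NN' ] } );
console.log( internalLexicon );
// -> { bear: [ 'VB', 'NN' ], obama: [ 'NNP' ] }
```

The existing entry for "bear" is replaced wholesale, while "obama" is added as a new key. Like the documented API, the sketch returns nothing.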