wink-pos-tagger - Wink JS

Methods

defineConfig

defineConfig() → {object}

This API has no effect and is retained only for backward compatibility. The tagger now always adds lemma and normal forms. Note that lemmas are added only for nouns (excluding proper nouns), verbs and adjectives.

Example

// There will not be any effect:
myTagger.defineConfig( { lemma: false } );
// -> { lemma: true, normal: true }

Returns

Always `{ lemma: true, normal: true }`, regardless of the input.

Type: object

tag

tag(tokens) → {Array.<object>}

Tags the input tokens with their POS. It is also available under the alias tagTokens().

To POS tag a sentence directly, use the tagSentence API instead.

Example

// Get the `tokenize` method from an instance of `wink-tokenizer`.
var tokenize = require( 'wink-tokenizer' )().tokenize;
// Tag the tokenized sentence.
myTagger.tag( tokenize( 'I ate the entire pizza as I was feeling hungry.' ) );
// -> [ { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'ate', tag: 'word', normal: 'ate', pos: 'VBD', lemma: 'eat' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'entire', tag: 'word', normal: 'entire', pos: 'JJ', lemma: 'entire' },
//      { value: 'pizza', tag: 'word', normal: 'pizza', pos: 'NN', lemma: 'pizza' },
//      { value: 'as', tag: 'word', normal: 'as', pos: 'IN' },
//      { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'was', tag: 'word', normal: 'was', pos: 'VBD', lemma: 'be' },
//      { value: 'feeling', tag: 'word', normal: 'feeling', pos: 'VBG', lemma: 'feel' },
//      { value: 'hungry', tag: 'word', normal: 'hungry', pos: 'JJ', lemma: 'hungry' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]

Parameters

Name	Type	Description
tokens	Array.<object>	Tokens to be POS tagged. This must be an array of objects that follow the `wink-tokenizer` standard.

Returns

pos tagged tokens.

Type: Array.<object>
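
Because the tagged tokens are plain objects, they can be post-processed with ordinary array methods. As a minimal sketch, the hypothetical helper `contentLemmas` below (not part of wink-pos-tagger) collects the lemmas of nouns, verbs and adjectives. It runs on a hand-copied subset of the output from the tag example above and assumes only the `pos` and `lemma` properties seen there.

```javascript
// Hypothetical helper, not part of wink-pos-tagger: collect the lemmas of
// tokens whose tag starts with NN (noun), VB (verb) or JJ (adjective).
function contentLemmas( taggedTokens ) {
  return taggedTokens
    .filter( function ( t ) {
      // Tokens without a lemma (e.g. pronouns, punctuation) are skipped.
      return t.lemma !== undefined && /^(NN|VB|JJ)/.test( t.pos );
    } )
    .map( function ( t ) {
      return t.lemma;
    } );
}

// Subset of the tagged output from the example above, copied by hand.
var tagged = [
  { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
  { value: 'ate', tag: 'word', normal: 'ate', pos: 'VBD', lemma: 'eat' },
  { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
  { value: 'entire', tag: 'word', normal: 'entire', pos: 'JJ', lemma: 'entire' },
  { value: 'pizza', tag: 'word', normal: 'pizza', pos: 'NN', lemma: 'pizza' },
  { value: '.', tag: 'punctuation', normal: '.', pos: '.' }
];

console.log( contentLemmas( tagged ) );
// -> [ 'eat', 'entire', 'pizza' ]
```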

tagRawTokens

tagRawTokens(rawTokens) → {Array.<object>}

Tags the raw tokens with their POS. Note that it only classifies each token into one of three categories: (a) word, (b) punctuation, or (c) number.

To POS tag a sentence directly, use the tagSentence API instead.

Example

var rawTokens = [ 'I', 'ate', 'the', 'entire', 'pizza', 'as', 'I', 'was', 'feeling', 'hungry', '.' ];
// Tag the raw tokens.
myTagger.tagRawTokens( rawTokens );
// -> [ { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'ate', tag: 'word', normal: 'ate', pos: 'VBD', lemma: 'eat' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'entire', tag: 'word', normal: 'entire', pos: 'JJ', lemma: 'entire' },
//      { value: 'pizza', tag: 'word', normal: 'pizza', pos: 'NN', lemma: 'pizza' },
//      { value: 'as', tag: 'word', normal: 'as', pos: 'IN' },
//      { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'was', tag: 'word', normal: 'was', pos: 'VBD', lemma: 'be' },
//      { value: 'feeling', tag: 'word', normal: 'feeling', pos: 'VBG', lemma: 'feel' },
//      { value: 'hungry', tag: 'word', normal: 'hungry', pos: 'JJ', lemma: 'hungry' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]

Parameters

Name	Type	Description
rawTokens	Array.<string>	Tokens to be POS tagged, given as a simple array of strings.

Returns

pos tagged tokens.

Type: Array.<object>
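
To give a feel for the coarse three-way categorization that tagRawTokens applies to plain strings, here is a rough sketch. The function `roughCategory` and its regular expressions are assumptions made for illustration; they are not the rules used inside wink-pos-tagger, which may treat edge cases differently.

```javascript
// Illustrative only: these regular expressions are assumptions,
// not the rules used inside wink-pos-tagger.
function roughCategory( rawToken ) {
  // Optional sign, digits, optional decimal part.
  if ( /^[+-]?\d+(\.\d+)?$/.test( rawToken ) ) return 'number';
  // Made up entirely of non-alphanumeric, non-space characters.
  if ( /^[^A-Za-z0-9\s]+$/.test( rawToken ) ) return 'punctuation';
  // Everything else is treated as a word.
  return 'word';
}

var rawTokens = [ 'I', 'ate', '2', 'pizzas', '.' ];
var categorized = rawTokens.map( function ( t ) {
  return { value: t, tag: roughCategory( t ) };
} );

console.log( categorized );
// tags, in order: word, word, number, word, punctuation
```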

tagSentence

tagSentence(sentence) → {Array.<object>}

Tokenizes the input sentence and tags each token with its POS.

Example

myTagger.tagSentence( 'A bear just crossed the road.' );
// -> [ { value: 'A', tag: 'word', normal: 'a', pos: 'DT' },
//      { value: 'bear', tag: 'word', normal: 'bear', pos: 'NN', lemma: 'bear' },
//      { value: 'just', tag: 'word', normal: 'just', pos: 'RB' },
//      { value: 'crossed', tag: 'word', normal: 'crossed', pos: 'VBD', lemma: 'cross' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'road', tag: 'word', normal: 'road', pos: 'NN', lemma: 'road' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]
//
//
myTagger.tagSentence( 'I will bear all the expenses.' );
// -> [ { value: 'I', tag: 'word', normal: 'i', pos: 'PRP' },
//      { value: 'will', tag: 'word', normal: 'will', pos: 'MD', lemma: 'will' },
//      { value: 'bear', tag: 'word', normal: 'bear', pos: 'VB', lemma: 'bear' },
//      { value: 'all', tag: 'word', normal: 'all', pos: 'PDT' },
//      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
//      { value: 'expenses', tag: 'word', normal: 'expenses', pos: 'NNS', lemma: 'expense' },
//      { value: '.', tag: 'punctuation', normal: '.', pos: '.' } ]

Parameters

Name	Type	Description
sentence	string	to be pos tagged.

Throws

if sentence is not a valid string.

Type: Error

Returns

pos tagged tokens.

Type: Array.<object>
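
The two tagSentence examples above show `bear` tagged as a noun in one sentence and as a verb in the other. A compact `value/POS` rendering makes such differences easy to compare side by side. The helper `toSlashTags` below is hypothetical (not part of wink-pos-tagger), and the token arrays are trimmed by hand to the `value` and `pos` properties from those outputs.

```javascript
// Hypothetical helper, not part of wink-pos-tagger:
// render tagged tokens as space-separated "value/POS" pairs.
function toSlashTags( taggedTokens ) {
  return taggedTokens
    .map( function ( t ) {
      return t.value + '/' + t.pos;
    } )
    .join( ' ' );
}

// `value` and `pos` copied by hand from the two outputs above.
var first = [
  { value: 'A', pos: 'DT' }, { value: 'bear', pos: 'NN' },
  { value: 'just', pos: 'RB' }, { value: 'crossed', pos: 'VBD' },
  { value: 'the', pos: 'DT' }, { value: 'road', pos: 'NN' },
  { value: '.', pos: '.' }
];
var second = [
  { value: 'I', pos: 'PRP' }, { value: 'will', pos: 'MD' },
  { value: 'bear', pos: 'VB' }, { value: 'all', pos: 'PDT' },
  { value: 'the', pos: 'DT' }, { value: 'expenses', pos: 'NNS' },
  { value: '.', pos: '.' }
];

console.log( toSlashTags( first ) );
// -> A/DT bear/NN just/RB crossed/VBD the/DT road/NN ./.
console.log( toSlashTags( second ) );
// -> I/PRP will/MD bear/VB all/PDT the/DT expenses/NNS ./.
```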

updateLexicon

updateLexicon(lexicon) → {undefined}

Updates the internal lexicon using the input lexicon. If a word/pos pair is found in the internal lexicon then its value is updated with the new pos; otherwise it is added.

Example

myTagger.updateLexicon( { Obama: [ 'NNP' ] } );

Parameters

Name	Type	Description
lexicon	object	containing `word/pos` pairs to be added to or replaced in the existing lexicon. The `pos` should be an array containing pos tags, with the first one as the most frequently used POS. The `word` is normalized before updating the internal lexicon.

Throws

if lexicon is not a valid JS object.

Type: Error

Returns

Nothing!

Type: undefined
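
The replace-or-add behaviour described for updateLexicon can be pictured with a plain object. This is an illustrative sketch only, not the library's code: `mergeLexicon` is a hypothetical name, and lowercasing stands in for the library's word normalization, which is an assumption here.

```javascript
// Illustrative sketch, not the library's implementation.
// Assumption: normalization is approximated by lowercasing the word.
function mergeLexicon( internal, updates ) {
  if ( updates === null || typeof updates !== 'object' || Array.isArray( updates ) ) {
    throw new Error( 'lexicon must be a plain object' );
  }
  Object.keys( updates ).forEach( function ( word ) {
    // An existing entry is replaced; a missing one is added.
    internal[ word.toLowerCase() ] = updates[ word ];
  } );
}

var internal = { bear: [ 'NN', 'VB' ] };
mergeLexicon( internal, { Obama: [ 'NNP' ], bear: [ 'VB', 'NN' ] } );

console.log( internal );
// -> { bear: [ 'VB', 'NN' ], obama: [ 'NNP' ] }
```

In the sketch, the first element of each array plays the role of the most frequently used POS, matching the convention stated for the `lexicon` parameter.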