wink-nlp-utils - Wink JS

Tokens

Methods

appendBigrams

appendBigrams(tokens) → {Array.<string>}

Generates bigrams from the input tokens and appends them to the input tokens.

Example

appendBigrams( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ 'he',
//      'acted',
//      'decisively',
//      'today',
//      'he_acted',
//      'acted_decisively',
//      'decisively_today' ]

Parameters

Name	Type	Description
tokens	Array.<string>	the input tokens.

Returns

the input tokens appended with their bigrams.

Type: Array.<string>
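
The behaviour shown in the example can be approximated in a few lines of plain JavaScript. This is only a sketch: `appendBigramsSketch` is a hypothetical name, not part of wink-nlp-utils, and it copies the input array rather than modifying it, which is an assumption made for clarity.

```javascript
// Hypothetical re-implementation for illustration; not the library's source.
function appendBigramsSketch(tokens) {
  // Copy the input so the caller's array is left untouched (an assumption).
  const result = tokens.slice();
  // Join each adjacent pair of tokens with an underscore and append it.
  for (let i = 0; i < tokens.length - 1; i += 1) {
    result.push(tokens[i] + '_' + tokens[i + 1]);
  }
  return result;
}

console.log(appendBigramsSketch(['he', 'acted', 'decisively', 'today']));
// -> [ 'he', 'acted', 'decisively', 'today',
//      'he_acted', 'acted_decisively', 'decisively_today' ]
```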

bagOfWords

bagOfWords(tokens, [logCounts], [ifn], [idx]) → {object}

Generates the bag of words from the input tokens. By default it uses the word count as its frequency; if the logCounts parameter is set to true, it uses log2( word count + 1 ) instead. It also has an alias, bow().

Example

bagOfWords( [ 'rain', 'rain', 'go', 'away' ] );
// -> { rain: 2, go: 1, away: 1 }
bow( [ 'rain', 'rain', 'go', 'away' ], true );
// -> { rain: 1.584962500721156, go: 1, away: 1 }

Parameters

Name	Type	Attributes	Default	Description
tokens	Array.<string>			the input tokens.
logCounts	number	<optional>	false	a true value flags the use of `log2( word count + 1 )` instead of just `word count` as frequency.
ifn	function	<optional>		a function to build an index; it is called once for every unique word in `tokens` and receives the word and the `idx` as input arguments. The `build()` function of helper.returnIndexer may be used as `ifn`. If `undefined`, no index is built.
idx	number	<optional>		the index; passed as the second argument to the `ifn` function.

Returns

bag of words from tokens.

Type: object
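
The interplay of `logCounts`, `ifn` and `idx` may be easier to follow in code. The sketch below is a hypothetical re-implementation in plain JavaScript, not the library's source; `bagOfWordsSketch`, `index` and `build` are illustrative names, and `build` is only a stand-in for the `build()` function of helper.returnIndexer.

```javascript
// Hypothetical sketch; names here are not part of wink-nlp-utils.
function bagOfWordsSketch(tokens, logCounts, ifn, idx) {
  // A prototype-less object avoids clashes with words such as 'constructor'.
  const bow = Object.create(null);
  for (const word of tokens) {
    bow[word] = (bow[word] || 0) + 1;
    // Call the indexer only the first time each word is seen.
    if (bow[word] === 1 && typeof ifn === 'function') ifn(word, idx);
  }
  if (logCounts) {
    // Replace raw counts with log2( count + 1 ).
    for (const word of Object.keys(bow)) bow[word] = Math.log2(bow[word] + 1);
  }
  return bow;
}

// Stand-in index builder: maps each word to the ids of the documents containing it.
const index = {};
const build = (word, docId) => {
  (index[word] = index[word] || []).push(docId);
};

bagOfWordsSketch(['rain', 'rain', 'go', 'away'], false, build, 0);
bagOfWordsSketch(['go', 'home'], false, build, 1);
console.log(index);
// -> { rain: [ 0 ], go: [ 0, 1 ], away: [ 0 ], home: [ 1 ] }
```

Here `idx` plays the role of a document id, so calling the function once per document builds an inverted index as a side effect of counting.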

bigrams

bigrams(tokens) → {Array.<string>}

Generates bigrams from the input tokens.

Example

bigrams( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ [ 'he', 'acted' ],
//      [ 'acted', 'decisively' ],
//      [ 'decisively', 'today' ] ]

Parameters

Name	Type	Description
tokens	Array.<string>	the input tokens.

Returns

the bigrams, each a two-element array of adjacent tokens.

Type: Array.<string>

phonetize

phonetize(tokens) → {Array.<string>}

Phonetizes input tokens using an algorithmic adaptation of Metaphone.

Example

phonetize( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ 'h', 'aktd', 'dssvl', 'td' ]

Parameters

Name	Type	Description
tokens	Array.<string>	the input tokens.

Returns

phonetized tokens.

Type: Array.<string>

propagateNegations

propagateNegations(tokens, [upto]) → {Array.<string>}

It looks for negation tokens in the input array of tokens and propagates negation to the next upto tokens by prefixing each of them with a !. It is useful for handling text containing negations during tasks like similarity detection, classification, or search.

Example

propagateNegations( [ 'mary', 'is', 'not', 'feeling', 'good', 'today' ] );
// -> [ 'mary', 'is', 'not', '!feeling', '!good', 'today' ]

Parameters

Name	Type	Attributes	Default	Description
tokens	Array.<string>			the input tokens.
upto	number	<optional>	2	number of tokens to be negated after the negation token. Note that negation stops after `upto` tokens or at the token preceding any of the `, . ; : ! ?` punctuations, whichever comes first.

Returns

tokens with negation propagated.

Type: Array.<string>
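
The propagation rule, including the punctuation cut-off, can be sketched as a single pass over the tokens. This is a hypothetical illustration and not the library's implementation: `propagateNegationsSketch` and the `NEGATIONS` list are assumptions, and the set of negation words the library actually recognises may differ.

```javascript
// Assumed negation words; the library's actual list may differ.
const NEGATIONS = new Set(['not', 'no', 'never', "n't"]);
// Punctuation tokens that end a negation window, as listed for `upto`.
const STOPPERS = new Set([',', '.', ';', ':', '!', '?']);

function propagateNegationsSketch(tokens, upto = 2) {
  const out = [];
  let remaining = 0; // how many upcoming tokens still need a '!' prefix
  for (const token of tokens) {
    if (STOPPERS.has(token)) {
      remaining = 0; // punctuation closes the window early
      out.push(token);
    } else if (NEGATIONS.has(token)) {
      remaining = upto; // open (or restart) the window
      out.push(token);
    } else if (remaining > 0) {
      out.push('!' + token);
      remaining -= 1;
    } else {
      out.push(token);
    }
  }
  return out;
}

console.log(propagateNegationsSketch(['mary', 'is', 'not', 'feeling', 'good', 'today']));
// -> [ 'mary', 'is', 'not', '!feeling', '!good', 'today' ]
console.log(propagateNegationsSketch(['not', 'good', ',', 'but', 'fine']));
// -> [ 'not', '!good', ',', 'but', 'fine' ]
```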

removeWords

removeWords(tokens, [stopWords]) → {Array.<string>}

Removes the stop words from the input array of tokens.

Example

removeWords( [ 'this', 'is', 'a', 'cat' ] );
// -> [ 'cat' ]

Parameters

Name	Type	Attributes	Default	Description
tokens	Array.<string>			the input tokens.
stopWords	wordsFilter	<optional>	defaultStopWords	the default stop words are loaded from `stop_words.json` located under the `src/dictionaries/` directory. Custom stop words can be created using helper.returnWordsFilter.

Returns

the tokens that remain after the stop words are removed.

Type: Array.<string>
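
Conceptually, removing stop words is a filter over the tokens. In the sketch below a plain `Set` stands in for the library's wordsFilter object, whose real interface is not shown here; `removeWordsSketch` and `customStopWords` are hypothetical names with illustrative values.

```javascript
// Hypothetical sketch: a plain Set stands in for the library's wordsFilter object.
function removeWordsSketch(tokens, stopWords) {
  // Keep only the tokens that are not in the stop word set.
  return tokens.filter((token) => !stopWords.has(token));
}

// A custom stop word list (illustrative values only).
const customStopWords = new Set(['this', 'is', 'a', 'the']);

console.log(removeWordsSketch(['this', 'is', 'a', 'cat'], customStopWords));
// -> [ 'cat' ]
```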

setOfWords

setOfWords(tokens, [ifn], [idx]) → {set}

Generates the set of words from the input tokens. It also has an alias, sow().

Example

setOfWords( [ 'rain', 'rain', 'go', 'away' ] );
// -> Set { 'rain', 'go', 'away' }

Parameters

Name	Type	Attributes	Description
tokens	Array.<string>		the input tokens.
ifn	function	<optional>	a function to build an index; it is called for every member word of the set and receives the word and the `idx` as input arguments. The `build()` function of helper.returnIndexer may be used as `ifn`. If `undefined`, no index is built.
idx	number	<optional>	the index; passed as the second argument to the `ifn` function.

Returns

set of words from tokens.

Type: set

soundex

soundex(tokens) → {Array.<string>}

Generates the soundex coded tokens from the input tokens.

Example

soundex( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ 'H000', 'A233', 'D221', 'T300' ]

Parameters

Name	Type	Description
tokens	Array.<string>	the input tokens.

Returns

soundex coded tokens.

Type: Array.<string>
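
For readers unfamiliar with the coding, the sketch below applies the classic American Soundex rules in plain JavaScript. It is an assumption that the library follows exactly these rules; the sketch reproduces the example output above but is not the library's source, and `soundexWord` and `soundexSketch` are hypothetical names.

```javascript
// Consonant groups of classic American Soundex; vowels and y carry no code.
const CODES = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6
};

function soundexWord(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';
  let code = letters[0].toUpperCase(); // the first letter is kept as is
  let prev = CODES[letters[0]] || 0;   // its digit still suppresses an adjacent duplicate
  for (let i = 1; i < letters.length && code.length < 4; i += 1) {
    const ch = letters[i];
    if (ch === 'h' || ch === 'w') continue; // h and w do not separate equal codes
    const digit = CODES[ch] || 0;           // vowels and y map to 0
    if (digit !== 0 && digit !== prev) code += digit;
    prev = digit; // a vowel resets prev, so a repeated code after it is kept
  }
  return code.padEnd(4, '0'); // pad short codes with zeros
}

function soundexSketch(tokens) {
  return tokens.map(soundexWord);
}

console.log(soundexSketch(['he', 'acted', 'decisively', 'today']));
// -> [ 'H000', 'A233', 'D221', 'T300' ]
```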

stem

stem(tokens) → {Array.<string>}

Stems input tokens using the Porter Stemming Algorithm Version 2.

Example

stem( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ 'he', 'act', 'decis', 'today' ]

Parameters

Name	Type	Description
tokens	Array.<string>	the input tokens.

Returns

stemmed tokens.

Type: Array.<string>