Tokens
Methods
appendBigrams
Generates bigrams from the input tokens and appends them to the input tokens.
Example
appendBigrams( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ 'he',
// 'acted',
// 'decisively',
// 'today',
// 'he_acted',
// 'acted_decisively',
// 'decisively_today' ]
Parameters
Name | Type | Description |
---|---|---|
tokens | Array.<string> | the input tokens. |
Returns
the input tokens appended with their bigrams.
- Type
- Array.<string>
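The behaviour described above can be sketched in a few lines. This is a minimal illustration, not the library's source: the name appendBigramsSketch is hypothetical, and returning a new array instead of modifying the input is an assumption.

```javascript
// Hypothetical sketch; not the library's implementation.
// Assumption: a copy is returned and the input array is left untouched.
function appendBigramsSketch( tokens ) {
  const result = tokens.slice();
  // Join each adjacent pair of tokens with an underscore and append it.
  for ( let i = 0; i < tokens.length - 1; i += 1 ) {
    result.push( tokens[ i ] + '_' + tokens[ i + 1 ] );
  }
  return result;
}

console.log( appendBigramsSketch( [ 'he', 'acted', 'decisively', 'today' ] ) );
```

An input of n tokens yields n - 1 bigrams, so the result has 2n - 1 entries whenever n is at least 1.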
bagOfWords
Generates the bag of words from the input tokens. By default it uses the word count as its frequency; if the logCounts parameter is set to true, it uses log2( word counts + 1 ) as its frequency instead. It also has an alias, bow().
Example
bagOfWords( [ 'rain', 'rain', 'go', 'away' ] );
// -> { rain: 2, go: 1, away: 1 }
bow( [ 'rain', 'rain', 'go', 'away' ], true );
// -> { rain: 1.584962500721156, go: 1, away: 1 }
Parameters
Name | Type | Attributes | Default | Description |
---|---|---|---|---|
tokens | Array.<string> | | | the input tokens. |
logCounts | number | <optional> | false | a true value flags the use of log2( word counts + 1 ) as the frequency. |
ifn | function | <optional> | | a function to build index; it is called for every unique occurrence of word in tokens. |
idx | number | <optional> | | the index; passed as the second argument to the ifn function. |
Returns
bag of words from tokens.
- Type
- object
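To make the two frequency modes concrete, here is a minimal sketch of the counting step. It is an illustration under stated assumptions, not the library's code: bagOfWordsSketch is a hypothetical name, and the optional index-building arguments are left out.

```javascript
// Hypothetical sketch; not the library's implementation.
// Assumption: the optional index-building arguments are omitted.
function bagOfWordsSketch( tokens, logCounts ) {
  // A prototype-free object, so tokens such as 'constructor' count correctly.
  const bag = Object.create( null );
  for ( const token of tokens ) {
    bag[ token ] = ( bag[ token ] || 0 ) + 1;
  }
  if ( logCounts ) {
    // Replace each raw count with log2( count + 1 ).
    for ( const word of Object.keys( bag ) ) {
      bag[ word ] = Math.log2( bag[ word ] + 1 );
    }
  }
  return bag;
}

const counts = bagOfWordsSketch( [ 'rain', 'rain', 'go', 'away' ] );
console.log( counts.rain, counts.go, counts.away ); // prints 2 1 1
```

With log counts, a word seen once still scores exactly 1, because log2( 1 + 1 ) is 1, while repeated words grow more slowly than their raw counts.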
bigrams
Generates bigrams from the input tokens.
Example
bigrams( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ [ 'he', 'acted' ],
// [ 'acted', 'decisively' ],
// [ 'decisively', 'today' ] ]
Parameters
Name | Type | Description |
---|---|---|
tokens | Array.<string> | the input tokens. |
Returns
the bigrams.
- Type
- Array.<Array.<string>>
phonetize
Phonetizes input tokens using an algorithmic adaptation of Metaphone.
Example
phonetize( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ 'h', 'aktd', 'dssvl', 'td' ]
Parameters
Name | Type | Description |
---|---|---|
tokens | Array.<string> | the input tokens. |
Returns
phonetized tokens.
- Type
- Array.<string>
propagateNegations
Looks for negation tokens in the input array of tokens and propagates the negation to the tokens that follow, up to the number given by upto, by prefixing each of them with a '!'. It is useful for handling text containing negations during tasks like similarity detection, classification or search.
Example
propagateNegations( [ 'mary', 'is', 'not', 'feeling', 'good', 'today' ] );
// -> [ 'mary', 'is', 'not', '!feeling', '!good', 'today' ]
Parameters
Name | Type | Attributes | Default | Description |
---|---|---|---|---|
tokens | Array.<string> | | | the input tokens. |
upto | number | <optional> | 2 | number of tokens to be negated after the negation token. Note, tokens are only negated either |
Returns
tokens with negation propagated.
- Type
- Array.<string>
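The propagation window can be sketched as a simple counter. This is a rough illustration only: propagateNegationsSketch and the NEGATIONS list are hypothetical, and the sketch does not model any additional condition that may end negation early.

```javascript
// Hypothetical sketch; not the library's implementation.
// Assumption: this short list stands in for the real set of negation tokens.
const NEGATIONS = new Set( [ 'not', 'no', 'never', 'cannot' ] );

function propagateNegationsSketch( tokens, upto = 2 ) {
  const out = [];
  let remaining = 0; // how many upcoming tokens still need the '!' prefix
  for ( const token of tokens ) {
    if ( NEGATIONS.has( token ) ) {
      // Assumption: a negation token is kept as-is and restarts the window.
      remaining = upto;
      out.push( token );
    } else if ( remaining > 0 ) {
      out.push( '!' + token );
      remaining -= 1;
    } else {
      out.push( token );
    }
  }
  return out;
}

console.log( propagateNegationsSketch( [ 'mary', 'is', 'not', 'feeling', 'good', 'today' ] ) );
// -> [ 'mary', 'is', 'not', '!feeling', '!good', 'today' ]
```

Prefixing keeps 'good' and '!good' as distinct features, which is what lets a downstream classifier or similarity measure tell the two apart.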
removeWords
Removes the stop words from the input array of tokens.
Example
removeWords( [ 'this', 'is', 'a', 'cat' ] );
// -> [ 'cat' ]
Parameters
Name | Type | Attributes | Default | Description |
---|---|---|---|---|
tokens | Array.<string> | | | the input tokens. |
stopWords | wordsFilter | <optional> | defaultStopWords | default stop words are loaded from |
Returns
the tokens remaining after the stop words are removed.
- Type
- Array.<string>
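Stop word removal amounts to a membership test per token. The sketch below is illustrative only: removeWordsSketch and SAMPLE_STOP_WORDS are hypothetical, and a plain Set is assumed in place of the wordsFilter type named in the table above.

```javascript
// Hypothetical sketch; not the library's implementation.
// Assumption: a tiny hand-picked list stands in for the default stop words.
const SAMPLE_STOP_WORDS = new Set( [ 'a', 'an', 'the', 'this', 'that', 'is', 'are' ] );

function removeWordsSketch( tokens, stopWords = SAMPLE_STOP_WORDS ) {
  // Keep only the tokens that are not in the stop word set.
  return tokens.filter( ( token ) => !stopWords.has( token ) );
}

console.log( removeWordsSketch( [ 'this', 'is', 'a', 'cat' ] ) );
// -> [ 'cat' ]
```

A Set gives constant-time lookups, so the filter stays linear in the number of tokens regardless of the size of the stop word list.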
setOfWords
Generates the set of words from the input tokens. It also has an alias, sow().
Example
setOfWords( [ 'rain', 'rain', 'go', 'away' ] );
// -> Set { 'rain', 'go', 'away' }
Parameters
Name | Type | Attributes | Description |
---|---|---|---|
tokens | Array.<string> | | the input tokens. |
ifn | function | <optional> | a function to build index; it is called for every **member word of the set**; and it receives the word and the idx. |
idx | number | <optional> | the index; passed as the second argument to the ifn function. |
Returns
the set of words from tokens.
- Type
- set
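A sketch of how the optional callback might be used alongside the set. Everything here is an assumption made for illustration: setOfWordsSketch is a hypothetical name, and a Map is used as the index even though the table above lists idx as a number.

```javascript
// Hypothetical sketch; not the library's implementation.
function setOfWordsSketch( tokens, ifn, idx ) {
  const words = new Set( tokens );
  if ( typeof ifn === 'function' ) {
    // Call the index builder once per member word, passing the index along.
    words.forEach( ( word ) => ifn( word, idx ) );
  }
  return words;
}

// Assumed usage: count in how many token lists each word appears.
const docFrequency = new Map();
const addToIndex = ( word, index ) => index.set( word, ( index.get( word ) || 0 ) + 1 );

setOfWordsSketch( [ 'rain', 'rain', 'go', 'away' ], addToIndex, docFrequency );
setOfWordsSketch( [ 'rain', 'again' ], addToIndex, docFrequency );
console.log( docFrequency.get( 'rain' ) ); // prints 2
```

Because the callback runs once per member word, not once per token, the repeated 'rain' in the first list is counted a single time.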
soundex
Generates the soundex coded tokens from the input tokens.
Example
soundex( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ 'H000', 'A233', 'D221', 'T300' ]
Parameters
Name | Type | Description |
---|---|---|
tokens | Array.<string> | the input tokens. |
Returns
soundex coded tokens.
- Type
- Array.<string>
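For readers unfamiliar with the coding scheme, the sketch below implements the classic American Soundex rules and reproduces the example output above. Whether the library follows exactly this variant is an assumption; soundexWordSketch and SOUNDEX_CODES are hypothetical names.

```javascript
// Hypothetical sketch of classic American Soundex; not the library's source.
const SOUNDEX_CODES = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6
};

function soundexWordSketch( word ) {
  const letters = word.toLowerCase().replace( /[^a-z]/g, '' );
  if ( letters.length === 0 ) return '';
  // The first letter is kept; the rest become digits.
  let code = letters[ 0 ].toUpperCase();
  let prev = SOUNDEX_CODES[ letters[ 0 ] ] || 0;
  for ( let i = 1; i < letters.length && code.length < 4; i += 1 ) {
    const ch = letters[ i ];
    const digit = SOUNDEX_CODES[ ch ] || 0;
    // Uncoded letters are skipped; adjacent equal codes collapse into one.
    if ( digit !== 0 && digit !== prev ) code += digit;
    // 'h' and 'w' do not separate equal codes, whereas vowels do.
    if ( ch !== 'h' && ch !== 'w' ) prev = digit;
  }
  // Pad with zeros to one letter plus three digits.
  return code.padEnd( 4, '0' );
}

console.log( [ 'he', 'acted', 'decisively', 'today' ].map( soundexWordSketch ) );
// -> [ 'H000', 'A233', 'D221', 'T300' ]
```

Similar-sounding spellings collapse to the same code, for example 'Robert' and 'Rupert' both map to 'R163', which is what makes the codes useful for fuzzy matching.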
stem
Stems input tokens using the Porter Stemming Algorithm Version 2.
Example
stem( [ 'he', 'acted', 'decisively', 'today' ] );
// -> [ 'he', 'act', 'decis', 'today' ]
Parameters
Name | Type | Description |
---|---|---|
tokens | Array.<string> | the input tokens. |
Returns
stemmed tokens.
- Type
- Array.<string>