wink-nlp-utils - Wink JS

String

Methods

amplifyNotElision

amplifyNotElision(str) → {string}

Amplifies the not elision by expanding n't into not; for example, isn't becomes is not.

Example

amplifyNotElision( "someone's wallet, isn't it?" );
// -> "someone's wallet, is not it?"

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string after not elision amplification.

Type: string

bagOfNGrams

bagOfNGrams(str, [size], [ifn], [idx]) → {object}

Generates the bag of ngrams of the given size from the input string. The default size is 2, so it generates a bag of bigrams by default. It also has an alias bong().

Example

bagOfNGrams( 'mama' );
// -> { ma: 2, am: 1 }
bong( 'mamma' );
// -> { ma: 2, am: 1, mm: 1 }

Parameters

Name	Type	Attributes	Default	Description
str	string			the input string.
size	number	<optional>	2	ngram size.
ifn	function	<optional>		a function to build index; it is called for every unique occurrence of ngram of `str`; and it receives the ngram and the `idx` as input arguments. The `build()` function of helper.returnIndexer may be used as `ifn`. If `undefined` then index is not built.
idx	number	<optional>		the index; passed as the second argument to the `ifn` function.

Returns

bag of ngrams of size from str.

Type: object
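
The counting and the `ifn` callback can be sketched in plain JavaScript. This is an illustration under assumptions, not the library's implementation: `bagOfNGramsSketch` and the `build` indexer are hypothetical names, and the sketch assumes `ifn` is called once for each distinct ngram of a string.

```javascript
// Hypothetical sketch: counts ngrams and calls `ifn` once per distinct ngram.
function bagOfNGramsSketch(str, size = 2, ifn, idx) {
  const bag = Object.create(null);
  for (let i = 0; i + size <= str.length; i += 1) {
    const gram = str.slice(i, i + size);
    if (bag[gram] === undefined) {
      bag[gram] = 0;
      // First sighting of this ngram in `str`: let the caller index it.
      if (typeof ifn === 'function') ifn(gram, idx);
    }
    bag[gram] += 1;
  }
  return bag;
}

// Hypothetical indexer: maps each ngram to the ids of the strings containing it.
const index = new Map();
const build = (gram, id) => {
  if (!index.has(gram)) index.set(gram, []);
  index.get(gram).push(id);
};

const words = ['mama', 'mamma'];
const bags = words.map((word, id) => bagOfNGramsSketch(word, 2, build, id));

console.log(bags[0].ma, bags[0].am); // 2 1
console.log(index.get('mm'));        // [ 1 ]
```

Passing the position of each string as `idx` is one way to end up with an inverted index from ngram to string ids.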

composeCorpus

composeCorpus(str) → {Array.<string>}

Generates all possible sentences from the input string. The string str must follow a special syntax, as illustrated in the example below:
'[I] [am having|have] [a] [problem|question]'

Each phrase must be enclosed in [ ] and the alternative options within a phrase (if any) must be separated by a | character. The corpus is composed by computing the cartesian product of all the phrases.

Example

composeCorpus( '[I] [am having|have] [a] [problem|question]' );
// -> [ 'I am having a problem',
//      'I am having a question',
//      'I have a problem',
//      'I have a question' ]

Parameters

Name	Type	Description
str	string	the input string.

Returns

array of all possible sentences.

Type: Array.<string>
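
The cartesian product described above can be sketched as follows. `composeCorpusSketch` is a hypothetical name and the parsing is a simplifying assumption (text outside the square brackets is ignored); it is not the library's code.

```javascript
// Hypothetical sketch: expands a '[a|b] [c]' style template via a cartesian product.
function composeCorpusSketch(template) {
  // Each bracketed phrase becomes an array of its '|' separated options.
  const phrases = (template.match(/\[[^\]]*\]/g) || [])
    .map((phrase) => phrase.slice(1, -1).split('|'));
  // Grow the list of partial sentences one phrase at a time.
  return phrases.reduce(
    (sentences, options) =>
      sentences.flatMap((s) => options.map((o) => (s ? `${s} ${o}` : o))),
    ['']
  );
}

const corpus = composeCorpusSketch('[I] [am having|have] [a] [problem|question]');
console.log(corpus.length); // 4, i.e. 1 * 2 * 1 * 2 options
```

The number of sentences is the product of the option counts of the phrases, so templates with many alternatives grow quickly.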

edgeNGrams

edgeNGrams(str, [min], [max], [delta], [ifn], [idx]) → {Array.<string>}

Generates the edge ngrams from the input string.

Example

edgeNGrams( 'decisively' );
// -> [ 'de', 'deci', 'decisi', 'decisive' ]
edgeNGrams( 'decisively', 8, 10, 1 );
// -> [ 'decisive', 'decisivel', 'decisively' ]

Parameters

Name	Type	Attributes	Default	Description
str	string			the input string.
min	number	<optional>	2	minimum size of the edge ngram generated.
max	number	<optional>	8	maximum size of the edge ngram generated.
delta	number	<optional>	2	edge ngrams are generated in increments of this value.
ifn	function	<optional>		a function to build index; it is called for every edge ngram of `str`; and it receives the edge ngram and the `idx` as input arguments. The `build()` function of helper.returnIndexer may be used as `ifn`. If `undefined` then index is not built.
idx	number	<optional>		the index; passed as the second argument to the `ifn` function.

Returns

array of edge ngrams.

Type: Array.<string>
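
How `min`, `max` and `delta` interact can be sketched as below. `edgeNGramsSketch` is a hypothetical stand-in; capping the size at the string length and rejecting a non-positive `delta` are assumptions of this sketch, not documented behavior.

```javascript
// Hypothetical sketch: prefixes of `str` of sizes min, min + delta, ... up to max.
function edgeNGramsSketch(str, min = 2, max = 8, delta = 2) {
  if (delta <= 0) throw new RangeError('delta must be positive');
  const grams = [];
  // Assumption: never ask for a prefix longer than the string itself.
  const limit = Math.min(max, str.length);
  for (let size = min; size <= limit; size += delta) {
    grams.push(str.slice(0, size));
  }
  return grams;
}

console.log(edgeNGramsSketch('decisively'));
// [ 'de', 'deci', 'decisi', 'decisive' ]
console.log(edgeNGramsSketch('decisively', 8, 10, 1));
// [ 'decisive', 'decisivel', 'decisively' ]
```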

extractPersonsName

extractPersonsName(str) → {string}

Attempts to extract a person's name from the input string. It assumes the following name format:
[<salutations>] <name part as FN [MN] [LN]> [<degrees>]
Entities in square brackets are optional. Note, it is not a named entity detection mechanism.

Example

extractPersonsName( 'Dr. Sarah Connor M. Tech., PhD. - AI' );
// -> 'Sarah Connor'

Parameters

Name	Type	Description
str	string	the input string.

Returns

extracted name.

Type: string

extractRunOfCapitalWords

extractRunOfCapitalWords(str) → {Array.<string>}

Extracts the array of text appearing as Title Case or in ALL CAPS from the input string.

Example

extractRunOfCapitalWords( 'In The Terminator, Sarah Connor is in Los Angeles' );
// -> [ 'In The Terminator', 'Sarah Connor', 'Los Angeles' ]

Parameters

Name	Type	Description
str	string	the input string.

Returns

array of text appearing in Title Case or in ALL CAPS; if no such text is found then null is returned.

Type: Array.<string>

lowerCase

lowerCase(str) → {string}

Converts the input string to lower case.

Example

lowerCase( 'Lower Case' );
// -> 'lower case'

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string in lower case.

Type: string

marker

marker(str) → {string}

Generates the marker of the input string, defined as its unique 1-grams (characters), sorted and joined back into a string. Marker is a quick and aggressive way to detect similarity between short strings. Its aggression may lead to more false positives, such as Meter and Metre, or no melon and no lemon.

Example

marker( 'the quick brown fox jumps over the lazy dog' );
// -> ' abcdefghijklmnopqrstuvwxyz'

Parameters

Name	Type	Description
str	string	the input string.

Returns

the marker.

Type: string
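
The false positives mentioned above follow directly from the definition. The sketch below is a plain JavaScript approximation (the name `markerSketch` is hypothetical, and it is not the library's implementation) showing why different strings can share a marker.

```javascript
// Hypothetical sketch: unique characters of `str`, sorted and joined.
function markerSketch(str) {
  return [...new Set(str)].sort().join('');
}

console.log(markerSketch('no melon') === markerSketch('no lemon')); // true
console.log(markerSketch('Meter') === markerSketch('Metre'));       // true
console.log(markerSketch('meter') === markerSketch('metro'));       // false
```

Because character order and repetition are discarded, anagrams and transpositions always collide.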

ngram

ngram(str, [size]) → {Array.<string>}

Generates an array of ngrams of a specified size from the input string. The default size is 2, which means it will generate bigrams by default.

Example

ngram( 'FRANCE' );
// -> [ 'FR', 'RA', 'AN', 'NC', 'CE' ]
ngram( 'FRENCH' );
// -> [ 'FR', 'RE', 'EN', 'NC', 'CH' ]
ngram( 'FRANCE', 3 );
// -> [ 'FRA', 'RAN', 'ANC', 'NCE' ]

Parameters

Name	Type	Attributes	Default	Description
str	string			the input string.
size	number	<optional>	2	ngram's size.

Returns

ngrams of size from str.

Type: Array.<string>
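
A character ngram of a given size is a sliding window over the string. The sketch below illustrates that idea in plain JavaScript; `ngramSketch` is a hypothetical name and this is not the library's implementation.

```javascript
// Hypothetical sketch: slides a window of `size` characters across `str`.
function ngramSketch(str, size = 2) {
  const grams = [];
  for (let i = 0; i + size <= str.length; i += 1) {
    grams.push(str.slice(i, i + size));
  }
  return grams;
}

console.log(ngramSketch('FRANCE', 3)); // [ 'FRA', 'RAN', 'ANC', 'NCE' ]
```

A string of length n yields n - size + 1 ngrams, and none when it is shorter than `size`.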

phonetize

phonetize(word) → {string}

Phonetizes the input string using an algorithmic adaptation of Metaphone; it is not an exact implementation of Metaphone.

Example

phonetize( 'perspective' );
// -> 'prspktv'
phonetize( 'phenomenon' );
// -> 'fnmnn'

Parameters

Name	Type	Description
word	string	the input word.

Returns

phonetic code of word.

Type: string

removeElisions

removeElisions(str) → {string}

Removes basic elisions found in the input string. Typical examples of elisions are it's, let's, where's, I'd, I'm, I'll, I've, and isn't. Note that it retains the apostrophe used to indicate possession.

Example

removeElisions( "someone's wallet, isn't it?" );
// -> "someone's wallet, is it?"

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string after removal of elisions.

Type: string

removeExtraSpaces

removeExtraSpaces(str) → {string}

Removes leading, trailing and any extra in-between whitespaces from the input string.

Example

removeExtraSpaces( '   Padded   Text    ' );
// -> 'Padded Text'

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string after removal of leading, trailing and extra whitespaces.

Type: string

removeHTMLTags

removeHTMLTags(str) → {string}

Removes each HTML tag by replacing it with a whitespace.

Extra spaces, if required, may be removed using string.removeExtraSpaces function.

Example

removeHTMLTags( '<p>Vive la France&nbsp;&#160;!</p>' );
// -> ' Vive la France  ! '

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string after removal of HTML tags.

Type: string
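
Replacing tags with whitespace and then collapsing the extra spaces is a two-step pipeline. The sketch below uses two hypothetical stand-ins (`stripTagsSketch` and `squeezeSpacesSketch`) written with plain regular expressions; they are not the library functions and, unlike the example above, they do not handle HTML entities such as `&nbsp;`.

```javascript
// Hypothetical stand-ins for the two steps described above.
const stripTagsSketch = (str) => str.replace(/<[^>]*>/g, ' ');
const squeezeSpacesSketch = (str) => str.trim().replace(/\s+/g, ' ');

const raw = '<p>Hello <b>World</b></p>';
const stripped = stripTagsSketch(raw);
const cleaned = squeezeSpacesSketch(stripped);

console.log(JSON.stringify(stripped)); // " Hello  World  "
console.log(JSON.stringify(cleaned));  // "Hello World"
```

Substituting a space rather than an empty string keeps words on either side of a tag from being fused together.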

removePunctuations

removePunctuations(str) → {string}

Removes each punctuation mark by replacing it with a whitespace. It looks for the following punctuations — .,;!?:"!'... - () [] {}.

Extra spaces, if required, may be removed using string.removeExtraSpaces function.

Example

removePunctuations( 'Punctuations like "\'\',;!?:"!... are removed' );
// -> 'Punctuations like               are removed'

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string after removal of punctuations.

Type: string

removeSplChars

removeSplChars(str) → {string}

Removes each special character by replacing it with a whitespace. It looks for the following special characters — ~@#%^*+=.

Extra spaces, if required, may be removed using string.removeExtraSpaces function.

Example

removeSplChars( '4 + 4*2 = 12' );
// -> '4   4 2   12'

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string after removal of special characters.

Type: string

retainAlphaNums

retainAlphaNums(str) → {string}

Retains only alphas and numerals, and removes all other characters from the input string, including leading, trailing and extra in-between whitespaces.

Example

retainAlphaNums( ' This, text here, has  (other) chars_! ' );
// -> 'This text here has other chars'

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string after removal of non-alphanumeric characters, leading, trailing and extra whitespaces.

Type: string

sentences

sentences(paragraph) → {Array.<string>}

Detects the sentence boundaries in the input paragraph and splits it into an array of sentence(s).

Example

sentences( 'AI Inc. is focussing on AI. I work for AI Inc. My mail is r2d2@yahoo.com' );
// -> [ 'AI Inc. is focussing on AI.',
//      'I work for AI Inc.',
//      'My mail is r2d2@yahoo.com' ]

sentences( 'U.S.A is my birth place. I was born on 06.12.1924. I climbed Mt. Everest.' );
// -> [ 'U.S.A is my birth place.',
//      'I was born on 06.12.1924.',
//      'I climbed Mt. Everest.' ]

Parameters

Name	Type	Description
paragraph	string	the input string.

Returns

array of sentences.

Type: Array.<string>

setOfChars

setOfChars(str, [ifn], [idx]) → {string}

Creates a set of chars from the input string str. Compared to marker(), this is useful for even more aggressive string matching using Jaccard or Tversky. It also has an alias soc().

Example

setOfChars( 'the quick brown fox jumps over the lazy dog' );
// -> ' abcdefghijklmnopqrstuvwxyz'

Parameters

Name	Type	Attributes	Description
str	string		the input string.
ifn	function	<optional>	a function to build index; it receives the first character of `str` and the `idx` as input arguments. The `build()` function of helper.returnIndexer may be used as `ifn`. If `undefined` then index is not built.
idx	number	<optional>	the index; passed as the second argument to the `ifn` function.

Returns

the soc.

Type: string
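
Jaccard matching over character sets can be sketched as below. `jaccardSketch` is a hypothetical helper built on the native `Set`; it illustrates the similarity measure only and is not part of the API documented here.

```javascript
// Hypothetical sketch: Jaccard similarity of the character sets of two strings,
// i.e. |A ∩ B| / |A ∪ B|.
function jaccardSketch(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  for (const ch of setA) {
    if (setB.has(ch)) shared += 1;
  }
  const union = setA.size + setB.size - shared;
  // Assumption: two empty strings are treated as identical.
  return union === 0 ? 1 : shared / union;
}

console.log(jaccardSketch('abc', 'bcd'));     // 0.5
console.log(jaccardSketch('meter', 'metre')); // 1
```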

setOfNGrams

setOfNGrams(str, [size], [ifn], [idx]) → {set}

Generates the set of ngrams of the given size from the input string. The default size is 2, so it generates a set of bigrams by default. It also has an alias song().

Example

setOfNGrams( 'mama' );
// -> Set { 'ma', 'am' }
song( 'mamma' );
// -> Set { 'ma', 'am', 'mm' }

Parameters

Name	Type	Attributes	Default	Description
str	string			the input string.
size	number	<optional>	2	ngram size.
ifn	function	<optional>		a function to build index; it is called for every unique occurrence of ngram of `str`; and it receives the ngram and the `idx` as input arguments. The `build()` function of helper.returnIndexer may be used as `ifn`. If `undefined` then index is not built.
idx	number	<optional>		the index; passed as the second argument to the `ifn` function.

Returns

set of ngrams of size from str.

Type: set

soundex

soundex(word, [maxLength]) → {string}

Produces the soundex code from the input word.

Example

soundex( 'Burroughs' );
// -> 'B620'
soundex( 'Burrows' );
// -> 'B620'

Parameters

Name	Type	Attributes	Default	Description
word	string			the input word.
maxLength	number	<optional>	4	maximum length of the soundex code to be returned.

Returns

soundex code of word.

Type: string
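
For readers unfamiliar with Soundex, the sketch below implements the commonly described American Soundex rules in plain JavaScript. `soundexSketch` is a hypothetical name; whether the library follows exactly the same variant is an assumption, although the sketch does reproduce the two documented examples.

```javascript
// Letter to digit mapping of American Soundex; vowels, h, w and y carry no code.
const SOUNDEX_CODES = {
  b: '1', f: '1', p: '1', v: '1',
  c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
  d: '3', t: '3',
  l: '4',
  m: '5', n: '5',
  r: '6',
};

// Hypothetical sketch: first letter followed by digits, padded with zeros.
function soundexSketch(word, maxLength = 4) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';
  let code = letters[0].toUpperCase();
  let previous = SOUNDEX_CODES[letters[0]] || '';
  for (let i = 1; i < letters.length && code.length < maxLength; i += 1) {
    const ch = letters[i];
    // h and w are skipped without separating two letters that share a code.
    if (ch === 'h' || ch === 'w') continue;
    const digit = SOUNDEX_CODES[ch] || '';
    if (digit !== '' && digit !== previous) code += digit;
    // A vowel resets `previous`, so the same code may repeat after it.
    previous = digit;
  }
  return code.padEnd(maxLength, '0');
}

console.log(soundexSketch('Burroughs')); // B620
console.log(soundexSketch('Burrows'));   // B620
console.log(soundexSketch('Robert'));    // R163
```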

splitElisions

splitElisions(str) → {string}

Splits basic elisions found in the input string. Typical examples of elisions are it's, let's, where's, I'd, I'm, I'll, I've, and isn't. Note that it does not touch the apostrophe used to indicate possession.

Example

splitElisions( "someone's wallet, isn't it?" );
// -> "someone's wallet, is n't it?"

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string after splitting of elisions.

Type: string

stem

stem(word) → {string}

Stems an inflected word using the Porter2 stemming algorithm.

Example

stem( 'consisting' );
// -> 'consist'

Parameters

Name	Type	Description
word	string	to be stemmed.

Returns

the stemmed word.

Type: string

tokenize

tokenize(sentence, [detailed]) → {Array.<string>|Array.<object>}

Tokenizes the input sentence according to the value of the detailed flag. Any occurrence of ... in the sentence is converted to ellipses. In detailed = true mode, it tags every token with its type; the supported tags are word, number, url, email, mention, hashtag, emoji, emoticon, time, ordinal, currency, punctuation, symbol, and tabCFLF.

Example

tokenize( "someone's wallet, isn't it? I'll return!" );
// -> [ 'someone', '\'s', 'wallet', ',', 'is', 'n\'t', 'it', '?',
//      'I', '\'ll', 'return', '!' ]

tokenize( 'For details on wink, check out http://winkjs.org/ URL!', true );
// -> [ { value: 'For', tag: 'word' },
//      { value: 'details', tag: 'word' },
//      { value: 'on', tag: 'word' },
//      { value: 'wink', tag: 'word' },
//      { value: ',', tag: 'punctuation' },
//      { value: 'check', tag: 'word' },
//      { value: 'out', tag: 'word' },
//      { value: 'http://winkjs.org/', tag: 'url' },
//      { value: 'URL', tag: 'word' },
//      { value: '!', tag: 'punctuation' } ]

Parameters

Name	Type	Attributes	Default	Description
sentence	string			the input string.
detailed	boolean	<optional>	false	if true, each token is an object containing the `value` and `tag` of the token; otherwise each token is a string. Its default value of false ensures compatibility with the previous version.

Returns

an array of strings if detailed is false; otherwise an array of objects.

Type: Array.<string> | Array.<object>
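
The tagged objects returned in detailed mode lend themselves to filtering by `tag`. The sketch below works on a token array copied from the detailed example above, so it runs without the library; the variable names are illustrative only.

```javascript
// Tokens copied from the detailed example above.
const tokens = [
  { value: 'For', tag: 'word' },
  { value: 'details', tag: 'word' },
  { value: 'on', tag: 'word' },
  { value: 'wink', tag: 'word' },
  { value: ',', tag: 'punctuation' },
  { value: 'check', tag: 'word' },
  { value: 'out', tag: 'word' },
  { value: 'http://winkjs.org/', tag: 'url' },
  { value: 'URL', tag: 'word' },
  { value: '!', tag: 'punctuation' },
];

// Keep only the values of tokens carrying a given tag.
const valuesWithTag = (list, tag) =>
  list.filter((token) => token.tag === tag).map((token) => token.value);

const onlyWords = valuesWithTag(tokens, 'word');
const onlyUrls = valuesWithTag(tokens, 'url');

console.log(onlyWords); // [ 'For', 'details', 'on', 'wink', 'check', 'out', 'URL' ]
console.log(onlyUrls);  // [ 'http://winkjs.org/' ]
```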

tokenize0

tokenize0(str) → {Array.<string>}

Tokenizes by splitting the input string on non-words. This means tokens consist of only alphas, numerals and underscores; all other characters are stripped as they are treated as separators. It also removes all elisions; however, negations are retained and amplified.

Example

tokenize0( "someone's wallet, isn't it?" );
// -> [ 'someone', 's', 'wallet', 'is', 'not', 'it' ]

Parameters

Name	Type	Description
str	string	the input string.

Returns

array of tokens.

Type: Array.<string>
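
Amplifying the negation and then splitting on non-word characters can be approximated with two regular expressions. `tokenize0Sketch` is a hypothetical name; the sketch reproduces the example above but omits the removal of other elisions (such as 'll or 've), so it is not equivalent to the library function.

```javascript
// Hypothetical sketch: amplify n't into not, then split on non-word characters.
function tokenize0Sketch(str) {
  return str
    .replace(/n't\b/gi, ' not')            // isn't -> is not
    .split(/\W+/)                          // anything outside [A-Za-z0-9_] separates tokens
    .filter((token) => token.length > 0);  // drop empty strings left at the edges
}

console.log(tokenize0Sketch("someone's wallet, isn't it?"));
// [ 'someone', 's', 'wallet', 'is', 'not', 'it' ]
```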

trim

trim(str) → {string}

Trims leading and trailing whitespaces from the input string.

Example

trim( '  Padded   ' );
// -> 'Padded'

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string with leading & trailing whitespaces removed.

Type: string

upperCase

upperCase(str) → {string}

Converts the input string to upper case.

Example

upperCase( 'Upper Case' );
// -> 'UPPER CASE'

Parameters

Name	Type	Description
str	string	the input string.

Returns

input string in upper case.

Type: string