string

String

Methods

amplifyNotElision(str) → {string}

Amplifies the n't elision by expanding it into not; for example, isn't becomes is not.

Example
amplifyNotElision( "someone's wallet, isn't it?" );
// -> "someone's wallet, is not it?"
Parameters:
Name Type Description
str string

the input string.

Returns:

input string after not elision amplification.

Type
string
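
The n't expansion that amplifyNotElision performs can be approximated with a single regular expression. This is a minimal sketch, not the library's implementation: the name amplifyNotElisionSketch is hypothetical, and irregular forms such as can't or won't are not handled (they would become ca not and wo not).

```javascript
// Hypothetical stand-in for illustration; not the library's implementation.
function amplifyNotElisionSketch( str ) {
  // Expand a trailing "n't" into " not", keeping the preceding character.
  return str.replace( /(\w)n't\b/gi, '$1 not' );
}

amplifyNotElisionSketch( "someone's wallet, isn't it?" );
// -> "someone's wallet, is not it?"
```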

bagOfNGrams(str, sizeopt, ifnopt, idxopt) → {object}

Generates the bag of ngrams of the given size from the input string. The default size is 2, so it generates a bag of bigrams by default. It also has an alias, bong().

Example
bagOfNGrams( 'mama' );
// -> { ma: 2, am: 1 }
bong( 'mamma' );
// -> { ma: 2, am: 1, mm: 1 }
Parameters:
Name Type Attributes Default Description
str string

the input string.

size number <optional>
2

ngram size.

ifn function <optional>

a function to build an index; it is called for every unique ngram of str and receives the ngram and idx as input arguments. The build() function of helper.returnIndexer may be used as ifn. If undefined, no index is built.

idx number <optional>

the index; passed as the second argument to the ifn function.

Returns:

bag of ngrams of the given size from str.

Type
object
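
To show how the ifn and idx arguments of bagOfNGrams might fit together, here is a minimal sketch under stated assumptions. The function bagOfNGramsSketch, the build callback and the shape of index are all hypothetical; the sketch assumes ifn is called once per unique ngram, which is one reading of the description above.

```javascript
// Hypothetical sketch; not the library's implementation.
function bagOfNGramsSketch( str, size = 2, ifn, idx ) {
  const bag = Object.create( null );
  for ( let i = 0; i + size <= str.length; i += 1 ) {
    const ng = str.slice( i, i + size );
    if ( bag[ ng ] === undefined ) {
      bag[ ng ] = 1;
      // First sighting of this ngram: let the caller index it.
      if ( typeof ifn === 'function' ) ifn( ng, idx );
    } else {
      bag[ ng ] += 1;
    }
  }
  return bag;
}

// A toy inverted index mapping each ngram to a list of ids (assumed shape).
const index = {};
const build = ( ng, id ) => { ( index[ ng ] = index[ ng ] || [] ).push( id ); };

JSON.stringify( bagOfNGramsSketch( 'mamma', 2, build, 7 ) );
// -> '{"ma":2,"am":1,"mm":1}'
JSON.stringify( index );
// -> '{"ma":[7],"am":[7],"mm":[7]}'
```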

composeCorpus(str) → {Array.<string>}

Generates all possible sentences from the input string. The string str must follow a special syntax, as illustrated in the example below:
'[I] [am having|have] [a] [problem|question]'

Each phrase must be enclosed in [ ], and the alternative options within a phrase (if any) must be separated by a | character. The corpus is composed by computing the cartesian product of all the phrases.

Example
composeCorpus( '[I] [am having|have] [a] [problem|question]' );
// -> [ 'I am having a problem',
//      'I am having a question',
//      'I have a problem',
//      'I have a question' ]
Parameters:
Name Type Description
str string

the input string.

Returns:

array of all possible sentences.

Type
Array.<string>
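
The cartesian product behind composeCorpus can be sketched as a fold over the bracketed groups. This is an illustrative sketch only; composeCorpusSketch is a hypothetical name, and it assumes well-formed input with no nested brackets or empty options.

```javascript
// Hypothetical sketch; not the library's implementation.
function composeCorpusSketch( str ) {
  // Pull out each "[...]" group and split it into its "|"-separated options.
  const groups = ( str.match( /\[[^\]]*\]/g ) || [] )
    .map( ( g ) => g.slice( 1, -1 ).split( '|' ) );
  if ( groups.length === 0 ) return [];
  // Extend every partial sentence with every option of the next group.
  return groups.reduce(
    ( partial, options ) => partial.flatMap(
      ( s ) => options.map( ( o ) => ( s ? s + ' ' + o : o ) )
    ),
    [ '' ]
  );
}

composeCorpusSketch( '[I] [am having|have] [a] [problem|question]' );
// -> [ 'I am having a problem',
//      'I am having a question',
//      'I have a problem',
//      'I have a question' ]
```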

edgeNGrams(str, minopt, maxopt, deltaopt, ifnopt, idxopt) → {Array.<string>}

Generates the edge ngrams from the input string.

Example
edgeNGrams( 'decisively' );
// -> [ 'de', 'deci', 'decisi', 'decisive' ]
edgeNGrams( 'decisively', 8, 10, 1 );
// -> [ 'decisive', 'decisivel', 'decisively' ]
Parameters:
Name Type Attributes Default Description
str string

the input string.

min number <optional>
2

minimum size of the edge ngram generated.

max number <optional>
8

maximum size of the edge ngram generated.

delta number <optional>
2

edge ngrams are generated in increments of this value.

ifn function <optional>

a function to build an index; it is called for every edge ngram of str and receives the edge ngram and idx as input arguments. The build() function of helper.returnIndexer may be used as ifn. If undefined, no index is built.

idx number <optional>

the index; passed as the second argument to the ifn function.

Returns:

array of edge ngrams.

Type
Array.<string>
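
The interplay of min, max and delta in edgeNGrams can be sketched as a loop over growing prefixes. This is a minimal sketch, not the library's implementation; edgeNGramsSketch is a hypothetical name, it assumes delta is greater than 0, and it leaves out the ifn and idx arguments.

```javascript
// Hypothetical sketch; not the library's implementation. Assumes delta > 0.
function edgeNGramsSketch( str, min = 2, max = 8, delta = 2 ) {
  const out = [];
  // Take prefixes of length min, min + delta, ... up to max or the string length.
  for ( let n = min; n <= max && n <= str.length; n += delta ) {
    out.push( str.slice( 0, n ) );
  }
  return out;
}

edgeNGramsSketch( 'decisively' );
// -> [ 'de', 'deci', 'decisi', 'decisive' ]
edgeNGramsSketch( 'decisively', 8, 10, 1 );
// -> [ 'decisive', 'decisivel', 'decisively' ]
```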

extractPersonsName(str) → {string}

Attempts to extract a person's name from the input string. It assumes the following name format:
[<salutations>] <name part as FN [MN] [LN]> [<degrees>]
Entities in square brackets are optional.

Example
extractPersonsName( 'Dr. Sarah Connor M. Tech., PhD. - AI' );
// -> 'Sarah Connor'
Parameters:
Name Type Description
str string

the input string.

Returns:

extracted name.

Type
string

extractRunOfCapitalWords(str) → {Array.<string>}

Extracts the array of text appearing as Title Case or in ALL CAPS from the input string.

Example
extractRunOfCapitalWords( 'In The Terminator, Sarah Connor is in Los Angeles' );
// -> [ 'In The Terminator', 'Sarah Connor', 'Los Angeles' ]
Parameters:
Name Type Description
str string

the input string.

Returns:

array of text appearing in Title Case or in ALL CAPS; if no such text is found, null is returned.

Type
Array.<string>
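
A run of capitalised words, as extractRunOfCapitalWords returns it, can be approximated with String.prototype.match, which also yields null when nothing matches. This is a rough sketch under assumptions: extractRunOfCapitalWordsSketch is a hypothetical name, only ASCII letters are recognised, and a single capitalised word counts as a run of one.

```javascript
// Hypothetical sketch; not the library's implementation. ASCII letters only.
function extractRunOfCapitalWordsSketch( str ) {
  // A capitalised word, optionally followed by more whitespace-separated
  // capitalised words; punctuation or a lower-case word ends the run.
  return str.match( /\b[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*/g );
}

extractRunOfCapitalWordsSketch( 'In The Terminator, Sarah Connor is in Los Angeles' );
// -> [ 'In The Terminator', 'Sarah Connor', 'Los Angeles' ]
extractRunOfCapitalWordsSketch( 'nothing capitalised here' );
// -> null
```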

lowerCase(str) → {string}

Converts the input string to lower case.

Example
lowerCase( 'Lower Case' );
// -> 'lower case'
Parameters:
Name Type Description
str string

the input string.

Returns:

input string in lower case.

Type
string

marker(str) → {string}

Generates the marker of the input string; it is defined as the string's 1-grams, sorted and joined back into a string. As the example below shows, repeated characters appear only once. Marker is a quick and aggressive way to detect similarity between short strings. Its aggressiveness may lead to false positives, such as Meter and Metre or no melon and no lemon.

Example
marker( 'the quick brown fox jumps over the lazy dog' );
// -> ' abcdefghijklmnopqrstuvwxyz'
Parameters:
Name Type Description
str string

the input string.

Returns:

the marker.

Type
string
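
The false positives mentioned for marker are easy to reproduce with a small sketch. The code below is an illustration rather than the library's implementation; markerSketch is a hypothetical name, and the de-duplication of characters is inferred from the pangram example above.

```javascript
// Hypothetical sketch; not the library's implementation.
function markerSketch( str ) {
  // Unique characters, sorted by code unit, joined back into a string.
  return [ ...new Set( str ) ].sort().join( '' );
}

markerSketch( 'the quick brown fox jumps over the lazy dog' );
// -> ' abcdefghijklmnopqrstuvwxyz'
markerSketch( 'no melon' ) === markerSketch( 'no lemon' );
// -> true
markerSketch( 'Meter' ) === markerSketch( 'Metre' );
// -> true
```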

ngram(str, sizeopt) → {Array.<string>}

Generates an array of ngrams of a specified size from the input string. The default size is 2, which means it will generate bigrams by default.

Example
ngram( 'FRANCE' );
// -> [ 'FR', 'RA', 'AN', 'NC', 'CE' ]
ngram( 'FRENCH' );
// -> [ 'FR', 'RE', 'EN', 'NC', 'CH' ]
ngram( 'FRANCE', 3 );
// -> [ 'FRA', 'RAN', 'ANC', 'NCE' ]
Parameters:
Name Type Attributes Default Description
str string

the input string.

size number <optional>
2

ngram's size.

Returns:

ngrams of size from str.

Type
Array.<string>
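
The sliding window that ngram describes can be sketched in a few lines. This is a minimal illustration, not the library's implementation; ngramSketch is a hypothetical name.

```javascript
// Hypothetical sketch; not the library's implementation.
function ngramSketch( str, size = 2 ) {
  const out = [];
  // Slide a window of the given size across the string, one character at a time.
  for ( let i = 0; i + size <= str.length; i += 1 ) {
    out.push( str.slice( i, i + size ) );
  }
  return out;
}

ngramSketch( 'FRANCE' );
// -> [ 'FR', 'RA', 'AN', 'NC', 'CE' ]
ngramSketch( 'FRANCE', 3 );
// -> [ 'FRA', 'RAN', 'ANC', 'NCE' ]
```

A string of length n yields n - size + 1 ngrams, and none at all when it is shorter than size.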

phonetize(word) → {string}

Phonetizes the input string using an algorithmic adaptation of Metaphone; it is not an exact implementation of Metaphone.

Example
phonetize( 'perspective' );
// -> 'prspktv'
phonetize( 'phenomenon' );
// -> 'fnmnn'
Parameters:
Name Type Description
word string

the input word.

Returns:

phonetic code of word.

Type
string

removeElisions(str) → {string}

Removes basic elisions found in the input string. Typical examples of elisions are it's, let's, where's, I'd, I'm, I'll, I've, and isn't. Note that it retains the apostrophe used to indicate possession.

Example
removeElisions( "someone's wallet, isn't it?" );
// -> "someone's wallet, is it?"
Parameters:
Name Type Description
str string

the input string.

Returns:

input string after removal of elisions.

Type
string

removeExtraSpaces(str) → {string}

Removes leading, trailing and any extra in-between whitespaces from the input string.

Example
removeExtraSpaces( '   Padded   Text    ' );
// -> 'Padded Text'
Parameters:
Name Type Description
str string

the input string.

Returns:

input string after removal of leading, trailing and extra whitespaces.

Type
string

removeHTMLTags(str) → {string}

Removes each HTML tag by replacing it with a whitespace.

Extra spaces, if required, may be removed using string.removeExtraSpaces function.

Example
removeHTMLTags( '<p>Vive la France&nbsp;&#160;!</p>' );
// -> ' Vive la France  ! '
Parameters:
Name Type Description
str string

the input string.

Returns:

input string after removal of HTML tags.

Type
string

removePunctuations(str) → {string}

Removes each punctuation mark by replacing it with a whitespace. It looks for the following punctuation marks: .,;!?:"!'... - () [] {}.

Extra spaces, if required, may be removed using string.removeExtraSpaces function.

Example
removePunctuations( 'Punctuations like "\'\',;!?:"!... are removed' );
// -> 'Punctuations like               are removed'
Parameters:
Name Type Description
str string

the input string.

Returns:

input string after removal of punctuations.

Type
string

removeSplChars(str) → {string}

Removes each special character by replacing it with a whitespace. It looks for the following special characters: ~@#%^*+=.

Extra spaces, if required, may be removed using string.removeExtraSpaces function.

Example
removeSplChars( '4 + 4*2 = 12' );
// -> '4   4 2   12'
Parameters:
Name Type Description
str string

the input string.

Returns:

input string after removal of special characters.

Type
string
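
Because removeSplChars replaces each special character with a whitespace, its output usually needs a whitespace clean-up pass, as noted above. The following sketch shows the two steps chained; both function names are hypothetical and the regular expressions are approximations, not the library's code.

```javascript
// Hypothetical sketches; not the library's implementation.
// Replace each of ~ @ # % ^ * + = with a single space.
const removeSplCharsSketch = ( str ) => str.replace( /[~@#%^*+=]/g, ' ' );
// Trim the ends and collapse runs of whitespace into one space.
const removeExtraSpacesSketch = ( str ) => str.trim().replace( /\s+/g, ' ' );

removeSplCharsSketch( '4 + 4*2 = 12' );
// -> '4   4 2   12'
removeExtraSpacesSketch( removeSplCharsSketch( '4 + 4*2 = 12' ) );
// -> '4 4 2 12'
```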

retainAlphaNums(str) → {string}

Retains only alphas and numerals, and removes all other characters from the input string, including leading, trailing and extra in-between whitespaces.

Example
retainAlphaNums( ' This, text here, has  (other) chars_! ' );
// -> 'This text here has other chars'
Parameters:
Name Type Description
str string

the input string.

Returns:

input string after removal of non-alphanumeric characters, leading, trailing and extra whitespaces.

Type
string
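
The behaviour of retainAlphaNums can be approximated by replacing every run of non-alphanumeric characters with one space and trimming the result. This is a minimal sketch assuming ASCII input; retainAlphaNumsSketch is a hypothetical name, not the library's function.

```javascript
// Hypothetical sketch; not the library's implementation. ASCII only.
function retainAlphaNumsSketch( str ) {
  // Each run of characters outside A-Z, a-z and 0-9 (underscore included)
  // becomes a single space; leading and trailing spaces are then trimmed.
  return str.replace( /[^A-Za-z0-9]+/g, ' ' ).trim();
}

retainAlphaNumsSketch( ' This, text here, has  (other) chars_! ' );
// -> 'This text here has other chars'
```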

sentences(paragraph) → {Array.<string>}

Detects the sentence boundaries in the input paragraph and splits it into an array of sentence(s).

Example
sentences( 'AI Inc. is focussing on AI. I work for AI Inc. My mail is r2d2@yahoo.com' );
// -> [ 'AI Inc. is focussing on AI.',
//      'I work for AI Inc.',
//      'My mail is r2d2@yahoo.com' ]

sentences( 'U.S.A is my birth place. I was born on 06.12.1924. I climbed Mt. Everest.' );
// -> [ 'U.S.A is my birth place.',
//      'I was born on 06.12.1924.',
//      'I climbed Mt. Everest.' ]
Parameters:
Name Type Description
paragraph string

the input string.

Returns:

array of sentences.

Type
Array.<string>

setOfChars(str, ifnopt, idxopt) → {string}

Creates a set of chars from the input string str. Compared to marker(), this is useful for even more aggressive string matching using Jaccard or Tversky. It also has an alias, soc().

Example
setOfChars( 'the quick brown fox jumps over the lazy dog' );
// -> ' abcdefghijklmnopqrstuvwxyz'
Parameters:
Name Type Attributes Description
str string

the input string.

ifn function <optional>

a function to build index; it receives the first character of str and the idx as input arguments. The build() function of helper.returnIndexer may be used as ifn. If undefined then index is not built.

idx number <optional>

the index; passed as the second argument to the ifn function.

Returns:

the set of chars (soc).

Type
string
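
To illustrate the Jaccard matching mentioned for setOfChars, the sketch below compares two words by their character sets. It is an assumption-laden illustration: charSetSketch and jaccardSketch are hypothetical helpers built on the native Set, and they are not functions of this module.

```javascript
// Hypothetical helpers for illustration; not part of this module.
const charSetSketch = ( str ) => new Set( str );

// Jaccard similarity: size of the intersection over size of the union.
function jaccardSketch( a, b ) {
  let common = 0;
  for ( const x of a ) {
    if ( b.has( x ) ) common += 1;
  }
  const union = a.size + b.size - common;
  return union === 0 ? 0 : common / union;
}

jaccardSketch( charSetSketch( 'meter' ), charSetSketch( 'metre' ) );
// -> 1
jaccardSketch( charSetSketch( 'meter' ), charSetSketch( 'metal' ) );
// -> 0.5
```

A score of 1 for meter and metre shows why character-set matching is aggressive: character order and repetition are ignored entirely.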

setOfNGrams(str, sizeopt, ifnopt, idxopt) → {set}

Generates the set of ngrams of the given size from the input string. The default size is 2, so it generates a set of bigrams by default. It also has an alias, song().

Example
setOfNGrams( 'mama' );
// -> Set { 'ma', 'am' }
song( 'mamma' );
// -> Set { 'ma', 'am', 'mm' }
Parameters:
Name Type Attributes Default Description
str string

the input string.

size number <optional>
2

ngram size.

ifn function <optional>

a function to build an index; it is called for every unique ngram of str and receives the ngram and idx as input arguments. The build() function of helper.returnIndexer may be used as ifn. If undefined, no index is built.

idx number <optional>

the index; passed as the second argument to the ifn function.

Returns:

set of ngrams of the given size from str.

Type
set

soundex(word, maxLengthopt) → {string}

Produces the soundex code from the input word.

Example
soundex( 'Burroughs' );
// -> 'B620'
soundex( 'Burrows' );
// -> 'B620'
Parameters:
Name Type Attributes Default Description
word string

the input word.

maxLength number <optional>
4

maximum length of the soundex code to be returned.

Returns:

soundex code of word.

Type
string
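
For readers unfamiliar with Soundex, the sketch below follows the common American Soundex rules and reproduces the two codes shown in the example above. It is not the library's implementation: soundexSketch is a hypothetical name, and its treatment of maxLength (truncate, then pad with zeros) is an assumption.

```javascript
// Hypothetical sketch of American Soundex; not the library's implementation.
function soundexSketch( word, maxLength = 4 ) {
  const codes = {
    b: 1, f: 1, p: 1, v: 1,
    c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
    d: 3, t: 3, l: 4, m: 5, n: 5, r: 6
  };
  const letters = word.toLowerCase().replace( /[^a-z]/g, '' );
  if ( letters.length === 0 ) return '';
  // The first letter is kept as is; its code still suppresses a repeat.
  let out = letters[ 0 ].toUpperCase();
  let prev = codes[ letters[ 0 ] ] || 0;
  for ( let i = 1; i < letters.length && out.length < maxLength; i += 1 ) {
    const ch = letters[ i ];
    // h and w are skipped and do not separate letters with equal codes.
    if ( ch === 'h' || ch === 'w' ) continue;
    // Vowels and y have no code (0) but do separate equal codes.
    const code = codes[ ch ] || 0;
    if ( code !== 0 && code !== prev ) out += code;
    prev = code;
  }
  // Pad short codes with zeros up to maxLength.
  return out.padEnd( maxLength, '0' );
}

soundexSketch( 'Burroughs' );
// -> 'B620'
soundexSketch( 'Burrows' );
// -> 'B620'
```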

splitElisions(str) → {string}

Splits basic elisions found in the input string. Typical examples of elisions are it's, let's, where's, I'd, I'm, I'll, I've, and isn't. Note that it does not touch the apostrophe used to indicate possession.

Example
splitElisions( "someone's wallet, isn't it?" );
// -> "someone's wallet, is n't it?"
Parameters:
Name Type Description
str string

the input string.

Returns:

input string after splitting of elisions.

Type
string

stem(word) → {string}

Stems an inflected word using the Porter2 stemming algorithm.

Example
stem( 'consisting' );
// -> 'consist'
Parameters:
Name Type Description
word string

the word to be stemmed.

Returns:

the stemmed word.

Type
string

tokenize(sentence, detailedopt) → {Array.<string>|Array.<object>}

Tokenizes the input sentence according to the value of the detailed flag. Any occurrence of ... in the sentence is converted to ellipses. When detailed is true, it tags every token with its type; the supported tags are currency, email, emoji, emoticon, hashtag, number, ordinal, punctuation, quoted_phrase, symbol, time, mention, url, and word.

Example
tokenize( "someone's wallet, isn't it? I'll return!" );
// -> [ 'someone', '\'s', 'wallet', ',', 'is', 'n\'t', 'it', '?',
//      'I', '\'ll', 'return', '!' ]

tokenize( 'For details on wink, check out http://winkjs.org/ URL!', true );
// -> [ { value: 'For', tag: 'word' },
//      { value: 'details', tag: 'word' },
//      { value: 'on', tag: 'word' },
//      { value: 'wink', tag: 'word' },
//      { value: ',', tag: 'punctuation' },
//      { value: 'check', tag: 'word' },
//      { value: 'out', tag: 'word' },
//      { value: 'http://winkjs.org/', tag: 'url' },
//      { value: 'URL', tag: 'word' },
//      { value: '!', tag: 'punctuation' } ]
Parameters:
Name Type Attributes Default Description
sentence string

the input string.

detailed boolean <optional>
false

if true, each token is an object containing the value and tag of the token; otherwise each token is a string. Its default value of false ensures compatibility with the previous version.

Returns:

an array of strings if detailed is false otherwise an array of objects.

Type
Array.<string> | Array.<object>

tokenize0(str) → {Array.<string>}

Tokenizes by splitting the input string on non-words. This means tokens consist only of alphas, numerals and underscores; all other characters are stripped, as they are treated as separators. It also removes all elisions; however, negations are retained and amplified.

Example
tokenize0( "someone's wallet, isn't it?" );
// -> [ 'someone', 's', 'wallet', 'is', 'not', 'it' ]
Parameters:
Name Type Description
str string

the input string.

Returns:

array of tokens.

Type
Array.<string>
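
The split-on-non-words behaviour of tokenize0 can be approximated in two steps: amplify the negation, then split on runs of non-word characters. This is a rough sketch that reproduces the example above; tokenize0Sketch is a hypothetical name, and the sketch makes no attempt at the library's elision removal.

```javascript
// Hypothetical sketch; not the library's implementation.
function tokenize0Sketch( str ) {
  return str
    // Amplify negation first, so that "isn't" becomes "is not".
    .replace( /(\w)n't\b/gi, '$1 not' )
    // Split on runs of characters other than letters, digits and underscore.
    .split( /\W+/ )
    // Drop the empty strings produced at the edges of the input.
    .filter( ( t ) => t.length > 0 );
}

tokenize0Sketch( "someone's wallet, isn't it?" );
// -> [ 'someone', 's', 'wallet', 'is', 'not', 'it' ]
```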

trim(str) → {string}

Trims leading and trailing whitespaces from the input string.

Example
trim( '  Padded   ' );
// -> 'Padded'
Parameters:
Name Type Description
str string

the input string.

Returns:

input string with leading & trailing whitespaces removed.

Type
string

upperCase(str) → {string}

Converts the input string to upper case.

Example
upperCase( 'Upper Case' );
// -> 'UPPER CASE'
Parameters:
Name Type Description
str string

the input string.

Returns:

input string in upper case.

Type
string