String
Methods
amplifyNotElision
Amplifies the not elision by expanding n't into not; for example, isn't
becomes is not.
Example
amplifyNotElision( "someone's wallet, isn't it?" );
// -> "someone's wallet, is not it?"
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string after not elision amplification.
- Type
 - string
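
A minimal sketch of the same idea, assuming a single regular expression that expands n't whenever it follows a letter. It is a standalone illustration, not the library's source.

```javascript
// Sketch: expand the "n't" elision into " not" when it follows a letter.
// Assumes plain ASCII apostrophes; curly quotes are not handled here.
function amplifyNotElision(str) {
  return str.replace(/([a-z])n't\b/gi, '$1 not');
}

amplifyNotElision("someone's wallet, isn't it?");
// -> "someone's wallet, is not it?"
```

Because the rule is purely pattern based, an irregular form such as can't becomes ca not in this sketch.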
 
bagOfNGrams
Generates the bag of ngrams of the given size from the input string. The
default size is 2, which means it generates a bag of bigrams by default. It
also has an alias, bong().
Example
bagOfNGrams( 'mama' );
// -> { ma: 2, am: 1 }
bong( 'mamma' );
// -> { ma: 2, am: 1, mm: 1 }
    Parameters
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| str | string | | | the input string. |
| size | number | optional | 2 | ngram size. |
| ifn | function | optional | | a function to build index; it is called for every unique occurrence of ngram of str. |
| idx | number | optional | | the index; passed as the second argument to the ifn function. |

Returns
bag of ngrams of the given size from str.
- Type
 - object
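
A bag of ngrams is a frequency count over a sliding window. The sketch below assumes the window advances one character at a time, which is consistent with the examples above; the optional ifn and idx arguments are left out for brevity.

```javascript
// Sketch: slide a window of `size` characters along the string and count
// how many times each ngram occurs.
function bagOfNGrams(str, size = 2) {
  // A null-prototype object avoids clashes with inherited keys such as
  // "constructor" or "__proto__" when size is large.
  const bag = Object.create(null);
  for (let i = 0; i + size <= str.length; i += 1) {
    const gram = str.slice(i, i + size);
    bag[gram] = (bag[gram] || 0) + 1;
  }
  return bag;
}

bagOfNGrams('mamma');
// -> { ma: 2, am: 1, mm: 1 } (as a null-prototype object)
```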
 
composeCorpus
Generates all possible sentences from the input string.
The string must follow a special syntax, as illustrated in the
example below:
'[I] [am having|have] [a] [problem|question]'
Each phrase must be enclosed in square brackets [ ], and the alternative options
within a phrase (if any) must be separated by a | character. The corpus is composed by
computing the Cartesian product of all the phrases.
Example
composeCorpus( '[I] [am having|have] [a] [problem|question]' );
// -> [ 'I am having a problem',
//      'I am having a question',
//      'I have a problem',
//      'I have a question' ]
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
array of all possible sentences.
- Type
 - Array.<string>
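
The Cartesian product described above can be sketched as a left fold over the bracketed phrases. This is a standalone illustration, not the library's implementation; it assumes that text outside the brackets is ignored and that the chosen options are joined with single spaces.

```javascript
// Sketch: expand a "[a|b] [c]" template into every sentence it can produce.
function composeCorpus(str) {
  // Each [...] group becomes one phrase; its options are split on "|".
  const phrases = [...str.matchAll(/\[([^\]]*)\]/g)].map((m) => m[1].split('|'));
  // Cartesian product: extend every partial sentence with every option of
  // the next phrase, keeping the phrases in their original order.
  const combos = phrases.reduce(
    (partials, options) =>
      partials.flatMap((parts) => options.map((option) => [...parts, option])),
    [[]]
  );
  return combos.map((parts) => parts.join(' '));
}

composeCorpus('[I] [am having|have] [a] [problem|question]');
// -> [ 'I am having a problem', 'I am having a question',
//      'I have a problem', 'I have a question' ]
```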
 
edgeNGrams
Generates the edge ngrams from the input string.
Example
edgeNGrams( 'decisively' );
// -> [ 'de', 'deci', 'decisi', 'decisive' ]
edgeNGrams( 'decisively', 8, 10, 1 );
// -> [ 'decisive', 'decisivel', 'decisively' ]
    Parameters
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| str | string | | | the input string. |
| min | number | optional | 2 | minimum size of ngram generated. |
| max | number | optional | 8 | maximum size of ngram generated. |
| delta | number | optional | 2 | edge ngrams are generated in increments of this value. |
| ifn | function | optional | | a function to build index; it is called for every edge ngram of str. |
| idx | number | optional | | the index; passed as the second argument to the ifn function. |

Returns
array of edge ngrams.
- Type
 - Array.<string>
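
Edge ngrams are prefixes of the input whose lengths grow from min to max in steps of delta. The sketch below assumes generation stops at max or at the string length, whichever comes first; the ifn and idx arguments are omitted.

```javascript
// Sketch: collect prefixes of length min, min + delta, ... up to max.
function edgeNGrams(str, min = 2, max = 8, delta = 2) {
  if (delta < 1) throw new RangeError('delta must be at least 1');
  const grams = [];
  // Assumption: stop at `max` or at the string length, whichever is smaller.
  for (let len = min; len <= max && len <= str.length; len += delta) {
    grams.push(str.slice(0, len));
  }
  return grams;
}

edgeNGrams('decisively', 8, 10, 1);
// -> [ 'decisive', 'decisivel', 'decisively' ]
```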
 
extractPersonsName
Attempts to extract a person's name from the input string.
It assumes the following name format:
[<salutations>] <name part as FN [MN] [LN]> [<degrees>]
Entities in square brackets are optional. Note that it is not a
named entity detection mechanism.
Example
extractPersonsName( 'Dr. Sarah Connor M. Tech., PhD. - AI' );
// -> 'Sarah Connor'
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
extracted name.
- Type
 - string
 
extractRunOfCapitalWords
Extracts the array of text appearing as Title Case or in ALL CAPS from the input string.
Example
extractRunOfCapitalWords( 'In The Terminator, Sarah Connor is in Los Angeles' );
// -> [ 'In The Terminator', 'Sarah Connor', 'Los Angeles' ]
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
array of text appearing in Title Case or in ALL CAPS; if no such
text is found, null is returned.
- Type
 - Array.<string>
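
A rough sketch of the idea with one regular expression: a run is one or more capitalized words separated only by whitespace. The pattern is an assumption, not the library's; it handles ASCII letters only and also returns a lone capitalized word as a run, which the library may treat differently.

```javascript
// Sketch: words starting with an uppercase letter (Title Case or ALL CAPS),
// joined by whitespace, form one run. ASCII letters only.
const CAPITAL_RUN = /\b[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*/g;

function extractRunOfCapitalWords(str) {
  // With a global regex, String.prototype.match returns every run,
  // or null when nothing matches.
  return str.match(CAPITAL_RUN);
}

extractRunOfCapitalWords('In The Terminator, Sarah Connor is in Los Angeles');
// -> [ 'In The Terminator', 'Sarah Connor', 'Los Angeles' ]
```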
 
lowerCase
Converts the input string to lower case.
Example
lowerCase( 'Lower Case' );
// -> 'lower case'
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string in lower case.
- Type
 - string
 
marker
Generates the marker of the input string; it is defined as the unique 1-grams
(characters), sorted and joined back into a string. Marker is a quick and aggressive way
to detect similarity between short strings. Its aggressiveness may lead to
false positives, such as Meter and Metre, or no melon and no lemon.
Example
marker( 'the quick brown fox jumps over the lazy dog' );
// -> ' abcdefghijklmnopqrstuvwxyz'
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
the marker.
- Type
 - string
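
Under the definition above (unique 1-grams, sorted, joined), the marker fits in one line. Treating a "character" as a Unicode code point is an assumption of this sketch.

```javascript
// Sketch: keep one copy of each character, sort, and join into a string.
function marker(str) {
  // Spreading a Set keeps one copy per character; sort() then orders them
  // by UTF-16 code units, so the space sorts before the letters.
  return [...new Set(str)].sort().join('');
}

marker('Meter') === marker('Metre');
// -> true, the false positive mentioned above
```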
 
ngram
Generates an array of ngrams of a specified size from the input string. The default size is 2, which means it will generate bigrams by default.
Example
ngram( 'FRANCE' );
// -> [ 'FR', 'RA', 'AN', 'NC', 'CE' ]
ngram( 'FRENCH' );
// -> [ 'FR', 'RE', 'EN', 'NC', 'CH' ]
ngram( 'FRANCE', 3 );
// -> [ 'FRA', 'RAN', 'ANC', 'NCE' ]
    Parameters
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| str | string | | | the input string. |
| size | number | optional | 2 | ngram's size. |

Returns
ngrams of the given size from str.
- Type
 - Array.<string>
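
The examples above are consistent with a window of size characters that moves one position at a time; a minimal sketch of that reading:

```javascript
// Sketch: slide a window of `size` characters one position at a time.
function ngram(str, size = 2) {
  const grams = [];
  for (let i = 0; i + size <= str.length; i += 1) {
    grams.push(str.slice(i, i + size));
  }
  return grams;
}

ngram('FRANCE', 3);
// -> [ 'FRA', 'RAN', 'ANC', 'NCE' ]
```

A string of length n yields n - size + 1 ngrams in this sketch, or none when it is shorter than size.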
 
phonetize
Phonetizes the input string using an algorithmic adaptation of Metaphone; it is not an exact implementation of Metaphone.
Example
phonetize( 'perspective' );
// -> 'prspktv'
phonetize( 'phenomenon' );
// -> 'fnmnn'
    Parameters
| Name | Type | Description | 
|---|---|---|
| word | string | the input word.  | 
        
Returns
phonetic code of word.
- Type
 - string
 
removeElisions
Removes basic elisions found in the input string. Typical examples of elisions
are it's, let's, where's, I'd, I'm, I'll, I've, and isn't. Note that it retains
the apostrophe used to indicate possession.
Example
removeElisions( "someone's wallet, isn't it?" );
// -> "someone's wallet, is it?"
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string after removal of elisions.
- Type
 - string
 
removeExtraSpaces
Removes leading, trailing and any extra in-between whitespaces from the input string.
Example
removeExtraSpaces( '   Padded   Text    ' );
// -> 'Padded Text'
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string after removal of leading, trailing and extra whitespaces.
- Type
 - string
 
removeHTMLTags
Removes each HTML tag by replacing it with a space.
Extra spaces, if required, may be removed using the string.removeExtraSpaces function.
Example
removeHTMLTags( '<p>Vive la France  !</p>' );
// -> ' Vive la France  ! '
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string after removal of HTML tags.
- Type
 - string
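
A minimal sketch, assuming a tag is any span from "<" to the next ">". It is a regex heuristic rather than an HTML parser, so a ">" inside an attribute value or comment can confuse it.

```javascript
// Sketch: replace every "<...>" span with a single space.
function removeHTMLTags(str) {
  return str.replace(/<[^>]*>/g, ' ');
}

removeHTMLTags('<p>Vive la France  !</p>');
// -> ' Vive la France  ! '
```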
 
removePunctuations
Removes each punctuation mark by replacing it with a space. It looks for
the following punctuation marks: . , ; ! ? : " ' ... - ( ) [ ] { }
Extra spaces, if required, may be removed using the string.removeExtraSpaces function.
Example
removePunctuations( 'Punctuations like "\'\',;!?:"!... are removed' );
// -> 'Punctuations like               are removed'
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string after removal of punctuations.
- Type
 - string
 
removeSplChars
Removes each special character by replacing it with a space. It looks for
the following special characters: ~ @ # % ^ * + =
Extra spaces, if required, may be removed using the string.removeExtraSpaces function.
Example
removeSplChars( '4 + 4*2 = 12' );
// -> '4   4 2   12'
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string after removal of special characters.
- Type
 - string
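
Since each listed character is replaced by a space, a single character class covers the documented set. The tidy helper below is hypothetical, a stand-in for the removeExtraSpaces step, and not part of the library.

```javascript
// Sketch: each character in ~ @ # % ^ * + = becomes a space.
// Inside a character class, "^" is literal unless it comes first.
function removeSplChars(str) {
  return str.replace(/[~@#%^*+=]/g, ' ');
}

// Hypothetical helper: trim, then collapse whitespace runs into one space.
const tidy = (s) => s.trim().replace(/\s+/g, ' ');

tidy(removeSplChars('4 + 4*2 = 12'));
// -> '4 4 2 12'
```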
 
retainAlphaNums
Retains only alphabetic characters and numerals, removing all other characters from the input string along with leading, trailing and extra in-between whitespace.
Example
retainAlphaNums( ' This, text here, has  (other) chars_! ' );
// -> 'This text here has other chars'
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string after removal of non-alphanumeric characters, leading, trailing and extra whitespaces.
- Type
 - string
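
A sketch of the three steps implied above, assuming only ASCII letters and digits count as alphanumeric. Note that the underscore is removed as well, matching the example, unlike the \w regex class.

```javascript
// Sketch, assuming ASCII letters and digits are the "alphanumerics".
function retainAlphaNums(str) {
  return str
    .replace(/[^a-z0-9\s]/gi, ' ') // every other character becomes a space
    .replace(/\s+/g, ' ')          // collapse whitespace runs
    .trim();                       // drop leading and trailing whitespace
}

retainAlphaNums(' This, text here, has  (other) chars_! ');
// -> 'This text here has other chars'
```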
 
sentences
Detects the sentence boundaries in the input paragraph and splits it into
an array of sentence(s).
Example
sentences( 'AI Inc. is focussing on AI. I work for AI Inc. My mail is r2d2@yahoo.com' );
// -> [ 'AI Inc. is focussing on AI.',
//      'I work for AI Inc.',
//      'My mail is r2d2@yahoo.com' ]
sentences( 'U.S.A is my birth place. I was born on 06.12.1924. I climbed Mt. Everest.' );
// -> [ 'U.S.A is my birth place.',
//      'I was born on 06.12.1924.',
//      'I climbed Mt. Everest.' ]
    Parameters
| Name | Type | Description | 
|---|---|---|
| paragraph | string | the input string.  | 
        
Returns
array of sentences.
- Type
 - Array.<string>
 
setOfChars
Creates a set of chars from the input string. Compared to marker(), this is
useful for even more aggressive string matching using Jaccard or Tversky
similarity. It also has an alias, soc().
Example
setOfChars( 'the quick brown fox jumps over the lazy dog' );
// -> ' abcdefghijklmnopqrstuvwxyz'
    Parameters
| Name | Type | Attributes | Description |
|---|---|---|---|
| str | string | | the input string. |
| ifn | function | optional | a function to build index; it receives the first character of str. |
| idx | number | optional | the index; passed as the second argument to the ifn function. |

Returns
the set of chars (soc).
- Type
 - string
 
setOfNGrams
Generates the set of ngrams of the given size from the input string. The
default size is 2, which means it generates a set of bigrams by default.
It also has an alias, song().
Example
setOfNGrams( 'mama' );
// -> Set { 'ma', 'am' }
song( 'mamma' );
// -> Set { 'ma', 'am', 'mm' }
    Parameters
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| str | string | | | the input string. |
| size | number | optional | 2 | ngram size. |
| ifn | function | optional | | a function to build index; it is called for every unique occurrence of ngram of str. |
| idx | number | optional | | the index; passed as the second argument to the ifn function. |

Returns
set of ngrams of the given size from str.
- Type
 - set
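
A set of ngrams drops the counts that a bag keeps, which makes it a natural input to set-based similarity. The sketch below assumes the same one-character sliding window as the examples; jaccard is a hypothetical helper shown only to illustrate such use, and it treats two empty sets as identical by choice.

```javascript
// Sketch: sliding window of `size` characters; the Set keeps distinct
// ngrams in order of first appearance.
function setOfNGrams(str, size = 2) {
  const grams = new Set();
  for (let i = 0; i + size <= str.length; i += 1) {
    grams.add(str.slice(i, i + size));
  }
  return grams;
}

// Hypothetical helper: Jaccard similarity |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  let shared = 0;
  for (const x of a) if (b.has(x)) shared += 1;
  const union = a.size + b.size - shared;
  return union === 0 ? 1 : shared / union;
}

jaccard(setOfNGrams('mama'), setOfNGrams('mamma'));
// -> 0.6666666666666666
```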
 
soundex
Produces the soundex code from the input word.
Example
soundex( 'Burroughs' );
// -> 'B620'
soundex( 'Burrows' );
// -> 'B620'
    Parameters
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| word | string | | | the input word. |
| maxLength | number | optional | 4 | maximum length of the soundex code to be returned. |

Returns
soundex code of word.
- Type
 - string
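
A sketch of the American Soundex rules as commonly described: keep the first letter, map consonants to digits, collapse adjacent equal digits (h and w do not break a run, vowels do), then pad with zeros up to maxLength. Whether the library treats h, w and non-letters exactly this way is an assumption.

```javascript
// Digit classes of American Soundex; vowels, y, h and w have no digit.
const CODES = {
  b: '1', f: '1', p: '1', v: '1',
  c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
  d: '3', t: '3',
  l: '4',
  m: '5', n: '5',
  r: '6',
};

function soundex(word, maxLength = 4) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';
  let code = letters[0].toUpperCase();
  // Digit of the previous letter; the first letter's digit also counts.
  let prev = CODES[letters[0]] || '';
  for (let i = 1; i < letters.length && code.length < maxLength; i += 1) {
    const ch = letters[i];
    if (ch === 'h' || ch === 'w') continue; // transparent: prev is kept
    const digit = CODES[ch] || '';          // vowels reset prev to ''
    if (digit && digit !== prev) code += digit;
    prev = digit;
  }
  return code.padEnd(maxLength, '0');
}

soundex('Burroughs');
// -> 'B620'
```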
 
splitElisions
Splits basic elisions found in the input string. Typical examples of elisions
are it's, let's, where's, I'd, I'm, I'll, I've, and isn't. Note that it does
not touch the apostrophe used to indicate possession.
Example
splitElisions( "someone's wallet, isn't it?" );
// -> "someone's wallet, is n't it?"
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string after splitting of elisions.
- Type
 - string
 
stem
Stems an inflected word using the Porter2 stemming algorithm.
Example
stem( 'consisting' );
// -> 'consist'
    Parameters
| Name | Type | Description | 
|---|---|---|
| word | string | the word to be stemmed. |
        
Returns
the stemmed word.
- Type
 - string
 
tokenize
Tokenizes the input sentence according to the value of the detailed flag.
Any occurrence of ... in the sentence is
converted to an ellipsis. In detailed = true mode, it
tags every token with its type; the supported tags are word, number, url, email,
mention, hashtag, emoji, emoticon, time, ordinal, currency, punctuation, symbol,
and tabCRLF.
Example
tokenize( "someone's wallet, isn't it? I'll return!" );
// -> [ 'someone', '\'s', 'wallet', ',', 'is', 'n\'t', 'it', '?',
//      'I', '\'ll', 'return', '!' ]
tokenize( 'For details on wink, check out http://winkjs.org/ URL!', true );
// -> [ { value: 'For', tag: 'word' },
//      { value: 'details', tag: 'word' },
//      { value: 'on', tag: 'word' },
//      { value: 'wink', tag: 'word' },
//      { value: ',', tag: 'punctuation' },
//      { value: 'check', tag: 'word' },
//      { value: 'out', tag: 'word' },
//      { value: 'http://winkjs.org/', tag: 'url' },
//      { value: 'URL', tag: 'word' },
//      { value: '!', tag: 'punctuation' } ]
    Parameters
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| sentence | string | | | the input string. |
| detailed | boolean | optional | false | if true, each token is an object containing its value and tag. |

Returns
an array of strings if detailed is false, otherwise
an array of objects.
- Type
 - Array.<string> | Array.<object>
 
tokenize0
Tokenizes by splitting the input string on non-word characters. This means tokens consist of only alphabetic characters, numerals and underscores; all other characters are stripped because they are treated as separators. It also removes all elisions; however, negations are retained and amplified.
Example
tokenize0( "someone's wallet, isn't it?" );
// -> [ 'someone', 's', 'wallet', 'is', 'not', 'it' ]
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
array of tokens.
- Type
 - Array.<string>
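
A sketch of the pipeline described above: amplify negations, drop elisions, then split on non-word characters. Which elisions are dropped is an assumption here; only the unambiguous 'd, 'm, 'll, 've and 're are removed, while 's is left for the splitter, which reproduces the example output.

```javascript
// Sketch: amplify "n't", drop assumed elisions, split on non-word runs.
function tokenize0(str) {
  return str
    .replace(/([a-z])n't\b/gi, '$1 not')   // isn't -> is not
    .replace(/'(?:d|m|ll|ve|re)\b/gi, '')  // assumed list of elisions to drop
    .split(/\W+/)                          // non-word characters are separators
    .filter((token) => token.length > 0);  // drop empties from edge separators
}

tokenize0("someone's wallet, isn't it?");
// -> [ 'someone', 's', 'wallet', 'is', 'not', 'it' ]
```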
 
trim
Trims leading and trailing whitespaces from the input string.
Example
trim( '  Padded   ' );
// -> 'Padded'
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string with leading & trailing whitespaces removed.
- Type
 - string
 
upperCase
Converts the input string to upper case.
Example
upperCase( 'Upper Case' );
// -> 'UPPER CASE'
    Parameters
| Name | Type | Description | 
|---|---|---|
| str | string | the input string.  | 
        
Returns
input string in upper case.
- Type
 - string