its helper
The winkNLP document, collection and item have several contextual properties that are accessible via its.propertyName, an argument of the out() method.
The propertyName part of the argument must be replaced with a value from the following table, as required. For example, after substitution the argument can be its.stopWordFlag, its.shape, its.vector, etc. The default value of this argument is its.value.
The method out( its.value ) returns a string when it is applied to a document or item, and an array of strings when it is applied to a collection. The other properties follow the same behaviour.
For example, token.out( its.abbrevFlag ) returns a boolean value, whereas tokens().out( its.abbrevFlag ) returns a boolean array. This array holds the return value of token.out( its.abbrevFlag ) for each token in the collection.
Examples:
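// Setup: load winkNLP with an English language model
// (wink-eng-lite-web-model is one such model package).
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const text = 'I work for AI Inc.';
const doc = nlp.readDoc( text );
// Check abbreviation flag for a token item
const t1 = doc.tokens().itemAt( 0 ); // -> I
console.log( t1.out( its.abbrevFlag ) );
// -> false
const t2 = doc.tokens().itemAt( 4 ); // -> Inc.
console.log( t2.out( its.abbrevFlag ) );
// -> true
// Check abbreviation flag for a token collection
console.log( doc.tokens().out( its.abbrevFlag ) );
// -> [ false, false, false, false, true ]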
If an its.propertyName applies to any winkNLP item type, then it applies to its corresponding collection as well. For example, since the table lists its.abbrevFlag as applying to a token item, it also applies to a token collection.
propertyName | Description | Applies to | Type |
---|---|---|---|
its.abbrevFlag | true if a token is an abbreviation. Common abbreviations such as honorifics, units of measure, educational qualifications and more are part of the lexicon. All initials followed by a period are flagged as abbreviations, even if they are not part of the lexicon. For example, “AI” will not be flagged as an abbreviation, but “A.I.” will be. | Token | boolean |
its.case | lowerCase, upperCase, titleCase or other (for digits, symbols, etc.) | Token | string |
its.contractionFlag | true if a token is a contraction. For example, in the English text “Can’t stop watching Netflix”, the tokens Ca and n't have contractionFlag true, whereas the token stop has contractionFlag false. | Token | boolean |
its.detail | The value and type of an entity together in an object. For example, {value: 'March 15, 1972', type: 'DATE' } | Entity | object |
its.lemma | Lemma of the word based on its pos tag. | — | — |
its.markedUpText | JavaScript string equivalent of text marked up using the markup() method | Document, Sentence | string |
its.negationFlag | true if a token is negated. In English, tokens can be negated by words like not, neither, nor, rarely, scarcely, etc. For example, for the text “I didn’t like it.”, the negationFlag will be true. | Document, Sentence, Token | boolean |
its.normal | The lower-cased form of a token. In a document, sentence or entity, consecutive whitespace is limited to a single space. This property is applicable to Latin-script languages such as English or French. When used with English, it maps a British spelling to its equivalent American spelling, if any. Refer to the normalization rules in the language model. | Document, Sentence, Entity, Token | string |
its.pos | Part-of-speech tag of the token. | Token | string |
its.precedingSpaces | Whitespaces before a token in string format | Token | string |
its.prefix | First two characters of a token. For example, the token “time” will have the prefix “ti” | Token | string |
its.readabilityStats | Flesch Reading Ease Score (fres), list of complex words and their count, reading time in minutes and seconds, sentiment score and more. | Document | object |
its.sentenceWiseImportance | Sentence-wise importance for all the sentences in the document, as an array of objects. Each object follows the {index: number, importance: number} format, where index is the 0-based index of the sentence and importance is a number between 0 (lowest importance) and 1 (highest importance). | Document | object[] |
its.sentiment | Sentiment score with a value between -1 and +1 | Document, Sentence | number |
its.shape | The shape of a token is mapped as follows: each letter in a token is mapped to X or x depending on its case, digits are mapped to d, and all other characters are mapped to themselves. The shape code is trimmed after any four consecutive identical shape patterns. For example, in English, the shape of the token “Billion” will be “Xxxxx” and not “Xxxxxxx”. | Token | string |
its.span | Indexes of the first and last token | Document, Sentence, Entity | number[] |
its.stem | Stem of the word according to the Porter Stemmer Algorithm version 2. | — | — |
its.stopWordFlag | true if a token is a stop word. The representative list of stop words is part of the language model. | Token | boolean |
its.suffix | Last three characters of a token. For example, the token “time” will have the suffix “ime” | Token | string |
its.type | The type of entity or token. For a complete list of entity and token types refer to the documentation of the desired language model. | Entity, Token | string |
its.uniqueId | The unique ID of a token. For example, “Hello world” will have the following tokens: Hello (unique ID 34736) and world (unique ID 10980). | Token | number |
its.value | JavaScript string equivalent. This is the default property. | Document, Sentence, Entity, Token | string |
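
The descriptions of its.prefix, its.suffix and its.shape in the table above can be sketched in plain JavaScript. This is only an illustration of the documented rules, not winkNLP's source code: the function names prefixOf, suffixOf and shapeOf are hypothetical, and the sketch assumes that "trimmed after four consecutive identical shape patterns" means a run of the same shape character is capped at four.

```javascript
// Illustrative sketches of three token properties; NOT winkNLP's implementation.
// The helper names below are hypothetical.

// its.prefix: first two characters of a token.
function prefixOf(token) {
  return token.slice(0, 2);
}

// its.suffix: last three characters of a token.
function suffixOf(token) {
  return token.slice(-3);
}

// its.shape: letters map to X or x by case, digits map to d, and any other
// character maps to itself. Assumption: a run of identical shape characters
// is capped at four.
function shapeOf(token) {
  let shape = '';
  let previous = '';
  let run = 0;
  for (const ch of token) {
    let code = ch;
    if (/\p{Lu}/u.test(ch)) code = 'X';
    else if (/\p{Ll}/u.test(ch)) code = 'x';
    else if (/\d/.test(ch)) code = 'd';
    run = (code === previous) ? run + 1 : 1;
    previous = code;
    if (run <= 4) shape += code;
  }
  return shape;
}

console.log(prefixOf('time'));   // -> ti
console.log(suffixOf('time'));   // -> ime
console.log(shapeOf('Billion')); // -> Xxxxx
console.log(shapeOf('Inc.'));    // -> Xxx.
```

With winkNLP itself, the corresponding values would be read through the out() method, for example token.out( its.shape ).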
For details on its.lemma and its.stem, refer to the language models.
as helper
The out() method for a collection has two parameters: its.propertyName and as.reducedValue. As the name suggests, a collection can be reduced to a frequency table, bag of words, etc. via the as.reducedValue argument.
The reducedValue part of the argument must be replaced with a value from the following table, as required. For example, after substitution the argument can be as.freqTable, as.bigrams, as.array, etc. The default value of this argument is as.array.
reducedValue | Description | Applies to | Type |
---|---|---|---|
as.array | Collection reduced to a JavaScript array. This is the default reducer. | Sentences, Entities, Tokens | string[] |
as.bigrams | Tokens reduced to bigrams | Tokens | 2D string[] |
as.bow | Bag of words of entities, entity types or tokens | Entities, Tokens | object |
as.freqTable | A frequency table in descending order | Sentences, Entities, Tokens | 2D number[] |
as.set | Collection reduced to a JavaScript set | Entities, Tokens | set{} |
as.unique | Unique array of entities, entity types or tokens | Entities, Tokens | string[] |
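
To make the reducers in the table above concrete, here is a minimal plain-JavaScript sketch of what each one conceptually produces from an array of strings. It is not winkNLP's source code: the function names (countItems, bow, freqTable, bigrams, unique) are hypothetical, and the ordering of equal counts in the frequency table is an assumption (first-seen order here), so the library's output may differ in such details.

```javascript
// Illustrative reducers over a plain array of strings; NOT winkNLP's implementation.
// All function names below are hypothetical.

// Count occurrences, keeping first-seen order.
function countItems(items) {
  const counts = new Map();
  for (const item of items) counts.set(item, (counts.get(item) || 0) + 1);
  return counts;
}

// as.bow: bag of words as an object of item -> count.
function bow(items) {
  return Object.fromEntries(countItems(items));
}

// as.freqTable: [item, count] pairs in descending order of count.
// Assumption: ties keep first-seen order (Array.prototype.sort is stable).
function freqTable(items) {
  return [...countItems(items)].sort((a, b) => b[1] - a[1]);
}

// as.bigrams: every pair of adjacent items.
function bigrams(items) {
  const pairs = [];
  for (let i = 0; i < items.length - 1; i += 1) {
    pairs.push([items[i], items[i + 1]]);
  }
  return pairs;
}

// as.unique: array of distinct items; as.set would be new Set(items).
function unique(items) {
  return [...new Set(items)];
}

const values = ['to', 'be', 'or', 'not', 'to', 'be'];
console.log(bow(values));       // -> { to: 2, be: 2, or: 1, not: 1 }
console.log(freqTable(values)); // -> [ [ 'to', 2 ], [ 'be', 2 ], [ 'or', 1 ], [ 'not', 1 ] ]
console.log(bigrams(values).length); // -> 5
console.log(unique(values));    // -> [ 'to', 'be', 'or', 'not' ]
```

With winkNLP itself, the reducer is passed as the second argument of a collection's out() method, for example doc.tokens().out( its.value, as.freqTable ).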