its helper

A winkNLP document, collection or item has several contextual properties, which are accessible by passing its.propertyName as the argument of the out() method.

The propertyName part of the argument must be replaced with a value from the following table, as required. For example, the argument can be its.stopWordFlag, its.shape, its.vector, etc. The default value of this argument is its.value.

The method out( its.value ) returns a string when it is applied to a document or an item, and an array of strings when it is applied to a collection. The other properties behave the same way: a document or item returns a single value, while a collection returns an array of such values.

For example, token.out( its.abbrevFlag ) returns a boolean, whereas tokens().out( its.abbrevFlag ) returns an array of booleans that holds the value of token.out( its.abbrevFlag ) for each token in the collection.

Examples:

// Setup: assumes the wink-nlp package and an English language model
// (here wink-eng-lite-web-model) are installed
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const text = 'I work for AI Inc.';
const doc = nlp.readDoc( text );

// Check abbreviation flag for a token item
const t1 = doc.tokens().itemAt( 0 ); // -> I
console.log( t1.out( its.abbrevFlag ) );
// -> false

const t2 = doc.tokens().itemAt( 4 ); // -> Inc.
console.log( t2.out( its.abbrevFlag ) );
// -> true

// Check abbreviation flag for a token collection
console.log( doc.tokens().out( its.abbrevFlag ) );
// -> [ false, false, false, false, true ]

Convention: In the following table, if a propertyName applies to a winkNLP item type, it also applies to the corresponding collection. For example, since the table lists its.abbrevFlag as applicable to a token item, it also applies to a token collection.

| propertyName | Description | Applies to | Type |
|---|---|---|---|
| its.abbrevFlag | true if a token is an abbreviation. Common abbreviations such as honorifics, units of measure, educational qualifications and more are part of the lexicon. All initials followed by a period are flagged as abbreviations, even if they are not part of the lexicon. For example, “AI” will not be flagged as an abbreviation, but “A.I.” will be. | Token | boolean |
| its.case | lowerCase, upperCase, titleCase or other (for digits, symbols, etc.) | Token | string |
| its.contractionFlag | true if a token is part of a contraction. For example, in the English text “Can’t stop watching Netflix”: Token: Ca, contractionFlag: true; Token: n't, contractionFlag: true; Token: stop, contractionFlag: false. | Token | boolean |
| its.detail | The value and type of an entity together in an object. For example, { value: 'March 15, 1972', type: 'DATE' } | Entity | object |
| its.lemma | Lemma of the word, based on its part-of-speech (pos) tag. | Token | string |
| its.markedUpText | JavaScript string equivalent of the text marked up using the markup() method | Document, Sentence | string |
| its.negationFlag | true if a token is negated. In English, tokens can be negated by words such as not, neither, nor, rarely, scarcely, etc. For example, in the text “I didn’t like it.”, the negationFlag will be true. | Document, Sentence, Token | boolean |
| its.normal | The lower-cased form of a token. In a document, sentence or entity, runs of whitespace are limited to a single whitespace. This property is applicable to Latin-script languages such as English or French. For English, it also maps British spellings to their American equivalents, if any. Refer to the normalization rules in the language model. | Document, Sentence, Entity, Token | string |
| its.pos | Part-of-speech tag of the token. | Token | string |
| its.precedingSpaces | Whitespace preceding a token, as a string | Token | string |
| its.prefix | First two characters of a token. For example, the token “time” has the prefix “ti”. | Token | string |
| its.readabilityStats | Flesch Reading Ease Score (fres), a list of complex words and their count, reading time in minutes and seconds, sentiment score and more. | Document | object |
| its.sentenceWiseImportance | Importance of every sentence in the document, as an array of objects. Each object has the form { index: number, importance: number }, where index is the 0-based index of the sentence and importance is a number between 0 (lowest importance) and 1 (highest importance). | Document | object[] |
| its.sentiment | Sentiment score, a value between -1 and +1 | Document, Sentence | number |
| its.shape | The shape of a token: each letter is mapped to X or x depending on its case, each digit to d, and every other character to itself. Any run of identical shape codes is trimmed to at most four. For example, in English the shape of the token “Billion” is “Xxxxx”, not “Xxxxxxx”. | Token | string |
| its.span | Indexes of the first and last tokens | Document, Sentence, Entity | number[] |
| its.stem | Stem of the word according to the Porter Stemmer Algorithm version 2. | Token | string |
| its.stopWordFlag | true if a token is a stop word. The representative list of stop words is part of the language model. | Token | boolean |
| its.suffix | Last three characters of a token. For example, the token “time” has the suffix “ime”. | Token | string |
| its.type | The type of entity or token. For a complete list of entity and token types, refer to the documentation of the desired language model. | Entity, Token | string |
| its.uniqueId | The unique ID of a token. For example, “Hello world” has the following tokens: Token: Hello, Unique ID: 34736; Token: world, Unique ID: 10980. | Token | number |
| its.value | JavaScript string equivalent. This is the default property. | Document, Sentence, Entity, Token | string |
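
The its.shape, its.prefix and its.suffix rows describe plain string transformations. The following is a minimal sketch of those rules as the table states them, not winkNLP's implementation: the helper names shapeCode, tokenShape, tokenPrefix and tokenSuffix are hypothetical, and the library may treat edge cases (for example, non-Latin scripts) differently. In practice you would simply call token.out( its.shape ) and similar.

```javascript
// Hypothetical sketch of the its.shape, its.prefix and its.suffix rules as
// described in the table; these helpers are NOT part of winkNLP.

// Map one character to its shape code: X for an upper-case letter,
// x for a lower-case letter, d for a digit, and the character itself otherwise.
function shapeCode( ch ) {
  if ( /\p{Lu}/u.test( ch ) ) return 'X';
  if ( /\p{Ll}/u.test( ch ) ) return 'x';
  if ( /\p{Nd}/u.test( ch ) ) return 'd';
  return ch;
}

// Build a token's shape, keeping at most four consecutive identical codes.
function tokenShape( token ) {
  let shape = '';
  let previous = '';
  let runLength = 0;
  for ( const ch of token ) {
    const code = shapeCode( ch );
    runLength = ( code === previous ) ? runLength + 1 : 1;
    previous = code;
    if ( runLength <= 4 ) shape += code;
  }
  return shape;
}

// First two and last three characters (UTF-16 code units) of a token.
const tokenPrefix = ( token ) => token.slice( 0, 2 );
const tokenSuffix = ( token ) => token.slice( -3 );

console.log( tokenShape( 'Billion' ) ); // -> Xxxxx
console.log( tokenShape( 'A.I.' ) );    // -> X.X.
console.log( tokenPrefix( 'time' ), tokenSuffix( 'time' ) ); // -> ti ime
```

Tracking the length of the current run, rather than post-processing the full shape string, keeps the trimming rule to a single pass over the token.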


Some properties, such as its.stem, are language dependent. For details, refer to the language models.
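
Because its.sentenceWiseImportance returns plain { index, importance } objects, its output is easy to post-process, for instance to pick the most important sentences for an extractive summary. The sketch below is hypothetical: topSentenceIndexes is not a winkNLP function, and the sample importance values are made up for illustration rather than produced by the library.

```javascript
// Hypothetical helper, NOT part of winkNLP. It consumes an array shaped like
// the documented its.sentenceWiseImportance output: [ { index, importance } ].
function topSentenceIndexes( importanceList, n ) {
  return importanceList
    .slice()                                         // copy; do not mutate the input
    .sort( ( a, b ) => b.importance - a.importance ) // most important first
    .slice( 0, n )                                   // keep the n best sentences
    .map( ( s ) => s.index )                         // keep only their indexes
    .sort( ( a, b ) => a - b );                      // restore document order
}

// Made-up sample values; in practice this array would come from
// doc.out( its.sentenceWiseImportance ).
const sample = [
  { index: 0, importance: 0.9 },
  { index: 1, importance: 0.2 },
  { index: 2, importance: 1 },
  { index: 3, importance: 0.5 }
];

console.log( topSentenceIndexes( sample, 2 ) ); // -> [ 0, 2 ]
```

Each returned index should then be usable with doc.sentences().itemAt( index ) to retrieve the sentence; sorting the indexes back into ascending order keeps the selected sentences in their original reading order.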

as helper

The out() method of a collection takes two parameters: its.propertyName and as.reducedValue. As the name suggests, the as.reducedValue argument reduces a collection to a frequency table, a bag of words, etc.

The reducedValue part of the argument must be replaced with a value from the following table, as required. For example, the argument can be as.freqTable, as.bigrams, as.array, etc. The default value of this argument is as.array.

| reducedValue | Description | Applies to | Type |
|---|---|---|---|
| as.array | Collection reduced to a JavaScript array. This is the default reducer. | Sentences, Entities, Tokens | string[] |
| as.bigrams | Tokens reduced to bigrams | Tokens | 2D string[] |
| as.bow | Bag of words of entities, entity types or tokens | Entities, Tokens | object |
| as.freqTable | A frequency table of [value, frequency] pairs in descending order of frequency | Sentences, Entities, Tokens | [string, number][] |
| as.set | Collection reduced to a JavaScript Set | Entities, Tokens | Set |
| as.unique | Unique array of entities, entity types or tokens | Entities, Tokens | string[] |
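
To make the Type column more concrete, the following plain-JavaScript sketch reproduces the shape of each reducer's output on a small array of token values. The helpers freqTable, bow, bigrams, unique and toSet are hypothetical stand-ins, not winkNLP functions, and details such as tie ordering in the frequency table are assumptions of this sketch. With winkNLP itself you would instead pass the reducer to out(), for example doc.tokens().out( its.value, as.freqTable ).

```javascript
// Hypothetical stand-ins that mimic the output shapes described for the
// as.* reducers; they are NOT winkNLP functions.
const values = [ 'the', 'cat', 'saw', 'the', 'dog' ];

// Count occurrences, preserving first-seen order.
function countValues( arr ) {
  const counts = new Map();
  for ( const v of arr ) counts.set( v, ( counts.get( v ) || 0 ) + 1 );
  return counts;
}

// Like as.freqTable: [value, frequency] pairs, most frequent first.
// Ties keep first-seen order here because Array.prototype.sort is stable;
// winkNLP's own tie ordering is an unknown in this sketch.
function freqTable( arr ) {
  return [ ...countValues( arr ) ].sort( ( a, b ) => b[ 1 ] - a[ 1 ] );
}

// Like as.bow: an object mapping each value to its count.
function bow( arr ) {
  return Object.fromEntries( countValues( arr ) );
}

// Like as.bigrams: every pair of adjacent values.
function bigrams( arr ) {
  const pairs = [];
  for ( let i = 0; i + 1 < arr.length; i += 1 ) {
    pairs.push( [ arr[ i ], arr[ i + 1 ] ] );
  }
  return pairs;
}

// Like as.unique and as.set: distinct values as an array or as a Set.
const unique = ( arr ) => [ ...new Set( arr ) ];
const toSet = ( arr ) => new Set( arr );

console.log( freqTable( values ) );
// -> [ [ 'the', 2 ], [ 'cat', 1 ], [ 'saw', 1 ], [ 'dog', 1 ] ]
console.log( bow( values ) );
// -> { the: 2, cat: 1, saw: 1, dog: 1 }
console.log( unique( values ) );
// -> [ 'the', 'cat', 'saw', 'dog' ]
// bigrams( values ) returns
// [ [ 'the', 'cat' ], [ 'cat', 'saw' ], [ 'saw', 'the' ], [ 'the', 'dog' ] ]
// toSet( values ) returns a Set holding the same four distinct values.
```

The bag of words and the frequency table carry the same counts; the table form is convenient when you need the most frequent values first, while the object form gives direct lookup by value.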
