Leveraging out()
The out() method produces appropriate JavaScript built-in data types depending on usage. It is available at all levels: document, collection, and item. By default, that is, without any input parameter, out() returns a string when applied to an item and an array of strings when applied to a collection. The behaviour of doc.out() is similar to item.out(), as shown below:
const text = `Its quarterly profits jumped 76% to $1.13 billion for the three months to December, from $639million of previous year.`;
const doc = nlp.readDoc( text );
doc.out() reproduces the original text:
doc.out()
// -> Its quarterly profits jumped 76% to
// $1.13 billion for the three months to
// December, from $639million of previous year.
The out() method has two optional arguments: its.propertyName and as.reducedValue. These optional arguments are useful in information extraction:
- A token, entity, sentence or document has several contextual properties that are accessible via its.propertyName, such as its.stopWordFlag, its.shape and its.vector.
- A collection of tokens or entities can be reduced via as.reducedValue to, for example, as.freqTable, as.bow (bag of words) or as.bigrams.
propertyName in its.propertyName can have a value such as stopWordFlag, shape or vector.
item.out()
While working with an item, any of its properties can be extracted by passing an its.propertyName parameter to the item.out() method. For example, doc.tokens().itemAt(0).out(its.shape) would return Xxx, the shape of the zeroth token, "Its". Similarly, doc.tokens().itemAt(0).out(its.case) would return titleCase.
The its helper is required using the following statement:
const its = require( 'wink-nlp/src/its.js' );
Each item type has several properties, including a few that are common across all types. The most prominent one is its.value, the default for the out() method. Another important common property, applicable to Latin-script languages such as English or French, is its.normal. It is useful for obtaining the lower-cased value. It also has some language-specific flavour: for example, in English, apart from lower-casing the token, it also automatically maps British spellings to their American equivalents, if any.
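The effect of its.normal on English tokens can be pictured roughly as lower-casing followed by a spelling lookup. The sketch below is hypothetical throughout: approximateNormal and its three-entry spelling table are made up for illustration and do not reflect winkNLP's actual data or code.

```javascript
// Tiny illustrative lookup table; winkNLP's own mapping is not shown here.
const britishToAmerican = new Map([
  ['colour', 'color'],
  ['organise', 'organize'],
  ['centre', 'center'],
]);

// Hypothetical helper: lower-case the token, then swap in the American
// spelling when the table has one; otherwise keep the lower-cased form.
function approximateNormal(token) {
  const lower = token.toLowerCase();
  return britishToAmerican.get(lower) || lower;
}

console.log(approximateNormal('Colour'));  // -> color
console.log(approximateNormal('Profits')); // -> profits
```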
A comprehensive list of properties is available in the reference section titled “its helper”. A select few are outlined below:
Type | Properties |
---|---|
Token | |
Entity | |
Sentence | |
Document | |
The item.out() method automatically falls back to the default, its.value, whenever the input parameter is invalid or the property does not apply to the item in question. For example, doc.out(its.case) would return the same as doc.out(). Extracting properties this way is useful in a variety of NLP tasks such as text pre-processing and information extraction. For example, extracting the nouns from a sentence gives a rough sense of its context:
doc.tokens()
  .filter(
    // Exclude nouns inside an entity
    (t) => !t.parentEntity() && t.out(its.pos) === 'NOUN'
  )
  .out();
// -> [ 'profits' ]
Text classification and intent detection provide another example. These tasks sometimes require replacing entity values with their types. Such replacements help when an individual entity’s value is semantically less important than its type. They are typically applied in addition to punctuation and stop word removal. The following example illustrates how all of this can be achieved using the out() method:
const processedTokens = [];
const detectedEntities = new Set();
doc.tokens()
  .each( (t) => {
    const pe = t.parentEntity();
    if (pe && !detectedEntities.has(pe.index())) {
      // First token of a not-yet-seen entity: emit the entity type once.
      detectedEntities.add(pe.index());
      processedTokens.push('#' + pe.out(its.type));
    } else if (!pe && !t.out(its.stopWordFlag) &&
               (t.out(its.type) === 'word')) {
      // Plain word outside any entity: keep its normalised form.
      processedTokens.push(t.out(its.normal));
    }
  });
console.log( processedTokens );
// -> [ 'quarterly', 'profits', 'jumped', '#PERCENT', '#MONEY',
//      '#DURATION', '#DATE', '#MONEY', '#DATE' ]
collection.out()
By default, the collection.out() method produces an array of strings, where the collection can consist of sentences, entities, customEntities or tokens. For example:
// Each string in the array is an entity.
doc.entities().out()
// -> ['76%', '$1.13 billion', 'three months', 'December',
// '$639million', 'previous year']
The its.propertyName parameter in this case acts like a mapper:
doc.entities().itemAt(1).tokens().out(its.type);
// -> [ 'currency', 'number', 'word' ]
doc.entities().itemAt(1).tokens().out(its.shape);
// -> [ '$', 'd.dd', 'xxxx' ]
Note that its.shape truncates any run of identical shape characters after four, which is why the shape of “billion” is “xxxx” and not “xxxxxxx”.
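The truncation rule can be approximated in a few lines of plain JavaScript. This is only a rough sketch, not winkNLP's implementation: the function name approximateShape is hypothetical, and it assumes that uppercase letters map to X, lowercase letters to x, digits to d, and every other character is kept as-is.

```javascript
// Rough approximation of a token "shape"; NOT winkNLP's actual code.
// Assumption: A-Z -> 'X', a-z -> 'x', 0-9 -> 'd', anything else unchanged.
function approximateShape(token) {
  const mapped = token
    .replace(/[A-Z]/g, 'X')
    .replace(/[a-z]/g, 'x')
    .replace(/[0-9]/g, 'd'); // digits last, so the new 'd' is not turned into 'x'
  // Keep at most four consecutive identical shape characters.
  return mapped.replace(/(.)\1{4,}/g, '$1$1$1$1');
}

console.log(['Its', '$', '1.13', 'billion'].map(approximateShape));
// -> [ 'Xxx', '$', 'd.dd', 'xxxx' ]
```

Under these assumptions the sketch reproduces the shapes quoted in this section for "Its", "$", "1.13" and "billion".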
The collection.out() method also accepts a second parameter, as.reducedValue. Here “as” is another helper like “its”.
The as helper is required using the following statement:
const as = require( 'wink-nlp/src/as.js' );
The as.reducedValue parameter acts like a reducer and defaults to as.array. Other “as” options include as.bow (bag of words) and as.bigrams. These reducers further simplify a number of common NLP tasks. Here is an example of bag-of-words creation:
const poem = `Rain, rain, go away
Come again another day!`;
// A separate name avoids redeclaring the earlier `doc` constant.
const poemDoc = nlp.readDoc( poem );
poemDoc.tokens()
  .filter(
    (t) => !t.out(its.stopWordFlag) &&
           (t.out(its.type) === 'word'))
  .out(its.normal, as.bow);
// -> { rain: 2, away: 1, come: 1, day: 1 }
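For intuition, the reducer step can be imitated without winkNLP. The sketch below is an illustration under stated assumptions, not the library's code: the word list is hard-coded to the already filtered, normalised words from the poem, and toBow and toBigrams are hypothetical stand-ins for what as.bow and as.bigrams might compute.

```javascript
// Already filtered and normalised words from the poem (hard-coded assumption).
const words = ['rain', 'rain', 'away', 'come', 'day'];

// Hypothetical stand-in for as.bow: count how often each word occurs.
function toBow(values) {
  const bow = {};
  for (const v of values) bow[v] = (bow[v] || 0) + 1;
  return bow;
}

// Hypothetical stand-in for as.bigrams: pair each value with its successor.
function toBigrams(values) {
  const bigrams = [];
  for (let i = 0; i < values.length - 1; i += 1) {
    bigrams.push([values[i], values[i + 1]]);
  }
  return bigrams;
}

console.log(toBow(words));
// -> { rain: 2, away: 1, come: 1, day: 1 }
console.log(toBigrams(words).length);
// -> 4
```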
The out() method plays an important role in winkNLP applications. Here is a summary:
- The item.out() method accepts its.propertyName as a parameter, whose default value is its.value, which is also the fallback if a contextually invalid value is passed.
- The doc.out() method behaves like item.out().
- The collection.out() method has two parameters, its.propertyName and as.reducedValue; think of them as a mapper and a reducer respectively. Their default values are its.value and as.array.
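As a mental model only, the summary can be expressed in plain JavaScript. Nothing below is winkNLP code: itemOut, collectionOut and the hand-written token objects are hypothetical, chosen to mirror the defaults and the fallback described above.

```javascript
// Hand-written stand-ins for two tokens; the property names are illustrative.
const tokens = [
  { value: 'Its', normal: 'its', shape: 'Xxx' },
  { value: 'profits', normal: 'profits', shape: 'xxxx' },
];

// Model of item.out(): a property the item lacks falls back to 'value'.
function itemOut(item, property = 'value') {
  return property in item ? item[property] : item.value;
}

// Model of collection.out(): map every item, then reduce.
// The default reducer returns the mapped array unchanged.
function collectionOut(items, property = 'value', reducer = (mapped) => mapped) {
  return reducer(items.map((item) => itemOut(item, property)));
}

console.log(itemOut(tokens[0]));               // -> Its
console.log(itemOut(tokens[0], 'noSuchProp')); // -> Its
console.log(collectionOut(tokens, 'shape'));   // -> [ 'Xxx', 'xxxx' ]
console.log(collectionOut(tokens, 'normal', (m) => m.join(' ')));
// -> its profits
```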