The document provides different views of the same text, depending on your context. In one context it can be viewed as a collection of tokens; in another, as a collection of sentences or of named entities such as times, dates, or URLs. It lets you access each of these views in a flexible manner. Consider the following text:
const text = `On July 20, 1969, a voice crackled from the speakers. He said simply, "the Eagle has landed." They spent nearly 21 hours on the lunar surface. 20% of the world\'s population watched humans walk on Moon.`;
const doc = nlp.readDoc(text);
The document has sentences(), entities(), and tokens() methods to obtain the corresponding collections:
doc.sentences().out();
// Returns:
// [ 'On July 20, 1969, a voice crackled from the speakers.',
//   'He said simply, "the Eagle has landed."',
//   'They spent nearly 21 hours on the lunar surface.',
//   '20% of the world\'s population watched humans walk on Moon.'
// ]

doc.entities().out();
// Returns:
// [ 'July 20, 1969', 'nearly 21 hours', '20%' ]

doc.tokens().out();
// Returns:
// [ 'On', 'July', '20', ',', ... 'walk', 'on', 'Moon', '.' ]
Each element of a collection is referred to as an item. In other words, a single token, entity, or sentence is an item. An item is accessed via the itemAt(n) method, where n is the 0-based index of the item:
doc.entities().itemAt(1).out();
// Returns:
// 'nearly 21 hours'
Notice that when out() was called on an item, it automatically returned a string instead of an array.
out() returns a string when it is applied to an item and an array of strings when it is applied to a collection.
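The item/collection contract described above can be illustrated with a small toy model. This is only a sketch: the Item and Collection classes below are hypothetical names invented for illustration and are not winkNLP's internals. They simply mimic the rule that out() yields a string for an item and an array of strings for a collection.

```javascript
// Toy model of the item/collection contract described above.
// Item and Collection are hypothetical names, not winkNLP internals.
class Item {
  constructor(value) {
    this.value = value;
  }

  // An item yields a single string.
  out() {
    return this.value;
  }
}

class Collection {
  constructor(values) {
    this.values = values;
  }

  // Returns the item at the 0-based index n, or undefined if there is none.
  itemAt(n) {
    if (!Number.isInteger(n) || n < 0 || n >= this.values.length) return undefined;
    return new Item(this.values[n]);
  }

  // A collection yields an array of strings (a copy, so callers cannot mutate it).
  out() {
    return this.values.slice();
  }
}

const entities = new Collection(['July 20, 1969', 'nearly 21 hours', '20%']);

const all = entities.out();              // array of strings
const second = entities.itemAt(1).out(); // a single string

console.log(Array.isArray(all), typeof second);
```

Keeping the return type tied to the receiver (item or collection) means the caller never has to unwrap a one-element array after itemAt(n).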
Next, let's look inside a single sentence or entity:
doc.sentences().itemAt(0).entities().out();
// Returns:
// [ 'July 20, 1969' ]

doc.sentences()   // Collection of all sentences.
  .itemAt(0)      // Its 0th sentence.
  .entities()     // Collection of entities in sentence #0.
  .itemAt(0)      // Its 0th entity.
  .tokens()       // Collection of tokens in entity #0.
  .out();         // Array of tokens in 0th entity of
                  // 0th sentence of the document!
// Returns:
// [ 'July', '20', ',', '1969' ]
An attempt to access a non-existent item using itemAt() returns undefined:
doc.sentences().itemAt(-1);
// Returns:
// undefined
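Because itemAt() can return undefined, chaining .out() directly onto it throws a TypeError when the index is out of range. One way to guard against that is sketched below. The helper name outAt and the stubSentences object are assumptions made for illustration; the stub stands in for a real collection so the sketch runs without a language model loaded. The helper relies only on the itemAt(n) and out() behavior described above.

```javascript
// Hypothetical helper: returns the item's text, or a fallback when
// itemAt(n) yields undefined. Works with any object that exposes
// itemAt(n) returning an item with an out() method.
function outAt(collection, n, fallback = '') {
  const item = collection.itemAt(n);
  return item === undefined ? fallback : item.out();
}

// Stand-in collection (not a real winkNLP object) used to exercise the helper.
const stubSentences = {
  values: ['First sentence.', 'Second sentence.'],
  itemAt(n) {
    const value = this.values[n];
    return value === undefined ? undefined : { out: () => value };
  }
};

console.log(outAt(stubSentences, 0));            // 'First sentence.'
console.log(outAt(stubSentences, -1, '(none)')); // '(none)'
```

With a real document, the same helper would be called as outAt(doc.sentences(), -1, '(none)').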
The document also has a pipeConfig() method, which returns the currently active processing pipeline based on the loaded language model.
In essence, a document is composed of collections of sentences, named entities, and tokens. Collections and items along with their methods are explained in the next section.