Methods
defineConfig
Defines the criteria for ignoring one or more tokens during entity detection. The criteria are specified as arrays of tags and/or values to ignore: if a token's tag or value matches any listed entry, the token is ignored and its value is not considered during entity recognition.
For example, by including punctuation in the array of tags to ignore, tokens containing punctuation such as '-' or '.' will be skipped. As a result, kg and k.g. are both recognized as kg (kilogram symbol), and Guinea-Bissau and Guinea Bissau are both recognized as Guinea-Bissau (a country in West Africa).
Example
// Do not ignore anything!
myNER.defineConfig( { tagsToIgnore: [], ignoreDiacritics: false } );
// -> { tagsToIgnore: [], valuesToIgnore: [], ignoreDiacritics: false }
// Ignore only '-' and '.'
myNER.defineConfig( {
  tagsToIgnore: [],
  valuesToIgnore: [ '-', '.' ],
  ignoreDiacritics: false
} );
// -> {
//      tagsToIgnore: [],
//      valuesToIgnore: [ '-', '.' ],
//      ignoreDiacritics: false
//    }
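The effect of the ignore criteria can be sketched without the library. The snippet below is only an illustration, not the library's implementation: `buildKey` is a hypothetical helper, and the tokens are hand-written in the `{ value, tag }` shape (a real tokenizer may split the text differently).

```javascript
// Hypothetical helper (not part of the library): drops tokens that match
// the ignore criteria and joins the remaining values into a comparison key.
function buildKey( tokens, config ) {
  const tagsToIgnore = new Set( config.tagsToIgnore || [] );
  const valuesToIgnore = new Set( config.valuesToIgnore || [] );
  return tokens
    // A token is skipped if either its tag or its value is listed.
    .filter( ( t ) => !tagsToIgnore.has( t.tag ) && !valuesToIgnore.has( t.value ) )
    .map( ( t ) => t.value )
    .join( ' ' );
}

// Hand-written tokens; assumed shape is { value, tag }.
const hyphenated = [
  { value: 'Guinea', tag: 'word' },
  { value: '-', tag: 'punctuation' },
  { value: 'Bissau', tag: 'word' }
];
const spaced = [
  { value: 'Guinea', tag: 'word' },
  { value: 'Bissau', tag: 'word' }
];

const config = { tagsToIgnore: [ 'punctuation' ], valuesToIgnore: [] };

// Both spellings reduce to the same key once punctuation is ignored.
console.log( buildKey( hyphenated, config ) === buildKey( spaced, config ) ); // true
```

With an empty `tagsToIgnore`, the hyphen would stay in the key and the two spellings would no longer match.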
Parameters
Name | Type | Description |
---|---|---|
config | object | Defines the tags and/or values to ignore during entity detection. An empty config object is equivalent to setting the default configuration. Its properties, as used in the example above, are tagsToIgnore, valuesToIgnore and ignoreDiacritics. |
Throws
-
if valuesToIgnore is not an array of strings.
- Type
- error
-
if tagsToIgnore is not an array of strings.
- Type
- error
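Because both errors concern the shape of the arrays, a config can be checked before it is passed on. This is a minimal sketch under stated assumptions: `isArrayOfStrings` and `findConfigProblems` are hypothetical helpers, and a missing property is treated as acceptable because the first example above omits valuesToIgnore.

```javascript
// Hypothetical helper: true only for arrays whose elements are all strings.
function isArrayOfStrings( x ) {
  return Array.isArray( x ) && x.every( ( v ) => typeof v === 'string' );
}

// Hypothetical helper: returns a list of human-readable problems, empty if none.
// Assumption: an omitted property is fine; only a wrongly typed one is reported.
function findConfigProblems( config ) {
  const problems = [];
  for ( const key of [ 'tagsToIgnore', 'valuesToIgnore' ] ) {
    if ( config[ key ] !== undefined && !isArrayOfStrings( config[ key ] ) ) {
      problems.push( key + ' must be an array of strings' );
    }
  }
  return problems;
}

console.log( findConfigProblems( { valuesToIgnore: '-' } ) );
// -> [ 'valuesToIgnore must be an array of strings' ]
```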
Returns
A copy of the configuration defined.
- Type
- object
exportJSON
Exports the learnings generated by learn() as JSON, which may be saved in a file and used later for NER.
Example
var learnings = myNER.exportJSON();
Returns
JSON of the learnings.
- Type
- json
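Saving the export to a file might look like the sketch below. `saveLearnings` is a hypothetical helper, and it assumes that exportJSON() returns a JSON string that can be written as-is; only Node's built-in fs module is used.

```javascript
const { writeFileSync } = require( 'fs' );

// Hypothetical helper: writes the exported learnings to `filePath`.
// Assumption: `nerInstance.exportJSON()` returns a JSON string.
function saveLearnings( nerInstance, filePath ) {
  const json = nerInstance.exportJSON();
  writeFileSync( filePath, json, 'utf8' );
  // Return the number of characters written, as a simple confirmation.
  return json.length;
}
```

A call such as `saveLearnings( myNER, 'learnings.json' )` would then persist the current learnings; the file name here is only an example.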
importJSON
Imports NER learnings that were previously exported via exportJSON().
Example
var myNER = ner();
// Assuming that `json` has valid learnings.
myNER.importJSON( json );
Parameters
Name | Type | Description |
---|---|---|
json | json | Previously exported learnings in JSON format. |
Throws
-
if invalid JSON is encountered.
- Type
- error
Returns
always true.
- Type
- boolean
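Since importJSON() throws on invalid JSON, a caller reading learnings from disk may want to guard the call. The following is a sketch only: `loadLearnings` is a hypothetical helper, and returning false on failure is a design choice of this example, not library behavior.

```javascript
const { readFileSync } = require( 'fs' );

// Hypothetical helper: reads learnings from `filePath` and imports them.
// Returns whatever importJSON() returns on success, or false if the file
// cannot be read or importJSON() throws.
function loadLearnings( nerInstance, filePath ) {
  try {
    const json = readFileSync( filePath, 'utf8' );
    return nerInstance.importJSON( json );
  } catch ( err ) {
    console.error( 'Could not load learnings: ' + err.message );
    return false;
  }
}
```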
learn
Learns the entities that must be detected via recognize()/predict() API calls in a sentence that has already been tokenized, either using wink-tokenizer or in a way that follows its token format.
It can be used to learn or update learnings incrementally, but it cannot be used to unlearn or delete one or more entities.
If duplicate entity definitions are encountered, all entries except the last one are ignored.
Acronyms must be added with a space between each character; for example, USA should be added as 'u s a'. This ensures correct detection of U S A, U. S. A. or U.S.A. as USA (refer to the example below).
Example
var trainingData = [
  { text: 'manchester united', entityType: 'club', uid: 'manu' },
  { text: 'manchester', entityType: 'city' },
  { text: 'U K', entityType: 'country', uid: 'uk' }
];
myNER.learn( trainingData );
// -> 3
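The acronym rule can be applied mechanically when preparing training data. `acronymToText` below is a hypothetical helper, not a library function; it strips anything that is not a letter or digit, lowercases the rest, and separates the characters with spaces.

```javascript
// Hypothetical helper: converts an acronym such as 'USA' or 'U.S.A.'
// into the space-separated form suggested for learn().
function acronymToText( acronym ) {
  return acronym
    .replace( /[^A-Za-z0-9]/g, '' ) // drop dots, spaces and other separators
    .toLowerCase()
    .split( '' )
    .join( ' ' );
}

console.log( acronymToText( 'USA' ) );    // prints: u s a
console.log( acronymToText( 'U.S.A.' ) ); // prints: u s a
```

The result can then be used as the text property of an entity definition.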
Parameters
Name | Type | Description |
---|---|---|
entities | Array.<object> | Each element defines an entity via two mandatory properties, text and entityType. In addition, you may optionally define two more properties; the example above uses one of them, uid. Note: apart from these properties, you may also define additional properties. Such properties, along with their values, are copied to the output token as-is for consumption by any downstream code in the NLP pipeline. An example use case is POS tagging: you can define a pos property in an entity definition. |
Returns
Number of actual entities learned.
- Type
- number
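The "last definition wins" rule for duplicates can be pictured with a small sketch. It assumes that duplicates are identified by their text property, which the description above does not state explicitly; `keepLastDefinitions` is a hypothetical helper and not the library's code.

```javascript
// Hypothetical helper: keeps only the last definition for each `text`.
// Assumption: two definitions are duplicates when their `text` is identical.
function keepLastDefinitions( entities ) {
  const byText = new Map();
  for ( const entity of entities ) {
    // A later entry with the same text overwrites the earlier one.
    byText.set( entity.text, entity );
  }
  return Array.from( byText.values() );
}

const definitions = [
  { text: 'manchester', entityType: 'city' },
  { text: 'manchester united', entityType: 'club' },
  { text: 'manchester', entityType: 'town' }
];

console.log( keepLastDefinitions( definitions ).length ); // 2
```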
recognize
Recognizes entities in the input tokens. Any token that is recognized as an entity automatically receives the properties defined for that entity using learn(). If a set of tokens together is recognized as a single entity, they are merged into a single token; the merged token's value property becomes the concatenation of the values of the merged tokens, separated by spaces.
Example
// Use wink tokenizer.
var winkTokenizer = require( 'wink-tokenizer' );
// Instantiate it and use tokenize() api.
var tokenize = winkTokenizer().tokenize;
var tokens = tokenize( 'Manchester United is a professional football club based in Manchester, U. K.' );
// Detect entities.
myNER.recognize( tokens );
// -> [
// { entityType: 'club', uid: 'manu', originalSeq: [ 'Manchester', 'United' ], value: 'manchester united', tag: 'word' },
// { value: 'is', tag: 'word' },
// { value: 'a', tag: 'word' },
// { value: 'professional', tag: 'word' },
// { value: 'football', tag: 'word' },
// { value: 'club', tag: 'word' },
// { value: 'based', tag: 'word' },
// { value: 'in', tag: 'word' },
// { value: 'Manchester', tag: 'word', originalSeq: [ 'Manchester' ], uid: 'manchester', entityType: 'city' },
// { value: ',', tag: 'punctuation' },
// { entityType: 'country', uid: 'uk', originalSeq: [ 'U', '.', 'K' ], value: 'u k', tag: 'word' },
// { value: '.', tag: 'punctuation' }
// ]
Parameters
Name | Type | Description |
---|---|---|
tokens | Array.<object> | Tokens produced by wink-tokenizer, or tokens that follow its format. |
Returns
Updated tokens with entities tagged.
- Type
- Array.<object>
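Downstream code usually needs only the tagged entities rather than every token. The sketch below shows one way to pull them out of the returned array; `extractEntities` is a hypothetical helper, and the sample data is a shortened copy of the output shown in the example above.

```javascript
// Hypothetical helper: keeps only tokens that carry an entityType and
// returns a compact summary of each.
function extractEntities( tokens ) {
  return tokens
    .filter( ( t ) => t.entityType !== undefined )
    .map( ( t ) => ( { value: t.value, entityType: t.entityType, uid: t.uid } ) );
}

// Shortened copy of the recognize() output shown in the example above.
const recognized = [
  { entityType: 'club', uid: 'manu', originalSeq: [ 'Manchester', 'United' ], value: 'manchester united', tag: 'word' },
  { value: 'is', tag: 'word' },
  { value: ',', tag: 'punctuation' },
  { entityType: 'country', uid: 'uk', originalSeq: [ 'U', '.', 'K' ], value: 'u k', tag: 'word' }
];

console.log( extractEntities( recognized ) );
// -> [
//      { value: 'manchester united', entityType: 'club', uid: 'manu' },
//      { value: 'u k', entityType: 'country', uid: 'uk' }
//    ]
```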
reset
Resets the named entity recognizer by re-initializing all the learnings and by setting the configuration to default.
Example
myNER.reset( );
// -> true
Returns
always true.
- Type
- boolean