NaiveBayesTextClassifier

Naive Bayes Text Classifier class.

Methods

computeOdds

computeOdds(input) → {Array.<array>}

Computes the log base-2 odds of every label for the input and returns an array of [ label, odds ] pairs in descending order of odds.

Example
myClassifier.computeOdds( 'I want to pay my car loan early' );
// -> [
//      [ 'prepay', 6.169686751688911 ],
//      [ 'autoloan', -6.169686751688911 ]
//    ]
Parameters
Name Type Description
input String | Array.<String>

is either text or tokens, depending on the choice of preparatory tasks.

Returns

Array of [ label, odds ] in descending order of odds.

Type
Array.<array>
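With only two labels, the log2 odds of one label are the exact negation of the other's, which is why the example above shows +6.1697 and -6.1697. The sketch below illustrates that relationship, assuming (not confirmed from the library's source) that the classifier first computes a log2 joint score per label, as standard naive Bayes does; `log2OddsFromScores` is a hypothetical helper, not part of the classifier's API.

```javascript
// Hypothetical helper (not part of the classifier's API): turns per-label
// log2 joint scores, log2( P(label) * P(tokens | label) ), into log2 odds.
function log2OddsFromScores(scores) {
  const labels = Object.keys(scores);

  // log2( 2^v1 + 2^v2 + ... ), computed stably by factoring out the maximum.
  const log2SumPow2 = (values) => {
    const max = Math.max(...values);
    if (max === -Infinity) return -Infinity;
    const sum = values.reduce((acc, v) => acc + Math.pow(2, v - max), 0);
    return max + Math.log2(sum);
  };

  const odds = labels.map((label) => {
    const others = labels.filter((l) => l !== label).map((l) => scores[l]);
    // odds = P(label | input) / P(any other label | input); the shared
    // normalizing constant cancels, leaving a difference of log2 scores.
    return [label, scores[label] - log2SumPow2(others)];
  });

  // Descending order of odds, matching the documented output of computeOdds().
  return odds.sort((a, b) => b[1] - a[1]);
}

log2OddsFromScores({ prepay: -20, autoloan: -26 });
// -> [ [ 'prepay', 6 ], [ 'autoloan', -6 ] ]
```

Working in log space avoids the underflow that multiplying many small word probabilities would cause.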

consolidate

consolidate() → {boolean}

Consolidates the learning. It must be called before evaluate() or predict().

Example
myClassifier.consolidate();
// -> true
Throws

Error if the training data belongs to only a single class label or is too small for learning.

Returns

Always true.

Type
boolean

defineConfig

defineConfig(cfg) → {boolean}

Defines the configuration for the naive Bayes text classifier. It must be called before learning starts; the configuration cannot be changed once learning has begun.

Example
myClassifier.defineConfig( { considerOnlyPresence: true, smoothingFactor: 0.5 } );
// -> true
Parameters
Name Type Attributes Default Description
cfg object

defines the configuration in terms of the following properties:

cfg.considerOnlyPresence boolean <optional> false

true indicates a binarized model, i.e., a word is counted at most once per sample, regardless of how often it occurs.

cfg.smoothingFactor number <optional> 1

defines the value for additive smoothing; it can have any value between 0 and 1.

Throws

Error if cfg is not a valid JavaScript object, if smoothingFactor is invalid, or if an attempt to define the configuration is made after learning has started.

Returns

Always true.

Type
boolean
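A sketch of what the two options typically mean in a multinomial naive Bayes model, assuming Lidstone (additive) smoothing of the per-label word likelihoods; `countTokens` and `smoothedLikelihood` are hypothetical names, and the library's internal formula is not confirmed here. The numbers in the usage lines borrow the stats() example (36 words under autoloan, a 24-word vocabulary) purely for illustration.

```javascript
// Hypothetical helpers (not part of the classifier's API) illustrating the
// two defineConfig() options in a multinomial naive Bayes model.

// considerOnlyPresence: when true, each token counts at most once per sample.
function countTokens(tokens, considerOnlyPresence) {
  const source = considerOnlyPresence ? [...new Set(tokens)] : tokens;
  const counts = {};
  for (const token of source) {
    counts[token] = (counts[token] || 0) + 1;
  }
  return counts;
}

// smoothingFactor (alpha): additive smoothing of P(word | label), so a word
// never seen under a label still receives a small non-zero probability.
function smoothedLikelihood(wordCount, labelWordTotal, vocabularySize, alpha) {
  return (wordCount + alpha) / (labelWordTotal + alpha * vocabularySize);
}

countTokens(['car', 'loan', 'car'], true);  // -> { car: 1, loan: 1 }
countTokens(['car', 'loan', 'car'], false); // -> { car: 2, loan: 1 }
// An unseen word under a label with 36 words and a 24-word vocabulary:
smoothedLikelihood(0, 36, 24, 1);           // -> 1 / 60, about 0.0167
```

A smaller alpha gives less probability mass to unseen words; without any smoothing a single unseen word would drive a label's likelihood to zero.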

definePrepTasks

definePrepTasks(tasks) → {number}

Defines the text preparation tasks that transform raw incoming text into the tokens required by the learn(), evaluate() and predict() operations. The tasks should be an array of functions; these functions are chained into a simple pipeline that serially transforms the input into the output.

Example
// Load wink NLP utilities
var nlp = require( 'wink-nlp-utils' );
// Define the text preparation tasks.
myClassifier.definePrepTasks( [
  // Simple tokenizer to convert input text into tokens
  nlp.string.tokenize0,
  // Removes stop words from the input tokens
  nlp.tokens.removeWords,
  // Stems each token into its base form
  nlp.tokens.stem
] );
// -> 3
Parameters
Name Type Description
tasks Array.<function()>

the first function in this array must accept a string as input, and the last function must return tokens, i.e., an array of strings. Please refer to the example.

Throws

Error if tasks is not an array of functions.

Returns

The number of functions in the tasks array.

Type
number
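The serial pipeline described above can be sketched as a left-to-right reduction over the task functions. This is a hypothetical illustration, not the library's code: `buildPipeline` is an assumed name, and the two toy tasks are simplified stand-ins for nlp.string.tokenize0 and nlp.tokens.removeWords, not their real behavior.

```javascript
// Hypothetical sketch (not the library's code) of a serial preparation
// pipeline: each task receives the previous task's output.
function buildPipeline(tasks) {
  if (!Array.isArray(tasks) || !tasks.every((t) => typeof t === 'function')) {
    throw new Error('tasks must be an array of functions');
  }
  return (input) => tasks.reduce((value, task) => task(value), input);
}

// Toy stand-ins for nlp.string.tokenize0 and nlp.tokens.removeWords.
const STOP_WORDS = new Set(['i', 'to', 'my', 'a']);
const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);
const removeStopWords = (tokens) => tokens.filter((t) => !STOP_WORDS.has(t));

const prepare = buildPipeline([tokenize, removeStopWords]);
prepare('I want to pay my car loan early');
// -> [ 'want', 'pay', 'car', 'loan', 'early' ]
```

Because each task consumes the previous task's output, the order matters: the string-to-tokens step has to come first, exactly as the parameter description requires.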

evaluate

evaluate(input, label) → {boolean}

Evaluates the learning against a test data set. The input is used to predict the class label, which is compared with the actual class label to incrementally populate the confusion matrix.

Example
myClassifier.evaluate( 'can i close my loan', 'prepay' );
// -> true
Parameters
Name Type Description
input String | Array.<String>

is either text or tokens, depending on the choice of preparatory tasks.

label string

of the class to which the input belongs.

Returns

Always true.

Type
boolean
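The incremental bookkeeping can be sketched as follows. It assumes the orientation matrix[predicted][actual], which is the orientation under which the metrics() example's precision and recall values are consistent; the library's internal layout is not confirmed, and `createConfusionMatrix` and `recordEvaluation` are hypothetical names.

```javascript
// Hypothetical sketch (not the library's code) of incremental
// confusion-matrix bookkeeping, assuming matrix[predicted][actual].
function createConfusionMatrix(labels) {
  const matrix = {};
  for (const predicted of labels) {
    matrix[predicted] = {};
    for (const actual of labels) matrix[predicted][actual] = 0;
  }
  return matrix;
}

// One evaluation: the classifier's prediction vs. the supplied true label.
function recordEvaluation(matrix, predicted, actual) {
  // In this sketch, a prediction outside the known labels gets its own row.
  if (!matrix[predicted]) matrix[predicted] = {};
  matrix[predicted][actual] = (matrix[predicted][actual] || 0) + 1;
  return true;
}

const matrix = createConfusionMatrix(['prepay', 'autoloan']);
recordEvaluation(matrix, 'prepay', 'prepay');
recordEvaluation(matrix, 'prepay', 'autoloan');
recordEvaluation(matrix, 'autoloan', 'autoloan');
// matrix -> { prepay: { prepay: 1, autoloan: 1 },
//             autoloan: { prepay: 0, autoloan: 1 } }
```

These three hypothetical evaluations reproduce the confusion matrix shown in the metrics() example.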

exportJSON

exportJSON() → {string}

Exports the learning as a JSON string, which may be saved to a text file and later restored via importJSON().

Example
myClassifier.exportJSON();
// -> JSON string containing the learning
Returns

Learning in JSON format.

Type
string

importJSON

importJSON(json) → {boolean}

Imports an existing JSON learning for prediction. It is essential to call definePrepTasks() and consolidate() before attempting to predict.

Parameters
Name Type Description
json JSON

containing the learning, as exported by exportJSON().

Throws

Error if json is invalid.

Returns

Always true.

Type
boolean
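A minimal sketch of an export/import round trip with the kind of validation that "Error if json is invalid" implies. The field names are hypothetical, since the library's actual JSON layout is not documented here. Note that JSON.stringify drops function-valued properties from objects, so preparatory tasks cannot travel inside the exported JSON; this is presumably why definePrepTasks() must be called again after importJSON().

```javascript
// Hypothetical sketch: the field names below are illustrative only, since
// the library's actual JSON layout is not documented here.
const REQUIRED_FIELDS = ['labelWiseSamples', 'labelWiseWords', 'wordCounts'];

function exportLearning(learning) {
  // Function-valued properties (such as preparatory tasks) are dropped
  // from objects by JSON.stringify.
  return JSON.stringify(learning);
}

function importLearning(json) {
  let parsed;
  try {
    parsed = JSON.parse(json);
  } catch (e) {
    throw new Error('invalid json: ' + e.message);
  }
  if (parsed === null || typeof parsed !== 'object' ||
      !REQUIRED_FIELDS.every((field) => field in parsed)) {
    throw new Error('invalid json: missing learning fields');
  }
  return parsed;
}

const saved = exportLearning({
  labelWiseSamples: { autoloan: 1 },
  labelWiseWords: { autoloan: 4 },
  wordCounts: { autoloan: { loan: 1, need: 1, new: 1, vehicle: 1 } },
});
importLearning(saved).labelWiseWords.autoloan; // -> 4
```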

learn

learn(input, label) → {boolean}

Learns from an example pair of input and its label.

Example
myClassifier.learn( 'I need loan for a new vehicle', 'autoloan' );
// -> true
Parameters
Name Type Description
input string | Array.<string>

if it is a string, then definePrepTasks() must be called before learning so that the input string is transformed into tokens on the fly.

label string

of the class to which the input belongs.

Throws

Error if the learning has already been consolidated.

Returns

Always true.

Type
boolean
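The counts that learn() accumulates are the ones stats() later reports. The sketch below mirrors the stats() output names, but its internals are an assumption, not the library's code; `createLearner` is a hypothetical name, and the token lists are hand-written, as if preparatory tasks had already been applied.

```javascript
// Hypothetical sketch of the bookkeeping behind learn() and stats();
// names mirror the stats() output, but the internals are assumed.
function createLearner() {
  const labelWiseSamples = {};
  const labelWiseWords = {};
  const wordCounts = {}; // wordCounts[label][word] -> occurrences
  const vocabulary = new Set();

  function learn(tokens, label) {
    labelWiseSamples[label] = (labelWiseSamples[label] || 0) + 1;
    labelWiseWords[label] = (labelWiseWords[label] || 0) + tokens.length;
    wordCounts[label] = wordCounts[label] || {};
    for (const token of tokens) {
      wordCounts[label][token] = (wordCounts[label][token] || 0) + 1;
      vocabulary.add(token);
    }
    return true;
  }

  function stats() {
    return {
      labelWiseSamples: { ...labelWiseSamples },
      labelWiseWords: { ...labelWiseWords },
      vocabulary: vocabulary.size,
    };
  }

  return { learn, stats };
}

const learner = createLearner();
learner.learn(['need', 'loan', 'new', 'vehicle'], 'autoloan');
learner.learn(['pay', 'car', 'loan', 'early'], 'prepay');
learner.stats();
// -> { labelWiseSamples: { autoloan: 1, prepay: 1 },
//      labelWiseWords: { autoloan: 4, prepay: 4 },
//      vocabulary: 7 }
```

The vocabulary is shared across labels, so 'loan', which occurs under both labels, is counted once.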

metrics

metrics() → {object}

Computes detailed metrics consisting of macro-averaged precision, recall and F-measure, along with their label-wise values and the confusion matrix.

Example
// Assuming that evaluation has already been carried out
JSON.stringify( myClassifier.metrics(), null, 2 );
// -> {
//      "avgPrecision": 0.75,
//      "avgRecall": 0.75,
//      "avgFMeasure": 0.6667,
//      "details": {
//        "confusionMatrix": {
//          "prepay": {
//            "prepay": 1,
//            "autoloan": 1
//          },
//          "autoloan": {
//            "prepay": 0,
//            "autoloan": 1
//          }
//        },
//        "precision": {
//          "prepay": 0.5,
//          "autoloan": 1
//        },
//        "recall": {
//          "prepay": 1,
//          "autoloan": 0.5
//        },
//        "fmeasure": {
//          "prepay": 0.6667,
//          "autoloan": 0.6667
//        }
//      }
//    }
Throws

Error if an attempt to generate metrics is made before evaluation has been carried out.

Returns

Detailed metrics.

Type
object
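The example's numbers can be reproduced from its confusion matrix when that matrix is read as matrix[predicted][actual]: precision divides the diagonal by the row sum, and recall divides it by the column sum. The example's avgFMeasure of 0.6667 is the mean of the label-wise F-measures, not the harmonic mean of avgPrecision and avgRecall (which would be 0.75). A sketch under those assumptions follows; `computeMetrics` is a hypothetical name, and unlike the example it does not round to four decimal places.

```javascript
// Hypothetical sketch (not the library's code): macro-averaged metrics from a
// confusion matrix read as matrix[predicted][actual].
function computeMetrics(matrix) {
  const labels = Object.keys(matrix);
  const precision = {};
  const recall = {};
  const fmeasure = {};
  for (const label of labels) {
    const tp = matrix[label][label] || 0;
    // Row sum: every sample predicted as `label`.
    const predictedTotal = labels.reduce((s, a) => s + (matrix[label][a] || 0), 0);
    // Column sum: every sample whose actual class is `label`.
    const actualTotal = labels.reduce((s, p) => s + (matrix[p][label] || 0), 0);
    precision[label] = predictedTotal ? tp / predictedTotal : 0;
    recall[label] = actualTotal ? tp / actualTotal : 0;
    const sum = precision[label] + recall[label];
    fmeasure[label] = sum ? (2 * precision[label] * recall[label]) / sum : 0;
  }
  // Macro average: unweighted mean of the label-wise values.
  const mean = (values) => labels.reduce((s, l) => s + values[l], 0) / labels.length;
  return {
    avgPrecision: mean(precision),
    avgRecall: mean(recall),
    avgFMeasure: mean(fmeasure),
    details: { confusionMatrix: matrix, precision, recall, fmeasure },
  };
}

const exampleMatrix = {
  prepay: { prepay: 1, autoloan: 1 },
  autoloan: { prepay: 0, autoloan: 1 },
};
computeMetrics(exampleMatrix).avgPrecision; // -> 0.75
```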

predict

predict(input) → {String}

Predicts the class label for the input. If it is unable to predict, it returns the string 'unknown'.

Example
myClassifier.predict( 'I want to pay my car loan early' );
// -> prepay
Parameters
Name Type Description
input String | Array.<String>

is either text or tokens, depending on the choice of preparatory tasks.

Returns

The predicted class label for the input.

Type
String

reset

reset() → {boolean}

Completely resets the classifier by re-initializing all learning-related variables, except the preparatory tasks. It is useful during k-fold cross-validation.

Example
myClassifier.reset();
// -> true
Returns

Always true.

Type
boolean

stats

stats() → {object}

Returns basic statistics of the learning: the count of samples under each label, the total number of words under each label, and the size of the vocabulary.

Example
myClassifier.stats();
// -> {
//      labelWiseSamples: {
//        autoloan: 5,
//        prepay: 4
//      },
//      labelWiseWords: {
//        autoloan: 36,
//        prepay: 26
//      },
//      vocabulary: 24
//    }
Returns

An object containing the count of samples under each label, the total number of words under each label, and the size of the vocabulary.

Type
object