NaiveBayesTextClassifier

Naive Bayes Text Classifier class.

Methods

computeOdds(input) → {Array.<array>}

Computes the base-2 logarithm of the odds of every label for the input, and returns an array of [ label, odds ] pairs in descending order of odds.

Example
myClassifier.computeOdds( 'I want to pay my car loan early' );
// -> [
//      [ 'prepay', 6.169686751688911 ],
//      [ 'autoloan', -6.169686751688911 ]
//    ]
Parameters:
Name Type Description
input String | Array.<String>

is either text or tokens determined by the choice of preparatory tasks.

Returns:

Array of [ label, odds ] in descending order of odds.

Type
Array.<array>
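
To make the shape of the return value concrete, here is a minimal sketch of turning per-label probabilities into sorted base-2 log odds. It assumes the odds of a label are p / ( 1 - p ); the helper name logOddsByLabel and the probabilities are hypothetical, and this is not the library's implementation. Under that assumption a two-label problem yields values of equal magnitude and opposite sign, which is consistent with the example above.

```javascript
// Hypothetical helper, not part of the classifier: converts label
// probabilities into [ label, log2 odds ] pairs, highest odds first.
function logOddsByLabel( probabilities ) {
  return Object.keys( probabilities )
    .map( ( label ) => {
      const p = probabilities[ label ];
      // Assumed definition of odds in favour of the label: p / ( 1 - p ).
      return [ label, Math.log2( p / ( 1 - p ) ) ];
    } )
    // Sort in descending order of odds.
    .sort( ( a, b ) => b[ 1 ] - a[ 1 ] );
}

// Made-up probabilities for two labels.
const odds = logOddsByLabel( { autoloan: 0.25, prepay: 0.75 } );
console.log( odds );
// 'prepay' comes first with roughly 1.585, 'autoloan' second with roughly -1.585.
```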

consolidate() → {boolean}

Consolidates the learning. It is a prerequisite for evaluate() and/or predict().

Example
myClassifier.consolidate();
// -> true
Throws:

Error if training data belongs to only a single class label or the training data is too small for learning.

Returns:

Always true.

Type
boolean

defineConfig(cfg) → {boolean}

Defines the configuration for the naive Bayes text classifier. This must be called before attempting to learn; the configuration cannot be changed once learning has started.

Example
myClassifier.defineConfig( { considerOnlyPresence: true, smoothingFactor: 0.5 } );
// -> true
Parameters:
Name Type Attributes Default Description
cfg object

defines the configuration in terms of the following parameters:

considerOnlyPresence boolean <optional>
false

true indicates a binarized model.

smoothingFactor number <optional>
1

defines the value for additive smoothing. It can have any value between 0 and 1.

Throws:

Error if cfg is not a valid JavaScript object, or smoothingFactor is invalid, or an attempt to define configuration is made after learning starts.

Returns:

Always true.

Type
boolean
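
The two options can be illustrated with a small sketch of textbook additive smoothing and binarization. The helper names smoothedProbability and binarize, and the counts used, are hypothetical; the formula is the standard one and is only assumed to correspond to what the classifier computes internally.

```javascript
// Hypothetical helper: additive smoothing of a word's probability in a label.
// count: occurrences of the word in the label; total: all words in the label.
function smoothedProbability( count, total, vocabularySize, smoothingFactor ) {
  return ( count + smoothingFactor ) / ( total + smoothingFactor * vocabularySize );
}

// Hypothetical helper: a binarized model considers only the presence of a
// token in a document, so repeated tokens collapse to a single occurrence.
function binarize( tokens ) {
  return Array.from( new Set( tokens ) );
}

console.log( binarize( [ 'loan', 'car', 'loan' ] ) );
// -> [ 'loan', 'car' ]

// A word never seen in a label that has 26 words, with a vocabulary of 24:
const withDefault = smoothedProbability( 0, 26, 24, 1 );  // 1 / 50
const withHalf = smoothedProbability( 0, 26, 24, 0.5 );   // 0.5 / 38
console.log( withDefault, withHalf );
```

A smaller smoothing factor assigns less probability mass to unseen words; without any smoothing, an unseen word would receive a probability of zero.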

definePrepTasks(tasks) → {number}

Defines the text preparation tasks that transform raw incoming text into the tokens required during learn(), evaluate() and predict() operations. The tasks should be an array of functions; these functions are chained into a simple pipeline that serially transforms the input into the output.

Example
// Load wink NLP utilities
var nlp = require( 'wink-nlp-utils' );
// Define the text preparation tasks.
myClassifier.definePrepTasks( [
  // Simple tokenizer to convert input text into tokens
  nlp.string.tokenize0,
  // Removes stop words from the input tokens
  nlp.tokens.removeWords,
  // Stems each token into its base form
  nlp.tokens.stem
] );
// -> 3
Parameters:
Name Type Description
tasks Array.<function()>

the first function in this array must accept a string as input and the last function must return tokens, that is, an array of strings. See the example above.

Throws:

Error if tasks is not an array of functions.

Returns:

The number of functions in the tasks array.

Type
number
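
The serial pipeline described above can be sketched with Array.prototype.reduce. The functions tokenize, dropShortTokens and buildPipeline are hypothetical stand-ins written for this illustration; they are not functions from wink-nlp-utils and do not reproduce the classifier's internals.

```javascript
// Hypothetical preparation tasks for illustration only.
const tokenize = ( text ) => text.toLowerCase().split( /\s+/ ).filter( Boolean );
const dropShortTokens = ( tokens ) => tokens.filter( ( t ) => t.length > 2 );

// Builds a serial pipeline: the output of each task is the input of the next.
function buildPipeline( tasks ) {
  if ( !Array.isArray( tasks ) || !tasks.every( ( t ) => typeof t === 'function' ) ) {
    throw new Error( 'tasks must be an array of functions' );
  }
  return ( input ) => tasks.reduce( ( value, task ) => task( value ), input );
}

const prepare = buildPipeline( [ tokenize, dropShortTokens ] );
console.log( prepare( 'I want to pay my car loan early' ) );
// -> [ 'want', 'pay', 'car', 'loan', 'early' ]
```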

evaluate(input, label) → {boolean}

Evaluates the learning against a test data set. The input is used to predict the class label, which is compared with the actual class label to populate confusion matrix incrementally.

Example
myClassifier.evaluate( 'can i close my loan', 'prepay' );
// -> true
Parameters:
Name Type Description
input String | Array.<String>

is either text or tokens determined by the choice of preparatory tasks.

label string

of class to which input belongs.

Returns:

Always true.

Type
boolean
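
As a rough sketch of populating a confusion matrix incrementally, the snippet below records one predicted/actual pair at a time. The helpers createConfusionMatrix and record are hypothetical, and the orientation (outer key is the predicted label, inner key is the actual label) is an assumption made for this sketch rather than a confirmed detail of the classifier.

```javascript
// Hypothetical helper: creates a matrix of zero counts for the given labels.
function createConfusionMatrix( labels ) {
  const matrix = {};
  for ( const predicted of labels ) {
    matrix[ predicted ] = {};
    for ( const actual of labels ) matrix[ predicted ][ actual ] = 0;
  }
  return matrix;
}

// Hypothetical helper: records a single evaluation result.
// Assumed orientation: matrix[ predicted ][ actual ].
function record( matrix, predicted, actual ) {
  matrix[ predicted ][ actual ] += 1;
  return true;
}

const matrix = createConfusionMatrix( [ 'prepay', 'autoloan' ] );
record( matrix, 'prepay', 'prepay' );     // correct prediction
record( matrix, 'prepay', 'autoloan' );   // wrong prediction
record( matrix, 'autoloan', 'autoloan' ); // correct prediction
console.log( JSON.stringify( matrix ) );
// -> {"prepay":{"prepay":1,"autoloan":1},"autoloan":{"prepay":0,"autoloan":1}}
```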

exportJSON() → {string}

Exports the learning as JSON, which may be saved as a text file for later use via importJSON().

Example
myClassifier.exportJSON();
// returns JSON.
Returns:

Learning in JSON format.

Type
string

importJSON(json) → {boolean}

Imports an existing JSON learning for prediction. It is essential to call definePrepTasks() and consolidate() before attempting to predict.

Parameters:
Name Type Description
json JSON

containing the learnings as exported by exportJSON().

Throws:

Error if json is invalid.

Returns:

Always true.

Type
boolean
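
Saving the exported string and reading it back later might look like the sketch below. It uses only Node.js built-in modules; the placeholder string stands in for the value returned by exportJSON(), whose actual structure is not shown here, and the file name is hypothetical.

```javascript
const fs = require( 'fs' );
const os = require( 'os' );
const path = require( 'path' );

// Placeholder standing in for the string returned by exportJSON().
const exported = JSON.stringify( { placeholder: true } );

// Hypothetical file location in the system's temporary directory.
const file = path.join( os.tmpdir(), 'classifier-learning.json' );
fs.writeFileSync( file, exported, 'utf8' );

// Later, possibly in another process: read the text back unchanged.
const restored = fs.readFileSync( file, 'utf8' );
console.log( restored === exported );
// -> true
```

The restored string is what would be handed to importJSON(), followed by definePrepTasks() and consolidate() before any prediction.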

learn(input, label) → {boolean}

Learns from the example pair of input and its label.

Example
myClassifier.learn( 'I need loan for a new vehicle', 'autoloan' );
// -> true
Parameters:
Name Type Description
input string | Array.<string>

if it is a string, then definePrepTasks() must be called before learning so that the input string is transformed into tokens on the fly.

label string

of class to which input belongs.

Throws:

Error if the learnings have already been consolidated.

Returns:

Always true.

Type
boolean

metrics() → {object}

Computes detailed metrics consisting of macro-averaged precision, recall and f-measure, along with their label-wise values and the confusion matrix.

Example
// Assuming that evaluation has been already carried out
JSON.stringify( myClassifier.metrics(), null, 2 );
// -> {
//      "avgPrecision": 0.75,
//      "avgRecall": 0.75,
//      "avgFMeasure": 0.6667,
//      "details": {
//        "confusionMatrix": {
//          "prepay": {
//            "prepay": 1,
//            "autoloan": 1
//          },
//          "autoloan": {
//            "prepay": 0,
//            "autoloan": 1
//          }
//        },
//        "precision": {
//          "prepay": 0.5,
//          "autoloan": 1
//        },
//        "recall": {
//          "prepay": 1,
//          "autoloan": 0.5
//        },
//        "fmeasure": {
//          "prepay": 0.6667,
//          "autoloan": 0.6667
//        }
//      }
//    }
Throws:

Error if an attempt to generate metrics is made prior to proper evaluation.

Returns:

Detailed metrics.

Type
object
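
The figures in the example can be reproduced with the usual definitions of precision, recall and f-measure, as in the sketch below. The helper computeMetrics is hypothetical. It reads the outer key of the confusion matrix as the predicted label and the inner key as the actual label, which is the orientation that matches the numbers in the example; this is inferred from the example, not confirmed from the library. Labels with no predictions or no samples would cause a division by zero and are not handled.

```javascript
// Hypothetical helper; assumes matrix[ predicted ][ actual ] holds counts.
function computeMetrics( matrix ) {
  const labels = Object.keys( matrix );
  const round = ( x ) => +x.toFixed( 4 );
  const precision = {};
  const recall = {};
  const fmeasure = {};
  for ( const label of labels ) {
    const truePositives = matrix[ label ][ label ];
    // Everything predicted as this label.
    const predictedTotal = labels.reduce( ( sum, actual ) => sum + matrix[ label ][ actual ], 0 );
    // Everything that actually belongs to this label.
    const actualTotal = labels.reduce( ( sum, predicted ) => sum + matrix[ predicted ][ label ], 0 );
    const p = truePositives / predictedTotal;
    const r = truePositives / actualTotal;
    precision[ label ] = round( p );
    recall[ label ] = round( r );
    // F-measure is the harmonic mean of precision and recall.
    fmeasure[ label ] = round( ( 2 * p * r ) / ( p + r ) );
  }
  // Macro average: unweighted mean of the label-wise values.
  const macro = ( perLabel ) =>
    round( labels.reduce( ( sum, label ) => sum + perLabel[ label ], 0 ) / labels.length );
  return {
    avgPrecision: macro( precision ),
    avgRecall: macro( recall ),
    avgFMeasure: macro( fmeasure ),
    details: { confusionMatrix: matrix, precision, recall, fmeasure }
  };
}

const result = computeMetrics( {
  prepay: { prepay: 1, autoloan: 1 },
  autoloan: { prepay: 0, autoloan: 1 }
} );
console.log( result.avgPrecision, result.avgRecall, result.avgFMeasure );
// -> 0.75 0.75 0.6667
```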

predict(input) → {String}

Predicts the class label for the input. If it is unable to predict, it returns the value unknown.

Example
myClassifier.predict( 'I want to pay my car loan early' );
// -> prepay
Parameters:
Name Type Description
input String | Array.<String>

is either text or tokens determined by the choice of preparatory tasks.

Returns:

The predicted class label for the input.

Type
String

reset() → {boolean}

Completely resets the classifier by re-initializing all the learning-related variables, except the preparatory tasks. It is useful during k-fold cross-validation.

Example
myClassifier.reset();
// -> true
Returns:

Always true.

Type
boolean
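
To show where reset() fits in cross-validation, here is a minimal sketch of splitting examples into folds. The helper kFoldSplits and the sample data are hypothetical; the calls to the classifier are indicated only in a comment, since the exact loop depends on the application.

```javascript
// Hypothetical helper: assigns every k-th example to the test set of a fold
// and the remaining examples to its training set.
function kFoldSplits( examples, k ) {
  const splits = [];
  for ( let fold = 0; fold < k; fold += 1 ) {
    const test = examples.filter( ( _, i ) => i % k === fold );
    const train = examples.filter( ( _, i ) => i % k !== fold );
    splits.push( { train, test } );
  }
  return splits;
}

// Made-up examples; real ones would be input/label pairs.
const splits = kFoldSplits( [ 'a', 'b', 'c', 'd', 'e', 'f' ], 3 );
for ( const { train, test } of splits ) {
  // With a classifier instance, each round would call reset(), then learn()
  // on every training example, then consolidate(), and finally evaluate()
  // on every test example.
  console.log( train, test );
}
```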

stats() → {object}

Returns basic stats of learning in terms of count of samples under each label, total words, and the size of vocabulary.

Example
myClassifier.stats();
// -> {
//      labelWiseSamples: {
//        autoloan: 5,
//        prepay: 4
//      },
//      labelWiseWords: {
//        autoloan: 36,
//        prepay: 26
//      },
//      vocabulary: 24
//    };
Returns:

An object containing count of samples under each label, total words, and the size of vocabulary.

Type
object
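
The three figures can be pictured with a small sketch that derives them from already tokenized examples. The helper basicStats and the sample data are hypothetical; the classifier maintains its own counts, which may differ from this sketch, for instance when considerOnlyPresence is enabled.

```javascript
// Hypothetical helper: counts samples and words per label, and distinct words.
function basicStats( examples ) {
  const labelWiseSamples = {};
  const labelWiseWords = {};
  const vocabulary = new Set();
  for ( const { tokens, label } of examples ) {
    labelWiseSamples[ label ] = ( labelWiseSamples[ label ] || 0 ) + 1;
    labelWiseWords[ label ] = ( labelWiseWords[ label ] || 0 ) + tokens.length;
    tokens.forEach( ( token ) => vocabulary.add( token ) );
  }
  return { labelWiseSamples, labelWiseWords, vocabulary: vocabulary.size };
}

// Made-up tokenized examples.
const summary = basicStats( [
  { tokens: [ 'need', 'loan', 'new', 'vehicle' ], label: 'autoloan' },
  { tokens: [ 'close', 'loan' ], label: 'prepay' },
  { tokens: [ 'pay', 'loan', 'early' ], label: 'prepay' }
] );
console.log( JSON.stringify( summary ) );
// -> {"labelWiseSamples":{"autoloan":1,"prepay":2},"labelWiseWords":{"autoloan":4,"prepay":5},"vocabulary":7}
```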