RegressionTree

Regression tree class

Methods

defineConfig

defineConfig(inputDataCols, tree) → {number}

Defines the configuration required to read the input data and to generate the regression tree.

Example
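// Load the library and create an instance.
var regressionTree = require( 'wink-regression-tree' );
var myRT = regressionTree();
// Define each column.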
var columns = [
  { name: 'model', categorical: true, exclude: true },
  { name: 'mpg', categorical: false, target: true },
  { name: 'cylinders', categorical: true },
  { name: 'displacement', categorical: true, exclude: false },
  { name: 'horsepower', categorical: true, exclude: false },
  { name: 'weight', categorical: true, exclude: false },
  { name: 'acceleration', categorical: true, exclude: false },
  { name: 'year', categorical: true, exclude: true },
  { name: 'origin', categorical: true, exclude: false  }
];
// Define parameters to grow the tree.
var treeParams = {
  minPercentVarianceReduction: 2.5,
  minLeafNodeItems: 10,
  minSplitCandidateItems: 30,
  minAvgChildrenItems: 3
};
// Define the configuration using above 2 variables.
myRT.defineConfig( columns, treeParams );
// -> 8
Parameters
Name Type Description
inputDataCols Array.<object>

Each object in this array defines a column of the input data, in the same sequence in which the data will be supplied to ingest(). Each object has the following properties:

Properties
Name Type Attributes Default Description
name string

name of the column.

categorical boolean

defines the column's data type: true indicates categorical and false indicates numeric. Currently the numeric data type is not supported for predictor columns; only the target column may be numeric.

exclude boolean <optional>
false

used to exclude a column during tree building.

target boolean <optional>
false

is set to true only for the target column, i.e. the column whose value is to be predicted. Note that this column must be numeric.

tree object

contains key/value pairs for the following regression tree parameters:

Properties
Name Type Attributes Default Description
maxDepth number <optional>
20

is the maximum depth of the tree after which learning stops.

minPercentVarianceReduction number <optional>
10

is the minimum percentage variance reduction required for a split to occur.

minSplitCandidateItems number <optional>
50

the minimum number of items that must be present at a node for it to be split further; a node with fewer items is not split even if the minPercentVarianceReduction target has been achieved.

minLeafNodeItems number <optional>
10

is the minimum number of items that must be present at a leaf node for it to be retained as an independent node. Nodes with fewer items than this value are merged together.

minAvgChildrenItems number <optional>
2

the average number of items across the children must be greater than this number for a column to become a candidate for a split. A higher number discourages splits that create many branches, each containing only a few items.
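The column rules and tree defaults above can be checked before calling defineConfig(). The following is a minimal sketch, not the library's own validation: checkConfig and TREE_DEFAULTS are hypothetical names, and the "exactly one target" rule is an assumption read from the description of target.

```javascript
// Hypothetical helper (not part of wink-regression-tree): checks a columns
// array against the rules described above and fills in the documented
// default tree parameters.
const TREE_DEFAULTS = {
  maxDepth: 20,
  minPercentVarianceReduction: 10,
  minSplitCandidateItems: 50,
  minLeafNodeItems: 10,
  minAvgChildrenItems: 2
};

function checkConfig(columns, tree = {}) {
  // Assumption: exactly one column is flagged as the target.
  const targets = columns.filter((c) => c.target === true);
  if (targets.length !== 1) {
    throw new Error('exactly one column must have target: true');
  }
  // The target column must be numeric.
  if (targets[0].categorical !== false) {
    throw new Error('the target column must be numeric (categorical: false)');
  }
  // Every other column that is not excluded must be categorical.
  for (const c of columns) {
    if (!c.target && !c.exclude && c.categorical !== true) {
      throw new Error(`numeric column "${c.name}" is not supported`);
    }
  }
  // Caller-supplied parameters override the defaults.
  return { params: { ...TREE_DEFAULTS, ...tree }, columnCount: columns.length };
}
```

Spreading TREE_DEFAULTS first and the caller's object second means any parameter left out of treeParams, such as maxDepth in the example above, falls back to its documented default.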

Returns

number of columns defined.

Type
number

evaluate

evaluate(rowObject) → {boolean}

Incrementally evaluates the variance reduction, one data row at a time.

Example
// input is a validation row object that includes the target column's value.
myRT.evaluate( input );
Parameters
Name Type Description
rowObject object

contains column name/value pairs, including the target column's name/value pair, which is used in evaluating the variance reduction.

Returns

always true.

Type
boolean
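Incremental evaluation means only running totals need to be kept, so validation rows can be streamed through evaluate() and summarized later by metrics(). The sketch below shows one way to do that; this document does not say how varianceReduction is defined, so the formula 100 × (1 − SSE / SST) is an assumption, and makeEvaluator is a hypothetical name.

```javascript
// Hypothetical sketch, not the library's implementation. It assumes that
// varianceReduction = 100 * (1 - SSE / SST), where SSE is the sum of squared
// prediction errors and SST the sum of squared deviations of the actual
// target values from their mean.
function makeEvaluator(predict, targetName) {
  let n = 0;
  let mean = 0; // running mean of actual target values (Welford)
  let sst = 0;  // running sum of squared deviations from the mean (Welford)
  let sse = 0;  // running sum of squared prediction errors

  return {
    // Consume one validation row; like evaluate(), always returns true.
    evaluate(row) {
      const actual = row[targetName];
      const err = actual - predict(row);
      n += 1;
      const delta = actual - mean;
      mean += delta / n;
      sst += delta * (actual - mean);
      sse += err * err;
      return true;
    },
    // Report variance reduction as a percentage, plus the data size.
    metrics() {
      const varianceReduction = sst === 0 ? 0 : 100 * (1 - sse / sst);
      return { varianceReduction, size: n };
    }
  };
}
```

Welford's update keeps SST numerically stable without storing the rows. A perfect predictor yields 100, while always predicting the validation mean yields 0.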

exportJSON

exportJSON() → {json}

Exports the rule tree generated by learn() as JSON, which may be saved to a file for later predictions.

Example
var rules = myRT.exportJSON();
Returns

JSON of the rule tree.

Type
json

importJSON

importJSON(rulesTree) → {boolean}

Imports the rule tree from the input rulesTree for subsequent use by predict(). Note that after a successful import, the instance can be used ONLY for prediction, not for further ingestion and/or learning.

Example
var anRT = regressionTree();
// Assuming that rules contains a valid rule tree exported earlier via exportJSON().
anRT.importJSON( rules );
Parameters
Name Type Description
rulesTree json

containing an earlier exported rule tree in JSON format.

Throws
  • if rulesTree is null.

    Type
    error
  • if rulesTree cannot be parsed as valid JSON.

    Type
    error
  • if rulesTree is of incorrect version or incorrect format.

    Type
    error
Returns

always true.

Type
boolean
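The three documented failure cases of importJSON() can be mirrored in a small parser. This is only a sketch: the { version, tree } envelope and EXPECTED_VERSION value are assumptions made for illustration, since the actual exported format is not described here, and parseRulesTree is a hypothetical name.

```javascript
// Hypothetical sketch of the checks listed under Throws above. The
// { version, tree } envelope and the version string are assumptions.
const EXPECTED_VERSION = '1.0.0';

function parseRulesTree(rulesTree) {
  // Case 1: nothing to import.
  if (rulesTree === null || rulesTree === undefined) {
    throw new Error('rulesTree is null');
  }
  // Case 2: the text is not valid JSON.
  let parsed;
  try {
    parsed = JSON.parse(rulesTree);
  } catch (e) {
    throw new Error('rulesTree cannot be parsed as valid JSON');
  }
  // Case 3: valid JSON, but the wrong version or shape.
  if (
    parsed === null ||
    typeof parsed !== 'object' ||
    parsed.version !== EXPECTED_VERSION ||
    typeof parsed.tree !== 'object' ||
    parsed.tree === null
  ) {
    throw new Error('rulesTree has an incorrect version or format');
  }
  return parsed.tree;
}
```

Checking the version before using the tree is what makes it safe to save exportJSON() output to a file and load it into a fresh instance later.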

ingest

ingest(row) → {boolean}

Ingests one row of data at a time. It is especially useful for reading data asynchronously, where it may be used as a callback function on every row-read event.

Example
// Load cars training data set.
var cars = require( 'wink-regression-tree/sample-data/cars.json' );
// Ingest the data.
cars.forEach( function ( row ) {
  myRT.ingest( row );
} );
Parameters
Name Type Description
row array

one row of the data to be ingested; its column values must be in the same sequence in which the columns were defined via defineConfig().

Throws

if the number of elements in row does not match the number of columns defined.

Type
error
Returns

always true.

Type
boolean
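The callback pattern described above can be seen with a plain Node.js EventEmitter. In this sketch a stub object with a hypothetical makeStubModel factory stands in for the regression tree and applies the documented row-length check. In practice the events would come from a stream reader, for example the 'line' event of Node's readline module.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a regression tree instance: only ingest() is
// modelled, and it applies the documented row-length check.
function makeStubModel(columnCount) {
  const rows = [];
  return {
    rows,
    ingest(row) {
      if (row.length !== columnCount) {
        throw new Error(`expected ${columnCount} values, got ${row.length}`);
      }
      rows.push(row);
      return true;
    }
  };
}

// A reader that emits one 'row' event per parsed CSV line.
const reader = new EventEmitter();
const model = makeStubModel(3);
reader.on('row', (row) => model.ingest(row));

// Values stay as strings here; a real pipeline would convert the target.
const csv = 'ford torino,US,15\nvw rabbit,Europe,29';
csv.split('\n').forEach((line) => reader.emit('row', line.split(',')));
```

Because emit() calls listeners synchronously, an error thrown by ingest() surfaces at the emit() call for the offending row.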

learn

learn() → {number}

Learns from the ingested data and generates the rule tree that is used to predict() the value of target variable from the input. It requires at least 60 data rows to initiate meaningful learning.

Example
myRT.learn();
// -> Number of rules learned
Throws

if the ingested data has fewer than 60 rows.

Type
error
Returns

number of rules learned from the input data.

Type
number
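During learning, a candidate split on a categorical column is judged by how much it reduces the variance of the target. The sketch below uses the common size-weighted definition and gates the split with the tree parameters from defineConfig(); the exact formula and the way the library combines these thresholds are assumptions, and all function names are hypothetical.

```javascript
// Hypothetical sketch: population variance of an array of numbers.
function variance(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  return values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
}

// Percentage variance reduction from splitting rows on one categorical
// column, using the size-weighted variance of the children.
function percentVarianceReduction(rows, column, target) {
  const parentVar = variance(rows.map((r) => r[target]));
  // Group target values by the column's categorical value.
  const groups = new Map();
  for (const r of rows) {
    if (!groups.has(r[column])) groups.set(r[column], []);
    groups.get(r[column]).push(r[target]);
  }
  let weightedChildVar = 0;
  for (const vals of groups.values()) {
    weightedChildVar += (vals.length / rows.length) * variance(vals);
  }
  return {
    percent: parentVar === 0 ? 0 : (100 * (parentVar - weightedChildVar)) / parentVar,
    avgChildItems: rows.length / groups.size
  };
}

// Assumed combination of the documented thresholds.
function shouldSplit(split, nodeSize, params) {
  return (
    split.percent >= params.minPercentVarianceReduction &&
    nodeSize >= params.minSplitCandidateItems &&
    split.avgChildItems > params.minAvgChildrenItems
  );
}
```

For instance, targets 10 and 12 under US and 30 and 32 under EU give a parent variance of 101 and a weighted child variance of 1, i.e. a reduction of about 99%.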

metrics

metrics() → {object}

Computes the variance reduction observed in the validation data passed to evaluate().

Example
myRT.metrics();
// -> object containing varianceReduction and data size.
Returns

containing the varianceReduction (as a percentage) and the data size.

Type
object

predict

predict(input, [modifier]) → {number}

Predicts the value of the target variable from the input using the rule tree generated by learn(). If the value of a column required for the prediction is missing from the input, it throws an error by default. If the modifier function is defined, no error is thrown; instead, the name of the missing column is passed to this function, which is expected to handle it.

Example
// Populate sample input
var input = {
  model: 'Ford Gran Torino',
  weight: 'very high weight',
  displacement: 'very large displacement',
  horsepower: 'extremely high power',
  origin: 'US',
  acceleration: 'slow'
};
// Attempt prediction.
myRT.predict( input );
// -> 14.3
Parameters
Name Type Attributes Description
input object

data containing column name/value pairs; the column names must be the same as those defined via defineConfig().

modifier function <optional>

is called with the following 5 parameters once a leaf node is reached during prediction: the size, mean and stdev values at the node; an array of the column names navigated to reach the leaf; and the name of the column whose value is missing in the input (undefined if no value is missing). The value returned from this function becomes the prediction.

Throws
  • if the input is not a JavaScript object.

    Type
    error
  • if the value of a column required for prediction is missing in the input and no modifier has been defined.

    Type
    error
Returns

the mean value, or the value returned by the modifier function if it is defined.

Type
number
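The prediction rules above amount to walking the tree one column at a time. The sketch below assumes a hypothetical node shape { size, mean, stdev, colName, branches } (not the library's exported format). It also assumes that the walk stops at the current node when a value is missing or unseen; predictFrom is a hypothetical name.

```javascript
// Hypothetical node shape (an assumption, not the exported format):
// { size, mean, stdev, colName, branches: { <value>: childNode } }
// A node without colName is a leaf.
function predictFrom(root, input, modifier) {
  if (input === null || typeof input !== 'object' || Array.isArray(input)) {
    throw new Error('input must be a JavaScript object');
  }
  const navigated = [];
  let node = root;
  while (node.colName !== undefined) {
    const value = input[node.colName];
    if (value === undefined) {
      // Missing value: throw unless a modifier can handle it.
      if (typeof modifier !== 'function') {
        throw new Error(`missing value for column "${node.colName}"`);
      }
      return modifier(node.size, node.mean, node.stdev, navigated, node.colName);
    }
    const hasChild = Object.prototype.hasOwnProperty.call(node.branches, value);
    if (!hasChild) break; // assumption: unseen value falls back to this node
    navigated.push(node.colName);
    node = node.branches[value];
  }
  return typeof modifier === 'function'
    ? modifier(node.size, node.mean, node.stdev, navigated, undefined)
    : node.mean;
}
```

A modifier could, for example, return the mean only when stdev is small and signal low confidence otherwise.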

reset

reset() → {undefined}

Completely resets the tree by re-initializing all the learning-related variables, except its configuration. It is useful during k-fold cross-validation.

Example
myRT.reset();
Returns

nothing!

Type
undefined
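For cross-validation, the data is typically split into k folds, and each fold serves once as validation data. The helper below is a hypothetical sketch that builds those splits round-robin; in practice the rows would usually be shuffled first.

```javascript
// Hypothetical helper: build k train/test index splits for k-fold
// cross-validation. Each row index lands in exactly one test fold.
function kFoldSplits(n, k) {
  if (!Number.isInteger(k) || k < 2 || k > n) {
    throw new Error('k must be an integer between 2 and n');
  }
  const folds = [];
  for (let f = 0; f < k; f += 1) {
    const test = [];
    const train = [];
    for (let i = 0; i < n; i += 1) {
      // Round-robin assignment: index i goes to test fold i % k.
      (i % k === f ? test : train).push(i);
    }
    folds.push({ train, test });
  }
  return folds;
}
```

One plausible loop over these folds calls myRT.reset(), ingests the training rows, calls learn(), passes the test rows to evaluate(), and records metrics(). Because reset() keeps the configuration, defineConfig() need not be repeated per fold.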

summary

summary() → {object}

Generates a summary of the learnings in terms of the following:

  1. Relative importance of columns along with the corresponding min/max variance reductions (VR).
  2. The min/max mean values along with the corresponding standard deviations (SD).
  3. The minimum standard deviation (SD) discovered during the learning.
Example
myRT.summary();
// -> returns the summary object.
Returns

containing the following:

  1. table: an array of objects, where each object defines level, columnHierarchy, nodesSplit, minVR and maxVR. A lower value of level indicates higher importance; similarly, more nodes split on a columnHierarchy at a given level indicates higher importance. Therefore, the table is sorted in ascending order of level and then in descending order of nodesSplit.
  2. stats: an object containing min.mean, min.itsSD, max.mean, max.itsSD, and minSD.
Type
object
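The ordering of table described above is a two-key sort. The following sketch reproduces it on entries shaped like the documented fields. sortSummaryTable is a hypothetical name, and the sample values in the usage are invented.

```javascript
// Hypothetical sketch of the documented ordering of summary().table:
// ascending by level, then descending by nodesSplit within a level.
function sortSummaryTable(table) {
  // Copy first so the caller's array is not reordered in place.
  return [...table].sort(
    (a, b) => a.level - b.level || b.nodesSplit - a.nodesSplit
  );
}
```

For example, entries { level: 2, nodesSplit: 1 }, { level: 1, nodesSplit: 1 } and { level: 2, nodesSplit: 3 } come out as the level 1 entry first, followed by the level 2 entry with 3 splits, then the one with 1 split.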