`1.3.1`

# wink-regression-tree

Decision Tree to predict the value of a continuous target variable

Predict the value of a continuous variable such as price, turn around time, or mileage using `wink-regression-tree`. It is a part of wink — a growing family of high quality packages for Statistical Analysis, Natural Language Processing and Machine Learning in NodeJS.

### Installation

Use npm to install:

```
npm install wink-regression-tree --save
```

### Getting Started

Here is an example of predicting a car’s mileage (miles per gallon, or mpg) from attributes like displacement, horsepower, acceleration, country of origin, and a few more. A sample data row is given for quick reference:

| Model | MPG | Cylinders | Displacement | Power | Weight | Acceleration | Year | Origin |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Toyota Mark II | 20 | 6 | large displacement | high power | high weight | slow | 73 | Japan |

The code below provides a potential configuration to predict the value of miles per gallon:

```
// Load wink-regression-tree.
var regressionTree = require( 'wink-regression-tree' );

// Load cars training data set.
// In practice an async mechanism may be used to
// read data asynchronously and call `ingest()` on
// every row of data read.
var cars = require( 'wink-regression-tree/sample-data/cars.json' );

// Create sample data to test prediction for
// Ford Gran Torino, having "mpg of 14.5", very
// large displacement, extremely high power, very
// high weight, slow, and with origin as US.
var input = {
  model: 'Ford Gran Torino',
  weight: 'very high weight',
  displacement: 'very large displacement',
  horsepower: 'extremely high power',
  origin: 'US',
  acceleration: 'slow'
};
// The above record is not part of the training data.

// Create an instance of the regression tree.
var rt = regressionTree();

// Specify columns of the training data.
var columns = [
  { name: 'model', categorical: true, exclude: true },
  { name: 'mpg', categorical: false, target: true },
  { name: 'cylinders', categorical: true, exclude: false },
  { name: 'displacement', categorical: true, exclude: false },
  { name: 'horsepower', categorical: true, exclude: false },
  { name: 'weight', categorical: true, exclude: false },
  { name: 'acceleration', categorical: true, exclude: false },
  { name: 'year', categorical: true, exclude: true },
  { name: 'origin', categorical: true, exclude: false }
];
// Specify configuration for learning.
var treeParams = {
  minPercentVarianceReduction: 0.5,
  minLeafNodeItems: 10,
  minSplitCandidateItems: 30,
  minAvgChildrenItems: 2
};
// Define the regression tree configuration using
// `columns` and `treeParams`.
rt.defineConfig( columns, treeParams );

// Ingest the data.
cars.forEach( function ( row ) {
  rt.ingest( row );
} );

// Data ingested! Now time to learn from data!!
console.log( rt.learn() );
// -> 16 (Number of Rules Learned)

// Predict the **mean** value.
var mean = rt.predict( input );
console.log( +mean.toFixed( 1 ) );
// -> 14.3 ( compare with actual mpg of 14.5 )

// In practice one may like to compute a range
// or upper limit using the `modifier` function
// during prediction. Note `size`, `mean`, and `stdev`
// values, passed to this function, can be used
// for computing the range or the upper limit.
```

Try experimenting with this example on Runkit in the browser.

### Documentation

For detailed API docs, see http://winkjs.org/wink-regression-tree/.

### Need Help?

If you spot a bug that has not yet been reported, raise a new issue, or consider fixing it and sending a pull request.

wink-regression-tree is copyright 2017-18 GRAYPE Systems Private Limited.

## How to create an Instance

### regressionTree

Creates an instance of `wink-regression-tree`.

regressionTree(): methods
Returns
`methods`: object containing the set of API methods for tasks such as configuration, data ingestion, learning, and prediction.
Example
```
// Load wink regression tree.
var regressionTree = require( 'wink-regression-tree' );
// Create your instance of regression tree.
var myRT = regressionTree();
```

## API Methods

### defineConfig

Defines the configuration required to read the input data and to generate the regression tree.

defineConfig(inputDataCols: Array<object>, tree: object): number
Parameters
inputDataCols `(Array<object>)`: each object in this array defines a column of input data, in the same sequence in which data will be supplied to `ingest()`. Each column is defined in terms of the following details:

| Name | Description |
| --- | --- |
| `inputDataCols[].name` | `string` name of the column. |
| `inputDataCols[].categorical` | `boolean` that defines the column's data type: `true` for categorical or `false` for numeric. Numeric columns are not currently supported, apart from the target column (see `target` below). |
| `inputDataCols[].exclude` | `boolean` (default `false`) used to exclude a column during tree building. |
| `inputDataCols[].target` | `boolean` (default `false`) that is set to `true` only for the target column, whose value needs to be predicted. Note that this column must be a numeric column. |

tree `(object)`: contains key/value pairs of the following regression tree parameters:

| Name | Description |
| --- | --- |
| `tree.maxDepth` | `number` (default `20`) is the maximum depth of the tree, after which learning stops. |
| `tree.minPercentVarianceReduction` | `number` (default `10`) is the minimum percentage variance reduction required for a split to occur. |
| `tree.minSplitCandidateItems` | `number` (default `50`) is the minimum number of items that must be present at a node for it to be split further, even after the `minPercentVarianceReduction` target has been achieved. |
| `tree.minLeafNodeItems` | `number` (default `10`) is the minimum number of items that must be present at a leaf node for it to be retained as an independent node. Nodes with fewer items than this are merged together. |
| `tree.minAvgChildrenItems` | `number` (default `2`): the average number of items across children must be greater than this number for a column to become a candidate for a split. A higher number discourages splits that create many branches with each child node containing few items. |

Returns
`number`: number of columns defined.
Example
```
// Define each column.
var columns = [
  { name: 'model', categorical: true, exclude: true },
  { name: 'mpg', categorical: false, target: true },
  { name: 'cylinders', categorical: true },
  { name: 'displacement', categorical: true, exclude: false },
  { name: 'horsepower', categorical: true, exclude: false },
  { name: 'weight', categorical: true, exclude: false },
  { name: 'acceleration', categorical: true, exclude: false },
  { name: 'year', categorical: true, exclude: true },
  { name: 'origin', categorical: true, exclude: false }
];
// Define parameters to grow the tree.
var treeParams = {
  minPercentVarianceReduction: 2.5,
  minLeafNodeItems: 10,
  minSplitCandidateItems: 30,
  minAvgChildrenItems: 3
};
// Define the configuration using above 2 variables.
myRT.defineConfig( columns, treeParams );
// -> 8
```
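
To give a feel for what `minPercentVarianceReduction` is compared against, here is a minimal sketch of the textbook percentage variance reduction for a candidate split. It is not taken from the library's source, and the internal computation may differ in detail; the helper names `variance` and `percentVarianceReduction` and the mpg values are hypothetical.

```javascript
// Population variance of an array of numbers.
function variance( values ) {
  var mean = values.reduce( function ( sum, v ) {
    return sum + v;
  }, 0 ) / values.length;
  return values.reduce( function ( sum, v ) {
    return sum + ( v - mean ) * ( v - mean );
  }, 0 ) / values.length;
}

// Percentage by which a split reduces the variance of the target.
// `parent` holds the target values at a node; `children` holds the
// same values grouped by the value of the split column.
function percentVarianceReduction( parent, children ) {
  var parentVariance = variance( parent );
  // Nothing to reduce if all target values are identical.
  if ( parentVariance === 0 ) return 0;
  var weightedChildVariance = children.reduce( function ( sum, child ) {
    return sum + ( child.length / parent.length ) * variance( child );
  }, 0 );
  return 100 * ( parentVariance - weightedChildVariance ) / parentVariance;
}

// Hypothetical mpg values at a node, split by a two-valued column.
var parent = [ 10, 12, 14, 30, 32, 34 ];
var children = [ [ 10, 12, 14 ], [ 30, 32, 34 ] ];
console.log( percentVarianceReduction( parent, children ).toFixed( 1 ) );
// -> 97.4
```

Under this reading, a split like the one above would comfortably clear a `minPercentVarianceReduction` of `2.5`, while a split whose children look much like the parent would not.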

### ingest

Ingests one row of the data at a time. It is especially useful for reading data in an asynchronous manner, where it may be used as a callback function on every row read event.

ingest(row: array): boolean
Parameters
row `(array)`: one row of the data to be ingested; column values should be in the same sequence in which they are defined in the data configuration via `defineConfig()`.
Returns
`boolean`: always `true`.
Throws
• error: if the number of elements in `row` does not match the number of columns defined.
Example
```
// Load cars training data set.
var cars = require( 'wink-regression-tree/sample-data/cars.json' );
// Ingest the data.
cars.forEach( function ( row ) {
  myRT.ingest( row );
} );
```
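
When the source data arrives as records keyed by column name rather than as arrays, each record has to be turned into an array in the configured column order before it is ingested. The sketch below shows one way to do that; `toRow` and `columnOrder` are hypothetical names that are not part of the library, and the value types in `record` are assumptions.

```javascript
// Column names in the order used in the `columns` array that is
// passed to `defineConfig()` in the example above.
var columnOrder = [
  'model', 'mpg', 'cylinders', 'displacement', 'horsepower',
  'weight', 'acceleration', 'year', 'origin'
];

// Convert a record keyed by column name into an array row.
// Throws if a column is missing, so a short row is never produced.
function toRow( record, order ) {
  return order.map( function ( name ) {
    if ( !Object.prototype.hasOwnProperty.call( record, name ) ) {
      throw new Error( 'Missing column: ' + name );
    }
    return record[ name ];
  } );
}

// The sample data row from "Getting Started", as a keyed record.
var record = {
  model: 'Toyota Mark II',
  mpg: 20,
  cylinders: 6,
  displacement: 'large displacement',
  horsepower: 'high power',
  weight: 'high weight',
  acceleration: 'slow',
  year: 73,
  origin: 'Japan'
};
var row = toRow( record, columnOrder );
// `row` is now in the sequence expected by `ingest()`,
// e.g. myRT.ingest( row );
```

Failing early on a missing column keeps the error close to the offending record instead of surfacing later as a column count mismatch.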

### learn

Learns from the ingested data and generates the rule tree that is used by `predict()` to predict the value of the target variable from the input. It requires at least 60 data rows to initiate meaningful learning.

learn(): number
Returns
`number`: number of rules learned from the input data.
Throws
• error: if the number of rows in the ingested data is less than 60.
Example
```
myRT.learn();
// -> Number of rules learned
```

### predict

Predicts the value of the target variable from the `input` using the rule tree generated by `learn()`. If the value of a column required for the prediction is missing from the input, it throws an error by default. If the `modifier` function is defined, no error is thrown; instead, the name of the missing column is passed to this function, which is expected to handle it.

predict(input: object, modifier: function): number
Parameters
input `(object)`: data containing column name/value pairs; the column names must be the same as those defined via `defineConfig()`.
modifier `(function = undefined)`: called once a leaf node is reached during prediction, with the following 5 parameters: the `size`, `mean`, and `stdev` values at the node; an array of column names navigated to reach the leaf; and the name of the column whose value is missing in the input (`undefined` by default). The value returned from this function becomes the prediction.
Returns
`number`: `mean` value or whatever is returned by the `modifier` function, if defined.
Throws
• error: if the `input` is not a JavaScript object.
• error: if the value of a column required for prediction is missing in `input`, provided `modifier` has not been defined.
Example
```
// Populate sample input
var input = {
  model: 'Ford Gran Torino',
  weight: 'very high weight',
  displacement: 'very large displacement',
  horsepower: 'extremely high power',
  origin: 'US',
  acceleration: 'slow'
};
// Attempt prediction.
myRT.predict( input );
// -> 14.3
```
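
The `modifier` parameter described above can turn the plain mean into a range. The following is a minimal sketch of such a function; `rangeModifier` is a hypothetical name, the choice of one standard deviation on either side of the mean is an arbitrary assumption, and the leaf statistics in the direct call are made up to show the shape of the result.

```javascript
// Hypothetical modifier: returns a range around the mean instead of
// the mean itself. The parameter order follows the description of
// `modifier` above; the parameter names are illustrative.
function rangeModifier( size, mean, stdev, columnsUsed, missingColumn ) {
  return {
    lower: +( mean - stdev ).toFixed( 1 ),
    upper: +( mean + stdev ).toFixed( 1 ),
    // Number of training items behind this estimate.
    basedOn: size,
    // Columns navigated to reach the leaf.
    path: columnsUsed,
    // `undefined` when no required column was missing in the input.
    missing: missingColumn
  };
}

// Direct call with made-up leaf statistics.
var result = rangeModifier( 25, 14.3, 1.2, [ 'weight', 'displacement' ], undefined );
console.log( result.lower, result.upper );
// -> 13.1 15.5

// With a trained tree it would be passed as the second argument:
// myRT.predict( input, rangeModifier );
```

Because a defined `modifier` suppresses the missing-column error, a function like this should check `missingColumn` and decide how much to trust the returned range when it is set.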

### summary

Generates summary of the learnings in terms of the following:

1. Relative importance of columns along with the corresponding min/max variance reductions (VR).
2. The min/max mean values along with the corresponding standard deviations (SD).
3. The minimum standard deviation (SD) discovered during the learning.

summary(): object
Returns
`object`: containing the following:
1. `table`: array of objects, where each object defines `level`, `columnHierarchy`, `nodesSplit`, `minVR`, and `maxVR`. A lower value of `level` indicates higher importance; similarly, more nodes split on a `columnHierarchy` at a level indicates higher importance. The array is therefore sorted in ascending order of `level` and then in descending order of `nodesSplit`.
2. `stats`: object containing `min.mean`, `min.itsSD`, `max.mean`, `max.itsSD`, and `minSD`.
Example
```
myRT.summary();
// -> returns the summary object.
```
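
The ordering of `table` described above can be expressed as a two-key comparator. The sketch below applies it to made-up rows; the values, including the placeholder `columnHierarchy` strings, are hypothetical and only mimic the field names listed for `table`.

```javascript
// Hypothetical rows shaped like the entries of `summary().table`.
var table = [
  { level: 2, columnHierarchy: 'origin', nodesSplit: 1, minVR: 3, maxVR: 3 },
  { level: 1, columnHierarchy: 'weight', nodesSplit: 1, minVR: 60, maxVR: 60 },
  { level: 3, columnHierarchy: 'cylinders', nodesSplit: 2, minVR: 1, maxVR: 4 },
  { level: 2, columnHierarchy: 'horsepower', nodesSplit: 4, minVR: 5, maxVR: 20 }
];

// Ascending `level` first; ties are broken by descending `nodesSplit`.
function byImportance( a, b ) {
  return ( a.level - b.level ) || ( b.nodesSplit - a.nodesSplit );
}

// Sort a copy so that the original array is left untouched.
var sorted = table.slice().sort( byImportance );
console.log( sorted.map( function ( entry ) {
  return entry.columnHierarchy;
} ) );
// -> [ 'weight', 'horsepower', 'origin', 'cylinders' ]
```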

### evaluate

Incrementally evaluates variance reduction for one data row at a time.

evaluate(rowObject: object): boolean
Parameters
rowObject `(object)`: contains column name/value pairs, including the target column's name/value pair, which is used in evaluating the variance reduction.
Returns
`boolean`: always `true`.
Example
`myRT.evaluate( input );`

### metrics

Computes the variance reduction observed in the validation data passed to `evaluate()`.

metrics(): object
Returns
`object`: containing the `varianceReduction` in percentage and the data `size`.
Example
```
myRT.metrics();
// -> object containing varianceReduction and data size.
```
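
How `metrics()` arrives at its figure internally is not documented here. As an illustration only, one common way to express variance reduction on validation data compares the squared prediction errors with the spread of the actual target values. The function name `validationVarianceReduction` and the numbers below are hypothetical.

```javascript
// Percentage of the target's variance that the predictions account for:
// 100 * ( 1 - residual sum of squares / total sum of squares ).
function validationVarianceReduction( actual, predicted ) {
  var n = actual.length;
  var mean = actual.reduce( function ( sum, v ) {
    return sum + v;
  }, 0 ) / n;
  var totalSS = 0;
  var residualSS = 0;
  actual.forEach( function ( value, i ) {
    totalSS += ( value - mean ) * ( value - mean );
    residualSS += ( value - predicted[ i ] ) * ( value - predicted[ i ] );
  } );
  // With identical targets there is no variance to reduce.
  var reduction = ( totalSS === 0 ) ? 0 : 100 * ( 1 - residualSS / totalSS );
  return { varianceReduction: reduction, size: n };
}

// Made-up actual and predicted mpg values for four validation rows.
var actual = [ 14, 18, 22, 26 ];
var predicted = [ 15, 17, 23, 25 ];
var report = validationVarianceReduction( actual, predicted );
console.log( report.varianceReduction.toFixed( 1 ), report.size );
// -> 95.0 4
```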

### exportJSON

Exports the JSON of the rule tree generated by `learn()`, which may be saved in a file for later predictions.

exportJSON(): json
Returns
`json`: of the rule tree.
Example
`var rules = myRT.exportJSON();`

### importJSON

Imports the rule tree from the input `rulesTree` for subsequent use by `predict()`. Note after a successful import, this can be used ONLY for prediction purpose and not for further ingestion and/or learning.

importJSON(rulesTree: json): boolean
Parameters
rulesTree `(json)`: containing an earlier exported rule tree in JSON format.
Returns
`boolean`: always `true`.
Throws
• error: if `rulesTree` is `null`.
• error: if `rulesTree` cannot be parsed as valid JSON.
• error: if `rulesTree` is of an incorrect version or incorrect format.
Example
```
var anRT = regressionTree();
// Assuming that `rules` holds a valid rule tree.
anRT.importJSON( rules );
```
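
Saving an exported rule tree to a file and reading it back later can be sketched with Node's built-in `fs` module. This assumes the exported value is a JSON string, which the parsing error listed above suggests but which is not confirmed here. The `rules` value is a stand-in rather than a real export, and the file name is hypothetical.

```javascript
var fs = require( 'fs' );
var os = require( 'os' );
var path = require( 'path' );

// Stand-in for the value returned by `myRT.exportJSON()`; the real
// content of an exported rule tree is not reproduced here.
var rules = JSON.stringify( { placeholder: 'exported rule tree' } );

// Save the exported JSON (to a temporary location in this sketch).
var file = path.join( os.tmpdir(), 'wink-regression-tree-rules.json' );
fs.writeFileSync( file, rules, 'utf8' );

// Later, possibly in another process, read it back.
var restored = fs.readFileSync( file, 'utf8' );

// `restored` is what would be handed to `importJSON()`,
// e.g. anRT.importJSON( restored );
console.log( restored === rules );
// -> true
```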

### reset

It completely resets the tree by re-initializing all the learning-related variables, except its configuration. It is useful during cross-validation, where the same configuration is reused for every fold.

reset(): undefined
Returns
`undefined`: nothing!
Example
`myRT.reset();`
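
For cross-validation, the data first has to be divided into training and validation portions for each fold. The helper below is a minimal sketch of such a split; `kFoldSplits` is a hypothetical name and not part of the library, and the rows are placeholders.

```javascript
// Split `rows` into `k` folds. Each fold in turn serves as the
// validation data while the remaining rows form the training data.
function kFoldSplits( rows, k ) {
  var splits = [];
  for ( var fold = 0; fold < k; fold += 1 ) {
    var training = [];
    var validation = [];
    rows.forEach( function ( row, index ) {
      // Every k-th row, offset by the fold number, is held out.
      if ( index % k === fold ) validation.push( row );
      else training.push( row );
    } );
    splits.push( { training: training, validation: validation } );
  }
  return splits;
}

// Placeholder rows; real rows would come from the training data set.
var rows = [];
for ( var i = 0; i < 10; i += 1 ) rows.push( i );

var splits = kFoldSplits( rows, 5 );
console.log( splits[ 0 ].validation, splits[ 0 ].training.length );
// -> [ 0, 5 ] 8
```

For each split, one would call `reset()`, `ingest()` every training row, call `learn()`, and then pass each validation row to `evaluate()` before reading `metrics()`. Two points to keep in mind: `learn()` needs at least 60 training rows, and `evaluate()` expects an object of column name/value pairs whereas `ingest()` expects an array.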