1.3.1
Decision Tree to predict the value of a continuous target variable
Predict the value of a continuous variable such as price, turnaround time, or mileage using wink-regression-tree. It is a part of wink, a growing family of high quality packages for Statistical Analysis, Natural Language Processing and Machine Learning in NodeJS.
Use npm to install:
npm install wink-regression-tree --save
Here is an example of predicting a car's mileage (miles per gallon, mpg) from attributes like displacement, horsepower, acceleration, country of origin, and a few more. A sample data row is given for quick reference:
| Model | MPG | Cylinders | Displacement | Power | Weight | Acceleration | Year | Origin |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Toyota Mark II | 20 | 6 | large displacement | high power | high weight | slow | 73 | Japan |
The code below provides a potential configuration to predict the value of miles per gallon:
// Load wink-regression-tree.
var regressionTree = require( 'wink-regression-tree' );
// Load cars training data set.
// In practice an async mechanism may be used to
// read data asynchronously and call `ingest()` on
// every row of data read.
var cars = require( 'wink-regression-tree/sample-data/cars.json' );
// Create a sample data to test prediction for
// Ford Gran Torino, having "mpg of 14.5", very
// large displacement, extremely high power, very
// high weight, slow, and with origin as US.
var input = {
  model: 'Ford Gran Torino',
  weight: 'very high weight',
  displacement: 'very large displacement',
  horsepower: 'extremely high power',
  origin: 'US',
  acceleration: 'slow'
};
// The above record is not part of the training data.
// Create an instance of the regression tree.
var rt = regressionTree();
// Specify columns of the training data.
var columns = [
  { name: 'model', categorical: true, exclude: true },
  { name: 'mpg', categorical: false, target: true },
  { name: 'cylinders', categorical: true, exclude: false },
  { name: 'displacement', categorical: true, exclude: false },
  { name: 'horsepower', categorical: true, exclude: false },
  { name: 'weight', categorical: true, exclude: false },
  { name: 'acceleration', categorical: true, exclude: false },
  { name: 'year', categorical: true, exclude: true },
  { name: 'origin', categorical: true, exclude: false }
];
// Specify configuration for learning.
var treeParams = {
  minPercentVarianceReduction: 0.5,
  minLeafNodeItems: 10,
  minSplitCandidateItems: 30,
  minAvgChildrenItems: 2
};
// Define the regression tree configuration using
// `columns` and `treeParams`.
rt.defineConfig( columns, treeParams );
// Ingest the data.
cars.forEach( function ( row ) {
  rt.ingest( row );
} );
// Data ingested! Now time to learn from data!!
console.log( rt.learn() );
// > 16 (Number of Rules Learned)
// Predict the **mean** value.
var mean = rt.predict( input );
console.log( +mean.toFixed( 1 ) );
// > 14.3 ( compare with actual mpg of 14.5 )
// In practice one may like to compute a range
// or upper limit using the `modifier` function
// during prediction. Note `size`, `mean`, and `stdev`
// values, passed to this function, can be used
// for computing the range or the upper limit.
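The closing comment above mentions computing a range or an upper limit with a modifier function. A minimal sketch of what such functions might look like follows. The names `upperLimit` and `range` and the choice of two standard deviations are illustrative assumptions, not part of the library; the functions are exercised directly with made-up leaf statistics rather than through `predict()`.

```javascript
// Hypothetical modifiers. The leading parameters (size, mean, stdev)
// follow the order described in the API docs; the 2-sigma width is an
// assumption chosen only for illustration.
var upperLimit = function ( size, mean, stdev ) {
  return +( mean + 2 * stdev ).toFixed( 1 );
};

var range = function ( size, mean, stdev ) {
  return {
    min: +( mean - 2 * stdev ).toFixed( 1 ),
    max: +( mean + 2 * stdev ).toFixed( 1 ),
    size: size
  };
};

// Made-up leaf statistics: 12 items, mean 14.3, stdev 1.2.
console.log( upperLimit( 12, 14.3, 1.2 ) );
// > 16.7
console.log( range( 12, 14.3, 1.2 ) );
// > { min: 11.9, max: 16.7, size: 12 }
```

Assuming the parameter order listed in the API docs, such a function would be supplied as the `modifier` argument of `predict()`, and its return value would replace the mean as the prediction.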
Try experimenting with this example on Runkit in the browser.
For detailed API docs, check out http://winkjs.org/wink-regression-tree/.
If you spot a bug and the same has not yet been reported, raise a new issue or consider fixing it and sending a pull request.
wink-regression-tree is copyright 2017-18 GRAYPE Systems Private Limited.
It is licensed under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3 of the License.
Creates an instance of wink-regression-tree.
Returns methods: object containing the set of API methods for tasks like configuration, data ingestion, learning, and prediction.
// Load wink regression tree.
var regressionTree = require( 'wink-regression-tree' );
// Create your instance of regression tree.
var myRT = regressionTree();
Defines the configuration required to read the input data and to generate the regression tree.
inputDataCols (Array<object>): each object in this array defines a column of input data, in the same sequence in which data will be supplied to ingest(). It is defined in terms of the following details:

| Name | Description |
| --- | --- |
| inputDataCols[].name string | name of the column. |
| inputDataCols[].categorical boolean | defines the column's data type: true indicating categorical or false indicating numeric; currently numeric data type is not supported. |
| inputDataCols[].exclude boolean (default false) | used to exclude a column during tree building. |
| inputDataCols[].target boolean (default false) | is set to true only for the target column, whose value needs to be predicted. Note this column must be a numeric column. |
tree (object): contains key/value pairs of the following regression tree parameters:

| Name | Description |
| --- | --- |
| tree.maxDepth number (default 20) | is the maximum depth of the tree after which learning stops. |
| tree.minPercentVarianceReduction number (default 10) | is the minimum variance reduction required for a split to occur. |
| tree.minSplitCandidateItems number (default 50) | the minimum number of items that must be present at a node for it to be split further, even after the minPercentVarianceReduction target has been achieved. |
| tree.minLeafNodeItems number (default 10) | is the minimum number of items that must be present at a leaf node for it to be retained as an independent node. Nodes with fewer items than this are merged together. |
| tree.minAvgChildrenItems number (default 2) | the average number of items across children must be greater than this number for a column to become a candidate for split. A higher number will discourage splits that create many branches with each child node containing fewer items. |

Returns number: number of columns defined.
// Define each column.
var columns = [
  { name: 'model', categorical: true, exclude: true },
  { name: 'mpg', categorical: false, target: true },
  { name: 'cylinders', categorical: true },
  { name: 'displacement', categorical: true, exclude: false },
  { name: 'horsepower', categorical: true, exclude: false },
  { name: 'weight', categorical: true, exclude: false },
  { name: 'acceleration', categorical: true, exclude: false },
  { name: 'year', categorical: true, exclude: true },
  { name: 'origin', categorical: true, exclude: false }
];
// Define parameters to grow the tree.
var treeParams = {
  minPercentVarianceReduction: 2.5,
  minLeafNodeItems: 10,
  minSplitCandidateItems: 30,
  minAvgChildrenItems: 3
};
// Define the configuration using above 2 variables.
myRT.defineConfig( columns, treeParams );
// > 8
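To make the role of minPercentVarianceReduction more concrete, the following library-independent sketch computes the percentage variance reduction of a candidate split by comparing the variance of the target at the parent node with the size-weighted variance of its children. This is the textbook formulation, assumed here for illustration; the library's internal computation may differ in detail. The helper names `variance` and `percentVarianceReduction` and the data are hypothetical.

```javascript
// Population variance of an array of numbers.
var variance = function ( values ) {
  var mean = values.reduce( function ( s, v ) { return s + v; }, 0 ) / values.length;
  return values.reduce( function ( s, v ) {
    return s + ( v - mean ) * ( v - mean );
  }, 0 ) / values.length;
};

// Percentage reduction in variance when the pooled items (the parent)
// are split into the given child groups.
var percentVarianceReduction = function ( children ) {
  var parent = [].concat.apply( [], children );
  var weighted = children.reduce( function ( s, c ) {
    return s + ( c.length / parent.length ) * variance( c );
  }, 0 );
  return 100 * ( 1 - weighted / variance( parent ) );
};

// Made-up mpg values grouped by a hypothetical split on one column.
var split = [ [ 14, 15, 16 ], [ 30, 31, 32 ] ];
console.log( percentVarianceReduction( split ).toFixed( 1 ) );
// > 99.0
```

A split like this one, which separates low and high mpg values almost perfectly, would comfortably clear a threshold of 2.5; a split whose children look like the parent would score close to 0.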
Ingests one row of the data at a time. It is especially useful for reading data in an asynchronous manner, where this may be used as a callback function on every row read event.
row (array): one row of the data to be ingested; column values should be in the same sequence in which they are defined in the data configuration via defineConfig().
Returns boolean: always true.
Throws an error if the number of elements in row does not match the number of columns defined.
// Load cars training data set.
var cars = require( 'wink-regression-tree/sample-data/cars.json' );
// Ingest the data.
cars.forEach( function ( row ) {
  myRT.ingest( row );
} );
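Because ingest() expects each row as an array whose values follow the column order given to defineConfig(), records that arrive as objects need to be flattened first. A minimal sketch of such a conversion is shown below; the helper `toRow` is hypothetical, the record is made up from the sample row shown earlier, and the library itself is not called.

```javascript
// Column order, matching the `columns` array passed to `defineConfig()`.
var columnNames = [
  'model', 'mpg', 'cylinders', 'displacement', 'horsepower',
  'weight', 'acceleration', 'year', 'origin'
];

// Hypothetical helper: turns a record object into an array row,
// failing early if any configured column is absent.
var toRow = function ( record ) {
  return columnNames.map( function ( name ) {
    if ( !Object.prototype.hasOwnProperty.call( record, name ) ) {
      throw Error( 'Missing column: ' + name );
    }
    return record[ name ];
  } );
};

// Illustrative record only; value types in the real data may differ.
var record = {
  origin: 'Japan', model: 'Toyota Mark II', mpg: 20, cylinders: 6,
  displacement: 'large displacement', horsepower: 'high power',
  weight: 'high weight', acceleration: 'slow', year: 73
};

console.log( toRow( record ) );
```

The resulting array is what would be handed to `ingest()`, one call per record.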
Learns from the ingested data and generates the rule tree that is used to predict() the value of the target variable from the input. It requires at least 60 data rows to initiate meaningful learning.
Returns number: number of rules learned from the input data.
myRT.learn();
// > Number of rules learned
Predicts the value of the target variable from the input using the rules tree generated by learn(). If the value of a column in the input data, required for the prediction, is missing, by default it throws an error. If the function modifier is defined then no error is thrown; instead the name of the missing column is passed to this function, which is expected to handle it.
input (object): data containing column name/value pairs; the column names must be the same as defined via defineConfig().
modifier (function, default undefined): is called once a leaf node is reached during prediction with the following 5 parameters: the size, mean and stdev values at the node; an array of column names navigated to reach the leaf; and the column name for which the value is missing in the input (default undefined). The value returned from this function becomes the prediction.
Returns number: mean value or whatever is returned by the modifier function, if defined.
Throws an error if input is not a JavaScript object.
Throws an error if a column's value is missing in input, provided modifier has not been defined.
// Populate sample input
var input = {
  model: 'Ford Gran Torino',
  weight: 'very high weight',
  displacement: 'very large displacement',
  horsepower: 'extremely high power',
  origin: 'US',
  acceleration: 'slow'
};
// Attempt prediction.
myRT.predict( input );
// > 14.3
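The missing-value behaviour described above could be handled with a modifier along the following lines. This sketch only defines the function and exercises it with made-up arguments; the name `describePrediction` and the shape of the returned object are assumptions, while the five parameters follow the description of the modifier given above.

```javascript
// Hypothetical modifier: returns the mean together with the context
// needed to judge how far the prediction can be trusted.
var describePrediction = function ( size, mean, stdev, columnsUsed, missingColumn ) {
  return {
    prediction: mean,
    basedOnItems: size,
    stdev: stdev,
    path: columnsUsed,
    // Stays `undefined` when no required column was missing.
    missing: missingColumn
  };
};

// Made-up arguments, standing in for what `predict()` would pass.
var complete = describePrediction( 25, 14.3, 1.1, [ 'displacement', 'horsepower' ] );
var partial = describePrediction( 60, 17.8, 3.4, [ 'displacement' ], 'horsepower' );

console.log( complete.missing );
// > undefined
console.log( partial.missing );
// > horsepower
```

Returning an object rather than a bare number lets the caller distinguish a full prediction from one that stopped early because a column was absent.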
Generates a summary of the learnings.
Returns object: containing the following:
table: array of objects, where each object defines level, columnHierarchy, nodesSplit, minVR and maxVR. A lower value of level indicates higher importance; similarly, more nodes at a level split on a columnHierarchy is an indication of importance. Therefore, it is sorted in ascending order of level followed by descending order of nodesSplit.
stats: object containing min.mean, min.itsSD, max.mean, max.itsSD, and minSD.
myRT.summary();
// > returns the summary object.
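The ordering described for table (ascending level, then descending nodesSplit) corresponds to a two-key comparator. The sketch below applies such a comparator to made-up rows; the columnHierarchy strings are invented for illustration and the real format may differ.

```javascript
// Made-up summary rows; only the fields relevant to ordering are shown.
var table = [
  { level: 2, columnHierarchy: 'displacement / weight', nodesSplit: 1 },
  { level: 1, columnHierarchy: 'displacement', nodesSplit: 1 },
  { level: 2, columnHierarchy: 'displacement / horsepower', nodesSplit: 3 }
];

// Ascending `level`; ties broken by descending `nodesSplit`.
var byImportance = function ( a, b ) {
  return ( a.level - b.level ) || ( b.nodesSplit - a.nodesSplit );
};

// Sort a copy so the original array is left untouched.
var sorted = table.slice().sort( byImportance );

console.log( sorted.map( function ( r ) { return r.columnHierarchy; } ) );
// > [ 'displacement', 'displacement / horsepower', 'displacement / weight' ]
```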
Incrementally evaluates variance reduction for one data row at a time.
(object): contains column name/value pairs, including the target column name/value pair, which is used in evaluating the variance reduction.
Returns boolean: always true.
myRT.evaluate( input );
Computes the variance reduction observed in the validation data passed to evaluate().
Returns object: containing the varianceReduction in percentage and data size.
myRT.metrics();
// > object containing varianceReduction and data size.
Exports the JSON of the rule tree generated by learn(), which may be saved in a file for later predictions.
Returns json: JSON of the rule tree.
var rules = myRT.exportJSON();
Imports the rule tree from the input rulesTree for subsequent use by predict(). Note that after a successful import, the instance can be used ONLY for prediction and not for further ingestion and/or learning.
rulesTree (json): containing an earlier exported rule tree in JSON format.
Returns boolean: always true.
Throws an error if rulesTree is null.
Throws an error if rulesTree cannot be parsed as valid JSON.
Throws an error if rulesTree is of incorrect version or incorrect format.
var anRT = regressionTree();
// Assuming that json has a valid rule tree.
anRT.importJSON( rules );
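Saving the exported rules to a file and reading them back before an import can be done with Node's fs module. In the sketch below a placeholder string stands in for the output of exportJSON(), on the assumption that it is a JSON string; the actual structure of the rule tree is not shown here, and the file name is hypothetical.

```javascript
var fs = require( 'fs' );
var os = require( 'os' );
var path = require( 'path' );

// Placeholder for `myRT.exportJSON()`; the real content will differ.
var exportedRules = JSON.stringify( { placeholder: true } );

// Hypothetical file location in the system's temporary directory.
var rulesFile = path.join( os.tmpdir(), 'mpg-rules.json' );

// Persist the rules, then read them back as a string.
fs.writeFileSync( rulesFile, exportedRules, 'utf8' );
var restoredRules = fs.readFileSync( rulesFile, 'utf8' );

// `restoredRules` is what would be handed to `importJSON()`.
console.log( restoredRules === exportedRules );
// > true
```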
It completely resets the tree by reinitializing all the learning related variables, except its configuration. It is useful during cross-fold validation.
Returns undefined: nothing!
myRT.reset();
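Since reset() is aimed at cross-validation, the data has to be partitioned into folds first. A minimal, library-independent sketch of a k-fold partition follows; the helper `kFolds` is hypothetical and assigns rows to folds by index, without shuffling.

```javascript
// Hypothetical helper: splits `rows` into `k` train/test pairs, where
// row `idx` belongs to the test set of fold `idx % k`.
var kFolds = function ( rows, k ) {
  var folds = [];
  for ( var i = 0; i < k; i += 1 ) {
    folds.push( {
      train: rows.filter( function ( r, idx ) { return idx % k !== i; } ),
      test: rows.filter( function ( r, idx ) { return idx % k === i; } )
    } );
  }
  return folds;
};

// Ten stand-in rows, numbered for readability.
var rows = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ];
var folds = kFolds( rows, 3 );

console.log( folds[ 0 ].test );
// > [ 0, 3, 6, 9 ]
console.log( folds[ 0 ].train.length );
// > 6
```

For each fold, the tree would be reset, the training rows ingested and learned from, and the test rows held back for evaluation.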