Distance/Similarity functions for Bag of Words, Strings, Vectors and more.

Compute distances or similarities needed for NLP, de-duplication and clustering using **wink-distance**. It is a part of wink, a growing family of high-quality packages for Statistical Analysis, Natural Language Processing and Machine Learning in NodeJS.

Use npm to install:

```
npm install wink-distance --save
```

Check out the distance/similarity API documentation to learn more.

If you spot a bug and the same has not yet been reported, raise a new issue or consider fixing it and sending a pull request.

**wink-distance** is copyright 2017-18 GRAYPE Systems Private Limited.

It is licensed under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3 of the License.

Computes the cosine distance between the input bags of words (bows) `a` and `b` and returns a value between 0 and 1.

bow.cosine

Parameters

- `a` `(object)`: the first bow, i.e. word (key) and frequency (value) pairs.
- `b` `(object)`: the second bow.

Returns `number`: cosine distance between `a` and `b`.

Example

```
var cosine = require( 'wink-distance' ).bow.cosine;
// bow for "the dog chased the cat"
var a = { the: 2, dog: 1, chased: 1, cat: 1 };
// bow for "the cat chased the mouse"
var b = { the: 2, cat: 1, chased: 1, mouse: 1 };
cosine( a, b );
// -> 0.14285714285714302
// Note the bow could have been created directly by
// using "tokens.bow()" from the "wink-nlp-utils".
```
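Cosine distance is conventionally defined as `1 - dot(a, b) / (|a| * |b|)`, where the dot product runs over the words the two bows share. The following is a minimal from-scratch sketch under that assumption, not the library's own code; `cosineDistance` is a hypothetical name, and returning 1 for an empty bow is an assumption made here to avoid dividing by zero.

```javascript
// Hypothetical sketch (not the wink-distance source): cosine distance between
// two bags of words, assuming distance = 1 - dot(a, b) / (|a| * |b|).
function cosineDistance(a, b) {
  let dot = 0;
  let sumSqA = 0;
  let sumSqB = 0;
  for (const word of Object.keys(a)) {
    sumSqA += a[word] * a[word];
    // Only words present in both bows contribute to the dot product.
    if (Object.prototype.hasOwnProperty.call(b, word)) dot += a[word] * b[word];
  }
  for (const word of Object.keys(b)) sumSqB += b[word] * b[word];
  // Assumption: an empty (all-zero) bow is treated as maximally distant.
  if (sumSqA === 0 || sumSqB === 0) return 1;
  return 1 - dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
}

// Same bows as in the example above: dot = 6, |a|^2 = |b|^2 = 7.
const a = { the: 2, dog: 1, chased: 1, cat: 1 };
const b = { the: 2, cat: 1, chased: 1, mouse: 1 };
console.log(cosineDistance(a, b)); // approximately 1 - 6/7 = 0.142857...
```

With non-negative frequencies the dot product is never negative, so the result lies between 0 and 1 (up to floating-point rounding).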

Computes the hamming distance between two numbers; each number is treated as the decimal representation of a binary number.

number.hamming

Parameters

- `na` `(number)`: the first number.
- `nb` `(number)`: the second number.

Returns `number`: hamming distance between `na` and `nb`.

Example

```
var hamming = require( 'wink-distance' ).number.hamming;
hamming( 8, 8 );
// -> 0
hamming( 8, 15 );
// -> 3
hamming( 9, 15 );
// -> 2
```
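The hamming distance between two numbers is the count of bit positions at which their binary representations differ, i.e. the number of 1-bits in their XOR. A minimal sketch of that idea follows; it is not the library's code, `hammingNumber` is a hypothetical name, and it assumes non-negative integers below 2^31 because JavaScript bitwise operators work on 32-bit integers.

```javascript
// Hypothetical sketch: hamming distance between two non-negative integers,
// computed by counting the set bits of their XOR.
// Assumes inputs fit in 32 bits, since JS bitwise operators truncate to int32.
function hammingNumber(na, nb) {
  let diff = na ^ nb; // each 1-bit marks a position where na and nb differ
  let count = 0;
  while (diff !== 0) {
    diff &= diff - 1; // clear the lowest set bit
    count += 1;
  }
  return count;
}

console.log(hammingNumber(8, 15)); // 3  (1000 vs 1111)
console.log(hammingNumber(9, 15)); // 2  (1001 vs 1111)
```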

Computes the Jaccard distance between input sets `sa` and `sb`. This distance is always between 0 and 1.

set.jaccard

Parameters

- `sa` `(set)`: the first set.
- `sb` `(set)`: the second set.

Returns `number`: the Jaccard distance between `sa` and `sb`.

Example

```
var jaccard = require( 'wink-distance' ).set.jaccard;
// Set for :-)
var sa = new Set( ':-)' );
// Set for :-(
var sb = new Set( ':-(' );
jaccard( sa, sb );
// -> 0.5
```
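The Jaccard distance is one minus the size of the intersection divided by the size of the union. The sketch below spells that out using JavaScript's built-in `Set`; `jaccardDistance` is a hypothetical name, it is not the library's implementation, and treating two empty sets as identical is an assumption made here.

```javascript
// Hypothetical sketch: Jaccard distance = 1 - |intersection| / |union|.
function jaccardDistance(sa, sb) {
  let intersection = 0;
  for (const item of sa) if (sb.has(item)) intersection += 1;
  // |union| = |sa| + |sb| - |intersection|
  const union = sa.size + sb.size - intersection;
  // Assumption: two empty sets are treated as identical (distance 0).
  if (union === 0) return 0;
  return 1 - intersection / union;
}

// ':-)' and ':-(' share ':' and '-' out of 4 distinct characters.
console.log(jaccardDistance(new Set(':-)'), new Set(':-('))); // 0.5
```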

Computes the tversky distance between input sets `sa` and `sb`. This distance is always between 0 and 1. Tversky calls `sa` the **prototype** and `sb` the **variant**. The `alpha` parameter corresponds to the weight of the prototype, whereas `beta` corresponds to the weight of the variant.

set.tversky

Parameters

- `sa` `(set)`: the first set, or the prototype.
- `sb` `(set)`: the second set, or the variant.
- `alpha` `(number = 0.5)`: the prototype weight.
- `beta` `(number = 0.5)`: the variant weight.

Returns `number`: the tversky distance between `sa` and `sb`.

Example

```
var tversky = require( 'wink-distance' ).set.tversky;
// Set for :-)
var sa = new Set( ':-)' );
// Set for :p
var sb = new Set( ':p' );
tversky( sa, sb, 1, 0 );
// -> 0.6666666666666667
tversky( sa, sb );
// -> 0.6
tversky( sa, sb, 0.5, 0.5 );
// -> 0.6
tversky( sa, sb, 0, 1 );
// -> 0.5
```
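A common formulation of the Tversky index is `S = c / (c + alpha * |sa - sb| + beta * |sb - sa|)`, where `c` is the number of items the sets share, with distance `1 - S`. With `alpha = beta = 1` it reduces to the Jaccard index, and with `alpha = beta = 0.5` to the Dice coefficient. The sketch below assumes this formulation and non-negative weights; `tverskyDistance` is a hypothetical name, not the library's implementation, though under these assumptions it reproduces the four example values above.

```javascript
// Hypothetical sketch (not the wink-distance source) of tversky distance,
// assuming distance = 1 - c / (c + alpha * |sa - sb| + beta * |sb - sa|),
// where c is the number of items common to both sets.
function tverskyDistance(sa, sb, alpha = 0.5, beta = 0.5) {
  let common = 0;
  for (const item of sa) if (sb.has(item)) common += 1;
  const onlyInPrototype = sa.size - common; // |sa - sb|
  const onlyInVariant = sb.size - common;   // |sb - sa|
  const denominator = common + alpha * onlyInPrototype + beta * onlyInVariant;
  // Assumption (alpha, beta >= 0): a zero denominator means either both sets
  // are empty (treated as identical) or nothing is shared (maximally distant).
  if (denominator === 0) return sa.size === 0 && sb.size === 0 ? 0 : 1;
  return 1 - common / denominator;
}

const protoSet = new Set(':-)');
const variantSet = new Set(':p');
console.log(tverskyDistance(protoSet, variantSet, 1, 0)); // approximately 0.6667 (1 - 1/3)
```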

Computes the hamming distance between two strings of identical length; an error is thrown if their lengths differ. This distance is always `>= 0`.

string.hamming

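Parameters

- `str1` `(string)`: the first string.
- `str2` `(string)`: the second string, of the same length as `str1`.

Returns `number`: hamming distance between `str1` and `str2`.

Example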

```
var hamming = require( 'wink-distance' ).string.hamming;
hamming( 'john', 'john' );
// -> 0
hamming( 'sam', 'sat' );
// -> 1
hamming( 'summer', 'samuel' );
// -> 3
hamming( 'saturn', 'urn' );
// -> throws error
```
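For strings, the hamming distance is the number of positions at which the characters differ. A minimal sketch follows; `hammingString` is a hypothetical name, it is not the library's code, and the error message is invented for illustration.

```javascript
// Hypothetical sketch: hamming distance between two equal-length strings.
function hammingString(str1, str2) {
  if (str1.length !== str2.length) {
    // Mirrors the documented behaviour of rejecting strings of unequal length.
    throw new Error('hamming: strings must be of identical length');
  }
  let distance = 0;
  for (let i = 0; i < str1.length; i += 1) {
    if (str1[i] !== str2[i]) distance += 1; // count mismatching positions
  }
  return distance;
}

console.log(hammingString('summer', 'samuel')); // 3
```

Indexing compares UTF-16 code units, so a character outside the Basic Multilingual Plane (for example, most emoji) occupies two positions.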

Computes the normalized hamming distance between two strings. These strings may be of different lengths. Normalized distance is always between 0 and 1.

string.hammingNormalized

Parameters

- `str1` `(string)`: the first string.
- `str2` `(string)`: the second string.

Returns `number`: normalized hamming distance between `str1` and `str2`.

Example

```
var hammingNormalized = require( 'wink-distance' ).string.hammingNormalized;
hammingNormalized( 'john', 'johny' );
// -> 0.2
hammingNormalized( 'sam', 'sam' );
// -> 0
hammingNormalized( 'sam', 'samuel' );
// -> 0.5
hammingNormalized( 'saturn', 'urn' );
// -> 1
```
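The examples above are consistent with one simple definition: count the mismatching positions over the length of the shorter string, add the difference in lengths, and divide by the length of the longer string. The sketch below assumes that definition (the library may compute it differently); `hammingNormalizedSketch` is a hypothetical name, and treating two empty strings as identical is an assumption.

```javascript
// Hypothetical sketch: normalized hamming distance for strings of any length,
// assuming (mismatches over the shorter length + length difference) / longer length.
function hammingNormalizedSketch(str1, str2) {
  const longer = Math.max(str1.length, str2.length);
  if (longer === 0) return 0; // assumption: two empty strings are identical
  const shorter = Math.min(str1.length, str2.length);
  // Every position beyond the end of the shorter string counts as a mismatch.
  let distance = longer - shorter;
  for (let i = 0; i < shorter; i += 1) {
    if (str1[i] !== str2[i]) distance += 1;
  }
  return distance / longer;
}

console.log(hammingNormalizedSketch('sam', 'samuel')); // 0.5
```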

Computes the jaro distance between two strings. This distance is always between 0 and 1.

string.jaro

Parameters

- `str1` `(string)`: the first string.
- `str2` `(string)`: the second string.

Returns `number`: jaro distance between `str1` and `str2`.

Example

```
var jaro = require( 'wink-distance' ).string.jaro;
jaro( 'father', 'farther' );
// -> 0.04761904761904756
jaro( 'abcdef', 'fedcba' );
// -> 0.6111111111111112
jaro( 'sat', 'urn' );
// -> 1
```
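To show what is being measured, here is a sketch of the standard Jaro definition: two equal characters match only if they are at most `floor(max(len1, len2) / 2) - 1` positions apart; `m` is the number of matches and `t` is half the number of matched characters that appear in a different order; the similarity is `(m / len1 + m / len2 + (m - t) / m) / 3` and the distance is one minus that. This is an assumed textbook formulation, not the library's source; `jaroDistance` is a hypothetical name. It reproduces the three example values above up to floating-point rounding.

```javascript
// Hypothetical sketch (not the wink-distance source) of the standard Jaro
// distance: 1 - (m/len1 + m/len2 + (m - t)/m) / 3.
function jaroDistance(str1, str2) {
  if (str1 === str2) return 0;
  const len1 = str1.length;
  const len2 = str2.length;
  if (len1 === 0 || len2 === 0) return 1;
  // Equal characters count as a match only within this many positions.
  const matchWindow = Math.max(0, Math.floor(Math.max(len1, len2) / 2) - 1);
  const matched1 = new Array(len1).fill(false);
  const matched2 = new Array(len2).fill(false);
  let m = 0;
  for (let i = 0; i < len1; i += 1) {
    const start = Math.max(0, i - matchWindow);
    const end = Math.min(len2 - 1, i + matchWindow);
    for (let j = start; j <= end; j += 1) {
      if (!matched2[j] && str1[i] === str2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        m += 1;
        break;
      }
    }
  }
  if (m === 0) return 1;
  // Walk both match sequences in order and count positions where they disagree;
  // t (the number of transpositions) is half that count.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < len1; i += 1) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k += 1;
    if (str1[i] !== str2[k]) outOfOrder += 1;
    k += 1;
  }
  const t = outOfOrder / 2;
  const similarity = (m / len1 + m / len2 + (m - t) / m) / 3;
  return 1 - similarity;
}

console.log(jaroDistance('father', 'farther')); // approximately 0.047619 (1/21)
```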

Computes the jaro winkler distance between two strings. This distance, scaled by the `scalingFactor`, is always between 0 and 1.

string.jaroWinkler

Parameters

- `str1` `(string)`: the first string.
- `str2` `(string)`: the second string.
- `boostThreshold` `(number = 0.3)`: the threshold for applying the scaling; it is applied only if the jaro distance between the input strings is less than or equal to this value. Any value > 1 is capped at 1 automatically.
- `scalingFactor` `(number = 0.1)`: the factor used to scale the distance. Such scaling, if applied, is proportional to the number of consecutive characters that `str1` and `str2` share from their first character, i.e. the length of their common prefix. Any value > 0.25 is capped at 0.25 automatically.

Returns `number`: jaro winkler distance between `str1` and `str2`.

Example

```
var jaroWinkler = require( 'wink-distance' ).string.jaroWinkler;
jaroWinkler( 'martha', 'marhta' );
// -> 0.03888888888888883
jaroWinkler( 'martha', 'marhta', 0.3, 0.2 );
// -> 0.022222222222222185
jaroWinkler( 'duane', 'dwayne' );
// -> 0.15999999999999992
```
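The Winkler step boosts the Jaro similarity by `prefix * scalingFactor * (1 - similarity)`, which in distance terms multiplies the jaro distance by `1 - prefix * scalingFactor`. The sketch below shows only that adjustment, taking an already computed jaro distance (for example from `string.jaro`) as input. `winklerAdjust` is a hypothetical name, and limiting the prefix to 4 characters is an assumption borrowed from the usual Winkler definition (it is consistent with the 0.25 cap, since 4 * 0.25 = 1). Fed the corresponding jaro distances, it reproduces the three example values above up to floating-point rounding.

```javascript
// Hypothetical sketch of the Winkler adjustment only: it takes an already
// computed jaro distance and scales it down for strings sharing a prefix.
function winklerAdjust(jaroDist, str1, str2, boostThreshold = 0.3, scalingFactor = 0.1) {
  const threshold = Math.min(boostThreshold, 1); // values > 1 are capped at 1
  const factor = Math.min(scalingFactor, 0.25);  // values > 0.25 are capped at 0.25
  if (jaroDist > threshold) return jaroDist;     // no boost beyond the threshold
  // Length of the common prefix; assumption: limited to 4 characters.
  const maxPrefix = Math.min(4, str1.length, str2.length);
  let prefix = 0;
  while (prefix < maxPrefix && str1[prefix] === str2[prefix]) prefix += 1;
  // similarity' = similarity + prefix * factor * (1 - similarity)
  // is equivalent to distance' = distance * (1 - prefix * factor).
  return jaroDist * (1 - prefix * factor);
}

// The jaro distance of 'martha' and 'marhta' is 1/18; they share the prefix 'mar'.
console.log(winklerAdjust(1 / 18, 'martha', 'marhta')); // approximately 0.038889
```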

Computes the levenshtein distance between two strings. This distance is the minimum number of deletions, insertions, or substitutions required to transform one string into the other. Levenshtein distance is always an integer with a value of 0 or more.

string.levenshtein

Parameters

- `str1` `(string)`: the first string.
- `str2` `(string)`: the second string.

Returns `number`: levenshtein distance between `str1` and `str2`.

Example

```
var levenshtein = require( 'wink-distance' ).string.levenshtein;
levenshtein( 'example', 'sample' );
// -> 2
levenshtein( 'distance', 'difference' );
// -> 5
```
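The minimum number of edits can be found with the classic dynamic-programming (Wagner-Fischer) recurrence, keeping only two rows of the table in memory. This is a sketch of that textbook technique, not necessarily how the library computes it; `levenshteinDistance` is a hypothetical name.

```javascript
// Hypothetical sketch: Levenshtein distance via the Wagner-Fischer recurrence.
function levenshteinDistance(str1, str2) {
  // prev[j] = distance between the first (i - 1) chars of str1 and first j chars of str2.
  let prev = Array.from({ length: str2.length + 1 }, (_, j) => j);
  for (let i = 1; i <= str1.length; i += 1) {
    const curr = [i]; // transforming i chars into an empty prefix takes i deletions
    for (let j = 1; j <= str2.length; j += 1) {
      const cost = str1[i - 1] === str2[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution, or free match
      );
    }
    prev = curr;
  }
  return prev[str2.length];
}

console.log(levenshteinDistance('distance', 'difference')); // 5
```

Only two rows are kept, so memory grows with the length of `str2` rather than with the product of both lengths.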

Computes the soundex distance between two strings. This distance is either 0, indicating phonetic similarity, or 1, indicating no phonetic similarity.

string.soundex

Parameters

- `str1` `(string)`: the first string.
- `str2` `(string)`: the second string.

Returns `number`: soundex distance between `str1` and `str2`.

Example

```
var soundex = require( 'wink-distance' ).string.soundex;
soundex( 'Burroughs', 'Burrows' );
// -> 0
soundex( 'Ekzampul', 'example' );
// -> 0
soundex( 'sat', 'urn' );
// -> 1
```
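Soundex encodes a word as its first letter followed by three digits derived from the consonants that follow; two words are phonetically similar when their codes are equal. The sketch below follows the usual American Soundex rules (vowels and `y` separate equal digits, `h` and `w` do not); `soundexCode` and `soundexDistance` are hypothetical names, and the library may treat edge cases such as empty input or non-letters differently.

```javascript
// Hypothetical sketch (American Soundex rules), not the wink-distance source.
const SOUNDEX_CODES = {
  b: '1', f: '1', p: '1', v: '1',
  c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
  d: '3', t: '3', l: '4', m: '5', n: '5', r: '6'
};

function soundexCode(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';
  let code = letters[0].toUpperCase();
  let lastDigit = SOUNDEX_CODES[letters[0]] || '';
  for (let i = 1; i < letters.length && code.length < 4; i += 1) {
    const ch = letters[i];
    if (ch === 'h' || ch === 'w') continue;    // h and w do not separate equal digits
    const digit = SOUNDEX_CODES[ch] || '';     // vowels and y map to no digit
    if (digit !== '' && digit !== lastDigit) code += digit;
    lastDigit = digit;                         // a vowel resets lastDigit to ''
  }
  return code.padEnd(4, '0');                  // pad to a letter plus three digits
}

// 0 when both strings share a soundex code, 1 otherwise.
function soundexDistance(str1, str2) {
  return soundexCode(str1) === soundexCode(str2) ? 0 : 1;
}

console.log(soundexCode('Burroughs'), soundexCode('Burrows')); // B620 B620
```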

Computes the chebyshev (or chessboard) distance between two vectors of identical length.

vector.chebyshev

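Parameters

- `va` `(number[])`: the first vector.
- `vb` `(number[])`: the second vector, of the same length as `va`.

Returns `number`: chebyshev distance between `va` and `vb`.

Example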

```
var chebyshev = require( 'wink-distance' ).vector.chebyshev;
chebyshev( [ 0, 0 ], [ 6, 6 ] );
// -> 6
```
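The chebyshev distance is the largest absolute difference between corresponding components. A minimal sketch follows; `chebyshevDistance` is a hypothetical name, and throwing on unequal lengths is a choice made here rather than documented library behaviour.

```javascript
// Hypothetical sketch: chebyshev distance = max over i of |va[i] - vb[i]|.
function chebyshevDistance(va, vb) {
  if (va.length !== vb.length) {
    throw new Error('chebyshev: vectors must be of identical length');
  }
  let max = 0;
  for (let i = 0; i < va.length; i += 1) {
    max = Math.max(max, Math.abs(va[i] - vb[i])); // keep the largest gap
  }
  return max;
}

console.log(chebyshevDistance([1, 5], [4, 1])); // 4  (gaps are 3 and 4)
```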

Computes the taxicab or manhattan distance between two vectors of identical length.

vector.taxicab

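Parameters

- `va` `(number[])`: the first vector.
- `vb` `(number[])`: the second vector, of the same length as `va`.

Returns `number`: taxicab distance between `va` and `vb`.

Example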

```
var taxicab = require( 'wink-distance' ).vector.taxicab;
taxicab( [ 0, 0 ], [ 6, 6 ] );
// -> 12
```
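The taxicab distance is the sum of the absolute differences between corresponding components. A minimal sketch follows; `taxicabDistance` is a hypothetical name, and throwing on unequal lengths is a choice made here rather than documented library behaviour.

```javascript
// Hypothetical sketch: taxicab distance = sum over i of |va[i] - vb[i]|.
function taxicabDistance(va, vb) {
  if (va.length !== vb.length) {
    throw new Error('taxicab: vectors must be of identical length');
  }
  let sum = 0;
  for (let i = 0; i < va.length; i += 1) {
    sum += Math.abs(va[i] - vb[i]); // add each per-axis gap
  }
  return sum;
}

console.log(taxicabDistance([1, 5], [4, 1])); // 7  (3 + 4)
```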