Meanwhile, we've also been looking at different metrics to use besides Euclidean distance.
We're currently considering chi-squared and Mahalanobis in addition to Euclidean.
We're leaning towards the Mahalanobis distance because it takes the covariance among the variables into account when calculating distances. This addresses the scale and correlation problems that affect Euclidean distance. With Euclidean distance, the set of points equidistant from a given location is a sphere. The Mahalanobis distance stretches this sphere into an ellipsoid, correcting for the respective scales of the different variables and accounting for correlation among them.
[Image from http://matlabdatamining.blogspot.com/2006/11/mahalanobis-distance.html]
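To make the scale and correlation point concrete, here's a minimal sketch for the two-variable case. It isn't our actual code: the function names and the covariance values are made up for illustration, and in practice the covariance matrix would be estimated from the data.

```python
import math


def euclidean(p, q):
    """Ordinary straight-line distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance between 2-D points p and q.

    cov is a 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    dx, dy = p[0] - q[0], p[1] - q[1]
    # diff^T * inverse(cov) * diff, with the 2x2 inverse written out.
    squared = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(squared)


origin = (0.0, 0.0)

# Hypothetical covariance: the second variable has 100x the variance of the
# first, and the two are uncorrelated.
scaled_cov = [[1.0, 0.0], [0.0, 100.0]]
print(euclidean(origin, (2.0, 0.0)), euclidean(origin, (0.0, 2.0)))
print(mahalanobis_2d(origin, (2.0, 0.0), scaled_cov),
      mahalanobis_2d(origin, (0.0, 2.0), scaled_cov))

# Hypothetical covariance: equal variances, strong positive correlation.
correlated_cov = [[1.0, 0.9], [0.9, 1.0]]
print(mahalanobis_2d(origin, (1.0, 1.0), correlated_cov),
      mahalanobis_2d(origin, (1.0, -1.0), correlated_cov))
```

In the first case both points are the same Euclidean distance from the origin, but the Mahalanobis distance treats a step of 2 along the high-variance axis as much smaller. In the second case the point that goes against the correlation, (1, -1), comes out much farther away than (1, 1), even though Euclidean distance can't tell them apart. With an identity covariance matrix the two distances coincide.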
We'll be testing other distance metrics once we've fixed this bug.
Update: we're currently implementing k-NN to do the line comparisons; hopefully this will solve the problem.