TY - GEN

T1 - Algorithms to calculate the Manhattan (L1) distance for vertical data represented in pTrees

AU - Hossain, Mohammad K.

AU - Roy, Arjun G.

AU - Chatterjee, Arijit

AU - Perrizo, William

PY - 2012

Y1 - 2012

N2 - In data mining applications, different distance metrics are used to measure the closeness of two data points. Among these metrics, the Manhattan (L1), Euclidean (L2), and Max (L∞) distances are used very frequently in various algorithms. In the pTree vertical data representation, the Max distance can be implemented efficiently using only bitwise operations across the pTrees, without any horizontal access to the data points. However, many clustering and classification algorithms require computing L1 and L2 distances in order to increase their accuracy. In this paper we show how the Manhattan (L1) distance can be calculated for vertical data represented in pTrees. As with the Max distance, this algorithm uses only bitwise operations across the pTrees, without performing any horizontal scan of the data points. As a result, the algorithm works very fast on huge volumes of data represented by pTrees compared with the traditional horizontal data representation. These algorithms also enable data mining algorithms that use pTrees to improve their accuracy without sacrificing any significant speed.

AB - In data mining applications, different distance metrics are used to measure the closeness of two data points. Among these metrics, the Manhattan (L1), Euclidean (L2), and Max (L∞) distances are used very frequently in various algorithms. In the pTree vertical data representation, the Max distance can be implemented efficiently using only bitwise operations across the pTrees, without any horizontal access to the data points. However, many clustering and classification algorithms require computing L1 and L2 distances in order to increase their accuracy. In this paper we show how the Manhattan (L1) distance can be calculated for vertical data represented in pTrees. As with the Max distance, this algorithm uses only bitwise operations across the pTrees, without performing any horizontal scan of the data points. As a result, the algorithm works very fast on huge volumes of data represented by pTrees compared with the traditional horizontal data representation. These algorithms also enable data mining algorithms that use pTrees to improve their accuracy without sacrificing any significant speed.

UR - http://www.scopus.com/inward/record.url?scp=84872004517&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84872004517&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84872004517

SN - 9781880843840

T3 - Proceedings of the ISCA 27th International Conference on Computers and Their Applications, CATA 2012

SP - 65

EP - 70

BT - Proceedings of the ISCA 27th International Conference on Computers and Their Applications, CATA 2012

T2 - 27th International Conference on Computers and Their Applications, CATA 2012

Y2 - 12 March 2012 through 14 March 2012

ER -