TY - GEN

T1 - A new method of calculating squared Euclidean distance (SED) using pTree technology and its performance analysis

AU - Hossain, Mohammad K.

AU - Abufardeh, Sameer

PY - 2019/3/13

Y1 - 2019/3/13

N2 - One of the advantages of Euclidean distance is that it measures the regular distance between two points in space. For this reason, it is widely used in applications where the distance between data points must be calculated to measure similarity. However, this method is costly because it involves expensive square and square root operations. One useful observation is that in many data mining applications absolute distance measures are not necessary, as long as the distances are used only to compare the closeness of various data points. For example, in classification and clustering, we often measure the distances of multiple data points from known classes or from centroids to assign those points to a class or a cluster. In this regard, an alternative approach known as Squared Euclidean Distance (SED) can be used, which avoids the square root computation and yields the squared distance between the data points. SED has been used in classification, clustering, image processing, and other areas to reduce computational expense. In this paper, we show how SED can be calculated for vertical data represented in pTrees. We also analyze its performance and compare it with the traditional horizontal data representation.

AB - One of the advantages of Euclidean distance is that it measures the regular distance between two points in space. For this reason, it is widely used in applications where the distance between data points must be calculated to measure similarity. However, this method is costly because it involves expensive square and square root operations. One useful observation is that in many data mining applications absolute distance measures are not necessary, as long as the distances are used only to compare the closeness of various data points. For example, in classification and clustering, we often measure the distances of multiple data points from known classes or from centroids to assign those points to a class or a cluster. In this regard, an alternative approach known as Squared Euclidean Distance (SED) can be used, which avoids the square root computation and yields the squared distance between the data points. SED has been used in classification, clustering, image processing, and other areas to reduce computational expense. In this paper, we show how SED can be calculated for vertical data represented in pTrees. We also analyze its performance and compare it with the traditional horizontal data representation.

UR - http://www.scopus.com/inward/record.url?scp=85078019592&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85078019592&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85078019592

T3 - Proceedings of 34th International Conference on Computers and Their Applications, CATA 2019

SP - 45

EP - 54

BT - Proceedings of 34th International Conference on Computers and Their Applications, CATA 2019

A2 - Lee, Gordon

A2 - Jin, Ying

PB - The International Society for Computers and Their Applications (ISCA)

T2 - 34th International Conference on Computers and Their Applications, CATA 2019

Y2 - 18 March 2019 through 20 March 2019

ER -