Performance Analysis of Dimensionality Reduction Techniques in the Context of Clustering
Authors: T. Sudha and P. Nagendra Kumar
Volume 8, No. 3, Special Issue: June 2019, pp. 66-71
Abstract
Data mining is a major area of research, and clustering is one of its main functionalities. High dimensionality is one of the main obstacles to clustering, and dimensionality reduction can be used to address it. This work compares two dimensionality reduction techniques, t-distributed stochastic neighbour embedding (t-SNE) and probabilistic principal component analysis (PPCA), in the context of clustering. High-dimensional data were reduced to low-dimensional data using each technique. Cluster analysis was then performed on the high-dimensional data and on the two low-dimensional data sets, with a varying number of clusters. Mean squared error, time, and space were used as the parameters for comparison. The results show that converting the high-dimensional data into low-dimensional data takes longer with PPCA than with t-SNE. The data set reduced through PPCA requires less storage space than the data set reduced through t-SNE.
Keywords
Clustering, Dimensionality Reduction, t-distributed Stochastic Neighbour Embedding, Probabilistic Principal Component Analysis
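The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the synthetic data, the closed-form maximum-likelihood PPCA solution, the plain k-means clusterer, the definition of mean squared error as the mean squared distance from each point to its nearest centroid, and all function names (`make_data`, `ppca_reduce`, `kmeans_mse`, `compare`). t-SNE is left out to keep the sketch dependent on NumPy only; a library implementation of t-SNE could be timed and measured in the same way.

```python
import time

import numpy as np


def make_data(n=300, d=50, seed=0):
    """Synthetic stand-in data: three Gaussian blobs in d dimensions (assumption)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(3, d))
    return np.vstack([c + rng.normal(size=(n // 3, d)) for c in centers])


def ppca_reduce(X, q):
    """Closed-form maximum-likelihood PPCA; returns posterior-mean latent coordinates.

    Requires q < number of features so that some eigenvalues are discarded.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                      # noise variance estimate
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    M = W.T @ W + sigma2 * np.eye(q)
    return np.linalg.solve(M, W.T @ Xc.T).T          # E[z | x] for every row of X


def kmeans_mse(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns labels and mean squared distance to centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):                         # keep old centre if cluster is empty
                centers[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), float(d2.min(axis=1).mean())


def compare(X, q=2, ks=(2, 3, 4)):
    """Record reduction time, storage size, and clustering MSE for several k."""
    start = time.perf_counter()
    Z = ppca_reduce(X, q)
    elapsed = time.perf_counter() - start
    report = {
        "reduce_seconds": elapsed,
        "bytes_high": X.nbytes,
        "bytes_low": Z.nbytes,
        "mse_high": {},
        "mse_low": {},
    }
    for k in ks:
        report["mse_high"][k] = kmeans_mse(X, k)[1]
        report["mse_low"][k] = kmeans_mse(Z, k)[1]
    return report


if __name__ == "__main__":
    for key, value in compare(make_data()).items():
        print(key, value)
```

Note that the mean squared error values for the high-dimensional and reduced data are computed in different spaces, so they are not directly comparable in magnitude; the sketch only shows how the three parameters could be collected side by side.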