Open Access Article

Parallel Computing Approaches for Dimensionality Reduction in the High-Dimensional Data

Siddheshwar V. Patil, Dinesh B. Kulkarni

Section: Review Paper, Product Type: Journal Paper
Volume-7, Issue-5, Page no. 1750-1755, May-2019

CrossRef-DOI:   https://doi.org/10.26438/ijcse/v7i5.17501755

Online published on May 31, 2019

Copyright © Siddheshwar V. Patil, Dinesh B. Kulkarni. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: Siddheshwar V. Patil, Dinesh B. Kulkarni, “Parallel Computing Approaches for Dimensionality Reduction in the High-Dimensional Data,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.5, pp.1750-1755, 2019.

MLA Style Citation: Siddheshwar V. Patil, Dinesh B. Kulkarni. "Parallel Computing Approaches for Dimensionality Reduction in the High-Dimensional Data." International Journal of Computer Sciences and Engineering 7.5 (2019): 1750-1755.

APA Style Citation: Siddheshwar V. Patil, Dinesh B. Kulkarni (2019). Parallel Computing Approaches for Dimensionality Reduction in the High-Dimensional Data. International Journal of Computer Sciences and Engineering, 7(5), 1750-1755.

BibTex Style Citation:
@article{Patil_2019,
  author    = {Siddheshwar V. Patil and Dinesh B. Kulkarni},
  title     = {Parallel Computing Approaches for Dimensionality Reduction in the High-Dimensional Data},
  journal   = {International Journal of Computer Sciences and Engineering},
  volume    = {7},
  number    = {5},
  month     = may,
  year      = {2019},
  issn      = {2347-2693},
  pages     = {1750-1755},
  url       = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4484},
  doi       = {10.26438/ijcse/v7i5.17501755},
  publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i5.17501755
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4484
TI  - Parallel Computing Approaches for Dimensionality Reduction in the High-Dimensional Data
T2  - International Journal of Computer Sciences and Engineering
AU  - Patil, Siddheshwar V.
AU  - Kulkarni, Dinesh B.
PY  - 2019
DA  - 2019/05/31
PB  - IJCSE, Indore, INDIA
SP  - 1750
EP  - 1755
IS  - 5
VL  - 7
SN  - 2347-2693
ER  -


Abstract

Machine learning and data mining techniques routinely deal with huge datasets. The number of dimensions (features or instances) in these datasets is often very large, which degrades classification performance (accuracy). High-dimensional data models generally involve enormous amounts of data to be modeled and visualized for knowledge extraction, which may require feature selection, classification, and prediction. Because of their high dimensionality, such datasets often contain many redundant and irrelevant features, which increase classification complexity and degrade the performance of learning algorithms. Recent research focuses on improving accuracy through dimensionality reduction techniques, which also reduce computing time, and this naturally leads researchers to parallel computing on high-performance computing (HPC) infrastructure. Parallel computing on multi-core and many-core architectures has proved important when searching for high-performance solutions, and the general-purpose graphics processing unit (GPGPU) has gained a prominent place in high-performance computing thanks to its low cost and massive data-processing power. Parallel processing techniques also achieve better speedup and scaleup. This paper reviews the popular dimensionality reduction methods: Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Random Projection (RP), Auto-Encoder (AE), Multidimensional Scaling (MDS), Non-negative Matrix Factorization (NMF), Locally Linear Embedding (LLE), Extreme Learning Machine (ELM), and Isometric Feature Mapping (Isomap). The objective of this paper is to present parallel computing approaches on multi-core and many-core architectures for solving dimensionality reduction problems in high-dimensional data.
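Two of the methods listed above, PCA and Random Projection, can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's implementation; the function names, interfaces, and synthetic data are assumptions made for the example:

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal
    components. Illustrative sketch, not the paper's implementation."""
    Xc = X - X.mean(axis=0)                  # center each feature
    # SVD of the centered data yields the principal directions in Vt
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # scores in the top-k subspace

def random_projection(X, k, seed=0):
    """Reduce X to k dimensions with a Gaussian random projection,
    scaled by 1/sqrt(k) to approximately preserve distances."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((100, 50))       # 100 samples, 50 features
    print(pca_reduce(X, 5).shape)            # (100, 5)
    print(random_projection(X, 5).shape)     # (100, 5)
```

Both reductions are dominated by dense matrix products, which is one reason they map well onto the multi-core and GPGPU architectures the paper surveys.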

Key-Words / Index Term

High-performance computing, Parallel computing, Dimensionality reduction, Classification, High-dimensionality data, Graphics processing unit
