
Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis in An Optimized Manner

R.S. Nyaykhor, N.T. Deotale

Section: Research Paper, Product Type: Journal Paper
Volume 2, Issue 3, Page no. 31-35, Mar 2014

Online published on Mar 30, 2014

Copyright © R.S. Nyaykhor, N.T. Deotale. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: R.S. Nyaykhor, N.T. Deotale, “Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis in An Optimized Manner,” International Journal of Computer Sciences and Engineering, Vol.2, Issue.3, pp.31-35, 2014.

MLA Style Citation: R.S. Nyaykhor, N.T. Deotale "Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis in An Optimized Manner." International Journal of Computer Sciences and Engineering 2.3 (2014): 31-35.

APA Style Citation: R.S. Nyaykhor, N.T. Deotale, (2014). Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis in An Optimized Manner. International Journal of Computer Sciences and Engineering, 2(3), 31-35.

BibTex Style Citation:
@article{Nyaykhor_2014,
author = {R.S. Nyaykhor and N.T. Deotale},
title = {Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis in An Optimized Manner},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {3 2014},
volume = {2},
Issue = {3},
month = {3},
year = {2014},
issn = {2347-2693},
pages = {31-35},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=63},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=63
TI - Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis in An Optimized Manner
T2 - International Journal of Computer Sciences and Engineering
AU - R.S. Nyaykhor
AU - N.T. Deotale
PY - 2014
DA - 2014/03/30
PB - IJCSE, Indore, INDIA
SP - 31
EP - 35
IS - 3
VL - 2
SN - 2347-2693
ER -

Abstract

Data mining has broad utility in real-world applications. Data sets for mining are typically prepared from ordinary transactional databases. Preparing them manually, however, is tedious and time-consuming because it involves aggregations, subqueries and joins. Moreover, traditional SQL (Structured Query Language) aggregations such as MAX and MIN return a single value per group, in a vertical layout, which is not directly usable as a data set for mining. It is therefore essential to build horizontal aggregations that generate data sets in a horizontal layout, which can then be used for data mining in real-world applications. This paper focuses on building user-defined horizontal aggregations using PIVOT, SPJ (SELECT PROJECT JOIN) and CASE, whose underlying logic uses SQL queries.
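
The CASE strategy named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table `sales`, its columns `store_id`, `day` and `amount`, and the helper `horizontal_sum_case` are hypothetical names chosen for illustration, and SQLite (through Python's standard `sqlite3` module) stands in for whichever DBMS the paper used. The idea shown is that one `SUM(CASE ...)` expression is generated per distinct value of the transposing column, so each group becomes a single row with one column per value.

```python
import sqlite3


def horizontal_sum_case(conn, table, group_col, pivot_col, value_col):
    """Run a CASE-based horizontal SUM aggregation (illustrative helper).

    Identifiers are interpolated into the SQL text, so they must come from
    trusted code, not user input. Pivot values are bound as parameters, but
    they are also used in column aliases, so this sketch assumes they contain
    no double quotes.
    """
    # Each distinct value of the pivot column becomes one output column.
    pivot_values = [
        row[0]
        for row in conn.execute(
            f'SELECT DISTINCT "{pivot_col}" FROM "{table}" ORDER BY 1'
        )
    ]
    # One SUM(CASE ...) term per pivot value; rows that do not match
    # contribute NULL, which SUM ignores.
    terms = ", ".join(
        f'SUM(CASE WHEN "{pivot_col}" = ? THEN "{value_col}" ELSE NULL END) '
        f'AS "{value_col}_{v}"'
        for v in pivot_values
    )
    sql = (
        f'SELECT "{group_col}", {terms} FROM "{table}" '
        f'GROUP BY "{group_col}" ORDER BY "{group_col}"'
    )
    cur = conn.execute(sql, pivot_values)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


def build_demo_connection():
    """Create an in-memory database with a small hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store_id INTEGER, day TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "Mon", 10), (1, "Mon", 5), (1, "Tue", 7), (2, "Mon", 3), (2, "Wed", 4)],
    )
    return conn


if __name__ == "__main__":
    header, rows = horizontal_sum_case(
        build_demo_connection(), "sales", "store_id", "day", "amount"
    )
    print(header)
    for row in rows:
        print(row)
```

A plain `GROUP BY store_id, day` on the same table would return four rows in a vertical layout; the sketch instead returns one row per store, with a NULL wherever a store has no sales on a given day. Those NULLs are what a downstream mining step would need to handle or replace.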

Keywords / Index Terms

Data Mining, Horizontal Aggregations, PIVOT, CASE, SQL, Data Sets
