Asian Journal of Computer Science and Technology (AJCST)

Editor: Dr. K. Ganesh
Print ISSN: 2249-0701
Frequency: Quarterly

A Survey on Big Data Analytics Using HADOOP

Authors: S. Mamatha and T. Sudha
Volume 8, No. 3, Special Issue: June 2019, pp. 35-40

Abstract

In today's digital world, organizations increasingly treat data as a core asset, and both the volume of data and the size of databases have been growing exponentially. Data is generated from many sources, such as business processes, transactions, social networking sites, and web servers, and exists in structured as well as unstructured form. The term "big data" is used for data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data sizes range from a few dozen terabytes to many petabytes in a single data set. The difficulties include capture, storage, search, sharing, analytics, and visualization. Big data is available in structured, unstructured, and semi-structured formats, and relational databases fail to store this multi-structured data. Apache Hadoop is an efficient, robust, reliable, and scalable framework to store, process, transform, and extract big data. The Hadoop framework is open-source, free software available from the Apache Software Foundation. In this paper we present Hadoop, HDFS, MapReduce, and the c-means big data algorithm to minimize the effort of big data analysis using MapReduce code. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and to highlight what might be needed to enhance the outcomes of clinical big data analytics tools and related fields.

Keywords

Big Data, Mining, Heterogeneity, HDFS, Map Reduce, HADOOP, Cluster, Name node, Data Node
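
As an illustration of the MapReduce model named in the abstract, the following is a minimal in-memory sketch in Python. It is not the authors' code and does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample records are hypothetical, chosen only to show the map, shuffle, and reduce steps on a word-count task.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially on an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its records from HDFS.
    sample = ["big data needs big storage", "hadoop stores big data"]
    print(run_job(sample))
```

In an actual Hadoop deployment the map and reduce functions would run in parallel on the cluster's data nodes, and the grouping step would be handled by the framework rather than by a local dictionary.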
