Performance Modeling and Analysis of a Hadoop Cluster for Efficient Big Data Processing

Lim, JongBeom; Ahnh, Jong-Suk; Lee, Kang-Woo

Cited 1 time in webofscience

Cited 1 time in scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lim, JongBeom	-
dc.contributor.author	Ahnh, Jong-Suk	-
dc.contributor.author	Lee, Kang-Woo	-
dc.date.accessioned	2024-08-08T04:31:32Z	-
dc.date.available	2024-08-08T04:31:32Z	-
dc.date.issued	2016-09	-
dc.identifier.issn	1936-6612	-
dc.identifier.issn	1936-7317	-
dc.identifier.uri	https://scholarworks.dongguk.edu/handle/sw.dongguk/18025	-
dc.description.abstract	Although Apache Hadoop, an open-source Java implementation of the MapReduce programming model, has become a popular big data framework, it is important to understand the challenges of using Hadoop for varying input data sizes and how efficient a Hadoop cluster is with a given configuration. In this regard, there is a need to understand the impact of Hadoop's implementation of the data-parallel programming model on the performance of big data processing. In this paper, we design a performance model of a Hadoop cluster that considers the number of Map and Reduce tasks. Because each Hadoop cluster has its own characteristics and system parameters, the default Hadoop configuration settings are not sufficient. Furthermore, we present a performance analysis based on real-world environments using cloud computing. Through various performance evaluations, we identified a performance tradeoff between the number of Map and Reduce tasks and the processing time of a job. Based on our observations of big data jobs with varying input data sizes, we formulated a performance model for a Hadoop cluster from both a microscopic and a macroscopic view. Our performance model for Hadoop clusters helps estimate the processing rate and the average processing time for given input dataset sizes, and helps choose suitable configurations, which strongly influence the overall performance of Hadoop clusters.	-
dc.format.extent	6	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	AMER SCIENTIFIC PUBLISHERS	-
dc.title	Performance Modeling and Analysis of a Hadoop Cluster for Efficient Big Data Processing	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1166/asl.2016.7813	-
dc.identifier.scopusid	2-s2.0-85007443979	-
dc.identifier.wosid	000399357500056	-
dc.identifier.bibliographicCitation	ADVANCED SCIENCE LETTERS, v.22, no.9, pp 2314 - 2319	-
dc.citation.title	ADVANCED SCIENCE LETTERS	-
dc.citation.volume	22	-
dc.citation.number	9	-
dc.citation.startPage	2314	-
dc.citation.endPage	2319	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Science & Technology - Other Topics	-
dc.relation.journalWebOfScienceCategory	Multidisciplinary Sciences	-
dc.subject.keywordAuthor	MapReduce	-
dc.subject.keywordAuthor	Hadoop	-
dc.subject.keywordAuthor	Performance Model	-
dc.subject.keywordAuthor	Big Data	-

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Advanced Convergence Engineering > Department of Computer Science and Artificial Intelligence > 1. Journal Articles

Related Researcher

Lee, Kang Woo: College of Advanced Convergence Engineering (Department of Computer Science and Artificial Intelligence)
