Free download of the English paper "Performance evaluation of job schedulers on Hadoop YARN" with Persian translation
Persian title of the paper | ارزیابی عملکرد زمان بندهای کار بر روی Hadoop YARN |
English title of the paper | Performance evaluation of job schedulers on Hadoop YARN |
Related fields | Computer engineering, computer programming, and software engineering |
Format of the free papers | The English paper and its Persian translation are available for free download in PDF format; the translation can also be purchased and downloaded in Word format |
Translation quality | The translation quality of this paper is medium |
Notes | This paper has been translated in summarized form. |
Publisher | Wiley |
Journal | Concurrency and Computation: Practice and Experience |
Year of publication | 2016 |
Product code | F614 |
Paper contents: Abstract |
Excerpt from the Persian translation of the paper: 1- Introduction |
Excerpt from the English paper:

1. INTRODUCTION

Hadoop [1] is an open-source software framework supported by Apache to process large volumes of data on a cluster comprising a large number of commodity machines. Because of its simplicity, cost efficiency, scalability, and fault tolerance, a wide variety of organizations and companies, such as Google, Yahoo!, Facebook, and Amazon, have used Hadoop for both research and production [2]. However, the original Hadoop has several limitations [3]. One example is that the slot-based resource allocation for map tasks and reduce tasks bottlenecks the resources of an entire Hadoop cluster and results in low resource utilization [3]. Another example is that the original Hadoop supports only one type of programming model, i.e., MapReduce [4], which is not suitable for processing all kinds of large-scale computations [3, 5, 6].

To address these limitations, the open-source community introduced the next generation of Hadoop’s compute platform, called YARN (short for Yet Another Resource Negotiator) [3]. It is also known as MapReduce 2.0 and MRv2. YARN allows individual applications to utilize the resources of a cluster in a shared and multi-tenant manner. Unlike the original Hadoop (i.e., all versions before MRv2), YARN separates resource management functions from the programming model, and can therefore support not only MapReduce but also other programming models, including Spark [5], Storm [7], Tez [8], and REEF [9]. In other words, this separation enables various types of applications to execute on YARN in parallel.

To enable a shared compute environment, YARN provides two schedulers to allocate resources to applications. One is the capacity scheduler (the default scheduler on YARN) [10], and the other is the fair scheduler [11]. Both of them can organize application submissions into a queue hierarchy. However, the former guarantees a minimum amount of resources for each queue and uses FIFO (first-in first-out) to schedule applications within a leaf queue. The latter fairly shares resources among all queues and offers three policies, namely FIFO, Fair, and Dominant Resource Fairness (DRF for short) [12], to share resources among all running applications within a queue. Together, these scheduling approaches form the following four scheduling-policy combinations (SPCs for short) and give YARN managers great flexibility to achieve goals such as fair resource sharing and high resource utilization.

1. Cap-FIFO, which is the capacity scheduler with the FIFO scheduling policy.
2. Fair-FIFO, which is the fair scheduler with the FIFO scheduling policy.
3. Fair-Fair, which is the fair scheduler with the fair scheduling policy.
4. Fair-DRF, which is the fair scheduler with the DRF scheduling policy.

Although YARN supports the four SPCs and diverse application types, it is unclear how these SPCs perform when they are individually used to schedule mixed applications. Their performance is also unknown when different queue structures are used. Hence, in this paper, we survey the four SPCs and all programming models supported by YARN, and then classify all applications into several types. After that, we conduct extensive experiments to evaluate and compare the performance impacts of the four SPCs on diverse metrics by considering not only a workload consisting of mixed application types, but also the following three scenarios. The purpose is to study whether queue structures influence the performance of the four SPCs.

1. One-queue scenario: In this scenario, there is only one queue in our YARN cluster. Hence, all application submissions must wait in this queue before they are executed.
2. Separated-queue scenario: In this scenario, each type of application is put into its own separate queue.
3. Merged-queue scenario: In this scenario, there are two queues. One is for applications that will eventually stop by themselves. The other queue is for the rest of the applications.

The experimental results show that (1) all SPCs suffer from a resource fragmentation problem (explained later), because of which none of the SPCs could successfully complete a workload consisting of mixed applications; (2) none of the four SPCs always has the best application execution performance in all scenarios; and (3) among the three scenarios, the merged-queue scenario is the most appropriate for all SPCs, since they achieve a higher workload completion rate and a shorter workload turnaround time than in the other two scenarios.

The contributions of this paper are as follows: (1) this paper provides a comprehensive survey of the current schedulers, SPCs, programming models, and application types supported by YARN; (2) we extensively evaluate and compare the four SPCs by considering not only mixed application types, but also diverse queue-structure scenarios; and (3) based on our experimental results, YARN managers can choose an appropriate SPC and queue structure to achieve better application performance for their YARN clusters.

The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 surveys the origin of YARN. Section 4 introduces the two schedulers supported by YARN and the four SPCs derived from the two schedulers. Section 5 describes the programming models supported by YARN and the applications that each programming model can best express and process. In Section 6, extensive experiments are conducted and the experimental results are discussed. Section 7 concludes this paper and outlines our future work. |
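
As an illustration only, the four SPCs and the three queue-structure scenarios described in the introduction can be written down as a small Python sketch. It is not taken from the paper: the `App` record, the queue names (`default`, `terminating`, `long-running`), and the `queue_for` helper are hypothetical, and pairing every SPC with every scenario (4 x 3 = 12 configurations) is an assumption about how such an evaluation might be laid out.

```python
from dataclasses import dataclass
from itertools import product

# The four scheduler/policy combinations (SPCs) named in the introduction,
# mapped to (scheduler, intra-queue policy).
SPCS = {
    "Cap-FIFO": ("capacity", "FIFO"),
    "Fair-FIFO": ("fair", "FIFO"),
    "Fair-Fair": ("fair", "Fair"),
    "Fair-DRF": ("fair", "DRF"),
}

# The three queue-structure scenarios named in the introduction.
SCENARIOS = ("one-queue", "separated-queue", "merged-queue")


@dataclass(frozen=True)
class App:
    """Hypothetical application record used only for this sketch."""
    name: str
    app_type: str           # illustrative type label, e.g. "mapreduce"
    self_terminating: bool  # True if the application eventually stops by itself


def queue_for(app: App, scenario: str) -> str:
    """Return the (hypothetical) queue name an application is submitted to."""
    if scenario == "one-queue":
        # Every submission waits in the same single queue.
        return "default"
    if scenario == "separated-queue":
        # Each application type gets its own queue.
        return app.app_type
    if scenario == "merged-queue":
        # Two queues: self-terminating applications versus all the rest.
        return "terminating" if app.self_terminating else "long-running"
    raise ValueError(f"unknown scenario: {scenario}")


# Assumption: every SPC is paired with every scenario.
EXPERIMENT_GRID = list(product(SPCS, SCENARIOS))
```

For example, under these assumptions `queue_for(App("wordcount", "mapreduce", True), "merged-queue")` selects the queue for self-terminating applications, while an application flagged as not self-terminating is routed to the other queue.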