Free download of the English paper "Cost-effective Big Data Mining in the Cloud: A Case Study with K-means", with Persian translation
Persian title of the paper: | داده کاوی بزرگ مقرون به صرفه در زمینه ابر: یک مطالعه موردی با K-means |
English title of the paper: | Cost-effective Big Data Mining in the Cloud: A Case Study with K-means |
Related fields: | Computer engineering, industrial engineering, cloud computing, data mining, and algorithm and computation engineering |
Format of the free papers | The free English papers and Persian translations are in PDF format |
Translation quality | The translation quality of this paper is average |
Publication | OUW |
Product code | f401 |
Excerpt from the Persian translation: I. Introduction |
Excerpt from the English paper: I. INTRODUCTION The era of big data has arrived [1]. Ninety percent of the data in the world today were produced within the past two years, and 2.5 quintillion bytes of new data are created every day [2]. For instance, about 6 billion new photos are reported every month by Facebook and 72 hours of video are uploaded to YouTube every minute [2]. This explosive growth of data has fueled big data mining in a wide range of sectors, e.g., business [3], government [4], and healthcare [5]. Most data mining algorithms are exponential in computational complexity. In big data scenarios, it is not rare for the data mining process to take hours, even days, to complete. Thus, big data mining often requires tremendous computational resources. Many businesses and organizations, especially small and medium-sized ones, cannot afford the costs of in-house IT infrastructure for big data mining. Cloud computing is the perfect solution for them [6]. The "pay-as-you-go" model promoted by cloud computing enables flexible and on-demand access to virtually unlimited computational resources. This allows big data mining to be performed using only the computational resources necessary for the needed period of time. In fact, many businesses and organizations already store their data in the cloud. For such businesses and organizations, it is a natural choice to perform data mining in the cloud [6, 7]. However, the monetary cost of utilizing the computational resources in the cloud (referred to as computation cost) for big data mining can be unexpectedly high if the resources are not managed properly. For example, running 100 m4.xlarge Amazon EC2 virtual machine (VM) instances costs $583.00 per day. Thus, cost effectiveness in the cloud has become a major obstacle to the broad application of big data mining.
On this ground, it is critical to analyze the cost effectiveness of big data mining in the cloud, i.e., how to achieve a sufficiently satisfactory result at the lowest possible computation cost. In many data mining scenarios, achieving the optimal result, e.g., 100% accuracy, is not necessary. Take marketing for example, where data mining is usually performed on a large number of consumers. A reasonable margin of inaccuracy is acceptable. For example, marketers do not need their consumers to be classified with 100% accuracy. As long as they can obtain a general picture, they are able to make a decision. In fact, in some data mining scenarios, 100% accuracy is never attainable, e.g., weather forecasting and traffic jam prediction. In such scenarios, it is possible to achieve high cost effectiveness by stopping the data mining process at a reasonable point, because it is often preferable to achieve sufficient accuracy, e.g., 99% or 99.9%, at a much lower cost, e.g., 10% or 20% of the cost of achieving 100% accuracy. Cost-effective data mining allows big data analytics to be applied in a broader range of fields by more businesses and organizations, especially small and medium-sized ones. However, it has not been well investigated by the research community. In this paper, we study k-means, one of the top 10 data mining algorithms [8], to explore and demonstrate the cost effectiveness of big data mining in the cloud. The remainder of this paper is organized as follows. Section II discusses the related work. Section III introduces the methodology adopted in this study. Section IV presents and analyzes the experimental results. Section V further discusses the findings of this study. Section VI analyzes the threats to the validity of our experiments. Finally, Section VII concludes this paper and discusses the future work. |
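The idea of stopping the mining process at a reasonable point can be illustrated with a minimal k-means sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data, the function names, the use of the within-cluster sum of squared errors (SSE) as a quality measure, the iteration count as a proxy for computation cost, and the stopping rule itself (stop once an iteration improves the SSE by less than a chosen relative tolerance `rel_tol`).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as its cluster mean; empty clusters keep the old one."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, k, rel_tol=0.0, max_iter=100, seed=0):
    """Lloyd-style k-means that stops once the relative SSE gain is <= rel_tol.

    rel_tol=0.0 runs until the SSE stops improving; a larger value stops earlier.
    Returns the final centroids and the SSE recorded after every iteration.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # hypothetical initialisation: k random points
    history = []
    for _ in range(max_iter):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
        history.append(sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels)))
        if len(history) > 1 and history[-2] - history[-1] <= rel_tol * history[-2]:
            break
    return centroids, history


def synthetic_blobs(n_per_blob, seed=0):
    """Three 2-D Gaussian blobs; purely illustrative data."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [
        (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in centers
        for _ in range(n_per_blob)
    ]


points = synthetic_blobs(100, seed=1)
_, full_hist = kmeans(points, k=3, rel_tol=0.0, seed=2)    # run until no improvement
_, early_hist = kmeans(points, k=3, rel_tol=0.01, seed=2)  # stop at < 1% gain

print("iterations (full, early):", len(full_hist), len(early_hist))
print("final SSE  (full, early):", full_hist[-1], early_hist[-1])
```

Because both runs share the same seed, the early-stopped run follows the same trajectory as the full run and simply ends sooner. Comparing the two iteration counts with the two final SSE values gives a rough picture of how much quality is traded for how much computation under this particular stopping rule.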