دانلود رایگان مقاله انگلیسی بهینه سازی پرس و جو با تطبیق الگوی توزیعی به همراه ترجمه فارسی
عنوان فارسی مقاله | بهینه سازی پرس و جو با تطبیق الگوی توزیعی |
عنوان انگلیسی مقاله | Query Optimization of Distributed Pattern Matching |
رشته های مرتبط | مهندسی کامپیوتر، علوم داده و مهندسی الگوریتم ها و محاسبات |
فرمت مقالات رایگان | مقالات انگلیسی و ترجمه های فارسی رایگان با فرمت PDF آماده دانلود رایگان میباشند |
کیفیت ترجمه | کیفیت ترجمه این مقاله متوسط میباشد |
توضیحات | ترجمه این مقاله به صورت خلاصه و ناقص انجام شده است. |
نشریه | آی تریپل ای – IEEE |
مجله | کنفرانس ICDE |
سال انتشار | 2014 |
کد محصول | F792 |
مقاله انگلیسی رایگان |
دانلود رایگان مقاله انگلیسی |
ترجمه فارسی رایگان |
دانلود رایگان ترجمه مقاله |
جستجوی ترجمه مقالات | جستجوی ترجمه مقالات مهندسی کامپیوتر |
فهرست مقاله: چکیده |
بخشی از ترجمه فارسی مقاله: چکیده: الگوریتم های حریصانه برای عملیات تطبیق الگوی گراف اغلب زمانی که دده های گراف را بتوان در حافظه بر روی تک ماشین نگه داری کرد کافی می باشد. با این حال، چون مجموعه داده های گراف به طور فزاینده ای توسعه می یابند و نیاز مند فضای ذخیره ای اضافی و پارتیشن بندی در دسته ای از ماشین ها می باشند، فنون بهینه سازی پرس و جوی پیشرفته تر برای اجتناب از انفجار در تاخیر پرس و جو اهمیت حیاتی دارد.در این مقاله، ما اقدام به معرفی روش های بهینه سازی پرس و جو برای تطبیق الگوی گراف توزیع یافته می کنیم. این فنون شامل، 1- الگوریتم بهینه سازی مبتنی بر برنامه نویسی دینامیک سبک سیستم R، که هر دو برنامه های خطی و نقطه ای را در نظر می گیرد، 2- الگوریتم مبتنی بر تشخیص چرخه که اهرمی برای چرخه ها جهت کاهش اندازه مجموعه های نهایی می باشد و 3- روش استفاده مجدد از محاسبه یا رایانش است که انتقال اطلاعات و اجرای پرس و جوی زائد را در شبکه حذف می کند. نتایج آزمایشی نشان می دهد که این الگوریتم ها می توانند موجب بهبود زیادی در عملکرد پرس و جو شوند. |
بخشی از مقاله انگلیسی: Abstract Greedy algorithms for subgraph pattern matching operations are often sufficient when the graph data set can be held in memory on a single machine. However, as graph data sets increasingly expand and require external storage and partitioning across a cluster of machines, more sophisticated query optimization techniques become critical to avoid explosions in query latency. In this paper, we introduce several query optimization techniques for distributed graph pattern matching. These techniques include (1) a System-R style dynamic programming-based optimization algorithm that considers both linear and bushy plans, (2) a cycle detection-based algorithm that leverages cycles to reduce intermediate result set sizes, and (3) a computation reusing technique that eliminates redundant query execution and data transfer over the network. Experimental results show that these algorithms can lead to an order of magnitude improvement in query performance. I. INTRODUCTION The graph data model is becoming an increasingly popular way to represent data for various applications. Reasons for this include: (1) It can be less complex for a user to shoehorn semi-structured or sparse data into a vertex-edge-vertex data model than a relational data model, (2) some increasingly popular data sets (such as the Twitter, Facebook, and LinkedIn social networks) are most naturally reasoned about using a graph paradigm, and (3) graph operations, such as shortest path calculations, subgraph pattern matching, and PageRank are easily expressed over a graph data model. Many graph data sets are becoming too large to manage on a single machine, and therefore clusters of machines are being deployed to process, store, manage, and analyze graph data. For instance, as of 2012, Facebook’s user graph has 900 million vertices (and the average degree of a vertex is 130) [1]. In Semantic Web community, the Linking Open Data movement has collected 6 billion triples (a triple is equivalent to an edge in a graph) from 300 interconnected data sets [3]. Since many graph algorithms were originally designed with the assumption that the entire graph can be stored in memory on a single machine, these distributed architectures require revisiting these algorithms in a distributed context, as considerations such as network latency and throughput can bottleneck the traditional implementation of these algorithms. Subgraph pattern matching is a particularly important operation that must be revisited for distributed graph stores. Subgraph matching operations are heavily used in social network data mining operations (e.g. counting triangles for gauging social influence of celebrities [33]), SPARQL queries over the Linked Data graph, and machine learning algorithms that power recommendation engines for e-commerce retail applications and entertainment choices. This paper is the first (to the best of our knowledge) to explicitly use System-R style dynamic programming techniques [30] in order to optimize distributed subgraph pattern matching. However, we find that these traditional algorithms need to be modified in three ways in order to work well for graph data: • Although others have noted that even in the traditional relational context, System-R’s consideration of only left-deep join trees can lead to a suboptimal optimization strategy [21], the consideration of bushy plans for distributed graph pattern matching queries is particularly important in order to reduce network traffic and sizes of intermediate output. The heuristics for which bushy plans to consider should leverage graph characteristics. • Cycles appear more frequently in query patterns over a graph model than data represented in other models. They can be potentially leveraged to improve query performance and should be explicitly considered during plan generation. • In general, distributed subgraph pattern matching is performed by finding components of the subgraph separately, and joining these components together. However, when pure graph patterns are being searched for (without any selection predicates on vertex or edge attributes), the intermediate result sets tend to be extremely large. In this paper we introduce two query optimization frameworks for subgraph pattern matching. In particular, given a data store represented in the graph data model, and a query that requests all instances of a particular graph pattern within the data store, we provide algorithms that generate a series of query execution plans, estimate the cost of each of these plans, and select the plan with lowest cost for execution. We make no assumptions about how the graph data is partitioned across a cluster, except that all (directed) edges emanating from the same vertex are stored on the same physical machine, and that a deterministic function (e.g. a hash function) can be applied to any edge in order to determine its location. Furthermore, we make no assumptions about the subgraph being matched — our algorithms apply to both unattributed subgraphs (where just the structure of the graph pattern is being matched) and attributed subgraphs (where each vertex and edge in the graph data set may have attributes associated with it, and the subgraph may include predicates on these attributes in order to reduce the scope of the search). Our work makes the following contributions: • We propose a dynamic programming-based optimization framework that considers bushy plans without encountering query plan space explosion (Section III). We propose an cycle detection-based optimization framework based on the observation that cycles tend to significantly reduce the size of intermediate result sets (Section IV). • We introduce a computation reusing technique which eliminates repetitive identical subquery execution and redundant network transfer of intermediate result sets within a query (Section VI). Our experimental results show that our proposed techniques improve performance of subgraph pattern matching in a distributed graph store by an order of magnitude over commonly used greedy algorithms, and often result in even larger performance gains. |