Free download of the English paper "A Sample-Guided Approach to Incremental Structured Web Database Crawling" with its Persian translation
Persian title of the paper: | رویکرد نمونه هدایت شده برای تغییرات آرام در پایگاه ساختار بندی افزایش اطلاعات وب |
English title of the paper: | A Sample-Guided Approach to Incremental Structured Web Database Crawling |
Related fields: | Computer engineering and information technology, information systems management, Internet and wide-area networks, and software engineering |
Format of the free papers | The free English papers and Persian translations are in PDF format |
Translation quality | The translation quality of this paper is low |
Publication | IEEE |
Product code | f289 |
Excerpt from the Persian translation of the paper: 1. Introduction |
Excerpt from the English paper: I. INTRODUCTION The Deep Web refers to the data residing in web databases, most of which takes the form of structured data records [1]. The Deep Web is believed to be the largest source of structured data on the Web, and Deep Web data integration has therefore been a long-standing challenge in the field of Web data management. A promising solution for Deep Web data integration is web database crawling [2]. The crawling-based solution aims to gather structured records from web databases so that users can search and mine the Deep Web in a centralized manner. The rapid development of computer hardware and the Internet makes this solution more practical than before. To the best of our knowledge, previous efforts [3][4][5][6] focus only on crawling the whole web database, with the goal of maximizing coverage of the web database. We call this approach "exhaustive crawling". It is widely known that most web databases are highly dynamic; for example, new records are constantly being inserted. To keep the local database consistent with the integrated web databases, maintenance has to be performed. However, it is not affordable to always apply exhaustive crawling to harvest a small quantity of new records (small compared to the whole web database), because doing so places a heavy burden on both the web databases and the network. In this paper, we study a crucial but largely unresolved problem in the crawling-based solution: how to obtain the new records without crawling the whole web database. To this end, we propose a sample-guided incremental-crawling approach. The basic idea of this approach is as follows. First, a small number of random samples are harvested from the web database. Then, by analyzing the deviation between the samples and the historical version of the web database, an appropriate record is selected to generate a promising query for crawling new records.
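As a rough illustration of the two steps just described, the sketch below compares attribute-value frequencies in a random sample with those in the historical local copy, then picks the sampled record whose values are most over-represented in the sample. The excerpt does not say how the deviation is measured, so the scoring rule, the record layout (a dict from attribute to value), and the names `value_frequencies` and `pick_seed_record` are all assumptions made for this example.

```python
from collections import Counter


def value_frequencies(records):
    """Relative frequency of each (attribute, value) pair across records."""
    counts = Counter((a, v) for r in records for a, v in r.items())
    total = len(records) or 1  # avoid division by zero on an empty list
    return {pair: c / total for pair, c in counts.items()}


def pick_seed_record(samples, history):
    """Return the sampled record that deviates most from the history.

    Hypothetical scoring rule (not taken from the paper): for each value
    of a record, add how much more frequent that value is in the sample
    than in the historical copy, ignoring negative differences.
    """
    sample_freq = value_frequencies(samples)
    history_freq = value_frequencies(history)

    def score(record):
        return sum(
            max(0.0, sample_freq[(a, v)] - history_freq.get((a, v), 0.0))
            for a, v in record.items()
        )

    return max(samples, key=score)


# Toy data: the local copy knows two 2009 books; the sample shows a 2010 one.
history = [
    {"title": "Book A", "year": 2009},
    {"title": "Book B", "year": 2009},
]
samples = [
    {"title": "Book A", "year": 2009},
    {"title": "Book C", "year": 2010},
]
seed = pick_seed_record(samples, history)
```

In this toy run the selected record is the 2010 book, whose values do not occur in the history, so a query built from it (for example on `year = 2010`) would plausibly return mostly new records.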
In this approach, we propose a query-related graph model, under which any given web database can be represented as an undirected graph. The incremental crawling task is thus transformed into a graph traversal process in which the crawler starts with the graph of the samples of the web database and, at each step, selects a vertex v and uses it to generate an appropriate query for crawling. Since the only general way of accessing a web database is through its query interface, automatic query generation is the key to our approach. Our goal is to maximize the coverage of the new records while minimizing the coverage of the old ones. As an initial effort to address the incremental web database crawling problem, the contributions of the paper are summarized as follows. First, we identify the novel problem of incremental web database crawling. In contrast to previous exhaustive-crawling work, we demonstrate that a central issue of efficient web database crawling lies in the consistency between the local database and the integrated web databases. Second, we provide a theoretical framework that formally models query-based web database crawling as graph traversal. Unlike the attribute-level graph models proposed in previous work (e.g. [3]), our graph model is at the record level, which makes it straightforward to characterize whether any two records are query related. Third, based on the graph model, we propose simple and smart methods for the key problems in the incremental-crawling approach, which aim to generate promising queries that harvest as many new records as possible. The rest of this paper is organized as follows. Section 2 presents the preliminaries. Section 3 introduces the query-related graph model. The query selection method based on the query-related graph model is proposed in Section 4. We discuss our experimental findings in Section 5. Section 6 reviews some related work.
Section 7 concludes this paper. |
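To make the record-level graph view from the introduction more concrete, here is a minimal sketch that builds an undirected graph over records and walks it breadth-first. The excerpt does not define "query related", so this example assumes two records are query related when they share at least one (attribute, value) pair, meaning a single query on that value would return both. The function names `build_query_related_graph` and `traverse` are hypothetical, and the traversal is plain breadth-first search rather than the paper's query selection method.

```python
from collections import deque
from itertools import combinations


def build_query_related_graph(records):
    """Adjacency map over record indices.

    Assumption: two records are "query related" when they share at least
    one (attribute, value) pair, so one query on that value returns both.
    """
    graph = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        if set(records[i].items()) & set(records[j].items()):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def traverse(graph, start):
    """Breadth-first visit order from a starting vertex."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in sorted(graph[v]):  # sorted for a deterministic order
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order


records = [
    {"author": "Li", "year": 2009},
    {"author": "Li", "year": 2010},
    {"author": "Wu", "year": 2010},
    {"author": "Xu", "year": 2001},
]
graph = build_query_related_graph(records)
order = traverse(graph, 0)
```

Starting from record 0, the walk reaches record 1 through the shared author and record 2 through the shared year, while record 3 shares no value with the others and is never reached. A real crawler cannot see the full graph in advance; it would discover edges only by issuing queries through the web database's query interface.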