Free download of the English article "A Generalized Method for Word Sense Disambiguation based on Wikipedia", with Persian translation
Persian title of the article: | یک روش کلی برای معنا كردن كلمه: ابهام زدایی در ویکی پدیا (A General Method for Word Sense Disambiguation in Wikipedia) |
English title of the article: | A Generalized Method for Word Sense Disambiguation based on Wikipedia |
Related fields: | Computer engineering, software engineering, and artificial intelligence |
Format of the free articles | The free English articles and Persian translations are in PDF format |
Translation quality | The translation quality of this article is good |
Notes | A translation of the final pages of the article is not available. |
Publisher | Springer |
Product code | f172 |
Excerpt from the Persian translation of the article: Abstract In this paper we propose a general framework for word sense disambiguation using the knowledge latent in Wikipedia. In particular, we exploit the rich and growing Wikipedia corpus to obtain a large and robust knowledge repository consisting of keyphrases and their associated candidate topics. Keyphrases are mainly derived from Wikipedia article titles and the anchor texts associated with wikilinks. Disambiguation of a keyphrase is based on both the commonness of the candidate topic and context-dependent relatedness, where unnecessary (and potentially noisy) context information is removed. Through extensive experimental evaluations using different relatedness measures, we show that the proposed method achieves disambiguation accuracy comparable to state-of-the-art techniques while incurring lower computation cost. |
Excerpt from the English article:

Abstract

In this paper we propose a general framework for word sense disambiguation using knowledge latent in Wikipedia. Specifically, we exploit the rich and growing Wikipedia corpus in order to achieve a large and robust knowledge repository consisting of keyphrases and their associated candidate topics. Keyphrases are mainly derived from Wikipedia article titles and anchor texts associated with wikilinks. The disambiguation of a given keyphrase is based on both the commonness of a candidate topic and the context-dependent relatedness, where unnecessary (and potentially noisy) context information is pruned. With extensive experimental evaluations using different relatedness measures, we show that the proposed technique achieves disambiguation accuracy comparable to state-of-the-art techniques, while incurring orders of magnitude less computation cost.

1 Introduction

Word sense disambiguation (WSD) is the problem of identifying the sense (meaning) of a word within a specific context. In our daily life, our brain subconsciously relates an ambiguous word to an appropriate meaning based on the context in which it appears. In natural language processing, word sense disambiguation is thus the task of automatically determining the meaning of a word by considering the associated context(s). It is a complicated but crucial task in many areas such as topic detection and indexing [7, 13], cross-document co-referencing [2, 18], and web people search [1, 12, 22]. Given the current explosive growth of online information and content, an efficient and high-quality disambiguation method with high scalability is of vital importance.

Two main approaches that try to address the issue can be found in the literature, namely knowledge-based methods and supervised machine learning methods.
The former relies primarily on dictionaries, thesauri, or lexical knowledge bases, e.g., a sense inventory consisting of words/phrases and definitions of their possible senses. The Lesk algorithm [11] is the seminal algorithm of this kind, with the assumption that words referring to the same meaning share a common topic in their neighborhood. Following this idea, many works have attempted to identify the correct meaning of a word by maximizing the agreement between the dictionary definitions and the contextual terms of the given ambiguous word. Within the disambiguation process, a high-quality sense inventory is a critical factor that affects the performance. However, building such a large-scale, machine-readable lexical resource is tedious and laborious. Thus, the knowledge acquisition bottleneck is the main problem limiting the performance of such systems.

The second method, based on supervised machine learning, attempts to derive a set of local and global contextual features from a manually sense-tagged dataset and to integrate these training examples into a machine learning classifier. Many machine learning techniques have been applied to WSD and have been shown to be successful [6, 10, 17]. Nevertheless, machine learning methods also suffer from the knowledge acquisition bottleneck, since they require substantial amounts of training examples.

In this paper, we propose a generalized method exploring the use of Wikipedia as the lexical resource for disambiguation. Wikipedia is the largest online encyclopedia and collaborative knowledge repository in the world, with over 3.2M articles in English alone. It provides reasonably broad, if not exhaustive, coverage of topics in comparison to many other knowledge bases. A previous study found that the quality of Wikipedia articles is comparable to that of editor-based encyclopedias [5].
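
The overlap idea behind Lesk-style, knowledge-based disambiguation described above can be illustrated with a minimal sketch. This is a simplified variant written for illustration, not the algorithm of [11] as published and not part of the paper's method; the two-sense inventory `SENSES`, its glosses, and the small `STOPWORDS` set are hypothetical.

```python
import re

# Hypothetical two-sense inventory for the word "bank".
# The glosses are illustrative and are not taken from the paper or a real dictionary.
SENSES = {
    "bank (finance)": "an institution that accepts deposits and lends money to customers",
    "bank (river)": "the sloping land alongside a river or stream",
}

# Small, assumed stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "and", "of", "or", "that", "the", "to", "on", "in", "by"}


def tokens(text):
    """Return the set of lowercase words in text, with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses):
    """Pick the sense whose gloss shares the most words with the context."""
    ctx = tokens(context)
    return max(senses, key=lambda sense: len(tokens(senses[sense]) & ctx))


print(simplified_lesk("She sat on the bank of the river and watched the stream", SENSES))
```

The sketch also makes the knowledge acquisition bottleneck visible: the result depends entirely on how many senses and how much gloss text the hand-built inventory contains.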
Because of its massive scale of collaboration as well as usage, Wikipedia has become a fruitful resource in many research areas in recent years.

The proposed disambiguation framework is illustrated in Figure 1. Three key components are developed in our work for disambiguation: the Wikipedia inventory, keyphrase identification and pruning, and the sense disambiguator. Specifically, we build a word sense inventory by extracting the polysemy, synonymy, and hyperlink information encoded in Wikipedia. Each entry in the inventory is a keyphrase which refers to at least one Wikipedia article. As detailed in Section 3.1, a keyphrase is either a Wikipedia article title or the surface form (or anchor text) of a wikilink. Keyphrases that refer to exactly one Wikipedia article are unambiguous keyphrases. Other keyphrases are ambiguous: each refers to multiple Wikipedia articles (i.e., candidate topics/senses, shown in Figure 1). Given a document, the unambiguous keyphrases recognized from the document serve as context information to disambiguate the ambiguous keyphrases. In between, keyphrase pruning helps identify the most important keyphrases in the context of the occurrence of the given ambiguous keyphrase, and it can largely filter out the noise and improve the efficiency of the system. The disambiguator is the core component of our framework. It aims to balance the agreement between the context of the ambiguous keyphrase and the context of each candidate sense.

Empirical evaluations based on a ground-truth dataset illustrate that our method outperforms other state-of-the-art approaches in terms of both effectiveness and efficiency. Moreover, since the Wikipedia inventory we create relies on the rich semantic information contained in Wikipedia, our approach avoids the traditional knowledge acquisition bottleneck and is applicable to any domain of varying size.
It can be plugged into existing systems that need to address word sense disambiguation, as well as into potential applications. Our approach is general in several senses: given the rather exhaustive coverage of Wikipedia topics, the Wikipedia inventory is domain independent; given Wikipedia's growing popularity in other languages, our approach can be readily reused across different languages; and finally, the modular framework allows for using different relatedness measures suiting different application needs.

The rest of this paper is structured as follows: Section 2 reviews related works. Section 3 introduces our approach along with the individual components in the proposed framework. In Section 4, we present and discuss the experimental results. Finally, we conclude in Section 5. |
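
The framework outlined in the introduction (a keyphrase inventory, unambiguous keyphrases as context, and a disambiguator that weighs commonness against context-dependent relatedness) can be sketched roughly as follows. The excerpt does not give the paper's scoring formula or its relatedness measures, so everything specific here is an assumption: the toy `INVENTORY` and `INLINKS` data, the Jaccard overlap used as a stand-in relatedness measure, the linear combination, and the weight `alpha`. Keyphrase pruning is not modeled.

```python
# Hypothetical inventory: keyphrase -> {candidate article: number of wikilinks
# whose anchor text is that keyphrase}. All counts are made up for illustration.
INVENTORY = {
    "apple": {"Apple Inc.": 60, "Apple (fruit)": 40},
    "iphone": {"IPhone": 100},
    "orchard": {"Orchard": 25},
}

# Hypothetical incoming-link sets, used only by the stand-in relatedness measure.
INLINKS = {
    "Apple Inc.": {"Steve Jobs", "Smartphone", "Macintosh"},
    "Apple (fruit)": {"Fruit", "Tree", "Cider"},
    "IPhone": {"Steve Jobs", "Smartphone", "IOS"},
    "Orchard": {"Fruit", "Tree", "Agriculture"},
}


def commonness(keyphrase, topic):
    """Fraction of the keyphrase's links that point to this topic."""
    counts = INVENTORY[keyphrase]
    return counts[topic] / sum(counts.values())


def relatedness(a, b):
    """Stand-in measure (an assumption): Jaccard overlap of incoming-link sets."""
    union = INLINKS[a] | INLINKS[b]
    return len(INLINKS[a] & INLINKS[b]) / len(union) if union else 0.0


def context_topics(keyphrases):
    """Topics of the unambiguous keyphrases, i.e. those with exactly one candidate."""
    return [
        next(iter(INVENTORY[k]))
        for k in keyphrases
        if k in INVENTORY and len(INVENTORY[k]) == 1
    ]


def disambiguate(keyphrase, doc_keyphrases, alpha=0.5):
    """Return the candidate topic with the best combined score.

    The linear mix of commonness and average relatedness is an assumed
    scoring rule, not the formula used in the paper.
    """
    context = context_topics(doc_keyphrases)

    def score(topic):
        rel = (
            sum(relatedness(topic, c) for c in context) / len(context)
            if context
            else 0.0
        )
        return alpha * commonness(keyphrase, topic) + (1 - alpha) * rel

    return max(INVENTORY[keyphrase], key=score)


print(disambiguate("apple", ["apple", "iphone"]))
print(disambiguate("apple", ["apple", "orchard"]))
```

Because `relatedness` is an ordinary function, it can be swapped for another measure without touching the rest of the sketch, which mirrors the modularity the authors describe.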