Free download of the English paper "Text Document Classification based-on Least Square Support Vector Machines with Singular Value Decomposition", with Persian translation
Persian title of the paper | Text document classification based on least squares support vector machines with singular value decomposition |
English title of the paper | Text Document Classification based-on Least Square Support Vector Machines with Singular Value Decomposition |
Related fields | Computer engineering and information technology, information systems management, cloud computing |
Keywords | Text classification, least squares support vector machines, singular value decomposition |
Translation quality | The translation quality of this paper is medium |
Journal | International Journal of Computer Applications |
Publication year | 2011 |
Product code | F667 |
Paper contents: Abstract |
Excerpt from the Persian translation: 2. Preprocessing 7. Conclusion and future work |
Excerpt from the English paper: 2. PREPROCESSING To obtain all the words used in a given text, a tokenization step is required: the text document is split into a stream of words by removing all punctuation marks and replacing tabs and other non-text characters with single white spaces. This tokenized representation is then used for further processing. The set of words describing a document can be reduced further by filtering and stemming. In this section, we describe our proposed preprocessing method for creating an optimized vector space model. The method builds the vector space model with lower time complexity.
In our preprocessing approach, we first collect all commonly available stopwords. For each word, we take the ASCII value of each letter, ignoring case, and sum the values to produce a number. This number is assigned to the word, and the numbers are kept in sorted order. For example, for the word "and" the ASCII values are a=97, n=110, and d=100, so the value of the word is 307. Similarly, the value of the word "to" is 116+111=227. With this approach, however, two words can have the same ASCII sum: the value of "ask" is 97+115+107=319, and the value of "her" is 104+101+114=319. To solve this, we first compare the ASCII sums and then compare the word itself against the stopword strings stored in the array entry for that sum. The string comparison confirms that no keyword is lost. We also group the stopwords that share an ASCII sum into a subset, so a word needs to be compared only against that subset. To find the corresponding ASCII values quickly, we used the interpolation search method.
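The ASCII-sum stopword check described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`ascii_sum`, `build_index`, `interpolation_search`, `is_stopword`, `remove_stopwords`) are hypothetical, words are lowercased before summing, and the interpolation search is assumed to run over the sorted list of distinct sums. Standard ASCII codes are used (for example, n=110 and d=100, so "and" sums to 307).

```python
from collections import defaultdict


def ascii_sum(word):
    """Sum the ASCII codes of the letters of a word, ignoring case."""
    return sum(ord(ch) for ch in word.lower())


def build_index(stopwords):
    """Group stopwords by ASCII sum.

    Returns the sorted list of distinct sums and a dict mapping each
    sum to the subset of stopwords that share it.
    """
    buckets = defaultdict(set)
    for word in stopwords:
        buckets[ascii_sum(word)].add(word.lower())
    return sorted(buckets), dict(buckets)


def interpolation_search(keys, target):
    """Return the index of target in the sorted list keys, or -1."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            return lo if keys[lo] == target else -1
        # Estimate the position, assuming roughly evenly spaced keys.
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


def is_stopword(word, keys, buckets):
    """Match on the ASCII sum first, then confirm with a string comparison."""
    idx = interpolation_search(keys, ascii_sum(word))
    if idx == -1:
        return False
    # Only the subset of stopwords with the same sum is compared.
    return word.lower() in buckets[keys[idx]]


def remove_stopwords(tokens, stopwords):
    """Drop every token that is confirmed to be a stopword."""
    keys, buckets = build_index(stopwords)
    return [t for t in tokens if not is_stopword(t, keys, buckets)]
```

With the stopword list `["and", "to", "her"]`, the token "ask" has the same sum as "her" but survives filtering, because the string comparison inside the shared bucket fails.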
The proposal above is combined with the Porter stemming algorithm, which gives effective preprocessing of documents. The Porter stemmer is divided into five steps: step 1 removes the i-suffixes, and steps 2 to 4 remove the d-suffixes. Composite d-suffixes are reduced to single d-suffixes one at a time. For example, if a word ends in "icational", step 2 reduces it to "icate" and step 3 to "ic". Three steps are sufficient for this process in English. Step 5 does some tidying up.
7. CONCLUSION AND FUTURE WORK In this paper, we used an ASCII-based preprocessing method together with stemming to eliminate stopwords and find keywords among the verbs and nouns in a document. To find keywords, we used an entropy-based approach, which is well suited to finding keywords in the input documents. We used the SVD method to reduce the dimensionality of the input term-document matrix. This paper proposes a new algorithm called LS-SVM, which combines the advantages of LSI and SVM. The experimental results also confirm that LS-SVM is a practical and effective method for document classification. In future work, we will continue to focus on improving the efficiency and scalability of our preprocessing and classification schemes, especially for multiple-theme documents. |
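The composite-suffix reduction mentioned in the stemming paragraph (a word ending in "icational" becomes "icate" in step 2 and "ic" in step 3) can be illustrated with a small sketch. It is not a full Porter stemmer: only a handful of step 2 and step 3 rules are included, steps 1, 4, and 5 are omitted, and the names `measure`, `apply_rules`, and `reduce_composite_suffix` are hypothetical. The condition that the remaining stem must contain at least one vowel-consonant sequence follows Porter's published rule for these steps.

```python
# A small subset of Porter's step 2 and step 3 suffix rules (assumption:
# chosen only to illustrate the example; the full rule tables are longer).
STEP2_RULES = [("ational", "ate"), ("tional", "tion"), ("ization", "ize")]
STEP3_RULES = [("icate", "ic"), ("alize", "al"), ("ical", "ic")]


def is_consonant(word, i):
    """Porter's definition: 'y' is a consonant unless it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-consonant sequences (Porter's m) in the stem."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def apply_rules(word, rules):
    """Apply the first matching rule if the remaining stem has m > 0."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if measure(stem) > 0:
                return stem + replacement
            return word
    return word


def reduce_composite_suffix(word):
    """Run the step 2 rules, then the step 3 rules, on a lowercased word."""
    word = apply_rules(word.lower(), STEP2_RULES)
    return apply_rules(word, STEP3_RULES)
```

For instance, `reduce_composite_suffix("communicational")` passes through "communicate" and ends at "communic", one suffix layer removed per step.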