Free download of the English article "A Brief Survey on Sequence Classification" with Persian translation
Persian title | A Brief Study on Sequence Classification |
English title | A Brief Survey on Sequence Classification |
Related fields | Computer engineering, biology, bioinformatics, algorithms and computation engineering, software engineering |
Format of free articles | The English articles and their free Persian translations are available for free download in PDF format; the translation is also available for purchase and download in Word format. |
Translation quality | The quality of this translation is average. |
Notes | This translation has editing problems. |
Publisher | ACM |
Journal | SIGKDD Explorations |
Year of publication | 2010 |
Product code | F726 |
Table of contents: Abstract |
Excerpt from the Persian translation: 1. Introduction; 4.2 Time Series Data |
Excerpt from the English article:

1. INTRODUCTION

Sequence classification has a broad range of real-world applications. In genomic research, classifying protein sequences into existing categories is used to learn the functions of a new protein [13]. In health informatics, classifying ECG time series (recordings of the heart's electrical activity) tells whether the data come from a healthy person or from a patient with heart disease [59]. In anomaly detection and intrusion detection, the sequence of a user's system access activities on Unix is monitored to detect abnormal behaviors [33]. In information retrieval, classifying documents into topic categories has attracted a lot of attention [51]. Other interesting examples include classifying query log sequences to distinguish web robots from human users [58; 18] and classifying transaction sequence data in a bank to combat money laundering [42].

Generally, a sequence is an ordered list of events. An event can be represented as a symbolic value, a numerical real value, a vector of real values, or a complex data type. In this paper, we categorize sequence data into the following subtypes.

• Given an alphabet of symbols {E1, E2, E3, …, En}, a simple symbolic sequence is an ordered list of symbols from the alphabet. For example, a DNA sequence is composed of four nucleotides, A, C, G and T, and a DNA segment such as ACCCCCGT is a simple symbolic sequence.
• A complex symbolic sequence is an ordered list of vectors, where each vector is a subset of the alphabet [34]. For example, for the sequence of items a customer buys over one year, treating each transaction as a vector, a sequence can be ⟨(milk, bread)(milk, egg)···(potatoes, cheese, coke)⟩.

4.2 Time Series Data

Time series data is an important type of sequence data. The Time Series Data Library [4] collects time series data from 22 domains, such as agriculture, chemistry, health, finance and industry.
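The two symbolic subtypes defined in the introduction map naturally onto simple data types. The following Python sketch is illustrative only: the names `SimpleSymbolicSequence`, `ComplexSymbolicSequence`, `is_simple_symbolic` and `is_complex_symbolic` are hypothetical and do not come from the paper, and each transaction is modeled as a `frozenset`, which assumes that the order of items inside one transaction does not matter.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical type aliases for the two symbolic subtypes.
# Simple symbolic sequence: an ordered list of single symbols from the alphabet.
SimpleSymbolicSequence = List[str]
# Complex symbolic sequence: an ordered list of itemsets, each a subset of the alphabet.
# Assumption: items within one itemset are unordered, so frozenset is used.
ComplexSymbolicSequence = List[FrozenSet[str]]


def is_simple_symbolic(seq: Sequence[str], alphabet: FrozenSet[str]) -> bool:
    """Return True if every event in `seq` is a single symbol from `alphabet`."""
    return all(symbol in alphabet for symbol in seq)


def is_complex_symbolic(seq: Sequence[FrozenSet[str]], alphabet: FrozenSet[str]) -> bool:
    """Return True if every event in `seq` is a subset of `alphabet`."""
    return all(itemset <= alphabet for itemset in seq)


if __name__ == "__main__":
    dna_alphabet = frozenset("ACGT")
    dna_segment: SimpleSymbolicSequence = list("ACCCCCGT")
    print(is_simple_symbolic(dna_segment, dna_alphabet))  # → True

    # Hypothetical grocery alphabet for the shopping-basket example.
    grocery = frozenset({"milk", "bread", "egg", "potatoes", "cheese", "coke"})
    baskets: ComplexSymbolicSequence = [
        frozenset({"milk", "bread"}),
        frozenset({"milk", "egg"}),
        frozenset({"potatoes", "cheese", "coke"}),
    ]
    print(is_complex_symbolic(baskets, grocery))  # → True
```

Using `frozenset` turns the "subset of the alphabet" condition into a single `<=` comparison; a list of lists would be the choice if an application needed to keep the order of items within a transaction.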
The UCR time series data archive [27] provides a set of time series datasets as a benchmark for evaluating time series classification methods. For simple time series data, feature selection is a challenging task for feature-based methods, because features cannot be enumerated on numeric data. Therefore, distance-based methods are widely adopted to classify time series [61; 26; 59; 48]. It has been shown that, compared with a wide range of classifiers such as neural networks, SVM and HMM, the 1-nearest-neighbor (1-NN) classifier with dynamic time warping (DTW) distance is usually superior in classification accuracy [61]. To apply feature-based methods to simple time series, the data usually need to be transformed into symbolic sequences through discretization or symbolic transformation before feature selection [40]. Without discretization, Ye et al. [65] propose a method that finds time series shapelets and uses a decision tree to classify time series. Compared with distance-based methods, feature-based methods may speed up classification and produce interpretable results. Model-based methods such as HMM, which is widely used in speech recognition [47], have also been applied to classify simple time series.

Multivariate time series classification has been used for gesture recognition [24] and motion recognition [38]. The multivariate data are generated by a set of sensors that measure the movements of objects in different locations and directions. For multivariate time series classification, Kadous et al. [24] propose a feature-based classifier: a set of user-defined meta-features is constructed, and each multivariate time series is transformed into a feature vector. Some universal meta-features describe trends of increase and decrease and local maximum or minimum values. Using these features, multivariate time series with additional non-temporal attributes can be classified by a decision tree.
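The 1-NN classifier with DTW distance mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the experimental setup of [61]: it uses unconstrained DTW with squared point-wise differences on univariate series, and the function names and toy data are hypothetical. Practical implementations commonly add a warping-window constraint and lower bounds to reduce the quadratic cost of the dynamic program.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained dynamic time warping distance between two univariate series.

    Local cost is the squared difference of aligned points; the result is the
    square root of the minimal cumulative cost over all warping paths.
    """
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])


def one_nn_classify(
    query: Sequence[float], train: List[Tuple[Sequence[float], str]]
) -> Optional[str]:
    """Label `query` with the label of its nearest training series under DTW."""
    best_label: Optional[str] = None
    best_dist = math.inf
    for series, label in train:
        dist = dtw_distance(query, series)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


if __name__ == "__main__":
    # Toy training set (hypothetical data): one series with a peak, one flat series.
    train = [([0, 0, 1, 2, 1, 0, 0], "peak"), ([0, 0, 0, 0, 0, 0, 0], "flat")]
    query = [0, 1, 2, 1, 0, 0, 0]  # the same peak, shifted one step earlier
    print(dtw_distance(query, train[0][0]))  # → 0.0
    print(one_nn_classify(query, train))     # → peak
```

In the toy example, plain Euclidean distance between the query and the "peak" series is 2.0, whereas DTW absorbs the one-step shift and gives 0.0; this tolerance to local misalignment is the usual argument for pairing 1-NN with DTW.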
One multivariate time series can be viewed as a matrix. Li et al. [31] propose a method that transforms a multivariate time series into a vector through singular value decomposition (SVD) and other transformations; SVM is then used to classify the vectors.

4.3 Text Data

Sequence classification is also widely used in information retrieval to categorize text and documents. Widely used methods for document classification include Naive Bayes [29] and SVM [43]. Text classification has various extensions, such as multi-label text classification [67], hierarchical text classification [57] and semi-supervised text classification [46]. Sebastiani et al. [51] provide a more detailed survey on text classification.

5. CONCLUSION

In this paper, we provide a brief survey on sequence classification. We categorize sequence data into five subtypes. We group sequence classification methods into feature-based methods, sequence-distance-based methods and model-based methods. We also present several extensions of conventional sequence classification. Finally, we compare sequence classification methods applied in different application domains. We notice that most works focus on classifying simple symbolic sequences and simple time series data. Although there are a few works on multivariate time series and complex symbolic sequences, the problem of classifying complex sequence data is still largely open. Furthermore, most methods are devoted to the conventional sequence classification task. Streaming sequence classification, early classification, semi-supervised classification on sequence data, and combinations of these problems on complex sequence data, all of which have practical applications, present challenges for future studies. |