Free download of the English article "A hybrid data mining model of feature selection algorithms and ensemble learning classifiers for credit scoring" with Persian translation
Persian title of the article: | مدل تركيبی داده كاوی الگوريتم های انتخاب ويژگی و طبقه بندی كننده های يادگيری گروهی برای امتياز دهی اعتباری |
English title of the article: | A hybrid data mining model of feature selection algorithms and ensemble learning classifiers for credit scoring |
Related fields: | Management, Industrial Engineering, Computer Engineering, Algorithms and Computation Engineering, Data Mining, and Banking or Banking Affairs Management |
Format of the free articles | The free English articles and Persian translations are in PDF format |
Translation quality | The translation quality of this article is good |
Notes | This article has been translated in summary form. |
Publisher | Elsevier |
Product code | f291 |
Excerpt from the Persian translation of the article: 1- Introduction: |
Excerpt from the English article: 1. Introduction
Recently, banks and financial institutions have begun to pay close attention to the credit risk of their customers. To differentiate customers when offering credit services and to manage the associated risks, banks have needed to apply credit scoring systems in their procedures (Gray and Fan, 2008). Lately, non-parametric approaches and data mining practices have been used in customer credit scoring. Statistical methods, non-parametric methods, and artificial intelligence (AI) approaches have been suggested to support the development of credit scoring. In addition, ensemble credit scoring methods have been used in many studies. A noticeable number of studies have shown that ensemble learning classification approaches outperform single classifiers in credit scoring. A review of these studies identifies nine main approaches in credit scoring research:
1. Single-classifier credit scoring models.
2. Multiple-classifier credit scoring models.
3. Credit scoring models based on statistical methods.
4. Credit scoring models based on AI methods.
5. Linear and non-linear credit scoring models.
6. Parametric credit scoring models, including the linear probability model, the discriminant analysis model, probit and logit models, etc.
7. Non-parametric (data mining) credit scoring models, including decision tree, K nearest-neighbor (KNN) model, expert system, artificial neural network (ANN), fuzzy logic, genetic algorithm (GA), etc.
8. Ensemble learning credit scoring models.
9. Hybrid credit scoring models.
Many researchers have employed these approaches in their investigations. Hu and Ansell (2007) used several algorithms in their study, including Naïve Bayes, logistic regression (LR), recursive partitioning, ANN, and sequential minimal optimization (SMO).
Min and Lee (2008) applied a credit scoring model based on data envelopment analysis (DEA). In another study, a link analysis ranking method was used with the support vector machine (SVM) for credit scoring (Xu et al., 2009). Setiono et al. (2009) used GA to optimize the KNN classification algorithm in credit scoring. Moreover, Yeh and Lien (2009) compared data mining techniques including KNN, LR, discriminant analysis, Naïve Bayes, ANN, and decision trees. Zhou et al. (2009) used direct search for parameter selection in the SVM classification algorithm. Ping and Yongheng (2011) used a neighborhood rough set and an SVM-based classifier for credit scoring. Kao et al. (2012) employed a Bayesian latent variable model with a classification and regression tree. Vukovic et al. (2012) used preference theory functions in a case-based reasoning (CBR) model for credit scoring. Danenas and Garsva (2015) applied particle swarm optimization (PSO) to select the optimal linear SVM classifier in the domain of credit risk.
As noted above, ensemble credit scoring models have recently been used in a number of studies. Tsai and Wu (2008) applied multilayer perceptron (MLP) neural network ensembles to the credit scoring problem. Nanni and Lumini (2009) investigated an ensemble of classifiers, including bootstrap aggregating (Bagging), Random Subspace, Class Switching, and Random Forest, for credit scoring. In addition, Twala (2010) applied an ensemble of classifiers including ANN, decision tree, Naïve Bayes, KNN, and logistic discriminant analysis. Hsieh and Hung (2010) used a bagging ensemble classifier including ANN, SVM, and Bayesian network. Paleologo et al. (2010) used a subagging ensemble classifier, including kernel SVM, KNN, decision trees, AdaBoost, and subagged classifiers, in credit scoring.
Wang and Ma (2012) proposed a hybrid ensemble learning approach using SVM as a base learner for enterprise credit risk assessment.
Several studies have deployed feature selection (FS) in their credit scoring models. Wang and Huang (2009) applied evolutionary-based FS approaches in a case study of credit approval data. Tsai (2009) compared five well-known FS methods used in bankruptcy prediction (t-test, correlation matrix, stepwise regression, principal component analysis (PCA), and factor analysis) by examining their performance with MLP neural networks. Chen and Li (2010) proposed a combined strategy of FS approaches, including linear discriminant analysis (LDA), rough set theory, decision tree, and F-score, together with the SVM classification model in credit scoring. Wang et al. (2012) used rough sets and the scatter search metaheuristic in FS for credit scoring. Chen (2012) developed an integrated FS and cumulative probability distribution approach based on rough sets for credit rating classification. Hajek and Michalak (2013) suggested an approach that combines mixed and individual FS methods with well-known machine learning models, such as MLP, radial basis function (RBF), SVM, Naïve Bayes, random forest, LDA, and nearest mean classifier, in corporate credit rating prediction. Oreski and Oreski (2014) presented a new hybrid GA with ANN to identify an optimum feature subset and so increase classification accuracy and scalability in credit risk assessment. Liang et al. (2015) deployed three filter-based FS methods (LDA, t-test, and linear regression) and two wrapper-based FS methods (GA and PSO), combined with six prediction models, namely linear SVM, RBF SVM, KNN, Naïve Bayes, classification and regression tree (CART), and MLP, in experiments on bankruptcy and credit scoring datasets.
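To make the notion of a filter-type FS method concrete, the following is a minimal sketch of ranking features by a t-statistic, one of the filters named in the studies above. It is not taken from any of the cited papers: the helper names (`welch_t`, `rank_features`), the use of Welch's form of the statistic, and the toy data are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def welch_t(good, bad):
    """Welch's t-statistic for one feature, split by class label.

    Assumes each class has at least two samples and non-zero variance.
    """
    se = math.sqrt(variance(good) / len(good) + variance(bad) / len(bad))
    return (mean(good) - mean(bad)) / se


def rank_features(rows, labels):
    """Return feature indices sorted by |t|, most discriminative first."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        good = [r[j] for r, y in zip(rows, labels) if y == 1]
        bad = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((abs(welch_t(good, bad)), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy data invented for illustration: columns are [income, age, noise].
rows = [
    [52.0, 41.0, 3.0],
    [48.0, 39.0, 7.0],
    [55.0, 45.0, 5.0],
    [50.0, 30.0, 4.0],
    [21.0, 38.0, 6.0],
    [25.0, 44.0, 3.0],
    [19.0, 33.0, 5.0],
    [23.0, 40.0, 6.0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = good credit, 0 = bad credit

ranking = rank_features(rows, labels)
top_k = ranking[:2]  # keep the two highest-scoring features
```

A filter such as this scores each feature independently of any classifier, whereas a wrapper (for example GA or PSO) searches over feature subsets and scores each subset by training a classifier on it.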
The above-mentioned studies have been presented from three main viewpoints: (1) general credit scoring studies, (2) ensemble credit scoring studies, and (3) FS-based credit scoring studies. This article differs from the other papers in that it considers all three viewpoints simultaneously, whereas past studies have considered only one or two of them. The main aim of this article is to identify a suitable FS algorithm and suitable base and ensemble classifiers in the context of a hybrid credit scoring model, using three types of evaluation measures, i.e., SVM classification accuracy (for FS only), and classification accuracy and the area under the receiver operating characteristic curve (AUC) (for classifiers), together with parameter setting (for both). Moreover, many studies have not examined the effect of several FS methods and of classifier parameter setting on the credit scoring problem. As a further distinguishing aspect, and on the basis of the considerations above, the present paper combines the nine approaches to build a new hybrid FS and ensemble learning credit scoring model. The proposed model is a combination of FS techniques with several base (single) classifiers and ensemble classifiers in the parametric (owing to the Naïve Bayes algorithm) and non-parametric approaches of credit scoring. Parameter setting is applied to the four FS algorithms and to the two types of classification algorithms (base and ensemble). For each FS algorithm, the performance is examined in terms of the SVM classification accuracy measure. The SVM is an influential learning method for classification problems. As cited by Brown and Mues (2012), “it is based on construction of maximum-margin separating hyper plane in some transformed feature space”. SVM is one of the most popular techniques used in the literature.
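The maximum-margin idea in the quotation can be written out using the standard textbook soft-margin formulation. The excerpt does not give the paper's own notation, so the symbols used here (weight vector $w$, bias $b$, slack variables $\xi_i$, penalty parameter $C$, and feature map $\phi$) are assumptions for illustration, with class labels $y_i \in \{-1, +1\}$ and $n$ training samples.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1,\dots,n .
\end{aligned}
```

Minimizing $\lVert w\rVert$ maximizes the margin $2/\lVert w\rVert$ between the two classes in the transformed feature space, while $C$ controls how strongly margin violations are penalized.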
Accordingly, SVM is used to evaluate the performance of the FS algorithms. Moreover, the classification algorithms are compared according to the classification accuracy and AUC measures. The experiments use the dataset of the ‘Export Development Bank of Iran’. In the hybrid model, four FS algorithms are used: (1) PCA; (2) GA; (3) information gain ratio; and (4) the Relief algorithm. Furthermore, two types of classification algorithms prevalent in previous studies are used: (1) base classification algorithms: Naïve Bayes, CART decision tree, SVM, and ANN; (2) ensemble classification algorithms: bagging, AdaBoost, random forest, and stacking. The results indicate that the hybrid credit scoring model performs robustly in comparison with the other classification algorithms presented in this paper. In summary, the main contributions of this study, as reflected in the proposed model, are as follows:
1. Providing a comprehensive study by comparing different FS methods and classifiers with respect to the credit scoring problem.
2. Hybrid, simultaneous use of the three general, ensemble, and FS-based credit scoring approaches.
3. Using FS algorithms and comparing their performance by means of the accuracy measure of the SVM classification algorithm, as well as the accuracy and AUC measures of the base and ensemble classifiers.
4. Employing parameter setting procedures for the FS and classification algorithms in order to improve credit scoring performance in an iterative manner.
5. Simultaneous use and comparison of the base and ensemble learning classification algorithms in the proposed credit scoring model.
6. Using and comparing, in an integrated framework, the nine approaches to credit scoring models employed in the literature.
7. Whereas credit scoring in previous studies is mostly based on real customers (natural persons), the credit scoring model in the current study is built on legal customers (legal entities).
This paper is structured as follows.
In Section 2, the related methods used in this paper are briefly described. In Section 3, the experimental design is presented, including the dataset description and pre-processing, performance evaluation, and development of the hybrid credit scoring model. The experimental results of a case study are presented and discussed in Section 4. Finally, Section 5 is devoted to the conclusions and recommendations for future work. |
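The two classifier evaluation measures named in the introduction, classification accuracy and AUC, can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the paper's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all invented here. AUC is computed through its pairwise interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Assumes binary labels (1 = positive, 0 = negative) and at least one
    sample of each class. Runs in O(P * N) time, fine for a small sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)   # 4 of 6 labels match
area = auc(y_true, scores)       # 8 of 9 positive-negative pairs ordered correctly
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two measures are often reported together.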