FINAL PROJECT PRESENTATION – KI091391

Implementation of the Bagging Nearest Neighbor Support Vector Machine Method for Bankruptcy Prediction
(Keywords: bankruptcy prediction, BNNSVM, bootstrap aggregating, k-nearest neighbor, support vector machine)

Author: M. Ulin Nuha – 5108100164
Supervisors: Isye Arieshanti, S.Kom., M.Phil.; Yudhi Purwananto, S.Kom., M.Kom.
Outline
• Background
• Objective
• Problem Statement
• Software Development
• Conclusion
• References
Background
Global financial crisis → companies go bankrupt → need for bankruptcy prediction → Bagging Nearest Neighbor Support Vector Machine (BNNSVM)
Objective
Implement the Bagging Nearest Neighbor Support Vector Machine (BNNSVM) for bankruptcy prediction.
Problem Statement
1. How can the Bagging Nearest Neighbor Support Vector Machine (BNNSVM) method be implemented for bankruptcy prediction?
2. How can the BNNSVM model be tested for predicting company bankruptcy?
Software Development
Stages: Literature Study → Design and Implementation → Testing

Literature Study
• SVM (Support Vector Machine)
• KNN (K-Nearest Neighbor)
• Bagging (Bootstrap Aggregating)
• BNNSVM
K-Nearest Neighbor
[Figure: the neighborhood of a query point X under (a) 1-nearest neighbor, (b) 2-nearest neighbor, and (c) 3-nearest neighbor]

The distance between two data points can be computed with the Euclidean distance:

d(p, q) = √( Σᵢ₌₁ⁿ (pᵢ − qᵢ)² )
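The Euclidean distance and the neighbor lookup it supports can be sketched in a few lines of Python (a minimal illustration; `euclidean`, `knn_neighbors`, and the toy points are hypothetical names, not the thesis code):

```python
import numpy as np

def euclidean(p, q):
    """d(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

def knn_neighbors(X, query, k):
    """Indices of the k training points nearest to `query`."""
    dists = [euclidean(x, query) for x in X]
    return np.argsort(dists)[:k]

# toy data: the 2 nearest neighbors of (0.9, 0.9) are points 1 and 0
X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
nearest = knn_neighbors(X, [0.9, 0.9], k=2)
```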
SVM – Basic Concept
Find the hyperplane (decision boundary) that separates the data.
[Figure: hyperplane B1 is one possible solution; hyperplane B2 is another.]
[Figure: hyperplane B1, defined by w·x + b = 0, with margin boundaries b11 (w·x + b = 1) and b12 (w·x + b = −1)]
SVM – Basic Concept
Optimization problem:

min_w  (1/2)‖w‖²
subject to  yᵢ(w·xᵢ + b) ≥ 1,  i = 1, …, n

Decision function: f(x) = sign(w·x + b)
• w: the hyperplane's weight (normal) vector
• b: the hyperplane's intercept
• y: the class label (+1, −1)
• x: the vector of a data point's attribute values
SVM – Basic Concept
[Figure: hyperplane w·x + b = 0 with margin boundaries b11 (w·x + b = 1) and b12 (w·x + b = −1)]

f(x) = +1 if w·x + b ≥ 1;  f(x) = −1 if w·x + b ≤ −1

Margin = 2 / ‖w‖
SVM – Soft Margin
When the data cannot be separated linearly, slack variables ξ relax the margin constraints, e.g. w·x + b = −1 + ξ for a point inside the margin.

Optimization problem:

min_{w,b,ξ}  (1/2)wᵀw + C Σᵢ₌₁ˡ ξᵢ
subject to  yᵢ(b + wᵀΦ(xᵢ)) ≥ 1 − ξᵢ,  i = 1, …, l;  ξᵢ ≥ 0

Cost C: the larger C, the heavier the penalty on margin violations (training errors).
SVM – Kernel Trick
When the decision boundary is not linear, transform the data into a higher-dimensional space.

Kernel                      | Function
Linear                      | K(xᵢ, xⱼ) = xᵢ·xⱼ
Polynomial                  | K(xᵢ, xⱼ) = (γ·xᵢ·xⱼ + c)ᵈ
Radial Basis Function (RBF) | K(xᵢ, xⱼ) = exp(−γ‖xᵢ − xⱼ‖²)
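The three kernels from the table can be written directly as functions of two vectors (a sketch; the default parameter values are illustrative, not tuned values from the thesis):

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(xi, xj) = xi . xj
    return float(np.dot(xi, xj))

def polynomial_kernel(xi, xj, gamma=1.0, c=1.0, d=2):
    # K(xi, xj) = (gamma * xi . xj + c)^d
    return float((gamma * np.dot(xi, xj) + c) ** d)

def rbf_kernel(xi, xj, gamma=0.5):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))
```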
Bagging (Bootstrap Aggregating)
[Diagram: original training data D → bootstrap samples D1, D2, …, Dn−1, Dn → classifiers C1, C2, …, Cn−1, Cn → combined classifier C*]
1. Create n new training sets by sampling with replacement.
2. Train n classification models.
3. Combine (vote) the predictions.
Bagging (Bootstrap Aggregating)
Sampling with replacement, illustrated on a training set of nine records:

Original training data: 1 2 3 4 5 6 7 8 9
Bagging (1):            7 8 9 8 2 5 9 2 1
Bagging (2):            1 4 9 1 2 3 2 7 3
Bagging (3):            1 8 5 9 5 5 9 6 3
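The sampling-with-replacement step shown above can be sketched as follows (the seed and helper name are arbitrary; with a different seed the samples differ, just as Bagging (1)-(3) differ from each other):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) records with replacement from the original training data."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = [1, 2, 3, 4, 5, 6, 7, 8, 9]
samples = [bootstrap_sample(original, rng) for _ in range(3)]
# each sample has 9 records; some records repeat, some are left out
```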
Design and Implementation
• Data
• Process
• Interface
Data: Input
Two datasets are used: the Wieslaw dataset and the Australian credit approval dataset.

Dataset                    | Records (+) | Records (−) | Total Records | Attributes
Wieslaw                    | 128         | 112         | 240           | 30
Australian credit approval | 383         | 307         | 690           | 14
Data: Output
Training output: the classification model. Test output: the predictions.
• −1: bankrupt (Wieslaw) / credit denied (Australian)
• +1: not bankrupt (Wieslaw) / credit approved (Australian)
Process: Data Splitting (Cross Validation)
[Diagram: cross validation splits the dataset into n (training, test) pairs: (Training Data 1, Test Data 1) … (Training Data n, Test Data n). Each training fold is then split again by cross validation into inner training subsets (trs 1 … trs n) and inner test subsets (ts 1 … ts n), while the outer test data is held out.]
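A minimal cross-validation splitter matching the diagram might look like this (a simplified contiguous split without shuffling or stratification, which the actual experiments may have used):

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for f in range(n_folds):
        start = f * fold_size
        stop = (f + 1) * fold_size if f < n_folds - 1 else n_samples
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test

splits = list(k_fold_splits(10, 5))  # 5 folds over 10 samples
```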
Process: Training
[Diagram, built up across several slides:]
1. Bagging: from the inner training subset (trs), draw 10 bootstrap samples (Bootstrap sample 1 … Bootstrap sample 10).
2. KNN: for each bootstrap sample, select the nearest neighbors of the inner test subset (ts), yielding neighbor sets KNN 1 … KNN 10.
3. SVM training: train an SVM on each neighbor set, yielding SVM Model 1 … SVM Model 10.
Process: Testing
[Diagram, built up across several slides:]
1. SVM testing: apply each of SVM Model 1 … SVM Model 10 to the test data, yielding Prediction 1 … Prediction 10.
2. Bagging (voting): combine the ten predictions by majority vote into the final prediction.
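The training and testing stages above can be sketched end-to-end. This is a simplified reading of the diagrams, assuming scikit-learn's SVC as the base learner, neighbor selection per test point within each bootstrap sample, and sign-of-mean voting over ±1 labels; all names and parameter values here are illustrative, not the thesis implementation:

```python
import numpy as np
from sklearn.svm import SVC

def bnnsvm_predict(X_train, y_train, X_test, n_bags=10, k=4, seed=0):
    """Bagging + nearest-neighbor selection + SVM, combined by voting."""
    rng = np.random.default_rng(seed)
    X_train, y_train = np.asarray(X_train, float), np.asarray(y_train)
    X_test = np.asarray(X_test, float)
    all_preds = []
    for _ in range(n_bags):
        # 1. bootstrap sample: draw with replacement from the training set
        idx = rng.integers(0, len(X_train), size=len(X_train))
        Xb, yb = X_train[idx], y_train[idx]
        # 2. KNN: union of the k nearest bootstrap points to each test point
        d = np.linalg.norm(Xb[None, :, :] - X_test[:, None, :], axis=2)
        nn = np.unique(np.argsort(d, axis=1)[:, :k])
        Xs, ys = Xb[nn], yb[nn]
        if len(np.unique(ys)) < 2:        # SVM needs both classes present
            if len(np.unique(yb)) < 2:
                continue                  # degenerate bag, skip it
            Xs, ys = Xb, yb
        # 3. SVM training on the selected neighbors
        model = SVC(kernel="linear", C=1.0).fit(Xs, ys)
        all_preds.append(model.predict(X_test))
    # 4. bagging (voting): sign of the mean of the +/-1 predictions
    return np.sign(np.stack(all_preds).mean(axis=0))

# well-separated toy clusters: expect class -1 for the first query, +1 for the second
X_train = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y_train = [-1, -1, -1, -1, 1, 1, 1, 1]
pred = bnnsvm_predict(X_train, y_train, [[0.5, 0.5], [5.5, 5.5]])
```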
Interface
[Screenshot of the application interface]
Testing
Test scenarios: the k value (KNN), the cost value and kernel type (SVM), and a comparison with other classification methods.
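The four metrics reported in the result tables can be computed from confusion-matrix counts; a sketch assuming ±1 labels (with +1 as the positive class):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall), and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```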
Test with different k values (Wieslaw)

k  | Accuracy | Precision | Sensitivity | Specificity
1  | 65.67    | 67.88     | 69.76       | 61.67
2  | 68.92    | 71.39     | 70.61       | 67.18
3  | 68.92    | 72.41     | 69.85       | 69.78
4  | 70.58    | 72.87     | 71.99       | 68.60
5  | 70.83    | 73.82     | 71.03       | 70.25
6  | 69.42    | 72.78     | 69.06       | 69.48
7  | 70.50    | 75.30     | 68.76       | 72.50
8  | 70.58    | 73.55     | 70.41       | 70.91
9  | 70.83    | 74.39     | 70.55       | 72.58
10 | 71.58    | 73.99     | 71.86       | 71.08

[Chart: accuracy, precision, sensitivity, and specificity versus k = 1 … 10]
Test with different k values (Australian)

k  | Accuracy | Precision | Sensitivity | Specificity
1  | 84.49    | 83.91     | 81.17       | 87.22
2  | 84.90    | 83.94     | 82.35       | 87.19
3  | 84.93    | 83.78     | 82.76       | 87.04
4  | 85.74    | 84.81     | 83.02       | 87.84
5  | 85.30    | 83.19     | 84.54       | 86.28
6  | 84.43    | 83.53     | 81.88       | 86.51
7  | 85.07    | 83.08     | 83.94       | 86.16
8  | 86.23    | 85.92     | 83.06       | 88.95
9  | 85.77    | 84.96     | 82.95       | 88.10
10 | 85.22    | 84.68     | 82.34       | 87.30

[Chart: accuracy, precision, sensitivity, and specificity versus k = 1 … 10]
Test with different SVM cost values (Wieslaw)

Cost | Accuracy | Precision | Sensitivity | Specificity
0.01 | 67.58    | 68.22     | 75.51       | 59.25
0.1  | 70.17    | 72.66     | 72.12       | 66.93
1    | 71.08    | 75.12     | 71.61       | 72.17
10   | 70.33    | 72.58     | 70.85       | 69.47
100  | 71.00    | 75.25     | 70.03       | 72.87

[Chart: the four metrics versus cost = 0.01 … 100]
Test with different SVM cost values (Australian)

Cost | Accuracy | Precision | Sensitivity | Specificity
0.01 | 83.51    | 86.86     | 74.50       | 90.70
0.1  | 85.54    | 82.51     | 85.55       | 85.44
1    | 84.64    | 83.64     | 82.09       | 86.56
10   | 80.72    | 81.68     | 73.79       | 86.27
100  | 75.80    | 78.57     | 65.46       | 83.85

[Chart: the four metrics versus cost = 0.01 … 100]
Test with the RBF kernel and different gamma values (Wieslaw)

Gamma  | Accuracy | Precision | Sensitivity | Specificity
0.0001 | 54.17    | 58.36     | 73.19       | 35.07
0.001  | 58.75    | 60.59     | 65.61       | 50.55
0.01   | 56.83    | 57.28     | 74.14       | 37.45
0.1    | 54.50    | 54.36     | 89.48       | 14.33
1      | 53.25    | 53.30     | 99.87       | 0
10     | 52.67    | 51        | 96          | 4

[Chart: the four metrics versus gamma = 0.0001 … 10]
Test with the RBF kernel and different gamma values (Australian)

Gamma  | Accuracy | Precision | Sensitivity | Specificity
0.0001 | 67.88    | 68.43     | 52.17       | 80.79
0.001  | 68.81    | 64.98     | 64.13       | 72.45
0.01   | 56.06    | 51.05     | 14.67       | 89.41
0.1    | 55.04    | 16        | 0.4         | 98.89
1      | 55.51    | 0         | 0           | 100
10     | 55.51    | 0         | 0           | 100

[Chart: the four metrics versus gamma = 0.0001 … 10]
Test with the Polynomial kernel and different degree values (Wieslaw)

Degree | Accuracy | Precision | Sensitivity | Specificity
1      | 65.25    | 66.92     | 71.77       | 58.96
2      | 71.33    | 72.34     | 74.22       | 67.80
3      | 69.67    | 72.22     | 69.36       | 69.95
4      | 68.08    | 71.42     | 68.64       | 68.47
5      | 70.08    | 72.25     | 71.11       | 69.32

[Chart: the four metrics versus degree = 1 … 5]
Test with the Polynomial kernel and different degree values (Australian)

Degree | Accuracy | Precision | Sensitivity | Specificity
1      | 79.83    | 89.47     | 62.87       | 93.85
2      | 80.32    | 85.79     | 67.64       | 90.48
3      | 72.35    | 75.74     | 62.65       | 79.87
4      | 60.09    | 64.66     | 62.55       | 57.88
5      | 57.97    | 64.90     | 52.60       | 62.46

[Chart: the four metrics versus degree = 1 … 5]
Test comparing against other classification methods (Wieslaw)

Method | Accuracy | Precision | Sensitivity | Specificity
KNN    | 75.00    | 76.22     | 78.04       | 73.03
ANN    | 70.00    | 70.00     | 70.00       | 70.00
SVM    | 70.42    | 74.29     | 69.91       | 74.82
BLR    | 87.54    | 90.68     | 86.42       | 11.07
BNNSVM | 71.58    | 73.99     | 71.86       | 71.08

KNN = K-Nearest Neighbor; ANN = Artificial Neural Network; SVM = Support Vector Machine; BLR = Binary Logistic Regression; BNNSVM = Bagging Nearest Neighbor Support Vector Machine

[Chart: the four metrics per method]
Test comparing against other classification methods (Australian)

Method | Accuracy | Precision | Sensitivity | Specificity
KNN    | 83.19    | 80.90     | 80.50       | 85.20
ANN    | 83.48    | 83.50     | 85.11       | 87.94
SVM    | 84.35    | 77.03     | 93.44       | 77.92
BLR    | 80.83    | 81.73     | 75.89       | 14.84
BNNSVM | 86.23    | 85.92     | 83.06       | 88.95

[Chart: the four metrics per method]
Conclusion
BNNSVM was implemented and tested for bankruptcy prediction, with the following results:

Dataset                    | Accuracy | Precision | Sensitivity | Specificity
Wieslaw                    | 71.58 %  | 73.99 %   | 71.86 %     | 71.08 %
Australian credit approval | 86.23 %  | 85.92 %   | 83.06 %     | 88.95 %
References
• Li, H., & Sun, J. (2011). Forecasting Business Failure: The Use of Nearest-Neighbour Support Vectors and Correcting Imbalanced Samples - Evidence from Chinese Hotel Industry. Tourism Management, 33(3), 622-634.
• Frank, A., & Asuncion, A. (2010). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. Retrieved from http://archive.ics.uci.edu/ml
• Wieslaw, P. (2004). Application of Discrete Predicting Structures in An Early Warning Expert System for Financial Distress.
• Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining (4th ed.). Boston: Pearson Addison Wesley.
Thank you