Linear Combination of Target Data for Multi-target Regression Using Principal Component Analysis

Author

  • Yonathan Purbo Santosa, Universitas Katolik Soegijapranata

DOI:

https://doi.org/10.54914/jtt.v9i1.516

Keywords:

PCA, dimensionality reduction, linear regression, multidimensional regression, multi-target regression

Abstract

Linear regression is a method for predicting a value (the dependent variable) from several inputs (the independent variables). One problem with linear regression is that some data do not fall into the linear category. A method named RLC (random linear target combinations) was devised to find correlations among the output data by projecting them into a higher-dimensional space. Unfortunately, the RLC transformation cannot be inverted. In addition, projecting the data into a higher dimension increases the complexity of the learning algorithm. Therefore, PCA is used to address this problem by projecting the data into a lower dimension while retaining the ability to invert the projection. This study was implemented in the Python programming language, using the scikit-learn library to build the regression models and transform the data. Across 12 datasets, the PCA augmentation method achieved a lower error than RLC on 7 datasets, with an average error of 0.3270 for PCA augmentation versus 0.4003 for RLC augmentation.
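The target-reduction scheme described in the abstract can be sketched with scikit-learn. This is a minimal illustration, assuming synthetic data and illustrative variable names rather than the paper's actual datasets or pipeline: the targets are projected to a lower dimension with PCA, a linear model is fit on the reduced targets, and the predictions are mapped back via the inverse projection.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                       # independent variables
Y = X @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(100, 3))   # 3 correlated targets

# Project the targets into a lower-dimensional space (here 2 components).
pca = PCA(n_components=2)
Z = pca.fit_transform(Y)

# Fit a single linear model on the reduced targets.
model = LinearRegression().fit(X, Z)

# Predict in the reduced space, then invert the projection to recover
# predictions for the original targets -- the step that RLC cannot perform.
Y_pred = pca.inverse_transform(model.predict(X))
print(Y_pred.shape)  # (100, 3)
```

Because PCA's projection matrix is orthonormal, `inverse_transform` maps predictions back to the original target space directly, whereas a random projection to a higher dimension has no such inverse.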

Downloads

Download data is not yet available.

References

C. Wallisch et al., “Review of guidance papers on regression modeling in statistical series of medical journals,” PLoS One, vol. 17, no. 1, p. e0262918, Jan. 2022, doi: 10.1371/journal.pone.0262918.

Anila. M and G. Pradeepini, “Least Square Regression for Prediction Problems in Machine Learning using R,” International Journal of Engineering & Technology, vol. 7, no. 3.12, pp. 960–962, Jul. 2018, doi: 10.14419/ijet.v7i3.12.17612.

T. Nabarian, M. Aris Ganiardi, and R. F. Malik, “Implementasi Metode Hibrid Fuzzy C-Means dan Fuzzy Swarm untuk Pengelompokkan Data Benang Perusahaan Tekstil,” Jurnal Teknologi Terpadu, vol. 6, no. 1, pp. 39–45, Jul. 2020, doi: 10.54914/jtt.v6i1.247.

Carudin, “Pemanfaatan Data Transaksi untuk Dasar membangun Strategi berdasarkan Karakteristik Pelanggan dengan Algoritma K-Means Clustering dan Model RFM,” Jurnal Teknologi Terpadu, vol. 7, no. 1, pp. 7–14, Jul. 2021, doi: 10.54914/jtt.v7i1.318.

J. N. Hussain, “High dimensional data challenges in estimating multiple linear regression,” J Phys Conf Ser, vol. 1591, no. 1, p. 12035, 2020, doi: 10.1088/1742-6596/1591/1/012035.

I. H. Sarker, “Machine Learning: Algorithms, Real-World Applications and Research Directions,” SN Comput Sci, vol. 2, no. 3, pp. 1–21, May 2021, doi: 10.1007/s42979-021-00592-x.

S. Jameel and S. Schockaert, “Modeling context words as regions: An ordinal regression approach to word embedding,” CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings, pp. 123–133, 2017, doi: 10.18653/V1/K17-1014.

S. A. T. al Azhima, D. Darmawan, N. F. A. Hakim, I. Kustiawan, M. al Qibtiya, and N. S. Syafei, “Hybrid Machine Learning Model untuk memprediksi Penyakit Jantung dengan Metode Logistic Regression dan Random Forest,” Jurnal Teknologi Terpadu, vol. 8, no. 1, pp. 40–46, Jul. 2022, doi: 10.54914/jtt.v8i1.539.

M. Bataineh and T. Marler, “Neural network for regression problems with reduced training sets,” Neural Networks, vol. 95, pp. 1–9, 2017, doi: 10.1016/j.neunet.2017.07.018.

S. Lathuilière, P. Mesejo, X. Alameda-Pineda, and R. Horaud, “A Comprehensive Analysis of Deep Regression,” IEEE Trans Pattern Anal Mach Intell, vol. 42, no. 9, pp. 2065–2081, 2020, doi: 10.1109/TPAMI.2019.2910523.

D. Rügamer et al., “deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression,” arXiv:2104.02705 [cs, stat], 2021, [Online]. Available: http://arxiv.org/abs/2104.02705

G. Tsoumakas, E. Spyromitros-Xioufis, A. Vrekou, and I. Vlahavas, “Multi-Target Regression via Random Linear Target Combinations,” arXiv:1404.5065 [cs], vol. 8726, pp. 225–240, 2014, doi: 10.1007/978-3-662-44845-8_15.

Q. Zhao, E. Adeli, N. Honnorat, T. Leng, and K. M. Pohl, “Variational AutoEncoder for Regression: Application to Brain Aging Analysis,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11765 LNCS, pp. 823–831, 2019, doi: 10.1007/978-3-030-32245-8_91.

Muthukrishnan R and Maryam Jamila S, “Predictive Modeling Using Support Vector Regression,” International Journal of Scientific & Technology Research, vol. 9, no. 2, pp. 4863–4865, Feb. 2020, Accessed: Dec. 06, 2022. [Online]. Available: www.ijstr.org

K. Berggren et al., “Roadmap on emerging hardware and technology for machine learning,” Nanotechnology, vol. 32, no. 1, p. 012002, Oct. 2020, doi: 10.1088/1361-6528/aba70f.

W. Chiang, X. Liu, T. Zhang, and B. Yang, “A Study of Exact Ridge Regression for Big Data,” Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018, pp. 3821–3830, Jan. 2019, doi: 10.1109/bigdata.2018.8622274.

D. Xu, Y. Shi, I. W. Tsang, Y.-S. Ong, C. Gong, and X. Shen, “A Survey on Multi-output Learning,” Jan. 2019, doi: 10.48550/arxiv.1901.00248.

P. Boye, D. Mireku-Gyimah, and C. A. Okpoti, “Multiple Linear Regression Model for Estimating the Price of a Housing Unit,” Ghana Mining Journal, vol. 17, no. 2, pp. 66–77, 2017, doi: 10.4314/gm.v17i2.9.

N. Herawati, K. Nisa, E. Setiawan, N. Nusyirwan, and T. Tiryono, “Regularized multiple regression methods to deal with severe multicollinearity,” Int J Stat Appl, vol. 8, no. 4, pp. 167–172, 2018.

O. Eguasa, E. Edionwe, and J. I. Mbegbu, “Local Linear Regression and the problem of dimensionality: a remedial strategy via a new locally adaptive bandwidths selector,” Journal of Applied Statistics, 2022, doi: 10.1080/02664763.2022.2026895.

Y. Xu, S. Balakrishnan, A. Singh, and A. Dubrawski, “Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information,” Journal of Machine Learning Research, vol. 21, no. 162, pp. 1–54, 2020, [Online]. Available: http://jmlr.org/papers/v21/19-505.html

T. Górecki and M. Łuczak, “Stacked Regression With a Generalization of the Moore-Penrose Pseudoinverse,” Statistics in Transition New Series, vol. 18, no. 3, pp. 443–458, 2017, doi: 10.21307/stattrans-2016-080.

S. Katoch, S. S. Chauhan, and V. Kumar, “A review on genetic algorithm: past, present, and future,” Multimed Tools Appl, vol. 80, no. 5, pp. 8091–8126, Feb. 2021, doi: 10.1007/s11042-020-10139-6.

F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

C. R. Harris et al., “Array programming with NumPy,” Nature, vol. 585, no. 7825, pp. 357–362, Sep. 2020, doi: 10.1038/s41586-020-2649-2.

A. Asuncion and D. Newman, “UCI Machine Learning Repository: Solar Flare Dataset,” 1989. [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Solar+Flare (accessed Jun. 05, 2022).

S. Džeroski, D. Demsar, and J. Grbović, “Predicting Chemical Parameters of River Water Quality from Bioindicator Data,” Applied Intelligence, vol. 13, pp. 7–17, 2000, doi: 10.1023/A:1008323212047.

E. Spyromitros-Xioufis, G. Tsoumakas, W. Groves, and I. Vlahavas, “Multi-target regression via input space expansion: treating targets as inputs,” Mach Learn, vol. 104, no. 1, pp. 55–98, 2016, doi: 10.1007/s10994-016-5546-z.

A. Karalic and I. Bratko, “First Order Regression,” Mach Learn, vol. 26, pp. 147–176, 1997, doi: 10.1023/A:1007365207130.


Published

2023-07-04

How to Cite

[1]
Y. P. Santosa, “Kombinasi Linier Target Data Untuk Regresi Multitarget Menggunakan Principal Component Analysis”, j. teknologi terpadu, vol. 9, no. 1, pp. 01–09, Jul. 2023.

Issue

Section

Articles