Principal Component Analysis of Diabetes Data
Abstract
This study addresses the large number of mutually correlated variables in the data, which makes the data structure difficult to interpret. The aim is to reduce the dimensionality of these correlated variables and to gain a clearer understanding of the data structure. The Diabetes Data set consists of 768 samples with 8 independent variables and 1 dependent variable. The analysis steps are: determining the number of principal components, applying Bartlett's test and the Kaiser-Meyer-Olkin (KMO) test to confirm that the data are suitable for the analysis, computing the principal component coefficients, and visualizing the results in principal component analysis (PCA) plots. The results show that 5 principal components capture more than 80% of the variance in the data, and that the relationships among the observed variables are varied.
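The analysis steps listed in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (768 rows, 8 correlated columns) standing in for the Diabetes Data, the PCA is assumed to be computed on the correlation matrix, and the function names (`bartlett_sphericity`, `kmo`, `pca_correlation`, `n_components_for`) are hypothetical. Only NumPy is used, so Bartlett's test returns the chi-square statistic and its degrees of freedom; the p-value would come from the chi-square distribution.

```python
import numpy as np


def bartlett_sphericity(X):
    """Bartlett's test of sphericity: chi-square statistic and degrees of freedom."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # log of det(R), numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale                  # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # mask for off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)


def pca_correlation(X):
    """PCA on the correlation matrix: eigenvalues, coefficients, variance shares, scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    scores = Z @ eigvecs
    return eigvals, eigvecs, explained, scores


def n_components_for(explained, threshold=0.80):
    """Smallest number of components whose cumulative variance share reaches the threshold."""
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, threshold) + 1)


# Synthetic stand-in for the data: 768 samples, 8 correlated variables (assumption).
rng = np.random.default_rng(0)
latent = rng.normal(size=(768, 3))
loadings = rng.normal(size=(3, 8))
X = latent @ loadings + rng.normal(size=(768, 8))

chi2, dof = bartlett_sphericity(X)
kmo_value = kmo(X)
eigvals, eigvecs, explained, scores = pca_correlation(X)
k = n_components_for(explained, threshold=0.80)

print("Bartlett chi-square:", chi2, "with", dof, "degrees of freedom")
print("KMO:", kmo_value)
print("Cumulative variance share:", np.cumsum(explained))
print("Components needed for 80%:", k)
```

The columns of `eigvecs` play the role of the principal component coefficients, and the first two columns of `scores` could be plotted to obtain a PCA score plot. With the real data, the value of `k` would be compared against the 5 components reported in the abstract.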
