Optimizing Breast Cancer Prediction by Applying Machine Learning

  • Vina Nurmadani, Sumatera Institute of Technology
  • Indah Suciati
  • Yoga Aji Sukma
  • Linda Rassiyanti
Keywords: ANN; breast cancer; machine learning; SVM; XGBoost

Abstract

In 2015, breast cancer ranked among the most prevalent and fatal cancers affecting women globally. Artificial intelligence is urgently needed to help medical professionals make more accurate decisions, reduce overdiagnosis, and streamline the diagnostic process. This study implements and compares selected machine learning algorithms, namely SVM, XGBoost, and ANN, using various combinations of input parameters on the breast cancer dataset. Accuracy, precision, recall, and F1-score were used to evaluate and compare the algorithms. The results show that the best model for predicting breast cancer is SVM using 8 parameters, excluding the mitosis parameter: Clump Thickness, Cell Size Uniformity, Cell Shape Uniformity, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, and Normal Nucleoli. This model achieves an accuracy of 0.96 and a sensitivity of 0.98, and it can help medical professionals predict this chronic disease so that it can be treated quickly and accurately.
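The four evaluation metrics named in the abstract can be illustrated with a minimal sketch that derives them from the counts of a binary confusion matrix. The function name `classification_metrics` and the labels below are hypothetical and made up for illustration; they are not the study's code or data. Recall here is the same quantity the abstract reports as sensitivity.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against a zero denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (assumed coding: 1 = malignant, 0 = benign).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting, recall is often the metric to watch most closely, because a false negative means a malignant case goes undetected.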

Published
2025-08-12
How to Cite
Vina Nurmadani, Indah Suciati, Yoga Aji Sukma, & Linda Rassiyanti. (2025). Optimizing Breast Cancer Prediction by Applying Machine Learning. Sciencestatistics: Journal of Statistics, Probability, and Its Application, 3(2), 86–91. https://doi.org/10.24127/sciencestatistics.v3i2.9667