Research on prediction model of breast cancer based on LDA and XGBoost algorithm

Corresponding author: yanjunfeng, junfengyan@hnucm.edu.cn
DOI: 10.12201/bmr.202106.00007
Statement: This article is a preprint and has not been peer-reviewed. It reports new research that has yet to be evaluated and so should not be used to guide clinical practice.

    Abstract: Breast cancer is the leading cause of cancer death in women, and the number of male breast cancer patients cannot be ignored. Using information technology to predict the disease is therefore an important way to improve the diagnosis rate. This experiment performs dimensionality reduction on the multi-index features of the breast cancer dataset provided by Kaggle and analyzes 30-dimensional medical test indexes for 498 groups of breast cancer patients. Linear discriminant analysis (LDA) is used to merge the feature attributes and project the data into a low-dimensional space. An eXtreme Gradient Boosting (XGBoost) prediction model is then constructed, with grid search and cross-validation used to obtain the optimal parameters, and AdaBoost, random forest, and naive Bayes serve as comparison classifiers. The experimental results show that the classification accuracy of the prediction models trained after dimensionality reduction is 2.7% higher than before dimensionality reduction, and that the XGBoost model achieves the best classification performance, reaching 98.7%.
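
The workflow described in the abstract (LDA projection, then a boosted-tree classifier tuned by grid search with cross-validation) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses scikit-learn's bundled Wisconsin breast cancer data (569 samples, 30 features) as a stand-in for the Kaggle dataset, and the train/test split, the parameter grid, and the names `pipeline`, `param_grid`, and `search` are all hypothetical choices. If the `xgboost` package is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different gradient boosting implementation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

try:
    # Preferred: the XGBoost scikit-learn wrapper.
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Fallback (assumption): a different gradient boosting implementation.
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Stand-in data: 30 numeric features, binary benign/malignant label.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# With two classes, LDA can project onto at most one discriminant axis.
pipeline = Pipeline([
    ("lda", LinearDiscriminantAnalysis(n_components=1)),
    ("clf", Booster(random_state=0)),
])

# Hypothetical grid; the preprint does not list the values it searched.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [2, 3],
    "clf__learning_rate": [0.05, 0.1],
}

# 5-fold cross-validated grid search over the whole pipeline, so LDA is
# refit inside each fold and never sees the validation split.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Project the held-out data with the LDA step of the best pipeline.
reduced = search.best_estimator_.named_steps["lda"].transform(X_test)
test_accuracy = search.score(X_test, y_test)

print("best parameters:", search.best_params_)
print("reduced shape:", reduced.shape)
print("held-out accuracy:", round(test_accuracy, 3))
```

Placing LDA inside the pipeline keeps the projection from being fitted on validation data during the search. The accuracy printed here depends on the stand-in data and split, so it should not be expected to match the figures reported in the abstract.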

    Key words: breast cancer; dimension reduction; LDA; XGBoost; classification

    Submit time: 7 March 2022

    Copyright: The copyright holder for this preprint is the author/funder, who has granted biomedRxiv a license to display the preprint in perpetuity.
  • HUANG Yucheng, YANG Xuming, QIAO Qiong. Establishing a model based on data mining for predicting the recurrence factor of breast cancer. 2020. doi: 10.12201/bmr.202009.00011

    Zhan Haixia, Hu Dong, Zhang Wenting, Gu Ying. Effect of cluster nursing mode on shoulder function recovery and quality of life of patients with breast cancer after modified radical mastectomy. 2020. doi: 10.12201/bmr.202004.00015

    Zhu Xiaoxiao, Qian Aibing. Analysis of Network Attention Characteristics of Breast Cancer Prevention and Treatment Health Information Based on Baidu Index. 2020. doi: 10.12201/bmr.201906.00001

    LiYu, Yang Tao, Hu Kongfa. Research on Medication Rules of Famous TCM Physicians in Treating Lung Cancer Based on Hierarchical Community Partition Algorithm. 2021. doi: 10.12201/bmr.202110.00020

    Feng Li, Yue Xiaofei. A Comparative Study on the Accuracy of Nine Combined Machine Learning Algorithms in Early Diagnosis of Tumors Based on High-dimensional Data. 2021. doi: 10.12201/bmr.202108.00016

    xie jia qi. Leveraging Pre-trained Language Model for Consumer Health Question Classification. 2021. doi: 10.12201/bmr.202101.00017

    Xu Xiaowei, Guo Haihong, Li Jiao. Evaluating Data Mining Algorithms for Consumer Health-Related Question Classification. 2021. doi: 10.12201/bmr.202101.00018

    kangyishuai, shaochenjie. An Algorithm for Generating TCM Document Questions Based on Unified Language Model. 2022. doi: 10.12201/bmr.202110.00044

    MU Jun, XIAO xiaoxia, LIU Qingping. PBL and Capability-oriented Exploration of Teaching Computational Thinking and Algorithm Design. 2021. doi: 10.12201/bmr.202108.00015

    Guo Mengying, Zhou Yi, He Jingshu, Pan Jiaxin, Sun JingKai, Huang Wei. Research on Function Classification of WeChat Official Account Service Platform of Traditional Chinese Medicine Hospital Based on Card Classification. 2020. doi: 10.12201/bmr.202010.00833

  • ID  Submit time  Number
    2   2021-09-15   bmr.202106.00007V2
    1   2021-06-29   bmr.202106.00007V1
Get Citation

ruanxuling, liuqi, guo zhiheng, yanjunfeng. Research on prediction model of breast cancer based on LDA and XGBoost algorithm. 2022. biomedRxiv.202106.00007

Article Metrics

  • Read: 1007
  • Download: 13
  • Comment: 0
