
Leveraging Pre-trained Language Model for Consumer Health Question Classification

Corresponding author: xie jia qi, 66350354@qq.com
DOI: 10.12201/bmr.202101.00017
Statement: This article is a preprint and has not been peer-reviewed. It reports new research that has yet to be evaluated and so should not be used to guide clinical practice.

    Abstract: Data mining has recently been applied in a wide range of practical scenarios, especially in the smart medical field. Medical data mining algorithms, such as health question classification, are crucial for making full use of medical data. Health question classification aims to identify the type of question expressed in a given sentence, and accurately distinguishing between question types is important for smart medical applications. However, the medical data currently available on the web is unstructured and non-standardized. Because these data are unlabeled, it is hard to extract useful information from them, and without high-quality labeled data it is difficult to train a good classifier. In this paper, we leverage several pre-trained language models for the health question classification task, including BERT-base, BERT-wwm and RoBERTa. By fine-tuning these models on the labeled data, we obtain neural classifiers for the task. However, fine-tuning pre-trained models can produce unstable results, which may have a negative influence when the models are applied in practice, so we employ adversarial training to improve the stability of our models. In addition, category_C is rare in the training set, so we design a rule-based method to detect it, integrating neural and human knowledge and further improving the model's performance. The experimental results show that our method achieves good performance on the leaderboard.

    Key words: Consumer Health, Question Classification, Deep Learning, Pre-trained Language Model, Adversarial Training

    Submitted: 27 May 2021

    Copyright: The copyright holder for this preprint is the author/funder, who has granted biomedRxiv a license to display the preprint in perpetuity.
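
The abstract combines three ingredients: a fine-tuned classifier, adversarial training for stability, and a rule that takes priority for the rare category_C. The abstract does not say which adversarial method was used, so the sketch below assumes FGM (Fast Gradient Method), which perturbs the input embeddings by a small step along the normalized loss gradient and trains on the clean and perturbed inputs together; this is a technique often paired with BERT-style fine-tuning, not necessarily the authors' exact method. To stay self-contained it replaces the pre-trained encoder with a toy mean-pooled embedding classifier in NumPy. The label names, the category_C keywords, the toy data and the hyperparameters (lr, eps) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a fine-tuned encoder (assumption): 6 token ids, 4-dim embeddings, 3 classes.
VOCAB, DIM, N_CLASSES = 6, 4, 3
LABELS = ("category_A", "category_B", "category_C")  # hypothetical label names
E = rng.normal(scale=0.1, size=(VOCAB, DIM))  # token embedding table (trainable)
W = np.zeros((DIM, N_CLASSES))                # linear classification head (trainable)


def forward(emb, W):
    """Mean-pool the token embeddings and return (pooled vector, class probabilities)."""
    h = emb.mean(axis=0)
    logits = h @ W
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return h, p / p.sum()


def loss_and_grads(emb, W, y):
    """Cross-entropy loss plus its gradients w.r.t. W and the token embeddings."""
    h, p = forward(emb, W)
    d_logits = p.copy()
    d_logits[y] -= 1.0  # dL/dlogits for softmax + cross-entropy
    dW = np.outer(h, d_logits)
    d_h = W @ d_logits
    # Mean pooling spreads the gradient evenly over the n tokens.
    d_emb = np.tile(d_h / emb.shape[0], (emb.shape[0], 1))
    return -np.log(p[y]), dW, d_emb


def fgm_step(tokens, y, lr=0.5, eps=0.1):
    """One training step on the clean loss plus an FGM adversarial loss."""
    emb = E[tokens]  # fancy indexing returns a copy
    loss, dW, d_emb = loss_and_grads(emb, W, y)
    # FGM: shift the embeddings by eps along the L2-normalized loss gradient.
    norm = np.linalg.norm(d_emb)
    r_adv = eps * d_emb / norm if norm > 0 else np.zeros_like(d_emb)
    adv_loss, dW_adv, d_emb_adv = loss_and_grads(emb + r_adv, W, y)
    # Update in place with the summed clean + adversarial gradients.
    W[...] -= lr * (dW + dW_adv)
    np.add.at(E, tokens, -lr * (d_emb + d_emb_adv))
    return loss, adv_loss


# Hypothetical trigger phrases for the rare category_C; the real rules are not given in the abstract.
CATEGORY_C_KEYWORDS = ("side effect", "adverse reaction")


def predict(text, tokens):
    """Apply the rule for the rare class first, otherwise use the neural prediction."""
    if any(k in text.lower() for k in CATEGORY_C_KEYWORDS):
        return "category_C"
    _, p = forward(E[tokens], W)
    return LABELS[int(np.argmax(p))]


# Toy training data: (token ids, class index).
data = [([0, 1], 0), ([2, 3], 1), ([4, 5], 2)]
for epoch in range(200):
    for tokens, y in data:
        fgm_step(tokens, y)

preds = [int(np.argmax(forward(E[t], W)[1])) for t, _ in data]
print(preds)
print(predict("Any side effect of aspirin?", [0, 1]))
```

Checking the rule before the classifier reflects the motivation stated in the abstract: a class with few training examples is hard for a fine-tuned model to learn, so hand-written knowledge takes priority for it. In a real system the toy encoder would be replaced by BERT-base, BERT-wwm or RoBERTa, with the perturbation applied to their word-embedding layer.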

  • Version history: version 1 (bmr.202101.00017V1), submitted 2021-01-14

Citation

xie jia qi. Leveraging Pre-trained Language Model for Consumer Health Question Classification. 2021. biomedRxiv.202101.00017

