Leveraging Pre-trained Language Model for Consumer Health Question Classification

xie jia qi¹

1. The 8th Medical Centre of PLA General Hospital

Corresponding author: xie jia qi, 66350354@qq.com

DOI: 10.12201/bmr.202101.00017

Statement: This article is a preprint and has not been peer-reviewed. It reports new research that has yet to be evaluated and so should not be used to guide clinical practice.

Abstract: Data mining has recently been applied widely in many practical scenarios, especially in the smart medical field. Data mining algorithms for medicine are crucial for making full use of medical data; health question classification is one example. Health question classification aims to identify the type of question expressed in a given sentence, and accurately distinguishing among question types is important for smart medical applications. Current medical data on the web is unstructured and non-standardized. Because this data is unlabeled, it is hard to discover useful information in it, and without high-quality labeled data, training a good classifier is difficult. In this paper, we leverage several pre-trained language models to solve the health question classification task, including BERT-base, BERT-wwm and RoBERTa. By fine-tuning these pre-trained models on the labeled data, we obtain neural classifiers for the task. However, fine-tuning pre-trained models can produce unstable results, which may have a negative effect in practical applications. Inspired by adversarial training, we employ this technique to improve the stability of our model. Meanwhile, because category_C is rare in the training set, we design a rule-based method to detect category_C, integrating neural and human knowledge at the same time, which further improves the model's performance. The experimental results show that our method achieves good performance on the leaderboard.
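The abstract does not say which adversarial-training variant was used. As a hedged illustration only, the sketch below assumes the Fast Gradient Method (FGM) that is commonly applied to text classifiers: the input embedding is perturbed along its L2-normalized loss gradient, and the parameter update uses the sum of the clean and adversarial gradients. To stay self-contained it replaces BERT/RoBERTa with a toy logistic-regression classifier over a fixed embedding vector; the names `fgm_step`, `epsilon` and `lr` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_loss(p: float, y: int) -> float:
    # Binary cross-entropy; p is clipped to avoid log(0)
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def forward(w: List[float], b: float, x: List[float]) -> float:
    # Probability of the positive class for embedding x
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgm_step(w: List[float], b: float, x: List[float], y: int,
             epsilon: float = 0.5, lr: float = 0.1
             ) -> Tuple[List[float], float, float, float]:
    """One FGM-style training step on a single (embedding, label) pair.

    Returns the updated weights, updated bias, clean loss and
    adversarial loss (both losses measured before the update)."""
    # 1) Clean forward pass; for sigmoid + BCE, dLoss/dlogit = p - y
    p = forward(w, b, x)
    d = p - y
    grad_w = [d * xi for xi in x]
    grad_b = d
    grad_x = [d * wi for wi in w]  # gradient w.r.t. the input embedding

    # 2) FGM perturbation r = epsilon * g / ||g||_2 (zero if g is zero)
    norm = math.sqrt(sum(g * g for g in grad_x))
    r = [epsilon * g / norm for g in grad_x] if norm > 0 else [0.0] * len(x)
    x_adv = [xi + ri for xi, ri in zip(x, r)]

    # 3) Adversarial pass with r held constant; its gradient is added to the clean one
    p_adv = forward(w, b, x_adv)
    d_adv = p_adv - y
    grad_w = [gw + d_adv * xa for gw, xa in zip(grad_w, x_adv)]
    grad_b += d_adv

    # 4) Gradient-descent update with the summed clean + adversarial gradients
    new_w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
    new_b = b - lr * grad_b
    return new_w, new_b, bce_loss(p, y), bce_loss(p_adv, y)
```

In a real BERT or RoBERTa fine-tuning loop, FGM is usually applied to the word-embedding matrix instead: back-propagate the clean loss, add the perturbation to the embeddings, back-propagate the adversarial loss, restore the original embeddings, and only then take the optimizer step. Training on these worst-case neighbours is what is expected to make fine-tuning less sensitive to small input changes.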
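The abstract states that a rule-based detector for the rare category_C is combined with the neural classifier, but it does not give the rules or the combination strategy. A minimal sketch of one plausible integration, assuming a high-precision keyword rule that overrides the neural prediction whenever it fires: the pattern list `CATEGORY_C_PATTERNS`, the functions `rule_says_category_c` and `combine_predictions`, and the labels `category_A` and `category_B` are hypothetical placeholders, not the author's actual rules.

```python
import re
from typing import Dict, List

# Hypothetical placeholder patterns; the real category_C rules are not given in the source
CATEGORY_C_PATTERNS: List[re.Pattern] = [
    re.compile(r"\bkeyword_one\b", re.IGNORECASE),
    re.compile(r"\bkeyword two\b", re.IGNORECASE),
]

def rule_says_category_c(question: str) -> bool:
    # The rule fires if any hand-written pattern matches the question text
    return any(p.search(question) for p in CATEGORY_C_PATTERNS)

def combine_predictions(question: str, neural_probs: Dict[str, float]) -> str:
    """Return category_C when the rule fires, otherwise the neural argmax."""
    if rule_says_category_c(question):
        return "category_C"
    return max(neural_probs, key=neural_probs.get)

probs = {"category_A": 0.7, "category_B": 0.25, "category_C": 0.05}
print(combine_predictions("Is keyword_one safe?", probs))        # category_C
print(combine_predictions("What does the report mean?", probs))  # category_A
```

Letting the rule take precedence suits a class with too few training examples for the neural model to learn reliably; the trade-off is that every false positive of the rule directly becomes a misclassification, so such rules are typically kept narrow.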

Key words: Consumer Health, Question Classification, Deep Learning, Pre-trained Language Model, Adversarial Training

Submit time: 27 May 2021

Copyright: The copyright holder for this preprint is the author/funder, who has granted biomedRxiv a license to display the preprint in perpetuity.

Version history: V1 (bmr.202101.00017V1), submitted 2021-01-14.

Citation: xie jia qi. Leveraging Pre-trained Language Model for Consumer Health Question Classification. 2021. biomedRxiv.202101.00017