Evaluating Data Mining Algorithms for Consumer Health-Related Question Classification

Xu Xiaowei, Guo Haihong, Li Jiao

Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College

Corresponding author: Li Jiao, li.jiao@imicams.ac.cn

DOI: 10.12201/bmr.202101.00018

Statement: Papers posted on this preprint system are intended only for the exchange and sharing of the latest research results and have not been peer reviewed; they are therefore not recommended for direct use in guiding clinical practice.

Abstract: To help data mining algorithms better support intelligent medical information systems, it is important to evaluate their performance on specific medical problems. This study set up an algorithm evaluation task for consumer health-related question classification, a key part of understanding consumer health questions in Internet-based medical consultation. Using random sampling for data collection and cross-validation for annotation, this study constructed a corpus of 8,000 Chinese health-related questions and defined evaluation metrics suited to the task. The task attracted 396 registered teams from academia and industry, 149 of which submitted results for online evaluation. The best-performing team achieved a macro-averaged F1 score of 0.755 on the independent test set. The corpus, evaluation metrics, and automatic evaluation scripts are openly accessible for scientific research.

Key words: Consumer Health; Question Classification; Algorithm Evaluation
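The macro-averaged F1 score reported in the abstract can be illustrated with a minimal sketch. This is not the official evaluation script of the task. It assumes each question carries a set of one or more category labels, and the label names (`diagnosis`, `treatment`, `lifestyle`) and the function name `macro_f1` are hypothetical, chosen only for illustration. The sketch computes F1 for each class separately and then takes the unweighted mean, so rare classes count as much as frequent ones.

```python
from typing import Hashable, List, Set


def macro_f1(gold: List[Set[Hashable]], pred: List[Set[Hashable]]) -> float:
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Collect every class that appears in either the gold or predicted labels.
    classes = set().union(*gold, *pred)
    if not classes:
        return 0.0

    f1_scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if c in g and c in p)
        fp = sum(1 for g, p in zip(gold, pred) if c not in g and c in p)
        fn = sum(1 for g, p in zip(gold, pred) if c in g and c not in p)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for four questions (assumption: not taken from the corpus).
gold = [{"diagnosis"}, {"treatment"}, {"treatment"}, {"lifestyle"}]
pred = [{"diagnosis"}, {"treatment"}, {"diagnosis"}, {"lifestyle"}]
print(round(macro_f1(gold, pred), 4))  # per-class F1: 2/3, 2/3, 1 -> 0.7778
```

Because the score is averaged over classes rather than over questions, a system that ignores a small category is penalized as heavily as one that fails on a large category, which is one common reason to prefer macro-F1 on imbalanced label sets.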

Submitted: 2021-05-27

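Copyright statement: The authors hold sole copyright in this paper; the preprint system holds only the right to archive it permanently. No one may reuse it without permission.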

No.	Submission date	Version ID
1	2021-01-18	bmr.202101.00018V1

Citation format

Xu Xiaowei, Guo Haihong, Li Jiao. Evaluating Data Mining Algorithms for Consumer Health-Related Question Classification. 2021. biomedRxiv.202101.00018
