CMedCausal: A Dataset for Chinese Medical Causal Relation Extraction

Corresponding author: Chen Mosha (陈漠沙), chenmosha.cms@alibaba-inc.com

    Abstract: Modern medicine places strong emphasis on interpretability: doctors are expected to give reasonable, well-founded diagnoses when diagnosing patients. Online consultation records contain many explanations of the causal relationships among medical concepts such as symptoms, diagnoses and treatments, so mining these relationships from text can substantially improve the accuracy and interpretability of medical search. To this end, this paper constructs a new Chinese medical causal relation extraction dataset, CMedCausal (Chinese Medical Causal dataset). The dataset defines three key types of medical causal explanation and reasoning relationships: causal, conditional, and hypernym-hyponym (is-a) relationships. It consists of 9,153 medical text passages annotated with a total of 79,244 entity-relation pairs. Researchers can use CMedCausal to study medical causal relationship mining, the construction of medical causal explanation graphs, and related topics. In conjunction with the 8th China Conference on Health Information Processing (CHIP2022), we also organized the "Medical Causal Entity Relationship Extraction" evaluation task, aiming to advance Chinese medical causal relationship mining technology.

    Keywords: causal relationship, relation extraction, interpretability


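Li Zihao, Chen Mosha, Ma Zhenxin, Yin Kangping, Tong Yixuan, Tan Chuanqi, Lang Zhenzhen, Tang Buzhou. CMedCausal: A Dataset for Chinese Medical Causal Relation Extraction (中文医疗因果关系抽取数据集 CMedCausal). 2022. biomedRxiv.202211.00004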


