Chinese Journal of Medical Education, 2021, Vol. 41, Issue (10): 932-935. DOI: 10.3760/cma.j.cn115259-20201203-01668

• Medical Education Assessment •

Application of artificial intelligence deep learning technology in difficulty predicting of medical examination questions

Huang Guangshi (黄广仕)1, Zhou Mengqiang (周梦强)2, Han Chunmei (韩春梅)3, Lyu Ping (吕萍)2, Wu Ji (吴及)2

  1. Information Technology Department, National Medical Examination Center, Beijing 100097, China;
  2. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China;
  3. Test Question Development Department I, National Medical Examination Center, Beijing 100097, China
  • Received: 2020-12-03 Published: 2021-10-01 Released online: 2021-09-28
  • Corresponding author: Han Chunmei, Email: hanchunmei2000@163.com, Tel: 0086-10-52004655

Abstract: Objective To predict the difficulty of questions in the physician qualification examination using artificial intelligence (AI) deep learning technology, so that the difficulty of the whole paper can be controlled accurately. Methods An attribute model and a semantic model were constructed to predict question difficulty. The model predictions and the experts' predictions were each compared with the measured difficulty by correlation analysis and repeated-measures analysis of variance (ANOVA), in order to evaluate the feasibility and effectiveness of model-based difficulty prediction for medical examination questions. Results For the questions of one year's whole paper, the Pearson correlation coefficient between the attribute model's predictions and the measured difficulty was 0.266, slightly lower than the coefficient of 0.356 between the experts' predictions and the measured difficulty; the confidence intervals of the two coefficients overlapped, and the difference was not statistically significant (P>0.05). The Pearson correlation coefficient between the semantic model's predictions and the measured difficulty was 0.512, higher than the experts' coefficient of 0.356; the confidence intervals of the two coefficients did not overlap, and the difference was statistically significant (P<0.05). Repeated-measures ANOVA showed that only the semantic model's predicted difficulty did not differ significantly from the measured difficulty (P>0.05). Conclusions The difficulty predicted by the semantic model is closer to the measured difficulty than the difficulty predicted by experts. The method could be tried before examinations for difficulty prediction and combined with the experts' predictions to guide test assembly, so that the difficulty of the paper is controlled more objectively and accurately.

Key words: Artificial intelligence, Deep learning, Medical examination, Difficulty prediction
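
The confidence-interval comparison reported in the Results can be illustrated with a minimal Python sketch. It rests on two assumptions that are not stated in the abstract: that the intervals are 95% intervals obtained with the Fisher z-transform, and that the paper contains 600 questions (`N_ITEMS` is a hypothetical value chosen only for illustration). The helper names `fisher_ci` and `intervals_overlap` are likewise illustrative and are not taken from the article.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
    z = math.atanh(r)                             # Fisher z of the correlation
    se = 1.0 / math.sqrt(n - 3)                   # standard error of z
    q = NormalDist().inv_cdf(0.5 + level / 2.0)   # two-sided normal quantile
    return math.tanh(z - q * se), math.tanh(z + q * se)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# ASSUMPTION: the number of questions is not given in the abstract.
N_ITEMS = 600

# Correlations with measured difficulty, as reported in the abstract.
reported = {"attribute model": 0.266, "experts": 0.356, "semantic model": 0.512}
intervals = {name: fisher_ci(r, N_ITEMS) for name, r in reported.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: r = {reported[name]:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")

for model in ("attribute model", "semantic model"):
    overlap = intervals_overlap(intervals[model], intervals["experts"])
    print(f"{model} vs experts: intervals overlap = {overlap}")
```

Checking whether two intervals overlap is a conservative way to compare correlations, and all three coefficients here are computed on the same questions, so a test for dependent correlations would be the more formal comparison. The sketch only shows the arithmetic behind the overlap statement.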
