Chinese Journal of Medical Education, 2021, Vol. 41, Issue (10): 932-935. DOI: 10.3760/cma.j.cn115259-20201203-01668

Application of artificial intelligence deep learning technology in difficulty prediction of medical examination questions

Huang Guangshi1, Zhou Mengqiang2, Han Chunmei3, Lyu Ping2, Wu Ji2   

  1. Information Technology Department, National Medical Examination Center, Beijing 100097, China;
  2. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China;
  3. Test Question Development Department I, National Medical Examination Center, Beijing 100097, China
  • Received: 2020-12-03 Online: 2021-10-01 Published: 2021-09-28
  • Contact: Han Chunmei, Email: hanchunmei2000@163.com, Tel: 0086-10-52004655

Abstract: Objective To predict the difficulty of questions used in the physician qualifying examination, and to control the difficulty of the whole paper accurately, by using artificial intelligence (AI) deep learning technology. Methods The difficulty of the test questions was estimated by constructing a framework consisting of an attribute model and a semantic model. To evaluate the feasibility and effectiveness of the AI models for difficulty prediction, the AI predictions and the experts' predictions were each correlated with the actual test difficulty, and a repeated-measures ANOVA was conducted against the actual test difficulty. Results For the questions of a given year's whole paper, the Pearson correlation coefficient between the attribute model's prediction and the actual test difficulty was 0.266, slightly lower than the coefficient of 0.356 between the experts' predicted difficulty and the actual difficulty. The confidence intervals of the two coefficients overlapped (P>0.05). The Pearson correlation coefficient between the semantic model's prediction and the actual difficulty was 0.512, higher than the coefficient between the experts' predicted difficulty and the actual difficulty (0.356). The confidence intervals of these two coefficients did not overlap (P<0.05). One-way repeated-measures ANOVA showed that only the semantic model's predicted difficulty did not differ significantly from the actual test difficulty (P>0.05). Conclusions The difficulty predicted by the semantic model is closer to the actual test difficulty than the difficulty predicted by the experts. The semantic model may therefore be applied to pre-examination difficulty prediction and combined with the experts' predictions to jointly guide the development of test papers based on a well-predicted difficulty.
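The comparison of correlation coefficients described in the Results can be illustrated with a minimal sketch. The abstract does not state how the confidence intervals were computed; the Fisher z-transformation used here is an assumption, as are the function names (`pearson_r`, `fisher_ci`, `intervals_overlap`) and the toy difficulty values, which are invented for illustration and are not data from the study.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation.

    Assumption: the paper's interval method is not given in the abstract.
    Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r)                      # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Toy values only (hypothetical): per-question difficulty on a 0-1 scale.
    actual = [0.82, 0.45, 0.67, 0.91, 0.38, 0.74, 0.59, 0.50]
    expert = [0.75, 0.60, 0.55, 0.80, 0.52, 0.70, 0.66, 0.48]
    model = [0.80, 0.50, 0.62, 0.88, 0.41, 0.69, 0.63, 0.47]

    n = len(actual)
    r_expert = pearson_r(expert, actual)
    r_model = pearson_r(model, actual)
    ci_expert = fisher_ci(r_expert, n)
    ci_model = fisher_ci(r_model, n)

    print("expert vs actual:", round(r_expert, 3), ci_expert)
    print("model  vs actual:", round(r_model, 3), ci_model)
    print("intervals overlap:", intervals_overlap(ci_expert, ci_model))
```

Both coefficients in such a comparison share the actual difficulty as a common variable, so checking interval overlap is only a rough screen; a test designed for dependent correlations would be the stricter choice.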

Key words: Artificial intelligence, Deep learning, Medical examination, Difficulty prediction
