Chinese Journal of Medical Education, 2026, Vol. 46, Issue 6: 433-437. DOI: 10.3760/cma.j.cn115259-20250805-00890

• Educational Technology •

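Application of AI assistant based on medical knowledge graphs in pathology drawing scoring

Mo Jing¹, Han Jiyuan¹, Zhao Xiulan¹, Yan Jingrui², Sun Baocun¹

  1. Department of Pathology, School of Basic Medicine, Tianjin Medical University, Tianjin 300070, China;
  2. Teaching Office, School of Basic Medicine, Tianjin Medical University, Tianjin 300070, China
  • Received: 2025-08-05  Published: 2026-06-01  Released online: 2026-05-28
  • Corresponding author: Sun Baocun, Email: sunbaocun@tmu.edu.cn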

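Abstract: Objective To evaluate the reliability of an AI assistant based on a medical knowledge graph for scoring pathology drawings.
Methods In February 2025, 135 pathology drawings by students of the 2020 cohort of the "5+3" integrated clinical medicine program at Tianjin Medical University were used as study material. The AI assistant, Kimi, and five pathology teachers scored the drawings independently against the same scoring criteria. The scoring dimensions were professionalism, accuracy, logic, content completeness, knowledge application, learning attitude and adherence to conventions, and innovation and critical thinking. The Wilcoxon rank-sum test was used to compare the AI assistant's and Kimi's scores with the teachers' scores. With the mean score of the five pathology teachers as the reference standard, the intraclass correlation coefficient (ICC) was used to assess how closely the AI assistant's and Kimi's scores agreed with the teachers' scores.
Results The total scores for the pathology drawings were 68.0 (10.0) points from the AI assistant, 82.0 (9.0) points from Kimi, and 81.2 (7.3) points from the teachers. The AI assistant's total score was significantly lower than the teachers' (P<0.001), whereas Kimi's total score did not differ significantly from the teachers' (P=0.112). In every dimension, the AI assistant agreed with the teachers more closely than Kimi did. The AI assistant showed moderate agreement with the teachers in professionalism (ICC=0.55) and accuracy (ICC=0.56), while Kimi showed poor agreement in the same two dimensions (professionalism ICC=0.24; accuracy ICC=0.20).
Conclusions In scoring pathology drawings, the AI assistant based on a medical knowledge graph is more reliable than a general-purpose AI model and can serve as an auxiliary scoring aid. Because the AI assistant scores relatively strictly, its scoring settings can be adjusted to bring its scores into the teachers' scoring range.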
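
The rank-sum comparison described in the Methods can be illustrated with the Python standard library alone. This is a minimal sketch, not the authors' analysis code: it uses the large-sample normal approximation with a correction for tied scores and no continuity correction, it treats the two sets of scores as independent samples (which is what a rank-sum test assumes), and the function names and score lists are hypothetical. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be used instead.

```python
# Illustrative sketch only; function names and data are hypothetical, not from the study.
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p), where z > 0 means x tends to rank higher than y."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])                 # rank sum of the first sample
    expected_w = n1 * (n + 1) / 2       # mean of W under the null hypothesis

    # Tie correction: each group of t equal scores contributes t^3 - t.
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        raise ValueError("all scores are identical; the test is undefined")

    z = (w - expected_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical total scores for five drawings (illustrative, not study data).
ai_scores = [62, 65, 68, 70, 71]
teacher_scores = [78, 80, 81, 83, 85]
z, p = rank_sum_test(ai_scores, teacher_scores)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Average ranks are used so that identical marks, which are frequent when many drawings receive the same total, share a rank instead of being ordered arbitrarily.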

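The abstract does not state which ICC model was used. As an illustration only, the sketch below assumes a two-way layout in which each drawing is a subject and the two columns are the AI (or Kimi) score and the teachers' mean score, and it computes two single-rater forms from the Shrout and Fleiss mean squares: ICC(2,1) for absolute agreement and ICC(3,1) for consistency. The function names are hypothetical, and the cut-offs in `agreement_label` (below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good) are an assumed convention that happens to match the "poor" and "moderate" wording above, not thresholds confirmed by the authors.

```python
# Illustrative sketch only; function names, cut-offs and data are hypothetical.
from statistics import fmean
from typing import Sequence


def icc_single(ratings: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Single-rater ICCs for an n-subjects x k-raters matrix.

    Returns (ICC(2,1) absolute agreement, ICC(3,1) consistency), using the
    two-way ANOVA mean squares of Shrout and Fleiss."""
    n, k = len(ratings), len(ratings[0])
    grand = fmean(v for row in ratings for v in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((r - grand) ** 2 for r in row_means)   # between subjects
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)   # between raters
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement also counts a systematic offset between raters (ms_cols).
    icc_agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # Consistency ignores such an offset: only the residual counts as disagreement.
    icc_consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_agreement, icc_consistency


def agreement_label(icc: float) -> str:
    """Assumed interpretation bands, not taken from the study."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


# Toy data: column 0 is one scorer, column 1 a scorer exactly 10 points stricter.
scores = [[70, 60], [75, 65], [80, 70], [85, 75], [90, 80]]
agreement, consistency = icc_single(scores)
print(round(agreement, 3), round(consistency, 3))  # prints: 0.556 1.0
```

The toy data show why the choice of form matters: a scorer who is uniformly 10 points stricter keeps a consistency ICC of 1.0 but drops to about 0.56 on absolute agreement, so a grader who is systematically stricter, as the AI assistant was found to be, is penalized for that offset only under the absolute-agreement form.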
