Chinese Journal of Medical Education, 2022, Vol. 42, Issue (7): 577-580. DOI: 10.3760/cma.j.cn115259-20210817-01034

• Medical Education Assessment Column •

A comparative study of equating methods applied in standardized competence test for clinical medicine undergraduates

Zhang Quanhui1, He Ju2, Ren Jie3, Zhang Ying4, Lu Yan5

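  1 Department of Information and Assessment, National Medicine Examination Center, Beijing 100097, China;
    2 National Medicine Examination Center, Beijing 100097, China;
    3 Institute of Language Testing and Talent Evaluation, Beijing Language and Culture University, Beijing 100083, China;
    4 Department of Examination Management, National Medicine Examination Center, Beijing 100097, China;
    5 Department of Development Research, National Medicine Examination Center, Beijing 100097, China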
  • Received: 2021-08-17 Published: 2022-07-01 Online: 2022-06-29
  • Corresponding author: Lu Yan, Email: luyan810206@163.com

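Abstract: Objective To analyze examinees' responses on two annual administrations of the Standardized Competence Test for Clinical Medicine Undergraduates (hereafter, the competence test) using equating methods based on classical test theory (CTT) and item response theory (IRT), and to identify the equating method best suited to the competence test. Methods Four CTT methods were applied: Tucker observed-score linear equating, Levine observed-score linear equating, equipercentile equating, and equipercentile equating with smoothing. Under the IRT one-parameter and two-parameter models, three calibration methods were applied to each model: separate calibration, concurrent calibration, and fixed common-item parameter calibration. The stability of these 10 equating results was evaluated with the standard error of equating. Results The standard errors of equating ranged from 0.7 to 1.6 for the CTT methods and from 0.2 to 0.6 for the IRT methods, so the IRT methods produced smaller errors. Among the CTT methods, Tucker observed-score linear equating had the smallest error (0.7) and equipercentile equating with smoothing had the largest (1.6). Among the IRT methods, the one-parameter model outperformed the two-parameter model, and within the one-parameter model, fixed common-item parameter calibration had the smallest error (0.2). Conclusions Fixed common-item parameter calibration under the IRT one-parameter model can be adopted for equating the competence test. After equating, Year 2 scores were adjusted upward while the passing standard remained unchanged, which effectively made scores comparable across years and helped ensure the fairness of the test.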

Key words: Clinical medicine, Competence test, Classical test theory, Item response theory, Equating
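The abstract reports Tucker observed-score linear equating as the most stable CTT method and compares all methods by their standard error of equating. As a rough illustration only, not the authors' implementation, the sketch below assumes a common-item nonequivalent groups design (group 1 takes Form X, group 2 takes Form Y, both take the same anchor items), computes Tucker linear equating from the usual synthetic-population formulas, and estimates a bootstrap standard error at one score point. The names `tucker_linear`, `bootstrap_se`, and `simulate`, the default synthetic weight, the bootstrap estimator, and the simulated data are all hypothetical; the article does not state how its standard errors were computed.

```python
import random
import statistics


def _cov(a, b):
    """Sample covariance with an n - 1 denominator (matches statistics.variance)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def tucker_linear(x, v1, y, v2, w1=None):
    """Tucker observed-score linear equating of Form X onto the Form Y scale.

    Assumes a common-item nonequivalent groups design (hypothetical here):
      x, v1: total and anchor scores of group 1, which took Form X
      y, v2: total and anchor scores of group 2, which took Form Y
      w1: synthetic-population weight of group 1 (default n1 / (n1 + n2))
    Returns a function that maps a Form X raw score to a Form Y equivalent.
    """
    if w1 is None:
        w1 = len(x) / (len(x) + len(y))
    w2 = 1.0 - w1
    mu_x, mu_v1 = statistics.fmean(x), statistics.fmean(v1)
    mu_y, mu_v2 = statistics.fmean(y), statistics.fmean(v2)
    var_x, var_v1 = statistics.variance(x), statistics.variance(v1)
    var_y, var_v2 = statistics.variance(y), statistics.variance(v2)

    # Tucker slopes: regression of total score on anchor score within each group.
    g1 = _cov(x, v1) / var_v1
    g2 = _cov(y, v2) / var_v2
    d_mu = mu_v1 - mu_v2
    d_var = var_v1 - var_v2

    # Means and variances of X and Y in the weighted synthetic population.
    ms_x = mu_x - w2 * g1 * d_mu
    ms_y = mu_y + w1 * g2 * d_mu
    vs_x = var_x - w2 * g1**2 * d_var + w1 * w2 * g1**2 * d_mu**2
    vs_y = var_y + w1 * g2**2 * d_var + w1 * w2 * g2**2 * d_mu**2

    slope = (vs_y / vs_x) ** 0.5
    return lambda score: slope * (score - ms_x) + ms_y


def bootstrap_se(x, v1, y, v2, score, reps=200, seed=0):
    """Bootstrap standard error of the Tucker-equated value of one raw score.

    Examinees are resampled with replacement within each group; the SE is the
    standard deviation of the equated score across replications.
    """
    rng = random.Random(seed)
    n1, n2 = len(x), len(y)
    estimates = []
    for _ in range(reps):
        i1 = [rng.randrange(n1) for _ in range(n1)]
        i2 = [rng.randrange(n2) for _ in range(n2)]
        equate = tucker_linear([x[i] for i in i1], [v1[i] for i in i1],
                               [y[i] for i in i2], [v2[i] for i in i2])
        estimates.append(equate(score))
    return statistics.stdev(estimates)


def simulate(n, shift, rng):
    """Hypothetical data: total and anchor scores driven by one ability value."""
    totals, anchors = [], []
    for _ in range(n):
        theta = rng.gauss(shift, 1.0)
        anchors.append(round(20 + 4 * theta + rng.gauss(0, 2)))
        totals.append(round(60 + 10 * theta + rng.gauss(0, 4)))
    return totals, anchors


if __name__ == "__main__":
    rng = random.Random(42)
    x, v1 = simulate(500, 0.0, rng)  # group 1 takes Form X
    y, v2 = simulate(500, 0.3, rng)  # group 2 takes Form Y, slightly more able
    equate = tucker_linear(x, v1, y, v2)
    print(f"Form X 60 -> Form Y {equate(60):.2f}, "
          f"bootstrap SE {bootstrap_se(x, v1, y, v2, 60):.2f}")
```

Running the script prints the Form Y equivalent of a Form X raw score of 60 and its bootstrap standard error. The two simulated forms are equally difficult, so the equated value should land near 60. Because the simulated anchor contains measurement error, Tucker's assumption of a common total-on-anchor regression across groups holds only approximately, so a small deviation is expected.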
