Setting and participants
A prospective cohort study of Queen’s emergency medicine (EM) residents was designed and approved by the Health Sciences and Affiliated Teaching Hospitals Research Ethics Board at Queen’s University. All EM residents from postgraduate year (PGY) one to five enrolled at Queen’s University from July 1, 2016 to June 30, 2017 (n = 28) were recruited for the study. The study was carried out at the Queen’s Clinical Simulation Center, Kingston General Hospital, and through online collaboration with expert raters from June 2016 to July 2017. Residents provided informed consent to participate in the study, including video recording of their performances in the simulation lab.
QSAT modification to create the RAT
The Queen’s simulation assessment tool (QSAT) was modified to create the entrustment-based resuscitation assessment tool (RAT) and subsequently used to directly compare EM residents’ performance in the simulation environment to performance in the ED. A strong validity argument for the QSAT has been previously published, along with comparisons of the QSAT to in-training evaluation report scoring and the multicenter implementation of the QSAT. However, limitations to the QSAT have been noted, including the need for scenario customization and a desire for the tool to utilize an entrustment-based global assessment score. Therefore, limited modifications to the QSAT (Additional file 1) were undertaken to create the workplace-based RAT. The two modifications were (1) the development of generic behavioral anchors for resuscitation performance using a modified Delphi process for each domain (primary assessment, diagnostic actions, therapeutic actions and communication) and (2) the replacement of the global assessment scale with a contemporary entrustment scale. A pilot study has demonstrated a strong correlation between the original global assessment score of the QSAT and the chosen entrustment score.
A purposeful sample of practicing physicians in critical care, local EM faculty, external EM faculty, and junior and senior residents was chosen to participate in the derivation of anchors. Specific individuals were invited to participate based on past experience with the QSAT and qualifications reflecting expertise in EM and simulation-based education and assessment. An email invitation was sent out, explicitly stating that participation would require adherence to a revision timeline including three rounds of a modified Delphi process conducted via FluidSurveys™.
In the first survey, participants were asked in an open-ended format to generate behavioral anchors for each of the four domains of assessment of the current QSAT. The focus of assessment for the RAT was competence in resuscitation performance, as defined by an entrustable professional activity written by study authors (AH, DD): “Resuscitate and manage the care of critically ill medical/surgical patients”. The anchors refer to critical component actions for successful resuscitation in the ED. The anchors were compiled through thematic analysis by researcher KW and reviewed by AH and JR, all blinded to participant identity.
In round two, the most frequently cited anchors for each domain were distributed to the experts via a second survey. In this round, the same participants were asked to rate each anchor according to importance on a 5-point Likert scale (1 = not important, 5 = extremely important) and to explain each rating through an open response question. An inclusive list of important anchors for each assessment domain was used to generate the first draft of the complete RAT. The draft RAT was then distributed to the experts for a third round of minor revisions to ensure that experts had reached agreement on the inclusion and wording of specific anchors.
Following derivation of the RAT, a multipronged approach to tool introduction and rater training was used for all EM attending physicians and residents. The RAT was presented and described at departmental rounds, and faculty were trained in small groups in the ED while on shift by study investigators (AH, DD). Resident RAT training was provided as a special session within the core training curriculum early in the academic year (AH).
Workplace-based resuscitation assessment and simulation-based resuscitation assessment
Residents were opportunistically assessed by their attending EM physician utilizing the RAT while on shift in the Kingston General Hospital ED. Resuscitation cases were defined as any case involving critical illness/injury that required life-threatening critical care, as described in detail by provincial fee codes, familiar to all EM physicians in Ontario. The decision to complete an assessment using the RAT was left to the discretion of the staff EM physician and the resident on shift. The clinical context of the case on which the RAT was completed was recorded on the RAT.
EM residents participated in simulation-based objective structured clinical examinations (OSCEs) in August 2016 and February 2017 as part of their established EM education program. The OSCEs were held at the Queen’s Clinical Simulation Center. Each examination involved two previously developed and piloted resuscitation scenarios involving nurse and respiratory technologist actors. The four cases assessed in the simulation-based OSCEs were set a priori and included a gastrointestinal bleed causing pulseless electrical activity cardiac arrest, chronic obstructive pulmonary disease exacerbation requiring intubation, ventricular fibrillation due to ST-elevation myocardial infarction, and hyperkalemia-induced bradycardia. In summary, each OSCE included two resuscitation cases, so a resident had the potential to be assessed on four cases, each with a single global entrustment score and an opportunity for the rater to justify the numerical score with narrative feedback.
Resident performance was scored using the RAT by an in-person rater and video recorded. To measure the reliability of the scoring by the in-person rater, the video-recorded performance was also scored by a blinded external rater using the RAT. In-person raters and external raters not involved in RAT development received an orientation training session in which they rated a standardized sample of training video recordings and reviewed their ratings with one of the investigators (AKH) until consensus scoring was achieved. Of note, some of the residents were invited to wear eye-tracking glasses during the OSCEs as part of a separate, unrelated study.
Mean entrustment scores were computed for each resident for the summer 2016 OSCE, the winter 2017 OSCE, and the workplace-based assessments. Scores were compared using the Pearson product-moment correlation coefficient to determine the linear relationship between mean entrustment scores on OSCE simulation cases and on workplace-based assessments. To determine whether residents’ simulation performance on the OSCEs differed between summer 2016 and winter 2017, a paired-samples t test was conducted. Intraclass correlation coefficients, using a two-way random effects model with absolute agreement, were used to measure the interrater reliability between live and blinded ratings of resident entrustment on the four OSCE cases. Residents with missing data (either no OSCE or no workplace-based data) were excluded from the analysis.
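For readers who wish to reproduce this type of analysis, the three statistics described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the software used in the study: the function names and the example scores are hypothetical, and the intraclass correlation is computed as a single-rater ICC(2,1), which is an assumption because the text does not state whether single or average measures were reported. Only the test statistics are computed; p values and confidence intervals would come from a statistical package.

```python
from math import sqrt
from statistics import mean, stdev


def complete_cases(pairs):
    """Drop residents with a missing value (None) in either column."""
    return [p for p in pairs if None not in p]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def paired_t(x, y):
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def icc_2_1(ratings):
    """Two-way random effects, absolute agreement, single-rater ICC.

    `ratings` holds one row per scored performance and one column per
    rater (here: live rater, blinded video rater).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-performance mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical mean entrustment scores: (OSCE mean, workplace mean).
scores = complete_cases([(3.0, 3.5), (4.0, 4.2), (2.5, None), (3.5, 3.6)])
osce = [s[0] for s in scores]
workplace = [s[1] for s in scores]
print("Pearson r:", round(pearson_r(osce, workplace), 3))

# Hypothetical live vs. blinded ratings for four performances.
print("ICC(2,1):", round(icc_2_1([[3, 3], [4, 5], [2, 2], [5, 4]]), 3))
```

The complete-case filter mirrors the exclusion rule stated above: a resident missing either the OSCE or the workplace-based score contributes to neither column of the correlation.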
Narrative comments collected on the RAT for both workplace-based assessments and simulation-based assessments were coded using inductive thematic analysis. Codes were identified and grouped into themes and then compared across simulation and workplace-based settings by author KW and subsequently reviewed by AH.