
Table 5 Kane’s framework relating to inferences on the validity of the DART tool [14, 15]

From: The Debriefing Assessment in Real Time (DART) tool for simulation-based medical education

Assessment decision(s)

a. Determine debriefer’s approach towards facilitation (i.e. relative level of ‘guide on the side’ versus ‘sage on the stage’ behaviour exhibited during a debriefing by the facilitator) [17]

b. Type of feedback to be given to debriefer(s) by co-faculty or health professions educator supervisors (i.e. the DART tool focuses feedback on an observed debriefer)

Scoring

Are the scores provided by the DART tool appropriate to assess debriefing?

• DART tool scale uses a cumulative tally of instructor statements (IS), instructor questions (IQ), and trainee responses (TR). Ratios of these cumulative scores may be calculated (a worked example is given at the end of this list). The approach of using cumulative scoring was adapted by LH following experience of observing the debriefing of teams at NASA [19]

• The notion of ‘lumpers’ and ‘splitters’ that arises when tallying instructor statements, instructor questions, and trainee responses mirrors the natural mental processes that allow for the classification of things through grouping and differentiation. What individuals ‘lump’ or ‘split’ is partially dependent on their cognitive socialisation [20]

• Inter-rater reliability was investigated in this study and in a prior pilot study using a large number of simulation educators as raters [13]

• This study used videos of debriefings rather than real-time observation, which limits the analysis

• The DART scale risks oversimplifying global assessment of debriefing quality in two areas as follows:

(1) Assessment of the full context (how well was the whole simulation activity facilitated?) — this may require use of OSAD or DASH scores

(2) Quality of individual questions — this may require a gestalt interpretation

• Raters require an orientation to the tool to minimise error in scoring statements [13]
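• Worked example (hypothetical counts, not study data): a debriefing tallying IS = 20 instructor statements, IQ = 10 instructor questions, and TR = 40 trainee responses gives TR:(IQ+IS) = 40/(10+20) ≈ 1.3, i.e. roughly 1.3 trainee responses per instructor utterance; higher values suggest a more learner-driven conversation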

Generalisation

Are the scores observed likely to be reproducible?

• Study site was external to that of the tool developers, and no developers evaluated the tool

• DART displays reproducibility of scores [13], with Cronbach’s α > 0.85

• This appears acceptable when compared to the reported reliability of tools used to assess clinical teaching [21] and entrustable professional activities

• Of concern, good quality questions may be preceded by several statements when using advocacy-enquiry techniques [17]. For example, a good quality question of this sort may comprise 3 statements and 1 question. This in turn may significantly alter the DART scores and could explain the lack of association between DASH and DART scores (an illustrative calculation follows below)

• For novice learners, it might be appropriate to ‘provide information’ — this in turn will affect DART scores
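• Illustrative calculation (hypothetical counts): a single advocacy-enquiry exchange scored as 3 instructor statements and 1 instructor question that draws 1 trainee response contributes TR:(IQ+IS) = 1/(1+3) = 0.25, whereas a single open question drawing the same response contributes 1/(1+0) = 1.0; heavy use of advocacy-enquiry therefore lowers the ratio even when question quality is high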

Extrapolation

Do DART tool scores reflect debriefer performance?

• Expert-novice differences not demonstrated

• No evidence for individual debriefer improvement over time through use of DART

• Cutrer and colleagues described master adaptive learners’ improvement over time [22]. A similar conceptual framework has been described for debriefers by Cheng et al. [23]. In this context, Cutrer and colleagues described informed self-assessment as important, supported by ‘clear, timely, specific and constructive feedback offered by trusted, credible supervisors’. These ideas would appear relevant to debriefer development with the DART tool, as well as other assessment tools aiding this process

• No correlation/association was observed between DART scores and DASH scores

• In other settings, simple objective data have been clearly shown to improve actual performance, as follows:

1) Real-time objective audio-visual feedback on CPR performance, such as chest compression depth, chest compression rate, and ventilation rate, leads to improvements in those objective measures of CPR performance and in the rate of ROSC [2, 23]

2) Real-time quantitative feedback in the form of mean concentric velocity displayed in front of participants leads to improvements in physical performance of strength exercises and improvements in motivation, competitiveness, and mood [24]

Cutrer et al. suggested that using data can be a powerful tool to change behaviour [25]

Implication

What is the impact of the DART tool on debriefers?

• Qualitative data from users (Table 4) suggests that raters are unsure how to interpret the scores

• DART scores identify a debriefer’s relative inclusivity and student-centredness, but scores would need to be interpreted broadly, in a wider whole-of-simulation context, by experienced simulation educators

• A low TR:(IQ+IS) ratio could indicate when debriefers lecture, a common pitfall in which feedback is educator-driven rather than learner-driven [17] (see the illustrative contrast at the end of this list)

• DART may amplify feedback to debriefers who do not elicit reflection and/or self-assessment from learners

• DART may have a role in faculty development in the context of peer coaching or feedback from colleagues [7]. DART may also have a role in Cheng’s conceptual framework of staged development of debriefing skills over time [23]. DART may have uses at all levels of experience within this framework but will be particularly relevant for novice debriefers, to direct attention towards questions that elicit multiple responses, and for experienced faculty who tend to lecture during debriefings, as noted above
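• Illustrative contrast (hypothetical counts): a lecture-style debriefing with IS = 50, IQ = 5 and TR = 10 gives TR:(IQ+IS) = 10/55 ≈ 0.2, whereas a more learner-driven debriefing with IS = 15, IQ = 15 and TR = 45 gives 45/30 = 1.5; the low ratio flags educator-driven feedback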