Predicting MI fidelity like a human

Brian Pace

November 29, 2021

Motivational interviewing (MI) is the gold standard intervention for facilitating behavior change. It is the key intervention behind most – if not all – health coaching applications, evidence-based substance abuse treatment, and many other conversational interventions in healthcare.

MI is clearly a useful intervention, but how do we help get it out in the world? Like any skill, a new intervention takes feedback and support to learn. Not surprisingly, the developers and researchers who pioneered MI also pioneered detailed fidelity monitoring in mental health interventions. They were among the first to provide well-researched tools that support detailed, utterance-level behavioral feedback to clinicians as part of training (e.g., did the clinician use key components of active listening – open questions, reflections, affirmations) as well as higher-level session ratings of essential MI elements like empathy and collaboration. Over the years, MI fidelity monitoring has provided insights into the best way to train clinicians as well as maintain treatment fidelity.

Unfortunately, the usefulness of feedback systems like those developed for MI is limited by the number of people they can reach. Labeling therapy sessions is labor-intensive, costly, and time-consuming – usually more expensive than delivering the counseling itself! Due to these barriers, very few clinicians have access to MI fidelity feedback, and thus it’s hard to learn and maintain MI in all sorts of settings: health coaching apps, traditional brick-and-mortar clinics, etc.

We created Lyssn to tackle this problem, using evolving technologies in machine learning and artificial intelligence to automate the fidelity monitoring process. With easy access to clinical feedback, clinicians can maintain the MI skills they learned through training and supervision.

At Lyssn, we pride ourselves on transparency. Our founders’ peer-reviewed research papers are available for anyone to read (you may need to access them through your local library), and we seek to provide regular updates on our clinical quality metrics. Not all fidelity metrics are created equal: some are more difficult for humans to code than others, and this carries over to the machine learning models as well. We pledge to keep Lyssn end-users informed about our metric performance, including which metrics we may report better than others. This is an ongoing effort and a challenge we enjoy tackling.

How does Lyssn evaluate the performance of its MI metrics?

To put it simply – we compare our automated MI codes to those collected by human raters. How does the machine stack up when compared to the human? Our goal is for our automated ratings to perform just as a human rater might: essentially, if we added Lyssn as a member of a human coding team, it would function just fine.

A poorly kept secret in the psychotherapy world is that even two experts reviewing a session will not always agree with each other about what the clinician did. When you think about it, that’s not surprising: psychotherapy is a spoken language intervention with lots of room for interpretation, and complexity is the norm. Our goal is to only use metrics where well-trained raters can come to agreement after listening to the same session.

To get into the weeds a bit more – we first assess human coder reliability (human-to-human agreement). We then calculate percent agreement between the human and the machine and compare it to that human-to-human figure, which tells us how well the machine performs relative to human coders. This is similar to our process for assessing our automated CBT fidelity codes (you can read more about those results here).
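For readers who like to see the arithmetic, here is a minimal sketch of a percent agreement calculation in Python. It is an illustration only, not Lyssn's actual pipeline: it assumes agreement is counted utterance by utterance over codes that are already aligned, and the function name, code labels, and example data are all made up for this example.

```python
from typing import Sequence


def percent_agreement(codes_a: Sequence[str], codes_b: Sequence[str]) -> float:
    """Percentage of aligned utterances that two raters coded identically.

    Assumption: both raters coded the same utterances in the same order.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("Both raters must code the same number of utterances.")
    if not codes_a:
        raise ValueError("At least one coded utterance is required.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical codes for five clinician utterances (illustrative labels only).
human_1 = ["open_question", "reflection", "giving_information", "reflection", "affirmation"]
human_2 = ["open_question", "reflection", "giving_information", "affirmation", "affirmation"]
machine = ["open_question", "reflection", "reflection", "reflection", "affirmation"]

# Human-to-human agreement is the reference point for the machine.
print(percent_agreement(human_1, human_2))  # 80.0
print(percent_agreement(human_1, machine))  # 80.0
```

In this toy example the machine agrees with the first human rater as often as the second human rater does, which is the kind of result the comparison above is looking for.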

Our MI metrics were derived or adapted from the Motivational Interviewing Skill Code (MISC; versions 2.1 and 2.5) and the Motivational Interviewing Treatment Integrity coding manual (MITI; version 4.2.1). For those who are interested, we will explain the similarities and differences between these coding approaches in an upcoming blog post.

We assess two main MI fidelity domains:

Session-level ratings (empathy and collaboration) and summary scores
Clinician and client behaviors (open questions, reflections, client change and sustain talk, etc.)

MI session-level ratings and summary scores

Empathy and collaboration are two session-level ratings that Lyssn provides for every session. In a highly empathic session, a clinician demonstrates deep understanding of what the client stated, including reflecting back content not explicitly stated by the client. A highly collaborative clinician, by contrast, encourages collaboration and power sharing regarding what’s discussed in the session.

In addition to empathy and collaboration, Lyssn also provides the summary scores recommended for MI. These include measures of MI adherence and non-adherence, the percentage of reflections, the percentage of questions, and the ratio of reflections to questions.
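As a rough illustration of how summary scores like these can be derived from utterance-level codes, here is a minimal Python sketch. The post does not give the exact formulas, so this assumes that the percentages are shares of all coded clinician utterances and that open and closed questions are both counted as questions; the function name, code labels, and example session are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def summary_scores(clinician_codes: Sequence[str]) -> Dict[str, Optional[float]]:
    """Compute simple session summary scores from clinician behavior codes.

    Assumptions (not taken from the post): percentages are shares of all
    coded clinician utterances; questions = open + closed questions.
    """
    counts = Counter(clinician_codes)
    total = len(clinician_codes)
    reflections = counts["reflection"]
    questions = counts["open_question"] + counts["closed_question"]
    return {
        "percent_reflections": 100.0 * reflections / total if total else None,
        "percent_questions": 100.0 * questions / total if total else None,
        # The ratio is undefined when the clinician asked no questions.
        "reflection_to_question_ratio": reflections / questions if questions else None,
    }


# Hypothetical session with eight coded clinician utterances.
session = [
    "open_question", "reflection", "reflection", "closed_question",
    "giving_information", "reflection", "affirmation", "reflection",
]
scores = summary_scores(session)
print(scores["percent_reflections"])           # 50.0
print(scores["percent_questions"])             # 25.0
print(scores["reflection_to_question_ratio"])  # 2.0
```

Returning None when a denominator is zero is a design choice for this sketch, so that a session with no questions does not report a misleading ratio.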

The figure below shows the percent agreement between machine ratings and human ratings. At Lyssn, 80% agreement is the benchmark we aim to hit for all of our metrics.

MI clinician and client behaviors

Whereas several of the session-level and summary metrics look at the session as a whole, MI clinician and client behaviors are coded at the sentence or utterance level. When fidelity coding with the MISC, every single sentence is tagged with a behavior. Some behaviors are labelled as MI adherent or non-adherent, and other behaviors can be informative depending on the type of session. For example, there is a ‘giving information’ code that we might see more often in case management, when the clinician is providing information and resources regarding housing or strategies for navigating the legal system.

In addition to clinician behaviors, MISC coding also focuses on client behaviors, specifically client change language. These are client statements that indicate movement toward (change talk) or away from (sustain talk) changing a behavior such as alcohol or substance use. The figure shows machine-human agreement for both clinician and client behaviors (client change language is listed below the dotted line).

We plan to continue to grow and develop our metrics

We feel confident that advances in artificial intelligence and machine learning can augment psychotherapists’ ability to grow, learn, and develop. Alongside the technological advances, ongoing research and development of our metrics is vital. Every week we continue to add to the human-coded data that help inform our models, taking important steps to ensure we are coding diverse clinical settings so that we can provide a platform that adapts across clinics and speech contexts. We are excited to continue to share Lyssn metrics updates with you in the future.

Thanks for taking the time to read more about Lyssn. If interested in future updates, you can sign up for blog updates.