Over the past few years, there has been a concerning rise in rates of depression, anxiety, and other mental health disorders, especially among adolescents and young adults. In addition, the current state of our national healthcare system is not set up to offer equitable access to all, depriving a huge portion of the population of the help they need. A new platform, Kintsugi, has been taking important steps to address these shortcomings by developing machine learning models that detect signs of depression and anxiety from speech samples.

Today on the show I welcome Rima Seiilova-Olson, Co-Founder and Chief Scientist at Kintsugi, to talk about the current state of mental health care and what Kintsugi is doing to offer support to the many individuals who need it. I talk with Rima about her difficult experience with postpartum depression, her subsequent struggle to access mental health care, and how these events (combined with her expertise as a software engineer) led her to co-found Kintsugi. Rima goes on to explain how Kintsugi can be used as a tool by mental health professionals, and the benefits of incorporating it into clinical workflows. We also discuss some of the biggest challenges of working with speech data, the systems they are putting in place to combat bias, and how psychiatrists are helping them validate their speech data. To learn more about this incredible technology, and the life-altering impact it could have, be sure to tune in today!

Key Points:
  • Get to know today’s guest, Rima Seiilova-Olson, Co-Founder and Chief Scientist at Kintsugi.
  • The set of circumstances that inspired Rima to create Kintsugi.
  • What it means to bring quantifiable and scientific measures into the field of mental health.
  • How Kintsugi detects signs of depression and anxiety from speech samples.
  • Understanding the benefits of incorporating this technology into clinical workflows.
  • The integral role of machine learning in this technology.
  • The limitations of traditional healthcare systems.
  • How Kintsugi is helping more people gain access to mental health care.
  • Why Kintsugi uses psychiatrists to collect and validate data.
  • The biases that can occur in models trained on speech data.
  • The measures Kintsugi has put in place to mitigate these biases.
  • Some of the biggest challenges of working with speech data.
  • Rima’s insights on the potential financial, clinical, and emotional impacts of this new technology.
  • Rima’s advice to other founders of AI-powered startups.
  • Why it’s so important to combat the trend of digital health companies prioritizing financial gain over clinical impact.
  • The expected impact of Kintsugi over the next five years.

Quotes:
“My co-founder and CEO, Grace Chang, and I put our heads together and decided to start by bringing quantifiable and scientific measures into the field of mental health, which has mainly been qualitatively and subjectively driven for many decades.” — Rima Seiilova-Olson

“Instead of these clunky tools, we give [healthcare providers] seamless tools that can analyze depression or anxiety in their patients.” — Rima Seiilova-Olson

“I think the main impact that we, as founders, are excited about, is the emotional impact of our technology, which is not quantifiable, but we believe it is going to be immense.” — Rima Seiilova-Olson

“By connecting patients to access, we’re going to have a profound effect on society. [A society] that is observing skyrocketing trends in the rates of depression and anxiety, especially among young adults and adolescents.” — Rima Seiilova-Olson

“We’re observing interesting trends where certain companies prioritize financial gains and revenue over clinical impact and the clinical outcomes for the patient. And those stories not only affect that one startup; they affect the whole industry.” — Rima Seiilova-Olson

“I think every single AI startup in healthcare needs to prioritize the ethical implications of their product. So that as an industry, we cover some of the damage that has been done by some of our colleagues.” — Rima Seiilova-Olson

Transcript:

[INTRODUCTION]

[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[INTERVIEW]

[0:00:34.2] HC: Today, I’m joined by guest Rima Seiilova-Olson, founder and chief scientist at Kintsugi, to talk about identifying mental health challenges from speech data. Rima, welcome to the show.

[0:00:45.3] RSO: Thank you for having me Heather.

[0:00:46.7] HC: Rima, could you share a bit about your background and how that led you to create Kintsugi?

[0:00:52.3] RSO: I am a software developer by training and have been working in technology for quite some time, in big data with structured data, and when AlexNet came out in 2012, of course, structured data was no longer cool and everyone wanted to work with unstructured data.

Some experiments with unstructured data, NLP, and signal processing led me to fall in love with neural networks and this concept of function approximation, and I started working on some of my pet projects. It also coincided with a time in my life when I struggled with postpartum depression, which led me to seek out mental health services.

The combination of those two experiences and my struggles to access mental health services led to my decision to work on technology that could in some way improve the field of mental health. As technical people, my co-founder and CEO, Grace Chang, and I put our heads together and decided to start by bringing quantifiable and scientific measures into a field that had been mainly qualitatively and subjectively driven for many decades.

[0:02:22.5] HC: So what does Kintsugi do, and why is this important for mental healthcare?

[0:02:26.3] RSO: So at Kintsugi, we detect signs of depression and anxiety from speech samples, free-form speech. Whether a person is talking about their day or about their work, regardless of the topic, we analyze the way the person produces phonemes and sounds and use that as a proxy for what’s happening in the person’s brain as it’s affected by psychomotor retardation or other conditions that take place during depressive or anxiety episodes.

We are a deep tech company, funded initially by National Science Foundation grants, and we worked first on establishing whether it is even possible to discern a signal from voice that would be an indicator of depression and anxiety. Now that we have established that it is possible, we’re integrating it into clinical workflows and mental healthcare settings, so that there is a tool to objectively quantify mental health severity.

As we know, every other area of healthcare relies on a variety of tests, such as blood biomarkers, ultrasound, fMRI, and X-ray, and those objective tests help clinicians arrive at decisions about diagnosis and treatment. But in mental healthcare, in psychiatry, doctors don’t have anything objective to rely on. It’s really difficult to make progress if you can’t measure it, if you don’t have any tools in your arsenal to rely on. So we’re creating the first such tool for mental health professionals.
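
[Editor’s note: As a rough illustration of what “using speech as a proxy” can look like in code, here is a minimal sketch of acoustic feature extraction using the open-source librosa library. The features and parameters are illustrative assumptions, not Kintsugi’s actual pipeline.]

```python
import numpy as np
import librosa

def acoustic_features(path: str) -> np.ndarray:
    """Summarize a free-form speech sample as a fixed-length feature vector."""
    y, sr = librosa.load(path, sr=16000)                 # mono audio at 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral envelope over time
    rms = librosa.feature.rms(y=y)                       # loudness contour
    f0 = librosa.yin(y, fmin=65, fmax=300, sr=sr)        # pitch contour
    # Psychomotor slowing tends to flatten prosody, so summary statistics
    # of these contours can act as crude proxies for a downstream classifier.
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [rms.mean(), rms.std(), np.nanmean(f0), np.nanstd(f0)],
    ])
```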

[0:04:31.6] HC: And what role does machine learning play in this technology?

[0:04:35.4] RSO: Machine learning plays an essential role. As we know, neural networks are universal function approximators that can find the relationship between the independent variable X and the dependent variable Y, and when it comes to unstructured, complex data such as voice or speech signals, methods other than deep learning are not effective.

There was quite a bit of work done by researchers in past decades, before AlexNet and before deep learning, and they were not able to achieve generalizable, robust results. But now, with deep learning models available for a variety of domains and an abundance of publicly available speech data, we are able to train models that generalize well enough to actually be turned into a product.

[0:05:44.7] HC: So what kind of models do you train and how do you gather and annotate data for this? What form does that take?

[0:05:52.1] RSO: We use a variety of different approaches. As you know, in machine learning we need to rely on a variety of techniques, right? So we train supervised learning models. Before that, we analyze the data and bring it into a shape and form that’s digestible by our neural network, using unsupervised learning first, and the specific neural network architectures vary greatly.

In the same model that produces results, we can combine LSTMs and CNNs, with a PCA step in the middle. So it’s a combination of a variety of different approaches, chosen for their performance results. The particular types of experiments we do also depend on some of the demographic characteristics of the dataset, because we know that the speech signal can be affected by characteristics such as gender or age, and we make sure that we create the model that serves each particular demographic in the best way possible and leads to the best results.
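
[Editor’s note: A minimal sketch of how CNN and LSTM layers can be combined over speech spectrograms, in PyTorch. The architecture and layer sizes are hypothetical, for illustration only; the PCA step Rima mentions is omitted.]

```python
import torch
import torch.nn as nn

class SpeechScreener(nn.Module):
    """Hypothetical CNN-into-LSTM screener over log-mel spectrograms."""
    def __init__(self, n_mels: int = 64, hidden: int = 128):
        super().__init__()
        self.conv = nn.Sequential(                 # local spectro-temporal patterns
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                  # pool over frequency only
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.lstm = nn.LSTM(32 * (n_mels // 4), hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # logit for P(depression)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, n_mels, time)
        x = self.conv(spec)                        # (batch, 32, n_mels // 4, time)
        x = x.permute(0, 3, 1, 2).flatten(2)       # (batch, time, features)
        _, (h, _) = self.lstm(x)                   # summarize the whole utterance
        return self.head(h[-1])                    # one logit per sample
```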

[0:07:10.6] HC: So this speech data you’re working with, what kind of setting is that recorded in and how do you get annotations for it in order to train your models?

[0:07:19.8] RSO: Yeah, that’s a really good question. The abundance of open-source data on the Internet is only helpful for learning general speech characteristics. A model would pick up some patterns, but the signal specifically related to depression or anxiety can only be derived from an annotated set, and the way a lot of health tech companies train their models is by purchasing or accessing medical data collected at hospitals or health systems.

EHR data is one example of where a lot of health tech companies’ training data comes from. In our case, we use a variety of different channels. While data from clinical collaborations with hospitals and health systems is one of those channels, the vast majority of our data comes from observational studies that we conduct via our consumer application and via our in-house surveys, recruiting patients and participants from social media and from the users of our consumer app.

We do this for a variety of reasons, the first and main one being that we want to reach patients who may not be well represented in EHR data, because as we know, EHR data over-represents certain demographic and socioeconomic groups that have regular access to healthcare. Whereas the people who need mental healthcare services may be the ones who don’t have access to regular check-ins, who don’t have primary care physicians, because they’re food insecure or have other challenges that prevent them from taking proper care of their health.

So gathering data through our consumer application lets us go around the healthcare system and reach patients who may need mental health services directly. Our consumer application is available for download through the Apple App Store, and people use it for journaling. Audio journaling gives us the X component, the independent variable, and the application also includes exercises for filling out the PHQ-9 and GAD-7 questionnaires that are used in clinical practice to screen patients for depression and anxiety.

So the results of those questionnaires give us the Y label [inaudible 0:10:30.3] we approximate using a neural network, and that’s where the vast majority of our training data comes from. The validation data is slightly different and has a lot more guardrails around the robustness of the labels than these self-reported questionnaires; I can touch upon that if you’re interested.
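
[Editor’s note: The PHQ-9 is a nine-item questionnaire scored 0 to 3 per item, and a total of 10 or higher is a commonly used cutoff for moderate depression. A minimal sketch of turning self-reported scores into a weak binary training label:]

```python
def phq9_label(item_scores: list[int], cutoff: int = 10) -> int:
    """Convert nine PHQ-9 item scores (each 0-3) into a binary label;
    a total >= 10 is a common cutoff for moderate depression."""
    assert len(item_scores) == 9 and all(0 <= s <= 3 for s in item_scores)
    return int(sum(item_scores) >= cutoff)

# Each journaling session paired with a questionnaire becomes one
# (audio, label) training pair: X is the recording, Y is the label.
label = phq9_label([2, 1, 2, 1, 1, 0, 2, 1, 1])  # total 11 -> label 1
```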

[0:10:53.4] HC: Yeah, so what do you do for validation and gathering validation data? That’s an interesting but important extension of this.

[0:11:00.8] RSO: Exactly, yeah. Since deep learning models are so data hungry, we feed them a lot of data collected through these observational clinical studies, but validation is a much more rigorous process, where mental health professionals with at least seven years of experience, board-certified psychiatrists, conduct structured interviews with the patients and arrive at their diagnoses, and the psychiatrists’ diagnoses are used as ground truth when we validate our models. In some cases, agreement between psychiatrists is actually an issue as well.

So we have a rigorous adjudication process where not just one psychiatrist’s opinion but two or three psychiatrists’ opinions are solicited to determine whether the patient is depressed or not. Right now, we’re actually going through the process of conducting studies for our FDA submission as software as a medical device. We want to become the first FDA-approved screener to be included in clinical workflows, and for that particular study, we have a very rigorous process around establishing what actually is [inaudible 0:12:34.1].

Because like I said, unlike any other area of medicine, psychiatry is very subjectively driven. There’s a lot of opportunity to introduce bias because the opinions are subjective to the psychiatrists, and we put a lot of guardrails in place to make sure that the benchmark we’re comparing our model’s performance against is as close to the truth as possible.
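
[Editor’s note: One standard way to quantify the psychiatrist-agreement problem Rima describes is Cohen’s kappa, with majority-vote adjudication when raters disagree. A minimal sketch, not Kintsugi’s actual protocol:]

```python
from collections import Counter
from sklearn.metrics import cohen_kappa_score

def adjudicate(ratings: list[int]) -> int:
    """Majority vote across two or three psychiatrist opinions (1 = depressed)."""
    return Counter(ratings).most_common(1)[0][0]

# Agreement between two raters over a validation cohort; a low kappa
# flags cases where a third opinion should be solicited.
rater_a = [1, 0, 1, 1, 0, 0, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1]
print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")
print(adjudicate([1, 1, 0]))  # -> 1
```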

[0:13:06.4] HC: So can you elaborate some more on how bias manifests in models trained on speech data and some of the things that you’re doing to mitigate it?

[0:13:15.9] RSO: Definitely. So bias can be introduced at several different levels. Right now, we see bias, for example, in the screening rates for mental health in primary care visits. Why do I talk about primary care visits? Because that’s the de facto setting where mental healthcare is provided in the US; 76% of antidepressants are prescribed in the primary care setting.

So that’s the place where a lot of people who need help are actually identified. But do physicians screen for depression or anxiety during those visits? Not according to the literature. According to the literature, depression screening takes place in only about 4% of primary care visits, and it’s all dependent on the physician’s subjective opinion of whether the patient needs treatment or not. That’s one place where bias can be introduced.

As we see from the data, African-Americans are two times less likely to be screened than their Caucasian counterparts, and elderly patients are two times less likely to be screened than their working-age counterparts. So that’s the very first place where bias is currently introduced in clinical practice. When it comes to speech data itself, what we can pick up in speech includes proxies for the geographic location of the patient; various accents or languages can have artifacts that manifest in speech, which our model may erroneously pick up on.

So for us, in order to make sure that our model is not picking up on signals that are not relevant to the patient’s depression or anxiety, we conduct experiments where we look at the performance of a model for patients from certain geographic areas, age ranges, genders, and ethnic backgrounds, to make sure that we serve all of those populations in a similar manner and have uniform performance across them.
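
[Editor’s note: Checking for uniform performance across demographic groups is often done by slicing a held-out test set and comparing a metric like AUC per slice. A minimal sketch with pandas and scikit-learn; the column and group names are hypothetical:]

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_audit(results: pd.DataFrame, group_col: str) -> pd.Series:
    """AUC per demographic subgroup; large gaps between groups flag
    signals the model may be picking up that are unrelated to depression.
    Expects columns 'y_true' (label) and 'y_score' (model output)."""
    return results.groupby(group_col).apply(
        lambda g: roc_auc_score(g["y_true"], g["y_score"])
    )

# e.g. subgroup_audit(results, "age_band"); subgroup_audit(results, "accent")
```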

[0:15:51.8] HC: What other kinds of challenges do you encounter in working with speech data?

[0:15:56.2] RSO: Working with speech data is challenging due to a variety of factors, such as the artifacts that can be introduced while recording the speech. There can be a lot of background noise, and the settings of the microphone on the recording device might differ a bit depending on the manufacturer, so we need mechanisms to ensure that our model’s predictions are device independent.

We need to normalize the audio and provide it in a way that disregards all the characteristics that might depend on the hardware and pays attention only to the factors that are relevant for determining the diagnosis. We do that using a variety of techniques, but the main thing is to have rigorous guardrails around accepting or rejecting the audio.

It’s just like in phlebotomy; the samples need to meet quality standards before they can even be analyzed, right? For example, in a tube, we need to make sure it’s blood belonging to one patient. In speech, it’s a little more difficult, because a person may be speaking to their doctor while a TV or music is playing in the background. That’s potential for contamination, right?

It’s a contaminated sample when we have two different people’s speech in it. Another analogy is a tube that was improperly sterilized, compromising the quality of the phlebotomist’s blood sample. Similarly, with speech, we can have background noise that isn’t another speaker’s voice, just white noise or a variety of other sounds such as dogs barking or traffic.

So that type of sample is also compromised, because it doesn’t meet our standards for quality analysis. That’s another challenge. We need to create guardrails to discern a quality sample from one that is not good enough to be analyzed, and whereas in other healthcare settings this takes place within controlled clinical workflows, in our setting it’s a little more difficult to create those guardrails.

Ensuring that the person is recording in an insulated environment, protected from any sort of background noise, is challenging. So that’s another consideration that we need to take into account when we’re designing our experiments.
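
[Editor’s note: The “contaminated sample” guardrails Rima describes can be approximated with simple signal checks before any analysis. A minimal sketch using librosa; the thresholds are illustrative assumptions, and a real gate would also need speaker-overlap detection:]

```python
import numpy as np
import librosa

def passes_quality_gate(path: str, min_seconds: float = 5.0) -> bool:
    """Reject recordings that are too short, clipped, or too noisy,
    the audio analogue of rejecting a contaminated blood tube."""
    y, sr = librosa.load(path, sr=16000)
    if len(y) / sr < min_seconds:
        return False                  # not enough speech to analyze
    if np.mean(np.abs(y) > 0.99) > 0.01:
        return False                  # heavy clipping from the microphone
    # Crude noise check: energy of the loudest frames vs. the quietest frames.
    rms = librosa.feature.rms(y=y)[0]
    snr_db = 20 * np.log10(np.percentile(rms, 95) / (np.percentile(rms, 5) + 1e-8))
    return snr_db > 15                # persistent background noise fails
```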

[0:18:59.4] HC: Yeah, that quality control step I’m sure is essential to making this work. Thinking more broadly about what you’re doing, how do you measure the impact of your technology?

[0:19:08.5] RSO: The impact of our technology, I believe, can be measured from a couple of standpoints, financial and clinical, and as a founder, of course, I’m going to come off biased, right? The financial impact of the technology can be measured and quantified, and we believe that by early identification and treatment of mental health conditions, we help prevent overutilization of the healthcare system by the patients who are currently slipping through the cracks.

For example, if a patient has unidentified and untreated depression and they have diabetes, the cost of care for the diabetes actually increases two to five times. If a person has depression or anxiety, the likelihood of utilizing emergency department or urgent care services, which are expensive, goes up.

So by identifying everyone who needs help and providing them with care, we believe we’re going to bring down the cost of healthcare and introduce efficiency that can help reduce the healthcare spend in our country, which has grown to up to 20% of our GDP. That’s from a financial standpoint. From a clinical standpoint, I think we’re going to affect the state of care in a manner that will force parity between physical and mental health.

Right now, clinicians are given these clunky [inaudible 21:01] paper-and-pen-based questionnaires that they need to administer during their appointments, which take 10 to 15 minutes to fill out, whereas the average visit with a primary care physician takes about 10 minutes. So instead of these clunky tools, we give them seamless tools that can analyze depression or anxiety in their patients.

That way, doctors can not only pay attention to the chief complaint the patient is presenting with but look at the patient’s healthcare in a more holistic manner. They can look at how the patient is doing mentally and how they’re doing physically and, in that holistic manner, come up with a more effective treatment that actually addresses the root cause of the patient’s problems instead of just treating the symptoms they’re presenting with.

So those are the financial and clinical impacts, but I think the main impact that we, as founders of the company working on this technology, are excited about is the emotional impact, which is not quantifiable but which we believe is going to be immense. As a mom who struggled with postpartum depression myself, and with a lot of my teammates who, either themselves or through their loved ones, have been through mental health struggles, the amount of emotional support we could provide to those patients is quite immense.

A patient struggling with depression doesn’t struggle alone; it affects the mental wellbeing of the people around them, their friends and family. So by connecting patients to access, one by one, we’re going to have a profound effect on a society that is observing skyrocketing trends in the rates of depression and anxiety, especially among young adults and adolescents.

Of course, I’m biased, but I do see the impact of our technology having a profound effect on the mental wellbeing of our nation.

[0:23:31.0] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[0:23:35.3] RSO: Yes, definitely. I would offer the advice that we can all be a little more responsible about the implications of the work we’re doing. In digital health, we’re observing interesting trends where certain companies prioritize financial gains and revenue over clinical impact and the clinical outcomes for the patient, and those stories not only affect that one startup; they affect the whole industry.

It affects how digital health is viewed by investors and how the technology is viewed by regulators. So by being reckless, startup founders or researchers don’t only affect their own reputation; they affect the reputation of the whole industry. Especially right now, [inaudible 0:24:39.1] COVID pandemic, I think every single AI startup in healthcare needs to prioritize the ethical implications of their product.

So that as an industry, we cover some of the damage that has been done by some of our colleagues, some of the other companies in the digital health space, because we do want to make sure that we win trust not only from the regulators but also from the patients who need us at the end of the day.

[0:25:13.5] HC: So, we’ve talked about impact some already. Where do you see the impact of Kintsugi specifically in the next three to five years?

[0:25:21.0] RSO: Well, the overall impact of our technology will definitely contribute to reducing the economic cost of untreated mental health conditions, which right now is about 200 billion dollars per year. But in the short term, in two to three years, we believe we’re going to affect countless patients who need care by integrating into clinical workflows at various hospitals and health systems in our country.

We’re already integrated into the call centers of one of the largest health insurance organizations, a call center that serves about 20 million calls, and we are providing screening functionality to nurses who may not be specifically trained in mental health. We’re augmenting them, giving them this superpower to be able to see what the patient is experiencing mentally and to connect them to care at the time of need, even if the patient doesn’t realize that they may need help at that given time.

So that’s what we’re doing already, but looking ahead, once we receive our De Novo clearance from the FDA, we intend to expand to some of the largest health insurers and health systems in the country and be integrated into primary care settings, case management, and care navigation services, so that we can pick up patients who are slipping through the cracks and connect them to care.

That’s what we see for the next few years, and our ultimate mission is to become the de facto standard for screening for mental health; screening inherently implies an ability to detect patients with mental health conditions. From a research standpoint, from a machine learning standpoint, we want to take our product to the next level where we not only detect depression but can also quantify its severity, so we can triage our patients.

That way, patients with different levels of severity of depression or anxiety can be connected to different levels of care, improving the efficiency of clinical workflows. So those are the plans we have for ourselves in the next few years.

[0:28:03.6] HC: That’s exciting. I look forward to following along and seeing where you go. This has been great, Rima. You and your team at Kintsugi are doing some really interesting work for mental healthcare. I expect that the insights you’ve shared will be valuable to other AI companies. Where can people find out more about you online?

[0:28:20.5] RSO: Our website is kintsugihealth.com, and our voice journaling application is available on the Apple App Store, so everyone is welcome to download the app, use it for therapeutic purposes, and take advantage of the CBT and DBT exercises we have in the application. On our website, we have information about our technology, but we’re always happy to answer questions and hear feedback through all different channels, be it on LinkedIn or elsewhere; our team is quite responsive.

So I do encourage listeners to reach out; we’re always recruiting. We’re recruiting excited, mission-aligned, and interested candidates for a variety of positions, especially in research. So yeah, we’d be happy to hear from the listeners of this podcast.

[0:29:16.1] HC: Perfect. Thanks for joining me today.

[0:29:18.6] RSO: Thank you for having me Heather.

[0:29:20.6] HC: All right everyone, thanks for listening. I’m Heather Couture and I hope you join me again next time for Impact AI.

[END OF INTERVIEW]

[0:29:32.0] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend, and if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]

