Infants cry when they're hungry, tired, uncomfortable, or upset. They also cry when they’re in pain or severely ill. But how can parents tell the difference? To help us address this critical question, I'm joined by Charles Onu, a health informatics researcher, software engineer, and CEO of Ubenwa. Ubenwa is a groundbreaking app that uses AI to interpret infants' needs and health by analyzing the biomarkers in their cries. Charles conceived of the idea while working in local communities in south-eastern Nigeria, where high rates of newborn mortality due to late detection of perinatal asphyxia inspired him to create a solution.

In this episode, Charles shares insights into Ubenwa's machine-learning models and how they establish an infant's cry as a vital sign. He discusses the process of collecting and annotating data through partnerships with children's hospitals, the challenges of working with audio data, the benefits of creating a foundation model for infant cries, and much more. He also offers human-focused advice for leaders of AI-powered startups and reflects on his vision for success and the impact he hopes to achieve with Ubenwa. Tune in to discover how understanding your infant’s cries can transform healthcare and well-being for newborns and their families!


Key Points:
  • Charles' converging interests in math and healthcare, which led him to create Ubenwa.
  • What Ubenwa does to establish an infant’s cry as a vital sign (and why it’s so important).
  • The essential end-to-end role that machine learning plays in this technology.
  • How Ubenwa collects and annotates data by partnering with children’s hospitals.
  • Challenges of working with audio data and training medical ML models on it.
  • Insight into the benefits of creating a foundation model for infant cries.
  • Variations in infants’ cries and how Ubenwa’s models generalize across these shifts.
  • Valuable research Ubenwa has made publicly available as a gift to the ML community.
  • Charles’ human-focused advice for other leaders of AI-powered startups.
  • What success means to Charles and the impact he hopes to make with Ubenwa.

Quotes:

“Ubenwa was born out of the idea that, if there's something that [human doctors] can listen to to come to a conclusion [about an infant’s health], then there has to be something machines can also learn from the infant's cry.” — Charles Onu

“The real leap we made with self-supervised learning is that you now do not need an external annotation to learn. The model can use the data to supervise itself.” — Charles Onu

“AI-powered or not, the problem of a startup remains the same. It’s to meet a need that humans have. […] At the end of the day, AI is not just there for AI only. It’s only going to be a successful and useful startup if you identify a need and [solve] that problem.” — Charles Onu

“Human babies have evolved to communicate their needs and their health through their cries. We [haven’t] had the tools to understand that. Babies have been trying to talk to us for a long time. It's time to listen.” — Charles Onu


Links:

Ubenwa Health
Nanni AI
Charles Onu on LinkedIn
Charles Onu on X
Charles Onu on GitHub
Ubenwa on GitHub
Ubenwa CryCeleb Database


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[EPISODE]

[0:00:34] HC: Today, I’m joined by guest Charles Onu, co-founder and CEO of Ubenwa Health, to talk about interpreting infant cries. Charles, welcome to the show.

[0:00:43] CO: Hello, Heather. Thank you so much. Glad to be here.

[0:00:46] HC: Charles, could you share a bit about your background, and how that led you to create Ubenwa?

[0:00:51] CO: Yes, definitely. I’ll say, creating Ubenwa was the journey of my experience growing up in Nigeria, as well as the work I began to do during my Master’s and PhD in Montreal. So, I’ll connect the one to the other shortly. First, growing up here in Nigeria, I grew up in a very middle-class family and got to see around me some of the real struggles in society, from economics to education. But healthcare, I’d say, stayed with me the most, because it pretty much seemed like, without health, what can you achieve? Without health, you can’t really do much more. So, all my life, I wanted to be a doctor. I felt that that was the most practical way I could help people and save lives.

By the time it actually came to choosing to go to med school, I realized that there was no math involved in medicine, and so I ended up in a dilemma. I wanted to do medicine, I wanted to help people, but obviously, I really wanted to do math, because that was my best subject so far. It came naturally to me and so on. So, I ultimately didn’t go to med school. I went to study electrical engineering in undergrad. Fast forward to later, I found my way back to medicine by basically working in machine learning for healthcare. So, developing learning algorithms in the healthcare setting, asking the question of how we can interpret and understand human physiological signals using AI and machine learning.

We started working a lot with neonatologists and pediatricians, so doctors who care for kids in general. That was what led to the insight that doctors today already use the baby’s cry, the cry sounds, as a clue to their health. So, you’d hear a pediatrician say, “Oh, the baby might not be fully recovered, the cry is a bit high-pitched. There might be something wrong there, basically.” Usually, there was something that was the case.

Ubenwa was born out of this idea that, well, if there’s something that humans can listen to, to come to some conclusion, then there has to be something machines can also learn from the infant’s cry. If machines can learn that, then we open up this world for parents, for clinicians, to truly understand infants from day one. Understand exactly what need underlies their cry, but most importantly, to understand when the cry is due to something more serious, like one health condition or the other. So, there you go. That’s the long-winded explanation of how I got into starting Ubenwa and working on this.

[0:03:29] HC: What all does Ubenwa do today and why is this important?

[0:03:32] CO: Yes. We are basically on a mission to establish the infant’s cry, the infant voice, as a vital sign. The voice is a window to our health. Today, we do not use babies’ cries as part of the markers we look out for when we are caring for them. So, we’ve developed, first, foundation models to basically take a baby’s cry and understand why they are crying and whether it might be due to a health condition or not. That’s the heart of what we’re doing.

[0:04:00] HC: How do you use machine learning in this? What role does it play in this technology?

[0:04:05] CO: Everything, pretty much, from end to end. Because take just one subset of the problem we’re solving, which is: given a baby’s cry, how do you tell if this cry is from a healthy baby, or if it’s a cry from a baby who might be at risk of brain injury? What we’ve seen from the clinical literature, and our research as well, is that there are specific markers that change in the baby’s cry when they are at risk of brain injury. The challenge, though, is that these things are, one, difficult for the human ear to pick up consistently. But secondly, they are so complex that you can’t really use a linear method to determine them. This is where machine learning really comes in: taking the signals and computing a rich set of features from them to basically be able to predict whether this is a healthy or a sick cry from the recording. Really, the heart of it is the idea that you can look at a wide breadth of features, not just one or two things, but thousands or millions of things in the baby’s cry sounds, to make this decision and do it even more accurately.
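To make the feature idea above concrete for readers, here is a minimal, illustrative sketch (not Ubenwa’s actual pipeline) of summarizing cry recordings with a few classical acoustic descriptors, pitch, loudness, and spectral centroid, and fitting a deliberately simple linear baseline on top. The file names and labels are hypothetical placeholders; as Charles notes, a real system would rely on far richer features and non-linear models.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def cry_features(path, sr=16000):
    """Summarize one recording with a handful of coarse acoustic descriptors."""
    y, _ = librosa.load(path, sr=sr)
    f0 = librosa.yin(y, fmin=150, fmax=1000, sr=sr)               # pitch track (Hz)
    rms = librosa.feature.rms(y=y)[0]                             # frame-level loudness
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]   # spectral "brightness"
    return np.array([f0.mean(), f0.std(), rms.mean(), centroid.mean()])

# Hypothetical file lists; real labels would come from clinical exams.
healthy_files = ["healthy_01.wav", "healthy_02.wav"]
at_risk_files = ["at_risk_01.wav", "at_risk_02.wav"]

X = np.stack([cry_features(p) for p in healthy_files + at_risk_files])
y = np.array([0] * len(healthy_files) + [1] * len(at_risk_files))

clf = LogisticRegression().fit(X, y)   # deliberately simple linear baseline
print(clf.predict_proba(X)[:, 1])      # probability of the "at risk" class
```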

[0:05:17] HC: The goal is to be able to listen to a particular infant crying and to predict whether that infant is at risk of something more serious.

[0:05:26] CO: Yes, exactly.

[0:05:28] HC: How do you go about gathering data in order to train these types of models? Do you need to annotate data? How do you deal with the annotation piece as well?

[0:05:37] CO: Yes, exactly. As with all data problems, you need annotations to solve the downstream tasks. This isn’t an exception. Really, the only way to do this is to work with hospitals, because by working with hospitals, we are able to not only record the various cries, with the consent of the parents, of course, but also get the medical annotation. So, medical exams are conducted to know which babies are clinically confirmed to be okay, and which are clinically confirmed to have one of the specific conditions that we are testing for, effectively.

We’ve partnered with children’s hospitals around the world, in Canada, in Brazil, in Nigeria, and now in the US as well, to basically incorporate crying, cry recording, as part of the standard of care, and then annotate it with the exams that are done in the clinical setting. With that, we are building the database that is allowing us to build and ship the models that go into our different products.

[0:06:39] HC: What kinds of challenges do you encounter in working with this audio data, and in particular, in training models based on it?

[0:06:46] CO: Several. Well, one of the first so-called challenges we faced relates to what I’ve just described: in many cases in medical machine learning, so to say, you can just call upon historical data to solve your problem. For instance, if you want to understand doctors’ notes and predict morbidities and mortalities, you can just refer to records. You need consent and ethics approval to get those records, but you can. If you’re looking at medical imaging, predicting cancer from X-ray images, you can refer to historical records. Well, before we came into the picture, nobody was recording infants’ cries as part of their medical records. So, it was really a heavy lift to set up the infrastructure and the partnerships to begin to collect this with all the hospitals we work with now, and to even get the physicians to see this as a useful thing for the future of care for these babies as well. I think, though, those are just the fundamental steps of ramping up.

In the actual analysis of the data, there are all sorts of things. Audio data is very prone to noise. So, if you think about a picture you take, you could take a selfie of yourself, and you are centered in the photo. If you did a marginally good job, you will get yourself clearly in that picture without trouble. But recording audio can be non-trivial, especially if you’re in a noisy setting, and different kinds of noise can have different impacts on the sound that was recorded, ultimately. So, really, developing rigorous methods for cry detection, separating infant cry sounds from other sounds in the recording, is a key step in this process. That means detecting both non-overlapping and overlapping sounds, which require different techniques. That’s one of the challenges to tackle there. Not sure how much further you want me to go, but there are other things you look at when you start to analyze the actual signals as well.
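As a rough illustration of the cry-detection step Charles mentions, the sketch below keeps only the high-energy regions of a recording. This is a naive energy-threshold stand-in, not the trained detection models a production system would use, and the file name and threshold are assumptions for the example.

```python
import numpy as np
import librosa

def rough_cry_segments(path, sr=16000, frame=2048, hop=512):
    """Naive segmentation: keep regions well above the recording's noise floor.
    A real pipeline would use a trained cry/sound-event detector instead."""
    y, _ = librosa.load(path, sr=sr)
    rms = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)[0]
    threshold = 4 * np.percentile(rms, 20)                   # crude, per-recording threshold
    active = rms > threshold
    times = librosa.frames_to_time(np.arange(len(rms)), sr=sr, hop_length=hop)

    segments, start = [], None
    for t, is_active in zip(times, active):
        if is_active and start is None:
            start = t                                        # a candidate cry segment opens
        elif not is_active and start is not None:
            segments.append((start, t))                      # segment closes
            start = None
    if start is not None:
        segments.append((start, times[-1]))
    return segments

print(rough_cry_segments("nursery_recording.wav"))           # hypothetical noisy recording
```

Overlapping sounds (a cry on top of speech or alarms) are exactly where a simple threshold like this breaks down, which is why separate techniques are needed for that case.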

[0:08:50] HC: Once you are analyzing the signals, what kinds of challenges come up there?

[0:08:53] CO: Yes. I’ve described how we acquire the data and the challenges with that, and how we separate the data from noise and the challenges with that. Then, I’ll say maybe the last major piece is the one in which we now want to analyze the cry sounds themselves. This has now become more or less standard in cry and sound analysis, and we’ve really picked up a lot of techniques from voice analytics and speech recognition. An audio signal is just a lot of data points, so you have to get it into a form where you can analyze the actual data.

So, I go back to the image example, for instance. A picture of you looks like you and contains features that we can recognize as you. With an audio recording, it’s just this high-dimensional input of numbers. We typically have to transform that into something called a spectrogram. We call it a frequency-domain representation, technically, but you can think of it as basically transforming the audio into a shape and form that allows us to see the parameters that matter. These really have to do with things like the pitch of the cry, the melody of the cry, whether there’s phonation or dysphonation, and all these characteristics that we really care about. We have to actually do these transformation steps, which you don’t see in other data modalities, before you then pass your transformed audio into a neural net for analysis.

From that point onwards, you can use many of these standard methods. If people listening are machine learning researchers, you can use any of the architectures built for images, especially, to analyze that part of the audio signal, the spectrogram.
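For listeners who want to see what that transformation looks like in code, here is a minimal sketch, assuming a mono WAV recording, of computing a log-mel spectrogram with librosa. The parameter values are illustrative, not Ubenwa’s settings; the point is that the output is a 2-D time-frequency array that image-style neural networks can take as input.

```python
import numpy as np
import librosa

# Hypothetical path to a mono cry recording.
y, sr = librosa.load("cry_example.wav", sr=16000)

# Raw samples -> mel-scaled, log-amplitude spectrogram:
# rows are frequency bands, columns are short time frames.
S = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=64, fmax=8000
)
S_db = librosa.power_to_db(S, ref=np.max)

print(S_db.shape)  # (64, n_frames): a 2-D array a CNN can consume like an image
```

In this frequency-domain view, characteristics like pitch and melody show up as visual patterns, which is what lets image-style architectures be reused on audio.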

[0:10:39] HC: One thing that I read about your approach is that you’re creating a foundation model for infant cries, using self-supervised learning to create this. I was wondering whether you could explain that in more detail for audience members who aren’t familiar with this approach. What kind of benefits does this approach provide? Why did you go in this direction?

[0:10:57] CO: Yes, absolutely. The idea of self-supervised learning, which is what underpins foundation models, is basically a technique in machine learning that allows you to learn from data that does not have annotations. Traditionally, if you look at the decade from about 2010 to 2020, machine learning was all about supervised learning. That was the thing that was working, at least the most, outside of research circles. We would have an image, we would have labels, cats, dogs, this, and that. Then, you can build a model to predict that, and you can accurately test that the model is doing well on cats, or doing well on dogs, or doing poorly on cows, or something like that. But these annotations are key to get the model to actually learn anything.

The real leap we made with self-supervised learning, as the name sort of implies, is that you now do not need an external annotation to learn. The model can use the data to supervise itself, so that’s the self-supervision. I can give you a concrete example of this. What this looks like in practice, basically, is that you can take an audio signal, let’s go back to audio. Let’s say it’s an audio of me talking, and we’re trying to detect whether it was me at all. Let’s just do that. You take an audio of me and basically transform that audio in some way. Either you add noise to it, so it’s a bit hard to hear, but if you pay attention, you can tell it’s me. Or you could do other transformations, like clipping some segments of the audio out, or flipping it so it reads from back to front instead of from front to back, and things like that.

Basically, take my audio and augment it in some fashion. What we can do with self-supervised learning is that we will train a neural network to simply say, “Given these two audio samples, was it still the same person who was talking in both of them?” This is a very simple test: you would get several such samples from many other people, and you would train it to identify when it’s not me in the augmented sample, and when it is me in the augmented sample.

If the model can learn to do this well, it turns out that what ends up happening is that the model then learns a really, really good representation of human voices, just generally speaking. If you think about it, this is really interesting, because you didn’t train it to recognize human voices. In machine learning, what we found is that, by training it to solve this task, the model basically has to learn how to encode human voices in a very efficient way, such that when we do now have a real task to solve, like testing for Alzheimer’s, for instance (there are models that can do this), we are ready.

Then, we already have a model that can extract those features for me, and now I just have to build a small layer on top of that to solve the problem. So, self-supervised learning really helps us to use unlabeled data in a very useful way that would not have been possible before. That’s what allowed us to build that lower layer of representation on which we can now train our models.
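As an illustration of the self-supervision idea described above, here is a toy contrastive-learning sketch in PyTorch: each clip is augmented twice (added noise plus a random time shift), a small encoder maps both views to embeddings, and an NT-Xent-style loss pulls views of the same clip together. The encoder, augmentations, and random “clips” are all stand-ins rather than Ubenwa’s foundation model; the payoff of this pattern is that a small supervised head can later be trained on top of the frozen encoder with relatively little labeled data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallEncoder(nn.Module):
    """Tiny 1-D CNN over raw waveforms; a stand-in for a real cry encoder."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(128, dim)

    def forward(self, x):                      # x: (batch, 1, samples)
        return self.proj(self.net(x).squeeze(-1))

def augment(wav):
    """Two cheap augmentations: additive noise and a circular time shift
    (one shift shared across the batch, for brevity)."""
    noisy = wav + 0.005 * torch.randn_like(wav)
    shift = int(torch.randint(0, wav.shape[-1], (1,)))
    return torch.roll(noisy, shifts=shift, dims=-1)

def nt_xent(z1, z2, temperature=0.1):
    """Contrastive loss: the two views of the same clip are each other's positives."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)              # (2n, dim)
    sim = z @ z.t() / temperature
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

encoder = SmallEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(100):                        # toy loop over random "clips"
    clips = torch.randn(16, 1, 16000)          # 16 one-second clips at 16 kHz
    loss = nt_xent(encoder(augment(clips)), encoder(augment(clips)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```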

[0:14:12] HC: It’s enabled you to take much larger quantities of data just to learn that representation, but then be able to train a supervised model on top of it with less labeled data.

[0:14:23] CO: Exactly, yes.

[0:14:25] HC: How consistent are infant cries? Do they vary for infants in different parts of the world? Are there other challenges related to how these sounds vary? If so, how do you ensure that your models generalize across these distribution shifts?

[0:14:39] CO: Yes, great question. This is one of the most common questions I get, actually. Are babies’ cries in Nigeria different from those in Canada, for instance? The funny thing is that the cries are not different in the ways that matter. So, any two babies sound different, if you think about it. If you heard the two cries a few times, you’d know who is who. But when you look at the cries from the point of view of the acoustic characteristics that matter, from the pitch to the melody of the cry, the pause durations, the level of dysphonation, all of these characteristics that we tend to study while building models for sounds and cries, you’ll actually find that it’s impossible to distinguish the cries of babies from Nigeria, to China, to the UK, and elsewhere.

We know this is more than a hypothesis, actually, because we’ve tested it empirically. Today, we have launched the Nanni app, which is an app used by parents to basically, as people have said, “Shazam the baby’s cry.” It’s a cry translation app, to help you understand why the baby is crying in this moment and what you can do about it immediately. Parents really love the app. Now, we have users in 200-plus countries around the world. So, we’ve basically had the data to really test this and ask this question at a global scale, literally. And we see this consistently: in the parameters that matter to infant cry analysis, we see no differences between babies.

[0:16:09] HC: Are there other factors that might make these sounds vary? Maybe it’s related to background noise, or batch effects related to the way the sounds are captured that you do still have to deal with maybe as a noise parameter?

[0:16:23] CO: Yes. Great flip side to add there, because that’s really what happens. I described the steps we take to segment the signals, where we remove the background noises that impact the cry sounds. The only time we see differences from region to region is when those noises are involved. Because what you see is that things like temperature and atmospheric pressure actually change the background sounds and the impact they have on the microphone, as do the actual noises in your environment. You could be in a hospital, or you could be on the highway, et cetera. Because we have this step that filters that out, effectively, we get to really just focus on the pure cry sounds.

[0:16:59] HC: You have a number of publicly available research articles, code repos, and models. I saw these as I was looking through your website and learning more about Ubenwa. How do these efforts support your business goals?

[0:17:11] CO: Some do, and some are really just our gift back to the machine learning community. Because when we started working on this, as I mentioned, nobody was collecting infant cries as part of medical records. So, the hardest part of our job for the first one or two years was: how do we build a massive database of infant cries to study them? Because that’s what you need to study cries. Given that we spent so much time and effort on this novel adventure, we thought it worthwhile to give some of that data back to the community, to the open-source community. Which is why we’ve launched a database known as CryCeleb, sort of celebrity babies, basically, to allow other researchers who want to study infant cries to do so, because people were writing to us from time to time to say, “Oh, can we get access to data? Can we get access to your data?” We realized that a lot of people are interested in this topic, researchers, especially in academia.

So, we open-sourced some of that database so that people can study it at their own convenience and, hopefully, solve more problems for present-day and future infants as well. That’s, on the one hand, why we’re doing that. On the other hand, we publish and share because all of us on the team fundamentally come from a research background. We’re all AI scientists from Mila, which is the AI center in Montreal. So, publishing is part of how we’ve learned and acquired knowledge in the machine learning community, and how we grow fast as a community. We just keep doing that as a way to also give back to the community. It doesn’t impact our business goals; we can build our products, we can build our technology on this, and have trade secrets on things we are building internally.

[0:18:50] HC: It’s definitely a great gift to the community, that access to data, and to things like foundation models that can require a large amount of compute. Being able to get access to those as a researcher is definitely very advantageous for pushing this forward and helping more people. Is there any advice you could offer to other leaders of AI-powered startups?

[0:19:12] CO: Yes, sure. I mean, I’ll say one thing: AI-powered or not AI-powered, the problem of a startup remains the same. It’s to meet a need that humans have, that people have. So, I think people should just not forget that, at the end of the day, AI is not just there for AI only. It’s only going to be a successful and useful startup if you identify a need and you’re solving that problem. Just don’t lose sight of that. The exception is if you’re an academic scientist or researcher, and you’re just studying the algorithms to create new ones or to explore new creative questions that you set for yourself. That’s what I would say; that’s my core advice to AI founders.

[0:19:55] HC: Finally, where do you see the impact of Ubenwa in three to five years?

[0:20:01] CO: Success for us will be, one, leading to happier parents, so parents get to sleep more. We know a lot of our users today write us random messages by email just saying how much the app has helped them as a first-time mom. Even grandparents who are babysitting, it turns out, have adopted our app as well, and they’re using it and giving us some really good feedback. So, I feel like, on one hand, it’s the parents, actually. It’s just really giving peace of mind to parents, helping them to know, in those inconsolable cries: should I be worried? Is something more serious going on? Why is the baby crying? What can I do about this right now? We’re just hoping to answer those questions, especially for first-time young parents.

On the other hand, there is the downstream impact on the children themselves, on their neurodevelopment. Our goal here really is to help spot the earliest signs of trouble or issues. We’re working with clinicians on detecting neurological injury and respiratory distress from the cry sounds. The goal here is to reduce infant mortality, but also to help infants thrive, because a baby might survive but be neurologically impaired due to a condition that was not detected early.

So, we really want to pioneer and be that leader in the space that allows the medical community, but also the parenting community, to understand infants’ cries and really act upon them. Human babies have evolved to communicate their needs and their health through their cries. We’ve not really had the tools to understand that. Babies have really been trying to talk to us for a long time, and so, it’s time to listen.

[0:21:42] HC: This has been great. Charles, I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:21:50] CO: Yes. You can find out more about the company first at ubenwa.ai. You will find us everywhere too, on LinkedIn, Twitter, Facebook, what have you. My name is Charles Onu, so you can find me on all of these platforms as well.

[0:22:09] HC: Perfect. I will link to all of that in the show notes. Thanks for joining me today.

[0:22:14] CO: Yes. It was a pleasure, Heather.

[0:22:15] HC: All right, everyone, thanks for listening. I’m Heather Couture. I hope you join me again next time for Impact AI.

[OUTRO]

[0:22:25] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share it with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]