In today’s episode, I am joined by Simon Arkell, the visionary CEO and co-founder of Ryght, to talk about copilots and the application of generative AI in life sciences. Ryght is dedicated to revolutionizing the field of life sciences through the power of AI. By leveraging cutting-edge technology and innovative solutions, Ryght aims to empower professionals and organizations within the life sciences industry to streamline processes, enhance productivity, and drive meaningful outcomes.

In our conversation, we discuss Simon’s entrepreneurial background, the various companies he has founded, and what led him to create Ryght. We delve into the pivotal role of enterprise-scale, secure AI solutions in healthcare, and learn how Ryght’s platform is reshaping the landscape of drug development and clinical research. Discover the intricate workings of generative AI copilots, the challenges of minimizing hallucinations and validating AI models, and how accuracy, cost, and speed shape the utility of the approach at the enterprise level. Simon also shares Ryght’s long-term goals and invaluable advice for leaders of AI startups. Join us, as we explore a world where healthcare and life sciences are transformed by cutting-edge technology with Simon Arkell from Ryght!


Key Points:
  • Hear about Simon’s background and his path to founding Ryght.
  • Ryght’s generative AI approach, its potential in life sciences, and the role of copilots.
  • The importance of enterprise-scale, secure AI solutions in healthcare.
  • How generative AI copilots accelerate drug development processes.
  • Differences between training models for life sciences versus generic AI models.
  • Discover the challenges encountered in AI-powered solutions.
  • Explore the company’s approach to customer feedback and model validation.
  • Strategic considerations and advice for leaders of AI startups.
  • Ryght’s mission to transform the healthcare and life sciences industry.
  • Where to find more information about Ryght and connect with Simon.

Quotes:

“We built an enterprise-secure version of Generative AI that has many different features that allow large companies and small companies to very securely benefit from Generative AI without all of the issues that a very insecure, non-industry-trained solution might create.” — Simon Arkell

“With this type of [generative AI] technology, you have the ability to completely unlock new formulas, and new molecules that could be life-changing.” — Simon Arkell

“Improving the utility of the platform comes down to the efficacy of the output. It comes down to the in-context learning, the ensembling, and the prompting. But at the end of the day, a human has to determine, in many cases, the accuracy and relevance of a specific answer.” — Simon Arkell

“It's not really about building models. It's about making sure that the right models are being utilized for the copilot.” — Simon Arkell


Links:

Simon Arkell on LinkedIn
Ryght
Ryght on LinkedIn
Ryght on X
Ryght on YouTube


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.
Foundation Model Assessment – Foundation models are popping up everywhere – do you need one for your proprietary image dataset? Get a clear perspective on whether you can benefit from a domain-specific foundation model.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI. Brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[EPISODE]

[0:00:33] HC: Today, I’m joined by guest Simon Arkell, CEO and co-founder of Ryght, to talk about copilots for life sciences. Simon, welcome to the show.

[0:00:42] SA: Thanks for having me, Heather.

[0:00:44] HC: Simon, could you share a bit about your background and how that led you to create Ryght?

[0:00:48] SA: Yes, absolutely. So, I’m originally from Australia, grew up in Adelaide, in South Australia. But I moved over to the US in the eighties to come to college on an athletic scholarship. Ever since then, and the end of my athletic career, I’ve been a startup guy, and I have typically started companies in high tech and enterprise software. My first company was in the good old days of the dot-com boom in 1998, which got some venture capital funding. So, that was a really exciting ride to go through that pretty historic period.

I had a stint in investment banking as well, where I really got to understand the venture capital market. That gave me the skill set to be a founder: to start companies, raise capital from VCs, and hopefully build the business up and do something impactful. So, I’ve now started seven companies, including Ryght, and four or five of them have been venture capital backed. For the last 12 or 13 years, it’s been pretty focused on artificial intelligence, which, back in 2009 when I started a company called Predixion Software, was called predictive analytics. So, effectively advanced analytics and artificial intelligence in various industries. That led to me starting a company with two co-founders called Deep Lens, which got even further into the healthcare and life sciences space, where we used artificial intelligence, in this case natural language processing, to basically go through huge silos of data for oncology providers around the US and, in real time, match oncology patients to clinical trials for precision medicine.

So, it really used NLP to identify specific genetic mutations, pull those out, identify other inclusion and exclusion criteria for a patient to get onto a trial, and then bring the trial to the site where they were being treated. That was a very successful outcome for the shareholders, investors, et cetera, because we were able to get the tech out to 250 locations around the US. We were matching patients to clinical trials they never would have had access to, had that not been the case. Then we were acquired by a company called Paradigm in May of 2022. So, that was a great outcome.

Then, the new company, Ryght, we started about a year ago, when generative AI became a thing, and we realized that ChatGPT, although it was really an amazing and very different approach to some of these age-old problems that my companies have faced, was a complete sea change and could be very, very impactful for this industry, which is healthcare and life sciences. We built an enterprise-secure version of generative AI that has many different features that allow large companies and small companies to very securely benefit from generative AI without all of the issues that a very insecure, non-industry-trained solution might create.

That’s where we are. We’re about a year into it. We have about 20 employees. We have six venture capital funds behind us. It’s been a pretty exciting ride, because we announced the availability of the platform just two months ago, and we already have a number of paying customers, which are midsize and large companies in the life sciences space: CROs who perform clinical trials, biotechs, and software and data companies that sell into that industry. So, I’ve got to say it’s pretty fun right now.

[0:04:27] HC: Can you tell me a bit more about this platform and what it is you offer? Most importantly, why is this critical for life sciences?

[0:04:34] SA: Yes. So, we have a team that’s done enterprise software many times before in our careers. We’re not two guys in a garage trying to put something together and hoping customers will come. We realized that this industry needed incredible security. So, we built an enterprise-scale, microservices-based analytics architecture that can be ported from cloud to cloud, and we’re in Azure, AWS, about to be GCP, et cetera. As for what this industry really needs, take a biotech as an example. They have incredible crown jewels, which is their company data. Their job, of course, is to invent a molecule and turn it into a drug, which needs to go through a clinical trial to be accepted by the FDA and other bodies around the world so that it can be commercially available and be sold to cure the disease it was invented for.

Generative AI has the ability to really accelerate that process. Drug discovery is really interesting, and we’re not doing too much there right now. But there is a lot of work happening upstream of the clinical trial to identify, at scale, a number of different molecules that could previously have been hidden because of old techniques and the size of the human brain. But with this type of technology, you have the ability to completely unlock new formulas, and new molecules, that could be life-changing. Cancer drugs, as an example. Then there’s just a huge process of taking a drug through the FDA, from writing the initial protocol, which is how the clinical trial is performed, all the way through the hundreds of different documents that the FDA requires in order to accept the clinical trial. Then you have to engage sites and CROs, and there are just so many different variables that go into getting a drug through. It takes 10 years or more in many cases, and costs many, many hundreds of millions of dollars, if not billions of dollars, to get to the point where a blockbuster drug could generate billions of dollars in revenue per year.

A drug was just approved by the FDA a couple of weeks ago for cancer in immuno-oncology, and one course of treatment costs USD 515,000. So, the numbers are big. It’s very important to try and accelerate the process of getting a drug through that clinical trial. Because it’s so intensive on documentation and processes, it’s a perfect fit for generative AI. So, we’ve created an architecture in our software that allows copilots to be built on top of a platform that’s already trained and very intelligent about healthcare and life sciences, so that these copilots can accelerate all of the many processes in getting the drug through the trial. Then, to do some commercial targeting: identify where the patients are, which doctors should be used, et cetera.

But you can’t do that with a very basic kind of broad, shallow platform, like a general foundational model. You need models, you need data, you need processes and methodologies that are very specific to the very unique data that’s in this industry, and that’s what we’ve created.

[0:07:44] HC: So, how do these copilots work? And maybe you have some examples of a use case for one, in which it accelerated the drug development process?

[0:07:52] SA: Yes. We’re seeing many across the board. But effectively, it works like this: if you’re at a company building your own engine and platform, the questions are going to be, okay, which model should we use and build around? How do we handle security? How do we handle hallucination? And how do we operationalize these AI copilots so that they’re utilized securely? They’re utilized in the organization by the personnel who are supposed to use them, but just like in any other authentication system in the enterprise, we want to make sure that people who are not supposed to get access do not get access to it.

So, we’ve created all of this orchestration across different language models, different chunking and embedding of data, indexing or vectorization of that data, and the retrieval and summarization, plus the attribution of the data that leads to a specific response, to make sure that hallucination is not there. These turn out to be copilots, which are user interfaces that could look similar to ChatGPT or Gemini, but a copilot could also be just a software application that has this technology embedded in it, or it could be the processing of huge amounts of unstructured data in a pipeline. The copilot could be the data pipeline that is now doing OCR and machine learning to make sense and structure of huge amounts of unstructured data.
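
To make the orchestration Simon describes a little more concrete, here is a minimal retrieval-with-attribution sketch in Python. It only illustrates the general chunk, embed, index, retrieve, and summarize pattern, with the source documents carried alongside the answer; it is not Ryght’s implementation, and the `embed` and `generate` callables, the chunk size, and the return fields are assumptions made for the example.

```python
# Minimal retrieval-with-attribution sketch (illustrative only, not Ryght's code).
# `embed` and `generate` are hypothetical stand-ins for whatever embedding model
# and language model an orchestration layer would route to.
from dataclasses import dataclass
import math

@dataclass
class Chunk:
    doc_id: str          # which source document this text came from
    text: str
    vector: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def chunk_and_index(docs: dict[str, str], embed, size: int = 500) -> list[Chunk]:
    """Split each document into fixed-size chunks and embed them."""
    index = []
    for doc_id, text in docs.items():
        for start in range(0, len(text), size):
            piece = text[start:start + size]
            index.append(Chunk(doc_id, piece, embed(piece)))
    return index

def answer_with_attribution(question: str, index: list[Chunk], embed, generate, k: int = 3):
    """Retrieve the k most similar chunks, generate an answer, and return the
    source doc_ids so a user can trace the response back to its data."""
    qvec = embed(question)
    top = sorted(index, key=lambda c: cosine(qvec, c.vector), reverse=True)[:k]
    context = "\n\n".join(c.text for c in top)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return {"answer": generate(prompt), "sources": [c.doc_id for c in top]}
```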

For the end-user knowledge worker, we’ve seen an example of a copilot being something like, “Hey, help me generate a drug protocol,” or, “Help me write a proposal to respond to this RFP to win a clinical trial.” But typically, these are 40-, 50-, 60-page documents with many different sections, and those sections involve collaboration across many different stakeholders in an organization. So, the smart agent is the copilot that helps write and generate the initial drafts of each section of the document, but then helps route access to the proposal, in this example, to the right people who should be contributing to it.

Now, on the other side, you have this data pipelining, where you may have the need, as an end user, to query specific attributes of huge amounts of unstructured data. Imagine if you had 100,000 PDFs or faxes sitting in a repository. Well, now this technology is a huge step forward from natural language processing, which is what I did in my last company, where you could build one model very accurately to do one narrow thing. But as soon as you had new things that you needed to look for, you would have to write new models, and these can take months and millions of dollars. But this generative AI has the ability to look at all of the unstructured data. It doesn’t just do it in a zero-shot or one-shot approach. It may go back and iterate in order to create structure from, literally, hundreds of thousands of faxes or PDFs that can now be queried by the AI, put into dashboards, and put into a workflow.

These are huge unlocks of what is 80% unstructured in this industry. There are huge amounts of data in healthcare and life sciences, and most of it is unstructured and hard to get to. So, just massive unlocks across the board we’re seeing.
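
As a companion to the data-pipeline idea above, this is a rough sketch of how a pile of unstructured faxes or PDFs might be turned into queryable records. The `ocr_text` and `llm_extract` callables, the field list, and the retry step are hypothetical stand-ins for the example, not a description of Ryght’s pipeline.

```python
# Illustrative document-structuring pipeline (not Ryght's implementation).
# `ocr_text` turns a scanned fax/PDF into raw text; `llm_extract` asks a
# language model to pull out the fields of interest as structured data.
import json
from pathlib import Path

FIELDS = ["patient_age", "diagnosis", "genetic_mutations"]  # example schema only

def structure_documents(paths: list[Path], ocr_text, llm_extract) -> list[dict]:
    """Turn unstructured scans into structured, queryable records."""
    records = []
    for path in paths:
        text = ocr_text(path)  # OCR step: image/PDF -> raw text
        prompt = (
            f"Extract the following fields as JSON: {FIELDS}.\n"
            f"Document:\n{text}"
        )
        raw = llm_extract(prompt)
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            # Iterate rather than accept a one-shot failure: ask again for
            # strictly valid JSON before giving up.
            record = json.loads(llm_extract(f"Return only valid JSON:\n{raw}"))
        record["source_file"] = str(path)  # keep lineage to the original document
        records.append(record)
    return records
```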

[0:11:15] HC: So, I think you’ve already hit on some of this. But I’m curious about some of the fundamental differences between training models for life sciences versus a generic LLM, like ChatGPT. Not for the end use case, but for actually training it, and what it takes to train and use it. What are the most important differences?

[0:11:34] SA: I think the easy question that we get a lot along those lines is, “How do you train the model? Why is that different?” In many cases, you don’t need to train a model. In many cases, you don’t need to fine-tune the model at all. This is moving so quickly that, months ago, that was the way to get intelligence. But as soon as you finish fine-tuning a model, or training a model, its knowledge ends.

So, there’s this concept: think of it as the brain that is indexing the data and then retrieving and summarizing the data from many different sources. But that’s not the actual content we’re looking for. The content is in how you chunk and embed the data, how you store it, how you retrieve it, summarize it, and then what you do with it. So, that’s the operationalization of this intelligence.

Our website says we’re enterprise generative AI with a PhD in life sciences. It’s not just because we’re taking life-sciences-trained models, although we are. In the open source, there are 100,000 available models. How the hell do you make sense of all of that? We spend a lot of time going through and finding the very best models for this industry, in addition to foundational models. We haven’t even had to fine-tune models, in most cases, for our customers, because we’re getting very accurate results that are highly relevant by having the right data chunked, embedded, indexed, and available. And with the way that we manage this data, especially unstructured data for the lab, or for genomics, et cetera, we can pull that together in this orchestration that creates the right answer for the application or the end user.

Again, it’s not just about training or fine-tuning a language model. It’s utilizing many language models under the hood, orchestrating across them, and then working with and managing the data to get the right responses. We’re finding that that’s a far superior method than just, “Hey, look at my data. I just fine-tuned you on this particular type of life sciences data.” It’s been really eye-opening for us and our customers to think, okay, there’s so much you can do, and it keeps changing, because new standard operating procedures and new ways to optimize and improve come out every week or month, in addition to new language models. We’re always keeping a finger on that pulse to make sure that we’ve got the latest and best models under the hood, that they’re replacing anything that may be obsolete, and then doing the plumbing to make sure that our customers, through the APIs, are always getting the latest and greatest.

It’s a pretty convoluted answer, I know. But it’s a very complex orchestration that has to happen, and it’s not just about the language model.
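
One way to picture the “swap models under the hood” plumbing Simon mentions is a small registry that maps tasks to whichever model currently serves them, so a newer model can replace an obsolete one without breaking the copilots built on top. This is a generic sketch with made-up names (`ModelRegistry`, the task strings, the stub lambda), not Ryght’s API.

```python
# Minimal model-routing sketch (hypothetical names, illustrative only).
# Callers depend on a task name; the registry decides which underlying model
# serves it, so models can be swapped without changes on the caller's side.
from typing import Callable

class ModelRegistry:
    def __init__(self):
        self._models: dict[str, Callable[[str], str]] = {}

    def register(self, task: str, model_fn: Callable[[str], str]) -> None:
        """Point a task (e.g. 'summarize', 'protocol-draft') at a model."""
        self._models[task] = model_fn

    def complete(self, task: str, prompt: str) -> str:
        if task not in self._models:
            raise KeyError(f"No model registered for task '{task}'")
        return self._models[task](prompt)

# Usage: when a better open-source model appears, re-register the task and
# every caller keeps using registry.complete("summarize", ...) unchanged.
registry = ModelRegistry()
registry.register("summarize", lambda p: "stub summary of: " + p[:40])
print(registry.complete("summarize", "A long clinical protocol section..."))
```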

[0:14:24] HC: It’s a very interesting answer, though, and somewhat surprising to me. It’s not surprising that data is important, but it is somewhat surprising that you can focus almost exclusively on data and just follow along with LLMs, grab the latest one, update your stuff as needed, and really put your resources into data.

[0:14:44] SA: Yes. It’s amazing. We’re getting the best response from customers who are most knowledgeable about this, which is really interesting. We have one very significant customer right now that has very big strategic plans with all of this stuff, and their chief scientific officer said, “I’ve been trying to build this orchestration engine, but mine just didn’t work. Yours is so much better. I’ve just got a shittier version of what you have and I’d much rather use yours. So, I can focus on building the AI, and letting you take care of all of that plumbing under the hood.”

We hear that time and time again, because when people have spent time trying to build and deploy AI, they realize how hard this stuff is. It’s not just building it once; it’s maintaining it on an ongoing basis. It’s very complex. They just want to build cool AI. They want to do stuff that’s not below their pay grade. It’s much the same way as, and this is possibly a bad example, but in the old days, when I was selling e-commerce, it was hundreds of thousands, if not millions of dollars. You buy the hardware, you get a colocation agreement in a data center, you do payment gateways, you write the HTML, and you build an e-commerce site. Then Shopify came along, and now you just get to do the fun stuff: build a store, sell your products, and get going without worrying about all of this stuff under the hood. At a very basic level, we’re doing that. We’re abstracting all of that complexity away.

You can get APIs and different tech from Amazon, Google, OpenAI, and Azure. But how do you know you’re picking the right things? What happens if one becomes obsolete? What if you have just one language model, and that’s obsolete? Or a better one comes along and you replace it? What’s the technical debt that you’ve now built around the old model? We remove all of that hassle factor, and we just make it easy to build on top of us and get to value quickly. So, we have customers who are literally going live in a week or two with copilots that are enterprise-class and ready for them to just start moving the needle in their business without screwing around with all the back end.

[0:16:51] HC: So, one of the largest challenges with generative AI is something you already mentioned: hallucinations. You mentioned attribution as part of the solution here. But I was curious if you could expand on that, on how you minimize hallucinations, because it’s such a large challenge.

[0:17:06] SA: Yes. It’s tough, isn’t it? Because with hallucinations, you don’t even know they’re happening in many cases; the engine can be very confident about its wrong answer. We’re in the position where, again, improving the utility of the platform comes down to the efficacy of the output. It comes down to the in-context learning, the ensembling, and the prompting. But at the end of the day, a human has to determine, in many cases, the accuracy and relevance of a specific answer. We’re doing this time and time again with our customers, comparing results across different models and different systems out there.

But if you’re getting a response that you want to query or question, you can very easily see which 10 articles were summarized to give that answer. You can have the links to them. You can see and read the actual article itself, or the enterprise data itself, because you’re connecting to your company’s enterprise data to get these results as well. Then, you’re able to use this human reinforcement to register within the software whether you thought that was a good answer or not.

So, lineage and attribution are terms we use a lot. And we think that, along with that human reinforcement kind of feedback, is the way to make these systems more accurate over time. I don’t think there’s any silver bullet. It’s not magic out of the gate, but you need to have that attribution so that you don’t make important decisions based on the wrong data.
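
A bare-bones illustration of the lineage-and-feedback loop described here might look like the following. The class and field names are invented for the example; it only shows the general idea of storing the sources that produced an answer and capturing reviewer judgments alongside it.

```python
# Sketch of lineage plus human feedback (illustrative, hypothetical names).
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AnswerRecord:
    question: str
    answer: str
    sources: list[str]                    # links/IDs of articles or enterprise data used
    feedback: list[dict] = field(default_factory=list)

class FeedbackLog:
    def __init__(self):
        self.records: list[AnswerRecord] = []

    def log_answer(self, question: str, answer: str, sources: list[str]) -> AnswerRecord:
        """Store an answer together with the sources it was built from."""
        rec = AnswerRecord(question, answer, sources)
        self.records.append(rec)
        return rec

    def register_feedback(self, rec: AnswerRecord, reviewer: str, helpful: bool, note: str = "") -> None:
        """Human-in-the-loop signal: was this answer accurate and relevant?"""
        rec.feedback.append({
            "reviewer": reviewer,
            "helpful": helpful,
            "note": note,
            "time": datetime.now(timezone.utc).isoformat(),
        })
```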

[0:18:44] HC: How do you go about validating your models overall? Do you have a systematic process for getting feedback from users in order to understand how well you’re performing? Or how do you tackle this?

[0:18:56] SA: It’s an interesting one. Firstly, customers want different things, and there are three big variables that go into the utility of the approach here at the enterprise level. First, accuracy of the model can become the most important thing. But what if that’s not fast? What if it’s super expensive? How accurate does it really need to be in order to move the needle for your use case?

The second one is cost, and the other one is speed. So, if you’ve got accuracy, cost, and speed, how do you balance between those things? When we talk about validation of the models, keeping those three variables in mind, we have worked with commercially available models like GPT-3.5 and GPT-4 Turbo, which is the API version of ChatGPT under the hood in OpenAI/Azure. But we are also huge proponents of the potential of open source. We partner very closely with Hugging Face, which is validating, managing, and testing the many different open-source models that are now available. And we deploy these, as you know, across our orchestration engine; we have many different language models.

One is specifically tuned for DNA. A new one that just came out is DNA-centric, but also multimodal and able to handle many different types of omics. We’re seeing just a constant evolution in the different models that we have. So, when you’re validating those, firstly, you can validate one and test it against its competitors. But the landscape changes a week later when something new comes out. So, it’s just a constant thing, as I said.

We have tooling that checks the effectiveness of the answers. We have the ability to dynamically adjust the prompts and the reasoning instructions to get better results. We’re continually running automated tests and allowing users to provide feedback on answers, as I mentioned before. We’re giving them the ability to optimize and validate data as well. So, again, a very long, multi-pronged answer, but this isn’t easy stuff; this is complex. And at the enterprise level, it’s even more complex.
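
To show how the accuracy, cost, and speed trade-off might be tracked in automated tests, here is a toy evaluation harness. The containment check, the `model_fn` callable, and the per-call cost figure are simplifications and assumptions; real validation tooling would be far more involved, likely adding a judge model or human review.

```python
# Toy eval harness reporting the three variables discussed above (illustrative).
import time

def evaluate(model_fn, test_cases: list[dict], cost_per_call: float) -> dict:
    """Run a model over question/expected pairs and report accuracy, speed, cost."""
    correct, total_latency = 0, 0.0
    for case in test_cases:
        start = time.perf_counter()
        answer = model_fn(case["question"])
        total_latency += time.perf_counter() - start
        # Naive containment check; a real pipeline would score accuracy and
        # relevance with a judge model or human reviewers.
        if case["expected"].lower() in answer.lower():
            correct += 1
    n = len(test_cases)
    return {
        "accuracy": correct / n if n else 0.0,
        "avg_latency_s": total_latency / n if n else 0.0,
        "estimated_cost": n * cost_per_call,
    }
```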

[0:21:09] HC: Right. It’s definitely not as simple as classification accuracy or something like that, where you have straight-up ground truth.

[0:21:16] SA: Exactly. It was really easy when we were doing predictive analytics at Predixion. It was: what is the accuracy of this supervised machine learning model at predicting a patient’s readmission risk in a hospital after discharge? Well, okay, AUC 0.95. We knew it was accurate. What are the input variables? How does that change by location? It seemed complex at the time, but in comparison to this, there are a lot more moving parts that are so dynamic in generative AI. Again, if this is being deployed at big drug companies, you better have your act together.

[0:21:48] HC: What other kinds of challenges do you encounter in building models for life sciences?

[0:21:53] SA: Again, it’s not really about building models. It’s about making sure that the right models are being utilized for the copilot. But again, that solution is very different depending on the target. The target may be just automating, or providing efficiency and cutting costs out of, a very complex process within an organization. I used the routing of a proposal to different stakeholders within a company as an example earlier on. But I think what’s both complex and also very exciting is that you have the potential for what I call the string of pearls strategy: as you build these copilots, think of them as widgets that help accelerate and take friction out of the day-to-day of thousands of workers at a company. At what point do you create smart agents to bring them together and start to get a critical mass of these copilots, such that the intelligent agent can leverage them and get a one-plus-one-equals-three type of outcome?

One of our customers is this CRO, and again, for non-life sciences people, a CRO is a contract research organization that performs the clinical trial for the drug company. They do that on a global basis. The big ones have over 20,000 employees that are getting paid to go out and identify patients, recruit patients, consent and enroll patients, get them on the trial, manage them on the trial, make sure that everything happens, data is captured, et cetera. It’s a very people-intensive process.

So, one of our customers is a CRO that wants to automate as much as possible of their own processes, but also use the AI to communicate more effectively with the drug company, which is their client. We call them the sponsor. So, the sponsor can benefit as well, and we have biotech clients right now at Ryght using the technology in ways that I’ve described earlier. But then, what if you could use the string of pearls strategy to have the two organizations work together much more harmoniously and, at a macro-strategic level, for this to proliferate around the industry? Then, the utilization of AI by vendors and sponsors becomes much more automated, much more efficient. Then you start to see a big unlock strategically in the industry that could really accelerate and change the way drugs are developed and deployed into humans. So, it’s pretty exciting.

[0:24:26] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[0:24:31] SA: Yes, don’t do it. No, I’m kidding. It’s super fun right now. If you’re starting a company, and I’m assuming you’re talking about founders, I love talking to other founders in this space because there’s just so much happening. If you’re looking to get funded by angels and then VCs, you have to be very, very deliberate about what it is you want to achieve. The addressable market is obviously huge, and we raised money about a year ago from six VCs very quickly. But it was just in a window where I think there was a love affair with generative AI, and then it closed pretty quickly after we raised. Thank God, for us.

I think a lot of investors sat back and started to think, “Okay, this is getting really noisy. Everyone and their dog says that they have AI. Everyone’s an AI company all of a sudden, and how do we try and filter out the noise to get to the signal?” I think now companies must have real traction. In the VC world, that means commercial traction. So, if you’re a company that’s using any form of AI, whether that’s machine vision, natural language processing, or generative AI, to get funding, if that’s the aspiration, you need to have a very, very clear value proposition; a huge addressable market, which I think we all seem to be in at the moment, so that’s not going to be too big of an issue; a team that is very accomplished and preferably has worked together before; and some commercial traction in the form of revenue repeatability. I think, and this could change later in the year, but right now, it’s very hard to get funding without those founding principles.

Now, if you’re starting a company and bootstrapping it, my advice, and I love these companies, would be to find a small area to focus on, very, very deep and narrow, and just get really good at closing deals in a repeatable fashion for customers in that space. The widget could be USD 20 a month, but if you get enough of those and you show growth and product-market fit, that’s the way to turn it into a big deal. There are companies that did raise tens of millions of dollars, if not hundreds of millions, over the last couple of years, up until the window closed, and many of them have done extremely badly and have not found product-market fit. So, I think a lot of investors felt that they wasted the money, that they panicked and pulled the trigger too quickly. So, there’s a lot more thoughtfulness going into the funding environment right now. But it’s a fun space to be in, and I wouldn’t have it any other way.

[0:27:17] HC: Finally, where do you see the impact of Ryght in three to five years?

[0:27:20] SA: I think we can impact drug development, clinical research, commercial targeting, healthcare providers, and healthcare payers. The entire ecosystem is ripe for this technology. The problems have not gone away. Previous approaches and technologies have just scratched the surface. I just think there’s a massive unlock that is possible, and we plan to be the company that provides that unlock across that entire ecosystem. So, it’s pretty exciting times right now. In the US, healthcare alone represents 18% of GDP, and it is literally the most broken industry I’ve ever seen, and I’ve worked in a number of them. That also is exciting, because there are just so many opportunities.

At Deep Lens, we came out with AI that did real-time matching of patients to clinical trials and had a newish architecture, a microservices architecture in the cloud. People thought we’d invented fire because the industry just seemed so antiquated and was working off old architectures. Faxes literally still happen in this industry. Fax machines amazed me 15 years ago because they were still around, and they’re still around. There are just so many opportunities everywhere in the industry, and we plan to create a lot of value for customers and save lives. It’s an exciting time to be doing this.

[0:28:45] HC: This has been great. Simon, I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:28:52] SA: Yes. So, we’re on all the socials, and Ryght is spelled R-Y-G-H-T. Just remember, it’s spelled wrong. So, it’s Ryght, Ryght.ai, and we’re @ryghtai or @ryght_ai on just about every social media platform. But we’d love to talk to any companies in the healthcare and life sciences industry who are providers, or CROs, or software companies, or real-world data providers, because we can accelerate their AI initiatives and we’d love to chat.

[0:29:20] HC: Perfect. Thanks for joining me today.

[0:29:21] SA: Thanks, Heather. I appreciate you having me.

[0:29:23] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[OUTRO]

[0:29:33] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend, and if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter. [END]