Today I'm joined by the Founder and CEO of Aquabyte, Bryton Shang, to discuss his mission to improve and enable fish farm efficiency and sustainability. Bryton fills us in on the role machine learning plays in monitoring underwater fish farm environments, the challenges of gathering and annotating data to build and train their models, and how their human-in-the-loop QA process converges to find solutions. Tune in to discover how Aquabyte’s mission-oriented, multidisciplinary, and multimodal nature impacts recruitment, and hear Bryton’s astute recruitment advice for leaders in the field. Aquabyte is a stellar example of an AI-powered startup looking to create a better, more sustainable future for the world at large.

Key Points:
  • Bryton Shang’s background and what led him to create Aquabyte.
  • Aquabyte’s mission to enable efficient and sustainable fish farming.
  • The role machine learning plays in monitoring underwater fish farm environments.
  • How Aquabyte built their ML models.
  • The practical challenges of training their models and their solution-finding systems.
  • How Aquabyte’s mission-oriented, multidisciplinary, and multimodal nature impacts recruitment.
  • Bryton’s recruitment advice for other leaders of AI-powered startups.
  • His vision for Aquabyte’s impact in the next three to five years.

Quotes:
“[At] Aquabyte, we're focused on how machine learning and computer vision can help fish farmers be more efficient and sustainable.”

“There’s definitely a mission-oriented aspect to [Aquabyte], which is attractive to a lot of folks that are looking for more of a mission-oriented bent.”

“AI is a broad label, and I think the business domain in which you apply AI and how you apply it is really important, ultimately, to the success.”

“By having autonomous fish farms, or even on land where you can have much more scalability of fish farming, then you can really increase the supply of fish.”

Links:

Transcript:

[00:00:00] HC: Welcome to Impact AI, the podcast for startups who want to create a better future through the use of machine learning. I’m your host, Heather Couture. Today, I’m joined by guest, Bryton Shang, Founder and CEO of Aquabyte, to talk about improving fish farm efficiency. Bryton, welcome to the show.

[00:00:17] BS: Yeah, great to be here.

[00:00:19] HC: Bryton, could you share a bit about your background and how that led you to create Aquabyte?

[00:00:23] BS: Sure. So, as you mentioned, at Aquabyte, we’re focused on how machine learning and computer vision can help fish farmers be more efficient and sustainable. I don’t actually have any background in fish or fish farming; my background is more as a technologist. So, just to share a bit about my history: I formally studied operations research and financial engineering at Princeton and ended up working in quantitative, algorithmic, high-frequency trading after graduating, which is very much a data science type of focus. Then, over the next couple of years, I went more into the startup side as a technical co-founder of a couple of companies, one of which was leveraging data science for brand licensing.

Another, where I was CTO, was actually, I guess, similar to your company: it was using computer vision for histopathology. And yeah, I really got more into the machine learning and computer vision side of things. Then I incubated Aquabyte out of NEA, a VC firm here, where I worked with one of the partners to come up with a new idea around applying machine learning to a new industry where it could be impactful. I really learned about the world of fish and fish farming: that over half the fish we eat comes from a farm, and how these types of applications of machine learning and computer vision can be really impactful in enabling a new type of farming and making these farms more efficient. So, happy to share more about the background and the story of Aquabyte as well.

[00:02:05] HC: What does Aquabyte do? And why is this important for food sustainability?

[00:02:10] BS: Yeah. So, think about where your food comes from. A lot of us eat fish, and over half the fish we eat comes from a farm. This is in the context of global overfishing and the need to produce high-quality protein. Our planet is 70% covered by water, yet only 5% of our protein comes from the ocean, and we need to produce more protein sustainably as we run out of arable land. So, what are these farms doing? Imagine, in the fjords of Norway, where we operate, these massive fish pens: floating pens in the ocean where they’re growing hundreds of thousands of fish at a time. They put little fish in, the fish grow over the course of a year into big fish, and then they’re harvested and ultimately end up at your local Whole Foods and are eaten.

Over the year to year and a half that the fish are growing, how do you know what’s happening beneath the surface? How do you know what the weight of the fish is? How it’s growing? How healthy it is? This is virtually impossible to know because the fish are beneath the surface. So, a simple camera that monitors the fish, their growth, their health, and whether there are any parasites provides new data that helps the farmer farm more efficiently. We have a subscription service where we work with a lot of these farms to monitor the growth of the fish and provide that as a SaaS service.

[00:03:48] HC: What role does machine learning play in that monitoring?

[00:03:51] BS: So, you have a camera underwater that’s taking pictures of the fish. We’re taking pictures of thousands, tens of thousands of fish a day, and we’re analyzing those images: for every fish, determining the weight, looking for parasites, and looking at health indicators. This would otherwise be prohibitive to do without, for example, a computer vision model that identifies key points on the fish, or that identifies lice and wounds on the fish. We have other models that take the stereo image of the fish and build a 3D model, which is then used to determine the weight of the fish. Machine learning and computer vision are what actually enable this to happen.

We’re also using it to build decision support tools. If we can determine how the fish is growing, we can forecast growth, which is also based on a model of its own, and we can use that for decision support: when do I need to treat the fish? How much do I need to feed the fish? So, machine learning and computer vision affect many different levels of how the company operates.
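Bryton doesn’t detail the stereo-to-weight pipeline, but the general idea can be sketched. This is a minimal illustration under assumptions that are not from the interview: a rectified stereo rig with a known focal length and baseline, two matched keypoints (snout and tail fork) already located by a keypoint model, depth from disparity via Z = f·B/d, and a generic fisheries length-weight relationship W = a·L^b with placeholder coefficients. The names `StereoCamera`, `triangulate`, `fork_length_m`, and `weight_kg` are hypothetical, and this is not Aquabyte’s method.

```python
import math
from dataclasses import dataclass


@dataclass
class StereoCamera:
    """Hypothetical rectified stereo rig; every value here is an assumption."""
    focal_px: float    # focal length in pixels (shared by both cameras after rectification)
    baseline_m: float  # distance between the two camera centres, in metres
    cx: float          # principal point x, in pixels
    cy: float          # principal point y, in pixels


def triangulate(cam, left_xy, right_xy):
    """Recover a 3D point (metres, left-camera frame) from one matched keypoint."""
    disparity = left_xy[0] - right_xy[0]  # horizontal shift between the two views
    if disparity <= 0:
        raise ValueError("matched keypoint must have positive disparity")
    z = cam.focal_px * cam.baseline_m / disparity  # depth: Z = f * B / d
    x = (left_xy[0] - cam.cx) * z / cam.focal_px
    y = (left_xy[1] - cam.cy) * z / cam.focal_px
    return (x, y, z)


def fork_length_m(cam, snout_left, snout_right, tail_left, tail_right):
    """Straight-line snout-to-tail-fork distance between two triangulated keypoints."""
    snout = triangulate(cam, snout_left, snout_right)
    tail = triangulate(cam, tail_left, tail_right)
    return math.dist(snout, tail)


def weight_kg(length_m, a=0.01, b=3.0):
    """Length-weight relation W[g] = a * L[cm]**b; a and b are placeholders, not fitted values."""
    length_cm = length_m * 100.0
    return a * length_cm ** b / 1000.0


cam = StereoCamera(focal_px=1000.0, baseline_m=0.1, cx=640.0, cy=480.0)
length = fork_length_m(cam, (540, 480), (440, 480), (1240, 480), (1140, 480))
print(round(length, 3), round(weight_kg(length), 2))
```

In this made-up example both keypoints have a 100-pixel disparity, so they sit 1 m from the camera and the fish measures 0.7 m; a real system would fit the length-weight coefficients per species and would likely use more than two keypoints, since a swimming fish is rarely straight.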

[00:05:09] HC: How do you gather and annotate data in order to train these types of models?

[00:05:13] BS: Yeah, so that’s quite interesting, and it was a challenge for us. Think about these fish swimming underwater: a fish swims by the camera, you take a picture of it, and then the fish is gone. How do you actually know, for example, what the weight of that fish is? How do I ground truth that? It’s quite difficult, because there’s no easy way to ground truth it.

In a normal computer vision application, you could probably just verify what you annotated; the image itself is the ground truth. But in our case, we need to gather that ground truth data separately. So, at a certain point, it literally involved us going into farms, physically weighing each individual fish, taking pictures of them, and collecting this very expensive and difficult dataset. That’s one example of the lengths we went to in order to build these models.

In other cases, for example identifying parasites on the fish: all fish have natural parasites. We work with a lot of salmon farms, which have salmon lice, and we’re also classifying the lice into different stages. This is a very specific and technical annotation of the sea lice that requires a marine biologist. Monitoring the lice is also required by the government to maintain proper food safety. For that, we hired marine biologists who go in and help us create annotations of the sea lice. Those annotations accrue to a dataset that trains a model to identify the lice automatically, and that annotation work now operates as a QA process.

That’s not just one-off research annotation; it’s part of a human-in-the-loop process where we’re creating a virtuous loop of additional annotations and QA that accrue to a dataset, reducing the deviations the model may have. So, it’s real-world work, getting out there and collecting the data; it’s one-off annotations for new models; and it’s also this human-in-the-loop process. I’d say that’s mainly for the computer vision models. For other types of machine learning models, for example when we’re trying to determine statistically what the average weight of the fish in a pen is, we’ll get other sources of data. When they harvest the fish, they know what the weight is, so we can back-test and figure out an appropriate model that gives the correct ground truth weight. So, we’re collecting external datasets that are ultimately allowing us to get to a better and better answer.
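The harvest back-test Bryton describes can be illustrated with a small sketch. It assumes, hypothetically, that for each harvested pen we have the camera-based per-fish weight estimates from before harvest and the average weight the farm recorded at harvest, both on the same weight basis. The function name `backtest` and the two summary metrics (mean signed relative deviation as bias, mean absolute relative deviation as overall error) are illustrative choices, not Aquabyte’s.

```python
from statistics import mean


def backtest(records):
    """Compare camera-based pen averages against harvest-recorded averages.

    records: list of (sampled_weights_kg, harvest_avg_kg) pairs, one per harvested pen.
    Returns (bias, mape): mean signed and mean absolute relative deviation.
    """
    deviations = []
    for sampled_weights_kg, harvest_avg_kg in records:
        estimate = mean(sampled_weights_kg)  # naive estimator: plain mean of the sampled fish
        deviations.append((estimate - harvest_avg_kg) / harvest_avg_kg)
    bias = mean(deviations)                  # > 0 means the estimates run high on average
    mape = mean(abs(d) for d in deviations)  # typical error size, ignoring its sign
    return bias, mape


history = [
    ([4.0, 5.0, 6.0], 5.0),  # hypothetical pen: estimate matches the harvest average
    ([3.0, 3.0], 2.5),       # hypothetical pen: estimate runs 20% high
]
bias, mape = backtest(history)
print(f"bias={bias:+.1%} mape={mape:.1%}")
```

The plain mean stands in for whatever sampling model is actually being tuned; the point of the back-test is that each new harvest adds a ground-truth row, so a candidate estimator (for example, one that corrects for which fish tend to pass the camera) can be compared against the same history.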

[00:07:55] HC: When you’re working with this type of data, images of fish underwater along with the associated data to ground truth them, what other kinds of challenges do you run into as you try to train models?

[00:08:08] BS: There are a number of practical challenges. Certainly, getting an initial straw man of a model up takes some initial work. But where the rubber meets the road is really how you make this work in different oceanographic conditions. Underwater, particularly for computer vision, there are turbidity issues, runoff, and particles in the water that affect visibility. That’s one challenge.

We’re also dealing with animals that have specific behavioral components, and also spatiotemporal components: imagine we have this camera in a fish pen, and we need to be in the right place at the right time to collect a representative sample of the fish going by the camera. How we actually get a representative sample is itself challenging. And how do we make it work for different breeds and species of fish in different locations, which may have morphological differences or differences based on geography and local ocean conditions?

So, a lot of that work is just making the system robust enough to work in all these different conditions. And, again, I mentioned a lot of real-world, very physical challenges that we need to deal with, and those really relate to the viability of getting a business like this up and running. If you have to deal with all of that from day one, and you’re trying to run a business that depends on having the proper data and accuracy, that can be challenging, just because there are all these different factors that go into it. But ultimately, some of these challenges are solved technically, and some are solved on the business side, by addressing what level of deviation is acceptable from a business perspective. Then, from a technical perspective: what’s our path to that, to ultimately converging and solving these challenges?

[00:10:10] HC: So, on the technical side, to handle the diverse set of imagery and fish that you need to deal with, is the solution mostly about data collection, making sure you have the right dataset and then annotating it? Or is it more about adapting your algorithm to be more robust?

[00:10:26] BS: It’s having a process such that, if we follow it, we almost get to a deterministic solution. What I mean by that is, for example, if we introduce our system to a new environment (I’ll take the example of detecting these parasites on the fish, these sea lice), we have this human-in-the-loop QA process where we can detect deviations in the algorithm because we QA the output. Because the QA happens, we guarantee that ultimately we will converge on the right solution. So, that process, this virtuous data loop, is what guarantees that the system will work. Over time, we see the error rates converge, and ultimately, that’s a solution for us.

In other cases, it requires us to do bona fide science research. For example, where are the fish in the pen? Do they go up during the day and down at night? How is that affected by thermal conditions like changing temperature, or by sunlight? That itself is a bona fide science question, so we partner with research institutions to run controlled studies. But also, since we have a number of these systems out in the field, we can run a lot of natural experiments to converge on the solution. A lot of this is good old-fashioned data science and deductive reasoning, figuring out how to get a model as close to the truth as possible, because the systems we’re dealing with are, in some ways, non-deterministic. No one knows exactly how a fish behaves, and we can’t 100 percent ground truth what we’re doing. So, it’s making the best inference based on the information we have.

A lot of it is also decision-making under uncertainty. So, that’s on the technical side. Alongside that, we literally have staff dedicated on the ground to go out and collect additional data in the field that helps ground truth our models.
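The human-in-the-loop QA loop Bryton describes for sea lice can be sketched as a feedback cycle: each batch of model outputs is reviewed, disagreements become corrected training labels, and the per-batch error rate is tracked to check that it trends down. This is a minimal sketch under assumptions not stated in the interview; the per-image lice-count format, the `QALoop` class, and the "non-increasing over a window" convergence check are all hypothetical.

```python
from collections import deque


class QALoop:
    """Hypothetical human-in-the-loop tracker for per-image lice counts."""

    def __init__(self, window=3):
        self.error_rates = deque(maxlen=window)  # most recent per-batch error rates
        self.corrections = []                    # (image_id, reviewed_count) queued for retraining

    def review_batch(self, batch):
        """batch: list of (image_id, model_count, reviewer_count) tuples."""
        abs_error = 0
        total = 0
        for image_id, model_count, reviewer_count in batch:
            abs_error += abs(model_count - reviewer_count)
            total += reviewer_count
            if model_count != reviewer_count:
                # The reviewer's label wins and goes into the next training set.
                self.corrections.append((image_id, reviewer_count))
        rate = abs_error / max(total, 1)  # guard against batches with no lice at all
        self.error_rates.append(rate)
        return rate

    def is_converging(self):
        """True once a full window of batches is seen and the error rate never rose."""
        rates = list(self.error_rates)
        return len(rates) == self.error_rates.maxlen and all(
            later <= earlier for earlier, later in zip(rates, rates[1:])
        )


loop = QALoop(window=3)
loop.review_batch([("img1", 4, 2), ("img2", 1, 2)])  # rate 3/4
loop.review_batch([("img3", 3, 2), ("img4", 2, 2)])  # rate 1/4
loop.review_batch([("img5", 2, 2), ("img6", 0, 0)])  # rate 0/2
print(loop.is_converging(), len(loop.corrections))
```

A monotonic check over a short window is deliberately simple; with noisy field data a smoothed trend or a threshold on the error rate would likely be more practical, but the structure (review, correct, retrain, track) is the same.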

[00:12:42] HC: How do you validate them? How do you make sure that they are robust? Any specific strategies you have there?

[00:12:49] BS: Yeah, so as I mentioned, the QA process for a lot of the computer vision can be done visually. For things like the weight of the fish, we have a process where we compare against the harvest: when the fish is harvested, that’s the ground truth weight. Every time they harvest the fish, we can gather that data point and use it to validate and refine our models to make sure they’re accurate. And then I’d say it’s making sure that the system is stable as we scale, going from 20 systems to 200 to 2,000. Making sure that we can manage quality at scale, because certainly the first systems you can validate hands-on, but eventually you need a process by which you monitor and address issues and make sure that the number of edge cases isn’t increasing, but is instead converging to a number that’s manageable for the R&D team to work on at any given time.

So, I’d say we approach it from a number of different angles, and these are company-level KPIs. It’s not just the responsibility of the R&D team; it’s the entire company’s responsibility to make sure that we’re hitting the proper quality and performance milestones for the customer.

[00:14:11] HC: So, hiring for machine learning and data science can be quite challenging right now, due to the high demand for professionals in this field. What approaches to recruiting and onboarding have been most successful for your team?

[00:14:22] BS: For us, we’re in a bit of a unique position as a mission-oriented company. A lot of what we do relates to food sustainability. For example, in our last fundraising round, the Nature Conservancy was one of our investors. So, there’s definitely a mission-oriented aspect to it, which is attractive to a lot of folks who are looking for jobs with more of a mission-oriented bent.

I’d also say it’s a multidisciplinary problem, because we integrate hardware, software, algorithms, and fish biology, and that attracts an eclectic set of folks: some working on the computer vision aspect, some on machine learning models, some on more data science models. And often, the multimodal nature of the problem, where any one problem can be solved in three different ways, is compelling to a lot of folks. In contrast, at another company or position you might be focused on a single problem, like, “Okay, train the CNN and get it to be as good as possible.” With our problem, if we want to improve the weight accuracy, does that mean we improve the computer vision aspect, where we determine the key points on the fish? Does that mean we need to improve our statistical sampling of the fish? Or does that mean we need to improve our assessment of how we go from those points on a fish to the ultimate weight?

So, I think, for a lot of folks, that kind of more fundamental problem solving, agnostic of the tool you use, whether that’s computer vision, machine learning, or data science, has been attractive. And then, in terms of recruiting specifically, referrals through other engineers and through our investors, and generally staying abreast of thought leadership and general best practices, are how we’re finding good folks.

[00:16:28] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[00:16:32] BS: Yeah. I think one of the things that’s important, in addition to the technical side, is the business side of how these AI companies succeed. AI is a broad label, and I think the business domain in which you apply AI, and how you apply it, is really important, ultimately, to the success. What I mean by that, to take maybe the extreme example, is that in autonomous vehicles you really need the car to make no mistakes, because the implications of making mistakes are so high. For us, yes, if we misweigh a fish or misfeed the fish, that’s challenging, but it doesn’t have as dramatic a business implication.

So, I think the application and its risk level affect the applicability of AI. Fundamentally, because these models are uncertain, they’re better applied to domains where you can live with that uncertainty and still be successful: where you can launch a business with a limited dataset and improve the models over time, without needing to be 100% accurate from day one. I think that depends both on picking the right domain and on how you approach and socialize it with customers so that they understand. When they’re buying a system that weighs the fish, it’s not like a thermometer. It’s a system that also requires their involvement to position the camera in the right place. And ultimately, these are animals, there’s uncertainty in that, and that’s just part of the nature of this type of service.

I think it’s important that folks leading these types of companies also take the business side into account, and don’t rely only on R&D and the technical side to resolve all the issues. Otherwise, it could take a very long time before you actually start selling or get to revenue, because you feel like you have to hit such a high threshold to get started. Whereas you can probably get started a bit earlier with somewhat lower expectations, as long as you socialize them properly. So, that would probably be my biggest advice: look at the business side together with the technical side to figure out how to make these types of startups successful.

[00:18:55] HC: That’s definitely some helpful advice. Finally, where do you see the impact of Aquabyte in three to five years?

[00:19:01] BS: What we’re really working towards at Aquabyte is ultimately driving radically better fish farming. Fish farming came from fishermen, and it has been more of an experiential way of growing fish; we want to move towards a more data-driven and more autonomous way. Today, a lot of these farms are along the coastline because people have to go to the farm to grow the fish. By having autonomous fish farms, or even on land where you can have much more scalability of fish farming, then you can really increase the supply of fish.

And that’s really important because, if you look at the demand for healthy protein, we’re not producing healthy protein fast enough. So, we really need innovations that allow new types of production. Aquabyte’s contribution to all of that is that we hope the data we’re providing allows these farmers to farm a lot more efficiently, getting closer and closer to a fully automated fish farm, where you could really expand production significantly.

What I mean by that is us going from providing data about the fish, their growth, and their health, to actually helping with decision support: how do you feed the fish? How do you treat the fish? How do you harvest the fish? And ultimately, even controlling the farms themselves, as self-contained systems where we automatically feed the fish the right amount and treat them at the right time, so the system can ultimately operate in the middle of the ocean. I think the impact of that will be that the fish you and I eat every day, we can actually get at a reasonable cost and continue to have as a healthy source of protein.

[00:20:51] HC: This has been great. Bryton, your team at Aquabyte is doing some really interesting work for food sustainability. I expect the insights you’ve shared will be valuable to other AI companies. Where can people find out more about you online?

[00:21:05] BS: So, we have a website; you can take a look at aquabyte.ai. We were also featured in an episode of Now Go Build, the TV series with Amazon’s CTO, so definitely check that out. If you haven’t seen it, you’ll see what a fish farm looks like. And of course, if you’re interested, please reach out; we’d be happy to chat and share more about what we’re doing.

[00:21:27] HC: Perfect. I’ll link to those in the show notes. Thanks for joining me today.

[00:21:31] BS: Yeah, thanks. Thanks for having me.

[00:21:32] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you’ll join me again next time for Impact AI.

