One of the most powerful impacts machine learning can make is helping to solve environmental challenges around the world. Today on Impact AI, I am joined by the co-founder of Greyparrot, Nikola Sivacki, to discuss how his company uses machine learning to improve recycling efficiency. Learn all about Nikola’s background, what Greyparrot does, the services they offer, the importance of their work, the role machine learning plays in it, how they gather and annotate data, the challenges they face, how they develop new models, and so much more. Tune in to hear about the newest AI innovations Nikola is most excited about, before hearing his goals for Greyparrot in the near future. Lastly, get some valuable advice for running AI-powered startups.


Key Points:
  • Welcoming Nikola Sivacki to the show.
  • Nikola shares a bit about his background and how it led him to create Greyparrot.
  • What Greyparrot does, what services they offer, and why it is so important.
  • The role machine learning plays in this technology.
  • How they go about gathering data and annotating it for their purposes.
  • What they are trying to predict with the data they are gathering.
  • Challenges they encounter in training machine learning models and how to overcome them.
  • A breakdown of how his team plans and develops a new machine learning model or feature.
  • Nikola shares how Greyparrot measures the impact of its technology.
  • The two groups of machine learning developments Nikola is most excited about.
  • Nikola shares some advice for other leaders of AI-powered startups.
  • Where he sees the impact of Greyparrot in three to five years.

Quotes:

“Greyparrot basically monitors the flow of waste materials, recyclable materials in material recovery facilities, and offers compositional analysis of these materials.” — Nikola Sivacki

“It's very helpful, if thinking of a new product, to start with a data set that is really tailored to answering the main uncertain question that is posed there.” — Nikola Sivacki

“Start thinking about data from the start. I think that it’s very important to understand the data in detail.” — Nikola Sivacki

“Our goal is to improve, of course, recycling rates globally so that we can reduce reliance on virgin materials.” — Nikola Sivacki


Links:

Nikola Sivacki on LinkedIn
Nikola Sivacki on X
Greyparrot


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.
Foundation Model Assessment – Foundation models are popping up everywhere – do you need one for your proprietary image dataset? Get a clear perspective on whether you can benefit from a domain-specific foundation model.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven machine learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[INTERVIEW]

Today, I’m joined by guest, Nikola Sivacki, Co-Founder and Head of Deep Learning at Greyparrot, to talk about recycling efficiency. Nikola, welcome to the show.

[0:00:41] NS: Hi, Heather. Thank you.

[0:00:43] HC: Nikola, could you share a bit about your background and how that led you to create Greyparrot?

[0:00:47] NS: Sure. I studied electrical engineering and computer science, and I got interested in neural networks. This would have been 20 years ago. In the following 20 years, I’ve worked on various challenging problems around applying machine learning, and specifically neural networks, to solving problems usually in the computer vision domain. For example, OCR systems in high-noise scenarios, and landmark building recognition systems under open-set constraints. These were state-of-the-art systems back then, roughly 2012 to 2018.

Around 2019, with two colleagues of mine, we decided to start Greyparrot as a way to tackle socially impactful problems. Specifically, we looked at the recycling industry at the time, and the waste industry being in such a difficult situation. We’d seen news of waste being shipped across the planet, and we thought maybe technology could improve the process of optimizing all these materials. We found various ways in which deep learning specifically could tackle some of these problems.

[0:02:01] HC: What does Greyparrot do? What services do you offer, and why is it so important?

[0:02:06] NS: Yeah. There are several aspects to the product. Greyparrot basically monitors the flow of waste materials, recyclable materials in material recovery facilities, and offers compositional analysis of these materials. This comes from the basis that we cannot improve and manage what we cannot measure, and the waste and recycling industries have traditionally lacked the information that could help them optimize their processes.

More specifically, we have a product called the Greyparrot Analyzer, which is a system that includes a hardware device that takes images of moving conveyor belts with waste material, which can be partially sorted, not sorted, or sorted, and offers this compositional data at a high time resolution. This data can be used for offline analysis to understand what is happening with the efficiency of the whole system, but it could also send alerts and notify plant managers, for example, of failures in the system.

We then have the Greyparrot Synk, which connects this system to existing and new machinery. There is also, of course, the ever-growing database of these compositions from, at this point, more than 15 countries. It can also serve as a source of insight to regulators, manufacturers of various forms of packaging, and brands, to understand the recyclability of various products. Yeah, that would be it, generally.

[0:03:52] HC: What role does machine learning play in this technology?

[0:03:55] NS: More specifically, machine learning is used at the core of the recognition system, where deep networks come in the form of object detectors, classifiers, and retrieval systems. These systems recognize the location of waste on the belt and the different types of waste. The compositional data comes directly from the deep learning model. I would say, additionally, we use broader machine learning methods and techniques which are somewhat orthogonal to deep learning as such, for example, active learning and active data selection in conjunction with deep learning.
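To make the compositional-analysis idea concrete, here is a minimal sketch of how per-frame object detections could be aggregated into composition percentages over a window of conveyor-belt frames. The class labels, data structures, confidence threshold, and counting rule are all illustrative assumptions, not Greyparrot’s actual pipeline.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "PET_bottle", "aluminium_can" (hypothetical class names)
    confidence: float  # detector score in [0, 1]

def composition(frames: list[list[Detection]], min_conf: float = 0.5) -> dict[str, float]:
    """Fraction of detected objects per class across a window of conveyor-belt frames."""
    counts = Counter(
        det.label
        for frame in frames
        for det in frame
        if det.confidence >= min_conf
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}

# Example: two frames captured from the belt
window = [
    [Detection("PET_bottle", 0.91), Detection("aluminium_can", 0.84)],
    [Detection("cardboard", 0.77), Detection("PET_bottle", 0.66)],
]
print(composition(window))  # {'PET_bottle': 0.5, 'aluminium_can': 0.25, 'cardboard': 0.25}
```

In practice, a count-based composition like this could also be weighted by estimated object area or mass, but the simple counting version keeps the idea clear.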

[0:04:37] HC: One of the key components of machine learning is, of course, training data and validation data. How do you go about gathering this data and annotating it for your purposes?

[0:04:46] NS: We, as I mentioned, have our hardware system, which has a camera device installed that captures images at a high frame rate. We use the system to gather our own data, and that system is used for inference to provide compositional data as well, which means that in addition to having access to all these systems running currently in 15 different countries, we can tag data for training, but we can also use the models’ predictions to select interesting data for further annotation. When it comes to annotation itself, we have external partners with dedicated, trained annotation teams who have learned the problem domain with us and are really instrumental in building a high-quality data set.
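One common way to use model predictions to select interesting data is uncertainty-based selection: rank unlabeled frames by how unsure the model is and send the most uncertain ones for annotation. The sketch below uses least-confidence scoring purely as an illustration; the actual selection criteria Greyparrot uses are not described here.

```python
def least_confidence(frame_scores: list[float]) -> float:
    """Uncertainty of a frame = 1 minus the confidence of its least confident detection."""
    return 1.0 - min(frame_scores) if frame_scores else 1.0

def select_for_annotation(frames: dict[str, list[float]], k: int = 2) -> list[str]:
    """frames maps frame_id -> per-object confidence scores; return the k most uncertain frames."""
    ranked = sorted(frames, key=lambda fid: least_confidence(frames[fid]), reverse=True)
    return ranked[:k]

unlabeled = {
    "frame_001": [0.98, 0.95],  # model is confident; low annotation priority
    "frame_002": [0.52, 0.88],  # one ambiguous object; higher priority
    "frame_003": [],            # nothing detected; worth a human look
}
print(select_for_annotation(unlabeled))  # ['frame_003', 'frame_002']
```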

[0:05:40] HC: What types of things are you trying to predict, or classify and detect, as you mentioned? Is this like paper versus plastic, or is it something more granular?

[0:05:48] NS: Yes. We have recently launched an 89-class taxonomy, and indeed it is more granular. There are types of paper, types of plastic, types of metal, and so on. Other aspects, like colors and so on, can also be part of it. Yeah, but you’re right. It’s types of paper, plastic, and metals.

[0:06:11] HC: What kinds of challenges do you encounter in working with these images that you take of the different types of waste, and in particular in training machine learning models based off of them?

[0:06:20] NS: Yeah. I guess one important property of our problem is the so-called long-tail distribution problem, which means we have a relatively small number of broader, let’s say, super categories that are very frequent, and then a long tail of rare categories. That defines certain things for us in terms of our workflows. In addition to that, when it comes to the data itself, there are some universal challenges that most of these categories share at these facilities. These objects are often manufactured to look identical. For example, a drink can may look identical to another manufactured can, but as it goes through its whole life cycle and ends up in a facility like this, it is crushed and dirty, and there are probably no two aluminum cans that are the same at that point.

This variation in appearance, in the way that the materials and objects are crushed, is one of the main challenges. Of course, the environments in which these material recovery facilities operate don’t always have the best conditions for capturing images. The environment can be dark, it can be dirty, and so on. Another aspect can be occlusion, where waste can be partially occluded by other waste. There are, of course, ways to solve this. Yeah, so those would be some of the main challenges.
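One standard way to mitigate a long-tailed class distribution like the one Nikola describes is to reweight the training loss by inverse class frequency, so rare categories are not drowned out by the frequent ones. This is a generic illustration in PyTorch, not a description of Greyparrot’s training setup; the class counts are made up.

```python
import torch
import torch.nn as nn

# Hypothetical annotation counts: a few frequent super categories, a long tail of rare ones
class_counts = torch.tensor([50_000, 32_000, 900, 450, 120], dtype=torch.float)
weights = class_counts.sum() / (len(class_counts) * class_counts)  # inverse-frequency weights
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 5)           # batch of 8 predictions over 5 classes
targets = torch.randint(0, 5, (8,))  # ground-truth class indices
loss = criterion(logits, targets)    # rare classes now contribute more per example
print(loss.item())
```

Class-balanced sampling, where rare categories are oversampled during training, is another common alternative; which works better is an empirical question for any given dataset.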

[0:07:52] HC: These complexities with the different appearances and so on, does that mean that you need a larger, more diverse training set or are there other ways to approach this?

[0:08:00] NS: Yes, basically. We are always growing our training set. The models can also help us find interesting data to make this process more efficient. We have found that an approach like synthetic data, for example, is not very useful in our case. It is just very difficult to render this type of variation synthetically. Early on, we did some experiments with synthetic data and saw that the level of control just cannot match the real data that we can capture. So, yeah, that would be one.

It’s an approach that looks like it might work, but in fact, in a problem like this, the variation is just too large. Maybe one day, with more advanced systems, we could generate very realistic crushed versions of objects, but currently nothing can compete with the real data that we find.

[0:08:58] HC: Yeah. It definitely seems like the number of ways you can crush a can, the number of ways it can be rolled over or stomped on, plus color change, dirt and mud, and whatever else could potentially be on it, that’s a huge diversity that would be quite a large challenge to model synthetically.

[0:09:17] NS: Yes, it is. It has been a challenge in the past, of course, even for annotators to sometimes annotate these distinctions. This has motivated us to improve our camera system so that we have very strong control over the quality of the images that we get, in order to maximize an annotator’s efficiency at doing this task.

[0:09:40] HC: How does your team plan and develop a new machine learning product or feature in particular? What are some of the things you do early in the process?

[0:09:48] NS: That’s a very interesting question. I would say, for a problem like ours, where there are a lot of uncertainties early on, it’s very helpful, if thinking of a new product, to start with a data set that is really tailored to answering the main uncertain question that is posed there. That data set may not be very large, but it can be designed to contain the variation that we want to see whether the model can learn. We have done this several times in the past, and usually, this comes from the client, of course. The client would come to us with a problem and a question: can we recognize this?

One of the first steps would be to gather a test data set, or I should say, a data set that contains the variation that’s important for that product, without having to contain the variation that we know could be learned with more data. I hope that makes sense. It does not have to be a very large data set, but it needs to answer the key question, if possible.

[0:11:01] HC: So then, with that data set that has a variation, what comes next? What do you do with it?

[0:11:05] NS: Of course, one key thing, at the same time, is to always be thinking in terms of how the prediction that the model produces creates value, how it solves this problem. What we would do is ask: what is the metric that connects the model’s predictions to the value? There can be some surprising aspects to that, where you may not need a very high F1 score, or you may care more about precision than about recall. Think about how you would calibrate the model and how you would interpret its predictions so that they provide the value in the end.

The value could come from alerts, which can have false positives, for example, but need to have high recall. That would be one example. Thinking in those terms about the model that you’re building is also very important. Beyond that, if the initial system is promising and seems to be able to solve this problem, then of course we want to get more realistic data. We would then deploy our production box, gather real data, and proceed with annotating real data for that task.
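As a concrete illustration of connecting a model’s predictions to value, the sketch below picks the highest score threshold that still meets a minimum recall target, the kind of operating point you might want for alerts where missed events are costly, and then reports the precision at that point. The scores, labels, and recall target are invented for the example.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when everything scoring >= threshold is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def threshold_for_recall(scores, labels, min_recall=0.95):
    """Highest threshold whose recall still meets the target (recall rises as the threshold drops)."""
    for t in sorted(set(scores), reverse=True):
        _, recall = precision_recall_at(scores, labels, t)
        if recall >= min_recall:
            return t
    return 0.0

scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30]   # model scores on a small validation set
labels = [1,    1,    0,    1,    1,    0]      # 1 = event we must catch
t = threshold_for_recall(scores, labels)
print(t, precision_recall_at(scores, labels, t))  # 0.4 (0.8, 1.0)
```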

[0:12:19] HC: Once you have some annotated data, is there a clear next step?

[0:12:23] NS: Of course, once we have annotated data, you would proceed with training your model. Typically, you may not need a full engineering solution in place, but the reporting that you do in this second stage needs to describe the value and the metrics that you’ve agreed on. What we’ve seen so far is that we don’t have to have the detailed engineering solution ready at the very start, but it needs to have that key question answered. Once that’s performing, as I said, then a data strategy comes in. We deploy our boxes, and we look at how to gather more data and scale the solution in terms of accuracy, production deployment, workflows, and so on.

[0:13:11] HC: Thinking more broadly about what you’re doing at Greyparrot, how do you measure the impact of your technology to be sure that you’re on track to accomplish what you set out to?

[0:13:20] NS: As I mentioned, we have our Greyparrot Analyzers, which are monitoring different parts of the material recovery facilities. They monitor things like residue lines, which is the line where waste is basically not sorted and goes to landfill. We monitor whether there is any valuable material that ends up in residue where it shouldn’t be. In that sense, there’s a direct monetary relationship with material recovered away from the residue line.

In addition to this, by measuring compositional data, our systems can also automate sampling processes. Currently, a lot of sampling is done manually on about half a percent of waste, whereas our system can monitor 100% of waste flows. There’s an efficiency gain and a reliability gain on that end as well. Those would be the two main aspects. We measure monetary diversion from landfill, and we also measure the CO2 impact of this. On the efficiency side, we’ve seen potentially an order-of-magnitude difference.

[0:14:29] HC: Machine learning is advancing quite rapidly now. We certainly see new stuff in the headlines almost every other day, it seems. Are there any new developments in computer vision or AI more broadly that you’re particularly excited about?

[0:14:43] NS: There are some. I would say there are two groups of developments. There are new architectures appearing in object detection all the time; there is a plateau of productivity in that aspect. In some very established areas, like active learning and out-of-distribution detection, there is also always new research appearing. But when it comes to more recent advances, I find there are probably two fields that are not quite there yet in terms of being applicable to our problem, but I follow them quite closely, because something interesting could appear there. One would be the zero-shot capabilities of some of these new generative multimodal architectures.

I think it’s still quite far from being applicable to something that we do, but being able to give an example to a network, maybe with additional context of a certain product type or, let’s say, a packaging design type, and telling it, “Oh, detect this,” without having to retrain the model or update any databases, that could be quite impactful. However, the research is still not there for us to use something like this.

There’s another area that I’ve been monitoring closely over the past few years, which is self-supervised learning, and I feel it’s still not quite there yet. We’ve seen a lot of benefits in large language models from self-supervised objectives. For vision, I feel like we haven’t really seen those benefits yet, even though there are self-supervised methods that outperform supervised ones by a small relative margin. Maybe something might appear there.

In our case, if these objectives could include things like video and tracking, that would be quite interesting for us. Again, I feel like there are still a few things missing there before this would make a significant impact for us, but we do follow these areas. I think the older, less exciting areas that offer incremental improvements are what we’ve seen having a bigger impact for us so far.

[0:16:50] HC: It’s great to keep up on the latest literature, because even if it’s not ready today, in a year or two some of these might be ready for prime time and be able to advance what you’re developing.

[0:17:02] NS: Yes, of course.

[0:17:03] HC: Is there any advice you could offer to leaders of AI-powered startups?

[0:17:07] NS: I guess this goes back a bit to the question of how to plan a new product or feature. One of the things we’ve seen is that the usual advice to startups is product-market fit, but from an AI perspective, I would say there are a lot of uncertainties in this early period. It can sometimes be attractive to pursue a more certain path, improving accuracy, maybe with new methods, even if it’s not necessarily needed, or maybe over-engineering early on. I think that’s quite a common pitfall. The advice would be to not do those certain things early on, but to stay in that uncertain place and try to build certainty by answering the questions around product-market fit, whereas those other points, especially engineering, become critically important later, during scaling.

When it comes to those aspects, it’s usual to rewrite things, to redesign things, but that aspect of value to the customer and product-market fit, that is the key question, so stay in that uncomfortable state and build certainty along that path. Also, start thinking about data from the start. I think it’s very important to understand the data in detail. There are many tools now that one can use to analyze data, and some of them can put a distance between you and the data, so they may not be as useful. What matters is really intimately understanding the data, thinking about how to get better data and more data, and how your model’s predictions create value in the end. Those would be a few suggestions related to developing a product.

[0:18:57] HC: Finally, where do you see the impact of Greyparrot in three to five years?

[0:19:01] NS: I guess the goal that we’ve always had was really to reduce waste to landfill and, as a result, of course, to reduce emissions. Our goal is to improve, of course, recycling rates globally so that we can reduce reliance on virgin materials. Also, we are now in the process of a wider international expansion and building on collaborations with some of our partners to achieve these goals. But, yeah, this would be, I think, continuing what we’ve been doing so far and scaling our systems further.

[0:19:38] HC: This has been great, Nikola. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:19:46] NS: Find us at greyparrot.ai. Yeah, that would be the best location.

[0:19:50] HC: Perfect. Thanks for joining me today.

[0:19:52] NS: Thank you.

[0:19:53] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[END OF INTERVIEW]

[0:20:03] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]