
Four reasons why you might need to build your own computer vision solution.

Several factors must be considered when planning a computer vision solution. Because the technology is changing quickly, it’s natural to wonder whether it’s worth building your own solution or purchasing an existing one.

There are a growing number of tools available with advanced image analysis and AI capabilities: CellProfiler for biologists; QuPath, Halo, and Visiopharm for histopathology; Cognex ViDi for industrial applications; and Landing AI’s Landing Lens for a variety of industries. These tools make it easy to annotate images, train models, iterate to improve them, and often even deploy them.

Landing AI even outlined 10 reasons for buying instead of building – everything from maintenance and support to increased productivity.

So why would you even consider building your own deep learning capabilities?

Usually, you wouldn’t. If there’s a platform or toolkit that can solve your task, I highly recommend using it. It’s generally not worth reinventing the wheel.

Deep learning projects are complex, expensive, time-consuming, and unpredictable.

However, there are a few scenarios in which building your own computer vision solution is the only way to go.

1) No Existing Toolkit or Platform Can Solve Your Task

The first reason is pretty simple. If there is no existing toolkit or platform that provides a solution for what you need, you will have to build it yourself.

Why might no existing solution fit your task? Most platforms focus on the common use cases (classification, object detection, and segmentation), which together cover the majority of computer vision problems.

However, not all applications fit into these three categories. The most common reason for needing something different is a unique data challenge that cannot be solved solely by preprocessing. Some examples include:

  1. Weak labels, where you can only assign a label to an entire large image or to a group of images
  2. Distribution shifts between your training and inference data, such as staining or lighting variations
  3. Imbalanced data, such as a long-tailed distribution in which some classes have very few examples
  4. Multiple modalities of data, such as images combined with omics or geospatial data

These are just a few examples that lie outside the typical use cases.

Toolkits and platforms are not designed to handle every situation – just the most common ones. So if your project is an outlier, you’ll likely need to build the solution yourself.
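To make the first of those data challenges more concrete, here is a minimal sketch of one common way to train with weak labels: multiple instance learning (MIL) with max pooling. It assumes a large image (for example, a whole slide) is split into tiles, that some model (not shown) outputs a probability for each tile, and that only the image-level label is known. The function names are hypothetical, and max pooling is just one of several possible aggregation choices.

```python
import math


def bag_probability(tile_scores):
    """Aggregate per-tile probabilities into one image-level probability.

    Max pooling encodes the MIL assumption that an image (the "bag")
    is positive if at least one of its tiles is positive.
    """
    if not tile_scores:
        raise ValueError("a bag must contain at least one tile")
    return max(tile_scores)


def bag_bce_loss(tile_scores, bag_label, eps=1e-7):
    """Binary cross-entropy against the image-level (weak) label only.

    bag_label is 1 or 0; no tile-level labels are required.
    """
    # Clamp the probability away from 0 and 1 to avoid log(0).
    p = min(max(bag_probability(tile_scores), eps), 1.0 - eps)
    return -(bag_label * math.log(p) + (1 - bag_label) * math.log(1.0 - p))


# Hypothetical tile probabilities for one slide labelled positive.
scores = [0.1, 0.2, 0.9]
print(bag_probability(scores))            # 0.9
print(round(bag_bce_loss(scores, 1), 4))  # 0.1054
```

Note that with max pooling, only the highest-scoring tile influences the loss for each image; attention-based pooling is a common alternative when that proves too restrictive. Either way, this kind of training loop is something a general-purpose classification platform typically does not expose.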

2) You Need to Build the Data Pipeline Anyway

Another thing to consider is that computer vision models aren’t the only thing you’ll be building. You also need a data pipeline, which can vary from very simple to extremely complex, depending on your data.

The data pipeline collates, cleans, and annotates your data to get it ready for machine learning. Sometimes this component is far more complex than the computer vision models, particularly if it involves aggregating data from various sources.

If data engineering will be a significant part of the work and none of the toolkits or platforms can do exactly what you need, then you will need to build the pipeline yourself. And if you are building a software team for data engineering anyway, you may be better off bringing on machine learning expertise as well. Chances are you’ll be iterating on both the data pipeline and the models to develop your solution, and that is easier to do successfully when one team is working on both.

3) Deep Domain Expertise is Required

The data itself is very important. Some computer vision applications don’t require much domain knowledge: Is it a cat or a dog in the image? A car or a pedestrian? But others are entirely dependent on domain expertise.

Radiologists, pathologists, microbiologists, agronomists. For many applications, domain expertise is needed, not only for annotating images, but also for understanding what’s going on in the images, what types of features a model should focus on, what variations should be accommodated, and why the latest model is getting it wrong.

Domain expertise is essential to developing a successful solution. Just as iterating on the data pipeline requires the data and modeling teams to work together, domain experts need to be integrated into the team. If they are siloed in a different part of the organization, progress will be much slower.

4) You Plan to Expand Your Capabilities Significantly in the Future

Another thing to consider is your future plans. Using a computer vision platform may be best if this is just a single project. But if you plan to develop more applications afterward, it will be beneficial to build the expertise in-house.

A computer vision team is a long-term investment as there are multiple complex components to a successful solution. I already mentioned the data pipeline and modeling. But you also need to be able to deploy and maintain your models, monitor for data drift, and update your models when needed.

Using a platform to handle these responsibilities will reduce your team’s time investment, but it will also impact your flexibility.

If your organization’s future is heavily dependent on computer vision, you want to be in control of that future.


Existing toolkits and platforms are a great fit for many computer vision applications – most, in fact. So be sure to research and evaluate them when deciding on the best path forward for your organization.

If you do decide to build your own computer vision solution, be prepared for it to be complex, expensive, time-consuming, and, at times, unpredictable. But it is the only way to go if the alternatives aren’t feasible, your data pipeline is unique, domain expertise is essential, or you want to grow in the future.