During this conversation, I am joined by guest Gard Hauge, CTO of StormGeo, a weather prediction specialist and researcher with a background in software. We discuss Gard’s extensive research and its application at StormGeo, his historical experience with the company’s evolving relationship with machine learning, how weather and markets are related, and more. Touching on challenges in the field, my guest reveals the growing volume of data he deals with on a daily basis. We discuss the fundamental role of data engineering alongside machine learning, the key role of third-party data, and more before Gard shares his perspective on the future of StormGeo. He also delves into his experience to give informed advice to listeners. Join in to hear more from this thought leader today! 


Key Points:
  • Gard Hauge, CTO of StormGeo, shares his background and introduction to StormGeo.
  • The topic of his Ph.D. research: weather prediction.
  • Products and services offered by StormGeo beyond weather prediction.
  • The evolving role of machine learning at StormGeo and how it is integrated today.
  • How weather and markets are related.
  • Investments StormGeo is making into generative AI.
  • Gard’s relationship with data collection and processing.
  • The biggest challenges he faces, especially in relation to the volume of data.
  • The fundamental role of data engineering in building successful algorithms.
  • The key role of third-party data.
  • Advice for other AI startup leaders.
  • Gard’s predictions for the future of StormGeo.

Quotes:

“The data pipeline is something we put a lot of effort into developing over the last decade. Actually, streamlining how we actually process and make this data available in products and services is key.” — Gard Hauge

“The amount of data we face is typically doubled every two years. So, we need to be quite smart in handling and processing data and what we're actually archiving for machine learning.” —  Gard Hauge

“Everybody talks about AI and machine learning. But our experience is that 80% to 85% of the work is basically data engineering, and that's a key fundament if you want to build successful algorithms.” — Gard Hauge


Links:

Gard Hauge on LinkedIn
StormGeo
StormGeo on LinkedIn


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.
Foundation Model Assessment – Foundation models are popping up everywhere – do you need one for your proprietary image dataset? Get a clear perspective on whether you can benefit from a domain-specific foundation model.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI. Brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[EPISODE]

[0:00:32] HC: Today, I’m joined by guest, Gard Hauge, CTO of StormGeo, to talk about weather intelligence. Gard, welcome to the show.

[0:00:43] GH: Thank you. Really happy to be here, Heather.

[0:00:44] HC: Gard, could you share a bit about your background and how that led you to StormGeo?

[0:00:47] GH: My job at StormGeo started more than 20 years ago. During my grad studies in the late nineties, I was basically joining StormGeo as a part-time employee. And of course, at that time, StormGeo was a very small company. It was a startup company with 10 employees. At that time, it was a traditional weather company delivering weather forecasts through television, and other channels.

In 2000, I basically finished my Master’s Degree and went back to university and did a PhD, which I finalized in 2003. So, I returned to StormGeo and I’ve basically been with the company throughout my whole professional career. Of course, during the last 20 years, a lot has happened in the organization. Like I mentioned, we were 10 employees at that time. Today, we are more than 700 employees in StormGeo, and have grown to be a global company with a lot of different products and services.

In my PhD work, I focused on numerical weather prediction and was basically tasked, coming into StormGeo, to develop the commercial weather prediction suite for StormGeo. A lot of mathematical methods, of course, numerical prediction, high-performance computing, were implemented into the product suite of StormGeo.

I would say back in the time, 20 years ago in Europe, the commercial weather market was just starting off, basically, and being developed into the commodity and the commercial side we are seeing today. So, those were very exciting years to be a part of developing the company in the startup phase, I would say, into what it has become today. I spent many years as a software developer, primarily on the backend side of things, creating data and weather models that were basically used to provide the products or services for [inaudible 0:02:35].

In 2009, I became the head of development or CTO, and have basically held that role for the last 12 to 15 years. So, in my end of the organization, we’ve been growing up to more than 170 people now, working with research and development in the organization. Out of these, 50 work as data scientists primarily targeted to work with machine learning and AI developments. I also had a long period where I worked with acquisitions. StormGeo has been growing for the last decade through acquisitions, and sitting in the role as CTO has really allowed me to get very close to the business side of what we’re doing.

Also, for a period during 2017 and [inaudible 0:03:15], I was head of strategy as a vice president, where I focused on the business development and M&A side of our business. So, standing on the outside of R&D really gave me a different perspective and insight into what we’re dealing with from a technology perspective. What does it take? What kind of value proposition do you need to actually succeed with developing products and services?

As you understand, even though I’ve been with the same company for a long time, it’s not been the same role. We’ve always been on the move and always been growing. So, it’s been kind of an exciting journey so far. I’m really looking forward to the future as well.

[0:03:53] HC: So, what does StormGeo do today? What products and services do you offer?

[0:03:57] GH: Yes. Basically, we have moved far beyond the traditional weather forecasting that we started off with. Today, we provide weather intelligence and analytics to a lot of industries. For instance, offshore wind, shipping, oil and gas, electric utilities, traditional media segments, to name a few. Of course, as a company, our products are in this intersection of, I would say, two major megatrends we are facing today. It’s the green shift and also the digitalization of society, and especially the industries we are working towards.

Incidentally, around 80% of our business sits within the marine segments. This is the shipping domain and, as I mentioned, offshore wind, and oil and gas. So, we are a leading provider of global services to the shipping industry, where we kind of service more than 12,000 vessels on a day-to-day basis. We are also one of the biggest niche players in the marine forecasting services to the oil and gas industry globally, which is, of course, in essence where we have a lot of very advanced products serving these industries.

Of course, we started off in Norway. But today we have 27 offices in 18 countries, and of these, nine are typically what we call Operational Centers, which are staffed by people 24/7. So yes, I think that’s a very short perspective of what we do.

[0:05:26] HC: You’ve been with StormGeo for a long time, and the capabilities of AI were much different in the late nineties and the early 2000s. How did you think about the role of machine learning in those earlier days of StormGeo?

[0:05:38] GH: I would say, first of all, we didn’t call it machine learning back in the days. But of course, coming from the weather industry, this big data principle has always been a part of the company. Handling massive data amounts and actually building very systematically around data availability, data cleaning, data quality has been in the heart of everything we are doing in StormGeo.

Also, when I worked as a backend developer with numerical weather prediction, it was a natural kind of extension to work with, typically, statistical methods to automatically improve the output we saw from traditional mathematical models. So, that’s always been part of the DNA of the company. But of course, this has been developed more and more over the years. And of course, over the last, I would say, six to seven years, we’ve seen a very rapid kind of development of that, also within StormGeo. We didn’t call it machine learning back in the days, but we’ve always been heavily invested in using mathematical and other statistical methods to automate and improve the things we’re doing within the company. Of course, the key topic here is always to improve the quality of data, but also to automate the processes that we can automate by algorithms and other methods.

[0:06:59] HC: What about today? What role does machine learning play at StormGeo?

[0:07:02] GH: I would say, today, machine learning is a fully integrated part of our work stream, and how we actually operate, serving products to the end users. So, on a day-to-day basis, we are analyzing more than 15 terabytes of data. So, that’s typically fresh data coming into our data pipeline. It’s anything from weather models to weather observations and ocean data. But also, I would say we have more than 500 unique data sources or third-party data that are part of the data fundament, to actually serve day-to-day services.

So basically, the data pipeline is something we put a lot of effort into developing over the last decade. Actually, streamlining how we actually process and make this data available in products and services is, of course, key. But I mentioned machine learning being part of the work stream. So, we have a lot of different directions to this. Of course, the things we started off with 15 to 20 years ago, with improving the quality of weather forecasts, that is, of course, still with us. But what we’ve been seeing over the last five to six years, with all the new advancements in open-source machine learning libraries, has also really helped StormGeo as an organization.

Building on top of that, we’ve developed something we call Deep Storm, which is basically the StormGeo repository for processing, and also developing the frameworks around the machine learning within the company. We are also using this typically to predict the energy consumption and energy pricing in the electricity market. These are very volatile markets, and heavily impacted by weather. What we’ve seen is that machine learning algorithms can really help us to predict these things a lot better than we could historically.

Another example is that we use machine learning algorithms to route a ship from, for instance, New York City to Bergen in Norway. So, by using advanced algorithms, we can basically save fuel and also avoid dangerous weather and other criteria that you set as boundaries for how you operate things. But I would say another key area where we’ve really spent a lot of effort for many, many years is advanced decision support for industry customers. This is typically tying the third-party data I mentioned together with all the various data we are processing on a day-to-day basis.
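
The routing idea above can be illustrated with a toy example. This sketch is not StormGeo's method: it swaps in plain Dijkstra shortest-path search on a small grid, where each cell carries a made-up fuel cost and cells with dangerous weather are excluded outright. The grid, the costs, and the function name `cheapest_route` are all assumptions for illustration.

```python
import heapq

def cheapest_route(cost, blocked, start, goal):
    """Dijkstra search on a 4-connected grid.

    cost[r][c] -- hypothetical fuel cost of entering cell (r, c)
    blocked    -- set of (r, c) cells to avoid (e.g. dangerous weather)
    Returns (total_cost, path), or (None, []) if the goal is unreachable.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}          # cheapest known cost to reach each cell
    prev = {}                    # back-pointers for path reconstruction
    heap = [(0.0, start)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == goal:
            path = [cell]
            while cell in prev:
                cell = prev[cell]
                path.append(cell)
            return dist, path[::-1]
        if dist > best.get(cell, float("inf")):
            continue             # stale heap entry, a cheaper one was processed
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nr not in range(rows) or nc not in range(cols):
                continue         # outside the grid
            if (nr, nc) in blocked:
                continue         # boundary condition: never enter this cell
            new_dist = dist + cost[nr][nc]
            if new_dist < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = new_dist
                prev[(nr, nc)] = (r, c)
                heapq.heappush(heap, (new_dist, (nr, nc)))
    return None, []

# Made-up 3x3 sea area: the centre cell is expensive, one cell has a storm.
fuel = [
    [1, 1, 1],
    [1, 9, 1],
    [1, 1, 1],
]
storm = {(0, 1)}
total, route = cheapest_route(fuel, storm, (0, 0), (0, 2))
print(total, route)
```

Here the direct path is blocked by the storm cell, and the search prefers the longer loop around the expensive centre cell. A production router would work on a sphere, with time-varying forecasts and vessel performance models, none of which are modeled here.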

So, by having this proprietary data access point, it gives us the opportunity to really create tailored algorithms, for instance, for an offshore oil and gas rig, and connect this with the day-to-day weather. So, by understanding the behavior of an oil rig, or how sensitive their operations are, given a specific weather regime, we can really make optimized decision support systems for the industries we are selling to.

These are just a few examples of what we’re dealing with, more kind of linked towards traditional weather. Lately, we have also invested a lot in working with satellite analytics, using the high-resolution satellite images that are becoming commercially available to companies like StormGeo. Of course, that’s typically high-resolution satellite data provided by big players like [inaudible 0:10:26] or similar, giving us high-resolution images down to 30 centimeters. What we’re using these for is typically vegetation detection along power lines.

Basically, a very different use case from the weather perspective. It’s just kind of an example of how we go further away from the traditional weather issues we’re dealing with, bringing new insights based on the weather, which still sits in the core of everything we are doing here. Also, lately, we are really investing more heavily into the new opportunities that generative AI, like ChatGPT, gives us. I really believe this will be a game changer for a lot of companies, and indeed StormGeo, and the way we develop products and services. So, not going into details, we have a lot of initiatives here that will quite dramatically change the way we do our day-to-day work today, but also end up in new products and services in the near future. So yes, I think that basically, at a high level, sums up the things we’re doing.

[0:11:35] HC: You’ve mentioned a few different types of data that you’re working with. How do you go about gathering these different data sources? And do you need to annotate it in order to use machine learning?

[0:11:44] GH: I would say there are two different use cases internally. So, of course, I mentioned the massive data processing pipeline we have built, parsing in all the real-time data. That’s one side of it. But [inaudible 0:11:56] is basically the big archive we have been building over more than two decades in StormGeo. So, we are sitting on several petabytes of data that we typically can make available for major machine learning projects. The Deep Storm repository I mentioned, that’s kind of the framework we’ve been developing to actually run machine learning projects and actually build the algorithms on top of that. But I would say, the orchestration system that is pulling the right data and transforming it into readable formats for either products or machine learning projects is a key thing in this.

I would say on the real-time side, we use a lot of advanced database technologies to serve our products. But on the machine learning side, we use different approaches depending on what we’re dealing with. But as far as it’s possible, we try to work with the same endpoints and APIs as the major processing pipelines we are dealing with in StormGeo. So, we created a common way of handling data in products, and the development of machine learning projects more or less sits on the same infrastructure.

Of course, on machine learning projects, we also tap into the very, very big repositories we have of other third-party customer data or weather data. I would say working with weather data is in many ways easier, because it is very well structured. It is divided into mathematical grids, so to say, with latitude and longitude and a timestamp. In many ways, it is highly structured and easy to work with. So, that gives us an advantage to handle quite massive amounts of data without a lot of problems, so to say. I think that’s a high level on how we do it.
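
The regular latitude, longitude, and timestamp layout described above is what makes lookups cheap: a position and time map to an array index with simple arithmetic. A minimal sketch, assuming a hypothetical 0.25-degree global grid and an hourly time axis (these resolutions, the start time, and the name `grid_index` are illustrative and not from the episode):

```python
from datetime import datetime, timedelta

# Hypothetical grid definition (assumed for illustration only).
LAT_START, LON_START = -90.0, 0.0    # coordinates of the first grid point
STEP_DEG = 0.25                      # grid spacing in degrees
N_LON = int(360 / STEP_DEG)          # number of longitude points (1440)
T0 = datetime(2024, 1, 1, 0)         # first time step on the time axis
STEP_TIME = timedelta(hours=1)       # spacing between time steps

def grid_index(lat, lon, when):
    """Map (lat, lon, time) to the nearest (t, i, j) grid index."""
    i = round((lat - LAT_START) / STEP_DEG)
    # Wrap longitude into [0, 360) so that western longitudes also work.
    j = round(((lon - LON_START) % 360.0) / STEP_DEG) % N_LON
    t = round((when - T0) / STEP_TIME)
    return t, i, j

# Bergen, Norway is at roughly 60.39 N, 5.32 E.
idx = grid_index(60.39, 5.32, datetime(2024, 1, 1, 6))
print(idx)
```

Because every field shares the same index arithmetic, slicing out a region or a time window is a matter of computing index ranges, which is part of why gridded data scales more gracefully than irregular third-party records.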

[0:13:53] HC: So, the weather data, as you said, is quite structured. But I’m sure there’s other challenges related to handling it, and training machine learning models based on it. What are some of those challenges?

[0:14:03] GH: I would say the most problematic is actually the amount of data, right? With advancements in computer technologies, and the increased compute power, these traditional weather models are running at higher and higher resolution. So, the amount of data we are facing is typically doubled every two years or something. So, we need to be quite smart in the way we handle and process data and what we’re actually archiving for machine learning purposes at a later stage. So, data volume is an obvious challenge here.
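
A doubling every two years is exponential growth, V(t) = V0 · 2^(t/2) with t in years. As a rough illustration, the sketch below projects a daily volume forward. It assumes, purely for the sake of the example, that the 15 terabytes per day mentioned earlier grows at exactly that rate; the function name `projected_volume` is hypothetical.

```python
def projected_volume(current_tb_per_day, years_ahead, doubling_years=2.0):
    """Exponential growth: the volume doubles every `doubling_years`."""
    return current_tb_per_day * 2 ** (years_ahead / doubling_years)

# Assumed starting point: 15 TB/day (figure quoted earlier in the episode).
for years in (2, 4, 10):
    print(years, "years:", projected_volume(15, years), "TB/day")
```

Under these assumptions the volume grows 32-fold in ten years, which is why deciding what to archive matters as much as how to process it.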

I would say, a very important part of what we do in the machine learning domain is also the access to third-party datasets. And of course, these datasets don’t have the same quality when it comes to quality control, reliability, and a lot of other characteristics. So, we have a lot of challenges in the way we’re handling this data, and have spent a lot of resources on making these datasets robust, and also on being able to trust this third-party data. I would say data QC has been a key challenge, but it is a very important part of making the data usable for training machine learning models.

Also, within our shipping segments, we will see data from more than 10,000 vessels on a day-to-day basis. And these datasets are not very big, but they are highly complex in the way they are structured. So, it’s a mix of different nuances in the way we’re doing things. But I would say, the good thing is that we are more or less agnostic in the way we build our infrastructure. So, we can handle a massive amount of different file formats and different ways of dealing with the third-party data. So, in this intersection between real-time data and collecting data, it’s also about building the right availability of datasets, which is always a challenge. So, it depends on, at the end of the day, how we want to use this.
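
The quality control of third-party data described above often starts with very simple plausibility checks before anything reaches a training set. A minimal sketch, with hypothetical field names and made-up limits (real limits depend on the sensor, the region, and the customer's reporting format):

```python
# Hypothetical plausibility limits; assumed values for illustration only.
LIMITS = {
    "wind_speed_ms": (0.0, 100.0),
    "air_temp_c": (-90.0, 60.0),
    "wave_height_m": (0.0, 35.0),
}

def qc_record(record):
    """Return a list of problems found in one observation record."""
    problems = []
    for field, (low, high) in LIMITS.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not (low <= value <= high):
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    return problems

# Two made-up vessel reports: one clean, one with a bad and a missing value.
reports = [
    {"vessel": "A", "wind_speed_ms": 12.5, "air_temp_c": 8.0, "wave_height_m": 2.1},
    {"vessel": "B", "wind_speed_ms": -3.0, "air_temp_c": 7.5, "wave_height_m": None},
]
clean = [r for r in reports if not qc_record(r)]
flagged = {r["vessel"]: qc_record(r) for r in reports if qc_record(r)}
print(len(clean), flagged)
```

Keeping the flagged records with their reasons, rather than silently dropping them, makes it easier to go back to the data provider, which matches the back-and-forth with customers that Gard describes.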

[0:16:06] HC: Data engineering is often the larger challenge, even larger than machine learning, in projects like these. For StormGeo, it sounds like that’s very much true because of the volume and the complexity of the data you’re working with.

[0:16:18] GH: Yes. That’s an excellent point, right? If you see all around us, everybody talks about AI and machine learning. But our experience is that 80% to 85% of the work is basically data engineering, and that’s kind of a key fundament if you want to build successful algorithms at the end of the day. So, I can’t really stress enough how important it is to have a solid data engineering team working with this. This is something we also realized in StormGeo, I would say, seven, eight years ago, when we formed a team we call data ops, which is basically responsible for streamlining all these data streams you’re seeing and making it easier for people purely working with the machine learning algorithms to access data and deliver projects on that. So, I think that’s a good point and a very important element that leads to what we are doing in StormGeo.

[0:17:10] HC: How does your team plan and develop a new machine learning product or feature? In particular, what kinds of actions do you take early on in that process?

[0:17:18] GH: I would say, of course, we are highly commercial in StormGeo. So, it’s always, to some extent, triggered by a question from a customer or potential customer. So, in many cases, more or less since I joined StormGeo, we have been developing a lot of the core algorithms in partnership with customers. That means that they have a unique problem to solve, but in combination with our skill set and our data, they provide us with insight that allows us to work with a problem from a new perspective. So, I would say, that’s typically the starting point. Then, the next phase is digging deeper into the data they typically provide us with.

In most cases, we have a research project in partnership with customers, or run internal initiatives. But it always starts with the data, and trying to understand the problem you’re trying to solve. I will also say, understanding the value proposition you’re bringing to the table with the algorithms you’re developing is a key thing in these discussions. As you probably know, coming from the domain yourself, it’s very far from having a good idea and a concept to developing something that really makes an impact for our customers.

So, understanding the domain you’re trying to develop something for is very, very important for StormGeo. Of course, if we get access to third-party data, it’s the data quality control. I would say, that’s the key thing in order to progress projects further on. Most companies we work with that come with data don’t have a professionalized way of delivering the data. So, in many ways, there is a lot of interaction between us and customers before we reach a point where we have a dataset that we can actually start developing algorithms on.

In some cases, you get a limited amount of data. In other cases, you get very rich datasets with very high time frequency and many years of data. So, it all depends on the use case, of course, what you can do with this. And of course, depending on the problem at hand, and the data you have available, you basically select the methods to target. So, typically, we start with quite pragmatic and simple approaches like regression methods and things like that, and then we move on to more advanced things if we see that’s doable or necessary.
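
As an illustration of the "simple approaches first" idea, the sketch below fits a one-variable ordinary least squares line. The data are made up (a hypothetical relationship between air temperature and energy demand, echoing the energy use case mentioned earlier), and the function name `fit_line` is an assumption; nothing here reflects StormGeo's actual models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: demand falls as temperature rises (heating-dominated market).
temps_c = [0, 5, 10, 15, 20]
demand_mw = [100, 90, 80, 70, 60]

slope, intercept = fit_line(temps_c, demand_mw)
forecast = slope * 12 + intercept   # predicted demand at 12 degrees C
print(slope, intercept, forecast)
```

A baseline like this gives a reference error to beat, so a more advanced model only earns its complexity if it improves measurably on the same data.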

Of course, our data scientists, normally, they want to experiment with the most advanced methods. But we’ve also seen that in many, many cases, using quite simplistic methods also gives quite successful results. I would say, a last thing in these projects that has been becoming more and more important over the last, I would say, two to three years, is information security, and basically, the awareness from our customers around how you actually handle the third-party data you’re given access to.

A couple of years ago, nobody really cared about how you use the data. But what we see now is everything you do around cybersecurity and handling of third parties is also part of the project. So, working according to ISO 27001 standards or similar has been becoming more and more important for us when we develop these projects. Because, as you know, there is a kind of race for information, and if you have access to privileged information, typically, customers also want to know how you’re handling it. That’s also part of the, I would say, the design phase of developing new algorithms in this domain. It’s all tying back to how we handle infrastructure, how we handle the data pipelines, as well as how we develop the algorithms sitting on top of that.

[0:21:00] HC: One thing I’ve observed in working with a variety of different teams is that in the beginning, the machine learning developers often don’t have a great understanding of the data. They’re not the domain experts for that particular type of data. How do your machine learning developers collaborate with meteorologists or other domain experts in order to pick up that knowledge to enable them to train better models?

[0:21:22] GH: This is a good point. Without the domain experts, you can’t really succeed in these projects from my perspective. So, what we’ve seen is that these machine learning developments, they are always team efforts. You don’t have individual data scientists or developers that can do everything. It’s about bringing the right people into the same context and actually understand the problem you’re trying to solve.

As I mentioned earlier, this is one of the most important things we do early on in these projects. Typically, if it’s working with external sets, it’s kind of a combination of them being an active part of developing it, or we need to get our people training in the domain we are targeting. I would say, working in the shipping and geophysical industries, a lot of our data scientists are coming from a PhD background in mathematics, and mathematical, statistical, or geophysical sciences, so they have a lot of the ground base to work with weather as a starting point.

Of course, again, understanding the customer problem you’re trying to solve is a key thing, and bringing the customer side into this discussion is always important. It’s, of course, also the most challenging part of it, because it takes a long time to become a domain expert, as you stated. Of course, that’s also why we have our primary focus areas, so we don’t step too far outside what we know we can deliver quality to. It’s a difficult question, but something we try to work on systematically.

[0:23:00] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[0:23:04] GH: I’ve talked a lot about the data fundament. So, understanding the data you’re working with, and building a solid data fundament for actually developing your AI or machine learning algorithm. I think that’s a starting point.

I would say, another thing, working with StormGeo and having been with the company more or less since inception, and growing from 10 employees to 700: it’s also about getting your value propositions right. It’s a very long distance between a fantastic idea or a fantastic algorithm and actually selling a service that someone is willing to pay something for. Understanding the industry, the customer you’re trying to sell this to, is also a key thing to get right very early on. I think a lot of startups really fail on this point.

Also, I think, increasingly important, as I mentioned, it’s information security, how will you actually protect third-party data if you have access to that. That’s becoming critically important. So, start early to think about the design of your algorithms, design of your infrastructure, and how you actually do all of this. It’s all about trust these days, right? So, if people don’t trust your solution, and how it’s interacting with your data assets, it’s very hard to succeed commercially in this domain.

Of course, the last but the most problematic in many cases: it’s really about attracting the right talents. It’s a real race out there on actually attracting people with the right skill set to work with machine learning projects. So, I think thinking untraditionally, kind of pulling in people that you can develop in combination with senior people, is one way of doing it, at least. And that’s been the way StormGeo has been building our pool of very strong resources, as we’ve been developing for the last decade on this side. I think that’s the key takeaway from my end, at least.

[0:24:58] HC: Finally, where do you see the impact of StormGeo in three to five years?

[0:25:02] GH: Of course, that’s a very interesting and difficult question. StormGeo has been on a fantastic journey for many, many years, and we’ve shifted ownership many times over the last decade. But finally, I would say, we found our home in Alfa Laval, the new owner, two years ago.

So, shifting from private equity to an industrial owner is also a very important part of the future of StormGeo. Having an industrial owner gives a very different type of perspective when you’re planning and developing products and services. Instead of looking only one year ahead, you have, I would say, the luxury to think three to five years ahead when you plan and think about how you develop the organization. But I would say the interesting thing now, by becoming part of Alfa Laval, our owner, is that they work in this IoT space, right? They’re working on this digitalization journey where they want to connect all their equipment into the digital domain. And of course, StormGeo will be an engine to do this transformation, I would say, in the Alfa Laval future.

So, taking our expertise and knowledge into the IoT domain to build more advanced services in the interaction between StormGeo and the other, I would say, business units within the Alfa Laval segments. I’d say, also, in the machine learning domain, I think we’re just seeing the starting phase of generative AIs like ChatGPT that I mentioned earlier. I firmly believe this will be a real game changer in the way we act and operate a lot of the machine learning projects. So, just to give you one example, for how we today run manual tasks by humans, we already know [inaudible 0:26:48] that we can automate a lot of these processes by using these kinds of new AIs being developed by third parties.

We will also connect to a lot of third-party solutions when we develop these things moving forward. Also, we see GitHub opening up new opportunities in their repository. So, there are a lot of tools becoming available that I think will be an integrated part of what we do in StormGeo, given that they actually comply with the laws and regulations when it comes to information security and GDPR, which is a big thing here in Europe.

I would say, machine learning in the combination, and of course, growing the company in the heart of the segments we are already dealing with, is in the essence of the three to five-year perspective of StormGeo. So, as I mentioned, being in StormGeo means sitting in this intersection between IoT, climate change, and digitalization, and creating this machine learning-enabled workflow is really something, at least, I find super interesting and motivating. I’m pretty sure that will be a very integral part of the future of what we do here in StormGeo.

[0:27:57] HC: This has been great. Gard, your team at StormGeo is doing some really interesting work for weather intelligence. I expect that the insights you’ve shared will be valuable to other AI companies. Where can people find out more about you online?

[0:28:09] GH: It would be on our webpage. It’s stormgeo.com, or go to our LinkedIn pages or equivalent. So, we have a lot of social media channels where you can also follow StormGeo. Also, feel free to contact me at StormGeo if you have specific questions.

[0:28:24] HC: Perfect. Thanks for joining me today.

[0:28:27] GH: Yeah. Thanks for having me.

[0:28:28] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you’ll join me again next time for Impact AI.

[OUTRO]

[0:28:39] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend, and if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]