Blog: What I’ve Learned Working with 12 Machine Learning Startups
8 lessons about products, data, and people
I have worked with 12 startups. They have spanned verticals from fintech and healthcare to ed-tech and biotech, and ranged from pre-seed to post-acquisition. My roles have also varied, from deep-in-the-weeds employee #1 to head of data science and strategic advisor. In all of them I worked on interesting machine learning and data science problems. All tried to build great products. Many have succeeded.
Here is what I learned.
It’s about building products, not about AI
As a card-carrying mathematician, I was initially most motivated by the science of machine learning and the challenge of coming up with creative new algorithms and methods.
But I soon realized that even the most accurate machine learning models don’t create value on their own. The value of machine learning and AI is measured in the context of the products that they power. Figuring out how to create that value efficiently is what building ML-driven products is really all about.
It’s about the problem, not about the method
If the goal is to build a product, then machine learning and AI are a means to an end. What matters is how well they solve your product problem, not what method you are using. Most of the time, quick and dirty solutions will get you pretty far. Don’t train a deep neural net when a simple regression will do just fine.
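To make "a simple regression will do just fine" concrete, here is a minimal sketch of a quick-and-dirty baseline: a one-feature least-squares fit compared against simply predicting the mean. The data, the variable names, and the ad-spend framing are all made up for illustration; the point is only that a baseline like this takes minutes to build and gives you a number that any fancier model has to beat.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(predict, xs, ys):
    """Average absolute gap between predictions and observed values."""
    return mean(abs(predict(x) - y) for x, y in zip(xs, ys))


# Hypothetical toy data: weekly ad spend (x) vs. sign-ups (y).
train_x, train_y = [1, 2, 3, 4, 5], [12, 19, 31, 42, 48]
test_x, test_y = [6, 7], [61, 69]

slope, intercept = fit_line(train_x, train_y)
baseline = mean(train_y)  # the "no model at all" prediction

regression_mae = mean_abs_error(lambda x: slope * x + intercept, test_x, test_y)
baseline_mae = mean_abs_error(lambda x: baseline, test_x, test_y)
print(f"regression MAE: {regression_mae:.2f}, predict-the-mean MAE: {baseline_mae:.2f}")
```

If the simple fit already meets the product requirement, the extra cost of a deep model may be hard to justify; if it doesn't, you at least know how large the gap is.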
When you focus on the problem, you may sometimes discover that machine learning is not the right tool to solve it. Many problems turn out to be mostly about process. Even in these situations, data scientists can contribute a lot of value as they naturally tend to take a rigorous, data-driven approach. But that doesn’t make fixing a poor process with AI a good idea. Fix the process instead.
Look for synergies between data and product
The real value of machine learning rarely comes from taking an existing product and peppering it with predictions from a machine learning model. Sure, this will add some incremental value. But in strong AI products, machine learning is not just an add-on. It is an engine of value creation, and the product is built with the engine in mind: the product and the data must operate synergistically.
When done well, this results in a powerful virtuous cycle that I have called “product/data fit”: the product efficiently realizes the potential value of the data, while continuing to generate the necessary data to improve the product further.
In particular, AI can’t just stay siloed in the data science and engineering teams. Other parts of the organization, from product to the executive level, need to be engaged in the conversation to accelerate the value creation process. This takes significant education and engagement beyond what engineers are typically accustomed to from building software, even in a startup.
Data first, AI later
Machine learning and AI need lots of data, and more importantly, high-quality data. If you are building a product from scratch, think about data collection from day one. If you are introducing AI technologies to an existing product, be prepared to invest a lot in data engineering and re-architecture before you get to the AI part.
That doesn’t mean that you must front-load all the work before realizing any value. Better data operations mean better analytics, which are critical for any organization to learn and improve. Leverage these wins to demonstrate value and generate organizational buy-in. And when your analytics are rock-solid, you are ready to start thinking about machine learning for real.
Invest in effective communication
Building great products takes great product managers and support from executives. While many are enticed by the power of AI and deep learning, few non-technical folks really understand these technologies. Effective discussion of machine learning and AI requires significant understanding of statistics, creating a communication gap that often leads to unrealistic expectations.
One key ingredient in closing this gap is maintaining an ongoing conversation about business metrics and how they translate into modeling metrics. This puts a lot of responsibility on the product manager, but equally so on data scientists, who must develop domain expertise and a deep understanding of business considerations to be truly effective.
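As one illustration of translating between modeling metrics and business metrics, the sketch below scores two hypothetical classifiers both ways: by precision and recall, and by a net value computed from assumed per-outcome dollar amounts. Every number and name here is invented for the example; in practice the value and cost figures are exactly what the product manager and the data scientist need to agree on.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a binary classifier on a held-out set."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int


def precision(c: ConfusionCounts) -> float:
    """Share of flagged cases that were real positives."""
    flagged = c.true_pos + c.false_pos
    return c.true_pos / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of real positives that the model flagged."""
    actual = c.true_pos + c.false_neg
    return c.true_pos / actual if actual else 0.0


def net_value(c: ConfusionCounts, value_per_catch: float,
              cost_per_false_alarm: float, cost_per_miss: float) -> float:
    """Business value of the model's decisions, in currency units (assumed figures)."""
    return (c.true_pos * value_per_catch
            - c.false_pos * cost_per_false_alarm
            - c.false_neg * cost_per_miss)


# Two hypothetical models evaluated on the same 1,000 cases.
cautious = ConfusionCounts(true_pos=40, false_pos=10, false_neg=60, true_neg=890)
aggressive = ConfusionCounts(true_pos=80, false_pos=120, false_neg=20, true_neg=780)

for name, counts in [("cautious", cautious), ("aggressive", aggressive)]:
    value = net_value(counts, value_per_catch=100.0,
                      cost_per_false_alarm=5.0, cost_per_miss=100.0)
    print(f"{name}: precision={precision(counts):.2f} "
          f"recall={recall(counts):.2f} net value={value:.0f}")
```

With these assumed costs, the model with the lower precision comes out ahead because misses are expensive and false alarms are cheap. Change the cost figures and the ranking can flip, which is why the conversation has to be ongoing.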
Quick and dirty isn’t actually that dirty
As I mentioned above, quick and dirty methods will get you pretty far. Partly, it’s because today’s quick and dirty is yesterday’s slow and precise. Tools like word2vec have become nearly as easy to use as a regression, and powerful new tools are constantly being introduced. A solid understanding of the various building blocks and the glue between them is essential for any data scientist.
One consequence of this explosion of open-source tools is that in most cases, developing proprietary ML platforms is not a good idea. Sure, you should have proprietary algorithms that take well-known building blocks and adapt them to your problem and your domain. But leave the deep learning research to the folks at Google. Focus on the business problem, remember?
When in doubt, show the data
The most important activity in early-stage product development is getting market feedback. But machine learning needs lots of data, and that takes a long time to acquire. This presents a problem: how do you get market insight about a data product without having much data?
The best solution is often to simply show the data to your users. Humans can only process small amounts of data at a time, so it doesn’t matter if you don’t have much. How do your users engage with the data that you show them? Where do they gloss over and where do they want to dig deeper? Exposing information that was previously inaccessible can be very powerful, and will give you strong guidance on the potential business value of your data.
Build trust
Trust is a major factor in the success of most technologies. Ultimately, every technology is used by humans and must be trusted by humans. In the context of machine learning applications, some of these humans may be concerned about their job being automated away. Others are relying on information supplied by your technology to make an important decision.
An AI product that compounds these concerns, for example by attempting to make decisions for a human rather than empowering human decision making, will lead to a quick erosion of trust.
Trust is easy to lose and hard to regain. Build products that people trust.