Ian Maddox
Founder, CEO @ TallBlueFox Multi-founder, multi-exit, ex-Googler

In my role as a Google Cloud Solutions Architect, I had the opportunity to answer a myriad of questions about Google Cloud Platform technologies. Machine learning was often at the forefront of these inquiries. While many people see machine learning as a panacea, the reality is far more nuanced. My job was to help technical leaders and implementers align their expectations with what machine learning can actually deliver, emphasizing that the power of this tool is closely tied to the specificity of the problem it’s intended to solve.

Many technology leaders and individual contributors have been tasked with meeting a broad business objective using ML. It is crucial to reduce that goal to one or more questions that can be answered by an analytical engine. For example, if the objective is to increase sales by 10%, you cannot simply “sprinkle some ML on it.” You need to define specific, actionable targets such as:

  • Increase average cart value (ACV) by upselling related products using an ML recommender
  • Increase outbound marketing engagement by choosing marketing content and frequency based on customer data using a classifier
  • Increase funnel throughput by dynamically split-testing content and layout using clustering or reinforcement learning
  • Reduce out-of-stock and overstock events by predicting demand and adjusting inventory in advance of spikes using time series forecasting
  • Understand customer feedback at scale using sentiment analysis and clustering
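
As a minimal sketch of what a specific, measurable target can look like, the first item above can be pinned to a number before any model is chosen. The order totals, the function names, and the choice to express the 10% sales objective as a 10% lift in ACV are all hypothetical, for illustration only.

```python
from statistics import mean


def average_cart_value(order_totals):
    """Return the mean order total for a non-empty list of amounts."""
    if not order_totals:
        raise ValueError("need at least one order")
    return mean(order_totals)


# Hypothetical order totals, for illustration only.
baseline_orders = [40.0, 60.0, 50.0, 50.0]
baseline_acv = average_cart_value(baseline_orders)

# Assumption: the 10% sales objective is expressed as a 10% ACV lift.
target_acv = baseline_acv * 1.10


def meets_target(order_totals, target):
    """Return True when the ACV of a later period reaches the target."""
    return average_cart_value(order_totals) >= target
```

With a baseline and a target in hand, a recommender (or any other approach) can be judged against a concrete question: did ACV in the test period reach the target?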

When evaluating a company or product that claims to use ML, consider what the actual application is. Often the claims are aspirational, smoke and mirrors, or mundane. There are some truly stunning applications of ML out there, but more often it’s just another tool in a developer’s toolkit and not always the best one for the task at hand.

Another point to consider is the actual work required to implement a real-world ML system. Whether you’re building something from scratch or using pre-built models and systems, the lion’s share of the work is devoted to sourcing and properly preparing input data. And once your ML system is generating results, it is crucial to continually monitor and validate the model’s output.
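
One simple way to monitor output, sketched here under stated assumptions, is to compare recent predictions against the values that actually occurred and flag the model when its error drifts well above what was measured at validation time. The function names, the use of mean absolute error, and the 1.5x tolerance are illustrative choices, not a prescribed method.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and observed values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def needs_review(predicted, actual, baseline_mae, tolerance=1.5):
    """Return True when recent error exceeds tolerance times the baseline error.

    baseline_mae is the error measured when the model was validated;
    tolerance is an assumed threshold that each project would tune.
    """
    return mean_absolute_error(predicted, actual) > tolerance * baseline_mae
```

A check like this would typically run on a schedule as real outcomes arrive, with a flagged result prompting a look at the input data before anyone retrains the model.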

One anecdotal example of a real-world ML application and how the actual work broke down: I was the sole developer of a time series predictive analysis platform. The finished project came to approximately 30,000 lines of code. The TensorFlow model generation code in this application was a few dozen lines in total.

In this instance, well under 1% of the code was dedicated to creating and using the actual machine learning model. In terms of dev time, about 70% was spent on data science and about 25% on output analysis and closed-loop feedback. Only about 5% went directly into actual model development.

These figures will vary based on the project and platform used, but the fact remains that the bulk of the work is in data gathering, manipulation, validation, and results analysis.

Understanding the intricacies of machine learning is vital for anyone looking to leverage its capabilities. From defining specific, actionable goals to preparing and validating data, the real work lies in asking the right questions and supplying good data to get accurate answers. The more rigorously these aspects are handled, the more effective your machine learning initiatives will be.