Companies often downplay the crucial importance of an optimal data science organization design when starting their AI journey. Under pressure from FOMO, management often rushes to hire a newly graduated young PhD to “tick the box” of Artificial Intelligence in meetings with clients and investors. The hired “genius”, used to the academic environment but a complete alien in a company, is placed in a corner, completely isolated from the rest of the organization. After a couple of months, slowly but surely, the company realizes that no concrete outcome will materialize any time soon and moves the new hire under some department such as marketing or finance, hoping a line manager can “fix it”.
Needless to say, the story does not end well for anyone involved. Based on my experience, I propose six simple rules to set up a data science organization designed to integrate seamlessly with the business and deliver significant value.
Surprise, surprise: strategy first
My very first question in meetings with clients is, “What are you aiming to achieve with a data science organization?”
First, we must align the early Data Science efforts with known company priorities and pain points to get the needed internal support. For example, do you urgently need to improve targeting or cross-selling to your customers? Do you have issues with managing the inventory? Do your competitors buy raw materials cheaper than you? Logistics? Quality? Pick one key priority that would make the difference, mobilize the organization and provide focus.
Data Science as an independent entity leading cross-functional projects
Your data scientists will often collaborate with Marketing, Finance, IT and Operations but should not report to any of them. Set up your data science organization for success by positioning it as an equal partner, tasked with developing solutions together with other departments. Make Data Science accountable for its own business objectives and strategy, but structure projects around mixed teams that bring business people and scientists together.
Infrastructure for Data Science organization
Most early efforts with Data Science fail because companies lack the most critical components: reliable data collection (sensors, logging, user-generated content) and data flow (pipelines, ETL). Even when databases are in place, data are often inconsistent and have huge gaps or undocumented changes, making them impossible to compare across years. Make data easily accessible, ideally leveraging cloud architectures, so that data scientists don’t have to ask permission or, worse, ask the IT department for data extractions.
While this setup effort may seem to slow you down at first, doing Data Science with inaccurate data is like opening a restaurant that sources its ingredients from the trash: as everybody says, “garbage in, garbage out”.
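To make the “huge gaps” problem concrete, here is a minimal sketch of the kind of check a team could run before trusting a monthly series. The function names and the sample sales figures are hypothetical and only serve as an illustration; a real audit would read from your own pipeline and cover many more rules.

```python
from datetime import date


def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def audit_monthly_series(records):
    """Report missing months and empty values in (month, value) records."""
    present = {month for month, _ in records}
    expected = list(month_range(min(present), max(present)))
    return {
        "missing_months": [m for m in expected if m not in present],
        "null_values": sorted(m for m, value in records if value is None),
    }


# Hypothetical monthly sales figures with one gap and one empty value.
sales = [
    (date(2023, 11, 1), 120.0),
    (date(2023, 12, 1), None),
    (date(2024, 2, 1), 135.5),
]
report = audit_monthly_series(sales)
print(report)
```

A check this small will not fix a broken pipeline, but it makes gaps visible early, before anyone builds a model on top of them.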
Exploratory Data Analysis first for quick wins
The good news is that, once data flows are reasonably reliable and accessible, your data scientist can start generating insights well before developing any Machine Learning.
A good data scientist will lead the crucial data cleaning and Exploratory Data Analysis phases, understanding key metrics and their sensitivity to various factors. She will propose simple, efficient rules for decision-making while, at the same time, enriching the data flow with the information it is missing.
With a good foundation in place, data scientists can quickly develop robust Machine Learning models and generate new, innovative capabilities and services.
Designing a Data Science organization: Roles and profiles
What is the best profile for most companies?
Fueled by academic dynamics, many students specialize as Machine Learning Engineers, Modelers, Causal Inference specialists or Data Analysts. Unfortunately, in the real world, flexibility and versatility are far more important. One of the significant issues with hyper-specialization is the hidden cost of handing over information and tasks: wait times and meetings about meetings. Moreover, since nobody owns the product end-to-end, the team slowly drifts toward sub-optimal outcomes.
Unless your company is Amazon or Google, you probably don’t need an army of specialists. Rather than looking for three specialists, one for each phase of the data science cycle, hire three “full-stack data scientists” able to cover all the different functions and follow the process “end-to-end”. Start with people able and willing to do the “dirty” work of cleaning data and guiding the pipeline building, and they will gladly optimize and deploy excellent Machine Learning models.
Changing the culture
The organization must share a culture of learning and experimentation as pointed out by many in the industry.
Any of your company’s executives who have been involved in software development in the last few years have already been exposed to powerful concepts such as Agile and the Minimum Viable Product. Unsurprisingly, the same applies to Data Science: a simple version of the product must work well end-to-end before you add complexity and functionality.
Executives must be open to learning by doing through an adaptive, iterative and evolutionary development process. In addition, they have to be comfortable with ambiguity and unexpected outcomes of experiments.
Is A/B testing showing that your idea was not that good after all? Then be glad you have avoided the disaster of a full-scale deployment: learn and move on.
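For readers who have never seen what “the A/B test says no” looks like in practice, here is a minimal sketch using a pooled two-proportion z-test, one common way to compare conversion rates. The function name and the visitor and conversion counts are hypothetical, and the normal approximation assumes reasonably large samples; your data scientists may well prefer a different test.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumes large samples and a pooled rate strictly between 0 and 1.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF evaluated at |z|, written with the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical experiment: A is the current page, B is the new idea.
z, p_value = two_proportion_z_test(conv_a=120, n_a=2000, conv_b=95, n_b=2000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A negative z here means the new idea converted worse than the current page in the sample. How small the p-value must be before you act on it is a business decision to agree on before the experiment starts, not after.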
If your management team is full of high-ego stars constantly fighting for resources to support their pet projects, then data science might not work well in your company. However, should your team use facts and objective data for decision making, then a data science department can give your company a substantial competitive advantage.