Data Science — the need for productivity tools
Dec 17 2017
In my last post, on the new era of data science, I described how traditional data science is changing in the enterprise. In this post, I describe how enterprise demand for data science arises and how difficult data science positions are to fill. My analysis draws on Forbes articles and IBM’s Quant Crunch report.

The Demand

Data scientists are the next generation of expensive talent, and their positions take the longest to fill. Data Science and Analytics (DSA) is the market in which data scientists play a huge role, alongside data engineers and data developers. According to McKinsey, DSA job listings are projected to reach around 2.72 million in the US.
As the demand for DSA jobs increases, it puts a lot of pressure on the supply of DSA talent. We have interviewed the heads of data science departments at several enterprises, and they share a common pain.
“Gosh! I wish hiring data science talent is easier”
Today an average DSA job listing can offer $100K or more, benefits aside. Enterprises compete fiercely for every experienced professional in this field. 81% of all DSA job postings request workers with at least three years of prior work experience. The strong demand for experienced candidates, combined with the strong growth of many DSA roles, creates a chicken-and-egg problem within the DSA job market: there aren’t many opportunities for workers to gain the DSA-related experience that employers are requesting.
Given these demand-side problems, there is a need for data science productivity tools.

Today’s Data Science Productivity

Most data scientists today spend their time across several stages: data discovery, producing ML models, and finally optimizing them. Look carefully, however, and you will see that from the very first stage data scientists depend on engineering and devops teams. The following are some of the resulting challenges for today’s DSA org.
  1. Lack of Collaboration: Cross-functional teams with different skill sets have no easy way to collaborate. For example, a data scientist may be strong in statistics but weak at scaling, while a data engineer may be strong at scaling and deployment but weak in statistics.
  2. Siloed Operation: The teams involved in the DSA life cycle are cross-functional (data scientists, data engineers, data devops), yet most of the time they operate in silos.
  3. Duplicated Work: Work often gets duplicated among team members, knowingly or unknowingly, because the team’s priority is execution rather than optimization.
  4. Standalone Scripts: Scripts get written across the cross-functional teams inside DSA, and a script written for one ML pipeline or model often cannot be reused for another.
  5. No Standardization: There are no standard frameworks that people rely on to set strict rules; rather, it is play as you go.
  6. No End to End Solution: Vendors often focus on a small problem inside data science but do not provide an end to end solution. Ultimately, taking models to production is a cross-team collaborative effort that needs end to end integration.
  7. Headache With Scaling and Deployment: In one of every three conversations, data teams are worried sick about how their models will scale and whether they will keep performing well at scale.
  8. Data Wrangling Fatigue: Data scientists today, many of them PhDs from premier institutions, spend a disproportionate amount of time on plumbing rather than on core algorithms.
  9. Feature Engineering Nightmares: Without a feature catalog, features cannot be reused, which turns constant feature refinement into a chore.
  10. A/B Testing Guesswork: Being able to experiment consistently across unbiased, representative variables is crucial for reproducible results when comparing different model algorithm choices; without it, A/B testing becomes guesswork.
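To make the standalone-scripts and feature-catalog items above a little more concrete, here is a minimal sketch of what a shared feature catalog could look like. Everything in it is an assumption made for illustration: the FeatureCatalog class, its register and compute methods, and the feature names are hypothetical and do not describe any particular product.

```python
from typing import Callable, Dict, List

# Hypothetical sketch: FeatureCatalog and the feature names below are
# illustrative assumptions, not the API of any specific product or library.
Row = Dict[str, float]


class FeatureCatalog:
    """Registry of named feature functions that any pipeline can reuse."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[[Row], float]] = {}

    def register(self, name: str, fn: Callable[[Row], float]) -> None:
        # Refuse silent overwrites so two teams cannot define the same
        # feature name in two different ways.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn

    def compute(self, names: List[str], row: Row) -> Row:
        # Look up each requested feature and apply it to the raw row.
        return {name: self._features[name](row) for name in names}


catalog = FeatureCatalog()
catalog.register("spend_per_visit", lambda r: r["spend"] / max(r["visits"], 1))
catalog.register("is_frequent", lambda r: float(r["visits"] >= 10))

row = {"spend": 120.0, "visits": 4}

# Two different (hypothetical) models pull from the same definitions
# instead of each team re-implementing them in a standalone script.
churn_features = catalog.compute(["spend_per_visit", "is_frequent"], row)
upsell_features = catalog.compute(["spend_per_visit"], row)
```

The point of the design is that a feature is defined once and looked up by name, so a refinement made for one model is available to every other pipeline that requests the same name.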
Given these problems, there is a need for end to end Machine Learning Life Cycle Deployment platforms for production.
Datatron’s AI Platform is one such platform. For more information, please contact info[AT]datatron.io. We make data science teams more productive by at least 30%.

Benefits of Machine Learning Life Cycle Data Platforms

  • Increase the data science team’s productivity by at least 30%
  • Faster iterations and experiments yield higher quality models
  • Use language agnostic operators
  • Leverage streaming data with different arrival latencies
  • Achieve dynamic models through online learning
  • Faster on-boarding of new team members
  • Automatically promote/demote models based on KPIs
  • Ability to automatically test, manage and remove models
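
As an illustration of the promote/demote bullet above, the sketch below shows one simple way KPI-based promotion could work. It is a sketch under stated assumptions, not a description of how any platform implements it: ModelRecord, review_models, the two thresholds, the status names, and the KPI values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelRecord:
    # Hypothetical record; "kpi" stands for any single metric the team tracks.
    name: str
    kpi: float
    status: str = "challenger"  # "champion", "challenger", or "retired"


def review_models(models: List[ModelRecord],
                  promote_at: float,
                  retire_below: float) -> None:
    """Retire models under retire_below; crown the best model if it reaches promote_at."""
    live = []
    for m in models:
        if m.kpi < retire_below:
            m.status = "retired"
        else:
            live.append(m)
    if not live:
        return
    best = max(live, key=lambda m: m.kpi)
    # Statuses change only when the best model clears the promotion bar;
    # otherwise the current champion keeps its place.
    if best.kpi >= promote_at:
        for m in live:
            m.status = "champion" if m is best else "challenger"


# Made-up KPI values, for illustration only.
models = [
    ModelRecord("logreg_v1", kpi=0.81, status="champion"),
    ModelRecord("gbm_v2", kpi=0.88),
    ModelRecord("rules_v0", kpi=0.55),
]
review_models(models, promote_at=0.85, retire_below=0.60)
statuses = {m.name: m.status for m in models}
```

With these made-up numbers, gbm_v2 replaces logreg_v1 as champion and rules_v0 is retired. A real setup would likely compare KPIs over a time window rather than a single value, but the threshold logic stays the same.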