Data Science — the need for productivity tools
Dec 17 2017
In my last post on Data Science — the new era
, I described how traditional data science is changing in the Enterprise. In this post, I will
describe how Enterprise demand for data science is being created and how difficult it is to fill
data science positions. I have used Forbes and IBM’s Quant Crunch for my analysis.
Data scientists are the next generation of expensive workforce, and their positions take the longest
time to fill. Data Science and Analytics (DSA) is the market where data scientists
play a huge role along with data engineers and data developers. According to McKinsey, DSA job
listings are projected to reach around 2.72 million in the US.
As the demand for DSA jobs increases, it puts a lot of pressure on the supply of DSA
talent in return. We have interviewed several heads of data science departments at multiple
enterprises, and they share a common pain.
“Gosh! I wish hiring data science talent is easier”
Today an average DSA job listing can offer $100K or more, benefits aside. For
every experienced professional in this field, there is huge competition among multiple
Enterprises. 81% of all DSA job postings request workers with at least three years of prior work
experience. The strong demand for
experienced candidates, combined with the strong growth of many DSA roles, creates a chicken-and-egg
problem within the DSA job market: there aren’t many opportunities for workers to gain the
DSA-related experience that employers are requesting.
Given the above problems in demand, there is a need for
Data Science productivity tools.
Today’s Data Science Productivity
Most data scientists today spend their time across several stages: data discovery,
producing ML models, and finally optimizing them. If you observe carefully, however, even the
first of these stages leaves data scientists dependent on engineering and DevOps teams. The
following are some of the resulting challenges for today’s DSA organization.
- Lack of Collaboration: Collaboration is not easy among cross-functional teams with different
skill sets. For example, a data scientist may be strong in statistics but weak at scaling,
while a data engineer may be strong at scaling and deployment but weak in statistics.
- Siloed Operation: The teams involved in the DSA life cycle are often cross-functional
(data scientists, data engineers, data DevOps), and most of the time they operate in silos.
- Duplicated Work: Work often gets duplicated unknowingly among team members
because the team prioritizes execution over optimization.
- Standalone Scripts: Scripts get written by cross-functional teams inside DSA, and
often a script written for one ML pipeline or model cannot be reused for a different one.
- No Standardization: There are no standardized frameworks or strict rules that people rely on;
rather, it is play as you go.
- No End-to-End Solution: Vendors often focus on a small problem inside data science but do not
provide an end-to-end solution for data science. Ultimately, taking models to production is a
cross-team collaborative effort that needs end-to-end integration.
- Headache With Scaling and Deployment: One in three conversations with data teams is
about how their models will scale and continue to perform well at scale.
- Data Wrangling Fatigue: Even with PhDs from premier institutions, data scientists today
spend a disproportionate amount of time on plumbing rather than on core algorithms.
- Feature Engineering Nightmares: The current lack of feature reusability, for example via a
shared feature store, makes constant feature refinement a chore.
- A/B Testing Guesswork: Being able to experiment consistently across unbiased
variables is crucial for reproducible results when comparing different model algorithm choices.
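To make the reuse problem behind the Standalone Scripts and Feature Engineering items concrete, here is a minimal Python sketch of a shared feature registry. It is only an illustration under stated assumptions, not a description of any vendor's product: the `feature` decorator, `build_features` helper, feature names, and record layout are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a feature name to the function that computes it.
FeatureFn = Callable[[dict], float]
_REGISTRY: Dict[str, FeatureFn] = {}


def feature(name: str) -> Callable[[FeatureFn], FeatureFn]:
    """Register a feature function under a shared, unique name."""
    def decorator(fn: FeatureFn) -> FeatureFn:
        if name in _REGISTRY:
            raise ValueError(f"feature {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn
    return decorator


def build_features(record: dict, names: List[str]) -> Dict[str, float]:
    """Compute the requested registered features for one input record."""
    return {name: _REGISTRY[name](record) for name in names}


@feature("order_total")
def order_total(record: dict) -> float:
    return sum(record["item_prices"])


@feature("avg_item_price")
def avg_item_price(record: dict) -> float:
    prices = record["item_prices"]
    return sum(prices) / len(prices) if prices else 0.0


# Two different (hypothetical) pipelines reuse the same feature definitions
# instead of each carrying its own standalone script.
record = {"item_prices": [10.0, 20.0, 30.0]}
churn_inputs = build_features(record, ["order_total", "avg_item_price"])
fraud_inputs = build_features(record, ["order_total"])
```

Because each feature is defined once and looked up by name, a refinement made in one place reaches every pipeline that requests it, and an accidental duplicate definition fails loudly at registration time.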
Given these problems, there is a need for end-to-end Machine Learning Life Cycle Deployment
platforms for production.
Datatron’s AI Platform is one such platform. For more information, please contact info[AT]datatron.io.
We make data science teams more productive by at least 30%.
Benefits of Machine Learning Life Cycle Data Platforms
- Increase data science team’s productivity by at least 30%
- Faster iterations and experiments yield higher quality models
- Use language agnostic operators
- Leverage streaming data with different arrival latency
- Achieve dynamic models through online learning
- Faster on-boarding of new team members
- Automatically promote/demote models based on KPIs
- Ability to automatically test, manage and remove models
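As a rough illustration of what promoting and demoting models based on KPIs could look like, here is a minimal Python sketch. It assumes a single KPI where higher is better and one live model at a time; `ModelRecord`, `rebalance`, the model names, and the KPI values are hypothetical and made up for the example, not taken from any specific platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRecord:
    name: str
    kpi: float          # assumed: a single score where higher is better
    live: bool = False  # whether the model currently serves traffic


def rebalance(models: List[ModelRecord], min_kpi: float) -> Optional[ModelRecord]:
    """Promote the best model that meets min_kpi and demote all others.

    Returns the promoted model, or None if no model meets the threshold.
    """
    eligible = [m for m in models if m.kpi >= min_kpi]
    best = max(eligible, key=lambda m: m.kpi) if eligible else None
    for m in models:
        m.live = m is best
    return best


# Illustrative values only.
models = [
    ModelRecord("baseline", kpi=0.81, live=True),
    ModelRecord("candidate_a", kpi=0.86),
    ModelRecord("candidate_b", kpi=0.78),
]
promoted = rebalance(models, min_kpi=0.80)
```

In this sketch the previously live `baseline` is demoted because `candidate_a` scores higher, and if no model clears the threshold nothing stays live, which is one possible policy among several.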