Moving from "data centric" to "model centric" — the new operating model for Enterprises
Jan 23 2018
Every two days, we now create as much information as we did from the dawn of civilization up until
2003, according to former Google CEO Eric Schmidt. That is roughly five exabytes of data.
The amount of data collected today is enormous and continues to grow as we try to understand more
about our users and our business. Today, it is entirely possible to know a user’s complete
behavior: the websites they log in to, the Starbucks they go to, their route to the office and back
home, the restaurants they visit on weekends, and so on. Given how much information can be
captured, today’s enterprises try to capitalize on the market by using the data at hand to work out
what will work for users in a particular region at a particular time of year.
Today’s Data-Centric world
Being inundated with so much user data led to the rise of big data about a decade ago. This
fuelled the development of systems like Hadoop and HDFS, built entirely around computation on large
amounts of data, at a scale of terabytes and above. But Hadoop computation was still slow.
Organizations felt the need to respond to users’ needs quickly, in near real time. Spark and Flink
were born out of that need, enabling extremely fast, parallel, in-memory computation. Addressing
users’ needs in near real time is still an open problem that the industry is actively trying to
solve using various methods.
Let’s take an example to demonstrate how projects are carried out in the industry today. Let’s say
Chris is the manager of the Data Engineering team at Phil’s Coffee. He recently determined that the
Phil’s Coffee shop at Market St. and 8th St. in San Francisco has a high concentration of customers
between 6:00 AM and 9:00 AM. The same is the case with another Phil’s Coffee shop at Mission and
2nd St. There is no other coffee shop between these two.
Looking at this pattern, he submitted an initial proposal to his management to open a new coffee
shop somewhere in the middle. He argued, with data, that this would distribute the load evenly
across the two existing coffee shops. The shops would be able to serve more customers, and shorter
wait times would attract even more customers and in turn increase the company’s revenue.
The management was quite satisfied with the idea and asked him a few simple questions:
Where exactly should the new coffee shop open?
Will this be an ROI opportunity?
If so, how large will the return be?
How long will it take to start getting a return?
Chris was not afraid of these questions at all, because he knew he could answer them with data and
give his management approximate predictions. He went back to his desk and simply translated the
questions into data: the concentration of customers going to the office in a region, the cost of
infrastructure at various locations over the years, the number of customers likely to go to the new
cafe in the future, and the number of customers likely to be added to the existing cafes in the
future.
This data allowed Chris to make informed decisions about which path to take and how aggressively to
proceed with the plans. It also helped management make decisions based on clear criteria and with
goals in mind. Management also based future goals and decisions on the outcome of implementing
this idea.
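
The last two management questions (how much return, and how soon) reduce to simple arithmetic once
the data is in hand. The following is only a minimal sketch: the function names and every figure in
it are hypothetical and are not taken from Phil’s Coffee.

```python
import math


def payback_months(upfront_cost, monthly_revenue, monthly_operating_cost):
    """Months until cumulative net profit covers the upfront cost.

    Returns None when the shop never pays for itself.
    """
    monthly_net = monthly_revenue - monthly_operating_cost
    if monthly_net <= 0:
        return None
    return math.ceil(upfront_cost / monthly_net)


def roi(upfront_cost, monthly_revenue, monthly_operating_cost, months):
    """Return on investment over a horizon, as a fraction of the upfront cost."""
    net_gain = (monthly_revenue - monthly_operating_cost) * months - upfront_cost
    return net_gain / upfront_cost


# Hypothetical inputs for a new shop (illustrative numbers only).
UPFRONT_COST = 300_000
MONTHLY_REVENUE = 60_000
MONTHLY_OPERATING_COST = 45_000

months_to_break_even = payback_months(UPFRONT_COST, MONTHLY_REVENUE, MONTHLY_OPERATING_COST)
three_year_roi = roi(UPFRONT_COST, MONTHLY_REVENUE, MONTHLY_OPERATING_COST, months=36)
print(months_to_break_even, three_year_roi)
```

In practice the revenue and cost inputs would themselves be estimates derived from the customer and
infrastructure data described above.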
Data-Centric is the old world
Today, big enterprises hire many data engineers and managers like Chris to study data aggregated
from various sources and look for flaws in systems and places where improvements can be made. Large
SQL scripts run overnight to produce a dump of data in the morning. A data engineer goes through
this dump and tells his manager about potential improvements that can be made.
The big drawback of basing the study on data alone is that the engineer can only look at what
happened in the past day, week or month. The study is completely manual, and he has to use his
judgement to work out where the most revenue can be made in the coming days, or look at where the
biggest losses were made in the past and improve on them.
Even though the data engineer may have scripts in place to tackle problems like this on a daily
basis, manual intervention is still needed to understand what the data is saying at any point in
time. No automated system exists that is intelligent enough to tell whether a result is good or bad.
Data-Centric is not enough today
The biggest advantage of being data centric is that data keeps you informed about the decisions you
make in the company. Big enterprises are able to use data to form an idea and implement it. They
are also able to use data from the implemented idea to check its results. They can compare
predictions with actual results and tell how close they came to the expected output.
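
One common way to put a number on "how close" is the mean absolute percentage error (MAPE). The
sketch below is illustrative only; the customer counts are made up, and MAPE is just one of several
reasonable error measures.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / actual, expressed as a percentage.

    Assumes every actual value is non-zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    total = sum(abs(a - p) / a for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


# Hypothetical daily customer counts: what was predicted vs. what happened.
actual_customers = [100, 200, 400]
predicted_customers = [110, 190, 400]

error_pct = mean_absolute_percentage_error(actual_customers, predicted_customers)
print(f"Predictions were off by {error_pct:.1f}% on average")
```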
As with Chris and his management team, being able to base your decisions on data is good and keeps
you informed about the performance of your product. But much more can be done with this data today.
There is so much data that it is possible to tell what is going to happen next based on the history
of what has happened in the past. We can make machines do these calculations for us using what are
simply referred to as "machine learning models".
Let's say Chris is able to feed this data into a machine learning model that predicts the number of
customers coming in to each store at a particular point in time. Using this information, the number
of staff required to serve a shop can be determined a priori. This avoids hassle, avoids losing
customers, and instead attracts even more customers to the store. It will also help Phil's Coffee
determine staff capacity at any point in time quantitatively and optimize it.
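
As a rough illustration of the idea, the sketch below uses the simplest possible "model", an
average of past customer counts per store and hour, and turns its prediction into a staffing
number. The store name, the history, and the assumption that one staff member can serve 30
customers an hour are all hypothetical; a real deployment would use a proper forecasting model.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical history: (store, hour_of_day, customers_served) on past days.
history = [
    ("market_8th", 7, 118), ("market_8th", 7, 126), ("market_8th", 7, 122),
    ("market_8th", 14, 40), ("market_8th", 14, 44), ("market_8th", 14, 42),
]


def fit_hourly_means(rows):
    """Baseline 'training': average customers for each (store, hour) pair."""
    buckets = defaultdict(list)
    for store, hour, customers in rows:
        buckets[(store, hour)].append(customers)
    return {key: mean(values) for key, values in buckets.items()}


def staff_needed(predicted_customers, customers_per_staff_hour=30):
    """Staff required for the hour, rounded up, with at least one person."""
    return max(1, math.ceil(predicted_customers / customers_per_staff_hour))


model = fit_hourly_means(history)
morning_staff = staff_needed(model[("market_8th", 7)])
afternoon_staff = staff_needed(model[("market_8th", 14)])
print(morning_staff, afternoon_staff)
```

Swapping `fit_hourly_means` for a trained model leaves the staffing step unchanged, which is the
point: the decision is driven by the prediction rather than by a manual read of yesterday's dump.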
Model-Centric is the new world
Being able to predict, and to operate on those predictions, is the new world we are heading into in
the near future. There is enough data to make a machine learn a user’s behavior and predict simple
things for them. Machine learning can also operate on the concentration of users in a particular
region for a particular product and produce informed insights, such as the number of users coming
in to a store between 8:00 and 9:00 AM or the price of stadium tickets near the start of a match.
Imagine the big enterprises are in a place where their data engineers do a very smart job in terms
of predictions. Their daily routine looks like this: they look at the data and at informed,
reliable predictions from machine learning models already in place. This would help them keep the
live streaming infrastructure in place and use the appropriate models to keep revenue growing.
Producing more than one model a month in a large organization is still seen as a challenging task.
Operating multiple models at the same time without worrying about production is still a far-off
vision. What if it became super easy to produce any model we want and make it production ready
within a couple of weeks? What if a data engineer could look at multiple models at the same time
and fiddle with anything without fear of hurting production?
Being able to operate in model space the same way we operate in data space will be a huge boost to
the industry. It will transform the way we look at how to improve our products, which projects will
fetch maximum revenue, and how to prevent misadventures from occurring. It will shift the thinking
from "this is what is currently happening" to "this is what is going to happen, so let’s define a
project to address it or implement a new feature with maximum benefit".
If you are interested in taking your enterprise to Model-centric, talk to us at info[AT]datatron.io