The applications of data science and software engineering in Healthcare are vast. From applying Bayesian statistics to model the human genome to detecting breast cancer in medical images with neural networks, there is a huge opportunity for data scientists and engineers to grow and learn in this industry. But it is important to understand the various specializations within Healthcare Data Science and the opportunities that exist within them. By knowing more about your specialization and the skill set that domain requires, you are more likely to land a job and be successful. Here are the main sub-domains of Healthcare Data Science, along with the skill sets in demand and the organizations leading innovation in each.
- Computational Biology (Academic Research)
This is one of the fastest-growing sub-domains of Healthcare Data Science, and arguably the hardest. Computational Biology uses computational techniques to explore and understand biology. From exploring the human genome to predicting early-stage cancer from RNA data, much of computational biology research centers on molecular biology and on finding patterns at the molecular level. The most in-demand skills in this field include Python, R (and R Shiny), Bash/Linux, and Perl (although Perl is slowly dying), along with an in-depth understanding of Bayesian statistics and bioinformatics tools such as samtools and bamtools.
The organizations innovating in this field are mostly university-affiliated labs. UCSF, UC Berkeley, MIT, and Washington University in St. Louis are among the leading institutions in Computational Biology research.
- Computational Biology (Industrial Research)
The industrial research side of Computational Biology focuses on therapeutics and drug development. It involves extensive use of machine learning, statistical techniques, and AI in preclinical experiments (such as genomic experiments) to explore therapies for cancer and other life-threatening diseases. The most in-demand skills in this industry include Python, R, and Bash/Linux, along with extensive experience in machine learning, AI, and biological data.
The organizations leading this domain are large pharmaceutical companies such as Novartis, Amgen, and Gilead Sciences. An interesting trend of the last decade is that these companies increasingly acquire fast-growing research startups rather than innovating in-house. Building an innovative lab inside a big company is very hard because decisions move slowly and departments are highly specialized. This trend has also greatly boosted the credibility of biotech startups, which were previously seen as high-risk investments: a biotech startup's success depends on discovering a successful therapy, which may take 2 to 3 years after the company is founded.
- Clinical Trials and Field Biology
The clinical trials domain of Healthcare Data Science is heavily regulated because it directly involves human lives. The work includes testing therapies in human participants, collecting data in the field, analyzing it, and drawing causal inferences. Although clinical trials feed into the industrial research division, I have written about them separately because of the specialized skill set they require: you need to be an expert in causal inference, statistics, and data collection and management. The most important tools are Python, R, and SAS. Working in clinical trials is incredibly rewarding because you get to interact with the people who will be most affected by your work.
While large pharmaceutical companies run their own clinical trials, many small and medium-sized organizations also run trials or help other companies with field operations.
- Data Science in Clinics and Hospitals
This is one of the least developed domains of Healthcare Data Science, although it is advancing rapidly. It involves using data science and engineering to accelerate analytical work in clinics and hospitals. From detecting abnormalities in heartbeat data to analyzing breast MRI scans for signs of tumors, there is a huge opportunity for data scientists to automate the analysis of images and clinical data to predict outcomes.
But it is also important to understand that these workflows can't simply be automated. The enormous variance in biological data makes it very difficult to build models with acceptable accuracy, and in medicine the bar for acceptable accuracy is far higher than in most other industries because mistakes are so costly. If an ML model that makes loan decisions gets it wrong, you might not get a loan, which is unpleasant but not life-threatening. A mistake by a hospital-based model can cost a life. This is why the models currently used in hospitals and clinics are paired with pathologists, who verify the results before they are released. To succeed in this domain, you need strong computational skills along with deep domain knowledge and a great deal of patience.
While the research-based domains of Healthcare Data Science are less regulated (which allows for freer exploration of biological data), regulation increases as the risk to human lives increases. Another important aspect of Healthcare Data Science is that you must be very knowledgeable to build models successfully. Unlike finance, which is a system built by humans, biology was not designed by us; we are still discovering how it works every day. Because we don't know everything, you must recognize when you have hit a black hole in your research and development process. Otherwise, you are likely to dismiss outliers and errors in your model as "noise" when they may reflect an important effect, such as a side effect, that must be taken into account.
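The advice about outliers can be made concrete with a small sketch: instead of silently dropping unusual measurements as noise, flag them for expert review. This sketch assumes a single numeric measurement per observation and uses the modified z-score (based on the median and the median absolute deviation, MAD) with the commonly cited cutoff of 3.5. The function name `flag_for_review`, the threshold, and the sample readings are hypothetical choices for illustration, not a prescribed method.

```python
from statistics import median


def flag_for_review(values, threshold=3.5):
    """Return indices of observations whose modified z-score exceeds threshold.

    Hypothetical helper for illustration: flagged points are kept in the
    dataset and routed to a domain expert instead of being dropped as noise.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate spread: treat any value that differs from the median as unusual
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 scales the MAD so the score is comparable to a standard z-score
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical heart-rate readings; index 6 is far from the rest
readings = [72, 75, 71, 74, 73, 76, 140, 72]
print(flag_for_review(readings))  # → [6]
```

The flagged reading stays in the data; the point is to hand it to someone with the domain knowledge to decide whether it is a measurement error or a real effect. Median-based statistics are used here because a single extreme value would inflate a mean and standard deviation and could hide itself.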
So what's the best way to get involved? If you want to learn and grow, join a growing, research-focused biotech startup. With the COVID-19 pandemic, labs like Color Genomics are on a hiring spree. Because healthcare is strictly regulated, it can be very hard to build products that comply with FDA regulations. At Datatron, model accountability and governance are our primary concern. We've built an extensive set of features into our platform to help our clients meet standardized regulations, such as 21 CFR Part 11, and keep their data science models accountable. Reach out to us for more information!
Thank you for reading! Connect with me to learn more about Healthcare Data Science.