Written by Jodian Brown, Ph.D., Computational Chemistry, IRTA Postdoctoral Fellow OD/OIR/OITE, National Institutes of Health
Data science is a field of study that has exploded over the past few years, and there is consequently a lot of interest in it from our trainees. To provide tangible insights into strategies trainees can use to transition into this field, the Office of Intramural Training and Education (OITE) recently hosted a panel workshop on Careers in Data Science and Computational Biology.
To some, the field of data science may seem new, yet a core group of scientists may dispute that notion. This core group includes, but is not limited to, computational biologists and chemists, bioinformaticians, and even geographers. These professionals have been harnessing computational approaches and power to make sense of scientifically relevant data for decades. However, the exponential rise in smart technology (such as smartphones and smart cars) has been linked to a significant surge in the need for people who can use computational approaches and power to efficiently handle and analyze large amounts of all types of data. From this need, the term "data scientist" was born. Harvard Business Review devoted an article to the role of this job in the 21st century.
A sizable share of this rise in generated data can be attributed to biomedically relevant sources. Over the past two decades, advances in scientific tools and techniques (e.g., high-performance computer clusters, molecular structure elucidation, and genomic sequencing) have drastically increased the data and knowledge within the biomedical enterprise. Thus, at this juncture we need scientists who can integrate their scientific background and interests with computational tools and approaches to tackle these vast data.
What skills should you consider developing if you are interested in pursuing a data science career? During the Careers in Data Science and Computational Biology workshop, four main pillars important to the transition into data science were identified:
- ability to understand and employ mathematical and statistical approaches
- programming ability
- at a minimum, peripheral knowledge of computer architecture
- ability to effectively communicate your work
Translating the above pillars into practical steps may include taking mathematics or statistics courses (e.g., machine learning or deep learning) as well as learning a programming language such as Python or R. The next step after improving your mathematics and programming knowledge is to find a project of interest, ideally one related to your research, and use available computing resources to carry it out. Learning about computer architecture often happens on the fly, but it is strongly recommended that you commit to understanding the basics. Various computing platforms and analysis and visualization software packages are freely available (watch the video of the workshop for some suggestions).
Here at the NIH there are a number of resources that you may access. The NIH's High-Performance Computing (HPC) team offers several free classes (which provide introductions to supercomputing in science and to Python) and maintains the HPC cluster, which some trainees may access with an appropriate project and approval from their supervisor (note: PIs pay for such use). The NIH Data Science Mentoring program accepts applications from NIH individuals who want to mentor or be mentored in data science. For more information on this mentoring program (which needs mentors), you can contact Ms. Lisa Federer by email at email@example.com or Dr. Ben Busby at firstname.lastname@example.org. Dr. Ben Busby is also a great resource for those with proficient programming abilities who would like to apply them to hackathon projects.
A noteworthy caveat is that the skills listed above may be easier to acquire if you are earlier in your career (e.g., a postbac or graduate student), as you may have more time and/or flexibility with regard to your research responsibilities. In contrast, senior trainees such as postdocs and research fellows may have more time constraints and project responsibilities. Nonetheless, if you are a senior trainee or employee, it may be feasible to construct data science projects that are directly related to your research. Furthermore, if you do not have an extensive programming background, using user-friendly computer software is highly recommended.
The landscape of data science is broad, and the depth of skills involved will depend on the subspecialty. A researcher in data science may be responsible for generating, extracting, analyzing, and/or visualizing data, and/or for developing the tools to do so. Most data scientist positions rely on more than one of these subtasks. Thus, as you begin to explore and acquire skills in the field of data science, you can determine whether you prefer to be on the side that makes "biomedical sense" of the data, on the side that develops the tools, or on both.
The panelists at the OITE-hosted Careers in Data Science and Computational Biology workshop provided great insights into the advantages of having data science skills, whether you are interested in an academic or a non-academic career. In addition, specific tools that trainees can access and use to improve their data science skills were highlighted during the panel discussion. A video recording of this workshop can be found here.
Finally, remember that other resources, including career counselors who are happy to talk with you about career exploration, are available here at the OITE. You can schedule an appointment with one of our career counselors by visiting this link.