NEW YORK (GenomeWeb) – Researchers from Johns Hopkins University have launched a new specialization on the online education platform Coursera that will teach basic computational and statistical skills for analyzing and exploring genomic data from high-throughput biological experiments.
The newly minted Genomic Data Science specialization features six non-credit courses that provide a coherent introduction to some common tools of the genomic data analysis trade and teach students to use them to answer biological questions and perform some basic analysis of genomic data, course instructors Jeff Leek and Kasper Hansen told GenomeWeb.
It is not a replacement for traditional degree programs in computational biology, bioinformatics, or biostatistics, they said. Rather, it's envisioned more as a way of helping newcomers to the high-throughput biological data domain get up to speed on basic data handling and analysis within a relatively short time frame.
Course instruction covers the use of some of the most popular and widely used software packages for genomic data analysis including Bioconductor and Galaxy. The course also includes a general introduction to concepts and applications of genomics technologies, provides some command line training, and teaches some basic statistics. In addition to Leek and Hansen, courses will be taught by other instructors from Johns Hopkins' School of Public Health, the McKusick-Nathans Institute of Genetic Medicine, and the Center for Computational Biology, among other departments.
Each course is about four weeks long and will run every month starting from June 1. Students will be told during class which open source software packages to install and where to get them. A seventh capstone course at the end of the sequence will challenge students with a more comprehensive project and offer them a chance to apply the lessons and skills gleaned over the length of the specialization. Much like in a regular class setting, students will be tested with weekly quizzes and, in addition to the capstone, will be required to complete a small project at the close of each course. These assessments gauge their knowledge and determine whether they pass. Courses are not graded in the traditional sense, but students do have to achieve a minimum score to obtain their certificates.
Although technically anyone who wants to can take the course, the ideal target students are wet lab biologists who are starting to use technologies such as next-generation sequencing in their projects or individuals from more of a straight computational background who are interested in genomics and looking for a relatively painless and flexible way to get their feet wet, said Hansen, who is an assistant professor of biostatistics and genetic medicine at JHU. The sequence would also be a good complement for students who are working towards primary or postdoctoral degrees in biology, molecular biology, or genetics and need some computational help with their projects or are just looking to supplement their training, he added.
Completing the course could also provide a resume boost. "Knowing how to do some of those things on the command line or knowing a little bit of Python or R or Bioconductor is a very marketable skill these days because of all the data being generated by both industry and academics," Leek noted. "It would be a useful additional credential for, say, a wet lab person to have when they are looking for a job or a useful set of skills for them to have if they are in a job that deals with computational biology or with analysis of sequencing data, and so forth."
Classes in the genomic data specialization don't necessarily build on each other, so students don't have to take them in order or even one at a time; however, there are some "loose dependencies" between some courses, Hansen noted. Each class will be offered multiple times throughout the year so students can take them in their own time and at their own pace. The first two courses in the sequence, introduction to genomic technologies and genomic data science with Galaxy, will start on June 1. The next two courses will roll out on July 6, and sessions for the final two will start on August 3. Students who successfully complete all of the courses in the specialization receive a verified certificate for their efforts.
The first course costs $49, and each subsequent course and the capstone are priced at $99, for a total of $643 for the certificate. Students can pay for each class individually as they register or they can pay the entire cost up front. It's also possible to take the entire sequence for free, but students who do so neither receive certificates nor are eligible to participate in the capstone project.
Aside from a working computer and internet access, it is helpful for prospective students to have some programming experience and a working knowledge of math and some biology. However, if experience from the data science specialization is anything to go by, individuals with little or no experience could still successfully complete the course, though they would probably have to work harder than students with a little more experience under their belts, the instructors said. Still, "the goal is to make it as accessible as we possibly can [and] to hit the largest group of people that would be interested in this and get them up to speed," Leek said.
This is the second data-centric massive open online course (MOOC) to come from the university. Last year, Leek, an associate professor of biostatistics at JHU, and two colleagues from JHU's biostatistics department launched a data science specialization that introduces students to concepts and tools in the data science domain and trains them to use these tools to ask questions, make inferences, and generate results.
Both course sequences share some DNA in that they are similarly structured, students learn to use some of the same tools, and both have similar recommendations in terms of initial experience. However, the data science sequence is longer (nine courses in total plus a separate capstone) and has a much broader scope than its genomics counterpart, including focused courses on areas such as machine learning and regression modeling. The two specializations are complementary, and completing both would be highly advantageous, so ambitious students are certainly welcome to try.
Since its launch, the data science MOOC has grown to be one of the most successful and financially lucrative specialization sequences in the world, drawing record enrollment numbers and also running more frequently than any other sequence of courses on the site. Leek told GenomeWeb that about two million students worldwide have enrolled in the course and hundreds of thousands have completed classes.
At the end of the month, when the next capstone class wraps, about 2,000 people will have completed all 10 classes in the sequence. For comparison, that is much larger than the number of students currently enrolled in the largest data science program and probably more than the number of students enrolled in any single program in a university.
"It's been outrageously successful, something we did not anticipate at all when we built it," Leek said. The new genomics data science specialization is an opportunity to take advantage of lessons learned from the first outing and move into a new area with huge growth potential, he said. "It's very exciting because it gives us an opportunity to scale education in a way that you couldn't do without the [Coursera] technology."
Furthermore, given the current and anticipated impact of genomic research and precision medicine, training people to understand and make sense of biological data is going to become increasingly important and MOOCs are one way to equip people with the requisite skills, Hansen added. Many people in academia are at least thinking about MOOCs but very few are actually doing them, he said. "If you are excited about having an impact on the practice of genomics, this is an incredible opportunity."