BOSTON -- Speaking yesterday at the Bio-IT World Conference here, Jill Mesirov, associate director and chief informatics officer at the Broad Institute of MIT and Harvard, addressed trends and technical challenges in genomics in a keynote presentation.
Mesirov argued that genomics remains a field rife with accessibility hurdles -- chiefly because of the unwieldiness of the data.
Understanding Mesirov's point calls for a bit of context: Since the completion of the Human Genome Project (HGP) in 2003, private companies have been intent on unlocking the mysteries of DNA through mapping -- or "sequencing" -- the thousands of genes that make up the human genome. The HGP's efforts have led to important discoveries in the quest for cures for diseases such as cancer.
Jill Mesirov. Photo credit: Maria Nemchuk, Broad Institute
The Internet has been integral in helping medical science progress toward these goals. A couple of years ago, a company named 23andMe (the name is a reference to the 23 pairs of chromosomes in a human cell) crowdsourced research efforts by collecting self-reported biological trait information from more than 10,000 participants over the Internet. The information has helped researchers find associations among human traits, such as between eye color and hair color.
Thanks to the proliferation of data like this, it's never been cheaper to sequence the human genome. In the early days of the HGP, it was estimated that sequencing a single human genome would cost about $3 billion (or, to put that figure in perspective, 3 Instagrams). It wound up costing only about 10 percent of that amount. In 2007, the cost to sequence the human genome fell to a mere $1 million. Today, a human genome can be mapped for about $5,000 -- and the cost is fast approaching three figures.
Accordingly, "More and more types of data are being acquired by sequencing rather than other platforms," reports Mesirov. What's more, the data is of higher quality, containing much less "noise."
The abundance of data, together with the growth of computation and networking, has made it possible to integrate the work of various labs and research projects. To drive the point home, Mesirov highlighted several important genomic discoveries in cancer research and other major areas of biomedicine made possible through integrative studies over the past seven years.
To support this work, however, Mesirov says that research biologists require better data management and better data identification capabilities -- including better visualization for large, integrated data sets.
Moreover, says Mesirov, "The workflows and algorithms are becoming much more complex... which means we're making greater demands on computing power."
Mesirov estimates that there are between 7,000 and 10,000 bioinformatics tools available for download on the Web, along with more than 5,000 data repositories. Getting these tools and these data to work together has proven difficult, overwhelming biologists -- especially, as Mesirov notes, biologists who aren't programmers.
To reduce the data complexity and inaccessibility research biologists have faced, the Broad Institute has stepped into the bioinformatics space with a "cooperative" solution -- GenomeSpace.
GenomeSpace is a cloud-based, open-source data management center that offers what Mesirov calls "a lightweight layer of interoperability." GenomeSpace supports several bioinformatics tools, all integrated to allow easy accessibility, easy conversion, and frictionless sharing. The Broad Institute's goal with GenomeSpace is to make the newest bioinformatics tools and most modern data management and identification methods available "to any working biologist."
Mesirov is careful to emphasize that GenomeSpace is not a monolith. Whereas a single "megatool" would offer limited flexibility as new methods are developed, GenomeSpace allows the tools it supports to maintain their unique identities. On GenomeSpace, data management tools look the same and feel the same as if you were using them directly -- except with the interoperability benefits of a cloud-based infrastructure.
Nine years ago this month, researchers completed the first human genome sequence, heralding a new era of biomedical research. As integrative data tools like GenomeSpace catch on, Mesirov predicts that within the next 10 years, biomedicine will enter a renaissance of accessibility.
--Joe Stanganelli is a writer, attorney, and communications consultant. He is also principal and founding attorney of Beacon Hill Law in Boston. Follow him on Twitter at @JoeStanganelli.