Working Towards Open Science

We live in an interesting time in Science. In the past, conducting science required expensive equipment and immense amounts of time, which put scientific inquiry wholly out of reach of anyone outside of Academia. Several things have changed in society over the last twenty-five years as technology has grown, and they have led us to the precipice of a fundamental change in the way that scientific inquiry is conducted.

First, people have access to far more information than we ever imagined before. This is due to the Internet, and to Internet-based movements like Open Courseware, which make learning far easier, not to mention cheaper. Recently, the California University System sought to use Open Courseware to reduce the cost of Education, citing the high costs of textbooks. As someone who has, in some way or another, been tied to a University for the last seven years, I can certainly agree with that sentiment, but more importantly, it makes the information used in gaining a college education available to everyone. Is it a replacement for a College Education? Not likely, since many of the benefits of a college education are inherent in working with peers and with professors, but for some people it’s enough to help.

This has been accompanied by the recent movement towards opening up Scientific Journals. However, on this point, we still have a long way to go. A journal that my wife’s advisor has been published in on several occasions, Molecular Biology and Evolution, costs around US$141 per year to subscribe as an individual (US$678 as an Institution), less if you’re a student or a postdoctoral researcher. Frankly, this is cheap. Very cheap. Another Journal where he has been published, the Journal of Morphology, is only available to Institutions, and costs a blistering US$5533 per year to subscribe.

Still, there has been movement here. Both the journals mentioned above allow articles to be made downloadable via the Internet for free, and it seems that the MBE journal’s subscription fee is more to cover membership dues for the organization which publishes it. Things are also changing for grants based on public funds: for many of them (for example, grants from the National Science Foundation), the papers resulting from the grant must be made freely available to the public. This is fantastic. In my opinion, Science needs to be conducted freely, and out in the open. First, because then it benefits the most people, and second, because openly conducted science is the best mechanism to further drive scientific development. And while many scientists do an interesting trade in publication credit and such, ultimately, I believe that most scientists agree. However, if you’re not lucky enough to be affiliated with a university, there is going to be a large percentage of journals which you simply can’t read, because historically, the cost of subscribing to a journal has been too high. And the scientists do not even see any of the money from the publication of their materials.

We have more access to the data, and to the results of science, than ever before, but there is more to the development of an open scientific infrastructure than simply the papers that result from scientific inquiry. The next step is Open Data. Luckily we’ve come a very long way in this respect as well, at least when it comes to research on Genetics. The National Center for Biotechnology Information offers a convenient place for scientists to upload genetic sequences they’ve produced, so that others can carry on work with that sequence information. GenBank, NCBI’s sequence database, contains tens of millions of sequence records from species ranging from humans to hagfish, porpoises to platypuses.

Why do projects like GenBank exist? Well, first, some scientists receive grants to sequence an animal’s genome. The methods they use for this are generally imprecise, and there is a lot of what’s known as “Shotgunning” involved, meaning that they throw enzymes at DNA and see what sticks. Some genes are easier to extract than others, and depending on the perceived value of the gene, the desire of a scientist to extract it changes. For instance, in Catherine’s lab, they feel that the 18S and the 28S ribosomal genes are particularly valuable for deep historical phylogenetics, and they’ve got some data to back that up. For that reason, the lab has developed protocols to extract those two genes in their entirety, something that many others do not feel is worthwhile. It probably doesn’t help that the protocols need to be modified depending on the species being sampled, and that there are a limited number of researchers using these genes. Genetic research is still very much changing, and I lack the knowledge of biochemistry to speak any more to the difficulties inherent in the practice. The point is that the people doing the sequencing may not get everything, and they make their data available so that others can do the analysis.

So, with the data being made available, the need for complex lab equipment to perform certain types of analysis on genetic material has been greatly lessened. Certainly if you can do your own sequencing you have an advantage, but it’s no longer necessary. I suspect that many other fields have similar open data initiatives, but genetics happens to be a field today where the sheer quantity of data being produced, and needing to be produced, is mind-boggling, so it makes a particularly good example.

This takes us to the third level of what’s required to conduct open science. First, we had Open Knowledge. Then, we had Open Data. All that’s left now is Open Tools. It just so happens that we already have mechanisms in place to supply this, in the work done by the Free Software Foundation and the Open Source Initiative. In the field of Statistical Biology, there are a large number of software tools that are used by most researchers, and save for one or two notable exceptions, this software is all Open Source, much of it copyleft.

With this, we now have all we need to do real science. We have the ability to learn. We have the data we need. We have the exact same tools used by the academics themselves. Science is doable by the layperson, in a way that it has not been before.

This is not to say that Academia is without merit. Most scientific inquiry will still be done in Academia. The scientific inquiry done in privately held corporations will rarely be released to enrich all of society. The funding necessary for certain types of inquiry will always be easier to get in academia. But those people who are interested, who have an itch to scratch, can do their own research. On their own time. And they have the power to discover amazing things. Academia will always be the heart and soul of science. I don’t ever see that changing, but laypeople need to be able to benefit from, and contribute to, science. There are still battles to be fought regarding truly free access to research and such, but we’ve come far, and I don’t see the movement toward Open Science slowing down any time soon.

We can still hasten its coming though. Make sure that research done using public funds remains available to the public. Not only the papers published as a result, but the data generated for the research. We should do what we can to make it not only easier, but also more valuable, to participate in this Open Scientific Community. It’s for all our benefit.