Copyright © 2001, 2002 BioinfoTools.com
Ruminate /'ru:mineit/ v. [L ruminatus pa. ppl stem of ruminari, -are, f. as RUMEN: see -ATE.] 1. v.t. Turn over in the mind; meditate deeply upon. M16 2. v.i. Chew the cud. M16 [etc.]
Taken from The New Shorter Oxford English Dictionary (1993).
Articles at other sites:
American Museum of Natural History "The Genomic Revolution" exhibit
Interview with authors of O'Reilly's 'Developing Bioinformatics Computer Skills' book.
This occasional column is inspired by the "Ask Tim" column of publisher Tim O'Reilly of O'Reilly books fame. Whereas Tim's column is primarily in response to letters he has received, items here may be inspired by nothing much at all. This column will be occasional -- that is, whenever I (Grant Jacobs) find time and inspiration (whatever that is) to put down some of the stray thoughts in my mind. My ruminations are unlikely to bear the full weight of the definition to the left -- they'll be more light-hearted. But hopefully, they'll raise issues worth mulling over.
You're welcome to write in. Please don't be offended if you don't get a prompt reply, or even any reply at all, as I am likely to be occupied with other matters. But I will read everything posted. Posts that raise interesting issues may get a reply via this column. Posts should be sent to firstname.lastname@example.org.
Topics will span a wide range of areas, such as computational biology, ethical issues in research, computing, computer hardware, business practices and possibly other areas of science (e.g. cognitive neuroscience). Don't feel shy if your topic or question seems unusual or irregular. If anything, topics which are different enough to jar me out of my train of thought and make me say "hey, now that's a thought!" are more likely to receive attention.
Copyright Grant Jacobs, BioinfoTools (2002-)
Bioinformatics is a much hyped, mythologised discipline. It isn't that people are actively trying to mythologise it, nor that everyone has these views. It's just that as many new people join in from technological, management and business backgrounds, the view of bioinformatics they appear to have differs from its original foundation in theoretical or 'first principles' biology. Even some biological researchers seem to share this view.
To see past these myths one needs to peer into Bioinformatics' past and view its progress since its beginnings. I'd like to explore how much of the current view of bioinformatics differs from the actual origins of the field and how this might affect bioinformatics in the immediate future. In some ways this rumination might be better titled "Is technology taking over in bioinformatics (at the expense of theoretical biology)"?
Having been trained by one of the early bioinformatics scientists (bioinformaticians?) and having studied in the area for around 10 years now, I believe I have a fairly useful perspective on where bioinformatics came from, roughly how it has progressed and, from this, a perspective of where it might be headed next.
Myth 1: Bioinformatics has arisen in the last 5-10 years
"Bioinformatics is a new science, which arose in the last 5 to 10 years or so". We've all heard phrases along these lines in seminar and conference talk introductions and from groups trying to persuade their powers-that-be to fund them. True? If not, how is it likely to affect bioinformatics' immediate future?
Let's break this statement into two parts: the "newness" and the exact age. If you view science on the grand scale of hundreds or thousands of years, bioinformatics could hardly be anything but new. However, if you compare its age with the sciences it partners, particularly molecular biology, itself only a few decades old, you might be surprised to find it's been around for a fair while. Bioinformatics "in science" (but not in name) began in the late 1960s or early 1970s, which we'll look at in more detail below.
On a more personal note, I feel remarks purporting bioinformatics to have very recent origins must be rather galling to the pioneers in the field, most of whom have worked in bioinformatics all their careers and have since retired (at least officially!). I feel these remarks are a reflection of the hype over the last 5-10 years. We ought to give the early workers the credit they deserve and understand better where bioinformatics has come from so that we might better understand what we are doing.
Depending on where you draw the line, bioinformatics has been around since the late 1960s or early 1970s and certainly was established by the 1980s. I can hardly claim to know of all the early workers, but below I list enough to satisfy sceptics that the field was in fact active. Don't feel offended if your favourite star is missing; this list would be very long if I included everyone! Early researchers of the late 1960s to early 1970s era include Margaret Dayhoff, Russell Doolittle, George Rose, Michael Levitt, and Andrew McLachlan (I must admit my bias here: Andrew was my Ph.D. supervisor). Somewhat later contributors from the 1970s onwards include Joe Felsenstein (phylogenetics), Michael Waterman (sequence analysis algorithm development), Temple Smith (sequence analysis methods), Cyrus Chothia (analysis of protein sequences and structures), Drs. Chou, Fasman and Robson (of secondary structure prediction fame), Walter Fitch (RNA structure prediction), V. I. Lim (organization of protein structures and secondary structure prediction), Needleman and Wunsch (sequence comparison and searching), Roger Staden (sequence analysis), and David & Jane Richardson (protein structure). And on the list goes...
The first sequence database is surprisingly old. Margaret Dayhoff founded the Protein Identification Resource in the late 1960s. This far-sighted move was the first of the sequence databases. Initially published in printed paper form as the famous blue-covered "Atlas", it later evolved into the PIR sequence database. With her colleagues she detected early examples of conserved protein sequence motifs.
There are bioinformatics textbooks over ten years old whose bibliographies are testimony to the busy activity of bioinformatics research in the 1980s. Sitting on my shelves are well-worn copies of Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit (von Heijne, 1987) and Nucleic acid and protein sequence analysis: a practical approach (ed. Bishop & Rawlings, 1987).
The journal CABIOS (Computer Applications in the Biosciences), which has since become Bioinformatics, has been around since 1985. For those interested in finding early papers, much of the early literature in bioinformatics was published in JMB (Journal of Molecular Biology), NAR (Nucleic Acids Research) and to a lesser extent PNAS USA (Proceedings of the National Academy of Sciences USA), which are still strong publishers in this field. (Or at any rate, these are the journals I taught myself bioinformatics from.)
As an aside, I recently came across a 1986 paper in Science by R Lewin entitled "The DNA databases are swamped"! Those of you who are familiar with the rapid increase in the volume of these databases in the late 1990s will find this claim quite amusing. If only Dr. Lewin could have foreseen the genome projects!
So, it is clear that bioinformatics itself has not been "created" over the last 10 years. So what has been created over the past 10 years? I'd suggest:
Myth 2: Bioinformatics is biology + computing
A linguist might argue that bioinformatics is, strictly speaking, a field which restricts the application of informatics to biology. Informatics in its narrowest sense is about manipulation of data, without necessarily understanding the meaning of the data being manipulated; it's about the computer methods used to manipulate the data, not the data themselves. Under this definition, bioinformatics would be primarily about bringing existing informatics methods, or modified variants of them, to biological applications. These methods in many (most?) cases do not in themselves have much, or indeed any, knowledge of biological principles.
By contrast, early bioinformatics work was almost invariably founded on biological concepts from the outset. A biological issue was raised and then a technique to address that issue was presented. That is, theoretical biology was the foundation on which bioinformatics was built. I fear this is being lost in the bioinformatics driven by mass data and technology hype. It seems to me that unless companies and research groups are careful many will waste time and money "stamp collecting and cataloguing". Certainly the organized data is useful, but only if it is applied with biological principles.
I am not saying that new technology is not useful (it is), but that it is not the whole picture. I am also not saying that all bioinformatics has stopped having a biological focus (far from it!), but newcomers seem to see mainly the new, hyped large-data and technology issues. They appear not to see the less trendy theoretical biology-based work as being relevant to them.
One reason for this "first principles" orientation of early bioinformatics is that many of the early biologists were emigres from "hard" disciplines, in particular physics and chemistry, along with a few mathematicians. These folk were used to fields with underlying layers of principles upon which further work could be built. In addition, molecular biology was still trying to establish the basic understanding of itself, as it were, which encouraged bioinformatics to assist this venture.
Since then we have seen a whole generation of molecular biologists emerge who were, on the whole, comparatively ignorant of the theoretical (bio)chemical and (bio)physical underpinnings of molecular biology (compared to early researchers, who were frequently "true" chemists or physicists in their own right). This has led to modern biology, until fairly recently, being guilty to some extent of not deriving further underlying principles from the data generated by experimental studies. The large amount of data being generated at present would, I'd like to think, bring us back to the need to raise the level of "first principles" understanding of that data.
Some might argue that this early style of work is better labelled "computational biology", a term I favour myself (biology, using computers as the tool, as opposed to general informatics on data which happens to be biological). While perhaps elegant, pigeon-holing like this would only serve to further divorce bioinformatics from what I believe ought to be the underlying layer of all bioinformatics ventures.
Theoretical biology is (or should be) the language of communication amongst the players in bioinformatics teams. And certainly at least the group leaders should have a theoretical biology foundation to ensure that real biological science results at the end of the day.
I imagine bioinformatics as being a bridge, with biology on one side and computing, statistics, etc. (as the toolkits) on the other, upheld by theoretical biology acting as a bridge pile enabling communication between the two sides. Without the pile, the bridge has a rather long single span and is liable to collapse.
By omitting theoretical biology and retaining just Biology + Computing (or statistics or whatever it might be), one is asking for a Superman-like leap to be taken in a single bound: that somehow the "tool component" (the computing, etc.) will magically wave its wand and suddenly solve previously difficult biological problems. I have serious trouble with this idea. The problems are biological problems after all: no amount of clever computing is going to remove the biology unless there are biological principles behind it. More than just databases and high-powered computers are needed.
Put another way, all fields have their underlying disciplines: I worry that with all the focus on technology, bioinformatics is in danger of forgetting that its underlying layer is theoretical biology. It doesn't sound as trendy as bioinformatics, but it is essential. Chemists and physicists rarely ignore their theoretical components; they look to them for answers.
Most biologists, group leaders, managers and CEOs seem to swallow the hyped technology-based bioinformatics with ease. Gloop. Down it goes. I wonder how many see that theoretical biology lies under most (all?) good bioinformatics?
Part of the problem no doubt lies with the over-exercise of the point that there are large amounts of data and that this needs new methods. While this may be true (to an extent - other sciences have far worse data problems), addressing it ought not to come at the expense of discarding the underlying theoretical biology.
As this rumination has already gone on beyond a reasonable length, I'll explore in another article how bioinformatics workers, whether developers, teachers or users, can help themselves by attempting to explain their work in purely biological terms. If you can't do this, you very likely do not know what you are doing and may well be doing something entirely inappropriate!
Copyright © BioinfoTools 2001, 2002. Last revised 12-Dec-02 12:27 AM (v3b).