Hello. This is why.
Finally, after 98345 tweets, the entire Saccharomyces cerevisiae genome is on Twitter.
S. cerevisiae is a yeast commonly used in brewing, wine-making, baking, and biological research in the lab. Now we have virus (HIV), prokaryote (Escherichia coli), and eukaryote (S. cerevisiae) genomes on Twitter.
HIV – 70 tweets
E. coli – 34767 tweets
Yeast – 98345 tweets
People who use Twitter can appreciate how much data can be stored in a tweet. A single tweet has a limit of 140 characters. When I see that tweeting the E. coli genome took 34767 tweets, I understand that it took me about 4 years to tweet that many times. I can visualise and appreciate this better than learning “a genome would fill a few squintillion books”. What books? How big are the books? How thick? How many pages? I can’t visualise it because I’ve never seen that many books at once. Actually, is squintillion a thing? But I can compare these genomes with my own Twitter output and appreciate the size a bit better. It’s not perfect, but hey I was just bored one evening and thought tweeting genomes could be interesting.
I’m now taking a break from tweeting genomes, at least for a while, so my Raspberry Pi is free to be used as a brain by Deckard The Robot. He’ll be coming to life over the next few months.
I’ve been having a discussion where the term “homology” was thrown around very loosely by someone who should probably know better. Considering deep homology and gene co-option are among my favourite topics in biology, I obviously could feel a rant coming on. Rants are often transformed into blogs these days. Those who have at least heard of the word might think it has something to do with things being similar. Those with an interest in biology, especially evolutionary biology, probably realise it’s more complicated than superficial similarity. Sadly, I see the word used in various ways all the time and thought I’d share some thoughts. The word “homology” has been used for over 150 years and has meant different things to different biologists, with similarity of characters being the common theme. Most modern biologists use the word to refer to similarity that is due to common ancestry: two characters are homologous if they are derived from the same ancestral character in their most recent common ancestor. Any characters that exist in related lineages can be assessed for homology, including genes, chromosomes, genomes, cells, limbs, regions of the brain, behaviours, and the developmental programmes that result in these characters. Ancestors are rarely available for examination so homology is usually an evolutionary hypothesis rather than a direct observation.
Even among biologists, the correct definition of homology is still occasionally an issue. Some researchers write of “functional homology” when describing similar functions of traits. Some examples of supposed “functional homology” will be truly homologous in the sense of common ancestry, while others will be non-homologous but convergently similar. In some of the literature, homology still refers to characters that are merely similar, regardless of ancestry. If homology is defined by similarities, there may be a gradient of homology for any given character. Some characters are presumably more similar than others. If they are slightly similar, are they only slightly homologous? If they are very similar, are they very homologous? Where do we draw the line? When do two similar traits become similar enough to warrant being described as homologous? This subjective issue is avoided entirely when common ancestry is used to define homology. We may not know for certain if two characters are homologous, but they either are or aren’t. This approach makes the concept of homology simpler to define and easier for researchers to agree upon, but requires rigorous investigation to determine if homology exists in any given character.
Understanding ancestral relationships in any kind of comparative biology usually involves recognising the differences between homology and homoplasy (as in the figure above). Homology is similarity because of common descent and ancestry. Homoplasy is similarity because of independent convergent evolution. Definitions must be clear. Some related characters are orthologues, arising from lineages splitting and diverging. Others are paralogues, arising from gene duplication. Some genes can also be xenologues if they have arisen from horizontal gene transfer. Understanding homology is essential in comparative biology because of the practical applications of such knowledge. Homology can be used in constructing character matrices for phylogenetic analyses. Also, finding functionally equivalent orthologues of human genes in model organisms has an important role in medical research. A geneticist studying fly orthologues of our genes needs to be sure that he/she has the correct homologue. The same can be said for a medical researcher studying human orthologues in mice that may influence the likelihood of getting cancer or Alzheimer’s disease. It is vital that evolutionary biologists understand what is truly homologous.
One level of homology is that of genes. When genes are replicated, their daughters can undergo independent evolutionary change much like individual organisms can. Phylogenetic analysis is as possible on individual genes as it is on species. Because genes can replicate, either within the same genome (paralogy) or because of a speciation event (orthology), divergent genes can evolve independently but they are homologous due to their common ancestry. Homology doesn’t only occur at the level of genes. Over generations, phenotypes can change considerably. Morphological characters in different species are homologous if they arose from an ancestral state. They may be highly derived and superficially unrecognisable as homologues, they may even have novel functions, but the modern definition of homology is concerned with their relationship with one another rather than superficial or functional similarity. After agreeing on the concept of homology by common ancestry, it’s a relatively simple concept to understand when considering a single level, e.g. a morphological character or a gene. Homology is simply the continuation of characters. The complications arise when the genetic and morphological levels of homology are integrated. Developmental genetics involves understanding the relationship between morphological characters and their genetic basis.
The modern evolutionary synthesis reconciled genetics and the evolution of morphology (and other phenotypic traits such as physiology, behaviour etc.) by natural selection. But before the influence of modern evo-devo, development was relatively poorly understood compared to traditional genetics and was seen as a black box that transforms the genetic information into three-dimensional, morphological structures. In the last two decades, evo-devo has replaced this black box view of development with an appreciation of the mechanisms responsible for generating morphological structures from genetic information. How genes are used in development is as important as what genes are available, and lineage-specific differences can come about due to changes in spatial or temporal expression of genes as well as by the evolution of the genes themselves. Development is complex, often involving many genes influencing the expression of each other, and highlights important information about homology. Developmental mechanisms may be conserved even if complete structures don’t form in some species (rudiments and vestiges) and can differ even for structures that are homologous. This suggests that there is a third level to consider, between genes and morphology (or other characters of the phenotype). Can entire gene regulatory networks be homologous? Does this have implications for the relationship between genes and morphology? How can we identify true homologues if there is a disassociation between the genotype and the phenotype? These are questions I find fascinating.
Disassociation between genotype and phenotype
Wagner argued that homology at the levels of genetics and morphology are similar, as morphological characters are equivalent to genetic loci. Just as there may be different alleles present for a gene in a population, there may be different states for a morphological character. A gene and a morphological character can be duplicated during a speciation event. The gene would be an orthologue. The morphological equivalent would be a bat’s wing and a cat’s anterior legs, which are homologous characters in related species. But duplications can also occur within a species. Gene duplication can create paralogous genes. These genes are certainly homologous and have a common ancestor, but both descendants occur in the same genome. The morphological equivalent would be when morphological characters become repeated, such as teeth or extra limbs.
It is reasonable to expect that the genetics of a morphological character can evolve and thus evolve the morphological character itself. Therefore, if a morphological character has evolved, it must be because the underlying genetics have evolved. When homology is applied to phenotypic characters (e.g. morphological structures, behaviours, modes of communication), those characters existed in the last common ancestor. So both levels can be thought of as equivalents of one another and both are relatively simple to appreciate conceptually. Indeed, it isn’t surprising that similar features persist over evolutionary time and in multiple species (homology), especially if the developmental basis of that feature has also been conserved. It also isn’t surprising that different selection pressures can bring about similar features in organisms that do not share a most recent common ancestor (homoplasy). The more surprising observation is that homologous features can be formed from non-homologous developmental processes, and homologous developmental processes can be found forming non-homologous features. It is the relationship between the two levels that complicates our understanding and makes this such a strange issue.
Thinking at two levels of homology (morphological characters and the genes involved in their development), it appears to be a paradox. It doesn’t make intuitive sense that homologous morphological characters are brought about by the expression of non-homologous genes. It is not difficult to imagine a situation where this paradox causes two biologists to disagree over the supposed homology of a morphological character. If one relied on comparing gene expression between species, and the other relied on bone structure or another morphological feature, the paradox could confuse matters. A careful approach considering multiple lines of evidence is clearly required, but which lines of evidence? Is it as simple as genes vs morphology? The relationship between genotype and phenotype is remarkably complex. Developmental processes can evolve independently yet result in the same phenotypic character. This disassociation between the genotype and phenotype has been referred to as “phenotypic drift” or “developmental system drift”. Such a disassociation through evolution can make the search for homologous characters difficult. It can be easy to mistake morphological characters as being homologous just because homologous genes are involved in their development. Inversely, truly homologous morphological characters may be overlooked if it is realised that their genetic or developmental bases are different. It is also important to remember that genes do not operate in isolation. Researchers must consider networks of genes and the role they play in the development of morphological structures.
Homologous genes and non-homologous phenotypic characters
There are many examples of homologous genes being used in the development of non-homologous phenotypic characters. Most developmental regulatory genes of metazoans are more ancient than their developmental roles are. Homeobox-containing genes predate the origin of metazoans yet are often involved in patterning phenotypic structures that are unique to metazoans. Clearly their roles in development have evolved over time with new roles being gained and old roles being lost in some lineages. The segmentation in Drosophila melanogaster, Schistocerca americana and Aphidius ervi is putatively homologous, yet there are genes essential for segmentation in the fruit fly that play an entirely different role in the locust and wasp. The genes fushi tarazu and even-skipped are pair-rule genes in the fly, which divide gene expression into half-segments of the embryo. In the locust and wasp, these genes are involved in the development of the central nervous system rather than body segmentation.
It is a recurring theme that homologous transcription factors can have different roles in different taxa. Orthologues of distal-less, engrailed, and orthodenticle in echinoderms pattern different morphological features than they do in arthropods and chordates. In arthropods and chordates, distal-less is expressed during limb outgrowth and plays a role in proximodistal patterning, engrailed is involved in neurogenesis in the central nervous system, and orthodenticle has a role in the specification of anterior structures. In most echinoderms, distal-less and orthodenticle are expressed in the podia and engrailed is involved in skeletogenesis. But evolution has altered the expression and roles of these genes even among echinoderms. In the Asteroidea (sea stars), distal-less is expressed in the larval brachiolar arms. In the Echinoidea (sea urchins), engrailed is involved in rudiment invagination. In the Holothuroidea (sea cucumbers), orthodenticle is expressed in the larval ciliated band. These changes in expression and role correlate with novel morphological features such as the brachiolar complex of sea star larvae or the sea urchin’s rudiment ectoderm invagination. Pre-existing genes have been co-opted for new roles in echinoderms.
Regulatory genes rarely have one role in a developing organism. The Notch signalling pathway is highly conserved and found in all metazoans. In Drosophila melanogaster, it is used in the development of wings, ommatidia, and bristles. These morphological structures are clearly not homologous, yet their development has common genetic features. Throughout the Metazoa, the Notch pathway can be found in the development of characters as diverse as feathers and T-lymphocytes. True conservation also occurs, such as the Hox genes and their role in patterning the anteroposterior axis in animals as different as fruit flies and humans. But these genes often have multiple roles. Although one role can be highly conserved, often there are divergent unique roles for these genes in different lineages.
Non-homologous genes and homologous phenotypic characters
Instead of homologous genes having roles in producing non-homologous morphologies, some homologous morphological characters are produced by non-homologous genes. Sex-lethal (Sxl) is a master regulatory gene that controls sex determination in Drosophila melanogaster. In other dipterans such as Ceratitis capitata and Musca domestica, Sex-lethal exists but isn’t used in sex determination and is expressed during a different stage of development. Phylogenetic analysis suggests that the role in sex determination is the derived condition. Where even-skipped was co-opted to be used in the development of a novel morphological feature, Sxl has become involved in a developmental process that already existed. Sex determination in the Drosophila lineage existed before Sxl was recruited to it.
In most tetrapods, programmed cell death separates digit primordia during embryonic development. This creates interdigital space, allowing the primordia to develop into individual digits. In urodele amphibians, differential growth of the digits separates them, without apoptosis creating interdigital space. As a morphological feature, the digits of urodeles and other tetrapods are homologous. But the developmental processes and the genetics controlling those processes are not homologous. This phenomenon of homologous phenotypes being generated by non-homologous developmental processes is not restricted to adult morphology. In vertebrate embryos, the gastrula stage is considered to be homologous. However, it is found that very different developmental processes produce the gastrula in different vertebrate taxa.
Levels of homology
By revealing that development itself evolves, evo-devo implies that homology should be understood in a hierarchical fashion as there are several levels of homology. Homology at one level might not correspond to homology at other levels. As already discussed, two species may have homologous limbs, but the developmental processes that produce the limb, or the genetic cascades underlying those processes, may be different. For example, formation of the neural crest can occur by delamination or by cavitation, and gastrulation can occur via a blastodisc or a blastopore.
Some researchers have interpreted similar patterns of regulatory gene expression alone as evidence that morphological structures are homologous. This ignores the idea that homology may exist at several levels and it limits the evidence to a single source. Assuming that similar gene expression identifies homologous structures ignores the evolutionary histories of the structures and the regulatory genes. What exactly is homologous in a given example? The genes? Their expression patterns? Their developmental roles? The morphological structures that arise because of them? Because some of these levels can be homologous while others aren’t, mistakes can be made when expression data alone is used to assign homology to structures. At least three levels of homology and homoplasy must be considered: genes, developmental processes, and the resulting phenotypic character.
How can a morphological character (like segmentation) be homologous if different genes are involved? The answer lies in understanding developmental genetics and gene regulatory networks. Developmental processes can create different features in different organisms because they can be co-opted for new roles and old pathways can resurface or remain unexpressed, perhaps to be co-opted in the future. Wagner proposed that the homology of morphological characters is related to the continuity of gene regulatory networks (GRNs) rather than the expression of individual homologous genes. He refers to these networks as “character identity networks” (ChINs) and argues that they are what enables the execution of character-specific developmental programmes. In insect segmentation, more variation is seen in the homologous genes that are further upstream than downstream. Gap genes and pair-rule genes are higher in the segmentation hierarchy yet show more variation than lower genes such as the segment-polarity genes. Only the Diptera possess the gap gene bicoid and not even all members of the Diptera. Other segmented insects use different genes at this level of the segmentation hierarchy. But downstream GRNs are more conserved between taxa. Most if not all insects use engrailed and wingless as segment-polarity genes.
Generalising the insect segmentation data, Wagner argued that it is the most downstream regulatory networks, the ChINs, controlling the development of morphological characters that specify the identity of the character. If homologous morphological structures are controlled by homologous ChINs, this would explain the paradoxical relationship between morphology and genes. The use of different genes in developmental programmes for homologous morphological characters could be explained by homologous ChINs co-opting different individual genes (or pathways) independently. A kernel is a highly conserved GRN. The term ChIN is instead concerned with GRNs that execute a character-specific developmental programme. Some kernels will be ChINs, but not all, as the two terms were created for different reasons. One is concerned with conservation and age, the other with the relationship between the GRN and its ability to program character identity. Homologous ChINs can be very conserved, but can also co-opt different transcription factors in their regulation.
The complex evolutionary relationship between genotype and phenotype provides two important messages. Firstly, as useful as gene expression data has been, it isn’t sufficient for diagnosing homologous morphological structures. Notch signalling doesn’t suggest that our T-cells and Drosophila eyes are homologous. Regulatory genes have multiple expression domains and play multiple roles in development. Also, it has been assumed that novel structures require novel genes or at least alleles. But how could new alleles or genes become established in a population before they produce an advantageous phenotype? Developmental genes and their ability to have multiple roles suggests an answer to this question. Genes can already exist in a population as new roles evolve and provide fitness advantages for individuals, and potentially the population, given time. Because developmental genes gain and lose roles, some morphological novelties presumably arise by co-opting pre-existing developmental genes for new roles. The echinoderm morphological novelties mentioned earlier provide a good example. At the same time, it’s important not to consider the disassociation between genotype and phenotype as a hindrance to investigation or as noise that stops us from identifying truly homologous characters. There is a lot to learn from studying homology. This phenomenon provides an opportunity to understand how morphological novelties come about and the role co-option plays.
Beyond any confusion caused by multiple levels of homology, there are other common issues in the literature that quite frankly get on my nerves. The nomenclature of genes often makes it difficult. Dlx-2 in Xenopus is not orthologous with Dlx-2 in zebrafish. This example refers to paralogous genes that duplicated before the divergence that led to Xenopus and zebrafish. Even more confusing is when paralogous genes evolve by duplication in independent lineages. It can be extremely difficult to tell which of the duplicates corresponds to the ancestral gene. The homologous gene may have been lost, leaving only the paralogues. Clearly, relying on just one line of evidence isn’t always sufficient for identifying homology. Another major problem is the notion of “functional homology”, which confuses similarity due to common ancestry with similarity due to functional convergence. The functions of homologous genes can diverge from their original functions, or converge on the functions of unrelated genes. Both of these possibilities could confuse a researcher relying only on gene expression patterns as evidence of homology. Clearly homologous structures and genes can have different functions, so similarity of function is not a valid criterion for identifying homology, yet “functional homology” is still occasionally used in the literature. The solution to these two problems is to constantly consider phylogenetics and evolutionary histories when comparing gene expression data. By reconstructing the gene family in all the species being compared, the timing of gene duplications can be calculated relative to the divergences of the species. This approach should improve the likelihood of identifying true orthologues so that only their gene expression patterns are compared.
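The idea of timing duplications relative to species divergences can be made concrete with a toy gene tree. The sketch below is only an illustration: the gene names, the tree, and the event labels are all made up, and it assumes the gene tree has already been reconstructed and each internal node labelled as a speciation or a duplication. Two genes are then called orthologues if their most recent common ancestor is a speciation node, and paralogues if it is a duplication node.

```python
# Hypothetical gene tree: a duplication ("dup") happened BEFORE the frog/fish split,
# so each species carries an A copy and a B copy. All names here are invented.
PARENT = {
    "frog_A": "spec_A", "fish_A": "spec_A",
    "frog_B": "spec_B", "fish_B": "spec_B",
    "spec_A": "dup", "spec_B": "dup",
}
EVENT = {"spec_A": "speciation", "spec_B": "speciation", "dup": "duplication"}


def ancestors(node):
    """List the internal nodes above `node`, nearest first."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def relationship(gene_1, gene_2):
    """Classify two different genes by the event at their most recent common ancestor."""
    seen = set(ancestors(gene_1))
    for node in ancestors(gene_2):
        if node in seen:
            return "orthologues" if EVENT[node] == "speciation" else "paralogues"
    raise ValueError("genes do not share an ancestor in this tree")


print(relationship("frog_A", "fish_A"))
print(relationship("frog_A", "fish_B"))
```

In this toy tree, frog_A and fish_B are paralogues even though they sit in different species, which is the same kind of trap as two genes sharing a name across species without being orthologous.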
A third problem that is more difficult to solve (and happens to be one of my favourite biological topics) is the phenomenon of co-option. As discussed, this can lead to the recruitment of orthologous genes to be expressed in non-homologous structures during development. Arthropods, echinoderms, and chordates express distal-less in the distal region of their appendages during their outgrowth, but the structures themselves aren’t homologous. It has become important to distinguish the difference between homology of genes, developmental mechanisms, and morphological structures or other phenotypic characters. To use homology in comparative biology, researchers should observe that homology can exist at different levels and that true homology concerns the evolutionary histories of characters, rather than any general or functional similarity. This approach to homology should be used consistently in studies, whether studying gene expression, developmental mechanisms, or morphological structures. At least that’s what I think.
I like New Scientist headlines. I think it’s hard to take some science topics and make them catch the eye and make sense for all potential readers. It’s a tough job to do well. New Scientist have kept me in stitches with hilarious headlines over the years. Some I really dislike, such as the infamous “Darwin Was Wrong” headline. Others appear to be written by someone on acid. I’ve read about how flies unlock our understanding of slowing down or speeding up time itself, and the magazine has asked me to consider questions such as “does now exist”?
I mean, here’s an example:
It definitely works, because I stopped in my tracks while shopping and walked straight to the magazine when the headline caught my eye.
So, I’ve been getting better at coding thanks to various projects such as Deckard the Robot and my GenomeTweet accounts (HIV and E. coli are complete; yeast, fly and nematode are still running). I got bored last night so I decided to create an automated New Scientist headline generator on Twitter. This isn’t an attack on New Scientist. I certainly do have a problem with some of their sensationalism, but I love the wacky headlines that always make me smile. I don’t agree with many of their choices but I can’t fault their ability to choose eye-catching titles. Although the headline generator clearly has some creative input from me to make sure things run smoothly, the actual results for each tweet are a surprise and there are hundreds of thousands of combinations so I’m really enjoying seeing it run! Here are the first four tweets it created:
3D printing. What Einstein didn't know.—
Not New Scientist (@NS_headlines) January 20, 2014
Stranger than fiction. Cancerous cells are caused by cancerous cells.—
Not New Scientist (@NS_headlines) January 21, 2014
Subterranean mole rats and Darwin: The connection that could change the way we think about wormholes.—
Not New Scientist (@NS_headlines) January 21, 2014
Tomorrow's technology today: Wearable 3D printers.—
Not New Scientist (@NS_headlines) January 21, 2014
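The generator’s code isn’t shown in this post. As a rough sketch of the idea, assuming it works by slotting random topics into sentence templates, something like the following would do. The word lists and templates here are invented for illustration (loosely based on the tweets above) and are far shorter than anything that would give hundreds of thousands of combinations.

```python
import random

# Hypothetical word lists; the real generator's lists are not published here.
TOPICS = ["3D printing", "Wormholes", "Subterranean mole rats", "Cancerous cells"]
TEMPLATES = [
    "{topic}. What Einstein didn't know.",
    "Stranger than fiction: {topic}.",
    "Tomorrow's technology today: {topic}.",
]


def make_headline(rng=random):
    """Pick a random topic and template and combine them into one headline."""
    topic = rng.choice(TOPICS)
    template = rng.choice(TEMPLATES)
    # Trim to Twitter's 140-character limit, just in case.
    return template.format(topic=topic)[:140]


print(make_headline())
print(len(TOPICS) * len(TEMPLATES), "possible combinations in this toy version")
```

The number of possible headlines is just the product of the list lengths, so adding a few more slots per template makes the count grow very quickly.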
It’s only been running a few hours but already has 180+ followers at the time of writing (mostly scientists and science journalists). Clearly I’m not alone in enjoying New Scientist’s wacky headlines. Some thoughts on the new account:
Of course, it hasn’t escaped the notice of the lovely people at New Scientist. Fortunately they’ve taken it the right way.
In my quest to learn programming languages, I decided to build a robot. His name is Deckard. He’s mostly made of Lego with a Raspberry Pi for a brain. He’s going to be learning lots of skills and he’ll be quite social as he tweets about all his actions as well. He’ll understand spoken commands, he’ll be able to explore, function as an embarrassing alarm clock (he can take photos and has access to Twitter), reward work with biscuits, act as a security camera, tweet genomes, and be driven remotely from anywhere in the world. But it’s baby steps for now. He’s still a mess, very bulky and covered in cables. At the moment, I’m making sure that he’ll be able to avoid obstacles when exploring. Here’s a little video of him in action:
The next step is perfecting the voice recognition. You might have spotted a button on his left side near the back. When that button is pressed, he listens for 3 seconds for a command. I’ll upload a video of the voice commands when I’m happy with it. Once he’s a bit more complete you’ll find him tweeting at @DeckardRobot.
Ok, first a quick update on the GenomeTweet project. I’ve added a new species! You can see the Caenorhabditis elegans nematode genome at @GenomeNematode. The only genome that has finished tweeting is the HIV genome. Here are the rest that are currently running:
The next to finish is E. coli, which should be complete in a couple of weeks. Yeast will take another month after that. The fruit fly is going to take approximately 2.5 years. The nematode will take slightly less than the fruit fly, finishing just over 2 years from now. The human genome has been split into separate accounts for each chromosome but will still take approximately 5 years to finish.
The accounts have been offline every once in a while over the last few days as I made further improvements behind the scenes. I know some people checking these blog posts are interested in how the project works so I thought I’d give an update on the improvements. To see how it basically works, read this previous blog post. I’ve been changing the scripts almost continuously since I started this project. Once I’m happy that I’ve done all I can, I’ll share the code so others can use it. But I’ve still got a few changes I’d like to make.
The script described in the previous post was very simple. It would read a file that contained the genome already prepared into 140-character lines, then tweet each line. Simple as that. Since then the scripts have grown massive, then shrunk back down as I completely rewrote them. Although the tweeting is automatic, I’ve been learning more Python as I go and much of this project still relied on manual interaction. Sometimes things went wrong at Twitter’s end and caused my scripts to fail. For example, when Twitter was over capacity. When this happened, I’d have to look at the Twitter account for that genome, copy the most recent tweet to the clipboard, open the genome file, find the line that was most recently tweeted, delete that line and all lines above it, then restart the tweeting script. That’s a lot of work for a supposedly automated project. It wasn’t so bad if one script failed, but potentially 28 could fail. I hadn’t anticipated that Twitter would cause so many problems. I needed to rewrite the scripts so that they could handle these problems themselves. Over the last week I’ve made multiple changes.
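For anyone wondering what “prepared into 140-character lines” might look like, here is a minimal sketch. It is not the script I actually used; the function name and the toy sequence are made up for illustration, and it assumes the genome is already available as a plain string of nucleotides.

```python
def split_into_tweets(sequence, limit=140):
    """Split a nucleotide string into consecutive chunks of at most `limit` characters."""
    # Remove whitespace and newlines so only sequence characters are counted.
    cleaned = "".join(sequence.split())
    return [cleaned[i:i + limit] for i in range(0, len(cleaned), limit)]


toy = "ACGT" * 100  # 400-character toy sequence, not a real genome
lines = split_into_tweets(toy)
print(len(lines), [len(line) for line in lines])
```

Writing each chunk to its own line of a file gives a genome file that a tweeting script can read one line at a time.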
It currently works like this: Each genome (or human chromosome) has a genome file and a script that tweets it. The script communicates with Twitter, reads in the file, and tweets each line until the genome is complete. That’s the ideal world. If something does go wrong at Twitter’s end, then the script stops trying to tweet the genome, it waits a few seconds (to give Twitter some time to start behaving), accesses Twitter and makes a note of the most recent successful tweet to be tweeted. The script then looks for this tweeted line in the genome file and deletes it and everything before it, so that the genome file is essentially reset and ready to start being tweeted. The script then runs again from the beginning, tweeting the genome file either until it finishes, or until there’s another error. This means the script will keep running even when there is a problem but it will only alter the genome file if it has already started trying to tweet. Occasionally I want to prepare the genome files and make them ready to tweet then try tweeting. For that I have a genomeRestart script, which works with all the genomes. It asks which genomes you want to restart, it then checks Twitter for the most recent tweet, updates the genome files, then runs the genome’s tweeting script, which should run fine and if there’s a further problem then it deals with it within that script. The purpose of the genomeRestart script is for the occasions when I deliberately stop a script from tweeting, perhaps because I’m adding improvements to the code, or the scripts stop because of a problem at my end rather than Twitter’s (loss of connection or a power cut).
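The recovery logic described above can be sketched roughly as follows. This is a simplified illustration rather than the real scripts: it works on a list in memory instead of rewriting the genome file, and `post` and `fetch_last` are hypothetical stand-ins for whatever calls send a tweet and look up the account’s most recent tweet. No real Twitter library is used here.

```python
import time


def remaining_lines(genome_lines, last_tweet):
    """Return the lines that come after `last_tweet` in `genome_lines`."""
    if last_tweet is None:
        return list(genome_lines)
    stripped = [line.strip() for line in genome_lines]
    try:
        index = stripped.index(last_tweet.strip())
    except ValueError:
        # Last tweet not found: keep everything rather than guess.
        return list(genome_lines)
    return list(genome_lines[index + 1:])


def tweet_all(genome_lines, post, fetch_last, wait_seconds=5, max_failures=10):
    """Tweet every line, resynchronising with the account after each failure."""
    pending = list(genome_lines)
    failures = 0
    while pending:
        try:
            post(pending[0])
        except Exception:  # broad on purpose in this sketch
            failures += 1
            if failures > max_failures:
                raise
            time.sleep(wait_seconds)  # give the service time to recover
            pending = remaining_lines(pending, fetch_last())
        else:
            pending.pop(0)


# Tiny simulation with a fake service that fails once.
sent = []
fail_once = {"armed": True}


def fake_post(line):
    if line == "CCC" and fail_once["armed"]:
        fail_once["armed"] = False
        raise RuntimeError("over capacity")
    sent.append(line)


tweet_all(["AAA", "CCC", "GGG"], fake_post, lambda: sent[-1] if sent else None, wait_seconds=0)
print(sent)
```

One caveat with matching on tweet text: a repetitive genome could contain the same 140-character line more than once, and this sketch always matches the first copy, so a real script might want to track a line number as well.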
Now I’m free to sit back and let the scripts do their thing. The only manual interaction at the moment is if there’s a power cut or a loss of internet connection (not had either yet). If it happens, all I need to do is run the genomeRestart script, select the genomes I want to restart (all of them), and it will prepare their genome files and then run the scripts. Easy as that. The scripts get better and better each week as I learn more Python, but I still have further improvements in mind. It’s addictive. The biggest improvement would be to host the whole project on a server so it isn’t running from one of my computers.
Update: Last night, Fred Sanger died. He was a brilliant scientist who won the Nobel Prize twice. He pioneered research into DNA sequencing that eventually led to us sequencing the entire human genome. It was a coincidence that I began tweeting the human genome on the night he died, but I will dedicate this project to the man and his work. I strongly recommend reading up on his life and scientific contributions if you haven’t heard of him.
The first ever genome on Twitter was HIV. It was followed by the E. coli, yeast (Saccharomyces cerevisiae), and fruit fly (Drosophila melanogaster) genomes. The GenomeTweet project has received quite a bit of interest and I’ve had many questions and requests sent my way. Some people have asked why I’m doing it. Others have asked how it works. The most common question? How long would it take to tweet the human genome? The most common request? Please tweet the human genome.
The human genome consists of approximately 3,200,000,000 nucleotide base pairs (slightly fewer are actually sequenced, but the point is it’s a big number). How long would that take to tweet? At my current rate of just under a thousand tweets per day, and assuming no major delays for technical reasons, it would take approximately 65 years. You can compare this with the completed and current GenomeTweet projects in the table below.
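The arithmetic behind that estimate is easy to reproduce. Here is a small Python sketch; the rate of 965 tweets per day is an assumed stand-in for "just under a thousand", and the function names are made up for this example.

```python
TWEET_LENGTH = 140      # characters per tweet
TWEETS_PER_DAY = 965    # assumption: "just under a thousand" tweets per day


def tweets_needed(base_pairs, tweet_length=TWEET_LENGTH):
    """Number of tweets needed at one base per character (rounded up)."""
    return -(-base_pairs // tweet_length)  # integer ceiling division


def years_to_tweet(base_pairs, tweets_per_day=TWEETS_PER_DAY):
    """Approximate years needed to tweet a genome of the given size."""
    days = tweets_needed(base_pairs) / tweets_per_day
    return days / 365.25


if __name__ == "__main__":
    human = 3_200_000_000
    print(tweets_needed(human), "tweets")
    print(round(years_to_tweet(human)), "years")
```

With these assumptions the human genome needs nearly 23 million tweets, which works out at roughly 65 years.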
Please do not fall into the trap of thinking humans are special because they have a huge genome compared to the others. If I were feeling particularly ambitious and optimistic, I could tweet a fish genome such as Protopterus aethiopicus (the marbled lungfish). It would take 2652 years to tweet its 130,000,000,000 bp genome. Note that the genomes of animals aren’t necessarily large compared to non-animals. Paris japonica (キヌガサソウ, “the canopy plant”) has a 150,000,000,000 bp genome that would take over 3060 years to tweet. There’s huge variability in genome size even among animals or plants, and at the other end of the scale genomes can be tiny. Nasuia deltocephalinicola, a bacterium that lives inside leafhopper insects, has one of the smallest known genomes of any cellular organism at only 112,000 bp. It would only take 20 hours to tweet. Most other bacteria and even some viruses have bigger genomes than this! Extremely small genomes are frequently found in highly derived organisms occupying very peculiar ecological niches. Most of the smallest animal genomes belong to parasites. The goal of GenomeTweet is not to provide all of these genome sizes. My aim is to allow Twitter users to relate genome size to number of tweets and really see the differences, as explained in previous entries.
It will take 65 years to tweet the human genome. Doable? Definitely. Practical? Probably not. I hope Twitter is still around in 65 years’ time! Rather than admit defeat and say it’s not possible, I’d like to offer a compromise. Tonight at 10pm I will begin tweeting the human genome and it will take just over 5 years. Instead of one account tweeting the entire human genome, there will be 24 accounts: one each for chromosomes 1-22 and the two sex chromosomes (X and Y). Because they all start at the same time, the genome will be complete once the largest chromosome has finished. Chromosome 1 is the largest at approximately 249,000,000 bp. This chromosome will take twice as long to tweet as the entire fruit fly genome. Some may think 5 years isn’t practical, but it’s better than 65!
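To show why splitting by chromosome helps, here is a rough Python sketch: when every account tweets at the same time, the finishing time is set by the largest chromosome alone. Only the chromosome 1 figure comes from the text above; the other sizes are approximate, rounded values included purely for illustration, and the rate of 965 tweets per day is again an assumption.

```python
TWEET_LENGTH = 140
TWEETS_PER_DAY = 965  # assumption: "just under a thousand" tweets per day

# Approximate sizes in base pairs; only chromosome 1 is taken from the post,
# the rest are rounded illustrative values and the list is incomplete.
CHROMOSOME_SIZES = {
    "1": 249_000_000,
    "2": 243_000_000,
    "X": 155_000_000,
    "Y": 59_000_000,
    "21": 48_000_000,
}


def years_for(base_pairs):
    """Years needed for one account to tweet `base_pairs` bases."""
    tweets = -(-base_pairs // TWEET_LENGTH)  # integer ceiling division
    return tweets / TWEETS_PER_DAY / 365.25


def parallel_finish(sizes):
    """Return (name, years) for the chromosome that finishes last."""
    slowest = max(sizes, key=sizes.get)
    return slowest, years_for(sizes[slowest])


if __name__ == "__main__":
    name, years = parallel_finish(CHROMOSOME_SIZES)
    print(f"Chromosome {name} finishes last, after about {years:.1f} years")
```

Under these assumptions chromosome 1 is the bottleneck at just over 5 years, while the smaller chromosomes finish much earlier.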
You can find the different accounts here:
@HumanGenome1: HumanGenome – Chromosome 1
@HumanGenome2: HumanGenome – Chromosome 2
@HumanGenome3: HumanGenome – Chromosome 3
@HumanGenome4: HumanGenome – Chromosome 4
@HumanGenome5: HumanGenome – Chromosome 5
@HumanGenome6: HumanGenome – Chromosome 6
@HumanGenome7: HumanGenome – Chromosome 7
@HumanGenome8: HumanGenome – Chromosome 8
@HumanGenome9: HumanGenome – Chromosome 9
@HumanGenome10: HumanGenome – Chromosome 10
@HumanGenome11: HumanGenome – Chromosome 11
@HumanGenome12: HumanGenome – Chromosome 12
@HumanGenome13: HumanGenome – Chromosome 13
@HumanGenome14: HumanGenome – Chromosome 14
@HumanGenome15: HumanGenome – Chromosome 15
@HumanGenome16: HumanGenome – Chromosome 16
@HumanGenome17: HumanGenome – Chromosome 17
@HumanGenome18: HumanGenome – Chromosome 18
@HumanGenome19: HumanGenome – Chromosome 19
@HumanGenome20: HumanGenome – Chromosome 20
@HumanGenome21: HumanGenome – Chromosome 21
@HumanGenome22: HumanGenome – Chromosome 22
@HumanGenomeX: HumanGenome – Chromosome X
@HumanGenomeY: HumanGenome – Chromosome Y
As with the other GenomeTweet accounts, none of these are worth following in my opinion. They might be interesting, maybe even useful, but if you enjoy using Twitter then you probably shouldn’t follow any of these accounts. I say this, but the E. coli account has 74 followers at the time of writing. If you’re interested in checking them out then you can either use the links above or view all of them at the same time in this Twitter list.
I look forward to linking back to this entry in 5 years’ time!
I also want to take a moment to share some Twitter reactions to the GenomeTweet project so far.
(It’s worth it!)
Interesting sequence of events. @GenomeTweet tweets whole genome sequences. Here’s why: endlessforms.net/2013/10/29/gen… #genomics by @Harrison_Peter—
Malcolm M. Campbell (@m_m_campbell) October 30, 2013