Correlation versus Causation: the Eternal Struggle

By John McLaughlin


Scientists are always warning the public – and each other – not to confuse correlation with causation. Whenever a study is published linking our favorite food to cancer, heart attacks, or other health problems, we are cautioned to take these findings with a grain of salt because identifying causes in a complex sea of correlations is a daunting task.


Despite the challenge, a major task of researchers is to uncover causes – whether it’s the simple mechanism of a protein’s activity within the cell, or a population-level analysis of interactions among genes that increase the risk of disease.


This raises the question: what exactly does it mean for X to cause Y? The concept of causality has existed for a long time, predating the scientific revolution by many centuries. Aristotle explained causation by dividing it into four separate aspects. Take the simple example of a wooden table: its material cause is the wood of which it is composed, its efficient cause is the carpenter who crafted it, its formal cause is the particular shape which makes it a table rather than something else, and its final cause is the purpose for which it was created, maybe to hold a lamp.


Scientists today don’t operate with such a multifaceted theory of causation. Although the meaning of ‘cause’ is usually taken for granted in everyday life, when pressed for a precise definition a biologist would likely explain cause and effect in terms of probabilities. According to probabilistic theories of causation, a cause both precedes its effect and increases its probability, all other things being equal. For instance, we know that smoking causes heart disease; this does not imply that everyone who smokes will suffer heart problems, but it does mean that smokers have a higher probability than non-smokers of developing heart disease, all other factors being held equal.


In order to scientifically study causal relationships, a critical requirement is the ability to intervene in a system and manipulate separate variables. Luckily, researchers can often alter experimental variables and examine counterfactual scenarios, which take the form ‘if X causes Y, then if X does not occur, Y will not occur’. Model organism biologists pride themselves on this skill. Working in the lab, if I’d like to determine whether a particular mutation is the cause of an interesting phenotype, I can compare flies that are genetically identical in all respects except for the mutation in question. By eliminating the confounding variables in this way, a direct causal link can be established.


What, then, is the relationship between causation and correlation? Two correlated variables or events share a mutual connection that can be observed as a positive or negative relationship. At first glance, a correlation between two variables may suggest to us a causal relationship, but this conclusion does not necessarily follow. Fires and fire trucks are often correlated, but obviously it is not the fire trucks that cause fires. To demonstrate this point, just take a look at the ridiculous spurious correlations that can occur between events that are not causally linked.


To make the issue more confusing, even if we do know with certainty that X causes Y, this does not imply that the two variables will be correlated. Imagine a mixed community of smokers and non-smokers: cigarette smoking is a known cause of heart disease, but in this hypothetical population all of the smokers exercise while the non-smokers do not. If the heart-healthy benefits of the smokers’ exercise perfectly counteract their increased risk of heart disease, then there will be no correlation between smoking and heart disease at the population level.
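
A toy calculation can make this masking effect concrete. The sketch below is purely illustrative: the function name `risk_percent` and every number in it (a 10% baseline risk, +10 points for smoking, -10 points for exercise) are assumptions chosen so that the two effects cancel exactly, not real epidemiological estimates.

```python
# Toy model of a real cause that is masked at the population level.
# All numbers are hypothetical, in percentage points of heart-disease risk.

def risk_percent(smokes: bool, exercises: bool,
                 baseline: int = 10,
                 smoking_effect: int = 10,
                 exercise_effect: int = -10) -> int:
    """Return the heart-disease risk (in %) for one kind of person."""
    return baseline + smoking_effect * smokes + exercise_effect * exercises

# Observed population: every smoker exercises, no non-smoker does.
observed_smokers = risk_percent(smokes=True, exercises=True)
observed_nonsmokers = risk_percent(smokes=False, exercises=False)
observed_gap = observed_smokers - observed_nonsmokers

# Controlled comparison: hold exercise fixed and vary only smoking.
controlled_gap = (risk_percent(smokes=True, exercises=False)
                  - risk_percent(smokes=False, exercises=False))

print("risk gap seen in the population:", observed_gap)
print("risk gap with exercise held fixed:", controlled_gap)
```

In this made-up population the observed gap vanishes even though smoking raises risk for every individual; the effect only reappears once exercise is held constant, which is the kind of controlled comparison described earlier.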


In a game of billiards, the precise ordering of cause and effect is obvious to the observer. In the real world, discovering causal relationships is often a slow and arduous process, but it’s what scientists signed up to do.

Fighting Zika Virus with Mosquito Genetics


By John McLaughlin


The Zika virus burst into the news last year when a dramatic increase in microcephaly cases was reported throughout several states in Brazil. This frightening birth defect quickly became associated with the mosquito-borne virus, carried by Aedes mosquitoes; Aedes aegypti, which also carries Dengue, is the main vector in the current Zika outbreak. While Zika virus usually affects adults with fairly mild symptoms such as fever, rash, and joint pain, it can have severe or fatal consequences for the fetuses carried by infected women. In fact, the World Health Organization (WHO) has recently reported a scientific consensus on the theory that Zika is the cause of the large number of Brazilian microcephaly cases.


In January of 2016, a Hawaiian baby born with microcephaly became the first case of Zika reported in the United States. And the U.S. National Institute of Allergy and Infectious Diseases has recently stated that a wider outbreak of the virus within the United States will likely occur soon. Naturally, mosquito containment has become a top priority for health officials in both infected areas and those likely to be impacted by the virus. The standard list of mosquito control protocols includes pesticide repellents, mosquito nets, eliminating stagnant open water sources, and long-sleeved clothing to limit skin exposure. In addition to these, health authorities are considering a number of new strategies based on genetic engineering technologies.


One such technique employs the concept of gene drive, the fact that some “selfish” gene alleles can segregate into gametes at frequencies higher than the expected Mendelian ratios. In this scenario, gene drive can be exploited to spread a disease resistance gene quickly throughout a population of mosquitoes. Recently, a team at the University of California tested this idea by using CRISPR technology to engineer the mosquito Anopheles stephensi with a malarial resistance gene drive. After integration of the resistance gene cassette and DNA targeting with CRISPR, this gene was successfully copied onto the homologous chromosome with high efficiency, thus ensuring that close to 100% of its offspring will bear resistance. Possibly, similar techniques could be exploited to engineer Zika resistance in Aedes mosquitoes.
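
To see why biased inheritance matters so much, it can help to compare allele frequencies over a few generations. The following is a minimal sketch, not the model used in the study: it assumes random mating, a very large population, no fitness cost, and non-overlapping generations, and the starting frequency of 1% and the 99% transmission rate are hypothetical values chosen for illustration.

```python
# Deterministic sketch of allele spread with and without gene drive.
# Assumptions (not from the study): random mating, a very large population,
# no fitness cost, and non-overlapping generations.

def next_frequency(p: float, transmission: float) -> float:
    """Allele frequency in the next generation.

    p            -- current frequency of the engineered allele
    transmission -- chance that a heterozygote passes the allele on
                    (0.5 is Mendelian; values near 1.0 model gene drive)
    """
    homozygotes = p * p                # carriers of two copies always transmit
    heterozygotes = 2 * p * (1 - p)    # carriers of one copy transmit with
                                       # probability `transmission`
    return homozygotes + heterozygotes * transmission


def simulate(p0: float, transmission: float, generations: int) -> list[float]:
    """Return the allele frequency at generations 0, 1, ..., `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], transmission))
    return freqs


mendelian = simulate(0.01, transmission=0.5, generations=10)
drive = simulate(0.01, transmission=0.99, generations=10)

for generation, (m, d) in enumerate(zip(mendelian, drive)):
    print(f"gen {generation:2d}  Mendelian {m:.4f}  drive {d:.4f}")
```

Under these simplified assumptions the Mendelian allele stays at its starting frequency, while the driven allele roughly doubles each generation until it is nearly fixed.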


In contrast to engineering disease resistance, an alternative defense strategy is to simply reduce the population of a specific mosquito species: in the case of a Zika outbreak, Aedes aegypti. The WHO has recently approved a GM mosquito which, after breeding, produces offspring that die before reaching adulthood. This technique can dramatically reduce an insect population when applied in strategic locations. The British biotech firm Oxitec has also developed its own strain of sterile Aedes aegypti males. In laboratory testing, these GM mosquitoes compete effectively with wild males for female breeding partners. The short-term goal is receiving approval to test these sterile males in the wild; ultimately, a targeted release of these mosquitoes will reduce the Aedes aegypti population in Zika hot spots without affecting other species.


In parallel to mosquito engineering, other work has focused on studying the mechanisms underlying Zika’s dramatic effects on the brain. To study the process of Zika infection in vitro, scientists at Johns Hopkins cultured 3-D printed brain organoids and demonstrated that the virus preferentially infects neural stem cells, resulting in reduced cortical thickness owing to the loss of differentiated neurons. This neural cell death may explain the frequent microcephaly observed in fetuses carried by infected mothers.


Much like the recent outbreak of Ebola in several African countries, this event helps underscore the importance of basic research. A recent New York Times article drew attention to this fact by highlighting the need for more complete genome sequences of the mosquito species that carry Zika. With a complete genome sequence at hand, researchers might be able to piece together information in answering questions such as: why are some Aedes mosquitoes vectors for Zika while others aren’t? Species differences in genome sequence may provide some answers. In any case, greater knowledge of the mosquito’s biology will yield more options for human intervention. This is an excellent case study in how ‘basic’ and ‘translational’ research projects can co-evolve in special situations.



Development On the Fly: An Interview with Dr. Thomas Gregor

By John McLaughlin


Thomas Gregor is a biophysicist and Professor at Princeton University. His Laboratory for the Physics of Life uses both Drosophila melanogaster and Dictyostelium discoideum as model systems to understand developmental processes from a physical perspective.


Could you briefly describe your educational path from undergraduate to faculty member at Princeton?

TG: As an undergraduate, I studied physics in Geneva, and then moved into theoretical physics and math. I came to Princeton, initially for a theoretical physics PhD; I switched during my time here to theoretical biophysics and then realized that it makes sense to combine this with experiments. I ended up doing a PhD between three complementary disciplines. My main advisor was Bill Bialek, a theoretical physicist. My other two were David Tank, an experimental neuroscientist, and Eric Wieschaus, a fly geneticist. So I had both experiment and theory, from a biological and a physical side. I then went to Tokyo for a brief post-doc, during which I continued in that interface. But I changed model organisms: I switched from a multicellular, embryonic system to looking at populations of single cells [the social amoeba Dictyostelium discoideum]. As a physicist you’re not married to model organisms. When I came back to start my lab at Princeton in 2009, I kept both the fly and the amoeba systems.


What is the overall goal of your lab’s research program?

TG: Basically, to find physical principles behind biological phenomena. How can we come up with a larger, principled understanding that goes beyond the molecular details of any one particular system? I mostly look at genetic networks and try to understand their global properties.


Do you think the approaches of biologists and physicists are very different, and if so are they complementary?

TG: I’m driven by the physical aspects of things, but I’m also realistic enough to see what can now be done in biological systems, in terms of data collection and what we can test. To find the overlap between them is kind of an art, and I think that’s where I’m trying to come in.


Do you have any scientific role models who have shaped how you approach science?

TG: The three that I mentioned: Bialek influenced me in the types of questions that speak to me; Tank had a very thorough experimental approach that taught me how to make real, physics-style measurements; and Wieschaus brought a lot of enthusiasm and knowledge of the system.


Your lab has been studying developmental reproducibility and precision, in the patterning of the fly Drosophila melanogaster. In a 2014 paper [1], you showed that levels of the anterior determinant bicoid mRNA vary by only ~9% between different embryos. This is a very similar value to the ~10% variation in Bicoid protein levels between embryos, which you demonstrated several years earlier [2]. So it seems that this reproducibility occurs even at the mRNA level.

TG: Before going into this, the general thought in the field is that things were very noisy initially, and as the developmental path goes along it becomes more refined and things become more precise. This paper basically asked whether the precision is inherited from the mother, or the embryo needs to acquire it. Because the fluctuations in mRNA, from the mother, completely mimic the fluctuations in protein that the zygote expresses, that told us that the mother lays the groundwork, and passes on a very reproducible pattern. So there’s no necessity for a mechanism that reduces fluctuations from the mRNA to the protein level.


Continuing on the theme of precision: in a separate paper from the same year [3], your lab showed that the wing structure among different adult flies is identical to within less than a single cell width. Did you have any prior expectations going into this study, and did the results surprise you?

TG: Before looking at the wing, I had kind of made up my mind. I had first seen single cell precision in patterning of gene expression boundaries in the embryo. But I also knew that it’s always better to make a measurement first, and it seems that things are much more precise and reproducible in biology than we think, given the idea of “sloppiness” that we have.


Do you think that a high level of reproducibility is a general feature of development, or varies widely among different types of species?

TG: It’s a philosophical question in a way, because I haven’t looked. I think what we found in the embryo is not special to the fly; specific mechanisms for getting there might be unique to the fly. For instance, we have also shown in a recent paper from 2013 that transcription is just as noisy in flies as it is in bacteria, hugely noisy. So, physical mechanisms like temporal and spatial averaging seem enough to reduce the high ubiquitous noise that transcription has to the very fine, reproducible patterns that you see in the fly. The specific mechanisms that reduce noise will be very different from species to species, but I think overall the fact that development is precise and reproducible is something we may one day be able to call a principle.


If you could make any changes to scientific institutions, such as the current funding system, journal peer review, etc. what would they be?

TG: One thing that might be nice is if we didn’t have to fund graduate students for the first five years of their career; it would be nice to have more streamlined training grants, not only for U.S. but also international graduate students. And so, graduate students wouldn’t have to worry. They should be free to choose a school based on their scientific interests.

For peer review in journals, the problem is that the sheer volume of output is becoming so high. One way to keep a peer review system is either to pay the reviewers money, or to put everything on the bioRxiv [bio archive is a pre-print server for the life sciences] and let some other means determine how to evaluate a paper. I don’t read papers from looking at the top journals’ table of contents every week, I read them because I see people talk about it on Twitter, or my colleagues tell me I should look at that paper, or because I hear about the work in a talk and decide to see what else the guy is doing.

A lot of people are advocating the new metrics – citations, citation rates, H-index – which are so dependent on the particular field and not necessarily a good measure of impact. In 100 years, are we going to look more at those papers than the ones that currently get very few citations? We don’t know. I don’t think the solution is out there yet.


Do you have any advice for young scientists – current PhD students or post-doctoral fellows – for being successful in science?

TG: My advice would be to focus on one very impactful finding. If it’s very thorough and good science, it will be seen. Also, nothing comes from nothing. You need to put in the hours if you want to get a job in academia. And I think that’s one of the ways to measure a good scientist, because knowledge in experimental science comes from new, good data.

What are some future goals of your lab’s research?

TG: We’ve been looking at the genetic network in the fly embryo, trying to understand properties of that network. Medium term, we want to incorporate a slightly different angle, which is looking at the link between transcriptional regulation and the 3D architecture of the genome. In the living embryo, we want to look at how individual pieces of DNA interact, and how that influences transcription and eventually patterning. In the longer term, I don’t know yet; I just got tenure, so I need to sit back. Everything is open. That is what’s nice about being a physicist; you’re not married to your biological past so much.


In your opinion, what are the most exciting developments happening in biology right now, whether in your own field or elsewhere?

TG: It’s definitely the fact that so many different disciplines have stormed into biology, making it a very multidisciplinary science. I think it makes the life sciences a very vibrant, communal enterprise. Hopefully the next decades will show the fruits of those interactions.


This question is asked very often: How do you balance your lab and family life?

TG: When you start thinking about having a family in science, things become much more complicated. Since I’ve had children, my workload went down a lot. My wife is also a scientist, and for her it’s much harder because she’s not yet tenured. As much as people look at the CV and see how many high-profile papers you have, they should also look at it and see your family and life situation. And for women in science, despite all the efforts that have been made, I don’t think we’re there yet.




  1. Petkova, MD et al. Maternal origins of developmental reproducibility. Current Biology. 2014. 24(11).
  2. Gregor, T et al. Probing the limits to positional information. Cell. 2007. 130(1).
  3. Abouchar, L et al. Fly wing vein patterns have spatial reproducibility of a single cell. J R Soc Interface. 2014. 11(97).




The Royal Society: 350 Years of Scientific Publishing


By John McLaughlin


Professional scientific journals are commonplace and widely distributed today, but their origin dates back more than three centuries. This year marks the 350th anniversary of the oldest continuously published scientific journal, Philosophical Transactions of the Royal Society, which first appeared in 1665. This is the flagship journal of Britain’s Royal Society, founded in London in 1660 by a fellowship of physicians and philosophers. It remains Britain’s most prestigious scientific academy, and serves as the main scientific advisor to the UK government.


The Royal Society’s founding occurred during a very important historical period, arguably at the beginning of Europe’s scientific revolution. Its guiding principles were inspired largely by the work of Francis Bacon, a British politician and philosopher who died a few decades before its creation. His most important work, The New Organon, set out a vision for a new and more rigorous scientific methodology, based on empirical observation and testing theory by experiment. Bacon lamented the past centuries’ slow pace of progress in the sciences, and emphasized the need to place them on a firmer foundation in order to accurately study natural phenomena. He also cautioned against the various idols, or biases, which affect our proper understanding of the natural world, such as those determined by one’s personal history, culture, or deference to authority. He would have been pleased to see that the Royal Society’s founders took these ideas to heart; this is well captured in the Society’s motto, Nullius in verba: Take nobody’s word for it.


Philosophical Transactions introduced, in a more primitive form, several of the modern hallmarks of scientific research: most articles were reviewed and edited by Society members, systematically curated, and widely distributed. Interestingly, the journal operated at a financial loss for most of its history, only recently becoming profitable. It also made achievements in social equality; a 1787 article by Caroline Herschel, describing several new comets, became the journal’s first paper authored by a woman, and the Royal Society’s first female fellows were elected in 1945. As scientific disciplines proliferated and accumulated knowledge over the generations, the 19th century saw the journal split into two series, Philosophical Transactions A and B, dedicated to the physical and life sciences respectively. Today, both journals publish invited articles, with each issue centered on a specific theme.


In its capacity as a grant awarding agency, the Royal Society funds about 1,500 researchers around the United Kingdom, and provides fellowships for international scientists who wish to conduct research in or partner with UK universities. As part of its mission in promoting and recognizing excellence in science, it hosts frequent scientific meetings and lectures on a variety of topics, many of which are open to the public. To be elected a fellow of the Royal Society is a high honor, first requiring recommendations from two current fellows; the 8,000 fellows inducted in its long history have included scientific giants such as Isaac Newton, Charles Darwin, James Clerk Maxwell, and Stephen Hawking.


Scientific publishing had humble beginnings; in the 21st century, the spread of electronic journals has given us easy access to a number of high-quality papers that past generations of scientists could not have imagined. The sciences have changed dramatically over the years, but the institutions of publication and peer review will remain centrally important.

21st Century Science: an Academic Pyramid Scheme?


By John McLaughlin

Academic science is traditionally built on an apprenticeship model, in which a student works under the mentorship of a principal investigator, learning the skills of the trade and preparing to be an independent researcher. After a few years of training as a post-doctoral fellow, a scientist would likely obtain a tenure-track position at a university (if choosing the academic route) and mentor the next generation of scientists, continuing the academic circle of life. In the past few decades, this situation has drastically changed.


As most graduate students and post-docs have probably noticed, there has been an enormous amount of discussion on the difficulties of landing a good academic job following the PhD. In searching for the causes of this phenomenon, commentators have described several factors, two of the most salient being the recent stagnation in NIH funding (adjusted for inflation), and a dramatic increase in the number of PhDs awarded in the natural sciences. To provide context for the situation in the U.S., in the past three decades about 800,000 PhDs were awarded in science and engineering fields, compared to ~100,000 tenure-track positions created in the same time frame. These forces have changed the structure of the scientific academy, the result being a new arena in which many PhDs compete for a much smaller number of academic jobs, and many of those who stay in academia shuttle between low-paying adjunct positions with meager benefits and no possibility of tenure.

Economists studying the U.S. scientific academy, particularly the post-doctoral fellow system, have gone so far as to describe it as a “pyramid scheme.” This type of financial scheme operates by luring new investors with the promise of an easy payout; but the players nearer the top profit the most, at the expense of those at the bottom.
Post-doctoral fellows, often the main workhorses of a biology research lab, are cheap (~$40,000 starting salary in the U.S.) and replaceable, owing to the large excess of PhDs on the market; graduate students are even cheaper, as they often teach to earn their salaries. And a principal investigator (PI) running a large, well-funded lab will gain status and prestige for all grants and publications generated by their personnel.


Despite the less than ideal job prospects awaiting science PhDs, the government and media continue to strongly advocate education in the STEM fields, encouraging more undergraduates to pursue STEM majors and thereby increasing the number at the graduate level. While U.S. society’s general enthusiasm and respect for science is definitely positive, it is irresponsible to push so many young people into this career path without making substantial funding commitments. Certainly, not all PhD students intend to pursue a career in academia, and those who do may later find that their passion lies elsewhere, for instance in a biotechnology field. However, one should keep in mind that the past decade has also been rough for the U.S. pharmaceutical industry. Since 2000, thousands of U.S. and European industry research positions have been lost, while several “big pharma” firms plan to open new R&D centers in Asia, where costs are lower.


Although the outlook might seem bleak for those currently navigating these turbulent academic waters, the calls of post-doctoral advocacy organizations for increased salaries and benefits may finally be making a difference. This year, the NIH increased the base salary of its National Research Service Award post-doctoral trainees, and other institutions have increased post-doctoral pay and benefits, resulting in higher post-doc satisfaction.


These proposals will not only increase the quality of life for current post-docs, but also change the incentive structure of the marketplace: as laboratory personnel become more expensive, PIs will hire more selectively. Fewer PhDs will enter the post-doctoral route, either opting to pursue a career in industry or another field entirely. It may take years for these policy changes to be fully implemented, but hopefully academic scientists will be able to pursue their passion without fearing for their livelihoods or career prospects.

The Use and Misuse of p-values in Biology


By John McLaughlin

After completing an experiment, most of us dutifully perform statistical tests to determine whether our results are “significant.” These tests heavily determine whether experimental findings are considered robust, interesting, and publishable. P-values are commonly used to report statistical significance in the biology literature, but biologists have been chastised in recent years for misunderstanding and misusing this statistic. Underscoring this problem, a recent paper in PLOS Biology surveyed the scientific literature and found widespread evidence of “p-hacking”, or the manipulation of experimental parameters, such as sample size and the removal of outlier data points, for the sole purpose of obtaining statistically significant p-values.


What is the precise definition of the p-value, as it is most commonly used in biological research? It is important to note that there are several different interpretations of the concept of “probability”, perhaps the two most notable belonging to the Bayesian and Frequentist schools of statistics. According to the Bayesian approach (named for the 18th century mathematician Thomas Bayes), probability is best thought of as the likelihood of a particular outcome, given our prior knowledge of the situation in addition to newly acquired data. To give a commonplace example: when searching for a lost set of keys in your home, you will want to estimate the probability that they are in a given location, most likely by remembering previous occasions that the keys were lost and where they were recovered. This “prior” knowledge will factor heavily into your probability estimate. You can then incorporate new data to update this probability estimate, for example if it is known with certainty that the keys are not in one of these locations. The Bayesian interpretation of probability accords more with our common, everyday usage of the term.
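
The lost-keys example can be written out as a small calculation. This is a minimal sketch under stated assumptions: the three locations and their prior probabilities are invented for illustration, the helper name `rule_out` is hypothetical, and the search is assumed to be thorough, so that not finding the keys in a place means they are certainly not there.

```python
from fractions import Fraction


def rule_out(prior: dict[str, Fraction], searched: str) -> dict[str, Fraction]:
    """Update beliefs after learning the keys are NOT in `searched`.

    This is Bayes' rule with a likelihood of 0 for the searched location
    and 1 for every other location, followed by renormalisation.
    """
    remaining = {place: p for place, p in prior.items() if place != searched}
    total = sum(remaining.values())
    posterior = {place: p / total for place, p in remaining.items()}
    posterior[searched] = Fraction(0)
    return posterior


# Hypothetical prior, based on where the keys turned up in the past.
prior = {
    "coat pocket": Fraction(5, 10),
    "kitchen counter": Fraction(3, 10),
    "desk": Fraction(2, 10),
}

# New data: the coat pocket has been checked and is empty.
posterior = rule_out(prior, "coat pocket")

for place, probability in posterior.items():
    print(f"{place}: {probability}")
```

The remaining locations keep their relative ranking, but their probabilities are rescaled so that they again sum to one.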


However, the understanding of probability that dominates in the biological sciences is known as Frequentism; most p-value statistics in biological research are computed using this school’s methods. According to frequentist statistics, the probability of a given event is simply the frequency with which it occurs in the long run. To give a simple example: if a coin is flipped 100 times and lands “heads” on 58 flips, the estimated probability of the coin’s landing heads is 0.58. Presumably, as the number of coin flips approaches infinity, the observed frequency of heads will approach the “true” probability of 0.5. Frequentism is based on the notion that repeated randomized trials, or experiments, will in the long run approximate the true probability of an event.
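
The long-run behaviour described here is easy to watch in a simulation. The snippet below is an illustrative sketch only: it assumes a fair coin, uses Python's pseudo-random generator in place of real flips, and the function name `heads_frequency` and the seed value are arbitrary choices.

```python
import random


def heads_frequency(n_flips: int, rng: random.Random) -> float:
    """Flip a simulated fair coin `n_flips` times; return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(2024)  # fixed seed so the run is repeatable
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} flips: observed frequency of heads = {heads_frequency(n, rng):.4f}")
```

Small runs can land noticeably away from 0.5, much like the 58-out-of-100 example, while larger runs tend to settle closer to it.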


When running an experiment in the lab, a biologist may want to know the probability of her hypothesis being true, given the experimental data she observes. A p-value calculated using a standard t-test, however, would tell her the converse of this: the probability of observing the experimental data, given the null hypothesis being true. A common experimental “null hypothesis” is a statement of no relationship between the variables under observation (e.g. the means of the two groups being compared are equal). The p-value is therefore the probability of observing the experimental data or a data set more extreme, when assuming that this null hypothesis is correct – a lower p-value makes a stronger case to reject this null hypothesis.
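
The phrase “the experimental data or a data set more extreme” can be made concrete with an exact permutation test. Note that this is a deliberately swapped-in technique rather than the t-test mentioned above; it is used here because it needs nothing beyond the standard library and follows the definition almost word for word. The measurements and the function name `permutation_p_value` are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Under the null hypothesis the group labels are arbitrary, so every way
    of splitting the pooled data into two groups is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_gap(sum_a):
        # Difference in means when group A's values add up to `sum_a`.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_gap(sum(group_a)))
    as_extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        n_splits += 1
        gap = abs(mean_gap(sum(pooled[i] for i in indices)))
        if gap >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
    return as_extreme / n_splits


# Hypothetical measurements from control and mutant flies.
control = [10, 11, 12, 13]
mutant = [14, 15, 16, 17]

p = permutation_p_value(control, mutant)
print(f"p = {p:.4f}")
```

Here only 2 of the 70 possible splits are as extreme as the observed one, so the p-value is 2/70. That is a statement about the data under the null hypothesis, not the probability that any biological hypothesis is true.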


There are a few things that the p-value statistic definitely does not tell a scientist. First, do experimental results with a low p-value tell a scientist that her hypothesis is correct? No. Rejecting the statistical null hypothesis is not equivalent to accepting her particular biological hypothesis. Is the p-value the probability that the null hypothesis is correct? Again, no. Biologists and statisticians use the term “hypothesis” very differently. When the statistician and evolutionary biologist Ronald Fisher popularized use of the p-value in the 1920s, it was never intended as a metric for confirming or refuting biological hypotheses. It was meant to be a general heuristic for judging whether a data set might warrant a second look or follow-up experiments; the p-value itself does not decisively settle any experimental questions.


What should researchers do to avoid p-hacking? One recent paper on this topic recommends choosing the experimental sample sizes in advance, detailing the removal of any outlier data points, and allowing other researchers access to the raw data. P-value statistics can be useful when employed properly, but they are not the whole story. As scientists face continued pressure to report “significant” findings and publish in high-tier journals, understanding procedures for proper data interpretation will be increasingly important. Hopefully, the trend towards open access publication will encourage greater transparency and scrutiny of experimental data reporting, along with a better understanding of p-value statistics and their applications.

Lethal Weapon: How Many Lethal Mutations Do We Carry?


By John McLaughlin

Many human genetic disorders, such as cystic fibrosis and sickle cell anemia, are caused by recessive mutations with a predictable pattern of inheritance. Tracking hereditary disorders such as these is an important part of genetic counseling, for example when planning a family. In fact, there exists an online database dedicated to medical genetics, Mendelian Inheritance in Man, which contains information on most human genetic disorders and their associated phenotypes.


The authors of a new paper in Genetics set out to estimate the number of recessive lethal mutations carried in the average human’s genome. The researchers’ rationale for specifically focusing on recessive mutations is their higher potential impact on human health; because deleterious mutations that are recessive are less likely to be purged by selection, they can be maintained in heterozygotes with little impact on fitness, and therefore occur in greater frequency. For the purposes of their analysis, recessive lethal disorders (i.e. caused by a recessive lethal mutation) were defined by two main criteria: first, when homozygous for its causative mutation, the disease leads to the death or effective sterility of its carrier before reproductive age, and second, mutant heterozygotes do not display any disease symptoms.


For this study, the researchers had access to an excellent sample population, a religious community known as the Hutterian Brethren. This South Dakotan community of ~1600 individuals is one of three closely related groups that migrated from Europe to North America in the 19th century. Importantly, the community has maintained a detailed genealogical record tracing back to the original 64 founders, which also contains information on individuals affected by genetic disorders since 1950. An additional bonus is that the Hutterites practice a communal lifestyle in which there is no private property; this helps to reduce the impact of confounding socioeconomic factors on the analysis.


Four recessive lethal genetic disorders have been identified in the Hutterite pedigree since the community’s more detailed records began: cystic fibrosis, nonsyndromic mental retardation, restrictive dermopathy, and myopathy. To estimate the number of recessive lethal mutations carried by the original founders, the team used both the Hutterite pedigree and a type of computational simulation known as “gene dropping”. In a typical gene dropping simulation, alleles are assigned to a founder population, the Mendelian segregation and inheritance of these alleles across generations are simulated, and the output is compared with the known pedigree. One simplifying assumption made during the analysis is that no de novo lethal mutations have arisen in the population since its founding; therefore, any disorders appearing in the pedigree are attributed to mutations carried by the original founder population.
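The gene dropping idea can be sketched in a few lines of code. The following is only a toy illustration, not the authors' method: the nine-person pedigree, the individual names, and the function names are all hypothetical, and the sketch covers only the forward simulation step, not the comparison against observed disease cases.

```python
import random

# Hypothetical toy pedigree: individual -> (mother, father); founders map to None.
# Parents are listed before their children. C and D are first cousins.
PEDIGREE = {
    "F1": None, "F2": None, "F3": None, "F4": None,
    "A": ("F1", "F2"),
    "B": ("F1", "F2"),
    "C": ("F3", "A"),
    "D": ("F4", "B"),
    "E": ("C", "D"),
}


def drop_genes(pedigree, carrier, rng):
    """One gene-dropping pass: returns a genotype (allele pair) per individual."""
    genotypes = {}
    for person, parents in pedigree.items():
        if parents is None:
            # Founder: heterozygous for the lethal allele "a" only if chosen as carrier.
            genotypes[person] = ("a", "A") if person == carrier else ("A", "A")
        else:
            mother, father = parents
            # Mendelian segregation: each parent transmits one allele at random.
            genotypes[person] = (rng.choice(genotypes[mother]),
                                 rng.choice(genotypes[father]))
    return genotypes


def homozygous_fraction(pedigree, carrier, target, n_sims=50_000, seed=0):
    """Fraction of simulations in which `target` inherits two lethal alleles."""
    rng = random.Random(seed)
    affected = 0
    for _ in range(n_sims):
        if drop_genes(pedigree, carrier, rng)[target] == ("a", "a"):
            affected += 1
    return affected / n_sims


estimate = homozygous_fraction(PEDIGREE, carrier="F1", target="E")
print(f"simulated: {estimate:.4f}   expected by hand: {1 / 64:.4f}")
```

In this toy pedigree the answer can be worked out by hand (the allele must pass through three transmissions on each side, giving 1/8 × 1/8 = 1/64), which makes it a convenient check on the simulation. A real analysis would run the same kind of pass over the full genealogy, where hand calculation is impractical.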


After combining the results from many thousands of such simulations with the Hutterite pedigree, the authors arrive at a final estimate of roughly one recessive lethal mutation carried per human genome, or slightly fewer (the point estimate is ~0.58). What are the implications of this estimate for human health? Although mating between closely related individuals has long been known to increase the probability that recessive mutations become homozygous in offspring, this study’s mutation estimate allows a more precise risk figure to be calculated. The discussion section notes that mating between first cousins, although fairly rare today in the United States, is expected to increase the chance of a recessive lethal disorder in offspring by ~1.8%.
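One plausible way to connect the two numbers in this paragraph is a back-of-the-envelope calculation. This is a reconstruction rather than the paper's stated derivation: it assumes the ~0.58 figure refers to a diploid genome, and it uses the standard inbreeding coefficient of 1/16 for the offspring of first cousins. The variable names are illustrative.

```python
# Assumption: the ~0.58 estimate is per diploid genome (two haploid sets).
lethals_per_diploid_genome = 0.58
lethals_per_haploid_set = lethals_per_diploid_genome / 2

# Standard inbreeding coefficient for offspring of first cousins: the chance
# that both alleles at a locus descend from the same ancestral copy.
inbreeding_coefficient = 1 / 16

added_risk = inbreeding_coefficient * lethals_per_haploid_set
print(f"Added risk for offspring of first cousins: {added_risk:.1%}")
# prints "Added risk for offspring of first cousins: 1.8%"
```

Under these assumptions the arithmetic reproduces the ~1.8% figure quoted from the discussion section.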


Perhaps the most interesting finding from this paper was the consistency of the predicted lethal mutation load across the genomes of different animal species. The authors compared their estimates for human recessive lethal mutation number to those from previous studies examining this same question in fruit fly and zebrafish genomes, and observed a similar value of one or two mutations per genome. Of course, the many simplifying assumptions made during their analyses should be kept in mind; the estimates are considered tentative and will most likely be followed up with similar future work in other human populations. It will certainly be interesting to see how large-scale studies such as this one will impact human medical genetics in the future.


Darwin’s Finches Revisited


By John McLaughlin

In 1859, Charles Darwin published the now famous “On the Origin of Species,” containing the first presentation of his theory of the common origin of all life forms and their diversification by means of natural selection. One aim of this theory was to explain the diversity of traits found in nature as a result of the gradual adaptation of populations to their environments. This point is elegantly summarized in the third chapter:


“Owing to this struggle for life, any variation, however slight and from whatever cause proceeding, if it be in any degree profitable to an individual of any species, in its infinitely complex relations to other organic beings and to external nature, will tend to the preservation of that individual, and will generally be inherited by its offspring.”


A large contribution to this theory resulted from his five-year voyage aboard the HMS Beagle, during which he traveled in South and Central America, Africa, and Australia. Darwin collected a huge volume of notes on various plant and animal species, perhaps most famously the finch species inhabiting the Galápagos Islands to the west of Ecuador. Although his finch studies were only briefly mentioned in one of his journals, “Darwin’s finches” are now a popular example of microevolution and adaptation for both students and the general public. One striking feature of these finch species is their diversity of beak shape: finches with larger, blunt beaks feed mainly on seeds from the ground, while those with longer, thin beaks tend to have a diet of insects or seeds from fruit.


A recent study published in Nature examines the evolution of fifteen finch species that Darwin studied during his time in the Galápagos. Although previous work has helped construct phylogenetic trees based on mitochondrial and microsatellite DNA sequences from these same specimens, this is the first study to perform whole genome sequencing of all fifteen species. In addition to a more accurate phylogeny, these genome sequences allowed for new types of analyses to be performed.


First, the authors assessed the amount of interspecies hybridization that has taken place among the finches in their recent evolutionary history, and found evidence for both recent and more ancient hybridization between finch species on different islands. The authors then looked for specific genomic regions that could be driving the differences in beak morphology among the different finch species. To perform this analysis, they divided closely related finch species into “pointed” and “blunt” groups on the basis of beak shape; the genomes from each group were then searched for differentially fixed sequences. Among the most significant regions uncovered, several included genes known to be involved in mammalian and bird craniofacial development. The top hit, ALX1, is a homeobox gene with previously established roles in vertebrate cranial development. Interestingly, almost all of the blunt-beaked finches shared a specific ALX1 haplotype (“type B”) that was distinct from the haplotype shared by their pointed-beaked counterparts (“type P”). Based on the distribution of the “P” and “B” haplotypes, the authors estimated that these two groups of finches diverged approximately 900,000 years ago.
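The search for differentially fixed sequences can be illustrated with a simplified sketch. The statistic actually used in the study is not described here, so the per-site allele frequency difference below, the 0.9 threshold, the function names, and the toy genotypes are all assumptions made for illustration.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency; each genotype is an alt-allele count (0, 1, or 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def differentially_fixed_sites(blunt, pointed, threshold=0.9):
    """Return (site index, frequency difference) for sites where the groups differ sharply."""
    hits = []
    for site, (b, p) in enumerate(zip(blunt, pointed)):
        delta = abs(allele_frequency(b) - allele_frequency(p))
        if delta >= threshold:
            hits.append((site, delta))
    return hits


# Hypothetical toy data: one list of per-bird genotypes for each of three sites.
blunt_group = [[2, 2, 2, 2], [0, 1, 0, 1], [1, 2, 2, 2]]
pointed_group = [[0, 0, 0, 0], [0, 1, 1, 0], [1, 1, 0, 2]]

print(differentially_fixed_sites(blunt_group, pointed_group))
# prints [(0, 1.0)]
```

Only the first site is flagged: every blunt-beaked bird is homozygous for one allele and every pointed-beaked bird for the other. A genome-wide scan would typically summarize such differences over windows of many sites rather than reporting single positions.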


By applying genome-sequencing technologies, these labs were able to shed new light on a classic story in biology. Until fairly recently, phylogenetic relationships such as those described in the article could only be inferred on the basis of external morphology. In a Nature News piece commenting on this study, one of the co-authors remarked on what Darwin would think of the results: “We would have to give him a crash course in genetics, but then he would be delighted. The results are entirely consistent with his ideas.”


Biotech Breakthrough: The CRISPR/Cas System


By John McLaughlin

In the last few years, a huge amount of excitement has grown over the CRISPR/Cas system and its use in targeted genome editing; this acronym derives from Clustered Regularly Interspaced Short Palindromic Repeats and their CRISPR-associated genes (Cas). CRISPR loci, which are found in many species of bacteria and most archaea, have been collectively described as an RNA-based “immune system,” because of their ability to recognize and destroy foreign phage and plasmid DNA.


Although the acronym was first coined in a 2002 paper, CRISPR has only recently been exploited as a research tool. How does the system work and what is its use in the lab? There are at least three distinct types of CRISPR system. A typical “type II” CRISPR locus consists of several protein-coding Cas genes adjacent to an array of direct repeat and spacer sequences. The direct repeats are usually palindromic and conserved, in contrast to the much more variable spacers; these repeat-spacer sequences are transcribed as one unit and then processed into short CRISPR-RNAs (crRNAs). A 2007 Science article demonstrated that a bacterial population could acquire resistance to phage infection by incorporating DNA fragments from the invading phage genome into a CRISPR locus, in the form of new spacer sequences. The newly acquired spacers are then transcribed and processed into crRNAs, which associate with a trans-activating crRNA (tracrRNA) and a Cas protein and eventually guide the complex to a homologous DNA sequence, where it catalyzes a double-stranded break.
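The sequence-matching step at the heart of this mechanism can be sketched as a simple string search. This is a deliberately minimal illustration: it looks only for exact matches on either DNA strand and ignores mismatch tolerance and any additional sequence requirements that real Cas proteins impose. The sequences and function names are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_target_sites(spacer, dna):
    """Return (position, strand) for every exact match of the spacer in dna."""
    sites = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = dna.find(query)
        while start != -1:
            sites.append((start, strand))
            start = dna.find(query, start + 1)
    return sorted(sites)


# Hypothetical spacer and an invading sequence containing it on both strands.
spacer = "GATTACAGGC"
phage_dna = "TT" + spacer + "AA" + reverse_complement(spacer) + "TT"

print(find_target_sites(spacer, phage_dna))
# prints [(2, '+'), (14, '-')]
```

Each reported position marks a stretch of the invading DNA that a crRNA derived from this spacer could recognize, which is why a newly acquired spacer confers resistance to the phage it came from.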


The CRISPR system can be flexibly “reprogrammed” by designing custom chimeric RNAs (chiRNAs), which serve the function of both crRNA and tracrRNA in one molecule. By co-expressing a “designer” chiRNA with a Cas protein, a targeted and specific DNA break can be created in the genome; if an exogenous DNA template is provided to help repair the break, customized knock-ins or knock-outs can be generated. Judging from the rapid technical advances made in the last few years, the system promises to be an efficient and high-throughput format for genome editing. To date, knock-outs have been created in a variety of organisms including rats, flies, and human cells.


CRISPR/Cas technology has attracted scientific attention as well as commercial interests. In November 2014, biologists Jennifer Doudna and Emmanuelle Charpentier were honored as co-recipients of the 2015 Breakthrough Prize in the Life Sciences, for their work in dissecting the mechanism of CRISPR’s sequence-specific DNA cleavage. According to its proponents, the possible applications of the CRISPR system seem almost limitless. CRISPR Therapeutics, a recently formed company dedicated to translating the technology into genetic disease therapies, has raised 25 million dollars from new investors. And just last month, the pharmaceutical company Novartis began collaborations with Intellia Therapeutics and Caribou Biosciences in order to pursue new therapeutics using CRISPR/Cas.


A technology as potentially lucrative as this one does not develop without controversy. MIT Technology Review recently reported on the competing startup companies aiming to exploit CRISPR technology, and the ensuing battles over intellectual property rights in different organisms. In fact, last year the Broad Institute and MIT were awarded a patent which covers the use of CRISPR genome-editing technology in eukaryotes. Feng Zhang, who is listed as the inventor on the patent, and his lab at MIT were the first to publish on CRISPR’s functionality in human cells.


In a few years, this exciting technology may be a commonplace fixture of the biology lab. Only time will tell if the CRISPR craze produces the amazing breakthroughs that scientists, and the general public, are eagerly awaiting.

The Epigenetics of Metabolic Reprogramming


By John McLaughlin

One of the greatest health issues facing the US is adult and childhood obesity, which imposes huge human costs and a heavy economic burden on the healthcare system; therefore, its underlying causes are of enormous interest. While environmental factors such as diet and exercise are obviously major contributors, what roles do genetic variation or epigenetic effects play in predisposing individuals to obesity or affecting metabolism in general? Addressing these questions in simple, genetically tractable model systems is a time-tested method for pursuing the answers.


In recent studies on gene expression, especially related to disease, there has been growing interest in the role that epigenetic regulation plays in development and metabolism. On the web page of its “Roadmap Epigenomics Project,” the NIH defines “epigenetics” as referring to “…both heritable changes in gene activity and expression (in the progeny of cells or of individuals) and also stable, long-term alterations in the transcriptional potential of a cell that are not necessarily heritable.” Although the precise working definition of “epigenetic” is still contested, its usage by biologists generally includes phenomena such as DNA methylation and chemical modification of histones, which can cause changes in the transcriptional capacity of chromatin without altering DNA sequence.


In a recent Cell article, Öst and colleagues describe a phenomenon in Drosophila melanogaster which they term “intergenerational metabolic reprogramming” (IGMR). The study is intriguing because it adds to a growing body of work demonstrating intergenerational metabolic effects in a variety of organisms, including rats, flies and worms. An intergenerational effect, as described in this work, results when an environmental stimulus is applied to an organism and elicits a defined response in its offspring, which were not exposed to the original stimulus. Specifically, this study examined intergenerational effects on metabolism that are transmitted paternally; therefore, the mediators of the effect are presumably in the sperm cells at the time of egg fertilization.


Their IGMR experimental paradigm is straightforward: male flies are fed diets of varying sugar levels and mated to normally fed females, and their offspring are characterized with regard to different metabolic features. The experiments were controlled to ensure that rearing conditions, such as diet and fly density, were identical among all groups of offspring. Interestingly, the progeny of both high-sugar and low-sugar fed fathers exhibited a similar phenotype: increased triglycerides, lipid droplet size, and food intake compared to offspring whose fathers had moderate sugar diets. In addition, this IGMR response was highly specific in causing metabolic phenotypes, as no general developmental effects were observed, such as altered wing size, offspring number, or timing of adult eclosion. In other words, the male fly’s diet had a specific and significant impact on its offspring’s metabolism.


An obvious question followed these results: what were the changes in gene regulation mediating this phenotypic response? The IGMR effect correlated with increased gene expression from X chromosome heterochromatin, suggesting an epigenetic process such as chromatin remodeling was at work. This idea was reinforced by an observed decrease in H3K9me3 staining, a histone mark typical of heterochromatin, in the fat bodies of IGMR flies. Although the authors didn’t identify the precise molecular mechanisms of paternal-diet-induced epigenetic reprogramming, they did several analyses comparing gene expression levels of flies fathered by control and IGMR males. RNA sequencing and computational analysis showed that several hundred genes, whose upregulation was correlated with the paternal high-sugar IGMR phenotype, are involved in known metabolic pathways in flies and other organisms.


So, what purpose might this type of intergenerational regulation serve? One possibility is that it increases the adaptiveness of offspring to local environments. Parents can “signal” to their offspring, through epigenetic mechanisms, the nutritional state of their surroundings, and thus the next generation of flies will be more metabolically primed to deal with the new environment. Of course, this one study should not be extrapolated to draw unwarranted conclusions about metabolic reprogramming in humans or other animals; much more work remains to be done on the subject. However, it is still exciting to ponder the myriad processes, both known and unknown, that work in the complexities of development. As our understanding of all the facets of gene regulation continues to progress, there will most likely be more amazing surprises to come.




The “Big Data” Future of Neuroscience


By John McLaughlin

In the scientific world, the increasingly popular trend towards “big data” has overtaken several disciplines, including many fields in biology. What exactly is “big data?” This buzz phrase usually signifies research with one or more key attributes: tackling problems with the use of large high-throughput data sets, large-scale “big-picture” projects involving collaborations among several labs, and heavy use of informatics and computational tools for data collection and analysis. Along with the big data revolution has come an exploding number of new “omics”: genomics, proteomics, regulomics, metabolomics, connectomics, and many others which promise to expand and integrate our understanding of biological systems.


The field of neuroscience is no exception to this trend, and has the added bonus of capturing the curiosity and enthusiasm of the public. In 2013, the United States’ BRAIN Initiative and the European Union’s Human Brain Project were both announced, each committing hundreds of millions of dollars over the next decade to funding a wide variety of projects, directed toward the ultimate goal of completely mapping the neuronal activity of the human brain. A sizeable portion of the funding will be directed towards informatics and computing projects for analyzing and integrating the collected data. Because grant funding will be distributed among many labs with differing expertise, these projects will be essential for biologists to compare and understand one another’s results.


In a recent “Focus on Big Data” issue, Nature Neuroscience featured editorials exploring some of the unique conceptual and technical challenges facing neuroscience today. For one, scientists seek to understand brain function at multiple levels of organization, from individual synapses up to the activity of whole brain regions, and each level of analysis requires its own set of tools with different spatial and temporal resolutions. For example, measuring the voltage inside single neurons will give us very different insights from an fMRI scan of a large brain region. How will the data acquired using disparate techniques become unified into a holistic understanding of the brain? New technologies have allowed us to observe tighter correlations between neural activity and organismal behavior. Understanding the causes underlying this behavior will require manipulating neuronal function, for example by using optogenetic tools that are now part of the big data toolkit.


Neuroscience has a relatively long history; the brain and nervous system have been studied in many different model systems which range greatly in complexity, from nematodes and fruit flies to zebrafish, amphibians, mice, and humans. As another commentary points out, big data neuroscience will need to supplement the “vertical” reductionist approaches that have been successfully used to understand neuronal function, by integrating what has been learned across species into a unified account of the brain.


We should also wonder: will there be any negative consequences of the big data revolution? Although the costs of data acquisition and sharing are decreasing, putting the data to good use is still very complicated, and may require full-time computational biologists or software engineers in the lab. Will smaller labs, working at a more modest scale, be able to compete for funds in an academic climate dominated by large consortia? From a conceptual angle, the big data approach is sometimes criticized for not being “hypothesis-driven,” because it places emphasis on data collection rather than addressing smaller, individual questions. Will big data neuroscience help clarify the big-picture questions or end up muddling them?


If recent years are a reliable indicator, the coming decades in neuroscience promise to be very exciting. Hopefully we can continue navigating towards the big picture of the brain without drowning in a sea of data.

Is There Really a Reproducibility Problem in the Biomedical Sciences?


By John McLaughlin

The ability to reproduce experimental findings is a keystone of the scientific method; it is a major part of what makes modern science such a successful social activity. In the past few years, however, there has been growing alarm over what is being called a “reproducibility crisis” in science, particularly the biomedical sciences.


One especially high-profile example was discussed in a Nature commentary two years ago: the biotech company Amgen, before investing resources into a new drug program, attempted to reproduce the findings of 53 papers it considered “landmark” studies in the cancer biology field, and succeeded for only six of them. This raises the question: are resources being invested in therapeutics that are based on flawed results? And more importantly, is this problem unique to pre-clinical research, or is it more pervasive?


The replication problem is definitely receiving attention, in both the popular and scientific press. Several of the world’s most elite scientific journals, including Nature and Science, have recently published editorials calling for answers. Unsurprisingly, the proposed solutions have varied. Some are pushing for more extreme approaches, such as hiring independent, third-party laboratories to reproduce the findings of a paper before it reaches publication. Other suggestions have been more modest: journals should require increased transparency in the description of experimental methods, and raw data should be submitted to open-access repositories where they can be scrutinized more closely.


The call for more rigorous standards of reproducibility is already evoking concrete responses. Last year, several organizations, including PLOS ONE, Science Exchange, and Mendeley, together started the Reproducibility Initiative, which bills itself as an effort to “reward high quality reproducible research”. Here’s the basic idea: scientists confidentially submit their experiments for replication (for a fee), choosing among a network of labs with expertise in the relevant technique. If the findings are confirmed, the authors can display an “Independently Validated” badge upon publication of the results. The initiative has already received a $1.3 million grant to reproduce 50 of the “most impactful” cancer biology studies published during 2010-2012.


But if this practice becomes the norm, it may place further financial burdens on labs that are already struggling for funds. Are there more modest, practical changes we can begin making in our own labs to combat this problem? Part of the solution may be improved graduate training in the day-to-day use of statistics: which types of analysis are appropriate for a given experiment, what sample sizes are needed, and what conclusions can reasonably be drawn? Miscommunication between scientists may be a factor as well. Today’s biological science involves complicated experimental techniques, using highly complex animal and cell culture models; more intimate knowledge of the methods may be needed in order to faithfully replicate the results.


On the flip side, are institutional and cultural issues also playing a role? The frantic competition for academic faculty positions and grant funding may skew incentives, encouraging post-docs and PIs to cut corners and push for publication as quickly as possible, in high-tier journals. Nobel Laureate Randy Schekman called attention to this problem last year, and vowed to boycott publishing in “glamour” journals like Nature, Cell, and Science.


Whether or not you agree there is a replication crisis in biomedical science, it surely can’t hurt to encourage more openness, transparency, and improved training. The next generation of young scientists would benefit from making these practices a cultural norm.