The Journal of Philosophy, Science & Law
Volume 11, October 3, 2011
Ray Greek,* Niall Shanks,** and Mark J. Rice***
* President, Americans For Medical Advancement
** Department of Philosophy, Wichita State University (deceased)
*** Department of Anesthesiology, University of Florida
The current use of animals to test for potential teratogenic effects of drugs and other chemicals dates back to the thalidomide disaster of the late 1950s and early 1960s. Controversy surrounds the following questions:
We review the literature in order to address these questions.
Keywords: thalidomide, teratology, species, animal, history, prediction
Nonhuman animals, subsequently referred to simply as animals, are used in research and testing as models for humans. We will refer to such uses generically as animal models. A thorough examination of the use of animals as causal analogical models or predictive models for a specific disease or treatment must place the empirical evidence into the context of our current theories and philosophy of science. Any discussion of animal models in biomedical research should occur in the context of evolutionary biology, evolutionary developmental biology, genetics and genomics, and complex systems in addition to our current understanding of the philosophy of science. Here, we examine one very specific historical instance of drug development gone awry and consider what role animal models actually did play or what role they could have played under ideal circumstances. The issue being discussed does not therefore concern animal protection or the ethics of animal testing, but rather an examination of the promise and limitations of biomedical research aimed at modeling that purports to predict human outcomes and safety. The philosophical and legal implications of this will logically follow. Having examined the general aspects of animal-based research and testing in other publications [1, 2], we will very briefly outline those arguments before examining the thalidomide issue.
All members of the animal kingdom are examples of complex systems. The characteristics of a complex system are important in any discussion about using one species to predict the response of a second. Such characteristics [4-6] include the following:
When the characteristics of complex systems are placed into the context of living systems evolving through time, with different evolutionary trajectories, we can hypothesize that very small differences between species (initial conditions) will manifest as profound differences to perturbations such as drugs and disease. Empirical evidence, some of which we will discuss, confirms this hypothesis.
Thalidomide as presented in the literature
Toxicity testing has routinely been performed prior to a drug being released into the market. Part of the toxicity testing process tests teratogenicity (the ability to cause birth defects or congenital anomalies), the study of which is called teratology. Thalidomide is probably the most infamous teratogen in history. Thalidomide (N-phthalidomido-glutarimide) initially came to the attention of scientists because it was similar in structure to two sedatives introduced in Germany in the early 1950s: diazepam and barbital. Based on this similarity, the Germany-based company Chemie Grünenthal developed, manufactured, and marketed thalidomide as a sedative in 1957. The first reports that thalidomide caused severe congenital anomalies were made independently by McBride and Lenz.[19, 20] Immediately thereafter, and continuing to the present day, laboratories began searching for a model organism in order 1) to explain the mechanism by which thalidomide caused congenital anomalies and 2) to prevent future thalidomide-like occurrences. One purpose of this article is to judge the success of those endeavors.
The thalidomide disaster has present-day implications. The role animal models played, or could have played, in the thalidomide disaster is frequently cited in support of animal-based research. Gad states that animal testing for toxicity and pharmacology is predictive for humans, has worked well in predicting human responses for centuries, and further observes: “Current testing procedures (or even those at the time in the United States, where the drug was never approved for human use) would have identified the hazard and prevented this tragedy.” (Emphasis added) This theme has recurred in the scientific and popular literature. The UK-based group Understanding Animal Research stated that fetal effects from maternal ingestion, also called fetal transfer, were unknown in the 1950s and that had thalidomide been tested on pregnant animals: “ . . . the same birth defects would have shown up in the animals - as they did subsequently - and thalidomide would never have been used by pregnant women.” (Emphasis added) According to the UK-based group Pro-Test:
. . . thalidomide did initially pass safety tests in animals but this was because the proper tests were not performed: thalidomide was not tested on pregnant animals. If a thorough battery of tests had been performed in animals, the teratogenic effects would have been caught. Thalidomide was never approved for sale in the USA because the Food and Drug Administration felt that not enough testing had been carried out. After its withdrawal from the market, thalidomide was tested on pregnant animals and found to induce birth defects.
Similarly, Ringach has claimed: “You may not know that initially there was no testing of thalidomide in pregnant animals. Once the drug was pulled off the market, additional tests in animals were done, and it was found that mice, rats, hamsters, marmosets and baboons all suffered similar effects as observed in humans (references below).” (Emphasis added)
Speth stated that the thalidomide disaster happened secondary to inadequate animal testing and that: “Of note, thalidomide was not approved for morning sickness in the USA because FDA inspector Frances Kelsey required it to be tested on pregnant animals.” We will also examine the above claim about Kelsey later in this article.
In testimony before the Science, Research and Technology Subcommittee of the House of Representatives on October 14, 1981, Ernst Knobil, chairman of the Department of Physiology at the Pittsburgh School of Medicine and Past President of the American Physiological Society and of the Endocrine Society stated: “This drug produces the same deformities in animals as it does in human infants. Because we paid attention to this fact in the United States our children were spared these unforeseen catastrophic effects of a supposedly harmless drug.” These are but a few of many similar examples.[27, 28]
The thalidomide controversy involves numerous issues. Our claims are as follows and will be examined in order:
Scientists did know that chemicals crossed the placenta
A myth dating back to the Middle Ages said that the placenta did not allow harmful chemicals to cross over to the fetus. Some have claimed that animal studies, specifically studies using pigs in the 1920s and 1930s, were responsible for finally disproving this myth. In fact, scientists had good evidence that the placenta was permeable in the late 1800s based on human observations. By the 1930s, a combination of human and animal data had definitively disproven the myth. Moreover, drugs like thalidomide that crossed the blood-brain barrier, thereby decreasing nausea, would also be expected to cross the placenta.
In 1874, Zweifel performed research that revealed that chloroform crossed the human placenta. (Also see Marx.) This was again demonstrated in dogs in 1912. Other anesthetic agents such as opioids, barbiturates, and tranquilizers (e.g., chlorpromazine) were also known before the 1950s to cross the placenta. In 1878, a case was published reporting that sodium salicylate was found in the urine of the baby after having been given to the mother 30 minutes prior to delivery. This appears to be the first time the scientific literature suggested that chemicals were able to cross the placenta. In 1909, alcohol was measured in the umbilical cord blood and found to be at the same concentration as in the mother. The alcohol had been administered one hour prior to delivery. Between 1909 and 1933, Nicloux and others [34-37] conducted research that led them to think that drugs such as quinine passed through the placenta. Taylor wrote in 1935: “that the placenta is permeable to drugs is well established.” Dille, in 1934, had shown that Amytal passed the placenta in cats, rabbits, and guinea pigs and pointed out that the effects of morphine had been demonstrated numerous times on the human fetus. Pettey, in 1912, had pointed out a case where a newborn infant from a mother addicted to morphine was also addicted and went so far as to suggest that any infant born to an opiate-addicted mother should receive an opiate for three days following birth. He based this on the fact that opiates crossed the placenta and, since the placenta had been severed, no more opiates were available for the infant. Burnet, in 1920, suggested the same. Taylor, in 1932, also reported that cyanosis of the newborn had all but been eliminated by abandoning twilight sleep, a practice involving giving opiates to women in labor.
Taylor summarized by stating: “The obstetrician has long since recognized the permeability of the placenta to drugs.” Clearly, human and animal data were available in the 1930s that proved drugs did cross the placental barrier.
(Statements such as Taylor’s must be interpreted in the context of the day. The fact that any drugs crossed the placenta would have been viewed as proof of the concept that at least some drugs were capable of crossing, not that all drugs could or did cross the placenta. As we now know, placental transfer is highly dependent upon a number of factors including the lipophilicity of the drug. All drugs do not, in fact, cross the placenta. For example, polar molecules such as muscle relaxants (like curare) do not cross.)
Thiersch [42-44], in the 1950s, had shown repeatedly that some drugs could cross the placenta and cause the mother to abort the fetus. Sjöström and Nilsson referred to Thiersch when they stated in court testimony at Södertälje that by 1959 at least 25 chemicals had been shown to affect the fetus, either killing the fetus or inducing malformations, that these studies had been performed in the US, Japan, and Europe, and that: “The findings by the various investigators were published in scientific journals and distributed internationally.” Other studies supporting the placental transfer of a number of drugs were also available to the manufacturer of thalidomide.[46-51] It had been shown that any chemical with a molecular weight less than 1000 was at least a candidate for crossing the placenta. Page wrote in 1957 that chemicals with a molecular weight below 350 or 450 Da were known to be capable of crossing the placenta. (Also see Marx.) The molecular weight of thalidomide is 258 Da and entering the fetal circulation should have been considered a strong possibility based on the knowledge of the time.[47, 48]
We conclude that the scientific community was clearly aware of the fact that chemicals could cross the placenta before thalidomide was developed and marketed. With the passage of time, more has been learned about the mechanisms of placental transfer. Chemicals can pass via simple diffusion, pumps, plasma membrane carriers, and biotransforming enzymes, but the general principle was appreciated well before the 1950s.
Scientists did test thalidomide on pregnant animals
Testing for teratogenicity was common practice in the pre-thalidomide era. Review articles had been published in the 1950s and early 1960s tracing the history of teratology testing.[54-58] The first use of animals for this purpose dates back to 1891, when Dareste experimented with chick embryos and induced congenital defects. Scientists had studied the effects of radiation, nutrients, hormones, and eventually chemicals including specific drugs. Wilson reported that such studies included radiation of cats, guinea pigs, and rabbits. Gudernatsch and Bagg reported on the effects of radium in rats. Hanson also studied radiation in rats while DeNobele and Lams studied guinea pigs. In 1936, Job et al. studied the quantitative relationship between teratogens and the resulting defects. Hoskins studied the teratogenic effects of nitrogen mustard and Gilman et al. the effects of trypan blue. Many others [58, 69-72] continued this research using various mammals. Wilson went so far as to suggest that publications regarding teratogens and teratology in general did not even increase because of thalidomide.
Nobel laureate chemist Roald Hoffmann observed:
Indeed animal testing for teratogenicity of new drugs was routine in the major pharmaceutical companies. Hoffmann-LaRoche’s Roche Laboratories published a major reproductive-system study of its Librium in 1959. Wallace Laboratories did so for Miltown in 1954. Both incidents antedate the thalidomide story.
The Sunday Times of London, on June 27, 1976 published the results of their very extensive investigation into the thalidomide disaster and stated that, by 1958, teratogenicity testing was routine. Moreover, according to a German medical journal of the era, toxicity tests on animals were conducted prior to thalidomide’s release.
Because all the records from Grünenthal were destroyed, we will never know exactly what testing was performed with thalidomide. However, we do know that teratogenicity testing on animals was common at that time. This leads us to the critical question: Would more animal testing have prevented the thalidomide disaster?
More animal testing would not have prevented the disaster
Regardless of what tests Grünenthal conducted, two questions naturally arise. First, what would thalidomide’s teratogenicity profile have looked like with more animal testing? Second, what would thalidomide’s teratogenicity profile look like with our current test platforms?
Thalidomide is a complex molecule.[76-78] It exists as S(-) and R(+) isomers, each with a different effect profile, a common trait with anesthetic-like drugs. The S(-) form is thought to be the teratogen. The toxicity of a drug may be determined, in part, by its metabolites. Thalidomide breaks down into a number of metabolites and therefore the effects of each metabolite’s interaction with the cell may vary considerably. Lu et al. discovered major interspecies differences in the hepatic metabolism of thalidomide and found that the plasma half-life of thalidomide was shorter in mice than in humans. Rabbits had half-lives between those of humans and mice. Lu et al. found no hydroxylated thalidomide metabolites in humans despite the fact that these are formed in a high percentage in some species, noting that: “The amount of 5-hydroxythalidomide formed was high in mice, lower in rabbits, and barely detectable in humans.” These differences in metabolism may have implications for species variability with regard to toxicity. Ando et al. also compared the ability of five different species to form the thalidomide metabolites 5-hydroxythalidomide and cis-5-hydroxythalidomide. They found that humans had the lowest activity in hepatic microsomes, the site of metabolism for these pathways.
There are other differences. Chung et al. draw our attention to the following observations:
Plasma concentration-time profiles for the individual [human] patients [treated with thalidomide] were very similar to each other, but widely different pharmacokinetic properties were found between patients compared with those in mice or rabbits [also given thalidomide]. Area under the concentration curve values for mice, rabbits, and multiple myeloma patients were 4, 8, and 81 µmol/L·hour, respectively, and corresponding elimination half-lives were 0.5, 2.2, and 7.3 hours, respectively.
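To give a sense of scale for the half-lives quoted above, the following minimal sketch computes the fraction of a dose still present after a given time. It assumes simple first-order, single-compartment elimination, which is an assumption of this illustration and not something stated by Chung et al.; the function name `fraction_remaining` and the 4-hour time point are likewise hypothetical choices.

```python
# Illustrative sketch only. Assumes first-order (exponential) elimination,
# which is a simplification introduced here and not a claim from the source.

# Elimination half-lives in hours, as quoted from Chung et al.
HALF_LIFE_HOURS = {
    "mouse": 0.5,
    "rabbit": 2.2,
    "human (multiple myeloma patients)": 7.3,
}

# Area-under-the-curve values as quoted (units as reported in the quotation).
AUC = {
    "mouse": 4,
    "rabbit": 8,
    "human (multiple myeloma patients)": 81,
}


def fraction_remaining(hours_elapsed: float, half_life_hours: float) -> float:
    """Fraction of the initial amount left after hours_elapsed."""
    return 0.5 ** (hours_elapsed / half_life_hours)


if __name__ == "__main__":
    hours = 4.0  # hypothetical time point chosen for illustration
    for species, t_half in HALF_LIFE_HOURS.items():
        left = fraction_remaining(hours, t_half)
        print(f"{species}: {left:.1%} remaining after {hours:g} h")

    human = "human (multiple myeloma patients)"
    print(f"human/mouse AUC ratio: {AUC[human] / AUC['mouse']:.2f}")
```

Under this simplified model, the same elapsed time leaves very different residual amounts in each species, which is one way to read the "widely different pharmacokinetic properties" reported in the quotation.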
Thalidomide causes numerous birth defects in humans including microphthalmia (small eyes), coloboma (a hole in a part of the eye), other abnormalities of the eyes, and abnormalities of the ears, internal organs, and genitals.[82-86] However, the congenital anomaly for which it is most infamous is phocomelia. Phocomelia comes from the Greek φώκη meaning seal and μέλος meaning limb: seal limbs. It was coined by Étienne Geoffroy Saint-Hilaire in 1836. Today, it is synonymous with the thalidomide-induced complete absence of limbs or the presence of very abbreviated limbs. In order for the human fetus to suffer from phocomelia, thalidomide must be administered between days 39 and 45 post-menstruation. Other birth defects can be seen if thalidomide is administered from day 34 to day 50 post-menstruation.[87, 88]
In the mid-1960s, Karnofsky stated what has subsequently been referred to as Karnofsky’s law:
Any drug administered at the proper dosage, and at the proper stage of development to embryos of the proper species-and these include both vertebrates and invertebrates-will be effective in causing disturbances in embryonic development.
In other words, all medications are teratogenic in at least one species, if given in a large enough dose and at the proper time in development. To illustrate the sensitivity level for some species to some chemicals, even sodium chloride (common table salt) and water are teratogens in some species when they are administered in the required dose at the right time.[91-93] An immense amount of animal testing has proven Karnofsky correct. Therefore, some form of congenital abnormality can be found in one or more animal species for essentially every drug.
The significance of this observation is further complicated by the fact that even drugs that are routinely given to women in pregnancy may occasionally be associated with a birth defect in an individual woman (an idiosyncratic response). The only way to assure that women will not give birth to a baby with birth defects secondary to taking drugs during pregnancy is to avoid administering or prescribing drugs to women who may be pregnant. There is no foolproof testing scheme, even today, for determining whether a specific woman will react to a specific drug by giving birth to an infant with congenital malformations. The drugs that are prescribed to pregnant women are given based on a long history of safety, epidemiological data relating the same, or based on chemistry that precludes placental transfer, not because of animal testing. This is why two-thirds of new drugs in the US are labelled: “Category C: Either studies in animals have revealed adverse effects on the fetus (teratogenic or embryocidal or other) and there are no controlled studies in women, or studies in women and animals are not available. Drugs should be given only if the potential benefit justifies the potential risk to the fetus.” (The FDA has a classification system for drugs to ostensibly better inform physicians and patients regarding the risks of birth defects from drugs. The FDA has classified all drugs into the categories A, B, C, D and X. Drugs that have been studied in humans and determined to be completely safe are classified as Category A. Drugs known to be teratogenic are classified as Category X.) In light of Karnofsky’s law, it is not surprising that most drugs are classified as Category C.
Many species have been studied in an attempt to discover the mechanism for thalidomide-induced phocomelia. These studies have been frustrating for many reasons. First, thalidomide does not cause phocomelia in all species or even most. Second, thalidomide is metabolized to possibly more than 100 metabolites.[77, 96] Third, an abundance of factors can lead to phocomelia. Over thirty hypotheses have been offered to explain thalidomide’s developmental effects. Therapontos et al. recently provided evidence that, at least in some animals, angiogenesis inhibition is responsible for the limb defects seen from thalidomide. Thalidomide acts as an anti-inflammatory agent and as an angiogenesis inhibitor, either of which can account for the changes in limb formation associated with its use. Changes in the PTEN/Akt signaling system have also been proposed. Thalidomide is known to change gene expression profiles.[78, 97, 100-102] Parman et al. demonstrated that oxidative stress can cause teratogenicity.
In addition to different experimental methods yielding different results, different species have also responded differently to thalidomide. Mice do not exhibit classic thalidomide toxicity even at 4,000 mg/kg.[19, 104] Thalidomide causes congenital anomalies in humans at 0.5 mg/kg. It was prescribed in 50 mg tablets at 50–200 mg/day. Thalidomide has been extensively tested and shown not to cause human-like limb deformities in rodents [106-113] or hamsters. According to Homburger et al.: “. . . rats predominantly showed increase numbers of fetal resorptions and only rarely some fetal malformation . . . random-bred hamsters also failed to show fetal malformations after thalidomide ingestion by the mothers . . .” Other studies also revealed an increase in fetal resorption. Newman et al. state: “Although evidence of at least some embryotoxicity was observed in most studies using oral doses =100 mg/kg/day, TH [thalidomide] exposure in the rat does not result in a typical or specific teratogenic response.” Somers also showed that rats failed to predict the teratogenic effects of thalidomide. (Also see .)
However, when rodent cells are exposed to thalidomide in aortic ring or mouse cornea models, angiogenesis is inhibited.[98, 119, 120] Thus, one could infer that development might be affected. The lack of angiogenesis inhibition activity in intact rats and mice may be because thalidomide is metabolized differently in rodents than in primates [105, 121], or because of some other characteristic of a complex system. Other mechanisms, such as receptor differences and placental barriers, may also be involved. Carney et al. noted that the dose of the chemical could also vary, causing different results in teratogen testing.
Obviously most of the aforementioned studies were conducted after thalidomide’s human side effects, along with interpretive ambiguities, were known. We remind the reader that we are including such recent studies for the purpose of examining thalidomide’s toxicity profile under today’s conditions, where ambiguities in human and interspecific response are better understood than they were in the 1950s, addressing the question of what the results would be if thalidomide were tested with current knowledge.
Historically, writers and scientists, for example Ringach in 2010, have cited various combinations of studies, including those by McColl, Dipaolo and others [107, 124-126] in support of their position that rats suffered from human-like defects when thalidomide was administered chronically. This is, in fact, not what these studies revealed and hence is worthy of further consideration over the next several paragraphs. In the McColl study, the authors found skeletal abnormalities but not phocomelia. A later study by the same authors revealed no long bone abnormalities, such as occur with thalidomide. Other authors [111, 128] noted skeletal abnormalities but not long bone defects, along with an increase in resorption rate. Numerous drugs can cause skeletal abnormalities, and because these defects bear little to no resemblance to the specific defects caused by thalidomide, they probably arise through a different mechanism. In any event, considering that thalidomide caused so many congenital defects, the presence of any one could also be expected from virtually any drug if given at the right time in development to the right species at the right dose. Therefore, there are clear limitations to the use of animal studies as models predictive of human responses. In view of these observations, McColl et al. ended their discussion by stating: “The factors influencing the response of an embryo to a drug interact in such complicated ways that one cannot predict whether a drug will be teratogenic in man from results obtained in animals.”
The misinterpretation of the 1962 Pliess article may be because Pliess, from the University of Hamburg, began by stating that he conducted necropsies when today, among native English-speakers, we would refer to what he did as autopsies. The word necropsy has historically been used to refer to the examination of dead humans or animals but today almost exclusively refers to the examination of animals. Thus if one only read that Pliess conducted necropsies and then noted that the results included phocomelia, one could mistakenly believe he was referring to rats. Pliess goes on to describe the testing of thalidomide on rats and observes that: “the [rat] embryos were not harmed by large doses of thalidomide. None showed malformations.”
In 1962, Seller studied rats, mice and rabbits and observed no abnormalities in the rats, mice, and silver-grey rabbits. This led Seller to state:
. . .negative results in this respect may prove to be no indication that the drugs are safe for human use . . . Consequently the most satisfactory method at the present time, and in the light of the thalidomide experience, of dealing with drugs with an unknown effect in the pregnant woman, would appear to be not to administer them except if absolutely life-saving.
Since thalidomide, Seller’s advice, which is consistent with the basic science behind Karnofsky’s law, has been the standard of care for human medical practice.
King and Kendrick studied rat fetuses after administering thalidomide at various doses and stages during development over a 2½ year period. They noted numerous gross malformations, occurring in 14.6% of the fetuses, and an increase in resorption but no phocomelia. Dipaolo discussed various congenital anomalies in mouse fetuses from mothers orally administered 31 or 62 mg/kg of thalidomide daily for 4-5 days, but once again the main effect was an increased rate of resorption. Abnormalities of the bone, including phocomelia, were reported occasionally in the mice, but subsequent investigators have not been able to confirm this finding. Blake et al. even showed that the metabolite thought to cause congenital malformations (an arene oxide) was absent in the rat, casting more doubt on the few instances of long bone abnormalities in rat fetuses.
McColl et al. summarize the research with rats by stating in 1965: “The results are interpreted to cast doubt upon the reliability of the use of the rat as an adequate laboratory test for possible teratogenic agents in man.” Hau acknowledged in 2003:
Thalidomide, which crippled 10,000 children, does not cause birth defects in rats or many other species but does so in primates. A close phylogenetical relationship or anatomical similarity is not a guarantee of identical biochemical mechanisms and parallel physiological response, although such is the case in many instances.
It has been argued that testing rats for teratogenicity is appropriate, as every drug shown to cause birth defects in humans has also caused birth defects of some kind in rats. The problem with relying on rats as predictors for humans is that a high sensitivity does not imply a high positive predictive value. Sensitivity is the true positive rate of a test: the probability that the test flags a case that is in fact an example of what it is looking for, for example a drug that causes birth defects. (See tables 1 and 2.) The positive predictive value (PPV) is the proportion of the population that tested positive that actually exhibits the trait being tested for.
|T+ = Test positive|
|T- = Test negative|
|T = True|
|F = False|
|P = Positive|
|N = Negative|
|GS+ = Gold standard positive|
|GS- = Gold standard negative|
In our discussion, the gold standard would be the human response and the test would be the animal model.
|Sensitivity = TP/(TP+FN)|
|Specificity = TN/(FP+TN)|
|Positive Predictive Value = TP/(TP+FP)|
|Negative Predictive Value = TN/(FN+TN)|
Calculations for sensitivity, specificity, positive predictive value, and negative predictive value.
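The four calculations above can be sketched in a few lines of code. This is a minimal illustration, not a validated statistical tool: the class name `ConfusionCounts`, the function names, and especially the example counts are hypothetical and were chosen only to show that a test can have a high sensitivity while both predictive values remain low.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: test result versus gold standard."""
    tp: int  # test positive, gold standard positive
    fp: int  # test positive, gold standard negative
    fn: int  # test negative, gold standard positive
    tn: int  # test negative, gold standard negative


def sensitivity(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return c.tn / (c.fp + c.tn)


def positive_predictive_value(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fp)


def negative_predictive_value(c: ConfusionCounts) -> float:
    return c.tn / (c.fn + c.tn)


# Hypothetical counts for illustration only; they are NOT data from any study.
example = ConfusionCounts(tp=95, fp=895, fn=5, tn=5)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.3f}")
    print(f"specificity = {specificity(example):.3f}")
    print(f"PPV         = {positive_predictive_value(example):.3f}")
    print(f"NPV         = {negative_predictive_value(example):.3f}")
```

With these invented counts the test catches almost every true positive, yet most of its positive results are false and its negative results are no better than a coin toss, because the test calls nearly everything positive.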
Neither does a high sensitivity imply a high negative predictive value (NPV). NPV is the proportion of the population that tested negative that actually lacks the trait being tested for. In other words, when the test reports the drug does not cause birth defects, it really does not. Rats also suffer birth defects from roughly 95% of drugs that do not cause birth defects in humans. Rats therefore have a very low PPV, and if society were to ban every drug that produced a birth defect in rats, we would market no new pharmaceuticals. Shepard and Lemire pointed out in 2004 that about 1500 agents were known, at that time, to be capable of producing birth defects in individuals of some species. However, only around 40 were known human teratogens.
Prediction in science does not mean the same thing it means in lay society. In society, one can claim to have predicted the outcome of the ballgame. In science, in order for a test, a modality, or a practice in general to be considered predictive it must demonstrate results with a high PPV and NPV and this must be based on testing the practice many, many times. (See [1, 2] for more on this.) Thus the predictive value (in the scientific sense of the word) of animals in general and rats specifically for teratogenicity is so low it can be considered nonexistent.
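The Shepard and Lemire figures quoted above allow a rough, best-case bound on PPV. The sketch below assumes that every known human teratogen is among the agents that were positive in some animal species, and it treats agents not known to be human teratogens as negatives; both are simplifying assumptions of this illustration, and the function name `ppv_upper_bound` is hypothetical.

```python
# Approximate figures as quoted in the text ("about 1500", "around 40").
AGENTS_TERATOGENIC_IN_SOME_SPECIES = 1500
KNOWN_HUMAN_TERATOGENS = 40


def ppv_upper_bound(human_teratogens: int, animal_positives: int) -> float:
    """Best-case PPV of the rule 'positive in some animal species'.

    Assumes (for illustration) that all known human teratogens are counted
    among the animal positives, so true positives cannot exceed either number.
    """
    true_positives = min(human_teratogens, animal_positives)
    return true_positives / animal_positives


if __name__ == "__main__":
    bound = ppv_upper_bound(KNOWN_HUMAN_TERATOGENS,
                            AGENTS_TERATOGENIC_IN_SOME_SPECIES)
    print(f"Best-case PPV: {bound:.1%}")
```

Even under these generous assumptions, fewer than 3 in 100 agents that are positive in some animal species would be known human teratogens.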
But what about our closest evolutionary cousins, the nonhuman primates? With the exception of the nonhuman primate known as the Greater Bushbaby (Galago crassicaudatus), essentially all nonhuman primates manifested phocomelia in response to thalidomide.[131-138] However, what are we to conclude about the ability of nonhuman primates (NHPs) to predict human response to potential teratogens in general? Was thalidomide the exception or the rule? Researchers, based on the thalidomide experience, had high expectations for NHPs as an animal model for teratogenicity. Unfortunately the expectation was thwarted as the predictive value of NHPs turned out to be suboptimal. Schardein states that the results of using NHPs for teratogenicity testing have been “disappointing.” In part, this is because, of 15 chemicals known to be teratogenic in humans, only 8 were teratogenic in one or more species of NHP. Tests with other chemicals gave equally divergent results. Aspirin, which is routinely and safely used by pregnant women, showed variable effects in rhesus macaque monkeys. Aminopterin, a folate inhibitor used as a cancer drug, is teratogenic in humans but two studies in monkeys were negative. Testing of 5-fluorouracil gave conflicting results and trypan blue gave consistently negative results despite being regarded as a universal teratogen. Rosenbauer stated that because of important differences between humans and other primates, NHPs would never be able to predict teratogenicity in humans. Further research would substantiate this view. Drugs known to damage the human fetus are found to be safe in 70% of cases when tried on primates. Even testing thalidomide on NHPs is problematic. Hendrickx et al. found that low doses of thalidomide at the time of conception had varying effects on NHPs. It did prevent implantation in rhesus macaques, but not in baboons. However, thalidomide did result in fetal deformities in the baboon.
Monkeys were affected at 10 times the normal dose [142-144] but not at the usual human dose.[117, 132] Clearly, NHPs do not provide a high enough positive predictive value or negative predictive value upon which to base prescribing or marketing decisions.
The White New Zealand rabbit is the most cited example of replicating human teratogenicity for thalidomide. But even the White New Zealand rabbit demonstrated phocomelia only at a dose between 25 and 300 times that given to humans.[111, 112, 117, 132, 143, 145] Schardein, in 1975, stated regarding thalidomide:
In approximately 10 strains of rats, 15 strains of mice, 11 breeds of rabbits, 2 breeds of dogs, 3 strains of hamsters, 8 species of primates and in other such varied species as cats, armadillos, guinea pigs, swine and ferrets in which thalidomide has been tested, teratogenic effects have been induced only occasionally.
Manson and Wise summarized the thalidomide testing by stating that the “mouse and rat were resistant” and that rabbits and hamsters and even different strains of the same species gave variable responses.
An FDA study by Frankos examined the sensitivity and specificity of animal models for teratogenicity for a number of drugs. Of the 38 drugs known to be teratogens in humans, only one failed to show such activity in at least one animal species. Two or more species showed a positive response to 80% of the drugs. The mouse reacted as humans did 85% of the time, with rats, rabbits, hamsters, and monkeys reacting the same 80%, 60%, 45%, and 30% of the time, respectively. Specificity was even less impressive. Animals were tested with 165 chemicals known not to cause congenital abnormalities or known to have a long track record of safety. Only 29% were negative in all species, with 51% negative in multiple species. False positives were found for 41% of chemicals in more than one species. It should also be pointed out that these figures did not compare strains, and we would expect differences among various strains. Furthermore, the raw numbers lead to positive and negative predictive values that would not be useful in making treatment decisions for humans. Over 1000 chemicals have been shown to be teratogenic in one or more animal species without any evidence of teratogenicity in humans. This illustrates why one cannot rely on sensitivity alone when making decisions. Many have bemoaned the lack of predictivity (the lack of a high PPV and NPV) of animal models.[148-153] Carney et al. stated that while teratogenicity testing relies on animals, no species can actually predict human response.
Taussig, writing in 1962, provides an interesting note regarding Grünenthal’s after-the-fact testing:
Grunenthal has tried to reproduce phocomelia in rats, mice, and rabbits and has failed. In Keil the drug was fed to hens and the chicks were normal. Grunenthal has shown that the drug passes through the placenta of rabbits but in their experience the offspring were normal.
Recent work has revealed the chick embryo to be sensitive to thalidomide despite early reports that it was not, showing yet again that even the same species under different testing conditions will yield different results. Stephens perhaps unintentionally summarized the mindset of some who use animals in teratology research today when he stated:
The historic irony is that even though it has been clearly demonstrated that thalidomide does not cause limb defects in mice and rats, these animals are still being used to examine mechanisms of thalidomide’s teratogenic action.
Lasagna commented that once a chemical is known to cause birth defects in humans, an animal species or strain can usually be found that will replicate the response, but that this is not the same as prospectively predicting this response. As we have noted, neither is the response the same across species, which once again raises the question, when different species demonstrate different outcomes: which species resembles humans in this particular case? Lasagna notes, for example, that cortisone is a dysmorphogen in rabbits and mice but not rats and that “Carbutamide produces malformations in the eyes of rats and mice, but facial and visceral malformations in rabbits.” Lasagna also states that the practice of testing medications on animals gives a “pathetic illusion [that] simply doing enough animal testing will predict all human toxicity.” Similarly, Smithells states that “The extensive animal reproductive studies to which all new drugs are now subjected are more in . . . the nature of a public relations exercise than a serious contribution to drug safety.” Teeling-Smith perhaps summed it up best in 1980, stating:
There is at present no hard evidence to show the value of more extensive and more prolonged laboratory testing as a method of reducing eventual risk in human patients. In other words the predictive value of studies carried out in animals is uncertain. The statutory bodies such as the Committee on Safety of Medicines [Britain’s counterpart to the FDA] that require these tests do so largely as an act of faith rather than on hard scientific grounds. With thalidomide, for example, it is only possible to produce specific deformities in a very small number of species of animal. In this particular case, therefore, it is unlikely that specific tests in pregnant animals would have given the necessary warning: the right species would probably never have been used.
As noted above, some have opined that the United States government did not approve thalidomide because animal tests had raised suspicions about the drug. Reality tells a different story. Frances Kelsey, a medical officer at the FDA, stated that the decision not to approve thalidomide was based on reports of peripheral neuritis (numb and tingling fingers) in adult humans. Animal tests had nothing to do with the decision. Kelsey noticed a case report in the British Medical Journal that linked thalidomide to peripheral neuropathy. She wrote to Richardson-Merrell, which was trying to obtain a license to market the drug, expressing concern that the company had known about these cases but had failed to include them in its report (this turned out to be true), and stated that it would now have to prove that the link between thalidomide and peripheral neuropathy was false if it wanted to gain a license for thalidomide in the US. Interestingly, Kelsey had also aided in repudiating the notion that the placenta was impermeable to all drugs. In the 1930s, she and her future husband conducted research on quinine using rabbits. They found that adults easily metabolized the drug, but that fetuses did not metabolize any of it. They showed that quinine crossed the placenta and that the mother and fetus did not necessarily metabolize all chemicals in the same way.
Very soon after the disaster was noted, the deleterious effects of thalidomide appeared to vary among species and strains.[162-164] Testing numerous species would have produced a hodgepodge of results that would have been uninterpretable to the scientists of the 1950s, just as the results from animal testing for teratogenicity are uninterpretable to scientists today. Red flags are not raised because a drug-to-be is teratogenic in an animal species. This is the norm and is part of the reason why most new drugs are labeled Category C. Animal tests did not and could not have prevented the thalidomide disaster and, in fact, delayed the acknowledgment of its severe side effects. When an Australian physician observed the link between thalidomide and birth defects, he sounded an alarm that was largely ignored because of messages from Chemie Grünenthal asserting that even if thalidomide crossed the placenta, it would be unlikely to cause harm. We do not believe this was sheer mendacity on Grünenthal’s part, but rather that it was based on the results from their animal testing.
The notion that more animal testing would have prevented the thalidomide tragedy stems not from scientific literature but from Senator Estes Kefauver of Tennessee, who stated that “if thalidomide had been tested in rabbits . . . this whole catastrophe would have been avoided.” This was not the case because, as the famous pediatric cardiologist Helen Taussig pointed out, the data would have been very messy:
I have not been able to obtain abnormalities in baby rabbits with thalidomide primarily because the massive doses I have used bring on so many abortions…If thalidomide had been developed in this country, I am convinced that it would easily have found wide distribution before the terrible power to cause deformity had been apparent.
Further, even if the data had not been “messy,” if society were to judge every drug on the basis of its effects on any one given species, no new drugs would reach the market, per Karnofsky. Even the results from multiple species do not predict human response.
Interestingly, animals did not predict the sedative qualities of thalidomide. Taussig wrote in JAMA in 1962 that when thalidomide was first considered for development and tested on animals, it appeared to have no effects whatsoever. Human testing revealed its sedative properties. Taussig continues:
Thalidomide is a synthetic drug which, as the story is told in West Germany, was first conceived and made by Ciba and found by them to have no effect on animals; therefore, it was discarded. In 1958 Grunenthal developed the drug and tried it on animals; they, too, found it had no effect on animals. Thereupon it occurred to the inventors that it might be useful in epilepsy and was marketed as an anticonvulsant drug. It was soon found to be worthless for epilepsy but it caused sleep. Thereafter, it was sold as a sleeping tablet, a sedative, and tranquilizer. It had a prompt action, gave a natural deep sleep and no hangover. It appeared innocent and safe . . . As previously mentioned, thalidomide does not induce sleep in the usual laboratory animals.
Ideally, knowing what we know now, how would pharmaceutical companies test thalidomide or any other drug to assure safety in pregnant women? The short answer is that such testing is currently impossible. According to Lo and Friedman, more than 90% of all the drugs approved since 1980 have not been adequately tested for teratogenicity. The risk from such drugs is listed as “undetermined.” Given the scattergram effect of thalidomide in terms of producing congenital abnormalities in different species and the fact that no combination of animals has since been shown to be predictive (high positive and negative predictive values) for teratogenicity in humans, we conclude that even if thalidomide were extensively tested today on animals using the best science currently available, it would still be labeled Category C. The reason society has not seen another thalidomide disaster lies not in the fact that all drugs are tested on animals but rather that new drugs are rarely given to pregnant women.
Currently, the only way to determine for certain whether a drug is a teratogen is through epidemiology. Better monitoring of medication intake could facilitate establishing a safety profile.
Current requirements for teratogenicity testing in animals are based on the mistaken notion that animal models can predict human response
As a result of the thalidomide disaster, the US Congress passed the Kefauver Harris Amendment in 1962. This amendment to the 1938 Federal Food, Drug, and Cosmetic Act mandates that new drugs be proven both safe and effective. (The 1938 Act was itself the result of another disaster: the deaths of over 100 patients from the toxin diethylene glycol, which had been combined with the new wonder drug sulfanilamide so that the drug could be given as a liquid.) Animal models were quickly incorporated for this purpose. That animal models were meant to be used to predict human response, and thus prevent another tragedy, is made clear by the statement from Senator Estes Kefauver quoted above. The FDA did in fact interpret and enforce Kefauver Harris in that way and continues to do so to this day.
The notion that animal models can be used to predict human response to drugs (teratogenicity, for example) and to disease persists to this day, despite empirical and theoretical evidence to the contrary.[1, 2] Scientists routinely state that animal models can predict human response to drugs and disease, based in part on the thalidomide incident. As an example, we expand on the quotation from Gad given above:
Biomedical sciences’ use of animals as models [is to] help understand and predict responses in humans, in toxicology and pharmacology . . . by and large animals have worked exceptionally well as predictive models for humans . . . Animals have been used as models for centuries to predict what chemicals and environmental factors would do to humans…. The use of animals as predictors of potential ill effects has grown since that time . . . If we correctly identify toxic agents (using animals and other predictive model systems) in advance of a product or agent being introduced into the marketplace or environment, generally it will not be introduced . . . The use of thalidomide, a sedative-hypnotic agent, led to some 10,000 deformed children being born in Europe. This in turn led directly to the 1962 revision of the Food, Drug and Cosmetic Act, requiring more stringent testing. Current testing procedures (or even those at the time in the United States, where the drug was never approved for human use) would have identified the hazard and prevented this tragedy. (Emphasis added)
Considering that such statements are commonplace [130, 167-171], let us briefly evaluate animal uses in toxicity testing.
In 1978, Fletcher reported drug toxicity tests and subsequent clinical experience with forty-five major new drugs. Some toxic effects were seen only in animals, while others were observed only in humans. The survey established that 25% of toxic effects observed in animals might be expected to occur as adverse reactions in humans. In one small series in 1990 in which toxicity in clinical trials led to the termination of drug development, it was found that in 16/24 (67%) cases the toxicity was not foreseen in animals. Suter reported the results of six drugs tested in humans and animals. Animals and humans shared 22 side effects (true positives), animals incorrectly identified 48 side effects that did not in fact occur in humans (false positives), and the animals missed 20 side effects that did occur in humans (false negatives). This results in a sensitivity for the animal tests of 22/(22+20) = 52% and a positive predictive value of 22/(22+48) = 31%.
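The arithmetic behind the Suter figures can be reproduced in a few lines. The following is a minimal sketch in Python; the class and function names are our own illustrative choices, and only the three counts (22, 48, and 20) come from the text. Because no true-negative count is reported, specificity and negative predictive value cannot be computed from these data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SideEffectCounts:
    """Counts of side effects, classified by where they were observed.

    The field names are illustrative; they are not taken from Suter.
    """
    true_positives: int   # seen in both animals and humans
    false_positives: int  # seen in animals but not in humans
    false_negatives: int  # seen in humans but missed in animals


def sensitivity(c: SideEffectCounts) -> float:
    """Fraction of human side effects that the animal tests detected."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def positive_predictive_value(c: SideEffectCounts) -> float:
    """Fraction of animal-identified side effects that occurred in humans."""
    return c.true_positives / (c.true_positives + c.false_positives)


# Counts as reported in the text for the six drugs.
suter = SideEffectCounts(true_positives=22, false_positives=48, false_negatives=20)

print(f"sensitivity = {sensitivity(suter):.0%}")             # 52%
print(f"PPV         = {positive_predictive_value(suter):.0%}")  # 31%
```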
The LD50, a test that determines the dose necessary to kill 50% of the animals to which it is administered, has historically been criticized for its lack of predictive ability. Recently, Dawson et al. studied suicide attempts and suicides involving pesticides in Sri Lanka and found that the WHO toxicity ranking, based on LD50 in rats, did not correlate with toxicity in humans. Paraquat, for example, was far more lethal than would have been anticipated from the LD50.[175, 176]
Dixit and Boelsterli state: “Traditional animal toxicology tests predict in the range of less than 10% to ~70% of all human adverse effects.” Even if we assume the higher number, missing adverse effects 30% of the time is unacceptable in medicine, and any test that gives the correct answer only 70% of the time will yield a PPV and NPV that make the test of little use in medicine. Heywood stated: “… the best guess for the correlation of adverse reactions in man and animal toxicity data is somewhere between 5 and 25%.”
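How a fixed accuracy translates into predictive values depends on how common the adverse effect is. The sketch below, in Python, applies Bayes' theorem under two assumptions that are ours and not the cited authors': the 70% figure is treated as both the sensitivity and the specificity of the test, and the prevalence values are purely hypothetical. The function name is likewise illustrative.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test via Bayes' theorem.

    All three inputs are proportions strictly between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumption: 70% stands for both sensitivity and specificity.
ASSUMED_ACCURACY = 0.70

# Hypothetical prevalences of a true human adverse effect.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(ASSUMED_ACCURACY, ASSUMED_ACCURACY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumptions, the PPV falls as the assumed prevalence falls, so a positive animal result for a rare adverse effect would be wrong far more often than it is right.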
The above is directly tied to the fact that the attrition rate for compounds in drug development is unacceptably high, and the pharmaceutical industry is well aware of this. Over 99.8% of chemicals that come out of the lead optimization process fail to make it to the market. Even drugs that enter Phase I human clinical trials have only an 8% chance of reaching the market. In large part, this is because animal models cannot predict human response in terms of efficacy or toxicity.[15, 181-189] In 2006, then-U.S. Secretary of Health and Human Services Mike Leavitt stated: “Currently, nine out of ten experimental drugs fail in clinical studies because we cannot accurately predict how they will behave in people based on laboratory and animal studies.” A 2011 editorial in Nature Reviews Drug Discovery echoed this, stating: “Unpredicted drug toxicities remain a leading cause of attrition in clinical trials and are a major complication of drug therapy.”
Ultimately, toxicity and other data for drug development must come from humans. Littman and Williams of Pfizer, writing in 2005 about using humans as models for other humans, stated:
Experimental medicine is the use of innovative measurements, models and designs in studying human subjects for establishing proof of mechanism and concept of new drugs, for exploring the potential for market differentiation for successful drug candidates, and for efficiently terminating the development of unsuccessful ones. Humans are the ultimate ‘model’ because of the uncertain validity and efficacy of novel targets and drug candidates that emerge from genomics, combinatorial chemistry and high-throughput screening and the use of poorly predictive preclinical models…Experimental medicine in contemporary drug development is a business strategy that relies on experiments in humans for the purpose of demonstrating the mechanistic activity of new drugs at safe doses (exposures) and linking that activity to efficacy in patients. This strategy is very much a part of the ‘learning’ phase of early drug development that helps weed out drug failures early and precedes the ‘confirmation’ late phases of development in which high costs demand higher levels of success . . . In the new paradigm, studies in humans increase confidence in the relevance of novel drug targets and largely replace the animal efficacy models that are often poorly predictive of the efficacy of novel agents with unprecedented mechanisms of action (see below)…Until or unless a predictive preclinical model can represent each of these subtypes, humans will remain the ‘ultimate model organism.’
The editorial in Nature Reviews Drug Discovery that accompanied the article echoed this, stating:
Clearly, one part of the problem [of drug research] is poorly predictive animal models, particularly for some disease areas and drug classes with a novel mechanism of action, a topic we continue to cover in our ongoing 'Model Organisms' series. But arguably the best 'models' for drug discovery are human subjects and as the need to have proof of concept or mechanism for a drug before moving on to larger, more costly clinical trials has never been greater, more big-pharma companies are now embarking on programmes in experimental or translational medicine.
There is another side to the lack of predictive ability of animal toxicity tests. Scientists have expressed concern that useful drugs, including treatments for cancer, have been lost to society because of adverse reactions in animals. Many commonly used drugs, including furosemide [194, 195], isoniazid [196, 197], phenobarbital, digoxin [199, 200], acetaminophen, tacrolimus [202, 203], tamoxifen, and metronidazole, are highly toxic to some animal species.
Drug toxicity is not the sole area where animal models do not predict human response. Testing animals for pharmacodynamic and pharmacokinetic properties is not predictive for humans. Mechanisms of disease and disease response also vary considerably. (See [1, 2].) This is to be expected based on our current understanding of living, evolved, complex systems.
Based on the above, we conclude the following:
The authors became interested in this topic as a result of reading The Dark Remedy and Thalidomide and the Power of the Drug Companies.
About the Authors
Ray Greek, MD is the president of the not-for-profit group Americans For Medical Advancement (www.AFMA-curedisease.org). Dr. Greek is the corresponding author and can be contacted at DrRayGreek@aol.com.
Niall Shanks, PhD (deceased) was the Curtis D. Gridley Distinguished Professor of History and Philosophy of Science at Wichita State University.
Mark Rice, MD is chief of the liver transplant division at the University of Florida, Department of Anesthesiology, has seven US patents, and reviews for several major journals.