The Journal of Philosophy, Science & Law

Manuscripts and Articles

Volume 6, July 1, 2006

The Presence of Hypotheses in the Scientific Literature

Norman A. Desbiens, M.D.*


* Department of Medicine, University of Tennessee, College of Medicine – Chattanooga Unit



The statement of a hypothesis is felt to be a crucial component of properly conducted research. (U.S.E.P.A. 2005) (A.A.A.S. 1990) Once a hypothesis has been conceived, a researcher can design a study that attempts to support or refute it and determine the level of precision needed to do so, after which the number of study units required can be determined. (Moher 1994, 122) Without a clear statement of the hypothesis, there is a risk that investigators will search for associations first and then discuss their “statistically significant” findings later, a process that has a high probability of inflating associations and leading to irreproducible results. (Cui 2002, 347) These are major issues, since validity and reproducibility of findings are undeniable hallmarks of the scientific method.

It has recently been demonstrated that the explicit statement of hypotheses is uncommon in the scientific medical literature. (Desbiens 2004, 319) An exception is the reporting of randomized controlled trials, where hypotheses are prevalent because prominent medical journals require sample size calculations, as recommended by several organizations including the CONsolidated Standards Of Reporting Trials (CONSORT) group. (Altman et al. 2001, 663)

Based on a previous study, experience with the literature, and the fact that guidelines for the reporting of the general scientific literature are less prevalent than for the scientific medical literature, we hypothesized that the explicit statement of hypotheses in the general scientific literature would occur in fewer than a fifth of articles. We did not think that journal type, length of report or field of science would affect our findings.



We reviewed all articles published in Science and Nature in 2004. These general scientific journals are among the most referenced and have the highest impact factors (in 2001, Science 23.33, Nature 27.96) in the scientific literature. (Garfield 1994) We concentrated on the Research Articles, Brevia, and Reports sections of Science and the Brief Communications, Articles and Letters sections of Nature. For purposes of analysis, we considered articles in the Articles and Research Articles sections to be full reports and those in other sections to be brief reports. Since each journal uses a different and fine-grained subject classification system (for example, Science uses over 30 categories), each article was assigned to one of the following five broad subject headings: physics or material science, biology, astronomy or geosciences, human biology studies and chemistry.

A research assistant reviewed each article electronically for the root “hypoth” and its inflections. The sentence containing this stem was read in order to determine its meaning. If it referred to the study hypothesis, it was considered to be an explicit statement. A 5 percent random sample of each abstracted field that required interpretation was independently reviewed by the author and percent agreement on abstracted data calculated.
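The electronic search described above can be illustrated with a minimal sketch. It is not the procedure actually used in the study, which is not described in further detail; the regular expressions, the naive sentence splitter and the function name `sentences_with_stem` are assumptions made for illustration. The sketch only retrieves candidate sentences; deciding whether a sentence refers to the study hypothesis remains a matter of human reading.

```python
import re

# Any word containing the stem "hypoth" (hypothesis, hypothesized, hypothetical, ...).
STEM = re.compile(r"\b\w*hypoth\w*", re.IGNORECASE)

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
# Real articles (abbreviations, citations) would need a more careful splitter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_with_stem(text):
    """Return the sentences of `text` that contain the stem, for manual review."""
    return [s.strip() for s in SENTENCE_END.split(text) if STEM.search(s)]


# Hypothetical article text, invented for this example.
sample = (
    "We hypothesized that the gene is required for growth. "
    "Cells were cultured for two days. "
    "These findings are consistent with the hypothesis that growth is impaired."
)
hits = sentences_with_stem(sample)
for sentence in hits:
    print(sentence)
```

In this invented sample the first retrieved sentence would count as an explicit statement, whereas the last would be excluded under the rule given later in the methods, because it mentions a hypothesis only after results are reported.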

We hypothesized that fewer than 15 percent of articles would use the root “hypoth.” This number would allow us to test reproducibly, in a logistic regression, the association between the presence or absence of an explicit hypothesis (the dependent variable) and journal, scientific subject and type of article (brief vs. long) as independent variables (6 degrees of freedom). (Harrell 2001, 60-61) The c-index was used to indicate the predictive ability of the model (0.5 = random, 1 = perfect prediction). Binomial confidence intervals were calculated for percentages. Variables with p-values of less than .05 were considered statistically significant.
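For readers unfamiliar with the c-index, the following sketch shows one common way to compute it for a binary outcome: the proportion of all (article with hypothesis, article without hypothesis) pairs in which the model assigns the higher predicted probability to the article with the hypothesis, with ties counted as one half. The function name `c_index` and the toy data are assumptions for illustration and are not taken from the study.

```python
def c_index(probabilities, outcomes):
    """Concordance index for binary outcomes (1 = event, 0 = no event).

    Compares every event with every non-event; a pair is concordant when the
    event received the higher predicted probability. Ties count as 0.5.
    """
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy predictions, invented for this example.
perfect = c_index([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
uninformative = c_index([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(perfect, uninformative)
```

A model that ranks every event above every non-event reaches the upper anchor of 1, and a model that cannot distinguish the two groups sits at the lower anchor of 0.5, which matches the scale given in the text.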

For each article we determined if there was an explicit statement of a hypothesis. In so doing, we eliminated hypotheses that were mentioned after results were reported (such as “…these findings are consistent with the hypothesis that…”). We next estimated the implicit statement of hypotheses. Hypotheses can be implicit if they are stated using a synonym for hypothesis or if a study is conducted to substantiate or corroborate a hypothesis that has been stated in a previous article. In order to identify implicit statements we sought the dictionary and thesaurus definitions of the word “hypothesis” from the Merriam-Webster Dictionary and repeated this process for possible synonyms until we had gone through 29 cycles and no new synonyms were identified. Candidate words or stems indicating possible implicit hypotheses are listed in the Appendix. A 5 percent random sample of the articles was then scanned for each synonym. If a synonym was found, the sentence containing the candidate synonym was read in order to determine its meaning. If it referred to the study hypothesis, it was considered to be an implicit statement.
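The repeated dictionary and thesaurus lookup described above amounts to expanding a seed word until no new synonyms appear. A minimal sketch of that logic follows. The function `expand_synonyms`, the `lookup` callback and the four-word toy thesaurus are hypothetical; in the study the lookups were done by hand, and the candidate words were still screened by reading.

```python
def expand_synonyms(seed, lookup, max_cycles=100):
    """Expand `seed` through repeated synonym lookups until nothing new appears.

    `lookup(word)` must return an iterable of synonyms for `word`.
    Returns the set of all words found and the number of lookup cycles run,
    counting the final cycle that produced no new words.
    """
    known = {seed}
    frontier = {seed}  # words found in the previous cycle, not yet looked up
    cycles = 0
    while frontier and cycles < max_cycles:
        cycles += 1
        found = set()
        for word in frontier:
            found.update(lookup(word))
        frontier = found - known  # keep only words not seen before
        known |= frontier
    return known, cycles


# Toy thesaurus, invented for this example.
toy_thesaurus = {
    "hypothesis": ["theory", "conjecture"],
    "theory": ["conjecture", "premise"],
    "conjecture": ["hypothesis"],
    "premise": [],
}
words, cycles = expand_synonyms("hypothesis", lambda w: toy_thesaurus.get(w, []))
print(sorted(words), cycles)
```

The `max_cycles` guard is a design choice of the sketch, not of the study; it simply prevents an unbounded loop if a lookup source keeps producing new words.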



Seventeen hundred ninety-nine articles were found. Of these, 963 (54%) were in Nature and 836 (46%) were in Science. The distribution of articles by field of study was as follows: biology 52%, astronomy or geosciences 17%, physics or material science 14%, human biology 13% and chemistry 5%; 11% were full reports.

In the random sample obtained by an independent abstractor, there was 100% agreement with the primary abstractor (95% confidence interval: 96%, 100%) on length of report, 97% (91%, 99%) on the presence of an explicit hypothesis, 91% (83%, 95%) on the presence of an implied hypothesis and 88% (79%, 93%) on article type (most of the disagreements were in studies where it was hard to determine whether molecules or cells were human or animal).

The stem “hypoth” was used in 534 of 1799 articles (30%). An explicit statement of a hypothesis was present in 174 articles (10%). From the sample, we calculated the ratio of implicit to explicit hypotheses to be 2.1:1. Using logistic regression analysis (c-index = .72), article type was not associated with the presence of explicit hypotheses, but journal and scientific field were. Compared to physics or material science articles, general (adjusted odds ratio; 95% confidence interval: 9.2; 3.4, 25.3) and human biology articles (10.4; 3.6, 30.3) had much greater odds of containing explicit hypotheses. Compared to articles in Science, those in Nature were less likely to contain explicit hypotheses (0.38; 0.27, 0.54).
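The two headline percentages can be recomputed from the counts reported above, together with a binomial confidence interval. The article does not say which interval method was used, so the Wilson score interval below is an assumption chosen for illustration, and the function name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = successes / n
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return centre - half_width, centre + half_width


TOTAL_ARTICLES = 1799
counts = {
    "any use of the stem": 534,             # reported in the text
    "explicit hypothesis statement": 174,   # reported in the text
}
for label, count in counts.items():
    low, high = wilson_interval(count, TOTAL_ARTICLES)
    print(f"{label}: {100 * count / TOTAL_ARTICLES:.1f}% "
          f"(95% CI {100 * low:.1f}%, {100 * high:.1f}%)")
```

With 1799 articles both intervals are narrow, so the rounded figures of 30% and 10% are stable summaries of the counts.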



This study indicates that researchers infrequently state hypotheses when they report science. Our findings are consistent with a recent estimate that the prevalence of hypotheses is about 25 percent in the wildlife research literature. (Guthery et al. 2004, 1325) To the student of scientific methodology, this low frequency might seem paradoxical given the central role of a hypothesis in the conduct of science. (Gauch 2003, 406-409) A notable exception is seen in the reporting of randomized controlled medical trials: although hypotheses are explicitly reported in only about half of them, the requirement of many journals to include sample size calculations in the reporting of these trials guarantees that hypotheses can be determined in 4 out of 5 of these studies. (Desbiens 2004, 319)

Why don’t scientists report their hypotheses more frequently? Some scientists may perceive that the first step in science is observation from nature and that deductive reasoning from these observations leads to hypotheses that are testable using inductive reasoning from observations. (Goodman 1999, 673) (Savitz 2001, 975)

This conceptualization could lead one to conclude that both observational studies that do not test hypotheses and studies that test hypotheses are suitable for publication in the medical literature and that less is methodologically required from the former types of studies. The International Conference on Harmonisation (USFDA, 1998) hints at this rationalization when it states that “Like all clinical trials, these exploratory studies should have clear and precise objectives. However, in contrast to confirmatory trials, their objectives may not always lead to simple tests of predefined hypotheses. In addition, exploratory trials may sometimes require a more flexible approach to design so that changes can be made in response to accumulating results. Their analysis may entail data exploration.” In contradistinction to this view, it has been strongly argued that even observation from nature must be based on hypotheses that drive the investigation. (Gauch 2003, 406-409) For example, why was an archeologist digging in a particular cave in Indonesia in the first place? Why was an astronomer looking at a particular star cluster? Why did a geologist study a particular rock formation?

The lack of a stated hypothesis at the beginning of a study calls into question how a scientist was able to determine the appropriate sample size for analyses. Even in observational studies, registering a hypothesis keeps one honest in the reporting of the chronology of research and helps avoid hindsight and other biases that plague the human intellect. (Tversky and Kahneman 1974, 1124) The potential for this type of problem is suggested in our study by articles that did not mention a hypothesis initially, but later discussed that findings were consistent with a hypothesis.

The folklore of science is replete with stories of serendipity (Fleming’s discovery of penicillin, Roentgen’s discovery of x-rays, etc.). However, basing scientific methodology on anecdotes is at best poor science and at worst folly. These anecdotes represent a numerator without a denominator and do not give an inkling of the relative frequency of serendipitous findings compared to those guided by hypotheses. If something is found completely by serendipity, then the “utmost good faith” (Senn 1997, 404) and objectivity of scientists require that they state the entire circumstances that surrounded their finding in their reports. These types of findings will require reproduction and refinement under the annealing fire of a hypothesis, as was done by Chain and Florey to further study penicillin. (Bowler and Morus 2005, 452) A sorites of hypotheses from general to more specific is crucial to the development of a scientific field.

It may be that scientists have inadequately studied epistemology and the scientific method. They have learned to write manuscripts in their fields from their mentors and the literature, but have they been adequately grounded in the general principles of the scientific method before beginning their specialized pursuits? (Guthery et al 2004, 6-13) The American Association for the Advancement of Science states that science must be taught as one of the liberal arts, “which it unquestionably is,” but the study of epistemology is not required for most science majors.

It may also be that the hypothesis underlying a report in the scientific literature is implicitly understood by scientists in the field and that an explicit statement is unnecessary. However, given the detail required in science and the vagaries of human communication, this seems unlikely. Objectivity and explicitness are hallmarks of the scientific method. Our study suggests that there may be twice as many implicitly as explicitly stated hypotheses, yet there was the greatest disagreement between observers in determining the implicit hypothesis.

The present study has several limitations. One is that only two journals were selected for review. However, they are among the most widely read and influential journals in science. It is doubtful that other non-medical scientific journals would have a higher frequency of statement of hypotheses. Another limitation is that the abstraction of implicit hypotheses is perforce subjective. We tried to decrease error by random sampling, using synonyms and re-abstracting, but we may yet have underestimated the use of implicit hypotheses. This limitation further strengthens the case for reporting hypotheses explicitly.

How could the present situation be improved? The medical scientific literature has demonstrated that structural requirements for journal acceptance can dramatically improve the statement of hypotheses. (Desbiens 2004, 319) The present study demonstrates that explicit hypotheses, albeit at a low rate, are much more frequently stated for general and human biology studies than for other study types. It may be that the requirements for the statement of hypotheses in the scientific medical literature carry over to the general scientific literature.

In addition to the above considerations, there is still much disagreement in the scientific community about the classification of hypotheses and about whether some hypotheses are too trivial to state, as well as confusion between statistical and research hypotheses. (Guthery et al. 2004, 1325) It may also be that the lack of a clear statement of hypotheses contributes to the balkanization of scientific disciplines. Pending further discussions about the classification and role of hypotheses in the reporting of science, general scientific journals could require authors to state their hypotheses or otherwise label their papers as descriptive, and then print these descriptors in a prominent location, perhaps after the paper’s title. This requirement might be more difficult for general scientific journals than for medical journals. Since many general research articles report a theme of research that addresses many hypotheses (a recent example addresses over nine separate studies in one paper), multiple hypotheses may need to be reported. (Cowen and Lindquist 2005, 2185) In addition, the reporting of the general scientific literature has been much more free-wheeling than that of the general medical literature. Most medical journals require the organization of articles into introduction, methods, results and discussion sections. Reporting in most general science journals is often a mélange of this content. Because of this history, bringing more structure into the reporting of general science may be more daunting.

Our study indicates that authors in major scientific journals infrequently state hypotheses. The conduct and reporting of science would be improved by the explicit statement of a hypothesis or the lack of one in every scientific article. A further refinement could be the dated registering of a hypothesis in an electronic repository when a project is formalized, as is being considered for clinical trials. (Antes 2004, 321) Scientific journals could influence this salutary change by requiring the statement of a hypothesis or the lack of one as a condition for publication.



The author would like to thank Mr. Benjamin Moffitt for his assistance in abstracting the articles and Ms. Jenny Post for her work on the early conceptualization of the study.



1. Altman, D.G., K.F. Schulz, D. Moher, et al. CONSORT GROUP (Consolidated Standards of Reporting Trials). 2001. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Annals of Internal Medicine 134:663-694.

2. American Association for the Advancement of Science 1990. The Liberal Art of Science: Agenda for Action, section xiii. Washington, D.C.

3. Antes, G. 2004. Registering clinical trials is necessary for ethical, scientific and economic reasons. Bulletin of the World Health Organization. 82:321.

4. Bowler, P.J., and I.R. Morus. 2005. Making modern science. Chicago: The University of Chicago Press.

5. Cowen, L.E., and S. Lindquist. 2005. Hsp90 potentiates the rapid evolution of new traits: drug resistance in diverse fungi. Science 309:2185-2189.

6. Cui, L., H.M. Hung, S.J. Wang, et al. 2002. Issues related to subgroup analysis in clinical trials. Journal of Biopharmaceutical Statistics 12:347-358.

7. Desbiens, N.A. 2004. The presence of hypotheses in the medical literature. American Journal of Medical Science 328:319-322.

8. Garfield, E. “The Impact Factor,” Thomson Scientific, n.d. 18, 2005)

9. Gauch, H.G. 2003. Scientific Method in Practice. Cambridge, United Kingdom: Cambridge University Press.

10. Goodman, L. 1999. Hypothesis-limited research. Genome Research 9:673-674.

11. Guthery, Fred S., Jeffrey J. Lusk, and Markus J. Peterson. 2004. Hypotheses in wildlife science. Wildlife Society Bulletin 32:1325-1332.

12. Harrell, Frank E. 2001. Regression modeling strategies. 1st ed. New York: Springer-Verlag.

13. Moher, D., C.S. Dulberg, and G.A. Wells. 1994. Statistical power, sample size, and their reporting in randomized controlled trials. Journal of the American Medical Association. 272:122-124.

14. Savitz, D.A. 2001. Prior specification of hypotheses: cause or just a correlate of informative studies? International Journal of Epidemiology 30:975-958.

15. Senn, S. 1997. Statistical issues in drug development. New York: John Wiley & Sons.

16. Tversky, A., and D. Kahneman. 1990. Readings in uncertain reasoning. San Francisco, CA: Morgan Kaufmann Publishers Inc.

17. U.S. Department of Health and Human Services. Food and Drug Administration. Center for Drug Evaluation and Research (CDER). Center for Biologics Evaluation and Research (CBER). September 1998. Guidance for Industry. E9 Statistical Principles for Clinical Trials.

18. “What is the scientific method?” Environmental Protection Agency, n.d., (October 31, 2005)


Correspondence and reprint requests to:

Norman A. Desbiens, M.D.

Department of Medicine, University of Tennessee College of Medicine

975 East Third Street

Chattanooga, TN 37403

Tel: (423) 778-2998 Fax: (423) 778-2611



Table 1. Characteristics of articles (N=1799)
Characteristic %
Journal
  Science 46
  Nature 54
Article type
  Regular 11
  Short 89
Scientific field
  Physics or material science 14
  Biology 52
  Human Biology 13
  Astronomy or Geosciences 17
  Chemistry 5
Any mention of the stem “hypoth*” 30
Explicit statement of hypothesis 10


Appendix. Candidate words or stems (*) used to suggest implicit hypotheses 

account corroborate grounds position search
aim could be infer postulate see
analyze decide inquire predic* settle
anticipate decision inspect preferable show
appraise deduce intention premise sought
ascertain demonstrate invalidate presume speculat*
ask desideratum investigate probe study
assess desire invisage proof substantiate
association determine judge propose suppos*
assum* discover law prove surmise
basis document learn purpose survey
better enquire look quaesitum superior
check establish make quantify suspect
compar* evaluate object question test
concession examine objective rational* theor*
confirm excogitate opine reason thought
confute expect perpend rebut understand
conjecture explore plan reckon unearth
consider find out ponder resolve verify
contemplate goals posit scrutinize weigh

