A Primer on Scientific Testing
by Brian Dunning
Filed under General Science, Logic & Persuasion
December 11, 2006
Today we're going to hide our crib notes in our hats, pull our sleeves down to
cover the notes written on our arms, and dive into the world of testing.
Much of the feedback I received on the wheatgrass juice episode concerned
claims that wheatgrass juice has already been tested and been proven to cure
many different diseases and promote many types of well being. And if I had
a dollar for every email I received accusing me of being in the employ of Big
Evil Corporations who are frightened of wheatgrass juice, I would be able to
afford a shot of this quack elitist scam beverage every single day.
So I hereby present this primer on scientific testing, to better equip the
layperson with the ability to determine the validity of claims being made,
or the published results of supposed research. Valid claims and real research
will follow the whole process that I'm about to outline, and they'll tell you
about it too. If the poster you read in Jamba Juice doesn't detail the testing
procedure used to substantiate its claims, or if the testing procedure is not
similar to that outlined herein, then you have very good cause to be skeptical
of any claims that it makes. If something works, its makers should be happy
to prove it to you.
Testing of something in medicine, for example, is done by what we in the brotherhood
call a clinical trial, more formally known as a randomized controlled trial.
The same general principles apply to any kind of scientific testing. The aspect
of randomization refers to the random distribution of subjects into
similarly sized groups. When randomization is done thoroughly and responsibly,
complicated statistical processes are used to remove any sort of bias from the
assignment of subjects, and to ensure that the assignments are not known to the
participants or the administrators. Make no mistake, even this apparently simple first step of
testing is a thorough one, and it's this kind of comprehensive attention to
detail that separates a real test from the typical anecdotal "testing" claimed
by supporters of most pseudoscientific phenomena.
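To make random assignment a little more concrete, here is a minimal Python sketch of the simplest version: shuffle the list of subjects and deal them out to two groups in turn, so that nobody chooses who ends up where and the group sizes differ by at most one. This is an illustration under stated assumptions, not a description of how any particular trial does it; real trials often use blocked or stratified randomization and keep the allocation list concealed. The function `randomize`, the group names, and the subject IDs are all hypothetical.

```python
import random

def randomize(subject_ids, groups=("treatment", "control"), seed=None):
    """Randomly assign subjects to similarly sized groups.

    The subject list is shuffled, then subjects are dealt out to the
    groups in turn, so group sizes differ by at most one.
    """
    rng = random.Random(seed)  # seeded here only to make the example repeatable
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {subject: groups[i % len(groups)] for i, subject in enumerate(shuffled)}

# Hypothetical subject IDs S01..S10
subjects = [f"S{n:02d}" for n in range(1, 11)]
allocation = randomize(subjects, seed=2006)

for group in ("treatment", "control"):
    members = sorted(s for s, g in allocation.items() if g == group)
    print(group, len(members), members)
```

The fixed seed exists only so the example gives the same result each time; in a real trial, whoever generates the list would not share it with anyone enrolling or treating subjects.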
One of the most important characteristics of a valid test is the control.
Let's say your wrist hurts, and so you try acupuncture, and your wrist feels
better. You're likely to consider that you've just tested acupuncture, and
it worked, thus proving its efficacy. But in fact, this was not a valid test,
because there was no control. Your wrist may have healed naturally. Your wrist
may have been healed by a psychic in the next room. There is no way to know
what effect, if any, the acupuncture had. It may have even slowed the healing,
for all you really know. The most basic kind of control would have been
to have at least two people with similar injuries, one receiving the acupuncture
and the other receiving a control procedure, with all else being
equal. With a control, you have the beginnings of a valid test.
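Here is a small Python simulation of the wrist example, built on made-up assumptions: every patient improves by about 20 points through natural healing, with some random variation, and the treatment itself does nothing at all. The uncontrolled "test" (looking only at the treated patients) still shows a big improvement, while comparing against a control group shows that the treatment's own contribution is close to zero. The function `simulate_patient` and all of its parameters are hypothetical.

```python
import random
import statistics

def simulate_patient(rng, treated, natural_healing=20.0, treatment_effect=0.0):
    """Return one patient's improvement score (made-up units).

    Everyone improves by natural healing plus random noise; treated patients
    also get `treatment_effect`, which is zero here by assumption.
    """
    improvement = natural_healing + rng.gauss(0, 5)
    if treated:
        improvement += treatment_effect
    return improvement

rng = random.Random(42)
treated_group = [simulate_patient(rng, treated=True) for _ in range(500)]
control_group = [simulate_patient(rng, treated=False) for _ in range(500)]

# Uncontrolled "test": just look at how much the treated patients improved.
uncontrolled_estimate = statistics.mean(treated_group)
# Controlled test: compare the treated patients against the control group.
controlled_estimate = statistics.mean(treated_group) - statistics.mean(control_group)

print(f"Uncontrolled estimate of the effect: {uncontrolled_estimate:.1f}")
print(f"Controlled estimate of the effect:   {controlled_estimate:.1f}")
```

Both groups improve by roughly the same amount, so it is the difference between the groups, not the raw improvement, that tells you whether the treatment did anything.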
Blinding is another fundamental of trials. Blinding means keeping the people
involved in a test in the dark about anything that could influence the results.
If people know what they're being given, know what
results they're expected to report, or know what kind of result to look for,
the results are untrustworthy. Everyone is a human being, and if you're not
blinded, you may unknowingly skew the results, or you may have the opportunity
to push whatever agenda you might have. Blinding can be single, double, or triple.
In a single-blind test, the participants in the experiment don't know any
information that might skew the results. If they're testing a drug taken orally,
the participants must not know whether they're taking the real drug or the
control placebo. If they're receiving acupuncture, they must not know whether
they're receiving traditional acupuncture or sham acupuncture; so in this case,
participants must be carefully screened to be sure that they have no prior
acupuncture experience. If they're taking wheatgrass juice, they must not be
able to tell whether they're drinking real wheatgrass juice or a placebo, so
it would have to be administered in some form where they couldn't tell. The
purpose of blinding the participants is to prevent them from either knowingly
or unknowingly manipulating the results of the test, by reporting or reacting
in the way they think is expected of them.
Single blind tests are good, but double blind tests are
better. In a double blind test, neither the subjects nor the people administering
the tests know what group any given subject is in. They also
don't know whether they're giving the real substance being tested or a placebo.
A double blind test removes the chance that a test administrator might skew
the results by acting differently, either knowingly or unknowingly, and thus
providing information to the test subject.
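One way to achieve double blinding is to have a third party, such as a pharmacist, prepare identical-looking kits labeled only with meaningless codes, and to have that third party hold the key saying which codes contain the real substance. The Python sketch below assumes that setup; the function `prepare_blinded_kits`, the `KIT-` label format, and the subject IDs are hypothetical, not any standard scheme.

```python
import random

def prepare_blinded_kits(subject_ids, seed=None):
    """Prepare kit labels for a two-arm, double blind trial (up to 900 subjects).

    Returns (dispensing_list, sealed_key):
      dispensing_list maps each subject to an opaque kit code; this is all
        the subjects and the administrators ever see.
      sealed_key maps each kit code to "active" or "placebo"; a third party
        keeps it sealed until data collection is finished.
    """
    rng = random.Random(seed)
    subjects = list(subject_ids)
    rng.shuffle(subjects)
    # Unique, meaningless three-digit numbers, so a code reveals nothing about its contents.
    codes = rng.sample(range(100, 1000), len(subjects))
    dispensing_list, sealed_key = {}, {}
    for i, (subject, code) in enumerate(zip(subjects, codes)):
        kit = f"KIT-{code}"
        dispensing_list[subject] = kit
        sealed_key[kit] = "active" if i % 2 == 0 else "placebo"
    return dispensing_list, sealed_key

subjects = [f"S{n:02d}" for n in range(1, 9)]  # hypothetical subject IDs
dispensing_list, sealed_key = prepare_blinded_kits(subjects, seed=7)

# What an administrator works from: subject and kit code, nothing else.
for subject in sorted(dispensing_list):
    print(subject, dispensing_list[subject])
```

Because the kit codes are drawn at random and separately from the shuffle, neither the administrator nor the subject can infer from a label whether it holds the real substance or the placebo.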
Triple blind tests take blinding to its furthest extreme. A triple blind test is just
like a double blind test, but with the additional element of the statisticians
also being blinded. For the people tabulating and analyzing the results of
the test to be blinded, the data is presented to them in a coded form so that
they're not able to know anything about any given subject or administrator.
They'll see data like "Subject A was given substance B by administrator C,
and had a 13% improvement." They don't know if subject A was in a control group
or a test group, they don't know what substance B is, and they don't know
who administrator C is. In this way they're able to present detailed results
of the test that are completely unbiased, because even the statisticians themselves
don't know what the data mean.
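The coding step for a triple blind test can be sketched in the same spirit. This minimal Python example assumes records with hypothetical field names and made-up values, and replaces each subject, substance, and administrator with a randomly assigned letter, so the statistician sees rows like "Subject A was given substance B by administrator C." The analysis runs on the letters alone, and the keys are used only afterward to unblind the results. The functions `code_labels`, `blind_records`, and `mean_improvement_by_substance` are hypothetical names.

```python
import random
import string
from collections import defaultdict
from statistics import mean

def code_labels(values, rng):
    """Map each distinct value to a letter code in random order (assumes <= 26 values)."""
    distinct = sorted(set(values))  # sort first so a given seed always gives the same codes
    rng.shuffle(distinct)           # then shuffle, so the letters reveal nothing about the labels
    return {value: string.ascii_uppercase[i] for i, value in enumerate(distinct)}

def blind_records(records, seed=None):
    """Replace identifying fields with letter codes.

    Returns (coded_records, keys). The keys are the unblinding information and
    would be withheld from the statisticians until the analysis is done.
    """
    rng = random.Random(seed)
    keys = {field: code_labels([r[field] for r in records], rng)
            for field in ("subject", "substance", "administrator")}
    coded = []
    for r in records:
        row = {field: keys[field][r[field]] for field in keys}
        row["improvement"] = r["improvement"]
        coded.append(row)
    return coded, keys

def mean_improvement_by_substance(coded_records):
    """What the statistician computes: average improvement per coded substance."""
    groups = defaultdict(list)
    for r in coded_records:
        groups[r["substance"]].append(r["improvement"])
    return {code: mean(scores) for code, scores in sorted(groups.items())}

# Made-up records, for illustration only.
records = [
    {"subject": "Jones", "substance": "wheatgrass juice", "administrator": "Nurse Ortiz", "improvement": 13},
    {"subject": "Patel", "substance": "placebo", "administrator": "Nurse Brown", "improvement": 12},
    {"subject": "Garcia", "substance": "placebo", "administrator": "Nurse Ortiz", "improvement": 9},
    {"subject": "Kim", "substance": "wheatgrass juice", "administrator": "Nurse Brown", "improvement": 10},
]

coded, keys = blind_records(records, seed=11)
for row in coded:
    print(f"Subject {row['subject']} was given substance {row['substance']} "
          f"by administrator {row['administrator']}, {row['improvement']}% improvement")
print(mean_improvement_by_substance(coded))
```

The letters are shuffled rather than handed out alphabetically; otherwise "placebo" would always come out as A, and a sharp-eyed statistician could work out which group was which.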
Once your testing is done, your results are ready for publication. If you
want your report to be taken seriously, it needs to be subjected to — and
survive — the process of peer review.
Peer review means having your research critically evaluated by other experts
in the field. So who are the experts? That usually depends on who's publishing
or funding the research. If it's a scientific journal, the editorial staff
will usually maintain a stable of referees in the community. If it's some group
considering funding your research, they'll typically have hired a panel of
experts. If your research was responsibly conducted and your conclusions are
well supported by the evidence, then the referees will typically give it a
passing grade for publication.
Let's say a UFO researcher writes a paper that says UFOs come from another
dimension, and he gets some of his fellow UFOlogists, whom he considers
his peers, to endorse his paper. Does that make it peer reviewed? No,
because he chose the referees himself. What if the editor of an underground
UFO pamphlet chooses a panel of UFOlogists who endorse the paper? Does that
make it peer reviewed? No, because these referees are clearly biased, and their
scientific acumen would not survive any type of scrutiny from the general scientific
community. Typically the publication must be one with a long-standing reputation
for, and a strict requirement of, thorough peer review. The process of peer review
is not perfect, as it relies on individuals who, though they've been scrutinized
by a committee themselves, are still human beings who can make mistakes, get
lazy, have agendas, or just have bad hair days. But peer review succeeds far more
often than it fails, and if you want anyone to take your research seriously,
it must be peer reviewed.
Remember: Articles that report reliable results will always detail the testing
that was done and the methods used. If the claim is far-fetched, and the supporting
documentation of testing that the claimants are willing to share is inadequate,
you have very good reason to be skeptical.
Cite this article:
Dunning, B. "A Primer on Scientific Testing." Skeptoid Podcast. Skeptoid Media, 11 Dec 2006. Web. 28 Nov 2015. <http://skeptoid.com/episodes/4013>
©2015 Skeptoid Media, Inc. All Rights Reserved.