A Primer on Scientific Testing

Understand the basics of scientific testing.

by Brian Dunning

Filed under General Science, Logic & Persuasion

Skeptoid #13
December 11, 2006

Today we're going to hide our crib notes in our hats, pull our sleeves down to cover the notes written on our arms, and dive into the world of testing.

Much of the feedback I received on the wheatgrass juice episode concerned claims that wheatgrass juice has already been tested and proven to cure many different diseases and promote many types of well-being. And if I had a dollar for every email I received accusing me of being in the employ of Big Evil Corporations who are frightened of wheatgrass juice, I would be able to afford a shot of this quack elitist scam beverage every single day.

So I hereby present this primer on scientific testing, to better equip the layperson with the ability to determine the validity of claims being made, or the published results of supposed research. Valid claims and real research will follow the whole process that I'm about to outline, and they'll tell you about it too. If the poster you read in Jamba Juice doesn't detail the testing procedure used to substantiate its claims, or if the testing procedure is not similar to that outlined herein, then you have very good cause to be skeptical of any claims that it makes. If something works, its makers should be happy to prove it to you.

Testing of something in medicine, for example, is done by what we in the brotherhood call a clinical trial, more formally known as a randomized controlled trial. The same general principles apply to any kind of scientific testing. The aspect of randomization refers to the random distribution of subjects into similarly sized groups. When this is done thoroughly and responsibly, complicated statistical processes are used to remove any sort of bias in the assignment of subjects, and to ensure that the assignments are not known to the participants or the administrators. Make no mistake, even this apparently simple first step of testing is a thorough one, and it's this kind of comprehensive attention to detail that separates a real test from the typical anecdotal "testing" claimed by supporters of most pseudoscientific phenomena.
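To make the idea of random assignment concrete, here is a minimal sketch in Python. It is not how any particular trial does it: real trials generally use more elaborate schemes and keep the allocation concealed. The function name assign_groups, the subject IDs, and the seed are all assumptions made up for this illustration.

```python
import random

def assign_groups(subject_ids, group_names=("treatment", "control"), seed=None):
    """Shuffle the subjects, then deal them round-robin into the groups.

    Hypothetical helper for illustration only. Dealing round-robin after a
    shuffle keeps the groups within one subject of each other in size.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    for position, subject in enumerate(shuffled):
        groups[group_names[position % len(group_names)]].append(subject)
    return groups

# Ten made-up subject IDs: S01 through S10.
subjects = [f"S{n:02d}" for n in range(1, 11)]
allocation = assign_groups(subjects, seed=2006)
for name, members in allocation.items():
    print(name, sorted(members))
```

The point of the shuffle is that nobody, including the researcher, gets to decide who lands in which group, so nobody's hunches about which subjects look promising can leak into the result.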

One of the most important characteristics of a valid test is the control. Let's say your wrist hurts, and so you try acupuncture, and your wrist feels better. You're likely to consider that you've just tested acupuncture, and it worked, thus proving its efficacy. But in fact, this was not a valid test, because there was no control. Your wrist may have healed naturally. Your wrist may have been healed by a psychic in the next room. There is no way to know what effect, if any, the acupuncture had. It may have even slowed the healing, for all you really know. The most basic kind of control would have been to have at least two people with similar injuries, one receiving the acupuncture and the other receiving a control procedure, with all else being equal. With a control, you have the beginnings of a valid test.
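The arithmetic behind a control can be sketched in a few lines of Python. The numbers below are invented purely for illustration and do not come from any study; the variable names are likewise hypothetical. A real analysis would also need a statistical test to judge whether the difference could be due to chance.

```python
from statistics import mean

# INVENTED example data: improvement in wrist pain, in points on a 0-10
# scale, for each subject after two weeks. Not from any real trial.
acupuncture_group = [3, 4, 2, 5, 3, 4]
control_group = [3, 3, 2, 4, 3, 3]

treated_improvement = mean(acupuncture_group)
control_improvement = mean(control_group)

# Only the improvement beyond what the control group experienced can
# even tentatively be credited to the treatment itself.
effect_estimate = treated_improvement - control_improvement

print("treated group improved by", treated_improvement)
print("control group improved by", control_improvement)
print("difference between groups:", effect_estimate)
```

Looking at the treated group alone, the improvement looks impressive. Set beside the control group, most of it turns out to be what would have happened anyway.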

Blinding is another fundamental of trials. Blinding means keeping the test participants in the dark. If people know what they're being given, know what results they're expected to report, or know what kind of result to look for, the results are untrustworthy. Everyone is a human being, and if you're not blinded, you may unknowingly skew the results, or you may have the opportunity to advance some agenda that you might have. Blinding can be single, double, or even triple.

In a single-blind test, the participants in the experiment don't know any information that might skew the results. If they're testing a drug taken orally, the participants must not know whether they're taking the real drug or the control placebo. If they're receiving acupuncture, they must not know whether they're receiving traditional acupuncture or sham acupuncture; so in this case, participants must be carefully screened to be sure that they have no prior acupuncture experience. If they're taking wheatgrass juice, they must not be able to tell whether they're drinking real wheatgrass juice or a placebo, so it would have to be administered in some form where they couldn't tell. The purpose of blinding the participants is to prevent them from either knowingly or unknowingly manipulating the results of the test, by reporting or reacting differently.

Single blind tests are good, but double blind tests are better. In a double blind test, neither the subjects nor the people administering the tests know what group any given subject is in. They also don't know whether they're giving the real substance being tested or a placebo. A double blind test removes the chance that a test administrator might skew the results by acting differently, either knowingly or unknowingly, and thus providing information to the test subject.

Triple blind tests take it to the furthest extreme. A triple blind test is just like a double blind test, but with the additional element of the statisticians also being blinded. For the people tabulating and analyzing the results of the test to be blinded, the data is presented to them in a coded form so that they're not able to know anything about any given subject or administrator. They'll see data like "Subject A was given substance B by administrator C, and had a 13% improvement." They don't know if subject A was in a control group or a test group, they don't know what substance B is, and they don't know who administrator C is. In this way they're able to present detailed results of the test that are completely unbiased, because even the statisticians themselves don't know what the data mean.
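Here is a rough Python sketch of what handing coded data to the statisticians might look like. Everything in it is an assumption for illustration: the function blind_records, the field names, the code prefixes, and the three records (which are made up) are not taken from any real trial system.

```python
import random

# Hypothetical code prefixes for each field that must be hidden.
PREFIXES = {"subject": "S", "substance": "X", "administrator": "A"}

def blind_records(records, seed=None):
    """Replace names with opaque codes; return coded rows and the key.

    The key would be held by someone outside the analysis team and only
    opened once the analysis is finished.
    """
    rng = random.Random(seed)
    key = {}
    for field, prefix in PREFIXES.items():
        values = sorted({record[field] for record in records})
        rng.shuffle(values)  # code numbers reveal nothing about the names
        key[field] = {value: f"{prefix}-{n}"
                      for n, value in enumerate(values, start=1)}
    coded = []
    for record in records:
        row = {field: key[field][record[field]] for field in PREFIXES}
        row["improvement_pct"] = record["improvement_pct"]
        coded.append(row)
    return coded, key

# INVENTED records, for illustration only.
records = [
    {"subject": "Alice", "substance": "drug",
     "administrator": "Dr. Jones", "improvement_pct": 13},
    {"subject": "Bob", "substance": "placebo",
     "administrator": "Dr. Smith", "improvement_pct": 4},
    {"subject": "Carol", "substance": "drug",
     "administrator": "Dr. Smith", "improvement_pct": 9},
]
coded, key = blind_records(records, seed=13)
for row in coded:
    print(row)
```

The statisticians work only with the coded rows. They can still compare one substance code against another, but they cannot tell which code is the real treatment until the key is opened.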

Once your testing is done, your results are ready for publication. If you want your report to be taken seriously, it needs to be subjected to — and survive — the process of peer review.

Peer review means having your research submitted to experts in the field. So who are the experts? That usually depends on who's publishing or funding the research. If it's a scientific journal, the editorial staff will usually maintain a stable of referees in the community. If it's some group considering funding your research, they'll typically have hired a panel of experts. If your research was responsibly conducted and your conclusions are well supported by the evidence, then the referees will typically give it a passing grade for publication.

Let's say a UFO researcher writes a paper that says UFOs come from another dimension, and he gets some of his fellow UFOlogists, whom he considers his peers, to endorse his paper. Does that make it peer reviewed? No, because he chose the referees himself. What if the editor of an underground UFO pamphlet chooses a panel of UFOlogists who endorse the paper? Does that make it peer reviewed? No, because these referees are clearly biased, and their scientific acumen would not survive any type of scrutiny from the general scientific community. Typically the publication must be one with a long-standing reputation for, and strict requirement of, thorough peer review. The process of peer review is not perfect, as it relies on individuals who, though they've been scrutinized by a committee themselves, are still human beings who can make mistakes, get lazy, have agendas, or just have bad hair days. But peer review succeeds far more often than it fails, and if you want anyone to take your research seriously, it must be peer reviewed.

Remember: Articles that report reliable results will always detail the testing that was done and the methods used. If the claim is far fetched, and the supporting documentation of testing that the claimants are willing to share is inadequate, you have very good reason to be skeptical.

Brian Dunning

© 2006 Skeptoid Media

Reference this article:
Dunning, B. "A Primer on Scientific Testing." Skeptoid Podcast. Skeptoid Media, 11 Dec 2006. Web. 7 Oct 2015. <http://skeptoid.com/episodes/4013>


I heard the episode where you expressed dismay at the lack of interest in this episode. I was actually very glad you did this one - I wasn't really sure what blind testing actually meant, and this show is a real help to me. Thank you for the clarification.

Moe Shinola, Kansas City, MO
September 16, 2010 2:25pm

Unfortunately scientists themselves deliberately subvert assessment procedures in order to skew results. Such a process was used in many of the basic tests to define Chronic Fatigue Syndrome. The most important criteria for this disease cluster was Fukuda et al 1994 the Centres for Disease Control case definition

This was sabotaged by deliberately avoiding one of the stipulations in the Case definition, in studies carried out by some very questionable persons. The stipulation was that persons with diagnosed Clinical Depression be not admitted to sample groups

The methodology to justify this was to utilise earlier broader definitions of CFS that allowed Clinical Depression in samples. To no one's surprise the studies found that CFS was predominantly "a form of Depression". The studies varied between incompetent and fraudulent

Sadly, because of prejudice in the peer review processes the faulty samples were not exposed. CFS patients for years suffered from the admission of clinically depressed people into the diagnosis as displaying the primary cause of the syndromes. Big business for psychiatrists and psychologists - many with NO medical qualifications.

What was bordering on medical fraud held back the research into CFS and later into GWS, which has similar features, for over a decade. Genuine sufferers were subject to negligence and ridicule - their desperation treated as "depression"

CBT was one proposed "cure" - those who did not respond were further vilified.

Phi, Sydney
March 5, 2011 4:38pm

And another thread becomes a staging post for Phi to complain psychology is not science.

How about some evidence of the prejudiced world view, flawed methodology or failing testing? No links to support your claims? No verifiable examples? No proof the case definition was ignored?

Illuminatus, somewhere hot
April 21, 2011 6:45am

Check the studies Illy - they supposedly addressed the CDC case definition (Fukuda et al 1994) but diagnosed according to the Wesselley/Oxford Criteria to provide contaminated samples

It was fraud. It was then supported by the party in Veterans Affairs seeking to make GWS appear to be a psychosomatic disorder

I can't give you the name here for obvious reasons - but work it out by reading the papers. A very high level set that one up.

phi, Sydney
April 21, 2011 7:07am

You know Phi, I keep reading that one study you posted, and I can't see for the life of me how on earth it supports your view.

Provide some context here. The definitions in it support much of the same as what's available today still. It isn't a well understood disorder. I haven't seen any credible reviews saying it was a farce. They may be out there but you are going to have to show them.

Here's a more recent study:


2004 beats 1994. Have a gander at it ppl.

And yes Phi bring the subject back to where we started it ;) Harping your case over a lot of forums does not make it any more credible you know. You still need some...hell ANY backing evidence.

If it's out there bring it for review.

More importantly you'll see :O A BIT OF SCIENCE in that paper by gosh and by golly ;) Might help you get a bit more education here.

Cam, Thunder Bay
April 21, 2011 7:18am

Easy Cam - the samples allow people in with clinical depression - which is forbidden by the CDC case definition

If you let people in with clinical depression (using alternate criteria) then of course you will find that the illness concerned is a form of depression or a psychosomatic condition

It's fraud. The Oxford and Wesselley criteria were already redundant because of the depression problem

The handling of CFS is a medical scandal. Fortunately in Canada far better guidelines now exist and the world is evaluating them. They are well worth reading.

Phi, Sydney
April 22, 2011 7:13am

No, it's sad that CFS is over diagnosed. A medical scandal is sawing feet off to screw on artificial ones sourced from your company.

There is a difference.

Or sampling kiddies at a birthday party.

Henk V, Sydney Australia
September 2, 2011 10:29pm

Magically I found Brian's

A Primer on Scientific Testing

It covers how EB is done very well. Whilst it doesn't cover SB measurements, the student should look to how EB is generated.

I am glad that you made this skeptoid for future students to read!

Muddie, Sutherland BatCave, Oz
November 5, 2011 11:38am

It's clear that nobody wants to know about testing claims.

It's very unfortunate because this was a great skeptoid.

Mud, Sin City, Oz
January 31, 2013 2:26am

WTF? January 2013 for a September 2013 skeptoid. Time travel or faster-than-light comments?

Sbo, KZN Midlands SA
November 23, 2013 5:46am
