A Primer on Scientific Testing
Understand the basics of scientific testing.
Filed under Logic & Persuasion
Skeptoid #13, December 11, 2006. Podcast transcript.
Today we're going to hide our crib notes in our hats, pull our sleeves down to cover the notes written on our arms, and dive into the world of testing.
Much of the feedback I received on the wheatgrass juice episode concerned claims that wheatgrass juice has already been tested and been proven to cure many different diseases and promote many types of well being. And if I had a dollar for every email I received accusing me of being in the employ of Big Evil Corporations who are frightened of wheatgrass juice, I would be able to afford a shot of this quack elitist scam beverage every single day.
So I hereby present this primer on scientific testing, to better equip the layperson with the ability to determine the validity of claims being made, or the published results of supposed research. Valid claims and real research will follow the whole process that I'm about to outline, and they'll tell you about it too. If the poster you read in Jamba Juice doesn't detail the testing procedure used to substantiate its claims, or if the testing procedure is not similar to that outlined herein, then you have very good cause to be skeptical of any claims that it makes. If something works, its makers should be happy to prove it to you.
Testing of something in medicine, for example, is done by what we in the brotherhood call a clinical trial, more formally known as a randomized controlled trial. The same general principles apply to any kind of scientific testing. The aspect of randomization refers to the random distribution of subjects into similarly sized groups. When done thoroughly and responsibly, complicated statistical processes are used to remove any sort of bias from the assignment of subjects, and to ensure that the assignments are not known to the participants or the administrators. Make no mistake, even this apparently simple first step of testing is a thorough one, and it's this kind of comprehensive attention to detail that separates a real test from the typical anecdotal "testing" claimed by supporters of most pseudoscientific phenomena.
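To make the randomization step concrete, here is a minimal sketch of simple random assignment into two similarly sized groups. The function name `assign_groups`, the subject IDs, and the seed are all made up for illustration; real trials typically layer on more elaborate schemes (blocking, stratification, concealed allocation) that this sketch does not attempt.

```python
import random


def assign_groups(subject_ids, n_groups=2, seed=None):
    """Shuffle the subjects, then deal them round-robin into n_groups groups.

    Hypothetical helper: dealing after a shuffle keeps group sizes within
    one of each other, and nobody chooses who lands where.
    """
    rng = random.Random(seed)      # private generator so a seed makes runs repeatable
    shuffled = list(subject_ids)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Twenty hypothetical subjects, S01 through S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = assign_groups(subjects, n_groups=2, seed=2006)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

The point of the shuffle is that neither the experimenter's hunches nor the subjects' preferences can influence which group anyone ends up in.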
One of the most important characteristics of a valid test is the control. Let's say your wrist hurts, and so you try acupuncture, and your wrist feels better. You're likely to consider that you've just tested acupuncture, and it worked, thus proving its efficacy. But in fact, this was not a valid test, because there was no control. Your wrist may have healed naturally. Your wrist may have been healed by a psychic in the next room. There is no way to know what effect, if any, the acupuncture had. It may have even slowed the healing, for all you really know. The most basic kind of control would have been to have at least two people with similar injuries, one receiving the acupuncture and the other receiving a control procedure, with all else being equal. With a control, you have the beginnings of a valid test.
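A small sketch shows why the control matters. The numbers below are invented purely for illustration (they are not from any study), and `treatment_effect` is a hypothetical helper: it simply subtracts the control group's average improvement from the treated group's.

```python
from statistics import mean

# Invented illustration data: improvement in wrist pain, in points on a
# 0-10 scale, for six treated subjects and six control subjects.
acupuncture = [3, 2, 4, 3, 2, 4]
control = [3, 2, 3, 3, 2, 2]


def treatment_effect(treated, controls):
    """Mean improvement of the treated group minus that of the control group."""
    return mean(treated) - mean(controls)


naive = mean(acupuncture)                        # what you'd credit without a control
effect = treatment_effect(acupuncture, control)  # what is left once the control is subtracted

print(f"improvement with acupuncture alone: {naive}")
print(f"improvement beyond the control:     {effect}")
```

Looking only at the treated group, all of the improvement appears to belong to the treatment. With the control subtracted, most of it turns out to be healing that happened anyway, and even the remainder would still need a proper significance test before anyone should get excited.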
Blinding is another fundamental of trials. Blinding means keeping the test participants in the blind. If people know what they're being given, know what results they're expected to report, or know what kind of result to look for, the results are untrustworthy. Everyone is a human being, and if you're not blinded, you may unknowingly skew the results, or you may have opportunity to wield some agenda that you might have. Blinding can be single, double, or even triple.
In a single-blind test, the participants in the experiment don't know any information that might skew the results. If they're testing a drug taken orally, the participants must not know whether they're taking the real drug or the control placebo. If they're receiving acupuncture, they must not know whether they're receiving traditional acupuncture or sham acupuncture; so in this case, participants must be carefully screened to be sure that they have no prior acupuncture experience. If they're taking wheatgrass juice, they must not be able to tell whether they're drinking real wheatgrass juice or a placebo, so it would have to be administered in some form where they couldn't tell. The purpose of blinding the participants is to prevent them from either knowingly or unknowingly manipulating the results of the test, by reporting or reacting differently.
Single blind tests are good, but double blind tests are better. In a double blind test, neither the subjects nor the people administering the tests know what group any given subject is in. They also don't know whether they're giving the real substance being tested or a placebo. A double blind test removes the chance that a test administrator might skew the results by acting differently, either knowingly or unknowingly, and thus providing information to the test subject.
Triple blind tests take it to the furthest extreme. A triple blind test is just like a double blind test, but with the additional element of the statisticians also being blinded. For the people tabulating and analyzing the results of the test to be blinded, the data is presented to them in a coded form so that they're not able to know anything about any given subject or administrator. They'll see data like "Subject A was given substance B by administrator C, and had a 13% improvement." They don't know if subject A was in a control group or a test group, they don't know what substance B is, and they don't know who administrator C is. In this way they're able to present detailed results of the test that are completely unbiased, because even the statisticians themselves don't know what the data mean.
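The coding scheme can be sketched in a few lines. Everything here is an assumption for illustration: the function names, the "arm-A"/"arm-B" labels, and the data are invented, and a real trial would keep the key with an independent party rather than in the same program. The idea is only that the statistician works on coded labels, and the key is applied after the analysis is finished.

```python
import random
from collections import defaultdict
from statistics import mean


def make_blinding_key(arms, seed=None):
    """Map each real arm name to an opaque code. The key holder keeps this secret."""
    rng = random.Random(seed)
    codes = [f"arm-{chr(ord('A') + i)}" for i in range(len(arms))]
    rng.shuffle(codes)  # so code order reveals nothing about arm order
    return dict(zip(arms, codes))


def blind_records(records, key):
    """Swap the real arm name in each (arm, improvement) record for its code."""
    return [(key[arm], improvement) for arm, improvement in records]


def analyze(blinded):
    """The statistician's job: mean improvement per coded arm, nothing more."""
    by_code = defaultdict(list)
    for code, improvement in blinded:
        by_code[code].append(improvement)
    return {code: mean(values) for code, values in by_code.items()}


def unblind(results, key):
    """Once the analysis is locked, the key holder translates codes back."""
    code_to_arm = {code: arm for arm, code in key.items()}
    return {code_to_arm[code]: value for code, value in results.items()}


# Invented illustration data: percent improvement per subject.
records = [("drug", 13), ("placebo", 5), ("drug", 11), ("placebo", 7)]

key = make_blinding_key(["drug", "placebo"], seed=13)
blinded_results = analyze(blind_records(records, key))  # only codes are visible here
final = unblind(blinded_results, key)

print("what the statistician sees:", blinded_results)
print("after unblinding:          ", final)
```

Someone does hold the key, of course. The blinding works because that someone is not a participant, an administrator, or the person crunching the numbers.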
Once your testing is done, your results are ready for publication. If you want your report to be taken seriously, it needs to be subjected to — and survive — the process of peer review.
Peer review means having your research submitted to experts in the field. So who are the experts? That usually depends on who's publishing or funding the research. If it's a scientific journal, the editorial staff will usually maintain a stable of referees in the community. If it's some group considering funding your research, they'll typically have hired a panel of experts. If your research was responsibly conducted and your conclusions are well supported by the evidence, then the referees will typically give it a passing grade for publication.
Let's say a UFO researcher writes a paper that says UFOs come from another dimension, and he gets some of his fellow UFOlogists, whom he considers his peers, to endorse his paper. Does that make it peer reviewed? No, because he chose the referees himself. What if the editor of an underground UFO pamphlet chooses a panel of UFOlogists who endorse the paper? Does that make it peer reviewed? No, because these referees are clearly biased, and their scientific acumen would not survive any type of scrutiny from the general scientific community. Typically the publication must be one with a long-standing reputation for, and a strict requirement of, thorough peer review. The process of peer review is not perfect, as it relies on individuals who, though they've been scrutinized by a committee themselves, are still human beings who can make mistakes, get lazy, have agendas, or just have bad hair days. But peer review succeeds far more often than it fails, and if you want anyone to take your research seriously, it must be peer reviewed.
Remember: Articles that report reliable results will always detail the testing that was done and the methods used. If the claim is far fetched, and the supporting documentation of testing that the claimants are willing to share is inadequate, you have very good reason to be skeptical.
© 2006 Skeptoid Media, Inc.
Reference this article:
Dunning, Brian. "A Primer on Scientific Testing." Skeptoid Podcast. Skeptoid Media, Inc., 11 Dec 2006. Web. 6 Sep 2010. <http://skeptoid.com/episodes/4013>
Discuss!
Remember, you should always read with skepticism the comments of anyone too lame to put their real name & city.
You asked, so here goes. I have a couple of small beefs with your discussion of scientific testing.
You are correct - double and triple blind studies are the gold standard of scientific testing. However they are also unethical or impossible in many legitimate scientific testing situations.
I do educational research on school choice. I would LOVE to randomly assign students to schools, but their parents get cranky. Even in medical research, you can't ethically assign random people to be smokers or not smokers.
Your podcast made it sound cut and dried but it isn't. More important than double or triple blind studies is having an accurately defined control group, statistical controls, and an understanding and discussion of the potential biases in the data. The true test of those things is whether it passes peer review.
Kate, Mesa, AZ
May 22, 2007 7:38pm
Greetings
Just recently discovered Skepticism and your podcast. As someone with a non-existent science background this podcast was very much appreciated. It also helped me write a reply to friends who asked me about Laurie Cabot's The Science of Witchcraft Tradition and why I feel it isn't scientific. Thanks and enjoying going through the podcasts and looking forward to more.
Tressa, Gardner, MA
July 27, 2007 6:42pm
The BMJ (British Medical Journal) published an article about randomized clinical testing. They took a skeptical look at the efficacy of parachutes in preventing death and serious injury from free-fall. As they point out, there has never been a controlled study. The article is tongue-in-cheek and meant to remind people that sometimes common sense can be overlooked. Here's a link to the article in question.
http://www.bmj.com/cgi/content/full/327/7429/1459
If you don't care for unknown URLs, just do a search for "parachute" on bmj.com.
Happy Reading.
Scott, Providence, RI
August 12, 2007 11:50am
I have recently been studying the question of cholesterol and saturated fats and their relation to cardiovascular disease. After much thought and digging in the "literature" I have come to the conclusion that the common "wisdom" on this subject is 180 degrees wrong.
My problem is that the peer reviewers are "true believers" just like almost everybody else in our current society, and they're being paid off. The "studies" either prove nothing or imply the opposite of what they claim to prove and the reviewers don't care.
Here is some parachute-type common sense: Animal fat is naturally occurring and good for us, while vegetable oils are artificial like white sugar and are bad for us.
Cholesterol is produced by the liver. Why would the body poison itself? If you eat more cholesterol the liver produces less. Why interfere with a normal process? Cholesterol is a basic building block of the body. Why beat up on it?
These are questions to ask your peer reviewers.
I guess my point is that the established medical profession is so corrupt and unscientific that it is almost hopeless for the public to get honest guidance and protection from the epidemic diseases being propagated by that same medical profession!
(The bias against animal fat may be causing diabetes, overweight, acid reflux, impotence, osteoporosis, heart disease and stroke, and who knows what else in millions of people.)
Robert Maltz, Palo Alto, CA
September 07, 2007 5:23pm
I love double/triple tests. I hate it every time I hear a 'proven to blah blah' on TV and I cower in the corner. But, there is a little snag I've always had. Statistics are wonderful things. Just 2 days ago I was teaching my nephew about mean, mode, and median, and I managed to show a group of numbers where 80% of the 'test group' got below the average :-D.
The point is, I really prefer CAUSAL relationships. That is, you do a double/triple-blind study, and it shows (say) that acupuncture and wheatgrass cause wrist pain to decrease at a 20% increased rate. OK, brilliant, now show me the causal relationship. What is it in wheatgrass and acupuncture that actually causes the pain to subside? Once that link is formed, that's when I get really excited.
For all the research done, I hate it how we don't seem to have enough causal relationship studies.
John.
PS: To Robert Maltz above me, I'm not sure of your background, I'm just a humble nurse, so be kind. But the body has a bunch of negative feedback loops where it 'poisons' itself. The best example is water toxicity found often in polydipsia patients. (I'm mental health.) In cut down form: you drink a lot of water, which lowers the salt in your blood, your blood gets salt from your muscles, which triggers your brain for thirst, so you drink a lot of water..... And then you die.
And as for 'natural'. Isn't everything made out of atoms at the end of the day? I thought everything was 'natural' one way or another. Except that anti-matter food
John Grayson, Newcastle / Australia
November 07, 2007 6:27am
Has this protocol been applied to vaccination use? If so, great, carry on injecting. If not, why not? And what are the implications of not doing it? Does this also give credibility to the anti-vaccination lobby? Responses greatly appreciated quickly, as the MMR syringe and needle are poised above my son's arm, waiting!
Sean Gilbert, Leeds UK
January 29, 2008 6:55am
People like us that do real science are usually attacked on all fronts when our conclusions don't really stand to the evidence. Evidence can be interpreted in many ways, and statistics can help you say something that has absolutely nothing to do with reality.
Peer reviewers are sometimes biased as well. I've learned about people rejecting or accepting scientific journal articles based on their own agenda. This happens especially on low impact journals.
We have to be aware as well, that just because someone published something (e.g. Reader's Digest), it doesn't make it true.
The scientific method IS science. People think of science as some sort of religion, when in fact it is only a tool, like mathematics, that helps us better understand this crazy world of ours.
John, Mexico City
February 27, 2008 3:52pm
I think this is an important episode. I'm a member of the Cleveland Skeptics and I have included this episode on our website under "Critical Thinking 101". Thanks!
Josh Hunt, Cleveland, OH
December 25, 2008 7:49am
Just a quick question: with a triple-blind, if the patients, administrators, and statisticians are all kept in the dark, then at the end of the day, how does anyone figure out who got what and what the results are? Not being sarcastic; honestly puzzled.
Andy, Prosperity, SC, USA
June 11, 2009 7:39pm
Andy:
Single blinded: patient doesn't know. (But if administrator knows, they could give "clues" (even unintentional) to the patient.)
Double blinded: patient and administrator don't know. (But if the results evaluator knows, they could "try harder" (even unintentionally) to find a correlation.)
Which leads us to the triple blinding.
We write down the following:
Samples A, C and D are the substance under test.
Samples B, E and F are the placebos.
We give the administrators the samples, but they only know them as A through F. (to prevent them from giving clues to the patients about the "real" substance and the placebo)
The statisticians then look at the results and say that samples B and C show patient improvement. (they have no idea which substance "should" show patient improvement)
So we, the experimenters know which sample is which, but no one else in the chain does.
Glen Wolfram, Eugene, Oregon
August 06, 2009 3:15pm
Alright. Since you asked, I'll comment on your primer on scientific testing.
This is actually the second time I'll have listened to this episode. I admit that initially I had treated this as "wheatgrass part 2."
I particularly like how you explain the difference between a double-blind test and a triple-blind test, pointing out that you get the best results when the subjects are blind to placebo vs. real stuff, the distributors are blind to which is which, and the statistician is blind to which was used by whom.
I think even people who are somewhat aware of what a double-blind test is are not prone to thinking about who does the number crunching. Before listening to this a second time as per your request in your listener feedback episode, even I had fallen back into assuming that it must be the administrators of the experiment (the ones who distribute the placebo and the real test substance) who do the number crunching. It's easy to think this is always done and is necessary because it is what we are told is done by every quack out there, and it numbs us to the value of blind testing. We most often see a group of pseudoscientists doing the numbers for their own studies and skewing them, and we require a debunking by a more jargon-fluent expert to explain to us what was done wrong.
It's easy to forget the idea of hiring an objective third party statistician with no political or personal ties to either the subjects or the administrators to ensure that the statistics are not skewed and reveal any inappropriate biases inherent to the study. It's also easy to dismiss because pseudoscientists often hire third party statisticians who DO have a bias and an agenda to push (see Intel
Aerik, Shawnee, KS
March 17, 2007 6:17pm