When an experimental study states "The group with treatment X had significantly less disease (p = 1%)", many people interpret this statement as being equivalent to "there is a 99% chance that if I do treatment X it will prevent disease." This essay explains why these statements are not equivalent. For such an experiment, all of the following are possible:
X is in fact an effective treatment as claimed.
X is only effective for some people, but not for me, because I am different in a way that the experiment failed to distinguish.
X is ineffective, and only looked effective due to random chance.
X is ineffective, and only looked effective because of a systematic flaw in the experiment.
X is ineffective, and the experimenters and/or the reader misinterpreted the results as saying that it is effective.
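The "random chance" possibility is easy to demonstrate with a simulation. The sketch below (a hypothetical setup; the group size, disease rate, and number of repeated experiments are invented for illustration) runs many experiments in which the treatment does nothing at all -- both groups have the same underlying disease rate -- and counts how often a standard two-proportion test nevertheless reports p <= 0.01:

```python
import math
import random

def two_prop_p_value(k1, n1, k2, n2):
    """Two-sided z-test p-value for a difference between two proportions."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error under the null
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))          # P(|Z| > z) for standard normal

random.seed(0)
trials, n, base_rate = 20000, 500, 0.30  # illustrative values, not from any real study
significant = 0
for _ in range(trials):
    # Both groups draw from the SAME disease rate: the "treatment" does nothing.
    control = sum(random.random() < base_rate for _ in range(n))
    treated = sum(random.random() < base_rate for _ in range(n))
    if two_prop_p_value(treated, n, control, n) <= 0.01:
        significant += 1

print(f"fraction significant: {significant / trials:.4f}")
```

By construction, roughly 1% of these do-nothing experiments come out "significant at p = 1%" -- which is exactly what the p-value promises, and exactly why a single significant result cannot rule out possibility 3.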
There is no way to know for sure which possibility holds, but there are warning signs that can undermine the credibility of an experiment. In Part I we look at warning signs in the design of an experiment that can render it uninformative; in Part II, at warning signs in the interpretation of an experiment that can lead the reader to give it more credibility than it deserves. The presence of any one warning sign does not invalidate a study -- there certainly are valid and convincing studies that are not randomized, for example -- but the more warning signs, the more skeptical you should be.