The basic idea of bootstrapping is to use a sample dataset of modest size to simulate an entire population. As an example, consider the calculations needed to derive a 95% CI of the mean using the same dataset analyzed previously. Here, 55 data points comprise the original sample of measured GFP intensities. In the first round of bootstrapping, the 55 data points are randomly sampled with replacement to obtain a new dataset of 55 values. Notice that, because sampling is random and with replacement, some of the data points from the original dataset are missing, whereas others are repeated two or more times. A mean is then calculated for the resampled set, along with any other statistical parameters that may be of interest. Round one is done. The results of rounds two and three are also shown for clarity. Now repeat 3,997 more times. At this point, there should be 4,000 means obtained entirely through resampling. Next, imagine lining up the 4,000 means from lowest to highest, top to bottom. We can partition off the lowest (top of list) 2.5% of values by drawing a line between the means at positions 100 and 101. We can do the same for the highest (bottom of list) 2.5% of values by drawing a line between the means at positions 3,900 and 3,901. To get a 95% CI, we simply report the means at positions 101 (14.86890) and 3,900 (19.96438). Put another way, had we carried out just 40 iterations instead of 4,000, the 95% CI would range from the second-lowest to the second-highest mean. Thus, at its core, bootstrapping is conceptually very simple. The fact that it's a bear computationally matters only to your computer.
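The resampling procedure described above (a percentile bootstrap) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used to produce the numbers quoted in the text: the function name `bootstrap_mean_ci` is hypothetical, and because the chapter's 55 GFP measurements are not reproduced here, the demo runs on synthetic values drawn from a normal distribution. The tail size is computed so that 4,000 rounds give positions 101 and 3,900, and 40 rounds give the second-lowest and second-highest means.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_rounds=4000, level=0.95, seed=None):
    """Percentile bootstrap CI for the mean (hypothetical helper).

    Resamples `data` with replacement `n_rounds` times, sorts the resampled
    means from lowest to highest, and returns the means just inside the
    lowest and highest tails.
    """
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_rounds):
        # Draw n values with replacement: some originals go missing, others repeat.
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
    means.sort()  # line up the means from lowest to highest
    # Number of means in each tail: 100 for 4,000 rounds, 1 for 40 rounds.
    tail = round(n_rounds * (1 - level) / 2)
    # 1-based positions tail+1 and n_rounds-tail map to these 0-based indices.
    return means[tail], means[n_rounds - tail - 1]


# Synthetic stand-in for 55 GFP intensities (NOT the chapter's measurements).
demo_rng = random.Random(1)
gfp = [demo_rng.gauss(17.0, 9.0) for _ in range(55)]
low, high = bootstrap_mean_ci(gfp, seed=42)
print(f"95% CI for the mean: {low:.5f} to {high:.5f}")
```

Fixing `seed` makes a run repeatable; leaving it as `None` gives slightly different limits each time, which is expected for any resampling method.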
The last test shown is the output from confidence interval calculations for two ratios. This test was carried out using an Excel tool that is included in this chapter. To use this tool, we enter for each paired experiment the mean (termed “estimate”) and the SE (“SE of est”) and then choose a confidence level. Looking at the results of the statistical analysis of ratios, we generally observe much crisper results than were provided by the t-tests. For example, in the three cases where comparisons were made only within individual blots, all three showed significant differences. In contrast, and as would be expected, combining lane data between different blots to obtain ratios did not yield significant results, even though the ratios were of a similar magnitude to the blot-specific data. Furthermore, although combining all values to obtain means for the ratios did give
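The text does not say which formula the Excel tool uses, so the following is only a sketch of one common approach, the first-order delta method, which takes the same inputs (an estimate and its SE for each member of the pair, plus a confidence level). It assumes the two estimates are independent, approximately normal, and that the denominator is well away from zero; the tool itself may instead use another method, such as Fieller's theorem, which can give asymmetric intervals. The function name `ratio_ci` and the demo inputs are hypothetical. A ratio whose CI excludes 1 is conventionally read as a significant difference at the chosen confidence level.

```python
import math
from statistics import NormalDist


def ratio_ci(est_a, se_a, est_b, se_b, level=0.95):
    """Approximate CI for est_a / est_b via the delta method (hypothetical helper).

    Assumes independent, approximately normal estimates and est_b far from 0.
    """
    ratio = est_a / est_b
    # For a quotient of independent estimates, relative SEs add in quadrature.
    se_ratio = abs(ratio) * math.sqrt((se_a / est_a) ** 2 + (se_b / est_b) ** 2)
    # Two-sided critical value from the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ratio, ratio - z * se_ratio, ratio + z * se_ratio


# Hypothetical "estimate" and "SE of est" values (NOT the chapter's blot data).
ratio, lo, hi = ratio_ci(est_a=2.4, se_a=0.3, est_b=1.2, se_b=0.15)
print(f"ratio = {ratio:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
print("CI excludes 1" if lo > 1 or hi < 1 else "CI includes 1")
```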
This quote comes from Walton’s first letter to his sister in England.
The proper understanding and use of statistical tools are essential to the scientific enterprise. This is true both at the level of designing one's own experiments and at the level of critically evaluating studies carried out by others. Unfortunately, many researchers who are otherwise rigorous and thoughtful in their scientific approach lack sufficient knowledge of this field. This methods chapter is written with such individuals in mind. Although the majority of examples are drawn from the field of Caenorhabditis elegans biology, the concepts and practical applications are also relevant to those who work in the disciplines of molecular genetics and cell and developmental biology. Our intent has been to limit theoretical considerations to a necessary minimum and to use common examples as illustrations for statistical analysis. Our chapter includes a description of basic terms and central concepts and also contains in-depth discussions on the analysis of means, proportions, ratios, probabilities, and correlations. We also address issues related to sample size, normality, outliers, and non-parametric approaches.