Steven Holland

# Bootstrap

Oftentimes, you need to place confidence limits on a parameter whose distribution is unknown. Bootstrapping is a good solution to this problem, and you will use it in this problem set. For this problem set, do not use magic numbers or loops; neither is necessary. Follow the formatting you have used in the past two problem sets.

1) Load the cadmium.txt data set and assign it to an object called cadmium. Display the first few values of the data. It is a vector, so there is no need to attach() it.

2) Make a new plot window, using the height and width arguments to make it three times wider than it is tall; this avoids a lot of white space around the strip chart. Because the sample size is rather small, visualize the distribution of the data with stripchart(). Label the x-axis properly and do not include a main label. Use open circles for the plot symbol, because they will help you resolve data points that are almost identical.

3) You would like to estimate the skewness of this data. Write your own function to calculate skewness on a vector, and call the function skewness. The only argument it should take is x (a vector of data, but not necessarily the cadmium data), and it should return a single number, the skewness of that vector. The formula for sample skewness (G1) is as follows, where n is the number of observations, x̄ is the sample mean, and s is the sample standard deviation:

$$G_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

Whenever you write a function, you must test it to demonstrate that it works correctly. The scientific literature is littered with errors because code was not checked. There are several strategies for checking code. One is to give it an input (data) for which you know the result. Another is to compare your result with that of other published code, assuming that the published code is correct. Here, you should compare your value of skewness for the cadmium data with that produced by Excel’s SKEW() function. Check the result of your function this way, but do not turn in your check. If your function is not correct, you must of course fix it so that it does work.
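The known-input strategy can be illustrated with a short sketch. It is written in Python only to show the idea (your solution will be in R), and it assumes G1 is the adjusted Fisher-Pearson coefficient, n/((n-1)(n-2)) times the sum of cubed standardized deviations using the sample standard deviation. The checks use inputs whose answers can be worked out by hand: a symmetric sample must give zero, and [0, 0, 3] gives exactly √3.

```python
import math
import statistics


def skewness(x):
    """Sample skewness G1 of a sequence of numbers.

    Assumes the adjusted Fisher-Pearson form:
    G1 = n / ((n - 1) * (n - 2)) * sum(((xi - mean) / s) ** 3),
    where s is the sample standard deviation (n - 1 in its denominator).
    """
    x = list(x)
    n = len(x)
    if n < 3:
        raise ValueError("G1 needs at least three values")
    mean = statistics.fmean(x)
    s = statistics.stdev(x)  # sample standard deviation
    total = sum(((xi - mean) / s) ** 3 for xi in x)
    return n / ((n - 1) * (n - 2)) * total


# Known-answer checks:
# 1. A symmetric sample has zero skewness (up to rounding error).
assert math.isclose(skewness([1, 2, 3, 4, 5]), 0.0, abs_tol=1e-12)

# 2. For [0, 0, 3]: mean = 1, s = sqrt(3), and the sum of cubed
#    standardized deviations is (-1 - 1 + 8) / 3**1.5 = 2 / sqrt(3).
#    Multiplying by 3 / (2 * 1) gives G1 = sqrt(3).
assert math.isclose(skewness([0, 0, 3]), math.sqrt(3))

# 3. Negating every value negates G1.
assert math.isclose(skewness([0, 0, -3]), -math.sqrt(3))
```

Hand-worked cases like these complement, rather than replace, the comparison with Excel's SKEW() on the cadmium data.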

Note that your skewness function gives you a single value for this statistic, but you do not have an estimate of your uncertainty, that is, a confidence interval. You will use bootstrapping to generate this confidence interval.

5) The first step in bootstrapping your skewness function is to write a second function, one that calculates a single bootstrapped value of skewness. Use the example from lecture as a guide. Remember to set the sample size of your bootstrapped sample equal to the size of the cadmium data set, and remember to sample with replacement.

6) The second step is to use replicate() to calculate many bootstrapped skewness values, saving them to a vector called skewness.bootstrap. Replicate your bootstrap function 100,000 times.
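Steps 5 and 6 together amount to "resample with replacement at the original size, compute the statistic, and repeat." The Python sketch below shows that logic under a few assumptions: the names `bootstrap_once` and `bootstrap` are hypothetical, the statistic is passed in as an argument (for step 5 you would plug in your skewness function), the data are made-up values rather than the cadmium data, and the demo uses the mean so it stays short.

```python
import random
import statistics


def bootstrap_once(x, statistic, rng=random):
    """One bootstrapped value of `statistic` for the data in x.

    The resample has the same size as x and is drawn with replacement.
    """
    resample = rng.choices(x, k=len(x))  # sampling WITH replacement
    return statistic(resample)


def bootstrap(x, statistic, reps, rng=random):
    """Roughly what R's replicate(reps, ...) does: reps bootstrapped values."""
    return [bootstrap_once(x, statistic, rng) for _ in range(reps)]


# Hypothetical data (not the cadmium values); seeded for reproducibility.
data = [2.1, 3.4, 3.9, 5.0, 7.8]
rng = random.Random(42)
boot_means = bootstrap(data, statistics.fmean, reps=10_000, rng=rng)
print(len(boot_means))  # 10000
```

Taking the statistic as an argument is a design choice of this sketch; the assignment asks for a function dedicated to skewness. In R, replicate() takes the place of the list comprehension, which is why no explicit loop is needed.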

7) Make a new plot window, and plot a frequency distribution of your bootstrapped values. Color the bars gray, suggest 50 breaks, properly label the x-axis, rotate the y-axis values, and do not show a main title.

8) From your bootstrapped values, calculate your parameter estimate of skewness by taking the mean of your bootstrapped values. Calculate the 95% confidence limits with the quantile() function.
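The summary in step 8 is the mean of the bootstrapped values plus their 2.5% and 97.5% quantiles. As a Python illustration (the helper names are hypothetical), the quantile below interpolates linearly between order statistics, which is how R's quantile() behaves by default (type 7). The check uses the integers 0 to 100, whose mean and quantiles are easy to verify by hand.

```python
import math
import statistics


def quantile_type7(values, p):
    """Sample quantile by linear interpolation between order statistics.

    Intended to reproduce R's default quantile() method (type 7).
    """
    xs = sorted(values)
    h = (len(xs) - 1) * p          # 0-based fractional position
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # guard for p = 1
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def bootstrap_summary(values, probs=(0.025, 0.975)):
    """Mean of bootstrapped values plus percentile confidence limits."""
    values = list(values)
    estimate = statistics.fmean(values)
    lower, upper = (quantile_type7(values, p) for p in probs)
    return estimate, lower, upper


# Check on values whose summary is easy to work out by hand.
est, lower, upper = bootstrap_summary(range(101))
assert math.isclose(est, 50.0)
assert math.isclose(lower, 2.5)   # position 100 * 0.025 = 2.5
assert math.isclose(upper, 97.5)  # position 100 * 0.975 = 97.5
```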

9) As a comment, state in words your estimate of the parameter and its 95% confidence interval, using a reasonable number of significant figures.

E-mail your commands file to Steven Holland, following all the standard instructions. This problem set is due 25 October.