20 August 2009
You will often want to retrieve the values of one variable based on whether a second variable meets some condition. To make things concrete, suppose you have a data frame called geochemistry with many columns (variables) and many rows (samples). [Note: your data should generally be set up this way, with columns corresponding to variables and rows corresponding to samples.] Suppose you would like to get all the values of nitrate for samples in which the pH was less than 7.0. To do this, you would write this:
geochemistry[geochemistry$pH<7.0, "nitrate"]
Let’s take this apart to see how it works. geochemistry$pH is a vector, the pH column of the geochemistry data frame. geochemistry$pH<7.0 is also a vector, but a logical one, made of TRUE and FALSE values stating whether each value of pH is less than 7.0. You want the rows (samples) for which pH is less than 7.0 and the column (vector) for nitrate, so you supply the logical vector as the rows and nitrate as the column in the call to geochemistry[rows,columns].
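The steps above can be tried on a small example. This is only a sketch: the data frame below is invented for illustration, and the names acidic and nitrate_acidic are hypothetical helpers, not part of any real data set.

```r
# Toy data frame; the values are invented for illustration only
geochemistry <- data.frame(
  pH      = c(6.2, 7.4, 6.8, 8.1),
  nitrate = c(1.5, 0.3, 2.2, 0.9)
)

# Logical vector: TRUE where pH is less than 7.0
acidic <- geochemistry$pH < 7.0
print(acidic)

# Use the logical vector to pick rows and "nitrate" to pick the column
nitrate_acidic <- geochemistry[acidic, "nitrate"]
print(nitrate_acidic)
```

Storing the logical vector under its own name is optional; writing the test directly inside the brackets gives the same result.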
Rather than testing against a numerical value, you may sometimes want to test for missing values, which should be coded as NA in your data set. To do this, use !is.na() on the column of interest as the row test.
The function is.na() tests whether a value is NA. The exclamation mark in front of this changes it into a test for not NA, that is, it returns a TRUE for every value that isn’t an NA.
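As a minimal sketch of the missing-value test, again using an invented data frame (the names has_pH, nitrate_with_pH and nitrate_acidic are hypothetical), the following keeps only the samples whose pH was actually recorded. The last step assumes you also want the pH condition from earlier, and combines the two tests with &.

```r
# Toy data frame with one missing pH value; values are invented
geochemistry <- data.frame(
  pH      = c(6.2, NA, 6.8, 8.1),
  nitrate = c(1.5, 0.3, 2.2, 0.9)
)

# TRUE for every sample whose pH is not NA
has_pH <- !is.na(geochemistry$pH)
print(has_pH)

# Nitrate values for samples with a recorded pH
nitrate_with_pH <- geochemistry[has_pH, "nitrate"]
print(nitrate_with_pH)

# Both tests together: pH is recorded AND less than 7.0
nitrate_acidic <- geochemistry[has_pH & geochemistry$pH < 7.0, "nitrate"]
print(nitrate_acidic)
```

Including the not-NA test matters in the combined case because comparing NA with a number gives NA rather than FALSE, and an NA in the row selector produces an NA in the result.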