20 August 2015
Named parts of a data frame can be accessed through dollar-sign ($) notation, such as
myDataFrame$myFirstVariable. This quickly becomes cumbersome and even error-prone if the data frame has a long name. One solution is to use
attach(myDataFrame), which makes the variables in the data frame accessible without using $ notation.
However, the attach() function comes with some potentially nasty side effects. I'll show you three of these. First, I'll make and attach a small data frame.
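# Build a 5 x 3 matrix of random values (mean 20, sd 10), rounded to whole numbers
m <- matrix(data = round(rnorm(15, 20, 10), 0), nrow = 5)
colnames(m) <- c('alpha', 'beta', 'gamma')
df <- data.frame(m)
attach(df)
df
##   alpha beta gamma
## 1    30   12    32
## 2    18   23    19
## 3    15   25    30
## 4    36   20    18
## 5    41    9    30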
The first thing to realize about attach() is that it makes copies of the vectors in the data frame, and that you will be working with these copies, not the original data frame. Any changes you make to these vectors will not propagate back to the data frame. For example, suppose that you realize one of the values in beta was incorrect and you change it. The changed value appears in the vector you are working with (strictly, a new copy that R creates in your workspace), but not in the data frame, which still has the original value.
beta[1] <- 1000
beta[1]
## [1] 1000
df$beta[1]
## [1] 12
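To make this first pitfall easy to reproduce, here is a minimal, self-contained sketch. The data frame name demo is hypothetical; its values are copied from the printout above so that the results do not depend on random numbers. It also illustrates, as an aside, that the disconnect runs in both directions: editing the data frame after attaching does not update the attached copies.

```r
# Hypothetical data frame; values copied from the printed output above
demo <- data.frame(alpha = c(30, 18, 15, 36, 41),
                   beta  = c(12, 23, 25, 20, 9),
                   gamma = c(32, 19, 30, 18, 30))

attach(demo)

beta[1] <- 1000                 # creates a modified copy named beta in the workspace
copy_value  <- beta[1]          # 1000: the copy you are now working with
frame_value <- demo$beta[1]     # 12: the data frame itself is untouched

demo$alpha[1] <- 0              # editing the data frame afterwards...
attached_alpha <- alpha[1]      # ...leaves the attached copy at 30

detach(demo)
rm(beta)                        # tidy up the stray workspace copy
```

If the data frame itself needs correcting, assigning through it (demo$beta[1] <- 1000) is the change that actually sticks.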
Second, if you’ve made a change like this, and later
detach() the data frame, any modified vectors will remain. You might notice this and be puzzled, even more so when you realize that only the modified vector remains – the unmodified alpha and gamma vectors are gone.
detach(df)
beta
## [1] 1000   23   25   20    9
alpha
## Error in eval(expr, envir, enclos): object 'alpha' not found
gamma
## function (x)  .Primitive("gamma")
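The gamma output above is not a leftover vector: with the data frame detached, the name falls through to base R's built-in gamma() function. The sketch below checks for leftovers explicitly. It assumes a fresh session and uses a hypothetical data frame called demo with the same values as the printout earlier.

```r
# Hypothetical data frame; values copied from the printed output earlier
demo <- data.frame(alpha = c(30, 18, 15, 36, 41),
                   beta  = c(12, 23, 25, 20, 9),
                   gamma = c(32, 19, 30, 18, 30))

attach(demo)
beta[1] <- 1000
detach(demo)

# mode = "numeric" skips base R's beta() and gamma() functions
beta_left  <- exists("beta",  mode = "numeric")   # TRUE: the modified copy survives
alpha_left <- exists("alpha", mode = "numeric")   # FALSE: gone with the data frame
gamma_left <- exists("gamma", mode = "numeric")   # FALSE: only the function remains

if (beta_left) rm(beta)   # remove the leftover so it cannot mask anything later
```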
Third, if we reattach the data frame, we now get a warning about masked objects.
attach(df)
## The following object is masked _by_ .GlobalEnv:
##
##     beta
But what does this warning mean? Let’s look at alpha, beta, and gamma.
alpha
## [1] 30 18 15 36 41
beta
## [1] 1000   23   25   20    9
gamma
## [1] 32 19 30 18 30
Doing this reveals that alpha and gamma now come from the data frame, but beta does not – it is still the vector that we modified, and it hides the data frame's beta.
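One way to see what the masking warning means is to ask R where each beta lives. The sketch below is only an illustration: demo is a hypothetical data frame with the values printed earlier, and the workspace beta is created by hand to stand in for the leftover vector.

```r
# Hypothetical data frame; values copied from the printed output earlier
demo <- data.frame(alpha = c(30, 18, 15, 36, 41),
                   beta  = c(12, 23, 25, 20, 9),
                   gamma = c(32, 19, 30, 18, 30))
beta <- c(1000, 23, 25, 20, 9)          # stands in for the leftover modified vector

attach(demo)                            # reports that beta is masked _by_ .GlobalEnv

visible  <- beta                        # the workspace vector is found first
attached <- get("beta", pos = "demo")   # the attached copy is still there, just hidden
where    <- find("beta")                # every place on the search path holding a beta

detach(demo)
rm(beta)
```

The result of find() also lists package:base, because base R has a beta() function. Whichever entry comes first on the search path is the one a bare beta refers to.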
More generally, if you already have an object with some name, and you attach a data frame that has a vector with the same name, the name will still refer to your existing object, not to the vector in the data frame. If the two objects with the same name have different data structures (for example, if the first object was a regression object), you will likely get an error if you try to use it where you intended to use the vector with that name. If the first object was a vector, particularly if it has the same length as the one in the data frame, you may silently get odd results from any calculations. Hopefully, you will be eagle-eyed enough to spot these, or have tests in place to detect them; if not, errors will creep into your work.
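The same-length case is the dangerous one, because nothing fails. Here is a small sketch of it, together with one possible check to run before attaching. The data frame demo and the unrelated workspace vector are both hypothetical, and the comments assume a fresh session.

```r
# Hypothetical data frame; values copied from the printed output earlier
demo <- data.frame(alpha = c(30, 18, 15, 36, 41),
                   beta  = c(12, 23, 25, 20, 9),
                   gamma = c(32, 19, 30, 18, 30))
alpha <- c(1, 2, 3, 4, 5)                 # unrelated vector that happens to share a name

# A simple pre-attach check: which column names already exist in the workspace?
clashes <- intersect(names(demo), ls())   # "alpha" in a fresh session

attach(demo)                              # reports that alpha is masked _by_ .GlobalEnv
silent_wrong <- mean(alpha)               # 3: uses the workspace vector, with no error
intended     <- mean(demo$alpha)          # 28: what was actually wanted
detach(demo)
rm(alpha)
```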
Many, including Google, argue that you should avoid attach() altogether. A more balanced view is to understand the potential problems of attach() and to use it with caution. attach() is safest when you are working with only a single object, the data frame, and you are not modifying its contents.
There are several ways to avoid using attach().
First, you could continue to use $ notation, although the original problem of repeatedly typing the data frame's name remains. This could be lessened by giving the data frame a shorter name, although the name should still clearly express what it represents.
Second, some functions have a data parameter that lets you access the variables of a data frame without $ notation. The linear model regression function is one example.
lm(beta ~ alpha, data = df)
##
## Call:
## lm(formula = beta ~ alpha, data = df)
##
## Coefficients:
## (Intercept)        alpha
##      31.855       -0.502
Third, the with() function lets you refer to variables without $ notation.
with(df, plot(alpha, beta))
Because you must wrap every statement in a with() call that names the data frame, this only partially lessens the typing problem. Even so, it spares you the casual attaching and detaching of data frames that causes the problems described above.
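The wrapping cost can be reduced somewhat, because with() accepts a braced block of several statements. The sketch below shows this, along with within(), a related base R function that returns a modified copy of the data frame. The data frame demo and the variable names are hypothetical, and the values are those printed earlier.

```r
# Hypothetical data frame; values copied from the printed output earlier
demo <- data.frame(alpha = c(30, 18, 15, 36, 41),
                   beta  = c(12, 23, 25, 20, 9),
                   gamma = c(32, 19, 30, 18, 30))

# Several statements inside one with() call, grouped by braces
result <- with(demo, {
  ratio <- beta / alpha                   # temporary; never lands in the workspace
  c(mean_alpha = mean(alpha), max_ratio = max(ratio))
})

# within() evaluates the change against the columns and returns a modified copy
fixed <- within(demo, beta[1] <- 1000)    # demo itself is left unchanged
```

Unlike the attach() workflow, nothing is left behind on the search path or in the workspace, and the corrected value ends up in a data frame (fixed) rather than in a stray vector.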
Comments or questions? Contact me at email@example.com