R Tips



Steven Holland

Writing your own functions

12 October, 2009, revised 16 October 2013

Writing functions in R is an important time-saver, and a skill you should learn early. Here are a few tips, using a function for the coefficient of variation as an example. Such a function could be defined like this:

coeffVar <- function(x) {
  CV <- sd(x)/mean(x)
  CV
}

Functions are objects, so in the first line, the function is given a name, coeffVar. Names should be descriptive and intuitive, so coeffVar would be a good name, but cv147smh would not.

The word function is the R keyword that creates our function object. The result of the function() command is assigned to the name of the function object.
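To illustrate that a function really is an ordinary object, here is a minimal sketch. The second name, myCV, and the small data vector are made up for the example, and the formula assumes the usual definition of the coefficient of variation (standard deviation divided by the mean).

```r
# The result of function() is stored under the name coeffVar
coeffVar <- function(x) {
  CV <- sd(x)/mean(x)
  CV
}

class(coeffVar)        # "function"
is.function(coeffVar)  # TRUE

# Like any other object, it can be assigned to a second (hypothetical) name
myCV <- coeffVar
identical(myCV(c(1, 2, 3)), coeffVar(c(1, 2, 3)))  # TRUE
```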

Functions generally require parameters, that is, data on which they will work as well as settings that control what they will do. Parameters are listed in parentheses after the word function. In this case, all the function needs to work is one parameter, a vector of data, which will be called x. Any name for this parameter could have been chosen, but x is used here because that matches R’s convention.

Following the list of parameters is a set of curly braces, {}, which enclose all of the statements that the function will execute. For clarity, these statements are usually listed one per line, beginning on the line after the one on which the function is declared, and they are indented.

The last statement in the function is the object that will be returned. In this case, we calculate the coefficient of variation and assign it to an object called CV. The final line of the function is simply CV, because we want the function to return that value. You can also return a value by enclosing the object in the return() function.
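As a small sketch of the two ways to return a value, the following defines two versions of the function, one ending with the bare object and one using return(). The names coeffVarImplicit and coeffVarExplicit and the sample data are invented for illustration, and the formula assumes the usual definition of the coefficient of variation (standard deviation divided by the mean).

```r
# Version 1: the last evaluated expression is returned
coeffVarImplicit <- function(x) {
  CV <- sd(x)/mean(x)
  CV
}

# Version 2: the value is returned explicitly with return()
coeffVarExplicit <- function(x) {
  CV <- sd(x)/mean(x)
  return(CV)
}

# Hypothetical sample data
sampleData <- c(2, 4, 4, 4, 5, 5, 7, 9)

coeffVarImplicit(sampleData)
coeffVarExplicit(sampleData)   # same value as the line above
```

Both forms behave the same here. The explicit return() is mostly useful when a function needs to exit before reaching its last line.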

Default parameter values

In some cases, you may want to specify a default value for a parameter. For example, suppose you were writing a function that would sort a vector. Since there are two ways to sort a vector, ascending or descending, you might have a default value for the sort order, so that the user would not need to specify the order if all they wanted was the default.

mySortingFunction <- function(x, sortAscending=TRUE) {
  # ... do some work here
}

In this case, if you wanted to have your vector sorted in ascending order, you could skip the sortAscending parameter and just call mySortingFunction(someData). If you needed it to be in descending order, you would need to specify the parameter, as in mySortingFunction(someData, sortAscending=FALSE). Remember that the same rules for specifying parameters by name and by position apply to your functions as they do to built-in functions.
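The calls described above might look like the following sketch. The body of the function is an assumption, since the original leaves the work unspecified; here it simply hands the job to R's built-in sort(). The data in someData are made up.

```r
# Hypothetical body: sort() from base R does the actual sorting
mySortingFunction <- function(x, sortAscending=TRUE) {
  sort(x, decreasing=!sortAscending)
}

someData <- c(3, 1, 2)   # made-up data

mySortingFunction(someData)                       # default used: 1 2 3
mySortingFunction(someData, sortAscending=FALSE)  # by name: 3 2 1
mySortingFunction(someData, FALSE)                # by position: 3 2 1
```

Naming the parameter in the call is usually the clearer choice, because a bare FALSE gives the reader no hint about what it controls.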

Best practices

You should think of a function as a black box: you supply it some data and settings, it does its work, and it returns its value. It should not require anything that is not internally defined or supplied through a parameter.

For example, suppose you had written the coefficient of variation function like this:

coeffVar <- function(x) {
  CV <- sd/mean(x)
  CV
}

Notice that the first line within the function requires a variable called sd, rather than a function call like sd(x). In other words, our function assumes that the standard deviation has already been calculated. No parameter has been created for this standard deviation, so if we just called our function, we would usually get an error, because sd would refer only to R's built-in function rather than to a number. However, if we had a numeric object called sd outside of our function, our function would be able to see it and use it, and the function would run with no errors or warnings. This is bad, bad, bad, and you should avoid doing this! Instead, you should set up your functions so that everything they need is either supplied as a parameter or calculated inside the function from those parameters.
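The danger is easiest to see with two data sets. In this sketch, the function names coeffVarBad and coeffVarGood, the outside object sdValue, and the data are all hypothetical; sdValue is used in place of sd only to keep the example from being confused with R's built-in sd() function.

```r
# Bad: silently depends on an object named sdValue defined outside the function
coeffVarBad <- function(x) {
  CV <- sdValue/mean(x)
  CV
}

# Good: everything is calculated from the parameter
coeffVarGood <- function(x) {
  CV <- sd(x)/mean(x)
  CV
}

firstData <- c(10, 12, 14)   # made-up data
secondData <- c(1, 5, 30)    # made-up data

sdValue <- sd(firstData)     # outside object that the bad version picks up

coeffVarBad(firstData)       # happens to be right, since sdValue matches firstData
coeffVarBad(secondData)      # runs without complaint, but still uses sd of firstData
coeffVarGood(secondData)     # correct for secondData
```

The bad version never fails loudly. It just returns a wrong number whenever the outside object no longer matches the data, which is what makes this kind of hidden dependence so hard to track down.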