A linear equation is any equation of the form y = ax + bz, that is, y as a sum of a set of one or more coefficients, each multiplied by a variable. Such an equation is said to be linear in its parameters (the coefficients a and b in this example) and its variables (here, x and z). Any equation that cannot be put into this form is said to be non-linear.
Some equations may not appear to be linear at first glance, but with a simple substitution, can be placed in linear form. For example, the equation y = ax + bx2 is linear because x2 could be replaced with a new variable z, converting the equation to y = ax + bz. Likewise, y = ax2 + c ex is also linear because with two substitutions, w=x2 and z=ex, the equation becomes y = aw + cz and is now in linear form.
Some equations cannot be recast into linear form by substitutions. For example, there is no such substitution for y = e(a+bx) that can place it in linear form. When this is the case, we must perform nonlinear least-squares regression, easily done in R with nls().
Chapter 20 of The R Book by Michael J. Crawley is an excellent treatment of non-linear regression, and it includes a helpful table (20.1) of twelve common non-linear functions. If you need to fit a non-linear function, start with this chapter.
One common non-linear equation is the Michaelis-Menten function, which is used to describe the reaction rate as a function of substrate concentration. The Michaelis-Menten function is characterized by a steep rise that gradually flattens into a plateau. Initially, small increases in substrate concentration cause a substantial increase in reaction rate, but eventually, additional increases in concentration cause progressively smaller increases in reaction rate. Eventually reaction rate plateaus and does not respond to increases in concentration. The Michaelis-Menten function has this form:
where r is the growth rate (the dependent variable), C is the concentration (the independent variable), Vm is the maximum rate in the system, and K is the concentration at which the reaction rate is Vm/2. These last two parameters (Vm and K) are the Michaelis-Menten parameters to be fit.
To fit this model, begin by entering the data and plotting it.
concentration <- c(1,2,3,5,10,15,20,25,30,35);
rate <- c(2.8,4.2,3.5,6.3,15.7,21.3,23.7,25.1,25.8,25.9)
plot(concentration, rate, las=1, pch=16)
The plot suggests that the Michaelis-Menten function might be a good model.
To fit the function, use nls(), which takes two arguments. The first argument is the model formula, specifying that rate is a function of Vm multiplied by concentration, all divided by K plus concentration. Notice that the I() function is not needed in nls() to specify that the arithmetic operators should keep their algebraic interpretations (if we wrote this same formula in lm(), we would have needed to wrap our formulat in I()).The second argument provides the fitting process with a starting estimate, the initial guesses for Vm and K.
mmModel <- nls(rate~Vm*concentration/(K+concentration), start=list(Vm=30, K=25))
As for a linear regression, view the output with summary(). The fitted values of Vm and K can be extracted with the coef() function.
Adding the fitted function to our data plot takes a couple of steps. First, create a range over which to evaluate the function, then evaluate the function over that range, and finally add the points to the plot with the points() function. One might be tempted to evaluate the function by extracting the coefficients for the model and writing out the model, but it is much simpler and less error-prone to use predict().
x <- seq(min(concentration), max(concentration), length=100)
y <- predict(mmModel, list(concentration=x))
points(x, y, type='l', col='blue')
Another common model is logistic growth of a population, which is characterized by slow initial growth, followed by a period of rapid growth, terminated by a period of slowing growth, which produces a characteristic sigmoidal relationship. The logistic growth model can be written as
where P is population size (the dependent variable), t is time (the independent variable), with the logistic parameters of Po (initial population size), r (growth rate), and K (carrying capacity, or final population size), which are to be fitted by the regression.
Again, start by entering the data and plotting it.
time <- c(1,2,3,5,10,15,20,25,30,35)
population <- c(2.8,4.2,3.5,6.3,15.7,21.3,23.7,25.1,25.8,25.9)
plot(time, population, las=1, pch=16)
This equation can also be fit with the nls() function, with initial guesses for the logistic parameters.
logisticModel <- nls(population~K/(1+exp(Po+r*time)), start=list(Po=5, r=2, K=5))
Non-linear fits can be sensitive to the initial guesses. If these initial guesses are poorly chosen as in the previous line of code, you may get a singular-gradient error:
Error in nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
Error in nls(population ~ (K * Po * exp(r * time))/(K + Po * (exp(r * :
For this data, the initial population size seems close to zero and the plot appears to plateau near 30, so these can be taken as better initial guesses for Po and K. To get an initial guess for r, select a point in the middle of the data set, substitute values of P and t into the equation, along with the initial guesses for for Po and K, and solve for r. The model can then be specified with these appropriate guesses for the parameters, with better results.
logisticModel <- nls(population~K/(1+exp(Po+r*time)), start=list(Po=0, r=-0.211, K=30))
View the regression with summary(), and extract the fitted parameters with coef().
As in the Michaelis-Menton example, adding the fitted curve requires evaluating the function over the range of the data, and adding it with the points() function. The easiest way to evaluate the fitted function is with predict().
x <- seq(min(time), max(time), length=100)
y <- predict(logisticModel, list(time=x))
points(x, y, type='l', col='blue')
The added step of finding good starting parameters can be bypassed by using the self-starting nls logistic model with SSlogis(). This function creates an initial estimate of the parameters, which can then be fed to the nls() function to find the best-fit logistic equation. The SSlogis() function fits a slightly different parameterization of the logistic function:
where Asym is the carrying capacity, xmid is the x value at the inflection point of the logistic curve, and scal is a scaling parameter for the x-axis. These parameters are easily converted to the previous form used for the logistic.
logisticModelSS <- nls(population~SSlogis(time, Asym, xmid, scal))
Similar self-starting models are also available for the Michaelis-Menten model and several other functions, and they are all part of the built-in stats package.
Crawley, M.J., 2013. The R Book. Wiley: Chichester, 1051 p.