Lecture Notes



Steven Holland

Non-Metric Multidimensional Scaling (NMS)


Nonmetric multidimensional scaling (NMS, also abbreviated NMDS and MDS) is an ordination technique that differs in five important ways from nearly all other ordination methods.

• Most ordination methods calculate many axes, but display only a few of those for reasons of practicality. In contrast, NMS calculates only a limited number of axes that are explicitly chosen prior to the analysis. As a result, there are no hidden axes of variation.

• Most other ordination methods are analytical and therefore result in a single unique solution to a set of data. In contrast, NMS is a numerical technique that iteratively seeks a solution and stops computation when an acceptable solution has been found, or it stops after some pre-specified number of attempts. As a result, an NMS ordination is not a unique solution; a subsequent NMS of the same data will likely result in a somewhat different ordination.

• NMS is not an eigenanalysis technique like principal components analysis or correspondence analysis. As a result, the axes cannot be interpreted such that axis 1 explains the greatest amount of variance, axis 2 explains the next greatest amount, and so on. It also means that an NMS ordination can be rotated, inverted, or centered into any desired configuration.

• NMS makes few assumptions about the nature of the data. For example, principal components analysis assumes linear relationships and reciprocal averaging assumes modal relationships. NMS makes neither of these assumptions, so it is well suited to a wide variety of data.

• NMS allows the use of any distance measure of the samples, unlike other methods, which require particular measures, such as covariance or correlation in PCA or the implied chi-squared measure in detrended correspondence analysis.

Although NMS is a highly flexible and widely applicable technique, it suffers from two principal drawbacks. First, NMS is slow, particularly for large data sets. Second, because NMS is a numerical optimization technique, it can fail to find the true best solution when it becomes stuck on a local minimum, that is, a solution that is not the best overall but that is better than all nearby solutions. Increasing computational speed is easing both of these problems: large data sets can now be run relatively quickly, and multiple restarts can be used to lessen the chances of a solution becoming stuck on a local minimum.


The method underlying NMS is straightforward but computationally demanding. First, it starts with a matrix of data consisting of n rows of samples and p columns of variables (species or taxa in ecological data). There should be no missing values: every variable should have a value for every sample, and that value may be zero. From this, an n × n symmetric matrix of all pairwise distances among samples is calculated with an appropriate distance measure, such as Euclidean distance, Manhattan (city-block) distance, or Bray-Curtis (Sørensen) distance. The NMS ordination is performed on this distance matrix.
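As a sketch in base R, the pairwise distance matrix can be built with dist(); the data matrix below is random and purely illustrative.

```r
# Sketch: building the n x n pairwise distance matrix in base R.
set.seed(1)
x <- matrix(runif(20), nrow = 5)      # 5 samples, 4 variables
d <- dist(x, method = 'euclidean')    # 'manhattan' is also available
round(as.matrix(d), 2)                # symmetric 5 x 5 matrix, zero diagonal
```

vegan's vegdist() works the same way and adds ecological measures, such as Bray-Curtis via vegdist(x, method='bray').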

Next, the desired number of dimensions, k, is chosen for the ordination. The resulting ordination can be highly sensitive to this choice. For example, a k-dimensional ordination is not equivalent to the first k dimensions of a (k+1)-dimensional ordination.

NMS begins by constructing an initial configuration of the samples in the k dimensions. This initial configuration could be based on another ordination or it could consist of an entirely random placement of the samples. The final ordination is partly dependent on this initial configuration, so a variety of approaches are used to avoid the problem of local minima. One approach is to perform several ordinations, each starting from a different random placement of points, and to select the ordination with the best fit. Another approach is to perform a different type of ordination, such as a principal components analysis or a higher-order NMS, and to use k axes from that ordination as the initial configuration. A third approach, useful for data thought to be geographically arrayed, is to use the geographic locations of samples as a starting configuration.

Distances among samples in this starting configuration are calculated, typically with a Euclidean metric. These distances are regressed against the original distance matrix, and the predicted ordination distance for each pair of samples is calculated. A variety of regression methods can be used, including linear, polynomial, and non-parametric approaches, the last of which stipulates only that the regression consistently increases from left to right. In any case, the regression is fitted by least squares. In a perfect ordination, all ordinated distances would fall exactly on the regression, that is, they would match the rank order of distances in the original distance matrix. The goodness of fit of the regression is measured as the sum of squared differences between ordination-based distances and the distances predicted by the regression. This goodness of fit is called stress, and it can be calculated in several ways, one of the most common being Kruskal's stress (formula 1):

stress = sqrt( Σ (dhi − d̂hi)² / Σ dhi² )    (formula 1)

where dhi is the ordinated distance between samples h and i, d̂hi is the distance predicted from the regression, and the sums are taken over all pairs of samples.
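As a minimal base-R sketch (the function name is ours, not part of any package), this stress can be computed from the ordination distances and the regression-predicted distances:

```r
# Kruskal's stress (formula 1): square root of the sum of squared
# differences between ordination distances (d) and regression-predicted
# distances (dhat), normalized by the sum of squared ordination
# distances. Both arguments are vectors of pairwise distances.
kruskalStress <- function(d, dhat) {
  sqrt(sum((d - dhat)^2) / sum(d^2))
}

kruskalStress(c(1, 2, 3), c(1, 2, 3))   # perfect fit: stress of 0
```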

This configuration is improved by moving the positions of samples in ordination space by a small amount in the direction of steepest descent, the direction in which stress changes most rapidly. The ordination distance matrix is recalculated, the regression is performed again, and stress is recalculated. This entire procedure is performed repeatedly until some small specified tolerance value is achieved or until the procedure converges by failing to achieve any lower values of stress, which indicates that a minimum (perhaps local) has been found.
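The whole iterative procedure can be seen in action with isoMDS() from the MASS package (shipped with R), here applied to R's built-in eurodist road-distance matrix purely as an illustration:

```r
library(MASS)                      # provides isoMDS()
# eurodist: built-in road distances among 21 European cities.
fit <- isoMDS(eurodist, k = 2, trace = FALSE)
# fit$points holds the final configuration, reached when stress could
# not be lowered further; fit$stress is the final stress (a percentage
# in isoMDS).
dim(fit$points)
fit$stress
```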


The ordination will be sensitive to the number of dimensions that is chosen, so this choice must be made with care. Choosing too few dimensions will force multiple axes of variation to be expressed on a single ordination dimension. Choosing too many dimensions is no better, in that it can cause a single source of variation to be expressed on more than one dimension. One way to choose an appropriate number of dimensions is to perform ordinations with progressively higher numbers of dimensions. A scree diagram (stress versus number of dimensions) can then be plotted, on which one can identify the point beyond which additional dimensions do not substantially lower the stress value. A second criterion for the appropriate number of dimensions is the interpretability of the ordination, that is, whether the results make sense.
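A scree diagram of this kind can be sketched by running the ordination at several values of k and recording the final stress, again using MASS's isoMDS() and the built-in eurodist data as a stand-in for your own distance matrix:

```r
library(MASS)
dims <- 1:4
# Final stress for ordinations of 1 through 4 dimensions.
stress <- sapply(dims, function(k) isoMDS(eurodist, k = k, trace = FALSE)$stress)
plot(dims, stress, type = 'b', las = 1,
     xlab = 'Number of dimensions', ylab = 'Stress')
# Choose the k beyond which stress no longer drops substantially.
```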

The stress value reflects how well the ordination summarizes the observed distances among the samples. Stress increases both with the number of samples and with the number of variables. For the same underlying data structure, a larger data set will necessarily result in a higher stress value, so use caution when comparing stress among data sets. Stress can also be highly influenced by one or a few poorly fit samples, so it is important to check the contributions to stress among samples in an ordination. Several guidelines for a “good” value of stress have been proposed, but all have been criticized for being simplistic.
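In vegan, per-sample contributions to stress can be checked with goodness(), and the overall fit inspected with stressplot(); this sketch uses vegan's bundled varespec community data in place of your own:

```r
library(vegan)
data(varespec)                  # example community data bundled with vegan
ord <- metaMDS(varespec, trace = 0)
# goodness() returns one value per sample; large values flag samples
# that fit poorly and inflate the overall stress.
gof <- goodness(ord)
# stressplot() draws the Shepard diagram: observed dissimilarities
# against ordination distances, with the fitted regression overlaid.
stressplot(ord)
```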

Although NMS seeks to preserve the distance relationships among the samples, it is still necessary to apply appropriate data transformations to obtain a meaningful ordination. For example, in ecological data, samples should be standardized by sample size to avoid ordinations that primarily reflect sample size, which is generally not of interest.
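A minimal base-R sketch of sample-size standardization, converting counts to proportions so that every sample sums to one (the small count matrix is made up for illustration):

```r
# Three samples with very different sample sizes.
counts <- matrix(c(10,  5, 0,
                   40, 20, 0,
                    2,  1, 7),
                 nrow = 3, byrow = TRUE)
props <- counts / rowSums(counts)   # each sample now sums to 1
rowSums(props)
# vegan offers the same standardization as decostand(counts, method='total').
```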

NMS in R

R has two main NMS functions: isoMDS(), which is part of the MASS package, and metaMDS(), which is part of the vegan package. The metaMDS() routine allows greater automation of the ordination process, so it is usually the preferred method. Internally, metaMDS() performs the ordination with vegan's monoMDS() function, along with several helper functions.

By default, metaMDS() follows the ordination with a rotation via principal components analysis. This is a useful feature because it ensures that NMS axis 1 reflects the principal source of variation, and so on, as is characteristic of eigenvalue methods. Most other implementations of NMS do not perform this rotation, which means that the orientation of the cloud of points in the ordination is arbitrary; for example, the principal source of variation might be partly expressed on a combination of NMS axes 3 and 4. This greatly complicates interpretation, so use caution when interpreting the results of other implementations of NMS.

1) Download and install the vegan library, necessary for running the metaMDS() command.
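A sketch of this step (the install needs to be done only once; the library must be loaded in each session):

```r
install.packages('vegan')   # one-time install
library(vegan)              # load in each session
```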


2) Run an NMS on a data set that has columns of variables and rows of samples. The vegan package is designed for ecological data, so the metaMDS() default settings are set with this in mind. For example, the distance measure defaults to Bray-Curtis, and common ecological data transformations are turned on by default. You can also change the number of axes sought (e.g., setting k=3) and the number of restarts from random configurations (e.g., setting trymax=50) to lessen the chances of being stuck on a local minimum.

mydata <- read.table('mydata.txt', header=TRUE, row.names=1, sep=',')
mydata.mds <- metaMDS(mydata)

3) View the items in the metaMDS object that is returned.
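The code for this step is not shown above; one way to do it, continuing from the mydata.mds object created in step 2, is:

```r
names(mydata.mds)    # list the components of the metaMDS object
str(mydata.mds)      # fuller summary of each component
```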


For convenience, it is often useful to extract the sample and variable scores. The column numbers correspond to the NMS axes, so this will return as many columns as were specified with the k parameter in the call to metaMDS(). Extracting the scores in this way is unnecessary if you are comfortable with $ notation.

variableScores <- mydata.mds$species
sampleScores <- mydata.mds$points

4) View the results of the NMS, which will display several elements of the list detailed above, including how the function was called (call), the data set and any transformations used (data), the distance measure (distance), the number of dimensions (dims), the final stress value (stress), whether any convergent solution was achieved (converged), how many random initial configurations were tried (tries), plus whether scores have been scaled, centered, or rotated.
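Continuing from step 2, printing the object displays this summary:

```r
mydata.mds    # print method shows call, distance, dims, stress, convergence
```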


5) Plot sample and variable scores in same space. Open black circles correspond to samples and red crosses indicate taxa.
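Continuing from step 2, the default plot method shows both sets of scores in the same space:

```r
plot(mydata.mds)    # open circles: samples; red crosses: species
```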


[Figure: default plot of the ordination]

6) Because nothing is labelled, this default plot is not very helpful. Usually, you will want to customize an NMS plot. For example, it is often useful to display only the species or only the sites, which is done by setting the display parameter. It is also helpful to show the labels for the species or sites, which is done by specifying type='t'.

plot(mydata.mds, type='t', display=c('species'))

[Figure: plot of species scores with labels]

7) NMS plots are often customized, as with other bivariate plots, by setting type='n' and then plotting points and labels separately. If crowding of labels is a problem, setting cex to a value less than 1 will reduce the size of symbols and text. The choices parameter specifies which NMS axes to plot.

plot(mydata.mds, type='n')
points(mydata.mds, display=c('sites'), choices=c(1,2), pch=3, col='red')
text(mydata.mds, display=c('species'), choices=c(1,2), col='blue', cex=0.7)

[Figure: customized plot of sites (red crosses) and species (blue labels)]

8) For congested plots, the following code will display a plot with symbols for both samples and variables. Click on individual points that you would like identified. Press the escape key when you are done selecting your points to see their identities.

fig <- ordiplot(mydata.mds)
identify(fig, 'spec')

Non-ecological data

NMS can also be performed on non-ecological data, but with several differences:

Using the purdin data set from the principal components lecture, an NMS can be performed and plotted, resulting in a similar configuration of sample points. First, read the data, cull it, and apply data transformations, ending with one that ensures that all data values are positive.

purdin <- read.table('purdin.txt', header=TRUE, row.names=1, sep=',')
geochem <- purdin[ , 2:10]
geochem[ ,3:9] <- log10(geochem[ , 3:9])
geochem <- geochem[,-6]
# Because all of our data must be positive to get variable scores, we will shift each variable by its minimum.
minShift <- function(x) {x + abs(min(x))}
geochem <- apply(geochem, 2, minShift)

Run the NMS with settings appropriate for non-ecological data, and display the results.

geochemMDS <- metaMDS(geochem, distance='euclidean', k=3, trymax=50, autotransform=FALSE, wascores=TRUE, noshare=FALSE)

Because we will be making a custom plot, it is useful to extract the sample scores and variable scores.

variableScores <- geochemMDS$species
sampleScores <- geochemMDS$points

Also, because we want to code the points by the formation they come from, we will make logical vectors describing formation membership, for convenience.

Carters <- purdin$StratPosition < 34.2
Hermitage <- purdin$StratPosition > 34.2

Make an empty plot, then add the points, color-coded by formation. Also add color-coded labels instead of a legend.

plot(sampleScores[,1], sampleScores[,2], xlab='NMS 1', ylab='NMS 2', type='n', asp=1, las=1)
points(sampleScores[Carters,1], sampleScores[Carters,2], pch=16, cex=0.7, col='blue')
points(sampleScores[Hermitage,1], sampleScores[Hermitage,2], pch=16, cex=0.7, col='red')
text(1.2, 0.5, 'Carters', col='blue')
text(0.1, -1.0, 'Hermitage', col='red')

Add vectors for the variables.

arrows(0, 0, variableScores[,1], variableScores[,2], length=0.1, angle=20)
textNudge <- 1.2
text(variableScores[,1]*textNudge, variableScores[,2]*textNudge, rownames(variableScores), cex=0.7)

[Figure: NMS biplot of the Purdin geochemical data]

What to report

When reporting a non-metric multidimensional scaling, always include at least these items: