How to plot multiple data series in ggplot for quality graphs?

plot multiple data series with ggplot

I've already shown how to plot multiple data series in R with a traditional plot by using the par(new=T), par(new=F) trick. Now I'll show how to do it within ggplot2.

First let's generate two data series y1 and y2 and plot them with the traditional points methods

x <- seq(0, 4 * pi, 0.1)
n <- length(x)
y1 <- 0.5 * runif(n) + sin(x)
y2 <- 0.5 * runif(n) + cos(x) - sin(x)
plot(x, y1, col = "blue", pch = 20)
points(x, y2, col = "red", pch = 20)

This is exactly the R code that produced the above plot. It is just a simple plot and points functions to plot multiple data series. It is not really the greatest, smart looking R code you want to use. Better plots can be done in R with ggplot.

Plotting with Ggplot2

Now, let's try this with ggplot2.

First we need to create a data.frame with our series.

If we have very few series we can just plot adding geom_point as needed.

library(ggplot2)
df <- data.frame(x, y1, y2)
ggplot(df, aes(x, y = value, color = variable)) + 
    geom_point(aes(y = y1, col = "y1")) + 
    geom_point(aes(y = y2, col = "y2"))

But if we have many series to plot an alternative is using melt to reshape the data.frame and with this plot an arbitrary number of rows. For example:

library(reshape)
# This creates a new data frame with columns x, variable and value
# x is the id, variable holds each of our timeseries designation
df.melted <- melt(df, id = "x")
 
ggplot(data = df.melted, aes(x = x, y = value, color = variable)) +
  geom_point()

And thats how to plot multiple data series using ggplot. The basic trick is that you need to melt your data into a new data.frame. Remember, in data.frames each row represents an observation.

update:

Another option, pointed to me in the comments by Cosmin Saveanu (Thanks!), it to plot the multiple data series with facets (good for B&W):

library(reshape)
ggplot(data = df.melted, aes(x = x, y = value)) +
geom_point() + facet_grid(variable ~ .)

How to plot multiple data series in R?

plot multiple data series - Multiple plots in R

I usually use ggplot2 to plot multiple data series, but if I don’t use ggplot2, there are TWO simple ways to plot multiple data series in R. I’ll go over both today.

Matlab users can easily plot multiple data series in the same figure. They use hold on and plot the data series as usual. Every data series goes into the same plot until they use hold off.

But can the same thing be done in R? R is getting big as a programming language so plotting multiple data series in R should be trivial.

The R points and lines way

Solution 1: just plot one data series and then use the points or lines commands to plot the other data series in the same figure, creating the multiple data series plot:

> plot(time, series1, type='l', xlab='t /s', ylab='s1')
> points(time, series2, type='l')

Plot Multiple Data Series the Matlab way

Solution 2: this one mimics Matlab hold on/off behaviour. It uses the new parameter of graphical devices. Let’s see how:

Setting new to TRUE tells R NOT to clean the previous frame before drawing the new one. It’s a bit counter intuitive but R is saying “Hey, theres a new plot for the same figure so don’t erase whatever is there before plotting the new data series“.

Example (plot series2 on the same plot as series1):

> plot(time, series1, type='l', xlim=c(0.0,20.0), 
+ ylim=c(0.0,1.0), xlab='t /s', ylab='s1')
> par(new=T)
> plot(time, series2, type='l', xlim=c(0.0,20.0), 
+ ylim=c(0.0,1.0), xlab='', ylab='', axes=F)
> par(new=F)

The par(new=T) tells R to make the second plot without cleaning the first. Two things to consider though: in the second set axes to FALSE, and xlabel and ylabel to empty strings or in the final result you’ll see some overlapping and bleeding of the several labels and axes.

Finally, because of all this superimposing you need to know your axes ranges and set them up equally in all plot commands (xlim, and ylim in this example are set to the range [0,20] and [0,1]).

R doesn’t automatically adjust the axes, as it doesn’t use the first frame as reference or the multiple data series. You need to supply these values or you’ll end up with a wrong looking plot like Marge Simpson’s hair.

In conclusion, either solution will work to plot multiple data series inside R, but sometimes one will be better than the other. Sometimes your data series represent different properties and you’ll need to specify the y ranges individually. In this case the latter option might be useful. Other times you just want a quick exploratory data analysis plot, or your data series are measuring the same property and the former method suffices.