World Top Coffee Drinkers

Who are the world’s top coffee drinkers?


World's Top 30 Coffee Import - Are We Coffee Drinkers?

I’ve been playing a bit with R and experimenting with different data sets. The above plot shows the top 30 coffee-importing countries, ordered by their imports relative to population. I found some of the facts a bit surprising:

  • The US is the biggest coffee importer in the world, but if we divide its coffee imports by its population it sits low in this ranking, even below Portugal.
  • Belgium! Why is Belgium first? It has a similar population to Portugal, but the amount of coffee it imports is almost 10 times larger.

This is obviously just an exercise in data manipulation, but some curiosities arise. Wikipedia has a page on coffee consumption per capita that orders these countries differently: Belgium comes 8th and the top spot goes to Finland (Finns drink 4x the coffee we drink here in Portugal!), so don’t take this plot too seriously.
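The ranking itself is only a division and a sort. Here is a minimal sketch of that step, not the original script: the countries A, B and C, their numbers and the column names are placeholders made up for illustration, not values from the data sources.

```r
# Placeholder values for illustration only, NOT the real import/population figures
coffee <- data.frame(
  country    = c("A", "B", "C"),
  imports    = c(1200, 300, 90),   # hypothetical import volume
  population = c(300, 10, 10)      # hypothetical population
)

# Imports per inhabitant, then sort from highest to lowest
coffee$per_capita <- coffee$imports / coffee$population
ranked <- coffee[order(coffee$per_capita, decreasing = TRUE), ]

# Bar plot of the ranking, with country names written vertically
barplot(ranked$per_capita, names.arg = ranked$country, las = 2,
        ylab = 'imports per inhabitant')
```

With these made-up numbers the largest importer (A) ends up last once population is taken into account, which is the same effect seen for the US in the plot.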

Data sources: Coffee Imports by country – Google Fusion Tables; World Population – Wikipedia

How to plot multiple data series in R?

plot multiple data series - Multiple plots in R

I usually use ggplot2 to plot multiple data series, but base R offers two simple ways to do it without ggplot2. I’ll go over both today.

Matlab users can easily plot multiple data series in the same figure. They use hold on and plot the data series as usual. Every data series goes into the same plot until they use hold off.

But can the same thing be done in R? R is getting big as a programming language so plotting multiple data series in R should be trivial.

The R points and lines way

Solution 1: just plot one data series and then use the points or lines commands to plot the other data series in the same figure, creating the multiple data series plot:

> plot(time, series1, type='l', xlab='t /s', ylab='s1')
> points(time, series2, type='l')
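For a version of Solution 1 that runs on its own, here is a minimal sketch. The data (a sine and a shifted cosine sampled over 0 to 20 s) are made up for illustration. The only addition to the commands above is a ylim computed with range(): the first plot call fixes the axes, and anything in a later series that falls outside them is silently clipped.

```r
# Made-up example data: two series sampled at the same time points
time    <- seq(0, 20, by = 0.1)
series1 <- sin(time)
series2 <- 0.5 * cos(time) + 1   # deliberately reaches above series1's range

# The first plot() call fixes the axes, so give it a range covering both series
y_range <- range(series1, series2)
plot(time, series1, type = 'l', ylim = y_range, xlab = 't /s', ylab = 's')

# lines() adds to the existing figure; points(time, series2, type = 'l') is equivalent
lines(time, series2, lty = 2)
```

The dashed line type (lty = 2) is only there to tell the two series apart.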

Plot Multiple Data Series the Matlab way

Solution 2: this one mimics Matlab’s hold on/off behaviour. It uses the graphical parameter new, which is set with par(). Let’s see how:

Setting new to TRUE tells R NOT to clean the previous frame before drawing the new one. It’s a bit counterintuitive, but R is saying “Hey, there’s a new plot for the same figure, so don’t erase whatever is there before plotting the new data series”.

Example (plot series2 on the same plot as series1):

> plot(time, series1, type='l', xlim=c(0.0,20.0), 
+ ylim=c(0.0,1.0), xlab='t /s', ylab='s1')
> par(new=T)
> plot(time, series2, type='l', xlim=c(0.0,20.0), 
+ ylim=c(0.0,1.0), xlab='', ylab='', axes=F)
> par(new=F)

The par(new=T) call tells R to make the second plot without cleaning the first. Two things to consider though: in the second plot call, set axes to FALSE and xlab and ylab to empty strings, or in the final result you’ll see the labels and axes of the two frames overlapping and bleeding into each other.

Finally, because of all this superimposing, you need to know your axis ranges and set them identically in every plot command (in this example xlim and ylim are set to [0, 20] and [0, 1], respectively).

R doesn’t adjust the axes automatically: each plot call works out its own ranges from its own data, without using the first frame or the other data series as a reference. You need to supply these values yourself or you’ll end up with a wrong-looking plot, like Marge Simpson’s hair.
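If you would rather not type the limits by hand, one option is to compute them once with range() and pass the same values to every plot call. This is a minimal sketch with made-up data; the names x_range and y_range are my own.

```r
# Made-up example data
time    <- seq(0, 20, by = 0.1)
series1 <- exp(-time / 5)
series2 <- 0.5 + 0.4 * sin(time)

# Compute the limits once so both frames use exactly the same coordinates
x_range <- range(time)
y_range <- range(series1, series2)

plot(time, series1, type = 'l', xlim = x_range, ylim = y_range,
     xlab = 't /s', ylab = 's1')
par(new = TRUE)   # keep the first frame when the next plot() starts
plot(time, series2, type = 'l', lty = 2, xlim = x_range, ylim = y_range,
     xlab = '', ylab = '', axes = FALSE)
```

Because both calls share x_range and y_range, the second curve lines up with the axes drawn by the first.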

In conclusion, either solution will work to plot multiple data series in R, but sometimes one will be better than the other. Sometimes your data series represent different properties and you’ll need to specify the y ranges individually. In this case the latter option might be useful. Other times you just want a quick exploratory data analysis plot, or your data series measure the same property, and the former method suffices.
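As an illustration of the "different properties" case, here is a rough sketch where the second series keeps its own y range and gets its own axis on the right. The data are made up, and the use of axis() and mtext() on side 4 is my own addition to the par(new=T) recipe, not something the examples above rely on.

```r
# Made-up example data on very different scales
time    <- seq(0, 20, by = 0.1)
series1 <- sin(time)          # roughly -1 to 1
series2 <- 100 + 5 * time     # 100 to 200

# Widen the right margin to leave room for a second axis
old_par <- par(mar = c(5, 4, 4, 4) + 0.1)

plot(time, series1, type = 'l', xlab = 't /s', ylab = 's1')
par(new = TRUE)
# Same x range, but this frame takes its y range from series2 alone
plot(time, series2, type = 'l', lty = 2, xlim = range(time),
     xlab = '', ylab = '', axes = FALSE)
axis(side = 4)                     # second y axis, drawn on the right
mtext('s2', side = 4, line = 3)    # label for the second axis

par(old_par)                       # restore the previous margins
```

Keep in mind that with two independent y axes the curves can no longer be compared by height, only by shape and timing.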