Advertisement
Advertisement


ggplot with 2 y axes on each side and different scales


Question

I need to plot a bar chart showing counts and a line chart showing rate all in one chart, I can do both of them separately, but when I put them together, I scale of the first layer (i.e. the geom_bar) is overlapped by the second layer (i.e. the geom_line).

Can I move the axis of the geom_line to the right?

2019/05/27
1
237
5/27/2019 7:58:38 PM

Accepted Answer

Sometimes a client wants two y scales. Giving them the "flawed" speech is often pointless. But I do like the ggplot2 insistence on doing things the right way. I am sure that ggplot is in fact educating the average user about proper visualization techniques.

Maybe you can use faceting and scale free to compare the two data series? - e.g. look here: https://github.com/hadley/ggplot2/wiki/Align-two-plots-on-a-page

2017/11/28
110
11/28/2017 9:11:04 PM


Starting with ggplot2 2.2.0 you can add a secondary axis like this (taken from the ggplot2 2.2.0 announcement):

ggplot(mpg, aes(displ, hwy)) + 
  geom_point() + 
  scale_y_continuous(
    "mpg (US)", 
    sec.axis = sec_axis(~ . * 1.20, name = "mpg (UK)")
  )

enter image description here

2018/07/21

Taking above answers and some fine-tuning (and for whatever it's worth), here is a way of achieving two scales via sec_axis:

Assume a simple (and purely fictional) data set dt: for five days, it tracks the number of interruptions VS productivity:

        when numinter prod
1 2018-03-20        1 0.95
2 2018-03-21        5 0.50
3 2018-03-23        4 0.70
4 2018-03-24        3 0.75
5 2018-03-25        4 0.60

(the ranges of both columns differ by about factor 5).

The following code will draw both series that they use up the whole y axis:

ggplot() + 
  geom_bar(mapping = aes(x = dt$when, y = dt$numinter), stat = "identity", fill = "grey") +
  geom_line(mapping = aes(x = dt$when, y = dt$prod*5), size = 2, color = "blue") + 
  scale_x_date(name = "Day", labels = NULL) +
  scale_y_continuous(name = "Interruptions/day", 
    sec.axis = sec_axis(~./5, name = "Productivity % of best", 
      labels = function(b) { paste0(round(b * 100, 0), "%")})) + 
  theme(
      axis.title.y = element_text(color = "grey"),
      axis.title.y.right = element_text(color = "blue"))

Here's the result (above code + some color tweaking):

two scales in one ggplot2

The point (aside from using sec_axis when specifying the y_scale is to multiply each value the 2nd data series with 5 when specifying the series. In order to get the labels right in the sec_axis definition, it then needs dividing by 5 (and formatting). So a crucial part in above code is really *5 in the geom_line and ~./5 in sec_axis (a formula dividing the current value . by 5).

In comparison (I don't want to judge the approaches here), this is how two charts on top of one another look like:

two charts above one another

You can judge for yourself which one better transports the message (“Don’t disrupt people at work!”). Guess that's a fair way to decide.

The full code for both images (it's not really more than what's above, just complete and ready to run) is here: https://gist.github.com/sebastianrothbucher/de847063f32fdff02c83b75f59c36a7d a more detailed explanation here: https://sebastianrothbucher.github.io/datascience/r/visualization/ggplot/2018/03/24/two-scales-ggplot-r.html


There are common use-cases duel y axes, e.g., the climatograph showing monthly temperature and precipitation. Here is a simple solution, generalized from Megatron's solution by allowing you to set the lower limit of the variables to something else than zero:

Example data:

climate <- tibble(
  Month = 1:12,
  Temp = c(-4,-4,0,5,11,15,16,15,11,6,1,-3),
  Precip = c(49,36,47,41,53,65,81,89,90,84,73,55)
  )

Set the following two values to values close to the limits of the data (you can play around with these to adjust the positions of the graphs; the axes will still be correct):

ylim.prim <- c(0, 180)   # in this example, precipitation
ylim.sec <- c(-4, 18)    # in this example, temperature

The following makes the necessary calculations based on these limits, and makes the plot itself:

b <- diff(ylim.prim)/diff(ylim.sec)
a <- b*(ylim.prim[1] - ylim.sec[1])

ggplot(climate, aes(Month, Precip)) +
  geom_col() +
  geom_line(aes(y = a + Temp*b), color = "red") +
  scale_y_continuous("Precipitation", sec.axis = sec_axis(~ (. - a)/b, name = "Temperature")) +
  scale_x_continuous("Month", breaks = 1:12) +
  ggtitle("Climatogram for Oslo (1961-1990)")  

Climatogram showing temperature as line and precipitation as barplot

If you want to make sure that the red line corresponds to the right-hand y axis, you can add a theme sentence to the code:

ggplot(climate, aes(Month, Precip)) +
  geom_col() +
  geom_line(aes(y = a + Temp*b), color = "red") +
  scale_y_continuous("Precipitation", sec.axis = sec_axis(~ (. - a)/b, name = "Temperature")) +
  scale_x_continuous("Month", breaks = 1:12) +
  theme(axis.line.y.right = element_line(color = "red"), 
        axis.ticks.y.right = element_line(color = "red"),
        axis.text.y.right = element_text(color = "red"), 
        axis.title.y.right = element_text(color = "red")
        ) +
  ggtitle("Climatogram for Oslo (1961-1990)")

which colors the right-hand axis:

Climatogram with red right-hand axis

2019/11/19

You can create a scaling factor which is applied to the second geom and right y-axis. This is derived from Sebastian's solution.

library(ggplot2)

scaleFactor <- max(mtcars$cyl) / max(mtcars$hp)

ggplot(mtcars, aes(x=disp)) +
  geom_smooth(aes(y=cyl), method="loess", col="blue") +
  geom_smooth(aes(y=hp * scaleFactor), method="loess", col="red") +
  scale_y_continuous(name="cyl", sec.axis=sec_axis(~./scaleFactor, name="hp")) +
  theme(
    axis.title.y.left=element_text(color="blue"),
    axis.text.y.left=element_text(color="blue"),
    axis.title.y.right=element_text(color="red"),
    axis.text.y.right=element_text(color="red")
  )

enter image description here

Note: using ggplot2 v3.0.0

2018/08/16

The technical backbone to the solution of this challenge has been provided by Kohske some 3 years ago [KOHSKE]. The topic and the technicalities around its solution have been discussed on several instances here on Stackoverflow [IDs: 18989001, 29235405, 21026598]. So i shall only provide a specific variation and some explanatory walkthrough, using above solutions.

Let us assume we do have some data y1 in group G1 to which some data y2 in group G2 is related in some way, e.g. range/scale transformed or with some noise added. So one wants to plot the data together on one plot with the scale of y1 on the left and y2 on the right.

  df <- data.frame(item=LETTERS[1:n],  y1=c(-0.8684, 4.2242, -0.3181, 0.5797, -0.4875), y2=c(-5.719, 205.184, 4.781, 41.952, 9.911 )) # made up!

> df
  item      y1         y2
1    A -0.8684 -19.154567
2    B  4.2242 219.092499
3    C -0.3181  18.849686
4    D  0.5797  46.945161
5    E -0.4875  -4.721973

If we now plot our data together with something like

ggplot(data=df, aes(label=item)) +
  theme_bw() + 
  geom_segment(aes(x='G1', xend='G2', y=y1, yend=y2), color='grey')+
  geom_text(aes(x='G1', y=y1), color='blue') +
  geom_text(aes(x='G2', y=y2), color='red') +
  theme(legend.position='none', panel.grid=element_blank())

it doesnt align nicely as the smaller scale y1 obviosuly gets collapsed by larger scale y2.

The trick here to meet the challenge is to techncially plot both data sets against the first scale y1 but report the second against a secondary axis with labels showing the original scale y2.

So we build a first helper function CalcFudgeAxis which calculates and collects features of the new axis to be shown. The function can be amended to ayones liking (this one just maps y2 onto the range of y1).

CalcFudgeAxis = function( y1, y2=y1) {
  Cast2To1 = function(x) ((ylim1[2]-ylim1[1])/(ylim2[2]-ylim2[1])*x) # x gets mapped to range of ylim2
  ylim1 <- c(min(y1),max(y1))
  ylim2 <- c(min(y2),max(y2))    
  yf <- Cast2To1(y2)
  labelsyf <- pretty(y2)  
  return(list(
    yf=yf,
    labels=labelsyf,
    breaks=Cast2To1(labelsyf)
  ))
}

what yields some:

> FudgeAxis <- CalcFudgeAxis( df$y1, df$y2 )

> FudgeAxis
$yf
[1] -0.4094344  4.6831656  0.4029175  1.0034664 -0.1009335

$labels
[1] -50   0  50 100 150 200 250

$breaks
[1] -1.068764  0.000000  1.068764  2.137529  3.206293  4.275058  5.343822


> cbind(df, FudgeAxis$yf)
  item      y1         y2 FudgeAxis$yf
1    A -0.8684 -19.154567   -0.4094344
2    B  4.2242 219.092499    4.6831656
3    C -0.3181  18.849686    0.4029175
4    D  0.5797  46.945161    1.0034664
5    E -0.4875  -4.721973   -0.1009335

Now I wraped Kohske's solution in the second helper function PlotWithFudgeAxis (into which we throw the ggplot object and helper object of the new axis):

library(gtable)
library(grid)

PlotWithFudgeAxis = function( plot1, FudgeAxis) {
  # based on: https://rpubs.com/kohske/dual_axis_in_ggplot2
  plot2 <- plot1 + with(FudgeAxis, scale_y_continuous( breaks=breaks, labels=labels))

  #extract gtable
  g1<-ggplot_gtable(ggplot_build(plot1))
  g2<-ggplot_gtable(ggplot_build(plot2))

  #overlap the panel of the 2nd plot on that of the 1st plot
  pp<-c(subset(g1$layout, name=="panel", se=t:r))
  g<-gtable_add_grob(g1, g2$grobs[[which(g2$layout$name=="panel")]], pp$t, pp$l, pp$b,pp$l)

  ia <- which(g2$layout$name == "axis-l")
  ga <- g2$grobs[[ia]]
  ax <- ga$children[[2]]
  ax$widths <- rev(ax$widths)
  ax$grobs <- rev(ax$grobs)
  ax$grobs[[1]]$x <- ax$grobs[[1]]$x - unit(1, "npc") + unit(0.15, "cm")
  g <- gtable_add_cols(g, g2$widths[g2$layout[ia, ]$l], length(g$widths) - 1)
  g <- gtable_add_grob(g, ax, pp$t, length(g$widths) - 1, pp$b)

  grid.draw(g)
}

Now all can be put together: Below code shows, how the proposed solution could be used in a day-to-day environment. The plot call now doesnt plot the original data y2 anymore but a cloned version yf (held inside the pre-calculated helper object FudgeAxis), which runs of the scale of y1. The original ggplot objet is then manipulated with Kohske's helper function PlotWithFudgeAxis to add a second axis preserving the scales of y2. It plots as well the manipulated plot.

FudgeAxis <- CalcFudgeAxis( df$y1, df$y2 )

tmpPlot <- ggplot(data=df, aes(label=item)) +
      theme_bw() + 
      geom_segment(aes(x='G1', xend='G2', y=y1, yend=FudgeAxis$yf), color='grey')+
      geom_text(aes(x='G1', y=y1), color='blue') +
      geom_text(aes(x='G2', y=FudgeAxis$yf), color='red') +
      theme(legend.position='none', panel.grid=element_blank())

PlotWithFudgeAxis(tmpPlot, FudgeAxis)

This now plots as desired with two axis, y1 on the left and y2 on the right

2 axes

Above solution is, to put it straight, a limited shaky hack. As it plays with the ggplot kernel it will throw some warnings that we exchange post-the-fact scales, etc. It has to be handled with care and may produce some undesired behaviour in another setting. As well one may need to fiddle around with the helper functions to get the layout as desired. The placement of the legend is such an issue (it would be placed between the panel and the new axis; this is why I droped it). The scaling / alignment of the 2 axis is as well a bit challenging: The code above works nicely when both scales contain the "0", else one axis gets shifted. So definetly with some opportunities to improve...

In case on wants to save the pic one has to wrap the call into device open / close:

png(...)
PlotWithFudgeAxis(tmpPlot, FudgeAxis)
dev.off()
2016/05/14

Source: https://stackoverflow.com/questions/3099219
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]