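I was determined to reproduce exactly a composite box-and-whisker plot that I had seen in the book An Introduction to Statistical Learning. The data come from Smarket, a set of 1,250 daily observations of the S&P 500 found in ISLR, the R package that accompanies the book. It's a pretty simple plot: three box plots side by side, showing the percentage change in the S&P one, two and three days earlier (Lag1, Lag2 and Lag3), each split by whether the market went Down or Up today.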
First, after loading the package with library(ISLR), one may want to take a cursory look at the data frame, which has 9 variable columns, using View(head(Smarket)).
I managed to put the plot together with this code:
library(ISLR)
df <- Smarket  # the calls below refer to the data as df

ylabel <- "Percentage change in S&P"
xlabel <- "Today's Direction"
valnames <- c("Down", "Up")
hue <- c("blue", "red")

# Three panels side by side in a single row
layout(matrix(c(1, 2, 3), nrow = 1, ncol = 3, byrow = TRUE))

boxplot(Lag1 ~ Direction, data = df, ylab = ylabel, xlab = xlabel,
        names = valnames, col = hue, main = "Yesterday")
boxplot(Lag2 ~ Direction, data = df, ylab = ylabel, xlab = xlabel,
        names = valnames, col = hue, main = "Two Days Previous")
boxplot(Lag3 ~ Direction, data = df, ylab = ylabel, xlab = xlabel,
        names = valnames, col = hue, main = "Three Days Previous")

layout(1)  # back to a single panel
I realised, however, that I was seriously violating the DRY ("Don't Repeat Yourself") principle, so I tried to write a function instead. I struggled a bit with this because I didn't know how to pass a column name in as a character string and turn it into the formula, i.e. the y ~ x part, that the version of boxplot() I had used takes as its first argument.
After scouring the documentation a bit (I tried out as.name() and messed around with sQuote() and dQuote(), all to no avail), I discovered the help page at ?formula, and BAMMM!, I got it: as.formula() turns a character string such as "Lag1 ~ Direction" into a formula.
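To make the trick concrete, here is a minimal sketch using only base R's stats package. paste() builds the string and as.formula() converts it, as in the post. reformulate(), also in stats, does the same job without pasting strings; mentioning it is my addition, not something the post used.

```r
# Build the formula Lag1 ~ Direction from character strings.
response <- "Lag1"

# The approach used in this post: paste a string, then convert it.
f1 <- as.formula(paste(response, "~ Direction"))

# An alternative from the same stats package: reformulate() takes the
# right-hand-side term labels and the response name separately.
f2 <- reformulate("Direction", response = response)

class(f1)     # "formula"
print(f1)     # Lag1 ~ Direction
all.vars(f2)  # "Lag1" "Direction"

# By contrast, sQuote() only returns a decorated character string,
# which boxplot() cannot use as a formula.
class(sQuote(response))  # "character"
```

Either f1 or f2 can be passed straight to boxplot() together with data = df.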
This is how I put it together to get exactly the same plot:
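library(ISLR)
df <- Smarket

# Draw one box plot per column of dat, grouped by Direction.
# column: character vector of column names; tt: matching panel titles.
box.stock <- function(dat, column, tt) {
  op <- par(mfrow = c(1, length(column)))  # one row, one panel per column
  on.exit(par(op))                         # restore graphics settings afterwards
  for (i in seq_along(column)) {
    # as.formula() turns e.g. "Lag1 ~ Direction" into a formula
    boxplot(as.formula(paste(column[i], "~ Direction")), data = dat,
            ylab = "Percentage change in S&P", xlab = "Today's Direction",
            names = c("Down", "Up"), col = c("blue", "red"), main = tt[i])
  }
}

choice <- c("Lag1", "Lag2", "Lag3")
title <- c("Yesterday", "Two Days Previous", "Three Days Previous")
box.stock(df, choice, title)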
Now how’s that?
This experience helped me better appreciate the value of functions in general. Of course, with a little tweaking, the code above could draw many more plots, as many as the available computing resources allow.
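As one possible form of that tweaking, here is a sketch of a hypothetical box.by() (my name, not a function from the book or the ISLR package) that also takes the grouping variable, titles and colours as arguments. It builds each formula with reformulate() and sizes the panel grid from the number of columns. So that it runs without ISLR, the example uses R's built-in mtcars data.

```r
# Hypothetical generalisation of box.stock(): the data, the columns to plot,
# the grouping variable, panel titles and colours are all arguments.
box.by <- function(dat, columns, group, titles = columns,
                   ylab = "", xlab = group, col = "lightgray") {
  # One row of panels, one per column; restore the old settings on exit
  op <- par(mfrow = c(1, length(columns)))
  on.exit(par(op))

  # Build e.g. mpg ~ cyl from the strings "mpg" and "cyl"
  fmls <- lapply(columns, function(v) reformulate(group, response = v))

  for (i in seq_along(fmls)) {
    boxplot(fmls[[i]], data = dat, main = titles[i],
            ylab = ylab, xlab = xlab, col = col)
  }

  # Return the formulas invisibly so callers can inspect them
  invisible(fmls)
}

# Example with a built-in data set: three variables grouped by cylinders
fmls <- box.by(mtcars, c("mpg", "hp", "wt"), "cyl",
               titles = c("Miles per gallon", "Horsepower", "Weight"))
deparse(fmls[[1]])  # "mpg ~ cyl"
```

With ISLR loaded, a call such as box.by(Smarket, paste0("Lag", 1:5), "Direction", col = c("blue", "red")) would draw all five lags in one row.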