Dancing between character vectors and formula objects

I was determined to reproduce exactly a composite box-and-whisker plot that I had seen in the book An Introduction to Statistical Learning. The data come from Smarket, a set of 1,250 observations that ships with ISLR, the R package that accompanies the book. It’s a pretty simple plot:

[Figure: 3 boxplots of stock market data]

First, after loading the package with library(ISLR), one may want to take a cursory look at the data frame, which has 9 variable columns, using View(head(Smarket)).

I was able to put the plot together with this code:

library(ISLR)  # provides the Smarket data frame

ylabel <- "Percentage change in S&P"
xlabel <- "Today's Direction"
valnames <- c("Down", "Up")
hue <- c("blue", "red")

# Lay out three panels side by side in a single row
layout(matrix(c(1, 2, 3), nrow = 1, ncol = 3, byrow = TRUE))
boxplot(Lag1 ~ Direction, data = Smarket,
        ylab = ylabel, xlab = xlabel,
        names = valnames,
        col = hue,
        main = "Yesterday")

boxplot(Lag2 ~ Direction, data = Smarket,
        ylab = ylabel, xlab = xlabel,
        names = valnames,
        col = hue,
        main = "Two Days Previous")

boxplot(Lag3 ~ Direction, data = Smarket,
        ylab = ylabel, xlab = xlabel,
        names = valnames,
        col = hue,
        main = "Three Days Previous")
layout(1)  # return to a single-panel layout

I realised, however, that I was seriously violating the DRY (Don’t Repeat Yourself) principle: the three boxplot() calls differ only in the response variable and the title. So I tried to come up with a function instead. I struggled a bit with this because I didn’t know how to take a column name supplied as a character string and place it into the formula that boxplot() expects as its first argument when called this way, i.e. the y ~ x part.

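After scouring the documentation a bit (I tried out as.name() and messed around with sQuote() and dQuote(), all to no avail), I discovered the help page behind ?formula, which also documents as.formula(), and BAMMM!, I got it. My earlier attempts could not work because they return a symbol or a quoted string, whereas boxplot() needs a formula object; as.formula() turns a string such as "Lag1 ~ Direction" into one.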
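To make the conversion concrete, here is a minimal sketch in base R of going from a character string to a formula object. The variable names column, f1 and f2 are only for illustration, and reformulate() is an alternative from the stats package that I did not use in my own code; no data are needed for this step.

```r
# A column name held as a character string
column <- "Lag1"

# paste() builds the text "Lag1 ~ Direction"; as.formula() parses it
f1 <- as.formula(paste(column, "~ Direction"))

# reformulate() builds a formula from term labels and a response,
# without pasting the "~" in by hand
f2 <- reformulate("Direction", response = column)

class(f1)      # "formula"
all.vars(f1)   # the variable names the formula refers to
print(f2)
```

Either object can be handed to boxplot() as its first argument, together with a data frame that contains the named columns.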

This is how I wrote it to get exactly the same plot:

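library(ISLR)  # provides the Smarket data frame

# Draw one boxplot per column name, split by Direction, side by side
box.stock <- function(dat, column, tt) {
    old <- par(mfrow = c(1, length(column)))
    on.exit(par(old))  # restore the previous layout when done
    for (i in seq_along(column)) {
        # Turn e.g. "Lag1" into the formula Lag1 ~ Direction
        boxplot(as.formula(paste(column[i], "~ Direction")),
                data = dat,
                ylab = "Percentage change in S&P",
                xlab = "Today's Direction",
                names = c("Down", "Up"),
                col = c("blue", "red"),
                main = tt[i])
    }
}

choice <- c("Lag1", "Lag2", "Lag3")
title <- c("Yesterday", "Two Days Previous", "Three Days Previous")

box.stock(Smarket, choice, title)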

Now how’s that?

This experience helped me to better appreciate the value of using functions in general. Of course, with a little tweaking, the above code could be used to draw many more plots, as many as the available computing resources allow.
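The kind of tweak I have in mind might look something like the sketch below. It is only an illustration under stated assumptions: box.many() and its group argument are hypothetical names, the toy data frame is made-up random data standing in for Smarket, and reformulate() is used in place of paste() plus as.formula().

```r
# Hypothetical generalisation: one panel per column name supplied
box.many <- function(dat, columns, titles, group = "Direction") {
    stopifnot(length(columns) == length(titles))
    old <- par(mfrow = c(1, length(columns)))  # one row of panels
    on.exit(par(old))                          # restore the layout on exit
    # Build one formula per column, e.g. Lag1 ~ Direction
    formulas <- lapply(columns, function(nm) reformulate(group, response = nm))
    for (i in seq_along(columns)) {
        boxplot(formulas[[i]], data = dat, main = titles[i])
    }
    invisible(formulas)  # return the formulas used, without printing them
}

# Made-up data standing in for Smarket (assumption: not the real values)
set.seed(42)
toy <- data.frame(
    Direction = factor(sample(c("Down", "Up"), 200, replace = TRUE)),
    Lag1 = rnorm(200), Lag2 = rnorm(200),
    Lag3 = rnorm(200), Lag4 = rnorm(200)
)

used <- box.many(toy, paste0("Lag", 1:4), paste("Lag", 1:4))
```

Because the number of panels comes from length(columns), the same call works for two columns or for five; the grouping variable can also be swapped through the group argument.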

 
