Chris Chapman, Google & Kenneth Fairchild, Sawtooth Software
March 2018
Sawtooth Software Conference
Orlando, FL
These slides: https://goo.gl/ACdG6B
Slide code: https://goo.gl/PjBceN
Key Points
Switch to RStudio and look at the elements of the RStudio IDE (integrated development environment)
We'll use a data set from Chapman & Feit (2015).
Let's switch to R and walk through some typical code …
satData <- read.csv("http://r-marketing.r-forge.r-project.org/data/rintro-chapter2.csv")
# examine the data
head(satData)
str(satData)
summary(satData)
# convert Segment to a factor variable
satData$Segment <- factor(satData$Segment)
summary(satData)
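Why convert Segment to a factor? While Segment holds integer codes, summary() treats it as a measurement and reports quartiles and a mean, which are meaningless for group labels. The sketch below uses a small hypothetical data frame (`toy`, not the Chapman & Feit file) to show the before and after.

```r
# Hypothetical toy data, NOT the Chapman & Feit file: Segment coded as integers
toy <- data.frame(iProdSAT = c(3, 4, 5, 2, 4, 6),
                  Segment  = c(1, 1, 2, 2, 3, 3))

# As a number, Segment gets quartiles and a mean (not meaningful for group codes)
summary(toy$Segment)

# As a factor, summary() counts how many rows fall in each level instead
toy$Segment <- factor(toy$Segment)
summary(toy$Segment)
levels(toy$Segment)   # "1" "2" "3"
nlevels(toy$Segment)  # 3
```

The same change makes modeling and plotting functions treat Segment as categorical rather than continuous.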
# mean product satisfaction and 95% CI (mean +/- 1.96 SE) by segment; mean_se() comes from ggplot2
library(ggplot2)
by(satData$iProdSAT, satData$Segment, mean_se, mult=1.96)
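What does mean_se(mult=1.96) compute? It returns the mean (y) and the mean minus and plus 1.96 standard errors (ymin, ymax), where SE = sd/sqrt(n). A minimal base-R sketch of the same calculation follows; `mean_ci()` and the ratings are hypothetical, for illustration only, and the results should match mean_se() on the same input.

```r
# Base-R sketch of ggplot2::mean_se(x, mult = 1.96): mean +/- mult * SE
mean_ci <- function(x, mult = 1.96) {
  x  <- x[!is.na(x)]              # drop missing values first
  se <- sqrt(var(x) / length(x))  # standard error of the mean
  m  <- mean(x)
  c(y = m, ymin = m - mult * se, ymax = m + mult * se)
}

# Hypothetical satisfaction ratings for two segments
sat <- c(4, 5, 3, 4, 6, 2, 3, 2, 4, 3)
seg <- factor(rep(c("A", "B"), each = 5))

# split() groups by segment, sapply() applies mean_ci() to each group,
# and t() puts one segment per row (columns y, ymin, ymax)
ci_by_seg <- t(sapply(split(sat, seg), mean_ci))
ci_by_seg
```

Segment A has mean 4.4 and segment B has mean 2.8; the interval half-widths differ because the two groups have different variances.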
###### EXAMPLE OF PLOTTING
### first aggregate the data we want to plot
# ... there are other ways to do this, but this illustrates data frame manipulation
# get the mean and 95% CI bounds (mean +/- 1.96 standard errors)
prod.sat.seg <- aggregate(satData$iProdSAT, list(satData$Segment), mean_se, mult=1.96)
str(prod.sat.seg)
# coerce those to a nice data frame
prod.sat.seg <- data.frame(prod.sat.seg$Group.1, lapply(data.frame(prod.sat.seg$x), unlist))
prod.sat.seg
# and label the columns to be more readable
names(prod.sat.seg) <- c("Segment", "average", "lowerCI", "upperCI")
prod.sat.seg
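The lapply()/unlist() step is needed because aggregate() stores mean_se()'s multi-column result as a single matrix-like column. Another common base-R route is to have the summary function return a named vector and then splice the resulting matrix column apart with do.call(data.frame, ...). This is a sketch on hypothetical data; `ci95()` is an illustrative helper, not part of any package.

```r
# Hypothetical data (illustration only)
toy <- data.frame(iProdSAT = c(4, 5, 3, 4, 6, 2, 3, 2, 4, 3),
                  Segment  = factor(rep(c("A", "B"), each = 5)))

# Hypothetical helper: mean and 95% CI bounds as a plain named vector
ci95 <- function(x) {
  se <- sd(x) / sqrt(length(x))
  c(average = mean(x), lowerCI = mean(x) - 1.96 * se, upperCI = mean(x) + 1.96 * se)
}

# The formula interface keeps column names; because ci95() returns a vector,
# aggregate() stores the results as a matrix column named iProdSAT
agg <- aggregate(iProdSAT ~ Segment, data = toy, FUN = ci95)

# do.call(data.frame, ...) splits the matrix column into ordinary columns
flat <- do.call(data.frame, agg)
names(flat) <- c("Segment", "average", "lowerCI", "upperCI")
flat
```

The result has the same layout as prod.sat.seg, so the ggplot code that follows would work on it unchanged.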
### the plot itself ...
# now plot the segment means with their 95% confidence intervals
p <- ggplot(data=prod.sat.seg,
            aes(x=Segment, y=average, ymax=upperCI, ymin=lowerCI)) +
  geom_point() +
  geom_errorbar()
p
# color the points and make them larger
p <- ggplot(data=prod.sat.seg,
            aes(x=Segment, y=average, ymax=upperCI, ymin=lowerCI)) +
  geom_point(aes(color=Segment), size=3) +       # <=======
  geom_errorbar()
p
# color the error bars and make them narrower
p <- ggplot(data=prod.sat.seg,
            aes(x=Segment, y=average, ymax=upperCI, ymin=lowerCI)) +
  geom_point(aes(color=Segment), size=3) +
  geom_errorbar(aes(color=Segment), width=0.3)   # <=======
p
# adjust the Y axis range
p <- ggplot(data=prod.sat.seg,
            aes(x=Segment, y=average, ymax=upperCI, ymin=lowerCI)) +
  geom_point(aes(color=Segment), size=3) +
  geom_errorbar(aes(color=Segment), width=0.3) +
  coord_cartesian(ylim=c(1, 5))                  # <========
p
# add some titles to be more readable
p <- ggplot(data=prod.sat.seg,
            aes(x=Segment, y=average, ymax=upperCI, ymin=lowerCI)) +
  geom_point(aes(color=Segment), size=3) +
  geom_errorbar(aes(color=Segment), width=0.3) +
  coord_cartesian(ylim=c(1, 5)) +
  ggtitle("Average Sat and Confidence Interval by Segment") +   # <=====
  ylab("Mean satisfaction and 95% CI")
p
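For comparison, here is a base-graphics sketch of roughly the same chart: points for the means and arrows() with flat caps (angle = 90, code = 3) for the CI bars. The summary numbers below are hypothetical placeholders with the same columns as prod.sat.seg; they are not results from the Chapman & Feit data.

```r
# Hypothetical summary table with the same columns as prod.sat.seg
prod.sat.seg <- data.frame(Segment = factor(c("A", "B", "C")),
                           average = c(4.4, 2.8, 3.6),
                           lowerCI = c(3.4, 2.1, 2.9),
                           upperCI = c(5.4, 3.5, 4.3))

x    <- seq_len(nrow(prod.sat.seg))  # one x position per segment
cols <- x + 1                        # one palette colour per segment

# Points for the means, with the same y range and titles as the ggplot version
plot(x, prod.sat.seg$average, pch = 19, cex = 1.5, col = cols,
     xlim = c(0.5, length(x) + 0.5), ylim = c(1, 5), xaxt = "n",
     xlab = "Segment", ylab = "Mean satisfaction and 95% CI",
     main = "Average Sat and Confidence Interval by Segment")
axis(1, at = x, labels = as.character(prod.sat.seg$Segment))

# angle = 90 with code = 3 draws a flat cap at both ends, i.e. an error bar
arrows(x, prod.sat.seg$lowerCI, x, prod.sat.seg$upperCI,
       angle = 90, code = 3, length = 0.05, col = cols)
```

The contrast is the point of the ggplot sequence above: in ggplot2 each refinement is one added layer or option, while base graphics builds the chart from separate low-level calls.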
########## CORRELATIONS
# correlation matrix of the numeric items (drop column 3, the Segment factor)
library(corrplot)
corrplot(cor(satData[ , -3]))
# tinker with the plot
corrplot.mixed(cor(satData[ , -3]))
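Indexing with `[ , -3]` works only while Segment stays in column 3. A slightly sturdier sketch selects the numeric columns by type. The data frame `d` below is hypothetical; its column order (Segment third) is assumed to match the `[ , -3]` indexing above.

```r
# Hypothetical data laid out like satData, with the Segment factor in column 3
d <- data.frame(iProdSAT  = c(4, 5, 3, 4, 6, 2),
                iSalesSAT = c(3, 5, 3, 4, 5, 2),
                Segment   = factor(c(1, 1, 2, 2, 3, 3)),
                iProdREC  = c(4, 6, 3, 5, 6, 1),
                iSalesREC = c(3, 5, 4, 4, 6, 2))

# Keep only the numeric columns, whatever their position
num_cols <- sapply(d, is.numeric)
r <- cor(d[ , num_cols])
round(r, 2)   # 4 x 4, symmetric, with 1s on the diagonal
```

The resulting matrix can be passed straight to corrplot() or corrplot.mixed() as above.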
#### ADVANCED MODELING. EXAMPLE: SAT/REC STRUCTURAL MODEL
# define a structural model for Satisfaction and Recommendation
satModel <- "SAT =~ iProdSAT + iSalesSAT
REC =~ iProdREC + iSalesREC
REC ~ SAT "
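In lavaan syntax, `=~` reads "is measured by" and `~` reads "is regressed on". Written as equations, the model string above corresponds to the following system. It assumes lavaan's default scaling, which fixes the first loading of each latent factor to 1.

```latex
\begin{aligned}
\text{iProdSAT}  &= \lambda_1\,\text{SAT} + \varepsilon_1, &\qquad \lambda_1 &= 1 \\
\text{iSalesSAT} &= \lambda_2\,\text{SAT} + \varepsilon_2 \\
\text{iProdREC}  &= \lambda_3\,\text{REC} + \varepsilon_3, &\qquad \lambda_3 &= 1 \\
\text{iSalesREC} &= \lambda_4\,\text{REC} + \varepsilon_4 \\
\text{REC}       &= \beta\,\text{SAT} + \zeta
\end{aligned}
```

The first four lines form the measurement model. The last line is the structural path whose estimate of $\beta$ is the quantity of interest.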
# fit the structural model
library(lavaan)
sat.fit <- cfa(satModel, data=satData)
# look at the fit
summary(sat.fit, fit.measures=TRUE)
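One quick check on the summary() output is the model's degrees of freedom. These equal the number of unique variances and covariances among the observed items minus the number of free parameters. The arithmetic below assumes lavaan's defaults (first loading per factor fixed to 1, no mean structure). Under those assumptions we would expect summary() to report 1 degree of freedom.

```r
p <- 4                          # observed items: iProdSAT, iSalesSAT, iProdREC, iSalesREC
moments <- p * (p + 1) / 2      # unique variances + covariances: 10

free <- c(loadings   = 2,       # iSalesSAT and iSalesREC (first loadings fixed at 1)
          resid_var  = 4,       # one residual variance per item
          sat_var    = 1,       # variance of the latent SAT
          rec_resid  = 1,       # disturbance variance of REC
          regression = 1)       # the REC ~ SAT path

model_df <- moments - sum(free)
model_df                        # 1
```

With only one degree of freedom the model is barely over-identified, which is worth keeping in mind when reading the fit measures.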
# plot the structural model
library(semPlot)
semPaths(sat.fit, what="est", nCharNodes=9, residuals=FALSE)
Pro
1. Extreme power and precision. It does exactly what you want.
2. Complete flexibility at every step
3. Once a script is written, it is reusable and re-runnable
Con
1. It only does exactly what you tell it. Defaults are missing or ugly.
2. You have to write code.
3. Everything takes longer … the first time.
To learn R, you will need:
Only try advanced/new methods after you are fluent in the basics.