Chris Chapman and Elea McDonnell Feit
February 2016
Chapter 7: Identifying Drivers of Outcomes: Linear Models
Website for all data files:
http://r-marketing.r-forge.r-project.org/data.html
The data represent customer responses to a survey about their satisfaction with different aspects of their recent visit to an amusement park.
Image source: hersheypark.com
To load the data:
sat.df <- read.csv("http://goo.gl/HKnl74")
summary(sat.df)
 weekend     num.child        distance            rides
 no :259   Min.   :0.000   Min.   :  0.5267   Min.   : 72.00
 yes:241   1st Qu.:0.000   1st Qu.: 10.3181   1st Qu.: 82.00
           Median :2.000   Median : 19.0191   Median : 86.00
           Mean   :1.738   Mean   : 31.0475   Mean   : 85.85
           3rd Qu.:3.000   3rd Qu.: 39.5821   3rd Qu.: 90.00
           Max.   :5.000   Max.   :239.1921   Max.   :100.00
     games             wait           clean          overall
 Min.   : 57.00   Min.   : 40.0   Min.   : 74.0   Min.   :  6.00
 1st Qu.: 73.00   1st Qu.: 62.0   1st Qu.: 84.0   1st Qu.: 40.00
 Median : 78.00   Median : 70.0   Median : 88.0   Median : 50.00
 Mean   : 78.67   Mean   : 69.9   Mean   : 87.9   Mean   : 51.26
 3rd Qu.: 85.00   3rd Qu.: 77.0   3rd Qu.: 91.0   3rd Qu.: 62.00
 Max.   :100.00   Max.   :100.0   Max.   :100.0   Max.   :100.00
weekend: whether the visit was on a weekend
num.child: how many children were in the party
distance: how far the party traveled to the park
rides, games, wait, clean, overall: satisfaction ratings
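The survey data are downloaded from the short URL above. For experimenting offline, here is a minimal sketch that simulates a hypothetical stand-in data frame, `sat.sim`, with the same eight columns. The distributions, the shared `halo` term, the coefficients, the helper `rate()`, and the seed are all assumptions chosen only to give roughly similar ranges. These are not the survey responses, and model results on them will differ from the real data.

```r
# Hypothetical stand-in for sat.df: simulated values with the same columns,
# NOT the survey responses from the book's data website.
set.seed(1234)                       # arbitrary seed for reproducibility
n <- 500
halo <- rnorm(n, mean = 0, sd = 5)   # assumed shared "halo" effect across ratings

# Hypothetical helper: one rating = halo + noise, floored and capped at 100
rate <- function(mu, s) pmin(100, floor(halo + rnorm(n, mean = mu, sd = s)))

sat.sim <- data.frame(
  weekend   = factor(sample(c("no", "yes"), n, replace = TRUE)),
  num.child = sample(0:5, n, replace = TRUE,
                     prob = c(0.30, 0.15, 0.25, 0.15, 0.10, 0.05)),
  distance  = rlnorm(n, meanlog = 3, sdlog = 1),   # right-skewed travel distance
  rides     = rate(86, 3),
  games     = rate(79, 7),
  wait      = rate(70, 10),
  clean     = rate(88, 2)
)

# Overall satisfaction as an assumed linear function of other columns plus noise
sat.sim$overall <- with(sat.sim,
  pmin(100, floor(halo + 0.5 * rides + 0.3 * wait + 0.2 * clean +
                  0.03 * distance + rnorm(n, mean = 0, sd = 7) - 32)))

summary(sat.sim)
```

Because every rating includes the same `halo` term, the simulated ratings are positively correlated with one another, which is useful for practicing the correlation checks discussed below.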
We'll cover how to fit a linear model, i.e., a linear regression, using the lm() function in R. Linear models relate one or more predictors (independent variables) to an outcome (the dependent variable).
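As a minimal sketch of the `lm()` interface, the block below fits a one-predictor model using the formula `outcome ~ predictor`. It runs on a small hypothetical data frame, `toy.df`, simulated with an assumed true slope of 1.05, so it works without downloading anything. With the survey data loaded, the analogous call would be `lm(overall ~ rides, data = sat.df)`.

```r
# Minimal sketch: simple linear regression with lm() on simulated data.
set.seed(42)
n <- 200
rides   <- rnorm(n, mean = 86, sd = 5)
# Assumed data-generating process: overall is linear in rides plus noise
overall <- -40 + 1.05 * rides + rnorm(n, mean = 0, sd = 8)
toy.df  <- data.frame(rides, overall)

m1 <- lm(overall ~ rides, data = toy.df)  # outcome ~ predictor

summary(m1)   # coefficient table, R-squared, residual standard error
coef(m1)      # intercept and slope estimates
confint(m1)   # 95% confidence intervals for the coefficients
```

Additional predictors go on the right-hand side of `~`, joined with `+`, e.g. `lm(overall ~ rides + games + wait + clean, data = sat.df)`.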
Key steps in linear modeling:
A scatterplot matrix gives a quick view of the relationship between every pair of variables in the data. Look for skewed predictors and for strong correlations between predictors; both are potential problems for a linear model.
library(gpairs)
gpairs(sat.df)
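The scatterplot matrix is a visual check. As a rough numeric complement, the sketch below computes a simple moment-based skewness for each predictor and the pairwise correlations with base R's `cor()`. The data are simulated and hypothetical (a log-normal `distance` and two ratings that share a common `halo` term), and `skew()` is a hypothetical helper defined here, not a function from any package.

```r
# Sketch: numeric checks for skewness and predictor correlation,
# on simulated (hypothetical) data so the block runs on its own.
set.seed(7)
n <- 500
halo <- rnorm(n, mean = 0, sd = 5)   # shared component induces correlation

preds <- data.frame(
  distance = rlnorm(n, meanlog = 3, sdlog = 1),   # strongly right-skewed
  rides    = halo + rnorm(n, mean = 86, sd = 3),
  clean    = halo + rnorm(n, mean = 88, sd = 2)
)

# Hypothetical helper: sample skewness (third standardized moment)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

sapply(preds, skew)                  # distance stands out as right-skewed

preds$logdist <- log(preds$distance) # log transform of a positive, skewed variable
skew(preds$logdist)                  # close to 0 after the transform

# Pairwise correlations; values near +1 or -1 between predictors
# suggest collinearity worth investigating
round(cor(preds[, c("logdist", "rides", "clean")]), 2)
```

A log transform is one common way to reduce right skew in a positive variable such as distance; highly correlated predictors may need to be combined or dropped before modeling.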