Chris Chapman and Elea McDonnell Feit
Chapter 7: Identifying Drivers of Outcomes: Linear Models
Website for all data files:
The data represent customer responses to a survey about their satisfaction with different aspects of their recent visit to an amusement park.
Image source: hersheypark.com
To load the data:
sat.df <- read.csv("http://goo.gl/HKnl74")
 weekend     num.child        distance            rides
 no :259   Min.   :0.000   Min.   :  0.5267   Min.   : 72.00
 yes:241   1st Qu.:0.000   1st Qu.: 10.3181   1st Qu.: 82.00
           Median :2.000   Median : 19.0191   Median : 86.00
           Mean   :1.738   Mean   : 31.0475   Mean   : 85.85
           3rd Qu.:3.000   3rd Qu.: 39.5821   3rd Qu.: 90.00
           Max.   :5.000   Max.   :239.1921   Max.   :100.00

     games             wait           clean          overall
 Min.   : 57.00   Min.   : 40.0   Min.   : 74.0   Min.   :  6.00
 1st Qu.: 73.00   1st Qu.: 62.0   1st Qu.: 84.0   1st Qu.: 40.00
 Median : 78.00   Median : 70.0   Median : 88.0   Median : 50.00
 Mean   : 78.67   Mean   : 69.9   Mean   : 87.9   Mean   : 51.26
 3rd Qu.: 85.00   3rd Qu.: 77.0   3rd Qu.: 91.0   3rd Qu.: 62.00
 Max.   :100.00   Max.   :100.0   Max.   :100.0   Max.   :100.00
weekend: was the visit on a weekend
num.child: how many children were in the party
distance: how far did the party travel to the park
overall: satisfaction ratings
We'll cover how to fit a linear model, i.e., a linear regression, using the lm() function in R. Linear models relate one or more predictors (independent variables) to an outcome (dependent variable).
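As a minimal sketch of what such a model looks like in R, the following fits a one-predictor linear model with lm(). It uses simulated data rather than the survey file so that it runs without a network connection. The data frame sim.df, its columns x and y, and the coefficients used to generate them are illustrative assumptions, not values from the park data.

```r
# Simulated data (an assumption for illustration, not the survey data):
# the outcome y depends linearly on the predictor x, plus random noise.
set.seed(1234)
n <- 200
x <- rnorm(n, mean = 80, sd = 5)
y <- -90 + 1.7 * x + rnorm(n, sd = 10)
sim.df <- data.frame(x = x, y = y)

# Fit the linear model using the formula interface: outcome ~ predictor
m1 <- lm(y ~ x, data = sim.df)

# Estimated intercept and slope
print(coef(m1))

# Fuller report: coefficient estimates, standard errors, and R-squared
print(summary(m1))
```

With the survey data loaded as sat.df, the analogous call would be lm(overall ~ rides, data = sat.df), which relates overall satisfaction to satisfaction with rides.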
Key steps in linear modeling:
A scatterplot matrix can help you quickly visualize the relationships between pairs of variables in the data. Skewness of predictors or correlations between predictors are potential problems.
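One way to run this check is with base R's pairs() function, sketched below on simulated data so that the example is self-contained. The data frame sim.df is a hypothetical stand-in for the survey data: clean is constructed to correlate with rides, and distance is drawn from a log-normal distribution only to mimic a right-skewed predictor. The log transform at the end is one common response to skewness, shown here as an assumption rather than a prescribed step.

```r
# Simulated stand-in for the survey data (illustrative assumption)
set.seed(4321)
n <- 300
rides    <- rnorm(n, mean = 85, sd = 5)
clean    <- 0.8 * rides + rnorm(n, mean = 20, sd = 3)  # correlated with rides
distance <- rlnorm(n, meanlog = 3, sdlog = 1)          # right-skewed
sim.df <- data.frame(rides, clean, distance)

# Scatterplot matrix: one panel for every pair of variables
pairs(sim.df)

# Numeric companion to the plot: pairwise correlations
cor.mat <- cor(sim.df)
print(round(cor.mat, 2))

# A mean well above the median is a quick sign of right skew
print(c(mean = mean(sim.df$distance), median = median(sim.df$distance)))

# Log-transform the skewed predictor and plot again
sim.df$logdist <- log(sim.df$distance)
pairs(sim.df[, c("rides", "clean", "logdist")])
```

With the survey data, the same calls would apply to sat.df, for example pairs(sat.df).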