R for Marketing Research and Analytics

Chris Chapman and Elea McDonnell Feit
February 2016

Chapter 7: Identifying Drivers of Outcomes: Linear Models

Website for all data files:
http://r-marketing.r-forge.r-project.org/data.html

Satisfaction survey data

The data represent customer responses to a survey about their satisfaction with different aspects of their recent visit to an amusement park.
Image source: hersheypark.com

To load the data:

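# stringsAsFactors = TRUE keeps weekend as a factor (R >= 4.0 defaults to FALSE)
sat.df <- read.csv("http://goo.gl/HKnl74", stringsAsFactors = TRUE)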

Inspecting the data

summary(sat.df)
 weekend     num.child        distance            rides       
 no :259   Min.   :0.000   Min.   :  0.5267   Min.   : 72.00  
 yes:241   1st Qu.:0.000   1st Qu.: 10.3181   1st Qu.: 82.00  
           Median :2.000   Median : 19.0191   Median : 86.00  
           Mean   :1.738   Mean   : 31.0475   Mean   : 85.85  
           3rd Qu.:3.000   3rd Qu.: 39.5821   3rd Qu.: 90.00  
           Max.   :5.000   Max.   :239.1921   Max.   :100.00  
     games             wait           clean          overall      
 Min.   : 57.00   Min.   : 40.0   Min.   : 74.0   Min.   :  6.00  
 1st Qu.: 73.00   1st Qu.: 62.0   1st Qu.: 84.0   1st Qu.: 40.00  
 Median : 78.00   Median : 70.0   Median : 88.0   Median : 50.00  
 Mean   : 78.67   Mean   : 69.9   Mean   : 87.9   Mean   : 51.26  
 3rd Qu.: 85.00   3rd Qu.: 77.0   3rd Qu.: 91.0   3rd Qu.: 62.00  
 Max.   :100.00   Max.   :100.0   Max.   :100.0   Max.   :100.00  

weekend: was the visit on a weekend
num.child: how many children were in the party
distance: how far did the party travel to the park
rides, games, wait, clean, overall: satisfaction ratings

Fitting a linear model with lm()

  • We'll cover how to fit a linear model, i.e. a linear regression, using the lm() function in R. Linear models relate one or more predictors (independent variables) to an outcome (dependent variable).

  • Key steps in linear modeling:

    • Evaluate the data for suitability for modeling
    • Fit the model
    • Evaluate the model
    • Interpret the results
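
The steps above can be sketched end to end on simulated data. This is only an illustration: the data frame sim.df, its two columns, and the coefficients used to generate it are assumptions made for this example and are not the survey data.

```r
# Simulated stand-in data (hypothetical; not the amusement park survey)
set.seed(42)
n <- 200
sim.df <- data.frame(rides = rnorm(n, mean = 85, sd = 5))
sim.df$overall <- -90 + 1.7 * sim.df$rides + rnorm(n, sd = 10)

# 1. Evaluate the data: check distributions and the bivariate relationship
summary(sim.df)
plot(overall ~ rides, data = sim.df)

# 2. Fit the model: outcome ~ predictor
m1 <- lm(overall ~ rides, data = sim.df)

# 3. Evaluate the model: coefficient table, R-squared, residual diagnostics
summary(m1)
par(mfrow = c(2, 2))
plot(m1)
par(mfrow = c(1, 1))

# 4. Interpret: estimated change in overall per one-point change in rides,
#    with confidence intervals for the coefficients
coef(m1)["rides"]
confint(m1)
```

Because the data are generated with a known slope, the fitted coefficient for rides should land near the value used in the simulation, which makes this a convenient sanity check before working with real data.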

Plotting the data

A scatterplot matrix helps you quickly visualize the relationships between pairs of variables in the data. Look for skewed predictors and for strong correlations between predictors, since both can cause problems when fitting a linear model.

library(gpairs)
gpairs(sat.df)
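
As a complement to the scatterplot matrix, skewness and correlations between predictors can also be checked numerically with base R. The sketch below uses simulated values: sim.distance, sim.rides, sim.games, and the helper skew() are hypothetical names introduced for illustration, and the log transform is shown as one common remedy for right skew, not as a step taken from these slides.

```r
# Simulated stand-in predictors (hypothetical; not the survey data)
set.seed(7)
n <- 500
sim.distance <- rlnorm(n, meanlog = 3, sdlog = 1)           # right-skewed
sim.rides <- rnorm(n, mean = 85, sd = 5)
sim.games <- 0.8 * sim.rides + rnorm(n, mean = 10, sd = 5)  # built to correlate with sim.rides

# Simple moment-based skewness: near 0 for symmetric data, positive for right skew
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew(sim.distance)       # clearly positive for the raw values
skew(log(sim.distance))  # much closer to 0 after the log transform

# Pairwise correlations between candidate predictors
cor(cbind(sim.distance, sim.rides, sim.games))
```

A large positive skewness suggests considering a transformation before modeling, and a high correlation between two predictors is a cue to check whether both belong in the same model.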