R for Marketing Research and Analytics

Chris Chapman and Elea McDonnell Feit
February 2016

Chapter 7: Identifying Drivers of Outcomes: Linear Models

Website for all data files:
http://r-marketing.r-forge.r-project.org/data.html

Satisfaction survey data

The data represent customer responses to a survey about their satisfaction with different aspects of their recent visit to an amusement park.
Image source: hersheypark.com

To load the data:

sat.df <- read.csv("http://goo.gl/HKnl74")

Inspecting the data

summary(sat.df)
 weekend     num.child        distance            rides       
 no :259   Min.   :0.000   Min.   :  0.5267   Min.   : 72.00  
 yes:241   1st Qu.:0.000   1st Qu.: 10.3181   1st Qu.: 82.00  
           Median :2.000   Median : 19.0191   Median : 86.00  
           Mean   :1.738   Mean   : 31.0475   Mean   : 85.85  
           3rd Qu.:3.000   3rd Qu.: 39.5821   3rd Qu.: 90.00  
           Max.   :5.000   Max.   :239.1921   Max.   :100.00  
     games             wait           clean          overall      
 Min.   : 57.00   Min.   : 40.0   Min.   : 74.0   Min.   :  6.00  
 1st Qu.: 73.00   1st Qu.: 62.0   1st Qu.: 84.0   1st Qu.: 40.00  
 Median : 78.00   Median : 70.0   Median : 88.0   Median : 50.00  
 Mean   : 78.67   Mean   : 69.9   Mean   : 87.9   Mean   : 51.26  
 3rd Qu.: 85.00   3rd Qu.: 77.0   3rd Qu.: 91.0   3rd Qu.: 62.00  
 Max.   :100.00   Max.   :100.0   Max.   :100.0   Max.   :100.00  

weekend: was the visit on a weekend
num.child: how many children were in the party
distance: how far did the party travel to the park
rides, games, wait, clean, overall: satisfaction ratings

Fitting a linear model with lm()

  • We'll cover how to fit a linear model, i.e. a linear regression, using the lm() function in R. Linear models relate one or more predictors (independent variables) to an outcome (the dependent variable).

  • Key steps in linear modeling:

    • Evaluate the data for suitability for modeling
    • Fit model
    • Evaluate the model
    • Interpret
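
The four steps above can be sketched on a small simulated data set. This is only an illustration: toy.df, its columns, and the coefficients used to generate it are assumptions made up for this example, not the survey data or results from the chapter.

```r
# Hypothetical simulated data standing in for sat.df (NOT the real survey data)
set.seed(42)
n <- 200
toy.df <- data.frame(rides = round(rnorm(n, mean = 85, sd = 5)))
toy.df$overall <- -90 + 1.7 * toy.df$rides + rnorm(n, sd = 10)

# 1. Evaluate the data: check distributions and the bivariate relationship
summary(toy.df)
plot(overall ~ rides, data = toy.df)

# 2. Fit the model: outcome ~ predictor(s)
toy.m1 <- lm(overall ~ rides, data = toy.df)

# 3. Evaluate the model: fit statistics and residuals vs. fitted values
summary(toy.m1)$r.squared
plot(fitted(toy.m1), resid(toy.m1))

# 4. Interpret: coefficient estimates and their confidence intervals
coef(toy.m1)
confint(toy.m1)
```

With the real data, the same pattern would apply with sat.df in place of toy.df and more predictors added on the right-hand side of the formula.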

Answers (4)

Repeat the linear model (salary in response to sex, rank, discipline, and years of service) using Bayesian estimation. Summarize the results.

library(MCMCpack)
set.seed(98108)
salary.lm.b <- MCMCregress(
                  salary ~ sex + rank + discipline + yrs.service,
                  data=Salaries)
options("scipen"=100, "digits"=4)    # force non-scientific notation
summary(salary.lm.b)$quantiles
                     2.5%         25%          50%          75%
(Intercept)       59547.9     65230.0     68373.96     71426.16
sexMale           -2796.3      2103.3      4747.86      7405.16
rankAssocProf      6451.9     11861.5     14570.61     17311.99
rankProf          41611.2     46577.3     49200.72     51737.03
disciplineB        8910.1     11895.7     13460.51     15007.37
yrs.service        -307.6      -163.5       -89.82       -12.29
sigma2        448001458.5 489995080.7 513985664.78 539697427.20
                    97.5%
(Intercept)       77157.3
sexMale           12498.5
rankAssocProf     22629.5
rankProf          56678.2
disciplineB       17920.9
yrs.service         133.7
sigma2        593796060.9
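
One way to summarize these quantiles is to check which 95% credible intervals exclude zero. The sketch below copies the 2.5% and 97.5% bounds from the output above by hand; the object names ci and excludes.zero are made up for this illustration, and sigma2 is left out because a variance is always positive.

```r
# 95% credible bounds copied from the 2.5% and 97.5% columns of the output above
ci <- rbind(
  "(Intercept)" = c(59547.9, 77157.3),
  sexMale       = c(-2796.3, 12498.5),
  rankAssocProf = c( 6451.9, 22629.5),
  rankProf      = c(41611.2, 56678.2),
  disciplineB   = c( 8910.1, 17920.9),
  yrs.service   = c( -307.6,   133.7)
)
colnames(ci) <- c("2.5%", "97.5%")

# TRUE when the interval lies entirely above or entirely below zero
excludes.zero <- ci[, "2.5%"] > 0 | ci[, "97.5%"] < 0
excludes.zero
```

Here the intervals for sexMale and yrs.service span zero, while those for the intercept, both rank levels, and disciplineB do not.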

That's all for Chapter 7!

Break time

Notes

This presentation is based on Chapter 7 of Chapman and Feit, R for Marketing Research and Analytics © 2015 Springer.

Exercises here use the Salaries data set from the car package, John Fox and Sanford Weisberg (2011). An R Companion to Applied Regression, Second Edition. Thousand Oaks CA: Sage. http://socserv.socsci.mcmaster.ca/jfox/Books/Companion

All code in the presentation is licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.