R for Marketing Research and Analytics

Chris Chapman and Elea McDonnell Feit
January 2016

Chapter 4: Relationships between Variables (Bivariate Statistics)

Website for all data files:
http://r-marketing.r-forge.r-project.org/data.html

Load CRM data

As always, see the book for details on how the data was simulated. Here we simply load it. The data set describes customers' visits, transactions, and spending for online and retail purchases:

cust.df <- read.csv("http://goo.gl/PmPkaG")
str(cust.df)
'data.frame':   1000 obs. of  12 variables:
 $ cust.id          : int  1 2 3 4 5 6 7 8 9 10 ...
 $ age              : num  22.9 28 35.9 30.5 38.7 ...
 $ credit.score     : num  631 749 733 830 734 ...
 $ email            : Factor w/ 2 levels "no","yes": 2 2 2 2 1 2 2 2 1 1 ...
 $ distance.to.store: num  2.58 48.18 1.29 5.25 25.04 ...
 $ online.visits    : int  20 121 39 1 35 1 1 48 0 14 ...
 $ online.trans     : int  3 39 14 0 11 1 1 13 0 6 ...
 $ online.spend     : num  58.4 756.9 250.3 0 204.7 ...
 $ store.trans      : int  4 0 0 2 0 0 2 4 0 3 ...
 $ store.spend      : num  140.3 0 0 95.9 0 ...
 $ sat.service      : int  3 3 NA 4 1 NA 3 2 4 3 ...
 $ sat.selection    : int  3 3 NA 2 1 NA 3 3 2 2 ...

Converting data to factors

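When reading CSVs, read.csv() converts text columns to factors if stringsAsFactors=TRUE. That was the default before R 4.0.0 and produced the output above; from R 4.0.0 on, set it explicitly. However, some data that looks numeric is really categorical: a customer ID such as cust.id is a label, not a quantity.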
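
Because the conversion of text to factors depends on an argument of read.csv(), a small sketch may help make the behavior explicit. The inline CSV text and the names csv.text, toy.df, and toy.chr are made up for illustration and are not part of the book's data.

```r
# Hypothetical three-row CSV, supplied inline so the example needs no download
csv.text <- "cust.id,email,age
1,yes,22.9
2,no,28.0
3,yes,35.9"

# Ask for factor conversion explicitly rather than relying on the default
toy.df <- read.csv(text = csv.text, stringsAsFactors = TRUE)
str(toy.df)

# Same data with text columns left as character strings
toy.chr <- read.csv(text = csv.text, stringsAsFactors = FALSE)
class(toy.chr$email)
```

Setting the argument in every call keeps a script's behavior the same across R versions.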

The factor() function will convert data to nominal factors:

str(cust.df$cust.id)
 int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
cust.df$cust.id <- factor(cust.df$cust.id)

str(cust.df$cust.id)
 Factor w/ 1000 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

Option: ordered=TRUE (or the ordered() function) creates ordinal factors, whose levels have a defined order.
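
As a rough illustration of the ordered option, the sketch below converts the first ten sat.service values shown in the str() output into an ordinal factor. The 1-to-5 range of the rating scale is an assumption made for this example, and sat and sat.ord are hypothetical names.

```r
# First ten sat.service values, copied from the str() output above
sat <- c(3, 3, NA, 4, 1, NA, 3, 2, 4, 3)

# Assumption: ratings run from 1 (lowest) to 5 (highest)
sat.ord <- factor(sat, levels = 1:5, ordered = TRUE)
str(sat.ord)

# Ordered factors support order comparisons; missing ratings stay NA
sat.ord >= "3"

# Counts per level, including unused levels and missing values
table(sat.ord, useNA = "ifany")
```

Listing the levels explicitly keeps unused ratings (here 5) as valid levels instead of dropping them.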

Basic scatterplot

Let's look at scatterplots. How does age relate to credit score?

plot(x=cust.df$age, y=cust.df$credit.score)
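
The default plot labels its axes with the raw expressions passed to it. A minimal sketch of the same call with readable labels follows; it uses a simulated stand-in named toy.df (hypothetical values, not the book's data) so that it runs without downloading cust.df.

```r
# Simulated stand-in for cust.df (hypothetical values, not the book's data)
set.seed(1234)
toy.df <- data.frame(age = rnorm(50, mean = 35, sd = 5))
toy.df$credit.score <- 700 + 2 * (toy.df$age - 35) + rnorm(50, sd = 40)

# Same kind of plot() call as above, with axis labels and a title
plot(x = toy.df$age, y = toy.df$credit.score,
     xlab = "Customer age (years)", ylab = "Credit score",
     main = "Credit score by age (simulated data)")
```

With the real data, replacing toy.df by cust.df in the plot() call should give the labeled version of the plot above.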

Answers (3)

Draw a visualization of all bivariate relationships

library(car)
scatterplotMatrix(Salaries)   # could use pairs() instead
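
The pairs() alternative mentioned in the comment could look like the sketch below. It assumes a recent car release, in which the Salaries data lives in the companion carData package; num.cols is a hypothetical helper name.

```r
# Salaries ships in carData, which library(car) also attaches
data("Salaries", package = "carData")

# Base-graphics scatterplot matrix; factor columns are plotted as integer codes
pairs(Salaries)

# Restricting to the numeric columns often gives a more readable matrix
num.cols <- sapply(Salaries, is.numeric)
pairs(Salaries[, num.cols])
```

Unlike scatterplotMatrix(), pairs() draws only the scatterplot panels by default and leaves the diagonal for the variable names.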

That's all for Chapter 4

Thank you! Time for Q&A.

Notes

This presentation is based on Chapter 4 of Chapman and Feit, R for Marketing Research and Analytics © 2015 Springer. http://r-marketing.r-forge.r-project.org/

Exercises here use the Salaries data set from the car package, John Fox and Sanford Weisberg (2011). An R Companion to Applied Regression, Second Edition. Thousand Oaks CA: Sage. http://socserv.socsci.mcmaster.ca/jfox/Books/Companion

All code in the presentation is licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.