R for Marketing Research and Analytics

Chris Chapman and Elea McDonnell Feit
January 2016

Chapter 4: Relationships between Variables (Bivariate Statistics)

Website for all data files:
http://r-marketing.r-forge.r-project.org/data.html

Load CRM data

As always, see the book for details about how the data were simulated. For now, we'll simply load the data set. It contains example customer records covering visits, transactions, and spending for online and retail purchases:

cust.df <- read.csv("http://goo.gl/PmPkaG")
str(cust.df)
'data.frame':   1000 obs. of  12 variables:
 $ cust.id          : int  1 2 3 4 5 6 7 8 9 10 ...
 $ age              : num  22.9 28 35.9 30.5 38.7 ...
 $ credit.score     : num  631 749 733 830 734 ...
 $ email            : Factor w/ 2 levels "no","yes": 2 2 2 2 1 2 2 2 1 1 ...
 $ distance.to.store: num  2.58 48.18 1.29 5.25 25.04 ...
 $ online.visits    : int  20 121 39 1 35 1 1 48 0 14 ...
 $ online.trans     : int  3 39 14 0 11 1 1 13 0 6 ...
 $ online.spend     : num  58.4 756.9 250.3 0 204.7 ...
 $ store.trans      : int  4 0 0 2 0 0 2 4 0 3 ...
 $ store.spend      : num  140.3 0 0 95.9 0 ...
 $ sat.service      : int  3 3 NA 4 1 NA 3 2 4 3 ...
 $ sat.selection    : int  3 3 NA 2 1 NA 3 3 2 2 ...
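
The str() output above shows email as a Factor, which matches the read.csv() default in R versions before 4.0.0. In R 4.0.0 and later, text columns are read as character vectors unless conversion is requested. The following is a minimal sketch of requesting it explicitly; the three-row CSV text and the names csv.text and demo.factor are hypothetical and used only for illustration:

```r
# Hypothetical three-row extract; the values are illustrative, not the real file
csv.text <- "cust.id,age,email
1,22.9,yes
2,28.0,yes
3,38.7,no"

# Ask read.csv() to convert text columns to factors, whatever the R version
demo.factor <- read.csv(text = csv.text, stringsAsFactors = TRUE)

# email is now a factor; cust.id and age remain numeric
str(demo.factor)
```

The same argument should work when reading from the URL, for example read.csv("http://goo.gl/PmPkaG", stringsAsFactors=TRUE).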

Converting data to factors

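Text data is automatically converted to factors when reading CSVs in R versions before 4.0.0; from 4.0.0 onward, pass stringsAsFactors=TRUE to read.csv() to get this behavior. However, some data that appears to be numeric, such as a customer ID, is really nominal.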

The factor() function will convert data to nominal factors:

str(cust.df$cust.id)
 int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
cust.df$cust.id <- factor(cust.df$cust.id)

str(cust.df$cust.id)
 Factor w/ 1000 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

Option: passing ordered=TRUE to factor(), or using the ordered() function, creates an ordinal factor instead.
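
As a rough sketch of the ordinal option, the example below builds an ordered factor from a small vector of ratings. The vector sat.demo is hypothetical (it only resembles the sat.service ratings), and the 1 to 5 scale is an assumption:

```r
# Hypothetical satisfaction ratings on an assumed 1-5 scale (illustrative only)
sat.demo <- c(3, 3, 4, 1, 2, 5)

# Declare all five levels so that ratings absent from a sample remain valid levels
sat.ord <- factor(sat.demo, levels = 1:5, ordered = TRUE)
str(sat.ord)

# Ordered factors support comparisons and range functions
sat.ord[1] < sat.ord[3]
max(sat.ord)

# ordered() is shorthand for factor(..., ordered = TRUE)
sat.ord2 <- ordered(sat.demo, levels = 1:5)
identical(sat.ord, sat.ord2)
```

Comparisons such as < are not meaningful for nominal factors like cust.id, which is why the distinction matters.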

Basic scatterplot

Let's look at scatterplots. How does age relate to credit score?

plot(x=cust.df$age, y=cust.df$credit.score)

A better plot

Add color and labels, and adjust the axis limits:

plot(cust.df$age, cust.df$credit.score, 
     col="blue",
     xlim=c(15, 55), ylim=c(500, 900), 
     main="Active Customers as of June 2014",
     xlab="Customer Age (years)", ylab="Credit Score")
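
The same options can also be passed through plot()'s formula interface, which avoids repeating the data frame name. This is only a sketch: sim.df is a simulated stand-in generated here so the code runs without downloading anything, and its values are not the book's data:

```r
# Simulated stand-in for cust.df; these values are NOT the book's data
set.seed(1)
sim.df <- data.frame(age = rnorm(100, mean = 35, sd = 5))
sim.df$credit.score <- 500 + 6 * sim.df$age + rnorm(100, sd = 40)

# Formula interface: y ~ x, with column names looked up in 'data'
plot(credit.score ~ age, data = sim.df,
     col = "blue",
     xlim = c(15, 55), ylim = c(500, 900),
     main = "Simulated Customers (illustration only)",
     xlab = "Customer Age (years)", ylab = "Credit Score")
```

With the real data loaded, the equivalent call would use data=cust.df and the original title.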