R for Marketing Research and Analytics

Chris Chapman and Elea McDonnell Feit
January 2016

Chapter 5: Differences Between Groups

Website for all data files:
http://r-marketing.r-forge.r-project.org/data.html

Load Segmentation/Subscription data

As usual, see the book for details on how the data were simulated. For now, load the data directly:

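seg.df <- read.csv("http://goo.gl/qw303p", stringsAsFactors = TRUE)  # factors needed for the summary below (R >= 4.0 no longer converts strings by default)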
summary(seg.df)
      age           gender        income            kids        ownHome   
 Min.   :19.26   Female:157   Min.   : -5183   Min.   :0.00   ownNo :159  
 1st Qu.:33.01   Male  :143   1st Qu.: 39656   1st Qu.:0.00   ownYes:141  
 Median :39.49                Median : 52014   Median :1.00               
 Mean   :41.20                Mean   : 50937   Mean   :1.27               
 3rd Qu.:47.90                3rd Qu.: 61403   3rd Qu.:2.00               
 Max.   :80.49                Max.   :114278   Max.   :7.00               
  subscribe         Segment   
 subNo :260   Moving up : 70  
 subYes: 40   Suburb mix:100  
              Travelers : 80  
              Urban hip : 50  


Descriptives: Selecting by group

mean(seg.df$income[seg.df$Segment == "Moving up"])
[1] 53090.97
mean(seg.df$income[seg.df$Segment == "Moving up" & 
                   seg.df$subscribe=="subNo"])
[1] 53633.73

This quickly gets tedious!
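To make the mechanics concrete, here is a minimal sketch of the same selection pattern on a small made-up data frame. The name toy.df and all of its values are hypothetical and are not taken from the book's data.

```r
# Hypothetical toy data, invented for illustration (not seg.df)
toy.df <- data.frame(
  Segment   = c("A", "A", "B", "B", "B"),
  subscribe = c("subNo", "subYes", "subNo", "subNo", "subYes"),
  income    = c(10, 20, 30, 50, 70)
)

# The comparison yields one TRUE/FALSE per row; indexing keeps the TRUE rows
in.B   <- toy.df$Segment == "B"
mean.B <- mean(toy.df$income[in.B])        # mean of 30, 50, 70

# Combining conditions with & narrows the selection further
is.no     <- toy.df$subscribe == "subNo"
mean.B.no <- mean(toy.df$income[in.B & is.no])   # mean of 30, 50

print(c(mean.B, mean.B.no))
```

Every additional group or condition needs another hand-written line like these, which is the tedium the slide refers to.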

Descriptives: apply a function by group

by(VARIABLE of interest, GROUPING variable, FUNCTION)

by(seg.df$income, seg.df$Segment, mean)
seg.df$Segment: Moving up
[1] 53090.97
-------------------------------------------------------- 
seg.df$Segment: Suburb mix
[1] 55033.82
-------------------------------------------------------- 
seg.df$Segment: Travelers
[1] 62213.94
-------------------------------------------------------- 
seg.df$Segment: Urban hip
[1] 21681.93
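The same call pattern can be tried without downloading anything. This is a minimal sketch on a small made-up data frame; toy.df and its values are hypothetical, not the book's data.

```r
# Hypothetical toy data, invented for illustration (not seg.df)
toy.df <- data.frame(
  Segment = c("A", "A", "B", "B", "B"),
  income  = c(10, 20, 30, 50, 70)
)

# by(VARIABLE, GROUPING, FUNCTION): split income by Segment, apply mean to each piece
seg.mean <- by(toy.df$income, toy.df$Segment, mean)
print(seg.mean)

# Any function that summarizes a vector can take the place of mean
seg.max <- by(toy.df$income, toy.df$Segment, max)
print(seg.max)
```

Groups are reported in the order of the factor levels, which is alphabetical when the grouping variable starts out as character data.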

Use list() to have more than one grouping variable:

by(seg.df$income, list(seg.df$Segment, seg.df$subscribe), mean)
: Moving up
: subNo
[1] 53633.73
-------------------------------------------------------- 
: Suburb mix
: subNo
[1] 54942.69
-------------------------------------------------------- 
: Travelers
: subNo
[1] 62746.11
-------------------------------------------------------- 
: Urban hip
: subNo
[1] 22082.11
-------------------------------------------------------- 
: Moving up
: subYes
[1] 50919.89
-------------------------------------------------------- 
: Suburb mix
: subYes
[1] 56461.41
-------------------------------------------------------- 
: Travelers
: subYes
[1] 58488.77
-------------------------------------------------------- 
: Urban hip
: subYes
[1] 20081.19
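In the output above the group labels start with a bare ":" because the list elements are unnamed. As a sketch on hypothetical toy data (toy.df and its values are invented, not the book's), naming the list elements should label each group with its variable name:

```r
# Hypothetical toy data, invented for illustration (not seg.df)
toy.df <- data.frame(
  Segment   = c("A", "A", "B", "B", "B"),
  subscribe = c("subNo", "subYes", "subNo", "subNo", "subYes"),
  income    = c(10, 20, 30, 50, 70)
)

# Naming the list elements carries the names into the result's dimnames
two.way <- by(toy.df$income,
              list(Segment = toy.df$Segment, subscribe = toy.df$subscribe),
              mean)
print(two.way)

# The result is laid out as a Segment x subscribe array
print(dim(two.way))
print(names(dimnames(two.way)))
```

The list names used here (Segment, subscribe) are a labeling choice for this sketch; any names would work.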

if()

if() is used for basic program flow control.

if (A) { B } else { C } means:
“If A is true, run B [any commands inside the first {}]; otherwise run C [the commands inside the second {}].”

x <- 2
if (x > 0) {
  print("Positive!")
} else {
  print("Zero or negative!")
}
[1] "Positive!"

The rules for when braces may be omitted are confusing, so keep it simple: always use { and }!

The else C part is optional. If A is false and there is no else block, nothing happens.
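A short sketch of both points, using made-up values. The helper describe.sign() is hypothetical and exists only for this illustration; it also shows that conditions can be chained with else if.

```r
# if() without else: when the condition is FALSE, the block is skipped entirely
x   <- -3
msg <- "unchanged"
if (x > 0) {
  msg <- "Positive!"
}
print(msg)   # still holds its original value, because x > 0 is FALSE

# Hypothetical helper: chaining conditions with else if, always with braces
describe.sign <- function(value) {
  if (value > 0) {
    "positive"
  } else if (value < 0) {
    "negative"
  } else {
    "zero"
  }
}

print(describe.sign(5))
print(describe.sign(0))
```

Note that each else sits on the same line as the closing } before it; at the top level of a script, an else that starts its own line is a syntax error.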

ifelse()

ifelse() is a vectorized version of if(). Use it to create a vector using logic, not to control program flow.

x <- -2:2

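if (x > 0) {      # bad code -- tests only the first element, x[1]!
  "pos"
} else {
  "neg/zero"
}
[1] "neg/zero"

(Older versions of R return this result with a warning. R 4.2.0 and later stop with an error, because the condition has length greater than 1.)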

The correct way to do this is:

ifelse(x > 0, "pos", "neg/zero")
[1] "neg/zero" "neg/zero" "neg/zero" "pos"      "pos"     

The two result values need not be constants: each can be an expression or a function call that computes the value.
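As a hedged illustration of using function calls for the two outcomes, the sketch below computes each outcome for the whole vector and lets ifelse() pick between them element by element. The variable names are chosen for this example only.

```r
x <- -2:2

# Each outcome is a function call evaluated on the whole vector;
# ifelse() then picks, position by position, from one or the other
labels <- ifelse(x > 0, paste("pos:", x), paste("neg/zero:", x))
print(labels)

# The same idea with numeric results: square the positives, take abs() of the rest
values <- ifelse(x > 0, x^2, abs(x))
print(values)
```

Because both outcomes are computed in full before the selection, this style suits cheap vectorized calculations rather than functions with side effects.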

Notes

This presentation is based on Chapter 6 of Chapman and Feit, R for Marketing Research and Analytics © 2015 Springer. http://r-marketing.r-forge.r-project.org/

Exercises here use the Salaries data set from the car package, John Fox and Sanford Weisberg (2011). An R Companion to Applied Regression, Second Edition. Thousand Oaks CA: Sage. http://socserv.socsci.mcmaster.ca/jfox/Books/Companion

All code in the presentation is licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.