R for Marketing Research and Analytics

Note: these are errata for the 2015 1st Edition

Page (1st edition)	Change
(throughout, where read.csv() is called)	R 4.0 changed how `read.csv()` handles text columns. In older versions of R (before R 4.0, released in April 2020), text strings were converted by default to categorical factor variables. Starting in R 4.0, they are read as character strings and are not converted to factors. To obtain the results shown in the book, add `stringsAsFactors=TRUE` to each call of `read.csv()` unless the argument is already specified (for example, set to FALSE). For example, on page 48, you may obtain the results shown in the book with this command: `store.df <- read.csv("http://goo.gl/QPDdMl", stringsAsFactors=TRUE)`
(throughout)	Random data generation: R 3.6.0 (April 2019) and later versions changed the way random numbers are generated. In many chapters we simulate data and perform other functions that use random numbers, so those results will differ slightly from what the book shows. Options include: (1) To match the book when using downloaded data: no action is needed. You will notice few differences, except for minor details such as results from the `some()` function and slightly different results in some Bayesian statistics (which use randomization). (2) To match the book exactly (especially when simulating the data, as we recommend): give the following command after starting R and before running code from the book: `RNGversion("3.5.0")` (3) To see how things change: just go ahead and use R's new default random number generator, then compare results to the book. They will differ slightly in the exact data points, yet be quite similar in the overall statistical results. To read more about the reason for the change, see "Bias in R's random integers?".
86-87	The X axis label in the code and in Figure 4.3 refers to a different variable than the one plotted. Line 4 of the code block on p. 86 could be changed to read `xlab="Prior 12 months in-store sales ($)",` instead of referring to `online` sales.
101	In the printed book, the code and caption for Figure 4.10 refer to `cust.df$store.trans` but the plot shows `cust.df$store.spend`. Although the lesson is the same with either variable, the code to generate the plot as shown is: `plot(cust.df$distance.to.store, cust.df$store.spend)` `plot(1/sqrt(cust.df$distance.to.store), cust.df$store.spend)` The .R code file for Chapter 4 is correct; this erratum applies only to the printed code.
116	The first code block should index along `i.seq` in the loop instead of `NULL`, as follows: `i.seq <- NULL` `for (i in seq_along(i.seq)) { print (i) }  #better`
163	Add the following line at the beginning of the first code block: `library(gpairs)` (also install the gpairs package if needed). This applies to the printed code only; the available .R file is correct.
281, 288	`compareFit()` should now be wrapped in `summary()` to show results: `summary(compareFit(pies.fit.NH1, pies.fit.NH3, pies.fit))` on p. 281 and `summary(compareFit(sat.fit, satAlt.fit, nested=TRUE))` on p. 288.
294, 296	A different random seed may work better in the latest release of `semPLS`. For example, try: `set.seed(04635)` This emphasizes the point in the chapter: path models may fail to converge with small samples (and may depend on random starting points to "succeed").
314	The call to `Mclust()` hangs or returns an error. This was caused by changes in the mclust package update from version 4.4 to 5.0. There are two possible workarounds: (1) To match the book, install mclust 4.4 instead. This can be done in several ways per the general instructions here. For example, on Mac or Linux, this can be done as follows: `old.mc <- "https://cran.r-project.org/src/contrib/Archive/mclust/mclust_4.4.tar.gz"` `install.packages(old.mc, repos=NULL, type="source")` (2) Alternatively, to use mclust v5 (but obtain results that differ from those in the book), add the following line before calling `Mclust()`: `mclust.options(hcUse = "SVD")`
329	Add the following line at the beginning of the first code block: `seg.rf.class <- predict(seg.rf, seg.df.test)` This applies to the printed code only; the available .R file is correct.
365	To ensure that the results obtained from the simulated data match the results obtained when the data is loaded from the data file, the code block should be replaced with: `cbc.df <- read.csv("http://goo.gl/5xQObB", colClasses = c(seat = "factor", price = "factor", choice="integer"))` `cbc.df$eng <- factor(cbc.df$eng, levels=c("gas", "hyb", "elec"))` `cbc.df$carpool <- factor(cbc.df$carpool, levels=c("yes", "no"))` `summary(cbc.df)`
369	In the printed book, the `rm()` line is misplaced, and should occur after the final "`}`" that closes the `for` loop: `...` `cbc.df <- rbind(cbc.df, conjoint.i)` `}   # <-- goes here, not at end` `# Tidy up, keeping cbc.df and attrib` `rm(a, i, resp.id, carpool, mu, Sigma, coefs, coef.names, conjoint.i, profiles, profiles.i, profiles.coded, utility, wide.util, probs, choice, nalt, nques)` The .R code file for Chapter 13 is correct; this erratum applies only to the printed code.
372, 382	`mlogit` 1.10 changed its data structure, and code that calls `mlogit.data()` gives an error about the `dfidx` package (a new data indexing package used by `mlogit`). We plan an update soon for `mlogit` 1.10, but meanwhile, you may use one of the following approaches for CBC data: (1) Install an older version of `mlogit` (such as `mlogit 1.0-1`). Installing an older package can be complex and may require developer tools (such as a gcc compiler). We are not able to provide assistance with that process, but see here for more details. Assuming you have the required tooling, the following will install `mlogit 1.0-1`, which works with the code in our book: `install.packages(c("devtools", "lmtest", "statmod"))` `library(devtools)` `install_version("mlogit", version="1.0-1", repos="http://cran.us.r-project.org")` (2) Alternatively, skip the sections about `mlogit` estimation (e.g., Sections 13.3.2, 13.3.5), and use hierarchical Bayes estimation instead (Section 13.5).
384	The output of `summary(m1.hier)` shows negative estimates for `sd.seat7` and `sd.price`, which is incorrect because a standard deviation cannot be negative. The negative sign is an artifact of the estimation routine that `mlogit()` uses when reporting `summary()`, as of Oct 30, 2015. The command `stdev(m1.hier)` is another way to check the estimated standard deviations of the population distribution, and it correctly reports them all as positive.
387	The first line of code at the top of the page should reference the `model` input to the function rather than the `m2.hier` model object, as follows: `coef.mu <- model[1:dim(coef.Sigma)[1]]` This does not affect the output of the code in the book, but it would produce an error if the function were used with a different model.
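
The `read.csv()` entries in the table above (the `stringsAsFactors` note and the `colClasses` fix for p. 365) can be tried without downloading the book's data. The following is only a minimal sketch: the inline CSV and its column names `store` and `sales` are made up for illustration and are not part of the book's data sets.

```r
# Hypothetical two-column CSV, supplied inline so the example runs offline
csv.text <- "store,sales\nA,10\nB,20\nA,30"

# Text columns kept as character strings (the default from R 4.0 onward)
df.chr <- read.csv(text = csv.text, stringsAsFactors = FALSE)

# Text columns converted to factors (the behavior the book assumes)
df.fac <- read.csv(text = csv.text, stringsAsFactors = TRUE)

print(class(df.chr$store))
print(class(df.fac$store))
print(levels(df.fac$store))

# colClasses converts only the named columns, similar to the p. 365 erratum
df.cc <- read.csv(text = csv.text, colClasses = c(sales = "factor"))
print(class(df.cc$sales))

# Factor levels can then be put in an explicit order
df.fac$store <- factor(df.fac$store, levels = c("B", "A"))
print(levels(df.fac$store))
```

Passing `stringsAsFactors` explicitly, as here, gives the same result regardless of which R version runs the code.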

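The effect of `RNGversion("3.5.0")`, described in the random data generation entry of the errata table, can be observed directly. This is a hedged sketch that assumes R 3.6.0 or later (where the `sample.kind` argument of `RNGkind()` exists); the seed value and the object names `old.kind`, `draw.old`, and `draw.new` are arbitrary choices for this illustration, not taken from the book.

```r
# Save the current RNG settings so they can be restored afterwards
old.kind <- RNGkind()

# Use the pre-3.6.0 sampler, as suggested for matching the book exactly.
# R warns that this sampler is not exactly uniform; that is expected here.
suppressWarnings(RNGversion("3.5.0"))
set.seed(98250)   # arbitrary seed for this illustration
draw.old <- sample(1:1000, 5)

# Switch to the current default sampler and repeat with the same seed
RNGkind(sample.kind = "Rejection")
set.seed(98250)
draw.new <- sample(1:1000, 5)

print(draw.old)
print(draw.new)

# Restore whatever settings were active before the example
suppressWarnings(
  RNGkind(kind = old.kind[1], normal.kind = old.kind[2],
          sample.kind = old.kind[3])
)
```

Comparing the two printed vectors shows whether the same seed yields different draws under the two samplers, which is why simulated data may not match the book unless `RNGversion("3.5.0")` is set first.
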
What	How
Report a suspected bug (include the chapter and page, with a reproducible example)	Email cnchapman+rbug@gmail.com or, better, report to the bugs mailing list.
Join the bugs mailing list	Sign up here
Check mailing list archives	Bug archives
