Note: these are errata for the 2015 1st Edition

For the 2019 2nd edition, see: Errata for the 2nd edition

General note. R packages often change their computation algorithms, function names, and requirements. If your code doesn't work or the output doesn't match what is shown in the book, see Section 1.5.4 in the book for notes and ideas.

Specific errata and package update notes

Page (1st edition) Change
(throughout, where read.csv() is called) R 4.0 changed how read.csv() handles nominal (factor) variables when reading data frames. In older versions of R (prior to R 4.0.0, released in April 2020), text strings were converted to categorical factor variables by default. Starting in R 4.0, they are read as character strings and are not converted to factors.

To obtain results as shown in the book, stringsAsFactors=TRUE must be added to occurrences of read.csv() unless it is already specified otherwise (such as being set to FALSE). For example, on page 48, you may obtain the results shown in the book by using this command:

store.df <- read.csv("http://goo.gl/QPDdMl", stringsAsFactors=TRUE)  # added stringsAsFactors=TRUE
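
The difference can be seen without downloading anything. The following is a minimal sketch that uses a small inline CSV (hypothetical data, not from the book) and sets stringsAsFactors explicitly both ways:

```r
# Hypothetical inline data standing in for a CSV file (not from the book)
csv.text <- "store,country,sales\n101,US,12\n102,DE,15\n103,US,9"

# stringsAsFactors=FALSE is the default in R 4.0 and later
raw.df <- read.csv(text=csv.text, stringsAsFactors=FALSE)
# stringsAsFactors=TRUE reproduces the default of earlier R versions
fac.df <- read.csv(text=csv.text, stringsAsFactors=TRUE)

class(raw.df$country)     # character
class(fac.df$country)     # factor
summary(fac.df$country)   # counts per level, as in the book's output
```

With character columns, summary() reports only the length and class, which is why book output that shows counts per level requires factors.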

(throughout) Random data generation: R 3.6.0 (released in April 2019) and later versions changed the way random numbers are generated. Many chapters simulate data and call other functions that use random numbers, so those results will differ slightly from what the book shows. Options include:

  • To match the book, when using downloaded data: no action is needed. You will notice few differences, except for minor details such as results from the some() function, and slightly different results in some Bayesian statistics (which use randomization).

  • To match the book exactly (especially when simulating the data as we recommend): give the following command after starting R and before running code from the book:

      RNGversion("3.5.0")

  • To see how things change: just go ahead and use R's new default random number generator. Compare results to the book. They will be slightly different in the exact data points, yet quite similar for the overall statistical results.

  • If you're interested in reading more about the reason for the change, see Bias in R's random integers?
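
To see the effect of the RNGversion() command, the following sketch compares draws under the current generator and the pre-3.6.0 settings. The seed and the helper function draw() are arbitrary choices for illustration, and the sketch assumes the session starts with R's default generator:

```r
# Helper (hypothetical, for illustration): draw from sample() and rnorm(),
# resetting the seed before each so the two draws are independent of each other
draw <- function(seed) {
  set.seed(seed); s <- sample(1:1000, 5)
  set.seed(seed); n <- rnorm(3)
  list(sample=s, normal=n)
}

new.rng <- draw(98250)                    # current default generator

suppressWarnings(RNGversion("3.5.0"))     # R may warn about the older sampler
old.rng <- draw(98250)                    # settings used when the book was written

RNGversion(as.character(getRversion()))   # restore the current defaults

rbind(new=new.rng$sample, old=old.rng$sample)   # typically different
identical(new.rng$normal, old.rng$normal)       # rnorm() draws match
```

In this sketch only the discrete sampling used by sample() differs between the two settings, so code that relies on sample() is where differences from the book are most likely to appear.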
    86-87 The X axis label in the code and in Figure 4.3 refers to a different variable. This can be corrected by changing line 4 of the code block on p. 86 to read:

    xlab="Prior 12 months in-store sales ($)",

    instead of referring to online sales.
    101 In the printed book, the code and caption for Figure 4.10 refer to cust.df$store.trans but the plot shows cust.df$store.spend. Although the lesson is the same with either variable, code to generate the plot as shown is:

      plot(cust.df$distance.to.store, cust.df$store.spend)
      plot(1/sqrt(cust.df$distance.to.store), cust.df$store.spend)
    

    The .R code file for Chapter 4 is correct; this erratum applies only to the printed code.
    116 The first code block should index along i.seq in the loop instead of NULL, as follows:

      i.seq <- NULL
      for (i in seq_along(i.seq)) { print (i) } #better
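
The reason seq_along() is safer can be checked directly. This short sketch counts loop iterations for an empty index vector; the counter names n.bad and n.good are ours:

```r
i.seq <- NULL

1:length(i.seq)     # 1 0 : counts down from 1 to 0 instead of being empty
seq_along(i.seq)    # integer(0) : an empty sequence

n.bad <- 0
for (i in 1:length(i.seq)) n.bad <- n.bad + 1     # body runs twice
n.good <- 0
for (i in seq_along(i.seq)) n.good <- n.good + 1  # body never runs

c(n.bad, n.good)    # 2 0
```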
    
    163 Add the following line at the beginning of the first code block:

    library(gpairs)  # also, install gpairs package if needed

    (for print only; the available .R file is correct)
    281, 288 compareFit() should now be surrounded with summary() to show results:

      summary(compareFit(pies.fit.NH1, pies.fit.NH3, pies.fit)) # p. 281
      summary(compareFit(sat.fit, satAlt.fit, nested=TRUE))     # p. 288
    
    294, 296 A different random seed may work better in the latest release of semPLS. For example, try:

      set.seed(04635)

    This emphasizes the point in the chapter: path models may fail to converge with small samples (and may be dependent on random starting points to "succeed").
    314 The call to Mclust() hangs or returns an error. This was caused by changes in the mclust package update from version 4.4 to 5.0. There are two possible workarounds:

    (1) to match the book, install mclust 4.4 instead. This can be done in several ways per the general instructions here.

    For example, on Mac or Linux, this can be done as follows:

      old.mc <- "https://cran.r-project.org/src/contrib/Archive/mclust/mclust_4.4.tar.gz"
      install.packages(old.mc, repos=NULL, type="source")

    (2) Alternatively, to use mclust v5 (but obtain results that differ from those in the book), add the following line before calling Mclust():

      mclust.options(hcUse = "SVD")
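
If the same script needs to run on machines with different mclust versions, workaround (2) could be applied conditionally. The following is only a sketch: the variable name needs.workaround is ours, and the version threshold is taken from the note above.

```r
# Apply workaround (2) only when mclust 5.0 or later is installed
if (requireNamespace("mclust", quietly=TRUE)) {
  needs.workaround <- packageVersion("mclust") >= "5.0"
  if (needs.workaround) {
    library(mclust)
    mclust.options(hcUse = "SVD")
  }
} else {
  needs.workaround <- NA    # mclust is not installed; nothing to configure
}
needs.workaround
```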
    329 Add the following line at the beginning of the first code block:

      seg.rf.class <- predict(seg.rf, seg.df.test)

    (for print only; the available .R file is correct)
    365 To ensure that the results obtained from the simulated data match the results obtained when the data is loaded from the data file, the code block should be replaced with:

      cbc.df <- read.csv("http://goo.gl/5xQObB", 
                         colClasses = c(seat = "factor", price = "factor", 
                                        choice="integer"))
      cbc.df$eng <- factor(cbc.df$eng, levels=c("gas", "hyb", "elec"))
      cbc.df$carpool <- factor(cbc.df$carpool, levels=c("yes", "no"))
      summary(cbc.df)
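
To illustrate what the colClasses and levels arguments do in the replacement code, here is a small sketch using inline data. The three rows are made up and only mimic the column names of the CBC file:

```r
# Hypothetical rows that mimic the column names of the CBC data
csv.text <- "seat,eng,price,choice\n6,gas,30,0\n7,hyb,35,1\n8,elec,40,0"

# Without colClasses, seat and price would be read as numbers, not factors
demo.df <- read.csv(text=csv.text,
                    colClasses=c(seat="factor", price="factor",
                                 choice="integer"))
str(demo.df)

levels(factor(demo.df$eng))    # default order is alphabetical: elec, gas, hyb
demo.df$eng <- factor(demo.df$eng, levels=c("gas", "hyb", "elec"))
levels(demo.df$eng)            # explicit order: gas, hyb, elec
```

The level order matters because the first level serves as the reference level when the factor is dummy coded in a model.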
    
    369 In the printed book, the rm() line is misplaced, and should occur after the final "}" that closes the for loop:

        ...
        cbc.df <- rbind(cbc.df, conjoint.i)
      }   # <-- goes here, not at end
    
      # Tidy up, keeping cbc.df and attrib
      rm(a, i, resp.id, carpool, mu, Sigma, coefs, coef.names,
         conjoint.i, profiles, profiles.i, profiles.coded, utility,
         wide.util, probs, choice, nalt, nques)
    

    The .R code file for Chapter 13 is correct; this erratum applies only to the printed code.
    372, 382 mlogit 1.1-0 changed its data structure, and code that calls mlogit.data() gives an error about the dfidx package (a new data indexing package used by mlogit). We plan an update soon for mlogit 1.1-0, but meanwhile, you may use one of the following approaches for CBC data:

    • Install an older version of mlogit (such as mlogit 1.0-1). Installing an older package can be complex and may require developer tools (such as a gcc compiler). We are not able to provide assistance with that process, but see here for more details. Assuming you have the required tooling, the following will install mlogit 1.0-1, which works with the code in our book:

      install.packages(c("devtools", "lmtest", "statmod"))
      library(devtools)
      install_version("mlogit", version="1.0-1", repos="http://cran.us.r-project.org")
                                      
    • Alternatively, skip the sections about mlogit estimation (e.g., Sections 13.3.2, 13.3.5), and use hierarchical Bayes estimation instead (Section 13.5).
    384 The output of "summary(m1.hier)" shows negative estimates for sd.seat7 and sd.price, which is incorrect because a standard deviation cannot be negative. The negative sign is an artifact of the estimation routine that mlogit() uses when reporting summary(), as of Oct 30, 2015. The command "stdev(m1.hier)" is another way to check the estimated standard deviations of the population distribution, and it correctly reports them all as positive.
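
One way to see why the sign is harmless is the sketch below. It assumes, for illustration only, that each respondent's coefficient is generated as a mean plus a scale parameter times a standard normal draw; the values of mu and s are made up. Under that assumption, flipping the sign of the scale parameter leaves the spread of the coefficients unchanged:

```r
set.seed(1234)
z  <- rnorm(10000)    # standard normal draws, one per simulated respondent
mu <- -0.5            # hypothetical population mean
s  <- 0.8             # hypothetical scale parameter

beta.pos <- mu + s * z       # coefficients using +s
beta.neg <- mu + (-s) * z    # coefficients using -s

c(sd(beta.pos), sd(beta.neg))             # same spread either way
all.equal(sd(beta.pos), sd(beta.neg))     # TRUE
```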
    387 The first line of code at the top of the page should reference the model input to the function, rather than the m2.hier model object as follows:

      coef.mu <- model$coef[1:dim(coef.Sigma)[1]]

    This will not affect the output of the code in the book, but it would produce an error if the function is used with a different model.
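
The general problem is a function that reads a workspace object instead of its own argument. The toy sketch below shows the pattern; the lists and function names are hypothetical stand-ins, not the book's model objects:

```r
# Hypothetical stand-ins for fitted models: each is just a list with $coef
m2.toy    <- list(coef=c(a=1, b=2))
other.toy <- list(coef=c(a=10, b=20))

# Buggy: ignores its argument and reads m2.toy from the workspace
first.coef.buggy <- function(model) m2.toy$coef[1]
# Fixed: uses the model that was passed in
first.coef.fixed <- function(model) model$coef[1]

first.coef.buggy(other.toy)   # 1  : silently taken from m2.toy
first.coef.fixed(other.toy)   # 10 : taken from other.toy
```

If the workspace object does not exist, the buggy version stops with an "object not found" error. If it does exist, the function silently returns values from the wrong model.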

Bug reports

What: How
Report a suspected bug: Include the chapter and page, with a reproducible example. Email cnchapman+rbug@gmail.com or, better, report to the bugs mailing list.
Join the bugs mailing list: Sign up here
Check mailing list archives: Bug archives