710 COHEN, SANBORN, AND SHIFFRIN

APPENDIX A
Bayesian Model Selection

The use of BMS for model selection raises many deep issues and is a topic we hope to take up in detail in future research. We report here only a few interesting preliminary results. Let Model A with parameters θ be denoted A_θ, with associated prior probability A_{θ,0}. In the simplest approach, the posterior odds for Model A over Model B are given by

∑ θ  A θ , 0 / B θ , 0   PDA ( | θ )( / PDB | θ )  ,

where D is the data. The sum is replaced by an integral for continuous parameter spaces. Because BMS integrates likelihood ratios over the parameter space, the simulated difference of log maximum likelihoods is not an appropriate axis for exhibiting results. Our plots for BMS show differences of log (integrated) likelihoods.
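As an illustration of this computation (not taken from the paper), the integrated likelihood for each model can be approximated by averaging the likelihood over a uniform parameter grid. The forgetting-curve forms y = a·e^(−bt) and y = a·t^(−b), the Gaussian noise model, the noise level, and the data values below are all assumptions of this sketch:

```python
import numpy as np

def log_integrated_likelihood(data, t, curve, a_grid, b_grid, sigma=0.05):
    """Approximate log sum_theta P(theta) P(D | theta) on a uniform grid.
    The uniform prior gives each grid cell equal mass, so the integral
    reduces to the mean likelihood over the grid."""
    log_liks = []
    for a in a_grid:
        for b in b_grid:
            pred = curve(a, b, t)
            # Gaussian log likelihood of the observations given the curve
            ll = (-0.5 * np.sum(((data - pred) / sigma) ** 2)
                  - len(data) * np.log(sigma * np.sqrt(2 * np.pi)))
            log_liks.append(ll)
    log_liks = np.array(log_liks)
    # log-mean-exp, computed stably by factoring out the maximum
    m = log_liks.max()
    return m + np.log(np.mean(np.exp(log_liks - m)))

exponential = lambda a, b, t: a * np.exp(-b * t)   # y = a e^(-bt)
power_law   = lambda a, b, t: a * t ** (-b)        # y = a t^(-b)

t = np.array([1.0, 2.0, 4.0, 8.0])
data = np.array([0.81, 0.66, 0.44, 0.20])   # hypothetical retention scores

a_grid = np.linspace(0.001, 1.0, 50)        # coefficient range from the text
b_grid = np.linspace(0.001, 1.5, 50)        # decay-rate range from the text

log_exp = log_integrated_likelihood(data, t, exponential, a_grid, b_grid)
log_pow = log_integrated_likelihood(data, t, power_law, a_grid, b_grid)
print(log_pow - log_exp)   # the BMS decision statistic; > 0 favors the power law
```

The sign of the printed difference plays the role of the zero decision criterion discussed below.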

As is usual in BMS applications, the answers will depend on the choice of priors. A1 Consider first a flat uniform prior for each parameter in each model, ranging from 0.001 to 1.0 for both coefficients and 0.001 to 1.5 for both decay parameters (these ranges were chosen to encompass the range of plausible possibilities). This approach produces the smoothed histogram graphs in Figure A1. The natural decision statistic is zero on the log likelihood difference axis. For these priors, performance was terrible: The probability of correct model selection was .52 for group analysis and .50 (chance) for individual analysis. It seems clear that the choice of priors overly penalized the exponential model, relative to the power law model, for the data sets to which they were applied.

[Figure A1: panels "Exponential generated" and "Power law generated"; x-axis: Log p(Pow | D) − Log p(Exp | D)]

Figure A1. Histograms showing the distribution of Bayesian model selection results for the exponential and power law models with 34 subjects and two trials per condition. The group data results are shown in the top panel, and the individual data results are shown in the bottom panel. A uniform prior was used in this simulation, with the same parameter range for both models.

By changing the priors, we can greatly change the results of BMS. For example, employing priors that matched the parameter distributions used to generate the data increased model selection accuracy to levels comparable to those of PBCM and NML. We did not pursue these approaches further because, in practice, one would usually not have the advance knowledge needed to select such "tailored" priors. In future research, we intend to explore other Bayesian approaches that could circumvent this problem. In particular, various forms of hierarchical Bayesian analysis look promising. For example, one could assume that individual parameter estimates are drawn from a Gaussian distribution with some mean and variance (and covariance). It seems likely that such an assumption would produce a tendency for individual parameter estimates to move closer to the group mean estimates—in a sense, interpolating between individual and group analysis. Other possible approaches are the BNPMS method recently proposed by Karabatsos (2006) and modeling individual subjects with a Dirichlet process (Navarro, Griffiths, Steyvers, & Lee, 2006).
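The shrinkage tendency described above can be sketched in a few lines. In this toy example (all distributions, standard errors, and parameter values are assumed, not from the paper), a Gaussian population prior N(group_mean, group_sd²) combines with a Gaussian likelihood around each individual estimate, and the posterior mean is a precision-weighted average that pulls each estimate toward the group mean:

```python
import numpy as np

def shrink(estimate, se, group_mean, group_sd):
    """Posterior mean of an individual parameter under a Gaussian
    population prior N(group_mean, group_sd^2) and a Gaussian
    likelihood centred on the individual estimate with std. error se."""
    w = (1 / se**2) / (1 / se**2 + 1 / group_sd**2)   # precision weight on the data
    return w * estimate + (1 - w) * group_mean

# Hypothetical individual decay-rate estimates for three subjects
estimates = np.array([0.10, 0.60, 1.20])
group_mean = estimates.mean()
shrunk = shrink(estimates, se=0.3, group_mean=group_mean, group_sd=0.2)
print(shrunk)  # each value lies between its own estimate and the group mean
```

A noisier individual estimate (larger se) or a tighter population distribution (smaller group_sd) moves the posterior mean further toward the group mean, which is the "interpolation between individual and group analysis" noted in the text.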

NOTE

A1. Note that all inference implicitly involves prior assumptions. Making these assumptions explicit is one of the strengths of Bayesian analysis.