Helen Steingroever writes:

I’m currently working on a model comparison paper using WAIC, and would like to ask you the following question about the WAIC computation:

I have data of one participant that consist of 100 sequential choices (you can think of these data as being a time series). I want to compute the WAIC for these data. Now I’m wondering how I should compute the predictive density. I think there are two possibilities:

(1) I compute the predictive density of the whole sequence (i.e., I consider the whole sequence as one data point, which means that n=1 in Equations (11) – (12) of your 2013 Stat Comput paper.)

(2) I compute the predictive density for each choice (i.e., I consider each choice as one data point, which means that n=# choices in Equations (11) – (12) of your 2013 Stat Comput paper.)

My quick thought was that WAIC is an approximation to leave-one-out cross-validation, and this computation gets more complicated with correlated data.

But I passed the question on to Aki, the real expert on this stuff. Aki wrote:

This is an interesting question and there is no simple answer.

First we should consider what is your predictive goal:

(1) predict whole sequence for another participant

(2) predict a single choice given all other choices

or

(3) predict the next choice given the choices in the sequence so far?

If your predictive goal is

(1) then you should note that WAIC is based on an asymptotic argument and it is not generally accurate with n=1. Watanabe has said (personal communication) that he thinks that this is not a sensible scenario for WAIC, but if (1) is really your prediction goal, then I think that this might be the best you can do. It seems that when n is small, WAIC will usually underestimate the effective complexity of the model, and thus would give over-optimistic performance estimates for more complex models.

(2) WAIC should work just fine here (unless your model says that there is no dependency between the choices, i.e., having 100 separate models with each having n=1). Correlated data here means just that it is easier to predict a choice if you know the previous choices and the following choices. This may make the difference between some models small compared to scenario (1).

(3) WAIC can’t handle this, and you would need to use a specific form of cross-validation (I think I should write a paper on this).
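To make the difference between goals (1) and (2) concrete, here is a minimal sketch of how the two WAIC computations differ, assuming you already have a matrix of pointwise log-likelihoods from your posterior draws. Everything in it is illustrative: the function name `waic`, the simulated choices, and the toy Beta-Bernoulli model are assumptions standing in for the real sequential choice model, and the penalty used is the variance-based form of the effective number of parameters.

```python
import numpy as np


def waic(log_lik):
    """WAIC from an (S, n) array of log p(y_i | theta^s).

    S = number of posterior draws, n = number of "data points".
    Uses the variance-based penalty for the effective number of parameters.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]

    # lppd_i = log( (1/S) * sum_s p(y_i | theta^s) ), computed stably
    # by subtracting the column-wise maximum before exponentiating.
    col_max = log_lik.max(axis=0)
    lppd_i = col_max + np.log(np.exp(log_lik - col_max).sum(axis=0) / n_draws)

    # Penalty: posterior variance of the log-likelihood, per data point.
    p_waic_i = log_lik.var(axis=0, ddof=1)

    elpd = lppd_i.sum() - p_waic_i.sum()
    return {
        "lppd": lppd_i.sum(),
        "p_waic": p_waic_i.sum(),
        "elpd_waic": elpd,
        "waic": -2.0 * elpd,
    }


# --- Hypothetical example data (not from the original question) ---
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.7, size=100)  # 100 simulated binary choices

# Toy model: one shared choice probability with a Beta(1, 1) prior,
# so the posterior is Beta(1 + successes, 1 + failures).
theta = rng.beta(1 + y.sum(), 1 + (1 - y).sum(), size=4000)

# log_lik[s, i] = log p(y_i | theta^s), shape (4000, 100)
log_lik = (y[None, :] * np.log(theta[:, None])
           + (1 - y[None, :]) * np.log1p(-theta[:, None]))

# Goal (2): each choice is one data point, n = 100.
pointwise = waic(log_lik)

# Goal (1): the whole sequence is one data point, n = 1.
# Sum the log-likelihood over choices within each draw first.
whole_sequence = waic(log_lik.sum(axis=1, keepdims=True))

print(pointwise)
print(whole_sequence)
```

The only change between the two calls is whether the log-likelihood is summed over choices before or after averaging over posterior draws. Neither call addresses goal (3), which needs predictions conditioned only on earlier choices.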

Hi,

I am working on what sounds like a very similar modelling problem. I have a series of data points representing multiple sequential decisions by a group of participants. I am very much interested in how previous decisions influence future decisions here. I am pretty sure this puts me in case (3):

(3) predict the next choice given the choices in the sequence so far?

(3) WAIC can’t handle this, and you would need to use a specific form of cross-validation (I think I should write a paper on this).

I am wondering if that paper ever got written?

Thanks in advance.