It *could* be publication bias. Maybe. Or it could be small-study effects. It's not easy to tell: distinguishing the two requires the assumptions of the sample size/effect size model, and that model doesn't always match the real world.

I can understand pilot studies for the purposes of working out the kinks before doing the real thing, but that doesn’t sound like what’s meant here by “scale-up as positive results arise” (from the OP).

The only reasons I can see for publishing the _results_ of smaller trials are (a) if the smaller study shows a huge, clear positive effect and you’re confident that biases are under control at that level, or (b) if the smaller study shows a reasonably clear _negative_ effect and the intervention seems to be harmful.

So my question is: what is the basis for the decision to scale up? (If it was articulated in the original post or the comments, I’ve missed it so far.)

I’ve been participating on this blog for over a decade, and I don’t often blow my own consulting horn here, but seriously: if someone actually has this kind of problem, contact me, because this is exactly the kind of thing I set up my consulting company to help people with.

There is little guidance on this. In fact, detailed power calculations are now a standard part of proposal writing, and the issue has become increasingly salient with the emphasis on publication bias and false positives in small samples. I don’t know this literature very well, but the math for bandit problems and optimal experimentation used to be quite hard.

The specific problem for the funder could perhaps be described as:

There are M treatments, and you have infinite resources (!). You would like to find the most effective treatment, where effectiveness is defined by some metric, say present discounted value. You have little prior information on any of the M treatments, but you don’t think they will harm people.

The cost structure for running an experiment on any of the M treatments is K + c(N), where K is the fixed cost, N is the sample size in the experiment, and c(N) is the variable cost as a function of sample size.

What is the sequence of sample sizes for each of the M treatments so that you converge to the best treatment at minimum cost?
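One natural family of approaches to this formulation is best-arm identification from the bandit literature. Below is a minimal successive-halving sketch in Python: run every surviving treatment on a batch, drop the worse half, double the batch, repeat. The arm names, means, and batch size are illustrative assumptions of mine, and the fixed cost K and a principled stopping rule are deliberately left out:

```python
import random

def successive_halving(arms, n0=100, seed=0):
    """Illustrative successive halving over M treatments: run each
    surviving arm on a batch, drop the worse half by running mean,
    double the batch size. `arms` maps a name to a sampler that
    returns one simulated outcome. A sketch, not an optimal design."""
    rng = random.Random(seed)
    alive = {name: (0.0, 0) for name in arms}  # running (sum, n)
    batch = n0
    while len(alive) > 1:
        for name in list(alive):
            s, n = alive[name]
            s += sum(arms[name](rng) for _ in range(batch))
            alive[name] = (s, n + batch)
        # rank surviving arms by estimated mean, worst first
        ranked = sorted(alive, key=lambda a: alive[a][0] / alive[a][1])
        for name in ranked[: len(alive) // 2]:
            del alive[name]
        batch *= 2
    return next(iter(alive))

# Hypothetical arms matching the thread's numbers: mean income ~$100,
# sd $50, with arm "B" truly best by about 1%.
arms = {
    "A": lambda r: r.gauss(100.0, 50.0),
    "B": lambda r: r.gauss(101.0, 50.0),
    "C": lambda r: r.gauss(100.5, 50.0),
}
best = successive_halving(arms)
```

With batches this small relative to the noise (sd of $50 against mean differences under $1), the procedure will often pick the wrong arm — which is exactly the small-sample problem the thread is worried about.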

…for points 1) and 2) in David’s comment. Seems like the right setup for this question.

“Hi Andrew,

Berk Ozler here from the World Bank’s research department and the Development impact blog. You have, on occasion, commented on a thing or two that I wrote.

I have a question from a colleague and we’re trying to crowdsource a few answers from academics as well as policymaker types to see if we cannot put some viewpoints on this together. We’d love your thoughts if you have any – doesn’t matter if you send them to us or post on your own blog…”

So, just in case this goes viral, please note that I am not trying to spend (or to allow or prevent someone else from spending) $2 million on any kind of 5-minute training…

Berk.

1) What is the optimal sequential experimentation method, with stopping rules, so that the funder doesn’t necessarily spend $2 million on something that might not work?

2) How should the experiment be analyzed after this sequential decision-making?

To make this concrete, suppose the population has a mean income of $100 with a standard deviation of $50, and the proposed intervention might increase income by 1% (so by $1), while costing perhaps only 10 cents per person to deliver at scale. The effect for any one person is small, but the cost-benefit ratio could be very high.

A standard power calculation says we need 52,538 people in treatment and 52,538 in control to have 90% power to detect this effect in an experiment on this population. But running an experiment at that size will be very expensive (hence the $2 million price tag).
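As a check on that figure, the standard normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)²(σ/δ)² per arm reproduces it. A sketch using only the Python standard library (the function name is mine):

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm for a two-sided,
    two-sample comparison of means with equal variances and equal arms."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sigma / delta) ** 2)

# $1 effect on income with sd $50, 90% power, 5% two-sided test
n = n_per_arm(delta=1, sigma=50)  # 52538 per arm, matching the text
```

The tiny effect relative to the noise (δ/σ = 0.02) is what drives the six-figure total sample size.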

The question is then whether the funder can instead offer a sequence of conditional funding: a tranche is paid to run the intervention on a smaller sample, the results are examined, and a decision is made either to stop the experiment (because the treatment does not seem to be working) or to fund another tranche, and so on. I know this type of sequential work is done in medical trials, but it is not something we have seen done in these types of economic experiments, so the goal is to reach out to other fields and see ideas for how to handle such a problem.
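For intuition, here is a rough Python simulation of the tranche idea in the style of a group-sequential design from clinical trials: after each tranche, compute the pooled z-statistic on all accumulated data and stop early for efficacy or futility. The efficacy bound of 2.56 is roughly a Pocock boundary for 10 equally spaced looks at overall α = 0.05; the futility bound of 0 is a deliberately crude assumption (real designs calibrate it), and all parameter values are illustrative:

```python
import random
from statistics import mean, stdev

def tranche_trial(effect, sigma=50.0, base=100.0, tranche=5000,
                  max_tranches=10, z_eff=2.56, z_fut=0.0, seed=1):
    """Sketch of tranche-by-tranche funding: simulate one treatment/control
    pair, look after each tranche, stop early for efficacy (z > z_eff,
    ~Pocock bound for 10 looks) or futility (z < z_fut; 0 is a crude
    choice that will sometimes kill a truly working treatment early).
    Returns the decision and the per-arm sample size actually spent."""
    rng = random.Random(seed)
    t, c = [], []
    for look in range(1, max_tranches + 1):
        t += [rng.gauss(base + effect, sigma) for _ in range(tranche)]
        c += [rng.gauss(base, sigma) for _ in range(tranche)]
        n = len(t)
        se = ((stdev(t) ** 2 + stdev(c) ** 2) / n) ** 0.5
        z = (mean(t) - mean(c)) / se
        if z > z_eff:
            return "fund: effect detected", n
        if z < z_fut:
            return "stop: futile", n
    return "inconclusive at max budget", len(t)

decision, n_used = tranche_trial(effect=1.0)
```

Point 2) in the thread is the catch: after optional stopping, the naive final z-test no longer has its nominal sampling distribution, which is why group-sequential methods adjust both the boundaries and the final inference.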

As Andrew notes, the funder may be continually presented with many such ideas for 1% improvements, so the problem generalizes: suppose they have 20 proposals, each of which could raise income by 1% and each of which would need $2 million for a fully powered study – and they don’t want to spend $40 million.

…they want to show…

Either drop the project or charge them 10-100x to be the patsy for pre-specified conclusions. You are in $500/hr lawyer land now.
