
When engineers fail the bridge falls down: When statisticians fail millions of dollars of scarce research funding is squandered and serious public health issues are left far more uncertain than they needed to be

Saw a video link talk at a local hospital-based research institute last Friday

Usual stuff about a randomized trial being neither properly designed nor properly analyzed – as if we have not heard about that before

But this time it was tens of millions of dollars and a health concern that likely directly affects over 10% of the readers of this blog – the males over 40 or 50 and those that might care about them

It was a very large PSA screening study and

the design and analysis apparently failed to consider the _usual_ and expected lag in a screening effect here (perhaps worth counting the number of statisticians in the supplementary material given)

for a concrete example from colon cancer see here

And apparently a proper reanalysis was initially hampered by the well known – “we would like to give you the data but you know” … but eventually a reanalysis was able to recover enough of the data from the published documents

but even with the proper analysis – the public health issue – does PSA screening do more good than harm (half of US males currently get PSA screening at some time?) – will likely remain largely uncertain or at least more uncertain than it needed to be

and it will happen again and again (seriously wasteful and harmful design and analysis)

and there will be a lot more needless deaths from either “screening being adopted” if it truly shouldn’t have been or “screening was not more fully adopted, earlier” when it truly should have been (there can be very nasty downsides from ineffective screening programs, including increased mortality)

OK I remember being involved in this PSA screening stuff many years ago in Toronto and think we argued that given the size of the study required – scarce research dollars would likely have a much better return studying some other health concerns (most of us were males but young)

But the PSA screening studies were funded – but apparently defectively designed and analyzed.

Now I had been involved in the design of a liver screening trial around 1990 (not funded because it was perceived as being too expensive) and the lag of an effect did not actually occur to me

until I started to write up the power simulation studies (discouraged by my advisor who told me professional statisticians should not have to stoop to simulations to calculate power)

and then I had to think up a treatment effect.

The immediate effect would likely not appear right away – the early treatable tumours would not have a mortality outcome for a while – after all they were early.

But maybe as important – I did some literature searching (breaking Ripley’s rule that statisticians don’t read the literature) and there were papers discussing the lag effect in treatment effects in screening trials and they suggested ways to design and analyze given these.

Then to hear of a huge disaster happening much later – in the 2000s – why does it happen?

Statisticians have to think through the biological details of studies – somehow

Simulating planned trials is very important – even if you can get away with (i.e. fool reviewers) highly non-robust, oversimplified, closed-form, professional-looking power formulas

Statisticians have to confer widely – especially when designing a large expensive trial

Anyone can be “blind sided”

Do literature searches specifically on the study design and clinical topic

Read some of that literature

Try to contact other statisticians who have worked with such designs and that clinical topic

Try to have some of them look at your design

Simulate the details – that’s where the devils are
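
To make the simulation point concrete, here is a minimal sketch (not the design of the PSA trial or of the liver trial) of a power simulation with a lagged screening effect. Everything in it is an assumption for illustration: the function names, the 2% yearly baseline mortality, the hazard ratio of 0.7, the 10 year follow-up, the 7 year lag, and the crude two-proportion z-test on deaths by end of follow-up standing in for a proper survival analysis.

```python
import math
import random
from statistics import NormalDist


def death_time(rng, base_hazard, hazard_ratio, lag):
    """Draw a death time with hazard base_hazard before `lag` and
    base_hazard * hazard_ratio afterwards (piecewise exponential)."""
    t = rng.expovariate(base_hazard)
    if t <= lag:
        return t
    # Memoryless property: restart the clock at `lag` with the new hazard.
    return lag + rng.expovariate(base_hazard * hazard_ratio)


def simulate_power(n_per_arm, base_hazard, hazard_ratio, lag,
                   follow_up, n_sims=200, alpha=0.05, seed=1):
    """Estimate power as the fraction of simulated trials that reject."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Count deaths observed before the end of follow-up in each arm.
        deaths_control = sum(
            rng.expovariate(base_hazard) <= follow_up
            for _ in range(n_per_arm))
        deaths_screen = sum(
            death_time(rng, base_hazard, hazard_ratio, lag) <= follow_up
            for _ in range(n_per_arm))
        # Two-proportion z-test with a pooled variance estimate.
        p_pool = (deaths_control + deaths_screen) / (2 * n_per_arm)
        se = math.sqrt(2 * p_pool * (1 - p_pool) / n_per_arm)
        if se > 0:
            z = (deaths_control - deaths_screen) / n_per_arm / se
            rejections += abs(z) > z_crit
    return rejections / n_sims


# Hypothetical inputs: same trial, with and without a 7 year lag.
power_no_lag = simulate_power(2000, 0.02, 0.7, lag=0.0, follow_up=10.0)
power_lag = simulate_power(2000, 0.02, 0.7, lag=7.0, follow_up=10.0)
print("power assuming an immediate effect:", power_no_lag)
print("power with a 7 year lag:", power_lag)
```

With these made-up inputs the no-lag design should look comfortably powered while the lagged one should not, which is the kind of gap a closed form formula assuming an immediate effect cannot show.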

And if you notice someone else has blown it and you could fix it if you just could get their data…

Well you should be able to get their data – but there are good and not so good reasons why that won’t be feasible – but sometimes you can get more from the published data than you might think

First and technically challenging – there is always the marginal likelihood – the probability of the published (rather than actual) observations gives the appropriate likelihood (some math details here “justimportance.pdf”)

But sometimes you can get everything:

under Normal assumptions just the means and variances are sufficient (that just means the marginal likelihood exactly equals the full data likelihood)

in correspondence analysis there is something called the Burt matrix which is a summary from which you can (with some algebra) redo the full correspondence analysis as if you had the actual data

and for survival data the Kaplan-Meier curve – with enough resolution – will allow you to read off the raw data (event and censoring times). Modern PDFs can provide full resolution?
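
As a sketch of the Kaplan-Meier point, assuming an idealized curve read at full resolution (every step height exact, every censoring tick visible with its multiplicity, and the initial number at risk reported), the records can be recovered step by step. The function names and the toy data are hypothetical; a real digitized curve would have rounding error and hidden ties that this ignores.

```python
def kaplan_meier(times, events):
    """Return (event_times, survival) for right-censored data."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = []
    s = 1.0
    for t in event_times:
        n_at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        s *= 1.0 - deaths / n_at_risk
        survival.append(s)
    return event_times, survival


def reconstruct(n_start, event_times, survival, censor_times):
    """Recover (time, event) records from a fully resolved KM curve.

    Each step satisfies S_i / S_{i-1} = 1 - d_i / n_i, so the number of
    deaths d_i follows once the number at risk n_i is known.
    """
    records = [(float(c), 0) for c in censor_times]
    deaths_so_far = 0
    prev_s = 1.0
    for t, s in zip(event_times, survival):
        # Subjects censored at exactly t are still at risk at t.
        censored_before = sum(1 for c in censor_times if c < t)
        n_at_risk = n_start - deaths_so_far - censored_before
        deaths = int(round(n_at_risk * (1.0 - s / prev_s)))
        records.extend([(float(t), 1)] * deaths)
        deaths_so_far += deaths
        prev_s = s
    return sorted(records)


# Toy data (made up): 1 = death, 0 = censored.
toy_times = [2, 3, 3, 5, 5, 5, 8, 9, 9, 12]
toy_events = [1, 1, 0, 1, 1, 0, 0, 1, 0, 1]

curve_times, curve_surv = kaplan_meier(toy_times, toy_events)
ticks = [t for t, e in zip(toy_times, toy_events) if not e]
recovered = reconstruct(len(toy_times), curve_times, curve_surv, ticks)
original = sorted((float(t), e) for t, e in zip(toy_times, toy_events))
print(recovered == original)
```

The only inputs to `reconstruct` are things a reader could in principle take off a published figure: the step locations and heights, the censoring ticks, and the starting number at risk.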

Perhaps most importantly to avoid statistical bridges falling down:

We should try to worry more about public health rather than our public (professional) images or even our publications!



  1. wei says:

    why are statisticians responsible for this? In a confirmatory study, the endpoint is clearly defined with respect to the measurement and the time of the measurement, which is primarily the doctors' call.

    I definitely agree power simulation is indispensable for complex designs, but what I cannot see is the advantage of simulation when a design detail can be handled by power formulas. If you missed a point in the design, you will miss with or without simulation.

  2. David says:

    K O'Rourke, I don't want to be rude but I would like to point out that your posts are very difficult to read as they are poorly punctuated. Sentences stop without full-stops and paragraphs start in the middle of nowhere. Could this be caused by the editor and format you are composing your posts in? (indeed, even your name appears as "K? O'Rourke", which seems a little odd). You also regularly omit the subject of a sentence, which adds to the confusion.

    I know it's just some stupid blog, but if you are going to write here, why not make it as easy as possible for readers to engage?

  3. Patrick says:

    Well done and very interesting blog post K O'Rourke, I must say – as a fledgling researcher – that I am only now starting to learn about all of the threats to reliability and validity that poorly conceived and executed statistics/research design can be responsible for. Unfortunately this is often learned the hard way and my senior peers and supervisors seem either blissfully ignorant of these issues or callously uncaring of them.

    As an important aspect of modern science, I think that we should definitely strive to be knowledgeable of these issues and adequately address them. Ethically speaking, not doing so is reprehensible. Made only worse by the fact that many researchers intentionally ignore these issues so as to publish more prolifically.


  4. John Johnson says:

    I'll offer my perspective on Wei's question. Here is my background as a matter of context. I graduated with a PhD from a highly theoretical statistics program and went to work as a biostatistician. The first five years of my new career were very painful as I had to meet the reality of case report forms, study coordinators who fill them out several weeks after data is collected, highly controlled data management processes, and working with doctors to make sure that endpoints are scientifically, medically, and statistically sound (not to mention the clinical research associates who will tell you what data is and is not easy to collect in the clinic). Now as the associate director of statistics for my group I look for not only a good grasp of statistical techniques but a solid background in biology or medicine or the ability and willingness to quickly pick it up.

    The fact of the matter is our input is very important on the design of clinical trials to assure that the data collection, time and events table, endpoints, analyses, and sample size are adequate and appropriate to support the study's objectives.

    For example, the comment above asked why do simulations when a simple power formula will do. The simple answer is that the simple power formula will not do, except perhaps as a first estimate in a feasibility analysis. In the particular case above, an important biological effect was not considered in the power analysis, which led to a study which had a low probability of meeting its objectives.

    I'll give you another reason to do simulations, especially of Phase 3 and larger trials. It's to make sure that you understand the trial's statistical properties and what it is and is not capable of saying. In the event the trial fails, if you have a well-done, validated, and well-documented simulation, you have another tool available to examine your assumptions behind the design in order to improve for next time. If you miss the point of a design, the simple power analysis will give you no chance of catching your error. A properly executed simulation (one where the clinical team is involved) gives you a better shot at catching the error early on and avoiding this situation.

  5. K? O'Rourke says:

    David: anything worth doing is worth doing well but I am up against a learning curve and time constraints (and here in particular was rushing to post this in case anyone is currently involved in designing a study where a non-simple treatment effect _should_ be anticipated).

    I do find Movable Type an awkward program and have had a lot of difficulty finding _basic_ things in their help files. Likely me being too used to having Windows-type help – but I am open to pointers on that.

    The question mark in my name is to allow most to know who I am while preventing anyone from claiming they depended on knowing these are _my_ opinions (or worse still those of my current employer). Sorry if it's distracting – I tried anonymous posting and found that too distracting to me.

    I do though believe blog posts should not be polished but rather rough.

    The rest will have to wait till the weekend – which also gives others an opportunity to comment.

    really out of time now

  6. K? O'Rourke says:

    Wei: you are right in that I did not make the point clearly or didn't realize what point to make

    But first some technical stuff

    I would not expect clinicians to fully appreciate the issues of modeling hazards over time nor the proportional hazards finessing of this by assuming hazards are approximately proportional nor the simplicity of the treatment effect assumptions in most parametric survival methods

    Here the screening effect lag makes proportional hazards just too badly wrong … and I would be surprised if there is a formula for non-proportional hazards with unknown lag effect

    Now the points

    Formulas _force_ simplification while simulation encourages or at least allows complexification and detailing – but that sort of misses the point

    The problem I suspect is statisticians following what other experienced statisticians _know well_ how to do – here mortality outcome – proportional hazards modeling – simple exponential assumption based power formulas

    I was taught not to ever do that!

    Until I worked the problem through from first principles – and then be sure to look at what others do.

    (In statistics by David Andrews)

    An important step in applying statistics is to be very critical of how the representation (statistical model) _captures_ the important features of what is needed to be represented (the screening process).

    Simulating the process may _trick_ one into spending more adequate effort on this

    Or it may not


  7. wei says:

    John and K?: thanks for sharing the precious experiences with me. I really appreciate these. I agree the exponential assumption based power formula is too crude to be relevant for practical problems.