We need to practice our best science hygiene.

Of course, I am not referring to hand-washing and social distancing, but rather to heightened social interactions among those who are now engaged, or who can get engaged, in trying to get less wrong about Covid19.

That is, being open about one’s intentions (the purpose of the effort), one’s methods and one’s data and data sources.

For instance, these data sources: Canada testing and results, US testing and results

and some information on ongoing trials (which underlines the need for good expertise and advice).

I know, conjectures and opinions can help, but I would suggest that comments here be limited to data sources, methods to analyse data sources and, of course, trial designs.

p.s. We each need to find where our particular mix of skills will be most useful and join in there if and when we can. I am currently on standby where I work, so I won’t know exactly what I will be working on. In light of this, I am trying to get a scan of where good clinical research/evaluation material and advice might be located.

This post is by Keith O’Rourke and, as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author.



  1. jrc says:

    This is a time when I think two aspects of data combination will be really important:

    1) Appending multiple studies from different sources, as repeated cross-sections and panels with overlapping measurements.

    2) Merging multiple data sources based on location, demographic traits, biological traits, and individual level observables.

    Something that can help and help more if implemented quickly is to try to standardize a certain set of measurements that all agencies and researchers would collect starting now in ways that can be easily appended or merged later. And to set up a website to publicly provide the resulting data and to document the particular measurements and characteristics of each study. The clinical trial registry model might be a kind of starting point for that, but with something of the IPUMS standardization and ease of download.
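The appending step described above can be sketched as follows. This is only an illustration under stated assumptions: the shared schema (`STANDARD_FIELDS`), the per-study column names, and the records themselves are all hypothetical, made-up values, not drawn from any real agency or study. The idea is that each study keeps its own column names but publishes a mapping to the agreed set of measurements, so rows can be stacked later without guesswork.

```python
# Hypothetical shared schema: the measurements every study agrees to report.
STANDARD_FIELDS = ("study_id", "region", "age_group", "tested", "positive")


def harmonize(record, study_id, field_map):
    """Rename one study's local column names to the shared schema."""
    row = {"study_id": study_id}
    for standard_name, local_name in field_map.items():
        row[standard_name] = record[local_name]
    # Fail loudly if a study cannot supply one of the agreed measurements.
    missing = [f for f in STANDARD_FIELDS if f not in row]
    if missing:
        raise ValueError(f"study {study_id} is missing fields: {missing}")
    return row


def append_studies(studies):
    """Stack rows from several studies into one list with a common schema."""
    stacked = []
    for study_id, (records, field_map) in studies.items():
        for record in records:
            stacked.append(harmonize(record, study_id, field_map))
    return stacked


# Toy records with made-up numbers and made-up local column names.
study_a = [{"prov": "ON", "age": "60+", "n_tests": 100, "n_pos": 7}]
study_b = [{"state": "NY", "age_band": "60+", "tests": 250, "positives": 30}]

studies = {
    "A": (study_a, {"region": "prov", "age_group": "age",
                    "tested": "n_tests", "positive": "n_pos"}),
    "B": (study_b, {"region": "state", "age_group": "age_band",
                    "tested": "tests", "positive": "positives"}),
}

stacked = append_studies(studies)
for row in stacked:
    print(row)
```

Keeping `study_id` on every stacked row preserves the data structure, so later analyses can still account for which study each observation came from.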

    I say this as someone who makes a living as a professor whose research involves both questions related to global health and various ways of combining data through appending and merging, and thus I obviously have a financial and professional interest in people finding those models useful.

    • Keith O’Rourke says:

      > have a financial and professional interest in people finding those models useful.
      That needs to be set aside. It’s commendable that you acknowledge it, but just concentrate on how you can contribute and go for it.

      • jrc says:

        I was trying to be overly precautious (and a little bit obviously/theatrically over precautious) with the “being open about one’s intentions” part. It isn’t that it feels like a conflict of interest, it is just a little bit of “here is the perspective I’m coming to this comment from,” which I think helps to frame the “purpose of the effort”. Here the purpose is to think about boring measurement issues that might pay off in the short/medium term.

        I really do think the ability to bring together information from different sources is going to be a huge difficulty, and that the more we do now the faster we will be able to learn things. For instance, if 10 groups are running studies trying out different but overlapping drug combinations, there is no reason to keep all of those observations separate and analyze each study individually – pool it all together, account for the data structure, and find things that actually work and not the 1 out of 20 (or 15 or 10 or 5) things that show up with significance stars in a particular study. That is going to be much easier if we spend some time now thinking about how to standardize measurements and data collection, and how to organize that data and make it searchable and downloadable for researchers.
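One simple way to pool several studies while still respecting which study each observation came from is a stratified estimate. The sketch below uses the Mantel-Haenszel pooled odds ratio, which is a stand-in chosen for illustration and not a method named in the comment above. It assumes the simplest case: every study compares the same treatment with a control on a binary outcome. The counts are made-up toy numbers, and trials with overlapping drug combinations would need a richer model.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across studies (strata).

    Each table is (a, b, c, d):
      a = treated with event,  b = treated without event,
      c = control with event,  d = control without event.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d  # total participants in this study
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these tables")
    return numerator / denominator


# Toy 2x2 tables for two hypothetical studies (made-up counts).
toy_tables = [
    (10, 90, 20, 80),  # study 1
    (5, 45, 10, 40),   # study 2
]

pooled_or = mantel_haenszel_or(toy_tables)
print(f"pooled odds ratio: {pooled_or:.3f}")
```

Each study contributes its own within-study comparison, weighted by its size, so differences in baseline risk between studies do not leak into the pooled estimate.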

        Beyond pooling multiple drug trials for efficiency and precision in treatment effectiveness estimates, I can imagine data mashups of these types being absolutely fundamental to answering a lot of other important questions:

        What is the optimal strictness of quarantine?

        Who are the people who are going to be most affected economically and how can we keep them from homelessness?

        And later, What forms of government economic stimulus and relief are most effective in recovery?

        In many of these cases, variation in public health or government approaches across space and time will be crucial to identifying coefficients of interest. In other cases it will be merges by demographic groups or biological traits. All this requires appending state/federal infection data and merging it with measures of people’s characteristics or government responses or quarantine strictness measures or whatever (and just as importantly having the information needed in all datasets to make the merge possible and/or make the stacking possible). The more we do to standardize data early the faster we can organize it for dissemination to researchers and they can get to work.
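The merge described above depends on both datasets carrying the same key fields. A minimal sketch, assuming hypothetical keys `region` and `date` and made-up toy rows: infection counts are joined to a policy measure, and any rows that fail to match are reported instead of being dropped silently.

```python
def merge_on_keys(left, right, keys):
    """Left-join two lists of dicts on `keys`.

    Returns (merged_rows, unmatched_left_keys). If `right` holds duplicate
    keys, the last one wins; a real pipeline should check for that first.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    merged, unmatched = [], []
    for row in left:
        key = tuple(row[k] for k in keys)
        match = index.get(key)
        if match is None:
            unmatched.append(key)  # record the gap so it can be investigated
            continue
        merged.append({**match, **row})  # left-hand values take precedence
    return merged, unmatched


# Toy rows with made-up values; field names are illustrative assumptions.
infections = [
    {"region": "NY", "date": "2020-03-20", "new_cases": 10},
    {"region": "ON", "date": "2020-03-20", "new_cases": 3},
]
policies = [
    {"region": "NY", "date": "2020-03-20", "stay_at_home": True},
]

merged, unmatched = merge_on_keys(infections, policies, keys=("region", "date"))
print("merged:", merged)
print("unmatched keys:", unmatched)
```

Reporting the unmatched keys is the point of the exercise: it shows exactly which information is missing from one dataset and would have to be collected for the merge to be possible.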

        I guess my long-term dream is that the 50,000 or so studies that will be conducted in the next few years have a chance of being formed into one huge, growing and evolving database that we can all use in various ways to answer the pressing questions that arise using all the data available. It’s an unreachable goal, but it seems like a potentially useful ambition.

  2. Hi Keith, thanks for the efforts here. I appreciate the links to the data you did put up. I’m hoping as things shake out a bit more, people will have more information on raw data sources. So far it seems SOO fragmented, the site is particularly a good effort.

  3. Njnnja says:

    Do you have any leads for how those of us not normally involved in public health, in the private sector, might be able to contribute to any data cleaning/analysis efforts?

  4. Keith O’Rourke says:

    To review modern clinical study designs that are adaptively designed and thoughtfully analyzed, I would suggest Biostatistics and Bioinformatics in Clinical Trials by Hobbs, Berry, and Coombes.

    Unfortunately, that is behind a paywall :-(

    But I have an earlier version.

  5. Anoneuoid says:

    Just think about the mindset of the people writing this:

    They cite papers reporting the opposite of what they say (few smokers in the data and ACE2 is downregulated in smokers).

    Meanwhile, the evidence continues to accumulate:

  6. has collected a bunch of data from various sources and curated it a bit, organized it by country, date etc. Here are some links:
