Humans interacting with the human-computer interaction literature

Steve Haroz writes:

An upcoming publication at the conference on human-computer interaction (CHI 2022) presents an application, called Aperitif, that automates statistical analysis and analysis writeups. I am concerned about what it claims to do, and I thought it might interest you for your blog.

It presents a preregistration application with a user interface for entering variables, specifying their types, and stating hypotheses. It then automatically generates analysis code and methods-description text.

As amazing as it would be to make preregistration easier, the application seems to attempt too much automation. While facilitating statistical reasoning could be helpful, full automation seems to result in questionable choices and severe limitations:
– The application cannot preregister hypotheses with multiple independent variables. No ANOVAs. No interaction effects.
– Non-linear models can’t be preregistered. So preregistering a logistic regression for a forced-choice experiment would be impossible.
– The process of checking assumptions, such as normality, is inflexible. For example, it will include a Shapiro-Wilk test in the generated code even when there are many thousands of observations, a regime the test doesn’t handle well: at large sample sizes it rejects even trivial, practically irrelevant departures from normality (see the first sketch after this list).
– It has a built-in power analysis that doesn’t seem to account for whether an experiment is within-subject or between-subject, a distinction that can change the required sample size severalfold (see the second sketch after this list).
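
To make the Shapiro-Wilk point concrete, here is a minimal sketch, in Python with scipy rather than anything generated by Aperitif, of how the test behaves as the sample grows: a distribution that is practically indistinguishable from a normal passes at n = 100 but is rejected at n = 50,000.

```python
# Minimal sketch (illustrative only, not Aperitif's generated code):
# Shapiro-Wilk flags trivial departures from normality once n is large.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (100, 50_000):
    # t-distribution with 30 df: visually near-identical to a normal
    x = rng.standard_t(df=30, size=n)
    w, p = stats.shapiro(x)  # scipy itself warns p may be inaccurate for n > 5000
    print(f"n = {n:>6}: W = {w:.4f}, p = {p:.2g}")

# Typical result: p well above 0.05 at n = 100, but p near zero at
# n = 50,000, even though the departure from normality is negligible.
```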
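
And on the power-analysis point, here is a hedged sketch of why the design matters, using statsmodels for illustration (the library, effect size, and correlation are my assumptions, not anything from the paper): for the same effect size, a within-subject design with correlated measurements needs far fewer participants than a between-subject one.

```python
# Sketch: within- vs. between-subject sample sizes (statsmodels).
from statsmodels.stats.power import TTestIndPower, TTestPower

d, alpha, power = 0.5, 0.05, 0.80  # medium effect, conventional settings

# Between-subject: participants needed per group, independent t-test
n_between = TTestIndPower().solve_power(effect_size=d, alpha=alpha, power=power)

# Within-subject: paired t-test. With correlation r between conditions,
# the effect size on difference scores is d_z = d / sqrt(2 * (1 - r)).
r = 0.7  # assumed correlation between repeated measures
d_z = d / (2 * (1 - r)) ** 0.5
n_within = TTestPower().solve_power(effect_size=d_z, alpha=alpha, power=power)

print(f"between-subject: ~{n_between:.0f} per group ({2 * n_between:.0f} total)")
print(f"within-subject (r = {r}): ~{n_within:.0f} participants")

# Roughly 64 per group (128 total) between-subject versus about 21
# within-subject: a power analysis that ignores the design can get
# the sample size badly wrong.
```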

I [Haroz] was a reviewer for the submission, and I’ve made my review public. It discusses these concerns and more. I worry that an automated system that so severely constrains what kinds of models and hypotheses can be preregistered and analyzed will do more harm than good, especially for users who do not have the statistical training to question the application’s limitations. I urge caution from anyone considering using Aperitif.

Interesting. I guess that nobody was actually going to use this tool on a real problem; it seems more like a demonstration than something that would be used in applications. That said, it makes sense to point out what’s missing. A challenge is that the paper is on human-computer interaction, so it makes sense for the researchers to try to develop a tool that’s easy to use, but then they end up in the awkward position of making a user-friendly tool that you wouldn’t want users to actually use!

4 thoughts on “Humans interacting with the human-computer interaction literature”

  1. It seems there’s a question of design philosophy that becomes important in this kind of work. On some level one could argue we should avoid putting more layers of abstraction between the human and the computations, since treating inference like a black box rather than taking the time to master things and be thoughtful about it is one of the reasons we have a replication crisis. On the other hand, some software layer is necessary, since I doubt anyone wants to go back to the days of calculating statistics manually. But in working on the interface there’s a question of what mindset to bring to navigating the tradeoffs between flexibility and convenience, and between hiding the details and requiring the user to understand what the code is doing. Questions like “How much training/instruction do you require the person to have before using it?” and “When do you stop implementing special cases?” might seem like they shouldn’t matter that much, because you could always go back and make more improvements or add exceptions later (I agree with Andrew that work in this vein is often more about demonstrating a possibility). But where you stop does end up mattering a lot because analysis is in the details.

    • “where you stop does end up mattering a lot because analysis is in the details”

      Very well said, Jessica. When an analysis tool tries to do everything but has numerous oversimplifications and gaps (e.g., lacking basic model specification and coefficient estimation), we have to wonder who it helps and who may be misled.
      Helping with the mindless parts like variable specification and numerical reporting: great!
      Automating the critical thinking aspect of preregistration and analysis by opaquely forcing a simple (and sometimes inappropriate) default model: may mislead more than it helps.

  2. Thirty-five or forty years ago, I think when SPSS first released a point-and-click interface, I had similar qualms: its non-default options were not obvious enough, and statistically naive researchers could too easily generate and publish inappropriate analyses.
