Elections for the Stan Governing Body 2025

(this post is by Charles)

Calling all members of the Stan community to action!

We’re renewing the Stan Governing Body. The SGB co-organizes StanCon and related events (such as StanConnect!), works on initiatives to fund developers, and more generally helps set the directions of the project.

From the Stan forum:

We’re renewing the Stan Governing Body (SGB) with all 5 seats up for grabs. Current SGB members may still run, however they are not guaranteed to preserve their seats. As in previous years, we will use the Stan forum for nominations.

Please respond to this post to self-nominate for either a 1-year or 2-year term on the SGB before April 14th. We encourage all nominees to briefly summarize their experiences with Stan and their goals for the SGB. Feel free to add links to any content you think is relevant. And if you know someone who you think would be a good fit for the SGB please let them know and encourage them to nominate themselves.

You can find more details on the original post.

On a more personal note…

On a more personal note, I have decided to not run for a third term. Serving for two years has been an immense privilege.

It doesn’t take much reflection to realize what an integral part of my career Stan has been (and continues to be). I discovered Stan during my first job out of college at Metrum Research Group: I was contributing to the C++ library and building features which would enable applications in Pharmacometrics. My first pull request was the matrix exponential function in 2016. My first research poster was titled “Stan functions for pharmacometrics.” My first conference talk was at StanCon 2017, aka the inaugural StanCon. The project and the people working on it encouraged me—even empowered me—to pursue a PhD in Statistics, which I did at Columbia with, as my advisor, the illustrious stranger who created this blog. Throughout the years, Stan has connected me to people, both in statistics and in many applied domains.

I must recognize that, as I dove deeper into academia, I started dedicating less and less time to Stan. I remember during my first year in grad school proposing, as a final class project, adding a new feature to Stan. The prof told me that such work was important for the scientific community but that I needed to find something that was conceptually more substantive. I was also given a clear signal that the PhD program was to train researchers, not engineers.

It’s unfortunate that incentives in academia often misalign with the goals of developing (and maintaining!) high-quality open-source software—at least in the short term. What the creators of Stan pulled off in an academic context was remarkable and unorthodox. In the end, it took an international collaboration across academia and industry to carry the project. I can’t do everyone justice, so I’ll refer you to the list of developers.

I have often felt torn between my work as a researcher and as an engineer. Sure, sometimes you can align the goals: I did leverage that final class project to build a prototype feature in Stan and wrote a research paper on it. But this anecdote strikes me more as the exception than the rule. Even after I started working at the Flatiron Institute—an institution that champions the development of scientific software and hires full-time software engineers—I only devoted so much time to Stan. My colleague Brian Ward jokes that during my almost 3 years here, he has never seen me write any C++ (and I almost haven’t). I’m embarrassed by how much time it’s taken me to write up documentation on our suite of HMM functions; I’m not very active on the Stan forum; and it is now a StanCon tradition to have someone publicly bash me for not finishing Stan’s integrated Laplace approximation (this month I hope—Steve Bronder and I are one unit test away from completing the C++ pull request. Steve has done a lot of work to create a clean user interface.)

The reality is that I’ll never get to work full time on Stan like I did before the PhD. And I suppose that’s ok. I enjoy the “conceptual” work—that is the more methodological/theoretical research that I’ve been doing, and I trust that some of it is useful to Stan users and to the broader Bayesian modeling community. I’m actually very fond of my non-Stan collaborators—if you can believe it! But the feeling of not doing enough for Stan… yeah, that’s a real thing.

Two years ago, when the announcement for the SGB election was posted, I saw an opportunity to devote more time to the Stan project in a structured manner. The SGB does a lot and I focused on bringing back StanCon. Concretely, I co-organized StanCon’23 and StanCon’24, and laid the groundwork for StanCon’26 (yes, we’re taking a one-year break). I liked working on these conferences. Sure, it’s work, but a lot of people generously contribute their time, and if the tasks are properly delegated, it all becomes very manageable. Ultimately, it’s very rewarding to see a vision brainstormed over a Zoom call come to life when we all gather, say, at a pub in Oxford, and streams of colleagues we haven’t seen in months, sometimes years, keep pouring in and gathering around a large table.

I also believe that StanCon is the best applied Bayesian conference out there. Period. And I think its participants benefit immensely from attending—whether by learning a lot from the tutorials or exchanging ideas with top experts. We’re a community of doers.

I hope the next batch of SGB members will tackle this opportunity with the same ambition and pride that the past bodies have displayed. And even as some of us move on, we’ll be here to ensure a smooth transition and provide support where we can.

17 thoughts on “Elections for the Stan Governing Body 2025”

  1. Charles:

    Thank you for all your work for Stan, which includes coding, algorithm development, research, applications, case studies, and, yes, service on the SGB. And, yeah, I’m super excited about nested Laplace!

  2. This caught my eye (in your personal statement):

    “It’s unfortunate that incentives in academia often misalign with the goals of developing (and maintaining!) high-quality open-source software—at least in the short term.”

    I believe the same can be said about developing and maintaining high quality open access data sets. And I think it is unfortunate. We often rely on government data for this – but we can’t afford that reliance anymore, and it was far from perfect anyway (e.g. see all the discussions we’ve had about the deficiencies of traditional cost of living measures). I think much of the problem with academic research is the asymmetry between credit for analysis of data vs creation of high quality data (measurement, collection, documentation, and maintenance). And the more technology changes, the less important the actual analysis methodology becomes (“important” is perhaps not the ideal word here, but the idea is that analysis is becoming an ever smaller proportion of the total work required with data, while the share spent on understanding and preparing/cleaning the data is increasing).

    • What I don’t understand about open access data sets is whether they might replicate and/or hide many problematic issues in data analysis that are deemed problematic.

      I mean, the entire pre-registration kind of loses much of its charm for me if the data set already exists and people might have played around a few months ago and at a later point in time pre-register some analysis (and maybe have to indicate that they did not test this before, but that may be something they forgot at this point in time).

      I also don’t understand how all the analyses performed on this open access big data set can be controlled for multiple analyses, or however that’s called.

      And I fear large data sets may have the feature that they may be hard to replicate, which may imply that certain mistakes or spurious correlations may be nearly impossible to correct or spot, especially in the shorter term.

      It seems to me that many of the problematic issues concerning data analyses that have been talked about a lot in the past decade or so may sort of still be present but more hidden when making large data sets available and using these large data sets. But with the added new feature now of being so large and costly that they are nearly impossible to replicate (especially in the short term), which seems highly problematic from a scientific perspective (see Schmidt 2009 for the several functions of replication).

      If this makes any sense, I don’t understand why people seem to be arguing for this stuff…

      • I don’t really understand your comment. Of course, very large data sets are costly to reproduce and still involve lots of judgements about what to measure, how to collect it, and how to document it. But what is the alternative? Small data sets? Sure a very small data set would not be difficult to replicate, but I don’t see the existence of hundreds of disparate and poorly collected/documented data sets as somehow being better than a smaller number of larger, well documented open access data sets. To the extent that you are pointing out that data collection, maintenance, and documentation remain a problem with open data, I certainly agree. But I don’t see what you are suggesting as a better alternative. In addition, I don’t think the “size” of the dataset is a particularly good dimension to focus on – it is more the quality of the dataset that I am thinking about than its size.

        • I am thinking about data sets that are used in psychological science measuring a few variables and performing some statistical analyses using NHST. If I understood things correctly, several problematic issues concerning that have become apparent recently, such as p-hacking or multiple testing or whatever it is called (e.g. performing many analyses and only reporting the ones that were found to be statistically significant).

          I wonder how this all relates to gathering and making available larger data sets in psychological science, which seem to be applauded recently, and concerning which I keep wondering why. I reason that certain issues are still possible in this scenario, such as testing several combinations of variables and seeing whether one of them is found to be statistically significant. If that’s possible, I then reason that certain problematic issues are still present in this scenario, only they may be less apparent.

          For instance, if someone other than Dr. P-Hack looks for statistically significant findings using who knows how many combinations of variables and then forgets about it for a few weeks and after that decides to pre-register a certain analysis, it may look like it’s an improvement but it may not truly be. I think I can clearly remember some papers about p-hacking in 2011-2012 mentioning how “a researcher may forget that they looked at the data before” or something like that, which seems to not be problematic when it comes to using available open data sets.

          And the possible fact that gathering large data sets, regardless of the quality, might be very hard to replicate is also something that I have thought about, and I seem to not really hear people mention. If that is indeed the case, does this not deserve some more attention because it seems to me that this might imply that a possible critical function of replication is nearly impossible, at least in the short-term?

        • It sounds like you are voicing these concerns: that preregistration does not solve all problems; that p-hacking and multiple testing are problems; and that large and open data sets do not eliminate these problems. I don’t disagree with any of those worries and I think there are plenty of posts on this blog covering all of these. But I don’t see what that has to do with providing open data – in fact, open data is perhaps the best tool for preventing, or at least, identifying such problems. I would not say open data is sufficient, and perhaps it is not even necessary. But I don’t see any of these concerns as representing an argument against open data. Are you suggesting that if we abandon open data we somehow will reduce any of these problems?

          Your concerns about psychological science “measuring a few variables and performing some statistical analyses using NHST” don’t seem relevant to the issue. These are certainly problems and making data openly available can only shed more light on the problems. Hiding the data can only promote more headlines and hyped results with little opportunity for meaningful criticism.

        • Quote from above: “But I don’t see what that has to do with providing open data – in fact, open data is perhaps the best tool for preventing, or at least, identifying such problems.”

          Well, it seems to me that it in fact might not identify such problems, and might even sort of hide them. For instance, if I am understanding things correctly, open data can result in many different people analyzing this data at one point in time, forgetting about this for a few months, and later on test something and perhaps even pre-register this.

          This all might then look like a more solid version of NHST, one that might be better than what has been criticized in recent years, but it seems to me to possibly be about as bad concerning some things like multiple testing and not correcting for that (or whatever the problematic issues are). Just like one researcher might test several hypotheses in their data set on their computer in their office and only report the ones that are statistically significant, so might dozens of researchers testing several hypotheses in the open data set posted somewhere and only report the statistically significant findings. And just like a researcher may have “forgotten” that they analyzed the data before on their data set on their computer in their office, so might researchers using this open data set posted somewhere.

          And even if the possible pre-registration when using this open data set might contain some wording like “have you looked at the data before” or I don’t know what, that might also be “forgotten” when the case. Or put differently, I reason that there might be a difference in how to judge pre-registration of analyses using open data or using still to be gathered data. I have way more confidence in the latter version of pre-registration, also for reasons that seem to me to be mentioned a few years ago but seem now all of a sudden not to be things to worry about (i.c. a researcher “forgetting” they have looked at the data before).

        • I won’t address the substance of your latest comment other than to say I don’t agree. But my original point was about the misaligned incentives vis a vis data curation and data analysis. And I think that lack of open data provides the worst incentives of all: no incentive to measure, document, and maintain quality data at all – all the incentive is on the analysis. That is exactly the environment in which p-hacking, forking paths, fraud, and associated ills will prosper.

  3. Hi Dale, hi Anonymous,
    Thanks for the thoughtful conversation.

    Yes, the curation of open data sets is something that is not sufficiently encouraged in academia. There are attempts to promote such work; for example, NeurIPS’24 had a publication track for new data sets. Likewise, there are efforts to recognize the development of open-source software, such as the Journal of Open Source Software (https://joss.theoj.org/).

    There are many roles, other than hypothesis testing, that an open data set can play:

    (1) evaluating the performance of algorithms. I often rely on libraries such as PosteriorDB (https://github.com/stan-dev/posteriordb) and the inference gym (https://pypi.org/project/inference-gym/). There are also many curated data sets commonly used to evaluate the performance of machine learning algorithms. The pitfall is to overtune algorithms to work on a specific data set / application (I believe Bob Carpenter has examples of this happening in NLP). But a rich and diverse data set alleviates this problem. Unfortunately, many algorithms are only evaluated on a narrow set of problems and it is then unclear how they perform on a broad class of applications.

    For a paper that does an excellent job evaluating the performance of algorithms, I recommend the pathfinder paper by Lu Zhang and colleagues: https://arxiv.org/abs/2108.03782.

    (2) training models (e.g. large language models, foundation models) for specific uses and applications. The Well is a recent example of such a data set: https://polymathic-ai.org/the_well/

    (3) data exploration. Here I’m thinking about all the public data we have on COVID-19 which epidemiologists extensively leverage.

    In all these cases, I’m very happy that people have put a lot of work curating and sharing these data sets!

    But Anonymous’ point is well taken: if many labs analyze the same data set, and this data set is relatively poor, we may overfit the data (when building algorithms, training models) or we may fall prey to “collective p-hacking” when testing hypotheses. I hadn’t thought too much about the latter, but yes, I can see how this would be a problem.

    • Thank you for your post, and also thanks to Dale of course. It helped me a bit I think to differentiate between certain analysis methods of (large) data sets, and most importantly to lightly underscore my intuitive thoughts concerning the possible issues regarding NHST and psychological studies where large data sets are used (and even applauded and welcomed).

      I am pretty bad at statistics so I seem to mostly try to conceptually think about these things, and maybe sort of “translate” things in my own words. I think this is largely how my questions arose: what if professor P. Hack did not do all the problematic stuff alone but posted a note on the door with “open data” and left the door open and all the professor’s colleagues could walk in and do the things this professor might normally do alone. It amounted to the same result, in my mind at least, which is why I keep wondering about this stuff from time to time…

      Anyway, perhaps someone more qualified than I can talk about this some more. Or even write about this. You may have introduced a nice term in light of this all: “collective p-hacking”. And maybe “covert multiple testing” can also be introduced in this light. How do you even know how many multiple comparisons/testing has been done in the case of open data? How can you possibly correct for multiple testing when there is open data that anyone can access and analyze at any time, and nobody knows what is or has been being analyzed, by whom, and at which time. Again, I don’t know much about statistical analyses, but I hope some other people might think or even write some more about this stuff if it is important to do so.

    • I am confused about this point:

      “if many labs analyze the same data set, and this data set is relatively poor, we may overfit the data (when building algorithms, training models) or we may fall prey to “collective p-hacking” when testing hypotheses. I hadn’t thought too much about the latter, but yes, I can see how this would be a problem.”

      I agree that this would be a problem, as would any badly done analysis and as would any use of poor data. But I still don’t understand the relevance to open data. Are you and Anonymous advocating that somehow restricting access to data will reduce such problems? Are you saying that erecting hurdles to who can see or use data will somehow reduce problems with bad data and/or bad analysis? If so, then I’ll have to disagree. As I said, open data is not sufficient to eliminate bad research practice. It isn’t necessary either, but it is almost so. I’d say that it is the single most important tool we have – and that in the absence of open data, all of the problems with p-hacking, forking paths, fraudulent practice, etc. become more serious – harder to detect, harder to question, and harder to bring into the open.

      The only relevance I can think of is that if many researchers use the same data set, then it sort of builds credibility into that data. But surely the repeated and widespread usage of a data set is evidence that it has some level of quality, isn’t it? Or have we reached a point where we so distrust researchers that we can’t even rely on the community of researchers to impose any discipline on what passes as quality data? I would offer as counter-evidence: there are many widely used government data sources (various national health, transportation, time use, surveys, etc.) that attract significant criticism. They are not perfect, but they are carefully curated and the criticisms are open and part of the research record that, hopefully, leads to improved understanding of the world. I don’t think that limiting access to this data improves either the quality of the data or the quality of the research based on the data.

      • Quotes from above: “But I still don’t understand the relevance to open data”

        From my perspective, the examples given of some possible version of covert multiple testing or performing many analyses with many variables without necessarily publishing the results might be made possible exactly because there is open data. That’s part of what I am wondering about and trying to make clear.

        The fact that there is open data that is accessible to other researchers can possibly result in some more hidden forms of problematic statistical analyses and questionable research practices that have been mentioned as being problematic in recent years.

        I am not saying there should be no open data. I am not saying there are no benefits of open data. I am just wondering whether there might be certain more hidden problematic issues in the use of open data that might not be talked and written about much. If these are indeed problematic, I think they deserve mentioning, regardless of the possible benefits of open data such as reproducing statistical analyses reported in papers to check for possible errors.

        • Respectfully, No. Surely that could happen. But going down that road leads to some kind of filtering who can safely look at data. The NEJM went through this with their infamous SPRINT competition to investigate the potential benefits and costs of making clinical trial data open to public access. Even in that competition, they erected barriers to participation that resulted in half of the groups attempting to partake not passing the hurdle they erected to ensure that the research groups were “worthy” of accessing the data and being in the competition. I see no way that screening who is qualified to look at the data can improve the way research is conducted. We’ve had discussions on this blog about potentially requiring a professional statistician to be part of all research teams that do statistical work. As appealing as the idea is (and is often practiced), the details always derail it – who qualifies as a “professional statistician,” and more importantly, who decides?

          There is no guarantee that open data will always improve research, nor do I deny that there are problems associated with providing data access (including privacy issues, proprietary data sources, and protecting the work of data curators). But the idea of limiting access to data because of how it might contribute to poor research practice is a nonstarter to me. I’ll go along with it as soon as you can provide a mechanism for screening access that isn’t worse than the problems it is meant to overcome.

        • I am also not talking about filtering or giving access to certain researchers and not others or whatever you are talking about now. It’s hard to discuss something for me when there is the adding of new stuff to the discussion. I am not talking about filtering, or how open data is useless, or whatever.

          I like to keep it simple, at least at first, to see if I understand things correctly. I am thinking and talking about open data sets that can be used by other researchers to test new hypotheses and in doing so might essentially repeat questionable research practices and analyses that have been mentioned in recent years but in a slightly different way that directly has to do with the fact that there is open data.

        • My last post on this topic. I recognize that you are pointing out issues that are valid and worth thinking about. But I think you are talking about filtering access. If we don’t have open access, then that means there is some mechanism to restrict who can access data. In our present system, access is restricted by legislation that protects a variety of rights (privacy and ownership). Much interesting data can potentially be accessed through various approval processes (e.g. pharma companies often have a process for requesting data; many government sources have an application process for accessing personal data; many publications have data availability statements that say data will be provided upon reasonable request; etc.). Whenever data is not open, it means there is some mechanism to limit who can get it – I refer to that as a filter.

          Since you are talking about issues that can arise when multiple researchers access the same data, possibly ignorant of the other analyses of that data, or even ignorant of their own prior analyses of that data, you keep suggesting that open data can make these problems worse. But if that is the case (conjecture at this point), then you can either limit access to the data or you need to find some other method of protecting against bad research practices. Limiting access seems like a step backward to me. By all means, we should explore ways to mitigate or prevent practices such as p-hacking, (over)use of NHST, multiple comparison abuses, fraudulent data creation, etc. But I think limiting access to data will not meaningfully reduce these problems – it is likely to exacerbate them.

        • Quote from above: “My last post on this topic. I recognize that you are pointing out issues that are valid and worth thinking about.”

          My last post on this topic.

          If these issues are worth thinking about why does it seem like I haven’t heard about them much? Is that just me? Is there mention of these issues in, for instance, papers that advocate for open data? Has someone looked at a few large scale psychological science studies which made their data available and looked at how this data has been used since then? Could it be that this data has been used to test new hypotheses, and if so, is that problematic in some way? Should we count all the analyses of these latter kinds of studies and retrospectively apply a Bonferroni correction for multiple testing to them all (or however that works)? Is it strange that I think about such things (I am like really bad at statistics)? Why aren’t all these people who advocate for open data talking about this kind of practical stuff that might directly relate to providing open data? Or are they? Or aren’t there possible problematic issues regarding this all worthy of some more thought and discussion? I don’t know. I am just trying to understand.

  4. Thank you for all you’ve done for Stan! StanCon 2024 was amazing and it really showcased how much time and effort you and the other organizers put in! It was wonderful to meet you and others in person. I’m glad our contributions to Stan and community overlapped as it was a joy seeing what you accomplished.

    I’m admittedly relieved there’s a one-year break before another StanCon; I’m still recovering from the last one. Not being an academic myself, I don’t get much time to talk about or polish my statistics/Stan stuff, so the polish on all the material for my talk and workshop was quite labor intensive!
