Econ grad student asks, “why is the government paying us money, instead of just firing us all?”

Someone who wishes anonymity writes:

I am a graduate student at the Department of Economics at a European university.

Throughout the last several years, I have been working as an RA (and sometimes co-author) together with multiple different professors and senior researchers, mainly within economics, and predominantly analysing very large datasets.

I have 3 questions related to my accumulated professional experiences from “the garden of forking paths” etcetera. Maybe it would suffice to just ask the actual questions, but I nevertheless begin with some background and examples…

In my experience, all social science researchers that I have worked with seem to treat the process of writing a paper as some kind of exercise in going “back-and-forth” between theoretical analysis and empirical evidence. Just as an example, they (we) might run X number of regressions, and try to find a fitting theory that can explain the results. Researchers with the most top publications often seem to get/have access to the greatest number of RAs and PhD students, who perform thousands of analyses that only very few people will ever hear about unless something “promising” is found (or unless you happen to share the same office). I have performed plenty of such analyses myself. In one recent case, my task was to attempt to replicate a paper published in a top journal using data from my country (instead of from the country whose data was used in the original paper). When I asked my boss in that project whether we could perhaps publish the results of this replication as a working paper, he replied that he and his collaborator (a famous professor from yet another country) just wanted me to perform this replication in order to see whether it was “worthwhile” to test some other (somewhat related) hypotheses they had. The idea, he wrote, was never to make any independent product out of this replication, or even to incorporate it into any related research product. In this case, I found “promising” results, so they decided to pursue the investigation of their (somewhat) related hypotheses. In other similar cases, where I didn’t find any such “promising” results, my boss decided to try something else or even drop the subject entirely.

Using e.g. “pre-analysis plans” never seems to be an option in practice for most researchers that I have worked with, and the more honest(?) ones explicitly tell me that it would be career suicide if they chose not to try out multiple analyses after looking at the data. Furthermore, at seminars people often ask if you have tried this or that analysis, if you haven’t (yet) found enough “valuable” stuff in your analyses. (To be fair, they also ask whether you have performed this or that robustness check or sensitivity analysis.)

I might also note that when writing my M.Sc. thesis, some supervisors explicitly and openly encouraged me and other students to exclude results that were not consistent with the overall “story” (we) tried to “tell”, with the motivation that “this is only an M.Sc. thesis”. However, while maybe not as openly encouraged, similar stuff nevertheless continues during Ph.D. studies (if one is admitted to the Ph.D. program, perhaps partly thanks to an M.Sc. thesis telling a sufficiently convincing “story”). And, as my experiences as an RA suggest, it continues after finishing a Ph.D. In my experience, the incentives to be (partially) dishonest often seem to be quite overwhelming at most stages of a researcher’s career.

No one seems to worry too much about statistical significance, not necessarily because they do not care about “obtaining” significant results, but because if one analysis doesn’t yield any stars, you can almost always just try a different one (or ask your PhD student or RA to do it for you). I have “tried” hundreds of analyses, models and specifications during my four years as a research assistant. I’d say that I might easily have produced sufficient material to publish at least 5 complete studies with null results or results that were not regarded as “interesting” or “clear” enough. No one except me and a few other researchers will hear of these results.

In the project where I am working at the moment, we are currently awaiting a delivery of data. While waiting, I suggested to my current boss, who has published articles in top journals, that I could write all the necessary code for our regression analyses as well as the empirical method section of our paper. In that way, we would have everything completely ready when finally getting access to all the data. My boss replied that this might be sensible with regard to the code used for e.g. the final merging of the data and some variable construction, but he argued that writing code for any subsequent regression analyses before obtaining access to the final datasets would be less useful for us since “after seeing the data you’ll always have to try out multiple different analyses”. To be fair, I want to stress here that my impression was not at all that he had any intention to fish. I simply interpreted his comment as a spontaneous and frank/disillusioned statement about what is (unfortunately) current standard practice at the department and in the field more generally. Similar to at least some other researchers that I have worked with, my current boss seems like a genuinely honest person and he also seems quite aware of many of the problems you mention in your articles and blog. My impression is that he is also transparent with most of what he does (at least compared to some others I’ve worked with). In my opinion, he is generous, fair-minded, and honest. Thus, my concern in this particular example is more about the existing incentives and standard practices which seem to make even the most honest researchers do stuff that perhaps they should not. I also want to stress that the sample sizes in our analyses are very large (using high-quality data, sometimes with millions of observations). Furthermore, (top) publications in economics often include e.g. a very comprehensive Web Appendix with sensitivity analyses, robustness checks and alternative specifications (and sometimes also replication dofiles).

Now, if you have time, my questions to you are the following:

1. Does e.g. having access to very large sample sizes in the analyses, and publishing a 100+ page Web Appendix together with any article, mitigate “the garden of forking paths” problems outlined above somewhat? And what can I do to contribute less to these problems?

2. A small number of researchers that I have collaborated with argue (at least in private) that their research is mainly to be regarded as “exploratory” because of the stuff I have outlined above. Would simply stating that one’s research is “exploratory” in a paper be a half-decent excuse to do any of the p-hacking and other stuff outlined in my email?

3. Has my job throughout the last several years been completely useless (or even destructive) for society? That is how I personally feel sometimes. And if I am right, why should we fund any social science research at all? It often seems to me that it would be impossible to get any of the incentives right. When I recently asked a prominent researcher at my current workplace whether he believed that the current system of peer review is successful in mitigating problems related to “the garden of forking paths” or even outright “cheating”, he simply started to laugh and replied “No, no… No! Absolutely not!” If this prominent researcher is right, why is the government paying us money, instead of just firing us all?

My reply:

First, it’s too bad you have to remain anonymous. I understand it, but I just want to lament the state of the world, by which the expectation is that people can’t gripe in public without fear of retaliation.

Now to answer your questions in reverse order:

Why is the government paying us money, instead of just firing us all? All cynicism aside, our society has the need, or the perceived need, for social science expertise. We need people to perform cost-benefit analyses, to game out scenarios, to make decisions about where to allocate resources. Governments need this, and businesses need it too. There’s more to decision analysis than running a spreadsheet. We need this expertise, and universities are where we train people to learn these tools, so the government funds universities. It doesn’t have to be done this way, but it’s how things have been set up for a while.

Should they be funding this particular research? Probably not! The trouble is, this is the research that is given the expert seal of approval in the “top 5 journals” and so on. So it’s not quite clear what to do. The government could just zero all the funding: that wouldn’t be the worst thing in the world, but it would disrupt the training pipeline. So I can see why they are funding you.

Has my job during the last 4 years been completely useless (or even destructive) for society? Possibly. Again, though, perhaps the most important part of the job is not the research but the training. Also, even if your own research has been a complete waste of time, or even negative value in that it’s wasting other people’s time and money also, there might be other research being done by similarly-situated people in your country that’s actually useful. In which case it could make sense to quit your current job and work for some people who are doing good work. In practice, though, this could be difficult to do or even bad for your career, so I’m not sure what to actually recommend.

Would simply stating that one’s research is “exploratory” in a paper be a half-decent excuse to do any of the p-hacking and other criminal stuff? Lots of my research is exploratory, and that’s fine. The problem with p-values, p-hacking, etc., is not that they are “exploratory” but rather that they’re mostly a way to add noise to your data. Take a perfectly fine, if noisy, experiment, run it through the statistical-significance filter (made worse by p-hacking, but often pretty bad even when only one analysis is done on your data), and you can end up with something close to a pile of random numbers. That’s not good for exploratory research either!

So, no, labeling one’s research as exploratory is no excuse at all for doing bad work. Honesty and transparency are no excuse for bad work either. A good person can do bad work. Doing bad work doesn’t mean you’re a bad person; being a good person doesn’t mean you’re doing good work.

Does e.g. having access to very large sample sizes in the analyses, and publishing a 100+ page Web Appendix together with any article, mitigate “the garden of forking paths” problems? I don’t think the 100+ page web appendix will get you much. I mean, fine, sure, include it, but web appendixes are subject to forking paths, just like everything else. I’ve seen lots of robustness studies that avoid dealing with real problems. My recommendation is to perform the analyses simultaneously using a multilevel model. Partial pooling is your friend.
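To make the partial-pooling recommendation a bit more concrete, here is a minimal simulated sketch (this is my illustration, not code from the post; the setup, numbers, and names are assumptions): instead of reporting whichever of many parallel subgroup estimates happens to clear a significance threshold, all of them are shrunk toward a common mean. A real analysis would fit the full multilevel model (e.g., in Stan or lme4); the crude method-of-moments shortcut below is only meant to show what partial pooling does.

```python
# Minimal partial-pooling sketch (illustrative only): J noisy subgroup
# estimates are shrunk toward a common mean instead of being reported
# one at a time through a significance filter.
import numpy as np

rng = np.random.default_rng(0)

J = 50                                    # number of parallel subgroup analyses
true_mu, true_tau = 0.2, 0.1              # simulated common mean and between-group sd
se = rng.uniform(0.1, 0.4, J)             # standard error of each subgroup estimate
theta = rng.normal(true_mu, true_tau, J)  # true subgroup effects (unknown in practice)
y = rng.normal(theta, se)                 # noisy per-subgroup estimates

# Crude method-of-moments estimates of the common mean and between-group variance.
w = 1.0 / se**2
mu_hat = np.sum(w * y) / np.sum(w)
tau2_hat = max(0.0, np.var(y, ddof=1) - np.mean(se**2))

# Partial pooling: noisier estimates are pulled more strongly toward mu_hat.
shrink = tau2_hat / (tau2_hat + se**2)
theta_pooled = mu_hat + shrink * (y - mu_hat)

print("RMSE of raw estimates:   ", np.sqrt(np.mean((y - theta) ** 2)))
print("RMSE of pooled estimates:", np.sqrt(np.mean((theta_pooled - theta) ** 2)))
```

On simulations like this, the pooled estimates are typically closer to the truth than the raw ones, which is the sense in which partial pooling is your friend.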

51 thoughts on “Econ grad student asks, ‘why is the government paying us money, instead of just firing us all?’”

  1. IMO people are confused about “forking paths”

    From a *scientific* perspective, there’s nothing wrong with getting some data and running a billion different analyses to see what works. That’s just a high-powered analytical version of trying to hammer the shape through the hole in the Playschool bench – if it fits through the hole, it fits. When you find a model that solves your problem, boom. Be happy.

    The problem with running multiple analyses is a statistical methods problem, not a scientific problem. Yes, p-hacking can yield spurious results. But even that isn’t a problem really. Any method can yield spurious results. The problem is that people should be verifying the results through multiple different means *no matter what method is used* long before they make dramatic claims about what the results mean for society, and long before Congress passes billion dollar programs based on the results.

    So the upshot is that nothing is wrong with anything this person is doing. The problem is that people make ridiculous claims on the basis of a single analysis that has a high probability of being spurious. People need to test and retest these claims. If they do that, the methods that don’t work will gradually fall out of use.

    • Doesn’t “forking paths” refer to a more subtle issue than multiple comparisons? My understanding (which is probably confused) is that it refers to decisions that a researcher theoretically could have taken but did not (so it’s not multiple comparisons or p-hacking, because the researchers didn’t backtrack and start over again on a more promising path that leads to the magic p-value). This is a subtle issue, and I think people **are** confused about it, because they don’t realize all the possible paths that could have been taken in the analysis, and they really are affronted because they think you are accusing them of p-hacking, which they actually aren’t.

      • It’s just stuff like:

        1) Rate of severe covid = a*age + b*bmi + c*wealth

        But if you also add in blood vitamin C levels at day 3 of infection you’d get:

        2) Rate of severe covid = a*age + b*bmi + c*wealth + d*vitC

        Then the a, b, and c coefficients all change greatly since you included a major common cause for the correlations. If you never collected the vitamin C data to begin with, it’s not going to be in your model.

        Now model #2 obviously isn’t the whole story either. We can add interactions, other variables, or use some nonlinear model instead. It can be used for predictive skill, but the values of these coefficients don’t mean anything since they are the result of the “path” you took. (A quick simulation below illustrates this.)
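        A toy simulation of the point above (the numbers, variable names, and data-generating story below are invented for illustration, not taken from the comment): when a variable that is correlated with the other predictors and affects the outcome is left out, the remaining coefficients absorb part of its effect, so the estimates depend on which variables you happened to collect.

```python
# Toy illustration with made-up numbers: omitting "vitC", which is correlated
# with the other predictors and affects the outcome, shifts the coefficients
# on age, bmi, and wealth.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

u = rng.normal(size=n)                    # shared factor inducing correlation
vitC = 0.8 * u + rng.normal(size=n)
age = 0.5 * u + rng.normal(size=n)
bmi = -0.4 * u + rng.normal(size=n)
wealth = 0.3 * u + rng.normal(size=n)
severity = 1.0 * age + 0.5 * bmi + 0.2 * wealth - 0.8 * vitC + rng.normal(size=n)

def ols_slopes(X, y):
    """OLS with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

print("model 1 (no vitC):  ", ols_slopes(np.column_stack([age, bmi, wealth]), severity))
print("model 2 (with vitC):", ols_slopes(np.column_stack([age, bmi, wealth, vitC]), severity))
```

        Model 2 recovers the simulated coefficients (1.0, 0.5, 0.2, -0.8) fairly closely, while model 1’s coefficients shift because they pick up part of the omitted vitC effect, which is the instability described above.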

      • Right, there’s a difference in *algorithm* between “try every possibility and take the one you like best” and “make an intelligence-directed random walk and at each step only backtrack if it works out very badly”, which can look quite linear with maybe only a couple of backtracks.

        So if you get from the root of a tree to a leaf with only say 2 backtracking choices that you explicitly mention in your paper, it can feel like you just “got to the answer”. But when it comes to interpreting your p value, the potential branch points were so numerous that the “try every possibility” method might easily have included millions or trillions of potential options.

        It’s like saying that chess is trivial because you won a particular game. There were trillions of possible other games you didn’t play in which you might well have lost. The “directed thought” you put into your chess moves is what made you win, not that “the answer to chess was always this sequence of moves”

      • “My understanding (which is probably confused)”

        No doubt mine is as well. But my point isn’t about p-hacking specifically. That’s just one example.

        The point is that if you conclude from your data that women who have toe fungus are more likely to murder Latino men on Fridays, even though you’ve p-hacked or wandered through the garden of forking paths or whatever, your conclusions could be true. And if you concluded that black folks are more likely to become NBA players than white folks, even though your conclusion looks right, it could be wrong too. There’s no way to tell except to keep retesting.

        So it’s not wrong to use traditional statistical methods. What’s wrong is to hype results obtained with a method that regularly produces spurious results.

    • Wise comment, Jim. The posting graduate student also needs to realize that 99% of research is “wasted” even when it’s done perfectly, because it’s really hard to find important results. We *should* be tearing up most of what we do and starting over. Even when something is publishable and published, we usually decide as a profession that it was a mistake to publish it. We show that by never citing the paper, or by string-citing it, or by never using it or its successors ten years later. You have to think of yourself as being part of the scientific project and not necessarily making any advancement yourself, because, of course, we don’t know ex ante which 99% is wasteful and which 1% makes it all worthwhile.

      It’s a bit like the Buffett-Munger theory of investing. They beat the market (or used to) by investing in very few stocks. If I remember right, some years they decide they haven’t found any new stocks worth investing in. Rather, they research companies very carefully and make gobs of money on a few big hits. If you’re not as clever as them, you can’t even do that, though you could work as their employees helping them.

        • Sure. But I guess you are not saying that we cannot gain anything by reducing the amount of forking-paths behavior, p-hacking, and publication bias among researchers? If, say, every economist included multiverse analyses in the appendix, and/or used pre-analysis plans when feasible, and published null results, etcetera etcetera, then this would arguably result in a lower percentage of “wasted” research, although it would still be high. Or am I wrong here?

        Also consider the following statement by Gelman:

        ”The problem with p-values, p-hacking, etc., is not that they are “exploratory” but rather that they’re mostly a way to add noise to your data. Take a perfectly fine, if noisy, experiment, run it through the statistical-significance filter (made worse by p-hacking, but often pretty bad even when only one analysis is done on your data), and you can end up with something close to a pile of random numbers. That’s not good for exploratory research either!”

        If this is the case, then we must surely try to do something about these problems, right?

      • Is Jim (and Eric B. Rasmusen) arguing that it’s perfectly fine to do p-hacking, wander the garden of forking paths, and never publish null results? If so, I fail to understand the logic. Wouldn’t we all be better off if there were less of this stuff? If all economists also published any null results, and if they included e.g. multiverse analyses in the appendix, and if they used pre-analysis plans when feasible, etcetera, research would progress faster and fewer resources would be wasted. Flawed research practices would not be rewarded to the same extent that they are today, and fewer charlatans would climb to the top. Or am I missing something?

    • @jim

      Perhaps I’m stupid, but I don’t really follow the logic of the last part of your comment. I apologize if I am hereby repeating some of the stuff I already wrote in another comment, but when you write that “the upshot is that nothing is wrong with anything this person is doing”, are you then saying that it’s perfectly fine to do any kind of p-hacking or garden of forking paths and never publish e.g. null results? Wouldn’t society be better off if economists also published any null results, and if they included e.g. data and model multiverse analyses in the appendix, and if they used pre-analysis plans when feasible, etcetera? My guess is that if they did, research would progress faster and fewer resources would be wasted. Producing flawed research and spurious results would not be rewarded to the same extent that it is today, and fewer charlatans would climb to the top.

      Or what am I missing here? If ”nothing is wrong with anything this person is doing”, why not just commit outright fraud using made up data instead of paying research assistants and others to produce spurious results etcetera? I can imagine this would actually save tax dollars. I mean, any method can yield spurious results! People just need to test and retest these claims!

      @ Eric B. Rasmusen

      You write that ”[t]he posting graduate student also needs to realize that 99% of research is “wasted” even when it’s done perfectly, because it’s really hard to find important results. We *should* be tearing up most of what we do and starting over. Even when something is publishable and published, we usually decide as a profession that it was a mistake to publish it. We show that by never citing the paper, or by string-citing it, or by never using it or its successors ten years later. You have to think of yourself as being part of the scientific project and not necessarily making any advancement yourself, because, of course, we don’t know ex ante which 99% is wasteful and which 1% makes it all worthwhile.”

      Agreed, science makes progress, most research is “wasted” (in the sense you imply above) even when done perfectly, any single individual is only part of “the scientific project”, etcetera etcetera. But how does any of this justify the behaviour discussed in this blog post? See my reply to Jim above. What am I missing here?

  2. Although I think it may be a necessary evil, I still have a beef with pre-registered analysis. It just exchanges the current problem of overfitting for that of underfitting. We enter into any analysis with a set of priors over possible models, and pre-registration hamstrings us into only testing those models with the highest prior, even when they totally whiff on fitting the actual data. While it’s still better than overfitting by optimizing for the likelihood or p-values, pre-registered analysis still fails to achieve our goal of actually explaining the data. Then again, I don’t really have a better suggestion, since degrees of freedom will be abused with the current incentive structures.

    • “I don’t really have a better suggestion”

      One way would be to simply accept that most statistical analyses in social science have very high uncertainties and results have a high chance of failing replication no matter what. So do the work, then let others test the work with many different data sets and many different methods.

      Really that’s the paradigm shift that needs to occur in the social sciences: more replication, less being convinced on the basis of a single model.

    • Maybe I’m wrong, but I didn’t think that pre-registration precluded you from trying other analyses. It just constrains you to present the pre-registered analysis first, before you say “when we looked at the data it was obvious that XXX (which had not occurred to us when designing the analysis), so we ran analysis YYY: here are the results”. Obviously this doesn’t stop you from forking/hacking in the exploratory phase, but at least it’s clear which analyses are truly confirmatory and which are exploratory …

      I’m sure someone has made this point before, probably even in this venue. I tried googling “pre-registration is not a straitjacket”, which I thought I remembered from somewhere, but didn’t get any useful hits …

  3. Agree with Jim…. Can’t count how many times I’ve read an Econ survey paper that says something like, “It appears that X does not affect Y.” or “It seems that this effect is large and positive.” and the conclusion is based on *one* study published in a top Econ journal. Interestingly, in political science there are sometimes a hundred reasonably well-done studies for many important “effects” (e.g., GOTV studies). As the letter writer to Gelman suggested, in Econ it appears that the incentives to work on a paper that replicates/reproduces something in another context in order to ascertain external validity and improve (even if slightly) internal validity drop off very sharply after the 3rd paper… and the incentives for R1 researchers to read and cite papers published in second-tier journals, no matter how competently done, are negative.

  4. Some economists who follow this site may dispute this, but here is my .02 as a nonconforming member of the tribe. Journal articles are far easier to place in high impact journals if they use empirical results to support an acceptable theory or model, at least in the core realms of economics. Sure, you can have an atheoretical finding about how some historical event or pattern 2000 years ago still shows up in current data, but if you’re doing work in trade theory, labor markets, financial markets, etc. you better have the right sort of model to narrate your findings.

    There is a lot of elasticity in what that model can say, a whole universe of tweaks. The thing is, however, the starting point has to be outcome-rationality, agents who are solely concerned with how outcomes affect their utility and are rational maximizers with respect to them, and the model needs to be determinative (multi-equilibrium models need to be closed). Even if you want to bring in, say, social norms, you have to do it so that each agent is maximally rewarded for acting according to the norm, and the conditions under which any particular norm will be selected have to be specified. Economists often wax on how flexible this theory rulebook is — “a clever grad student can come up with a model that shows anything” — but it isn’t infinitely flexible. A lot of potential empirical results are simply resistant to being explained via any “U-max plus tweak” strategy.

    An interesting result in empirical work is one that “is consistent with” an interesting tweak to the basic optimizing model. On the one hand, that rules out a large number of potential results, including a lot of nulls. If you get one of those but you still want a crack at a top journal, you have to try something else. On the other it rules in a number that can be modeled tweakishly. It doesn’t surprise me that successful research economists want to leave methodological space that allows them to scan for such results.

    My own view is that, while U-max (and π-max etc.) models are useful in many cases, the straitjacket is deadly. Part of that deadliness is what it does to empirical work. The anonymous RA strikes me as a truth-teller, but the problem is not only econometric but also ultimately theoretical.

    • “…the starting point has to be outcome-rationality, agents who are solely concerned with how outcomes affect their utility and are rational maximizers with respect to them…” Well said. That anybody who has been married or has children can take this idea seriously reminds me of the joke that “matriculation can make you blind”; it seems true of economics.

      • Unfortunately, there is a whole strand of economic research on marriage and child-bearing that begins with the utility-maximizing rational framework. Of course, most economists are men, so perhaps marriage and children don’t affect their beliefs as much as they might in other disciplines.

    • Peter –

      > while U-max (and π-max etc.) models…

      If I might ask, does “π-max” = profit maximization? What other “max’s” are there to make up the “etc.?”

      • Yes, pi for profit. You’re right that no other maximand comes to mind at the moment, but both utility and profit come in multiple flavors. Some public choice models maximize votes, yes?

        • Peter –

          Thanks for the benefit of the doubt – but it wasn’t that I was right, merely that I was not aware of whether there might be other possible maximizations…

    • I disagree. I think it’s easier to get published if you come up with something surprising and new, rather than merely confirming existing theory, and excessive love of novelty is a bigger problem than love of papers that just confirm what everybody already thinks they know.

      Of course, if you don’t have an optimizing theory, it ordinarily means you have no theory at all, and people won’t believe your empirical results for the reasons discussed in the post– that it’s too easy to come up with spurious results so they’re probably accidental rather than being the result of some behavioral theory you invent after the fact to explain why people are, e.g. buying more when the price is higher, failing to look two months ahead to see problems, etc.

    • this is implicitly assuming that there is a single overarching theory that is dominant with no competitors. of course econ is dominated by a rational choice *framework*, but as any well-read economist would understand, rat choice is surprisingly unrestrictive. i can’t think of what you’re really thinking of that’s so resistant to “max-U + tweaks,” except something like the allais paradox which is…not exactly an undertheorized area itself. and frankly, the amount of mostly-atheoretic work is *so* large these days to make it even less restrictive, especially in labor, development, etc. labor is such an odd example for you to choose given the well-published rise of recent work explicitly overturning orthodoxy on competitiveness and (an assumed lack of) market power in labor markets.

  5. Could you please expand a little on your last recommendation, i.e. how to perform multiple disparate analyses using multilevel models?

    I suppose I keep thinking about mixed-effects/multilevel models as hierarchical or models with varying intercepts and slopes. I can’t quite fit the idea of estimating multiple different models at once into this framework.

  6. > Has my job throughout the last several years been completely useless (or even destructive) for society? That is how I personally feel sometimes. And if I am right, why should we fund any social science research at all?

    Terence Kealey has done some interesting research on this:

    > The British agricultural and industrial revolutions took place in the 18th and 19th centuries in the complete absence of the government funding of science. It simply wasn’t government policy. The British government only started to fund science because of [World War I​]. The funding has increased heavily ever since, and there has been absolutely no improvement in our underlying rate of economic growth.
    >
    > But the really fascinating example is the States, because it’s so stunningly abrupt. Until 1940 it was American government policy not to fund science. Then, bang, the American government goes from funding something like $20 million of basic science to $3,000 million, over the space of 10 or 15 years. I mean, it’s an unbelievable increase, which continues all the way to the present day. And underlying rates of economic growth in the States simply do not change. So these two historical bits of evidence are very, very powerful…
    >
    > It’s a myth that science is a public good. Science is constructed in “invisible colleges”–small groups of people who understand each individual discipline. So the number of people who can really understand the scientific papers is few. To become a member of this club, you have to pay a very high entrance fee. [The late] Ed Mansfield, an economist at the University of Pennsylvania, showed empirically that the average cost to one company of copying the science of another company is 70 percent. But it’s worse than that because you’ve also got to pay for the costs of information. The company has got to have enough scientists out there to read the papers, to read the patents, to go to the conferences, so that you actually know what people are discovering, so you know how to copy it. Add that to the 70 percent, and add the premium you pay in the scientist’s salary for all the training he’s gone into, and the costs of copying and the costs of doing things originally come out exactly equal. That’s in Mansfield, and others have shown this as well.

    And:

    > The big myth about scientific research is that government must fund it. The argument is that private companies will not fund science, especially pure science, for fear that their competitors will “capture” the fruits of that investment. Yet, in practice, companies fund pure science very generously, and government funding displaces private research money.
    >
    > Without government funding of science, the United States overtook Britain around 1890 as the richest country in the world.
    >
    > War changed everything. The National Academy of Sciences was created in 1863, at the height of the Civil War, to help build ironclads to beat the South. The Office of Scientific Research and Development, which ultimately spawned the National Science Foundation and the National Institutes of Health, was created in 1941.
    >
    > Then the USSR launched Sputnik, the first artificial satellite, in 1957. The Soviets were going to destroy us from space! So in 1958 the National Aeronautics and Space Administration was created, and the U.S. Congress passed the National Defense Education Act to pour money into higher education and science. Yet, remarkably, U.S. economic growth was unaffected. The U.S. per capita gross domestic product has grown at around 2 percent a year since 1820, and the government largesse of the last 50 years has not altered that.
    >
    > Further, government funding of university science is largely unproductive. When Edwin Mansfield surveyed 76 major American technology firms, he found that only around 3 percent of sales could not have been achieved “without substantial delay, in the absence of recent academic research.” Thus some 97 percent of commercially useful industrial technological development is, in practice, generated by in-house R&D. Academic science is of relatively small economic importance, and by funding it in public universities, governments are largely subsidizing predatory foreign companies.

    • Interesting thoughts. Right, there’s nothing better than letting big corporations take care of research. They usually know what’s good for us better than we do and are WAY more efficient. Their success is never founded on government subsidies and yet they’re so successful. Just get rid of all the unnecessary academia. Who needs language studies, cultural anthropology, history, climate change research and philosophy anyway? You can’t sell it. And if it were needed, companies could still do the research. And since they wouldn’t share it, this would lead to fruitful competition! GDP would immediately grow close to exponentially if the government stopped wasting money. Why do we actually need the government when we’ve got an economy….

    • Hm. I mean, this makes sense if you think the only valid reason to do science is to make money, I guess? But almost nobody actually thinks that, as far as I can tell.

      • From the standpoint of government funding – that is, expenditure of the citizens’ wealth – we should consider the overall benefit to the citizens.

        No doubt, science has generated many benefits, especially in medicine. But it has also produced some medical bombs. Despite decades of Federal funding, we apparently don’t know any more about nutrition now than we did in 1940, and it’s not clear that what we knew then was accurate. And while we have lots of tools to treat things like cancer and have saved many lives, we’ve also frequently harmed people with the treatment who would otherwise have been OK (e.g., prostate and breast cancer).

    • I don’t know enough about government science expenditures, and what they have done for us and for the world, to have a strong belief either way about the economic benefits. But I do want to point out that the claim that “the U.S. per capita gross domestic product has grown at around 2 percent a year since 1820” does not appear to be true. https://bfi.uchicago.edu/insight/chart/u-s-real-gdp-per-capita-1900-2017-current-economy-vs-historical-trendline/ shows (or claims to show) US real GDP per capita starting in 1900, and the growth rate from 1900-1929 was substantially slower than the rate from 1946 to 2010. (I exclude 1930-1945 because of the Great Depression and WWII.)

      I’m not making any claim or statement about why the growth rate was higher after WWII than before; I can think of several plausible reasons other than government funding of science, such as societal changes that allowed women and blacks to contribute more to the economy, for example.

      I also agree with anon e mouse, but that’s a separate issue.

      • There’s also the problem of measuring anything substantial with GDP, let alone measuring anything with the _speed of growth_ (yep, that’s a second derivative the quoted authors seem to be interested in) of GDP.

      • The claim of 2% GDP growth is shown here. The chart in this link extends back to 1880 and erases the 1900-1920 period of slower growth with a period of more rapid growth from ~1893-1900. The author notes:

        “As per Maddison’s GDP data, RGDP per capita growth in the US has been 1.76% on average since 1800. Taking the time series from 1870 would yield 1.95% instead. (Unless otherwise noted, all data below comes from that dataset)”

        • jim,
          I’m not sure what 1870 has to do with it. The claim by A Country Farmer is that ” Until 1940 it was American government policy not to fund science. Then, bang, the American government goes from funding something like $20 million of basic science to $3,000 million, over the space of 10 or 15 years. I mean, it’s an unbelievable increase, which continues all the way to the present day. And underlying rates of economic growth in the States simply do not change. So these two historical bits of evidence are very, very powerful…” so the relevant comparison is between the growth of GDP per capita prior to 1940 vs after 1940. On the plot of US GDP per capita in the website you pointed to — the second of the GDP plots — the growth rate is definitely higher after 1940 than before.

          As I said, I’m not making any claims about the causality, I’m just pointing out that one of ACF’s major factual claims — which he says is ‘very, very powerful’ — does not appear to be true.

        • jim –

          > Ha, nice link, we’re a little behind Sweden, but well ahead of the Progressive Canadian Utopia. Hilarious.

          But when you consider illness and deaths per capita, we’ve even passed Sweden on that metric.

        • Phil,

          “the growth rate is definitely higher after 1940 than before.”

          But:

          “Taking the time series from 1870 would yield 1.95%”

          The constancy of US GDP growth seems to be widely agreed upon among economists. There is even a textbook reference to it on the page [Jones & Vollrath, Introduction to Economic Growth (2013)], so I guess you can argue it with the economists if you want.

        • jim,
          I don’t dispute that the time series from 1870 would yield 1.95%. I’m not saying I’ve checked it, but sure, could be.

          What I’m disputing is that the graph that you referred me to shows the same average growth rate after 1945 as before. It does not. I don’t need to argue with an economist or with anyone else. Perhaps you want to say that the graph is inaccurate, and if so that’s fine, you can say that, but then maybe you shouldn’t have pointed to it as evidence?

        • Phil. I really don’t know what you’re looking at. It’s smack on the line from 1918 onward sans depression / WWII. Prior to that it rises above the line and returns to it a few times. Shrug. It’s just longer term variation.

          Your statement that economic growth as shown on the chart in question is higher after 1945, wow, I just don’t know where you get that.

    • well, the primary issue originally raised here was “Government Economists” and their perhaps questionable value to society.

      but the comments quickly strayed to other topics about other economics, science/STEM, private-sector, government, issues.

      — so for simplification, can somebody specify any output from a government-economist that has significantly improved society generally and the daily lives of the general population ??

      (same question should be asked about the general economics profession over past 75 years, but that’s a broader topic)

      • “same question should be asked about the general economics profession over past 75 years”

        Point against economists: It is the dismal science and they’re almost always wrong. Point for economists: almost everyone of every political stripe intentionally misrepresents what economics actually says about their agenda.

    • While it’s possible that government funding of the social sciences might be useless, I don’t think that your Terence Kealey quote can really be taken very seriously. You might want to read ”Is War Necessary for Economic Growth: Military Procurement and Technology Development” by Vernon Ruttan. Perhaps, you could also read ”The Entrepreneurial State: Debunking Public vs. Private Sector Myths” by Mariana Mazzucato (certainly not nearly as good as Ruttan’s book, but some parts of it are nevertheless still worth reading).

  7. As to question #1 on implications of “having access to very large sample sizes in the analyses” I’d like to plug a paper (disclosure: I’m a co-author) that specifically addresses iteration between data analysis and theory (see, for example, Figure 1). The paper is written for an audience of Information Systems researchers (which is, broadly speaking, a sub-discipline of business/management research), but the discussion is relevant to anyone analyzing large archival data sets.

    “Revisiting IS research practice in the era of big data” by Steven L. Johnson, Peter Gray, and Suprateek Sarker
    https://doi.org/10.1016/j.infoandorg.2019.01.001

    Abstract
    Through building and testing theory, the practice of research animates data for human sense-making about the world. The IS field began in an era when research data was scarce; in today’s age of big data, it is now abundant. Yet, IS researchers often enact methodological assumptions developed in a time of data scarcity, and many remain uncertain how to systematically take advantage of new opportunities afforded by big data. How should we adapt our research norms, traditions, and practices to reflect newfound data abundance? How can we leverage the availability of big data to generate cumulative and generalizable knowledge claims that are robust to threats to validity? To date, IS academics have largely welcomed the arrival of big data as an overwhelmingly positive development. A common refrain in the discipline is: more data is great, IS researchers know all about data, and we are a well-positioned discipline to leverage big data in research and teaching. In our opinion, many benefits of big data will be realized only with a thoughtful understanding of the implications of big data availability and, increasingly, a deliberate shift in IS research practices. We advocate for a need to re-visit and extend traditional models that are commonly used to guide much of IS research. Based on our analysis, we propose a research approach that incorporates consideration of big data—and associated implications such as data abundance—into a classic approach to building and testing theory. We close our commentary by discussing the implications of this hybrid approach for the organization, execution, and evaluation of theory-informed research. Our recommendations on how to update one approach to IS research practice may have relevance to all theory-informed researchers who seek to leverage big data.

  8. I’d be somewhat more positive about the “exploratory” question. In fact, as others have stated before, I don’t think it’s a problem to run many analyses on the same data, and to learn from that (one thing to learn is in fact how unstable certain results are when using a different model or methods that are supposed to show more or less the same thing). It is exploratory, and it makes sense, as long as (a) results are appropriately interpreted and (b) it is clear from the beginning that this is the aim of the analyses.

    (b) is important. If originally the hope is to properly confirm a certain theory, but researchers are prepared to do something else and present results differently once they know that what they originally wanted to do didn’t give them the result they wanted, this attitude invalidates even a positive result found in the first attempt, because it changes the overall procedure and its characteristics, which are the basis of p-values and significance or coverage levels.

    Regarding (a), the original question was: “A small number of researchers that I have collaborated with argue (at least in private) that their research is mainly to be regarded as “exploratory” because of the stuff I have outlined above. Would simply stating that one’s research is “exploratory” in a paper be a half-decent excuse to do any of the p-hacking and other stuff outlined in my email?” Doing proper exploratory analysis means that from the beginning you know that p-values are not technically valid and that no p-value that arises can be interpreted as if the analysis had been confirmatory. This should to some extent take the incentive for p-hacking away. You can still do it (I think an interpretation of the kind “the data point more in this than in that direction” should be OK), but if you’re honest about the fact that you hacked, you will not pretend that any p-value means more than it actually does, and I don’t think an “excuse” is needed for this. (Whether this is then accepted in a top journal is of course a different matter.) Obviously if people tell you this just “in private”, this is not honest. They’ve got to say this loudly and clearly in their papers for making it OK in my view.

    Andrew’s linked paper “Honesty and Transparency Are Not Enough” actually says: “I am in favour of more post-publication review as part of a larger effort to make the publication process more continuous, so researchers can release preliminary and exploratory findings without the requirement that published results be presented as being certain”; this doesn’t sound as negative about exploratory analyses as his answer does. I’d also agree that bad work is still bad work even if you are honest, but trying out many things on the same data isn’t necessarily bad work if you don’t interpret results as if you hadn’t done what you actually did.

  9. “some supervisors explicitly and openly encouraged me and other students to exclude results that were not consistent with the overall “story” (we) tried to “tell”, with the motivation that ”this is only a M.Sc. thesis”. (…) And, as my experiences as RA suggest, it continues also after finishing a Ph.D.”
    Isn’t this what also happened to “Why We Sleep” by Matthew Walker? That’s at least my interpretation: that he was trying to tell a story that he felt was true and well-grounded in evidence, and didn’t bother to check all the facts (and consistency with conflicting facts) all that closely…
    I even see a more sympathetic interpretation of this behavior, where you could argue that it’s important to lay out a story in the clearest and simplest way possible, and that all the ambiguities and shades in between would distract us from seeing the whole picture. Then again, this is so not what science is supposed to be like (even in pop science books, though most fall into this trap). It could work in the form of “first present a simple version to see the whole picture, then bring all the messiness back in.” And I don’t think leaving the story out would work; stories are important (reminds me of all the US death rate estimates Andrew put online and explained not publishing them elsewhere with “we don’t have any stories to go along with them”).

  10. Vale David Graeber:

    Econ Grad, why not cast an eye down Graeber’s checklist?:

    “The author contends that more than half of societal work is pointless, both large parts of some jobs and, as he describes, five types of entirely pointless jobs:

    * Flunkies, who serve to make their superiors feel important, e.g., receptionists, administrative assistants, door attendants
    * Goons, who oppose other goons hired by other companies, e.g., lobbyists, corporate lawyers, telemarketers, public relations specialists
    * Duct tapers, who temporarily fix problems that could be fixed permanently, e.g., programmers repairing shoddy code, airline desk staff who calm passengers whose bags do not arrive
    * Box tickers, who create the appearance that something useful is being done when it is not, e.g., survey administrators, in-house magazine journalists, corporate compliance officers
    * Taskmasters, who manage—or create extra work for—those who do not need it, e.g., middle management, leadership professionals”

    Ref: https://en.wikipedia.org/wiki/Bullshit_Jobs

  11. There’s an interesting discussion about this post on Econjobrumours:

    https://www.econjobrumors.com/topic/gelman-anonymous-econ-grad-student

    It seems that many economists have similar experiences as this anonymous grad student.

    However, perhaps the way the grad student framed his/her questions might have led some economists to misunderstand Andrew’s points.

    For example, many seem to have missed Andrew’s and Eric’s point that the multiple comparison problem can still be there even if ”regressions are run once” (i.e. “the garden of forking paths”):

    ”Researcher degrees of freedom can lead to a multiple comparisons problem, even in settings where researchers perform only a single analysis on their data. The problem is there can be a large number of potential comparisons when the details of data analysis are highly contingent on data, without the researcher having to perform any conscious procedure of fishing or examining multiple p-values.” *

    There’s also plenty of evidence out there that numerous economists (including Nobel prize winners) still fail to understand that the assumption that measurement error always reduces effect sizes is false. **

    Perhaps some discussion participants at Econjobrumours should also have read this part of Andrew’s answer twice:

    ”Take a perfectly fine, if noisy, experiment, run it through the statistical-significance filter (made worse by p-hacking, but often pretty bad even when only one analysis is done on your data), and you can end up with something close to a pile of random numbers. That’s not good for exploratory research either!”

    I’m not sure, however, why Andrew did not mention, for instance, multiverse analysis in his answers. Correct me if I’m wrong, but wouldn’t many of the problematic research practices listed by the grad student be less problematic if accompanied by dataset/model multiverse analyses as proposed by Andrew and others? ***

    In my (perhaps mistaken) view, multiverse analysis is a more useful solution compared to pre-analysis plans when analyzing e.g. large administrative datasets (that the researcher has usually already played with multiple times before initiating any new research project). (See the sketch after the footnotes below.)

    Footnotes:

    * http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

    ** http://www.stat.columbia.edu/~gelman/research/published/measurement.pdf

    *** http://www.stat.columbia.edu/~gelman/research/published/multiverse_published.pdf
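    To make the multiverse idea concrete, here is a rough sketch in the spirit of the paper in footnote *** (the dataset, the specification choices, and all names below are invented for illustration, not taken from that paper): enumerate the defensible analysis choices, fit every combination, and report the whole distribution of estimates rather than one preferred specification.

```python
# Rough model-multiverse sketch (all data and choices are invented):
# fit every combination of control sets and sample restrictions and
# report the full set of treatment-effect estimates.
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 5_000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "age": rng.normal(40, 10, n),
    "income": rng.normal(30_000, 8_000, n),
})
df["outcome"] = 0.3 * df["treated"] + 0.01 * df["age"] + rng.normal(size=n)

controls_options = [[], ["age"], ["age", "income"]]   # which covariates to adjust for
sample_options = {"all": df.index,                    # which sample restriction to apply
                  "under_60": df.index[df["age"] < 60]}

def treatment_effect(data, controls):
    """OLS of outcome on treatment plus controls; returns the treatment coefficient."""
    X = np.column_stack([np.ones(len(data)), data["treated"]] +
                        [data[c] for c in controls])
    beta, *_ = np.linalg.lstsq(X, data["outcome"], rcond=None)
    return beta[1]

rows = []
for controls, (label, idx) in product(controls_options, sample_options.items()):
    rows.append({"controls": "+".join(controls) or "none",
                 "sample": label,
                 "estimate": treatment_effect(df.loc[idx], controls)})

print(pd.DataFrame(rows))   # the multiverse of estimates, not one cherry-picked result
```

    A dataset multiverse would add further loops over defensible data-construction choices (variable definitions, exclusion rules, and so on), again reporting all resulting estimates together.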


  12. “Yes, it’s fine to chase noise, but then you should chase noise in all directions, i.e., do a multiverse analysis. The mistake is when a researcher picks out one particular piece of noise and grabs on to it, while ignoring or downplaying all the other correlations in the data.”

    I like the above comment by Andrew below this post:

    https://statmodeling.stat.columbia.edu/2020/09/20/his-data-came-out-in-the-opposite-direction-of-his-hypothesis-how-to-report-this-in-the-publication/
