The more I thought about them, the less they seemed to be negative things, but appeared in the scenes as something completely new and productive

This is Jessica. My sabbatical year, which most recently had me in Berkeley, CA, is coming to an end. For the second time since August, I was passing through Iowa. Here it is on the way out from Chicago to California and on the way back.

[Photos: a park in Iowa in August; a park in Iowa in November]

If you squint (like, really really squint), you can see a bald eagle overhead in the second picture.

One association that Iowa always brings to mind for me is that Arthur Russell, the musician, grew up there. I have been a fan of Russell’s music for years, but somehow had missed Iowa Dream, released in 2019 (Russell died of AIDS in 1992, and most of his music has been released posthumously). So I listened to it while we were driving last week. 

Much of Iowa Dream is Russell doing acoustic and lo-fi music, which can be surprising if you’ve only heard his more heavily produced disco or minimalist pop. One song, called Barefoot in New York, is sort of an oddball track even amidst the genre blending that is typical of Russell. It’s probably not for everyone, but as soon as I heard it I wanted to experience it again.

NPR called it “newfound city chaos” because Russell wrote it shortly after moving to New York, but there’s also something about the rhythm and minutiae of the lyrics that kind of reminds me of research. The lyrics are tedious, but things keep moving like you’re headed towards something. The speech impediment evokes getting stuck at times and having to explore one’s way around the obstruction. Sometimes things get clear and the speaker concludes something. Then back to the details that may or may not add up to something important. There’s an audience of backup voices who are taking the speaker seriously and repeating bits of it, regardless of how inconsequential. There’s a sense of bumbling yet at the same time iterating repeatedly on something that may have started rough but becomes more refined.

Then there’s this part:

I really wanted to show somehow how things deteriorate

Or how one bad thing leads to another

At first, there were plenty of things to point to

Lots of people, places, things, ideas

Turning to shit everywhere

I could describe these instances

But the more I thought about them

The less they seemed to be negative things

But appeared in the scenes as something completely new and productive

And I couldn’t talk about them in the same way

But I knew it was true that there really are

Dangerous crises

Occurring in many different places

But I was blind to them then

Once it was easy to find something to deplore

But now it’s even worse than before

I really like these lyrics, in part because they make me uncomfortable. On the one hand, the idea of wanting to criticize something, but losing the momentum as things become harder to dismiss closer up, seems the opposite of how many realizations happen in research, where a few people start to notice problems with some conventional approach and then it becomes hard to let them go. The replication crisis is an obvious example, but this sort of thing happens all the time. In my own research, I’ve been in a phase where I’m finding it hard to unsee certain aspects of how problems are underspecified in my field, so some part of me can’t relate to everything seeming new and productive.

But at the same time the idea of being won over by what is truly novel feels familiar when I think about the role of novelty in defining good research. I imagine this is true in all fields to some extent, but especially in computer science, there’s a constant tension around how important novelty is in determining what is worthy of attention. 

Sometimes novelty coincides with fundamentally new capabilities in a way that’s hard to ignore. The reference to potentially “dangerous crises” brings to mind the current cultural moment we’re having with massive deep learning models for images and text. For anyone coming from a more classical stats background, it can seem easy to want to dismiss throwing huge amounts of unlabeled data at too-massive-and-ensembled-to-analyze models as a serious endeavor… how does one hand off a model for deployment if they can’t explain what it’s doing? How do we ensure it’s not learning spurious cues, or generating mostly racist or sexist garbage? But the performance improvements of deep neural nets on some tasks in the last 5 to 10 years are hard to ignore, and phenomena like how deep nets can perfectly interpolate the training data but still not overfit, or learn intermediate representations that align with ground truth even when fed bad labels, make it hard to imagine dismissing them as a waste of our collective time. Other areas, like visualization, or databases, start to seem quaint and traditional. And then there’s quantum computing, where the consensus in CS departments seems to be that we’re going all in regardless of how many years it may still be until it’s broadly usable. Because who doesn’t like trying to get their head around entanglement? It’s all so exotic and different.

I think many people gravitate to computer science precisely because of the emphasis on newness and creating things, which can be refreshing compared to fields where the modal contribution is to analyze rather than invent. We aren’t chained to the past the way many other fields seem to be. It can also be easier to do research in such an environment, because there’s less worry about treading on ground that’s already been covered.

But there’s been pushback about requiring reviewers to explicitly factor novelty into their judgments about research importance or quality, like by including a separate ranking for “originality” in a review form like we do in some visualization venues. It does seem obvious that including statements like “We are first to …” in the introduction of our papers as if this entitles us to publication doesn’t really make the work better. In fact, often the statements are wrong, at least in some areas of CS research where there’s a myopic tendency to forget about all but the classic papers and what you saw get presented in the last couple years. And I always cringe a little when I see simplistic motivations in research papers like, no one has ever looked at this exact combination (of visualization, form of analysis, etc.) yet. As if we are absolved of having to consider the importance of a problem in the world when we decide what to work on.

The question would seem to be how being oriented toward appreciating certain kinds of novelty, like an ability to do something we couldn’t do before, affects the kinds of questions we ask, and how deep we go in any given direction over the longer term. Novelty can come from looking at old things in new ways, for example developing models or abstractions that relate previous approaches or results. But these examples don’t always evoke novelty in the same way that examples of striking out in brand new directions do, like asking about augmented reality, or multiple devices, or fairness, or accessibility, in an area where previously we didn’t think about those concerns much.

If a problem is suddenly realized to be important, and the general consensus is that ignoring it before was a major oversight, then it’s hard to argue we should not set out to study the new thing. But a challenge is that if we are always pursuing some new direction, we get islands of topics that are hard to relate to one another. It’s useful for building careers, I guess, to be able to relatively easily invent a new problem or topic, study it in a few papers, and then move on. And I think it’s easy to feel like progress is being made when you look around at all the new things being explored. There’s a temptation, I think, to assume that it will all “work itself out” if we explore all the shiny new things that catch our eye, because those that are actually important will in the end get the most attention.

But beyond not being able to easily relate topics to one another, a problem with expanding, at all times, in all directions at once, would seem to be that no particular endeavor is likely to be robust, because there’s always an excitement about moving to the next new thing rather than refining the old one. Maybe all the trendy new things distract from foundational problems, like a lack of theory to motivate advances in many areas, or sloppy use of statistics. The perception of originality and creativity certainly seems better at inspiring people than obsessing over being correct.

Barefoot in NY ends with a line about how, after having asked whether it was in “our best interest” to present this particular type of music, the narrator went ahead and did it, “and now, it’s even worse than before.” It’s not clear what’s worse than before, but it captures the sort of commitment to rapid exploration, even if we’re not yet sure how important the new things are, that causes this tension.

10 thoughts on “The more I thought about them, the less they seemed to be negative things, but appeared in the scenes as something completely new and productive”

  1. Jessica:

    Wonderful post, the sort that makes me wish we had only 50 posts per year instead of 500 so that people would spend more time chewing over each one . . .

    Here are a couple of thoughts inspired by your post:

    1. I feel like the requirement or expectation of “novelty” should be replaced by something like this:
    a. Explain how your idea or results are different from what came before.
    b. Explain why, if your idea or results are new, nobody went to the trouble of doing this already. (This is not a rhetorical question on my part. Possible reasonable answers here include: new data, new technology, an applied problem that could not be solved using existing approaches, porting an existing method from a different field, . . .)
    c. Give some discussion of why the new approach works better, and—especially important—where the new approach would not be expected to perform well.

    My point here is that, yeah, any publication should have some novelty or else simply be labeled as a reprint or as a routine application of an existing method. Novelty should not be considered as a goal or a hoop to jump through (e.g., “This paper does not have enough novelty to be published in this super-important journal.”). Rather, novelty, like correctness of results, is a minimal requirement, and what the paper should really do is answer questions a, b, and c above.

    2. I get what you’re saying about the amazingness of machine learning methods and the need for statisticians to integrate this into their mindset. You mention research on data visualization, which as you say seems to be chugging along at a steady 5 miles per hour even while machine learning is moving at warp speed. Something similar could be said about traditional Bayesian statistics such as what I do. Back in 2013-2014, my colleagues and I fit some cool models in Stan that people are still talking about: poststratification for the Xbox survey, custom nonlinear model for golf putting, and an item-response-style model for the World Cup. Now in 2022, yes, Stan has made lots of improvements and is much faster and more reliable, but we’re still fitting models of similar complexity, constructed by hand. Meanwhile, machine learning has moved at warp speed. I guess things are even worse for classical theoretical statistics, where concepts such as sufficiency, the strong law of large numbers, etc., are more pointless than ever (except for getting journal publications, academic jobs, etc.).

    That all said, the concepts of visualization, Bayesian inference, and classical statistics seem as relevant as they have ever been, given that researchers continue to visualize their results in misleading or uninformative ways, researchers continue to publish results that contradict prior knowledge and fail to integrate inferences from different sources, and researchers continue to try to estimate quantities that are not identified from their data and models. So, yeah, statistical ideas remain important. But all this still makes me a bit uncomfortable, to be the traditionalist making a “We are the Greece to your Rome” kind of claim in a kind of last-ditch attempt to remain relevant.

    3. You write, “I think many people gravitate to computer science precisely because of the emphasis on newness and creating things.” To me, a big appeal of computer science is more the “creating things” than the “newness.” Maybe I say this because I’m taking “newness” for granted. Let me put it this way: if I could easily build a Rube Goldberg-style machine—you know, one of those things with marbles and pulleys and gears that takes up a whole room, where on one end of the room you flip a switch and then a million fun things happen and you end up with a cup of coffee at the end—that would be really cool. Heck, an actual coffee-maker is cool. But to build such things requires lots of machinery, right here. With computers, I can build something cool (for example, I dunno, a simulation of a population of agents circulating in a social network and negotiating with each other), right away, using Python or whatever other language would serve the purpose. This sort of power seems cool: it’s building something.

    I guess that statistics also gives us power, but of a different sort: rather than power to build, it’s power to understand. Visualization, design, data analysis: all of these are about understanding the world. It’s because I know statistics that I was able to understand what went wrong with that claim that losing an election for governor was costing people 5-10 years of their lives. Part of this is that if you know statistics you will be less intimidated by claims that come attached to statistical analysis, but part of it was at the more technical level of being able to see what was missing from the analysis and understand how this related to the mistaken claims.

    • I like the idea of novelty as a minimum requirement, and especially the question: Explain why, if your idea or results are new, nobody went to the trouble of doing this already.

      And agree, Bayesian stats and uncertainty visualization are very relevant conceptually, though it can feel like we are simply standing on the sidelines waiting for things like deep learning to come along and give us new visualization problems. This is the annoying part in visualization (but perhaps not all bad if you like following the latest advances) – the dependence on other fields to provide the content, and, in cases like visualization for ML interpretability, to even tell us when the techniques are reliable versus not.

      I also agree, it’s building things that excites people about computer science, and being drawn to technology, which is sort of by definition most exciting when it changes. As I’ve gotten more senior though, I care less about creating things and more about deepening understanding, but it can feel like understanding is only tolerated to a point in some fields before you start to seem obsessed.

      That claim about losing an election shortening one’s life reminds me of a conversation I overheard recently about how some organization in Italy adjusts your life expectancy by several years based on your response to a multiple choice question about how frequently you drive.

      • I don’t get why you are reminded: about 1% of people who drive end up dead due to “accidents”. If you insist on driving a lot, that goes up. This is just folks getting things right.

        The causality in being bummed out over losing an election reducing life expectancy, on the other hand, is more subtle…

        • A couple years because you drive seemed like an overadjustment to me (car accidents kill a lot of people, but still). Though it’s possible that level of adjustment was for people who drive more than some number of miles or amount of time per day.

  2. I hope you’ll continue to post such thoughtful takes once your sabbatical ends. You’ve been an amazing contributor to this blog, and I just wanted to say thanks!

    I find it interesting that the mentality in industry with respect to novelty tends to be on the opposite end of the spectrum. We tend to think that someone has solved our issue before (probably 50+ years ago) and likely has done so in a superior way. In other words, there is a tendency to believe very little we do is new, and we should seek out the solutions of others before venturing down our own misguided road. This mentality seems to hold in the corners that I’ve worked in, at least with respect to technical work (though when it comes to PR the academic way tends to be favoured).

    While it has its drawbacks, I think the industry mentality is somewhat healthier than having to present to others (or worse yet being deluded into actually believing) that everything one does is novel in a non-trivial sense. Such a view can significantly hamper the progress of science by reducing the emphasis on replication, for instance. Similarly, there is tremendous value in the dissemination of important ideas / knowledge simply by the fact that they reach a broader audience, even if there is no novelty attached. This very blog serves such a purpose at times by introducing people (myself included) to the works of Meehl and company.

    I would be hard-pressed to demonstrate this, but it’s my feeling that we would do better if it was permissible for academics to go “I’m doing pretty average work cleaning up certain aspects of theories generated by others. Sometimes I generate my own, but I am rather unsure of their verisimilitude.” When everyone must present themselves as a genius doing industry-leading work with theories/ideas beyond reproach, too many resources are devoted to the lie rather than the substance of what actually gets done (because of course not everyone can be exceptional).

    You attribute this phenomenon to the excitement of it all, which is no doubt true to a large degree (who doesn’t want to be the first… who was the second person on the moon again?), but I also think at least some of it is institutionalized in the way universities are funded. You just are not likely to get the funds by writing on the grant “this is pretty routine stuff but will add another data point for a theory we are not 100% certain about, so I feel it’s worth doing”. Though I may be wrong in my impression that this has a significant impact, because the funding aspect is not all that dissimilar from industry (read: $$ from clients), and yet the mentality is still rather different. As a fun example, in one of his lectures Dr. Joseph Lstiburek talks about how in the vast majority of window water penetration tests one can determine all the relevant info from spraying a garden hose for a few minutes. But instead we use calibrated spray racks with fancy vacuum pressure chambers and yadda yadda conforming to specific ASTM standards… the reason he gives is that “if you don’t put on the show you don’t get the dough”. And he’s not wrong.

    • Oh man, that takes me back to 2007 or so, watching that calibrated window rack spray show at field test sites… If a single drip of water appeared at the seal after 9 minutes and 59 seconds it failed; if it lasted 10 minutes… well, nothing wrong with that… (or whatever… maybe it’s 15 or 20 mins, I don’t remember). I’m sure there’s good to come out of testing some windows, but you’re unlikely to convince me that, in the Palm Springs desert, a volume of water sprayed onto a window that exceeds the volume in some of the heaviest rainforests in the world causing a single drip after 10 mins is actually a fact to be concerned about. There were “technical failures” like that all the time, but then there were also cases like the 3rd story deck in an apartment complex in San Jose being held up by an 8×8 wood column that I could easily push an awl through with two fingers. The architect had specified that the gutters should drain onto the 3rd story walkway deck instead of being carried down pipes to the ground, and leaks into the column stucco casing were held there indefinitely by the waterproofing preventing any sort of air exchange. The column had been wet continuously for ~10 years since the first time it rained.

      The difference between technical failures and “holy shit red tag this building today and move everyone out by the end of the week after we install temporary supports” is major. Distracting attention from the latter is not good for the world. But, if you don’t put on the show you don’t get the dough…

    • Thanks, glad you appreciate the posts!

      When it comes to academics admitting where they are cleaning up or using aspects of prior work, we are technically supposed to be good at stating contributions clearly. So if you made some system but chunks of it draw on prior techniques, then you should make clear in the abstract or introduction of your paper which techniques or abstractions you had to come up with. But what is a contribution versus not can be interpreted differently by people, and there’s still the incentive to make it seem like you contributed more than you did. So, I agree that part of this ‘novelty industry’ comes from the incentive structure.

  3. I appreciate this post a lot. It reminds me of a comment Brad Efron (I think) made in a talk several years ago warning statisticians about the danger of ignoring practical usefulness to the point of irrelevance, as mathematicians had done over the previous several decades. (Full disclosure: I’m a recovering mathematician myself, having long ago fled the field for more or less this reason.)

    What I think is missing from a lot of valuations of research is the importance of contributing to long-term knowledge digestion. By this, I mean the process by which some cutting-edge idea or tool becomes ever-more refined over the years, to the point where decent facility with it eventually becomes accessible to self-teachers, undergraduates, or maybe even high schoolers. A lot of hugely valuable work gets done in this area. I’d count BDA3, Regression and Other Stories, and a lot of graduate and undergraduate textbooks and research monographs as providing this kind of value (especially those that work hard to assimilate knowledge with clear themes and strongly held views on what’s important — straight-up expositions can have value but I don’t include them in this category). I’d also include the Stan project itself as a great example, along with Wickham’s tidyverse stuff, Wilkinson’s gg stuff, to name just a few that spring to mind.

    My point is that for fairly mature fields, work that helps digest existing knowledge and expand accessibility is often much more valuable than work that throws another knick-knack on the pile of stuff humans have looked at to date.

    • Agreed. This kind of work can seem much less novel and less prestigious to researchers but have way more impact. One has to convince oneself that the challenge of figuring out how to express the concepts elegantly and effectively is an interesting problem in its own right. I’m very appreciative of the Andrews, the McElreaths, and others who invest the time to do this!

  4. > For anyone coming from a more classical stats background, it can seem easy to want to dismiss throwing huge amounts of unlabeled data at too-massive-and-ensembled-to-analyze models as a serious endeavor… how does one hand off a model for deployment if they can’t explain what it’s doing? How do we ensure it’s not learning spurious cues, or generating mostly racist or sexist garbage?

    This is tangential to the main theme (new vs. old), but this bugs me. I just don’t think it’s good to frame these problems in a way that invites anyone (classical statisticians included) to throw up their hands by default. Andrew has a more detailed post on the frequentist thing today that I think is related.

    Handing off a model for deployment — how do you do it for an unexplainable model? My expectation is you’d do it the same as for an explainable model. The model solves some problem; it’s part of a bigger system; you need to test that end to end and evaluate it (more models, and a lot of assumptions, and a lot of specifics).
