Here’s some academic advice for you: Never put your name on a paper you haven’t read.

Success has many fathers, but failure is an orphan.

Jonathan Falk points to this news article by Tom Bartlett which has this hilarious bit:

What at first had appeared to be a landmark study . . . seemed more like an embarrassment . . .

[The second-to-last author of the paper,] Armando Solar-Lezama, a professor in the electrical-engineering and computer-science department at MIT and associate director of the university’s computer-science and artificial-intelligence laboratory, says he didn’t realize that the paper was going to be posted as a preprint. . . .

The driving force behind the paper, according to Solar-Lezama and other co-authors, was Iddo Drori, [the last author of the paper and] an associate professor of the practice of computer science at Boston University. . . . The two usually met once a week or so. . . .

Solar-Lezama says he was unaware of the sentence in the abstract that claimed ChatGPT could master MIT’s courses. “There was sloppy methodology that went into making a wild research claim,” he says. While he says he never signed off on the paper being posted, Drori insisted when they later spoke about the situation that Solar-Lezama had, in fact, signed off. . . .

Solar-Lezama and two other MIT professors who were co-authors on the paper put out a statement insisting that they hadn’t approved the paper’s posting . . . Drori didn’t agree to an interview for this story, but he did email a 500-word statement providing a timeline of how and when he says the paper was prepared and posted online. In that statement, Drori writes that “we all took active part in preparing and editing the paper” . . . The revised version doesn’t appear to be available online and the original version has been withdrawn. . . .

This reminds me of a piece of advice that someone once gave me: Never put your name on a paper you haven’t read.

9 thoughts on “Here’s some academic advice for you: Never put your name on a paper you haven’t read.”

  1. “Never put your name on a paper you haven’t read.”

    Going a step further, never trust that just because it seems self-evident to you that a major flaw that you bring to the lead author’s attention requires revising before the paper is published, it will actually get fixed before the camera ready version of the paper is submitted. I learned this the hard way recently.

  2. I presume that many of us have been noodling around with LLM chatbots for the last year or so. Based upon my experience, it seems inconceivable that an LLM chatbot is capable of passing any kind of technical class at MIT. I can’t really get them to do anything substantial.

    I have gotten some help from them. On two occasions, the Google search function failed miserably on easy questions. One was “where is the Superbowl going to be played?” The first three hits were info boxes about the game, none of which gave the location of the game. But Bing Copilot recognized the question – Google clearly didn’t – and answered it correctly.

    The other question was about the Microsoft clipboard. A recent software update eliminated the pop-up dialog box that appears after a screen grab, with the image going straight to the clipboard. That’s fine, except I had no idea how to find and open the clipboard. This is the exact sort of thing that “help” functions choke on, and so did Google. Every answer on the first page told me how to open the clipboard control box in settings, with no information on how to open the clipboard itself. Copilot immediately recognized the exact same question and told me how to open it in the first sentence.

    So AI LLM chatbots are a step forward in terms of understanding the question, at least. But I would not call these responses “generative.”

    I also tried something more challenging but got nothing. Baseball “prospects” are rated as to their Future Value (FV) on a 20-80 scale. I’ve been looking at these for years as part of fantasy baseball, but I never really understood some basic things about them, such as whether, when, and how the ratings change during the time a player is in the minors. This is the kind of question to which you can’t just go find the answer because no one has anticipated the question. This despite the fact that there are pages and pages of explainers for prospect FV ratings on the net. Here the chatbot had nothing for me. It began its response by defining some of the terms I had just fed it, then it wished me good luck in fantasy baseball season. The only satisfactory way to answer would have been truly generative – piecing together information from multiple sources and combining it all in one (simple) logic structure – and it couldn’t do it.

    Other than writing computer programs, which does seem genuinely generative, I’d be curious if anyone has had better experiences in similar situations.

    • Matt:

      Bob Carpenter reports that a chatbot was really helpful for him in writing scenarios for role-playing games. This seems like something that’s in the zone for chatbots: there’s lots of material online, the task is generative, and you want something that’s unique but kinda different from what came before. You can see how producers of pop-culture products (movies, TV shows, literature, music, etc.) would be excited about chatbots for the same reason.

      • Interesting plots! I have also read parts of the study. The design of the study seems to me to be well done (compared to what is possible in a social science RCT). They did some statistical analyses that I am not familiar with, so I explicitly exclude them from my review.
        I disagree with some of the strong language about the generalisability of the study in the results section. A group of university students recruited in an RCT does not generalise to the whole population.
        Something that was not discussed in the paper is the effect of the incentive structure on the outcome. The implicit assumption that a lump sum (flat) payment would not affect the results is far too optimistic. There is a quality-time trade-off, and the incentive structure will tilt the choice between quality and less time spent towards less time spent.
        Although I may sound quite negative, the results seem plausible to me (compared to my prior beliefs) and are as supported by the data as one might reasonably expect. (Reasonable expectations in this context being that no single study can prove a hypothesis beyond reasonable doubt).

    • Twos:

      Typo fixed; thanks.

      As to how it happened: I write these posts by typing into the text box, then I go back and do some editing. Sometimes the editing introduces an error. Perhaps I first wrote, “I’m reminded of” and then I decided that “This reminds me” worked better, but when making the edit I didn’t change all the words.

      Writing-and-then-editing introduces an entirely different sort of typo that would never have occurred in the old days of typing straight from the typewriter onto the page.

  3. > claimed ChatGPT (LLM) could master MIT’s courses

    The target is a moving goalpost: once the assertion is made, future MIT exams will exploit the weaknesses of LLMs. Current LLM weaknesses include arithmetic, complex logic, comprehending charts, and recent unusual discoveries not in the training datasets. It will be interesting to see the MIT EECS UNDERGRADUATES’ performance on those exams. It also pits the finite-size MIT faculty against ALL THE GLOBAL EXPERTS who will be developing LLM training datasets and speculating on the potential directions of the MIT faculty. Also, LLM performance will improve cumulatively over time, whereas MIT will have new batches of undergraduates of about the same quality every year.

    https://twitter.com/dux_ie/status/1759745462784147767

  4. I had a colleague who faced an accusation of data fabrication. Ultimately, the study at issue was found to have been certainly fraudulent (someone, probably the first author who was not my colleague, faked data) as there were clearly fake data points added to manipulate significance tests, but the responsibility was deemed not clearly my colleague’s. This colleague faced some sanction for the negligence that allowed it to happen. As I understand it, some other studies co-authored by my colleague faced further scrutiny and at least one was retracted for implausible data/findings. Despite the common denominator, I still believe the colleague was probably not knowingly involved in fraud.

    What the colleague (probably) was guilty of is what Andrew discusses here: putting one’s name on a paper one hasn’t read, and likely having had too little involvement with it prior to the final draft as well. The colleague was quite prolific and tended to have large teams of co-authors. I think those situations are, somewhat paradoxically, at higher risk of fraud. A large team means things will be delegated, and of course more people means a higher chance of a dishonest person being involved. And if you’re publishing a new paper once a week or something like that, it’s just unlikely that you’re sufficiently involved in all the necessary steps to know with certainty how things were done.

    And as we see here, the risk is not just that someone on your team does something fraudulent. They might just make some claims or other decisions that are so scientifically wrong that you want nothing to do with them!
