When is detecting AI-generated text worthwhile?

Posted on June 6, 2026 4:04 PM by Jessica Hullman

This is Jessica. AI-text detectors are coming to play a bigger role in adjudicating which texts are worthy of our attention. There was the surprising case of an apparently AI-generated short story winning the Commonwealth Foundation Short Story Prize: Pangram, the leading detector, scores the story as 100% AI-generated. Pangram’s false positive rate is reported as roughly 1 in 10,000 in its own audits and near zero on medium-to-long passages in an external audit. Applying Pangram to the other 4 stories that won awards this year suggests two others were heavily AI-assisted. More recently, the NeurIPS Position Paper track announced that it was desk rejecting the 18% of submitted papers that Pangram detected as fully AI-generated. The authors of another 13% are being contacted to investigate AI use. In this case the Call for Papers made clear that submissions should be “substantially written by human authors,” so this should not have come as a surprise.

We’re having to reconsider what authorship means. Can a person create literature or express their position on a subject without writing a single sentence themselves? When do we really care who strung the words together?

Some people think detection is a waste of our collective time because we will never reach an equilibrium. AI-generated text will keep shifting toward what passes the detector. Human writers will continually update their beliefs about what features are indicative of AI writing, but will also be influenced to write more like AI by reading so much AI text. There’s no stable target, just an endless cat-and-mouse game that incentivizes being savvy enough at any given time to avoid getting flagged. Meanwhile people are being morally scorned and suffering reputational damage for being caught on the wrong side of things. This may disproportionately affect some writers (like non-native English speakers) who are finally seeing the playing field leveled a bit.

On the other hand, there are situations where it really is important to know who strung the words together. Education is the most obvious one. It’s just very hard to teach someone to think if they’re not writing down their ideas themselves.

The problem is that outside of select scenarios like teaching, what we really tend to care about is who controlled the ideas, and this is not equivalent to who strung the words together. Some would argue that the latter is becoming increasingly irrelevant given that AI can write more fluently than many people and many people prefer AI-generated text.

Of course the reason we’re seeing detection used to filter paper submissions is that the ideal process–where the content of each paper is carefully considered on its own merits–is increasingly untenable given the huge surge in submissions in some fields. It’s easy to pump out credible-seeming papers with minimal human oversight using AI, and enough people are doing this to create serious problems.

Mostly my response is that if we are going to debate the value of detection we should be willing to make our assumptions explicit. So let’s walk through a toy model to think about what we’re really conjecturing about.

One way to think of the latent state that we actually care about in paper review is the author type. Let’s say type A authors come up with their ideas and do a lot of the writing themselves. Type B authors rely on AI to do much of the thinking for them, and also use AI to do much of the writing. Type C authors come up with their own ideas, but engage in extensive prompting to get AI to write everything they want to say for them.*

For each paper, we choose to either pass or reject, conditional on the output of a Pangram check. Let’s say we only care about whether it flags 100% AI generated or not, so the signal s is binary, where s=1 means AI detected.

Based on available Pangram audits, if a text is actually written heavily by AI there is a very high chance it flags as AI-generated: beta=P(s=1|AI written) with beta very close to 1. If a text is not written by AI, there is a very small chance it flags as AI-generated: alpha=P(s=1|human written). Pangram’s internal audits put alpha around 10^−4 but other audits find essentially zero false positives for medium-to-long passages.

So P(s=1|A)=alpha, and if we assume Types B and C use AI to a similar extent for the writing, then beta=P(s=1|B)=P(s=1|C). The posterior probability that a flagged paper is from a Type B author is then:

P(B|s=1) = (beta × p_B)/(alpha × p_A + beta × p_B + beta × p_C), and since alpha is tiny and beta is close to 1, P(B|s=1) ≈ p_B/(p_B + p_C)
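
To check how close the approximation is, here is a minimal Python sketch of the posterior. The function name `posterior_type_b` is made up for illustration, beta=0.99 is an assumed stand-in for "very close to 1", alpha=1e-4 follows the internal-audit figure quoted above, and the type shares are the hypothetical 40/40/20 split used as a starting point below.

```python
def posterior_type_b(p_a, p_b, p_c, alpha=1e-4, beta=0.99):
    """Return P(B | s=1) by Bayes' rule for the three-type toy model.

    alpha = P(s=1 | human written), beta = P(s=1 | AI written).
    Types B and C are assumed to share the same beta.
    """
    # Total probability that a paper is flagged, P(s=1).
    p_flag = alpha * p_a + beta * p_b + beta * p_c
    return beta * p_b / p_flag


# Hypothetical author-type shares (assumption, not data).
p_a, p_b, p_c = 0.4, 0.4, 0.2

exact = posterior_type_b(p_a, p_b, p_c)
approx = p_b / (p_b + p_c)  # the alpha -> 0 approximation

print(f"exact  P(B|s=1)      = {exact:.5f}")
print(f"approx p_B/(p_B+p_C) = {approx:.5f}")
```

With alpha this small, the exact and approximate values differ only slightly, so the simplified ratio is a reasonable working formula under these assumptions.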

The relevant considerations become what we think the author population looks like, and how costly we think a false positive versus a false negative are.

As a starting point, let’s say that for our conference submissions this year, Type C is the rarest, at 20%, and Type A and Type B equally split the remaining mass at 40% each. Let’s also say that the cost of rejecting an acceptable paper, c_FP, is twice the cost of passing an unacceptable one, c_FN.

The optimal decision rule is to reject if c_FN * P(B|s=1)>c_FP * P(A or C|s=1), or equivalently P(B|s=1)>c_FP/(c_FN+c_FP)

With c_FP=2 and c_FN=1, this means we reject if P(B|s=1) > 2/3.
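
The rule can be sketched in a few lines of Python by comparing the expected cost of passing a flagged paper with the expected cost of rejecting it. The function names `reject_threshold` and `should_reject` are hypothetical, and the posterior values passed in at the end are arbitrary examples chosen to sit on either side of the threshold.

```python
def reject_threshold(c_fp, c_fn):
    """Posterior P(B|s=1) above which rejecting has lower expected cost."""
    return c_fp / (c_fp + c_fn)


def should_reject(p_b_given_flag, c_fp, c_fn):
    """Compare the expected costs of the two actions for a flagged paper."""
    # Passing is a mistake only if the author is Type B.
    expected_cost_pass = c_fn * p_b_given_flag
    # Rejecting is a mistake only if the author is Type A or C.
    expected_cost_reject = c_fp * (1.0 - p_b_given_flag)
    return expected_cost_pass > expected_cost_reject


threshold = reject_threshold(c_fp=2, c_fn=1)
print(f"reject when P(B|s=1) > {threshold:.3f}")
print(should_reject(0.70, c_fp=2, c_fn=1))  # example posterior above the threshold
print(should_reject(0.60, c_fp=2, c_fn=1))  # example posterior below the threshold
```

Writing the rule as a cost comparison rather than a bare threshold makes it easy to see that the two forms in the text are the same condition.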

Under the prevalence assumptions above, P(B|s=1) is approximately 2/3, so we are right on the boundary. From the standpoint of making the right decisions for this particular conference cycle, desk rejecting on the flag is not obviously bad. But if Type C is a little more common, e.g., we shift a little mass from p_A to p_C to make p_C 0.25, then P(B|s=1) is about 0.62, which falls below the 2/3 threshold, so we shouldn’t desk reject based only on the flag. Similarly, if we were to decide that falsely rejecting an acceptable paper is three times as bad as passing an unacceptable one, the threshold rises to 3/4 and we shouldn’t rely on the flag alone.
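
These three scenarios can be reproduced with a short Python sketch. It uses the approximation P(B|s=1) ≈ p_B/(p_B + p_C) from above (alpha near zero, same beta for Types B and C); the function name `flag_margin` is hypothetical, and the shares and cost ratios are the made-up numbers from this post, not estimates.

```python
def flag_margin(p_b, p_c, c_fp, c_fn=1.0):
    """Approximate posterior P(B|s=1) minus the rejection threshold.

    A positive margin favors desk rejecting on the flag alone;
    a negative margin means the flag alone is not enough.
    """
    posterior = p_b / (p_b + p_c)
    threshold = c_fp / (c_fp + c_fn)
    return posterior - threshold


scenarios = {
    "baseline (p_C=0.20, c_FP=2)": flag_margin(0.40, 0.20, c_fp=2),
    "more Type C (p_C=0.25, c_FP=2)": flag_margin(0.40, 0.25, c_fp=2),
    "costlier false reject (p_C=0.20, c_FP=3)": flag_margin(0.40, 0.20, c_fp=3),
}

for name, margin in scenarios.items():
    print(f"{name}: margin = {margin:+.3f}")
```

The baseline margin is essentially zero, which is why small changes to either the assumed share of Type C authors or the cost ratio are enough to tip the decision.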

This model is obviously very simple. But it shows us what kinds of things we have to make assumptions about in the most basic case. Obviously I don’t really know how many people are using AI blindly to write papers, nor how many people are relying heavily on AI to write up their own ideas. You should take my numbers with a grain of salt. Personally I can’t imagine how relying on AI to do all the writing when I came up with the ideas would ever feel efficient, because I tend to have strong opinions on how things are said. But I can accept I am probably more of a control freak than many others. And AI overreliance is easy to slip into. Maybe papers chairs from recent ML conferences (or arXiv moderators) have estimates on bad-actor rates based on what they are seeing.

What this exercise can’t tell us is how scientific progress is impacted by the warping of incentives that can happen when we use AI-detection as a filter. Classic principal-agent problems suggest that when we care about something hard to observe, like scientific quality or long-term epistemic value, but must rely on observable proxy signals to judge authors’ outputs, we should expect authors to shift more effort toward improving exclusively on those proxies. Avoiding em-dashes and ‘not this, but this’ constructions and whatever else currently ups the posterior probability of AI-generation is orthogonal to the actual thinking that research requires. What if relying more heavily on AI to write up our ideas is a good idea for science in the long run, in terms of more clearly communicating the ideas or saving a lot of time, so that we can get more good ideas out in the same amount of time? Then too much emphasis on detection might slow us down. However, I’m doubtful we are currently anywhere near a state of the world where discouraging writing with AI is as costly for scientific progress as spending time reviewing and reading many more questionable AI-generated papers is. The bigger threat at the moment is the slop overwhelming our ability to find the good stuff.

*We could also posit Type D authors that get AI to generate the ideas, but then write the papers themselves to evade detection, or are extremely good at getting AI-written text to evade detection. But this seems much less likely so I’m ignoring it.

38 thoughts on “When is detecting AI-generated text worthwhile?”

Total on June 6, 2026 4:23 PM at 4:23 pm said:

“ Meanwhile people are being morally scorned and suffering reputational damage for being caught on the wrong side of things. This may disproportionately affect some writers (like non-native english speakers) who are finally seeing the playing field leveled a bit. ”

Excuse me? That’s a mealy mouthed way of endorsing cheating and lying.

- Jessica Hullman on June 6, 2026 4:40 PM at 4:40 pm said:
  
  That paragraph is summarizing anti-detection views I’m seeing (note the topic sentence), not necessarily what I think we should be most worried about. Some people feel strongly that AI-detection is just the latest form of academic moral panic.
  
Joshua on June 6, 2026 4:46 PM at 4:46 pm said:

Maybe a bit tangential, or perhaps not…

I’ve worked with students from other countries who don’t really quite get the American focus on plagiarism. What does it matter, exactly, who authored some thoughts in a paper? Isn’t the truly important thing whether or not the student has identified the correct answers or critical concepts?

Of course the issues you’re addressing are more complicated than that, but there is something important there that overlaps with your post, I think: Who owns ideas anyway? If some traditions treat writing as a craft separate from idea generation, or see (ownership of) idea generation as less important, then the distinction between Type A and Type C authors may itself be somewhat culturally specific.

Is the American (or academic) concept of owning ideas substantially a cultural artifact, perhaps an artifact of Western, or capitalist values?

- Hannes on June 6, 2026 5:54 PM at 5:54 pm said:
  
  I don’t think plagiarism is about who came up with an idea, it’s more narrowly about stealing text verbatim or closely paraphrasing.
  
  - Joshua on June 6, 2026 9:33 PM at 9:33 pm said:
    
    Hannes –
    
    Fair enough. It is usually considered in the context of someone intentionally passing off someone else’s work as their own.
    
    But sometimes it’s actually more like a mistake made out of a different set of values, where ownership of ideas or phrasing is not considered important.
    
    Outside the context of education, where we place a specific value on students working out ideas in their own heads (not something that is a universal higher-order value, that’s also something that can be culturally-dependent), and where we believe writing is integral to working out ideas, and where we use writing to check on a student’s mastery, why do we care about who wrote something?
    
    - David, a Bostonian in Tokyo on June 7, 2026 9:11 AM at 9:11 am said:
      
      In education, in particular creative writing, handing in an AI generated paper should be considered plagiarism. No one would, or could, argue against this.
      
      In other cases, should it be considered plagiarism? I’d say 100% yes. You put your name on a work “someone else” wrote. IMHO this whole issue is as friggin as simple as it gets.
      
      Personally, I really hate the impersonal, schmarmy, overly-confident AI style. So I think people shouldn’t be forced to read that slish. Hooray for AI slop detectors. They make people’s lives better.
      
      How to handle a paper that an AI slop detector tells you is slop? Return it to the author, and tell them to rewrite so that it reflects their personal experiences in doing the work they did. Don’t publicise that the idiot handed in AI slop, just tell them to rewrite in a manner that doesn’t set off the slop detector.
      
      Now what happens when people learn how to rewrite AI slop so it’s personal, heart-felt, and serious? This is good news. They’ll learn to write like a human being. And soon won’t need the slop to start with.
- AAAnonymous on June 7, 2026 3:29 AM at 3:29 am said:
  
  As a result of the blog post and this specific comment, I would like to add the following thoughts. I expressed something related to this all, at least in my view and interpretation, in a discussion about the bias-bias where I noted I used that term before someone officially used it in a paper (or whatever description is more appropriate):
  
  https://statmodeling.stat.columbia.edu/2019/07/14/gigerenzer-the-bias-bias-in-behavioral-economics-including-discussion-of-political-implications/#comment-1081670
  
  I noted the following in the discussion on that previous blog post which I still think is extremely important to note, and possibly ponder for some, in view of intellectual ownership or plagiarism or whatever term is appropriate:
  
  “I personally don’t care about “credit” because i don’t want a “career” in science/academia anymore, but i DO think it’s very important to give credit to the correct people in science/academia in general. I think this might also be very important in science/academia because i reason the chances are higher that that person will also have other good ideas/thoughts/etc. In my reasoning, and view of science, this could all be an important part of making sure the “best” and “brightest” people are working in science/academia.”
  
  - Jukka on June 7, 2026 5:48 AM at 5:48 am said:
    
    If you replace mathematics with something else, I suppose many things would still apply with slight alterations:
    
    https://leidendeclaration.ai/
    
    - AAAnonymous on June 7, 2026 6:56 AM at 6:56 am said:
      
      I have previously mentioned on this blog that I am pondering writing a declaration about groups of scientists writing declarations. I don’t like groups of people writing stuff and shoving it in my face and making it seem like they have something useful or important to say.
      
      I am more a fan of scientists just writing stuff, and posting it somewhere, and letting others find it and deciding whether it’s worth reading or further pondering. That’s also why I don’t intend to actually write a declaration about groups of scientists writing declaration, and possibly gathering all of my scientific buddies to join my efforts. I merely use the idea to make something clear.
    - Jukka on June 7, 2026 7:16 AM at 7:16 am said:
      
      “That’s also why I don’t intend to actually write a declaration about groups of scientists writing declaration, and possibly gathering all of my scientific buddies to join my efforts. I merely use the idea to make something clear.”
      
      I agree with you, but, nevertheless, did you actually read the declaration? The issue is much bigger because with LLMs we are losing not only attribution but also traceability.
    - AAAnonymous on June 7, 2026 7:44 AM at 7:44 am said:
      
      Quote from above: “I agree with you, but, nevertheless, did you actually read the declaration?”
      
      I quickly browsed through it because you linked to it. I did not see how it related to my comment, but most importantly I don’t know anything about A.I., I don’t want to learn more about A.I. at this moment, and I dislike groups of scientists declaring stuff so I stopped reading pretty quickly.
      
      Your link may be useful and interesting for other readers though!
Dale Lehman on June 6, 2026 5:56 PM at 5:56 pm said:

In reality, the costs of false negatives and false positives are not constant across papers (my opinion). The impact of papers depends on both the content and the political climate in which they reside. So, both the dangers and benefits of AI authored papers are likely to vary across papers. The optimal decision (to go back to your prior post) should depend on these as well as the probabilities you use in the post (you could, of course, say that it is only detection of AI authored papers that we care about, but that is an assumption that is open to critique). I view this as a typical case where decision analysis is both useful and complex enough to merit skepticism of any rule derived from it.

Tim F on June 6, 2026 6:09 PM at 6:09 pm said:

Maybe somewhat off topic, but Pangram and other similar services have created something that looks like a protection racket: as long as their false positive rate is perceived to be sufficiently far from 0 (1 in 10,000 is large if your career is at stake!) authors are incentivized to pay for services to ensure their work won’t be flagged as AI by someone else using the same service.

Pangram has a very strange set of incentives – they might actually be incentivized to keep the false positive rate above 0!

Michael Murphy on June 6, 2026 6:12 PM at 6:12 pm said:

So it is probably worth noting that the author behind the Granta story has unequivocally denied using AI. As have the other Granta contest entrants accused of using AI. The story here is that a bunch of media hits were driven by the outputs of AI software that usually fails to tell AI generated from human generated text, and is widely known to be unreliable. It is very much like accusing people of being a witch based on the output of a Magic 8-ball. And it might border on being defamatory.

As for the Nazir story, I have read it and it doesn’t do much for me. There are tics in it that one associates with LLMs, true, but LLMs pick those up from humans and, as someone who has occasionally read Granta and considered submitting to it, the Nazir story really really looks like the kind of story that would fit right in at Granta. Very prose poem-y. Lots of local vernacular. Etc. So I am willing to pass it off as not my thing, but if it’s your thing you might like it.

- JCP on June 6, 2026 7:09 PM at 7:09 pm said:
  
  > The story here is that a bunch of media hits were driven by the outputs of AI software that usually fails to tell AI generated from human generated text, and is widely known to be unreliable.
  
  This seems false, looking at the external Pangram audit. It actually seems very reliable! I’d be interested in evidence to the contrary, especially verifiable false positive hits, if you can find them.
  
Robin Blythe on June 6, 2026 6:16 PM at 6:16 pm said:

My view generally is that AI is exposing a lot of what separates value in both human and computer-generated output. In this case, I believe that the quality of the result is what matters, both in terms of accuracy and rhetorical quality. If a paper is well-written, convincing, and makes a meaningful contribution, I don’t really mind who wrote it. Part of that is in its ability to convince the reader that it is well-written and makes a meaningful contribution to knowledge in its own way. In this respect, what I think the situation has exposed is that NeurIPS (and in fact most journals/conferences generally) receives too many submissions and the problem is the current system of receiving, judging, and accepting tens of thousands of submissions. It was already getting bad before AI, which just accelerated it. I would prefer if we stopped making products by and for AI to read, and found another way to determine whether something was worth considering as a valid and meaningful scientific contribution.

- Andrew on June 6, 2026 6:57 PM at 6:57 pm said:
  
  Robin:
  
  Just as an example: If someone were to copy a chapter from Bayesian Data Analysis, or run it through a computer program but without really changing the content, this could be an accurate, readable, and convincing document. There should be lots of journals that would be interested in publishing it–I say this because already many journals publish review articles on statistical methods.
  
  But it would be even better if this hypothetical review article were replaced by a reference to the already-written chapter. The same principle holds for Christian Hesse’s book of ripped-off chess stories, Claudine Gay’s plagiarized Ph.D. thesis, etc. The material’s already out there; what the plagiarist (or the chatbot) does is to hide the source. The plagiarist does this to grab credit; the chatbot does it just as part of the nature of the process, that it takes and combines material from many different sources.
  
  - Robin Blythe on June 6, 2026 11:28 PM at 11:28 pm said:
    
    Andrew:
    I’m not sure I agree that LLMs are incapable of generating novel ideas. However, my point was more that I think plagiarism happens anyway, and I think the issue lies further upstream, that science is essentially a mess right now and AI has just exposed the fault lines.
    
    - David, a Bostonian in Tokyo on June 7, 2026 9:21 AM at 9:21 am said:
      
      LLMs, as random text generators, are certainly “capable of _generating_ novel ideas”.
      
      They are not capable of _recognizing_ that they’ve generated a novel idea. That’s the user’s job.
      
      I suppose they’re fun if you can get them to spit out a bunch of potentially random ideas and hope something in the random slop is interesting. Seems a waste of time to me, but that’s always been the game: get the user to do the work of understanding the random meaningless* text they generate.
      
      *: Meaningless in the sense that there was no “meaning” used to generate the text. Handling “meaning” in computers is still an open problem.
    - Dale Lehman on June 7, 2026 9:31 AM at 9:31 am said:
      
      David
      While I understand (and largely agree with) your characterization of how LLMs work, I do not agree with your conclusion. You call it a waste of time, but I have numerous cases where LLMs provided meaningfully different ideas about how to approach modeling problems I have presented them with. These are largely quantitative risk analysis problems where I have my own approaches to modeling and the LLM has presented meaningful, and often better, alternatives. Now it is entirely possible that this only shows how poor my modeling ability is. But I have more confidence than that – I can admit that there exist people for whom these LLM alternatives might be meaningless due to their superior ability. But I’ve seen genuinely improved ideas generated by LLMs.
      
      I should add: not always, and not without errors. So, while I do not propose these LLMs as a replacement for me, I think you are selling them short – by a large margin.
    - David, a Bostonian in Tokyo on June 7, 2026 12:54 PM at 12:54 pm said:
      
      Replying to Dale here.
      
      Actually, we’re saying the same thing: they spit out random slop that might make you think.
      
      For me, it’s a waste of time, for others, not.
      
      But I’d bet that two thirds of those “interesting ideas” it spat out are things you already knew/had thought of but hadn’t gotten around to chasing down.
      
      Question? Are you paying something like US$35 a month? If so, would you still be interested at the actual US$750 or so of computer time you are actually using???
      
      (Or would your employer be willing to pay that???)
      
      Sorry for the subject shift. A friend showed me a model running on a friggin’ iPhone that was actually pretty good. Almost the next day, a music YouTube video (Rick Beato) discussed models running on peecees and Macs being good enough. I’m wondering what the limitations of these are going to be, and if specially trained mini-LLMs will be able to do what specialists in limited areas need.
      
      This is sort of hilarious because one of the problems of LLMs as a commercial thing is that there’s no moat: everyone knows how to build an LLM. So the heads of LLM companies have to say stupid things to keep people from realizing that…
    - Daniel Lakeland on June 7, 2026 2:34 PM at 2:34 pm said:
      
      Dale,
      
      The entire game with LLMs is:
      
      generate meaningless slop -> have a human filter and add meaning to the output
      
      meaningless slop here is a technical term, the generated text is plausibly like previously seen text but the meaning of the words does not enter into the probability calculations. Just to give an example “when in doubt lead” could be an exhortation to the hobby manufacturer of leaded glass windows, or to a group leader of an expedition gone wrong, the token “lead” is transformed to a number, maybe 1281, the sentence “when in doubt lead” might be transformed to the sequence of numbers [7318, 64, 5509, 1281] and the job of the LLM is to predict the next token. It has no idea what 1281 “means” in the sense of reference to either an Element with atomic number 82, or a process involving showing a group a way forward… At this point in the text it could easily say
      
      “when in doubt lead is a good thing to add to the joints that seem less sturdy” or “when in doubt lead your group to the nearest water source and try to stay there while organizing a signal fire for any passing airplanes to see”
      
      The problem is, LLMs get used for technical things requiring knowledge in order to know whether the resulting text/code/images etc are meaningful. In the absence of deep knowledge of the subject matter, it’s easy to go down a meaningless rabbit hole. Yet, anyone can use them to do anything at all, including people who don’t know what they don’t know. For example, suppose an expert in leaded glass windows would tell you that merely adding more lead causes problems that often result in cracked panes, while an expert in wilderness survival might say that finding a water source is a good idea, but signal fires usually result in more trouble than they’re worth and you’d be better off arranging brush into a pattern that says “SOS” or something.
      
      So they’re a dangerous thing to rely on. This isn’t a new problem, like reading The Anarchist Cookbook and trying to understand chemistry from it. It was written by a 19 year old fool with emotional problems, he literally didn’t know anything much about anarchism or chemistry. It isn’t completely wrong maybe, but it isn’t the output of an expert. The output of an expert is the US Army’s improvised munitions handbook. https://archive.org/details/tm-21-210-improvised-munitions-handbook-1969-department-of-the-army . Relying on crap information could get yourself blown up.
      
      The point being, this isn’t a new phenomenon, but LLMs make it an automated process at scale. It’s a gish gallop on every topic of human knowledge. The plausible nature of the meaning-free output makes it usable almost exclusively by subject matter experts for their limited area of subject matter expertise… But that’s NOT what they’re used for in reality. That’s particularly true when they’re used to write entire websites exclusively to collect ad revenue from passing people’s clicks as they try to search the web for authoritative information.
    - Anoneuoid on June 7, 2026 4:02 PM at 4:02 pm said:
      
      @Daniel
      
      It’s like if you ask “Did Einstein say the speed of light is constant?” The response will be essentially “yes”. Then follow up with, “What did he say when comparing GR vs SR?” Then it will give you a more nuanced answer (only “locally”, ie for infinitesimally small distances and SR is a toy model for a universe without gravity).
      
      You need to continue down such paths to get it away from the popular/textbook oversimplified BS before you can get useful/novel output. Just like a human really.
Luc Rocher on June 6, 2026 8:02 PM at 8:02 pm said:

Thanks Jessica, this is a great piece. I’m curious why few academics, especially in CS/AI circles, want to address the elephant in the room: there are too many papers. Generative AI is pushing to its end a system that was already sick, and making students sick. The mental load of having to churn small, iterative contributions as fast as possible because someone might publish it first is not sustainable, nor good for science.

What about we aim to have one good paper per year, per person—pro rata of co-authorship? It wouldn’t be an easy task, but attempting across thousands of papers to detect who used AI, and who used AI but tried to hide it, is not easy either.

- Andrew on June 6, 2026 8:19 PM at 8:19 pm said:
  
  Luc:
  
  It’s fine for some people to write one good paper, per year. But some of us have more to say than that!
  
  I guess it’s ok as long as I’m limited to one paper per year for CS conferences but I’m allowed to publish other papers elsewhere.
  
- Jessica Hullman on June 6, 2026 8:51 PM at 8:51 pm said:
  
  I’ve probably said it before but I like the idea of deciding hiring and promotion based on a randomly drawn subset of the person’s recent work.
  
  - Jukka on June 7, 2026 2:37 AM at 2:37 am said:
    
    Rather, I’d argue for the direction the Coalition for Advancing Research Assessment (CoARA) is taking especially in Europe. Among other things, they emphasize diverse outputs. Yes, those cover papers too, but also everything in-between; from media articles, blog posts, and whatnot to research software.
    
    https://doi.org/10.5281/zenodo.13480728
    
- AAAnonymous on June 7, 2026 4:08 AM at 4:08 am said:
  
  Quote from above: “What about we aim to have one good paper per year, per person—pro rata of co-authorship?”
  
  Another reason why I reason intellectual ownership or credit (or whatever term is appropriate here) might be very important in science is that it might be useful in light of other scientific processes and perhaps even certain views on science itself.
  
  Take the quote above for example. The one paper per year proposal has been mentioned by Nelson, Simmons, and Simonsohn in their paper titled “Let’s publish fewer papers” (2012). An excerpt:
  
  “As a thought experiment, consider a different utopia. In this one, researchers are allowed to publish only one
  paper per year. Publication quantity is no longer a relevant dimension. This system incentivizes researchers to demonstrate that an effect is robust and generalizable, and hence true and important.”
  
  The possible relevance, or even benefits, of making sure credit is given where credit is due, or making sure people know who thought or wrote about things previously, is that scientists and other readers who come across the idea of publishing one paper per year can now, also, read what the previous scientists who wrote about this idea thought about this all. Then, someone might think about this idea some more and perhaps even write a paper about it where they can refer to this idea proposed by scientists in the past and perhaps contribute their own ideas or changes to it. New readers can all become more aware of this stuff, or new readers can even more directly search for more papers about the topic when looking at the author’s publications, etc.
  
  Perhaps this all can be seen as a (kind of) scientific process, in which it might be beneficial, if not crucial, to make clear who exactly wrote what and when (which might make it also easier to determine why).
  
Jon M on June 6, 2026 10:01 PM at 10:01 pm said:

Alternative view: Type B and Type C authors are not so different, at least when it comes to high-quality work. For such writing, initial ideas are relatively incomplete, and most of the value comes through a person refining their initial ideas through the writing and editing process. Today, LLMs cannot replicate that process by themselves, and are at the “it’s faster to do it myself” phase when it comes to working with them.

Given that reviewer / reader time is now much scarcer relative to author time than before, and critically so, it makes sense to aggressively filter out anything with signs of being LLM-written (whether based on content, author, whatever), accepting a much higher false-reject rate than one previously would have.

Richard Kennaway on June 7, 2026 at 4:56 am said:

When people rely on AI to write conference papers, how well do they do at presenting them, handling questions, and talking off the cuff about them in informal conversations? Does anyone have a feel for that?

- Anoneuoid on June 7, 2026 at 5:37 am said:
  
  Also, are those heuristics important? If you can come up with replicable methods and interesting predictions, should anyone care whether you can talk about it?
  
Andy W on June 7, 2026 at 10:31 am said:

My impression is mostly that Pangram only really flags B (folks who just do some simple prompts and make very few edits). I personally use either the A or C approach for some writing (not all, but I am using it more and more all the time), and I use Pangram to check it afterwards: https://andrewpwheeler.com/2026/03/20/using-claude-code-to-help-me-write/

I have not had a personal piece of writing flagged by Pangram when I use the approach in that blog post (which is merely “get some of my prior writing in your context window” and “don’t use emojis and lists”). See https://andrewpwheeler.com/2026/04/21/the-race-to-the-bottom-with-ai-tools/ for one example (with a link at the end to the Pangram check and the conversation log).

I have started to just personally disclose it when I use the AI tools (and in what capacity).

John N-G on June 7, 2026 at 1:23 pm said:

I was definitely thrown for a loop by the tacit assumption that Type B papers should be rejected but Type C papers should be accepted. I would reject both. For those who can’t write well, the submission ought to start with something like “I can’t write well, so I used AI to write the paper. I had the idea of …. and I …. . To make all this clear to you, I used AI (Xxxx version x.x) to generate text describing everything I did in the form of a paper, with prompting at the level of [the entire paper / individual sections / individual paragraphs and figures / revising my original text]. Here it is.”

This doesn’t solve the detection problem, as a Type B person could just as easily write an introduction like this, but as a reader or reviewer I prefer to know the nature of the communication being presented to me.

Bora Ön on June 8, 2026 at 5:05 am said:

This is a very very nice and deep article. Well done.

Kris Hardies on June 8, 2026 at 3:04 pm said:

This editorial in Organization Science is interesting:
https://pubsonline.informs.org/doi/10.1287/orsc.2026.ed.v37.n3

They used Pangram to analyze all their submissions since 2021. Some notable findings:
* Writing quality declined (in AI-high manuscripts); also for non-native speakers.
* Authors who are most likely to use AI in writing do not benefit in the review process and, in fact, may be hurt when using AI for writing.
* It’s mostly due to academics from “elite” institutions responding to their strong publication incentives (schools using the UTD50 list); this seems counter to some other recent research.

Chen on June 8, 2026 at 9:58 pm said:

I don’t think you use AI enough to truly get how AI helps authors. Using AI to “generate” text that can be detected is really the shallowest level of use. It enters into writing in more complex ways: research, checking parameters against actual experiments or code, checking references and correcting them. When used carefully, AI can truly boost the quality of a paper, even for native speakers.

- Jessica Hullman on June 9, 2026 at 10:10 am said:
  
  Not sure who you are addressing here. If it’s me, I’m confused about where you’re getting your info. The post is about using detectors to flag papers that might be mostly AI generated. No one is worried about researchers using AI to improve their workflow.
  
Anonymous on July 2, 2026 at 5:46 am said:

I think that using an LLM to articulate your ideas better is OK. Generating neuroslop with hallucinated ideas and fake results in order to climb the academic status ladder is not OK. The problem of authorship is quite tricky: how many books have used the hero’s character arc from the Odyssey or the Bible? Does that mean LOTR is plagiarism?


Statistical Modeling, Causal Inference, and Social Science


38 thoughts on “When is detecting AI-generated text worthwhile?”
