Computer-generated writing that looks real; real writing that looks computer-generated

You know that thing where, when you stare at a word for long enough, it starts to just look weird? The letters start to separate from each other, and you become hyper-aware of the arbitrariness of associating a concept with some specific combination of sounds. There’s gotta be a word for this.

Anyway, I was reminded of that dissociation or uncanny valley after reading a passage from an article by John Seabrook on computer programs that can write text: kinda like that autocomplete thing on your phone, but instead of just suggesting one word or phrase at a time, it will write whole sentences or paragraphs.

The article is fun to read, and I have some further thoughts on it relating to statistical workflow, but for now I just wanted to point out this passage recounting how Seabrook pushed a button to train a language-learning bot on some writings of the linguist Steven Pinker and then asked the bot to complete an email that Pinker had started. Here’s Seabrook:

I [Seabrook] put some of his reply into the generator window, clicked the mandala, added synthetic Pinker prose to the real thing, and asked people to guess where the author of “The Language Instinct” stopped and the machine took over.

Being amnesic for how it began a phrase or sentence, it won’t consistently complete it with the necessary agreement and concord—to say nothing of semantic coherence. And this reveals the second problem: real language does not consist of a running monologue that sounds sort of like English. It’s a way of expressing ideas, a mapping from meaning to sound or text. To put it crudely, speaking or writing is a box whose input is a meaning plus a communicative intent, and whose output is a string of words; comprehension is a box with the opposite information flow. What is essentially wrong with this perspective is that it assumes that meaning and intent are inextricably linked. Their separation, the learning scientist Phil Zuckerman has argued, is an illusion that we have built into our brains, a false sense of coherence.

You can click through to find out where the human ended and the computer started in that passage. What’s interesting to me is that, reading the paragraph with the expectation that some parts of it were computer generated, it all looked computer-generated! Right from the start: “Being amnesic for how it began a phrase or sentence, it won’t consistently complete it with the necessary agreement…” already looks like word salad—or, I guess I should say, phrase salad.

It’s a Blade Runner thing: Once you get it in your mind that your friend might be a replicant, you just start looking at the person differently. The existence of stochastic writing algorithms makes us aware of the stochastic algorithmity of what we read and write. It makes us think of the algorithms that each of us uses to express our thoughts.

This shouldn’t be news to me. It’s already been the case that my own experience as a writer has made me more aware as a reader—I can see the seams where the writing has been put together, as it were—so even more so when we think of algorithms.

P.S. I’ve been critical of Pinker on this blog from time to time, so let me clarify here that (a) Seabrook’s example of completing Pinker’s paragraph is amusing, and (b) I’m pretty sure the same thing would happen if someone were to run one of my paragraphs through a machine learner.

First, my writing (like Pinker’s) is patterned: I have a style, and there are some things I like to write about. Hence it’s no surprise that it should be possible to write something that looks like what I write—at least for short stretches.

Second, if you stare at just about any sentence I write, it will start to fall apart. Each sentence works as part of the larger story. Trying to make too much sense from any one sentence is like staring at a single brushstroke at a painting. Ultimately, it’s just a collection of words, or some paint, and it can’t live for long after being plucked from its contextual tree.

OK, I did that last bit on purpose, wandering off and writing somewhat incoherently. The point is that the phrase “Being amnesic for how it began a phrase or sentence, it won’t consistently complete it with the necessary agreement…” is, to be fair, no more gobbledygooky than lots of sentences that I write.

My point here is not to criticize Pinker at all but rather just to use Seabrook’s particular example to demonstrate this uncanny valley that arises when we consider the possibility that a sentence isn’t real. I guess it’s similar to what’s happening with fake photos, fake videos, fake news stories, and so forth: they all make us more aware of the fakeyness of the real.

22 Comments

  1. A.P. Salverda says:

    When a spoken word is pronounced over and over again in quick succession, a speaker or listener will experience the feeling that it loses its meaning. This effect is called semantic satiation. It’s quite compelling and has been well studied in the psycholinguistic literature.

  2. Steve says:

    This blog post is obviously Andrew punking us. The entire post was obviously written by a bot.

  3. Clyde Schechter says:

    I have to say I’m puzzled by your reaction to “Being amnesic for how it began a phrase or sentence, it won’t consistently complete it with the necessary agreement…” Maybe it’s because I minored in linguistics in college, or I’m cued by its position at the beginning of the paragraph (although you were, too), but this seems like a perfectly natural sentence to describe the linguistic behavior of a bot. I’m not familiar with Pinker’s writing style at all, but I could easily see myself writing it.

    • Clyde Schechter says:

      By the way, FWIW, in the article, I correctly guessed where Pinker ended and the bot began. It wasn’t about the syntax, it was that the meaning at the end didn’t sound to me like something a linguist would say, whereas everything up to that did.

      • Bob76 says:

        I also guessed correctly. The last two sentences are disconnected from the rest of the paragraph and contradict each other. Reading them reminded me of Eliza’s responses.

        Bob76

    • Andrew says:

      Clyde:

      Maybe my problem in comprehension came from the three “it”s in the sentence. I had difficulty following what was going on, and that made the sentence seem like phrase salad. Again, this is no slam on Pinker: I’m sure that I’ve written lots of sentences that would be just as difficult to read, or more so, without sufficient background.

  4. Adam Wheeler says:

    I realize that this isn’t the point of your post, but I’m tempted to agree with Pinker here. I correctly identified the spot where the computer took over because there was a kind of discursive non-sequitur. Maybe my vague familiarity with this sort of thing helped me know what to look for?

    I found about half of the auto-generated text in the article to be convincing, although often a bit trite.
    > Pinker is right about the machine’s amnesic qualities—it can’t develop a thought, based on a previous one. It’s like a person who speaks constantly but says almost nothing. (Political punditry could be its natural domain.)
    savage

    • Phil says:

      I, too, identified the transition from Pinker to the machine, but (1) I wasn’t 100% sure until I checked, and (2) evidently quite a few people didn’t get it right. In fact, the thing I find a bit eerie about the transition is that although the machine was deviating from the point Pinker was trying to make, it seemed to know that: it chose words that indicated that the sentences it was adding were at odds with the preceding sentences. If it had just quoted the entire Pinker section and started a new paragraph, it would have been totally fine.

      And anyway I think the fact that the computer can’t currently generate more than a few convincing sentences at a time, and that imperfectly, misses the point. The issue is not what the computer is capable of now, it’s the trajectory this research is on. Prior to reading this article I would have agreed with Pinker that in order to generate decent writing the computer would have to understand a narrative arc or a coherent set of thoughts that it wanted to convey, but this article seems to indicate that either that isn’t necessarily true, or that whatever it means to ‘understand’ something, computers are well on their way. The program in question — or maybe I should call it ‘the AI in question’ since it is both hardware and software — is a long way from being able to write a New Yorker article, but it seems it might already be able to co-write a decent op-ed if you give it the first few sentences of each paragraph and let it add two of its own. One can certainly see why people are worried about the future of AI and what it means for society.

  5. Dmitri says:

    Those first four Pinker sentences make a very clear point: language is not just a string of words but the communication of ideas. The fifth sentence essentially rejects this point, giving the appearance of making an argument (“what is essentially wrong with this perspective …”) but not actually providing an argument, just a weak and floppy placeholder (“it assumes”). Then the sixth sentence says that some learning scientist has shown that the separation of meaning and intent is an illusion, essentially reverting to the point of the first four sentences.

    The logic is clear for four sentences then it becomes a total mess. This seems like a clear illustration of Pinker’s point.

  6. Michael Nelson says:

    Upon revising a very long article, I was dismayed to discover that I use an algorithm that uses a lot of the same types of clauses, and repeats the same set of words to introduce those clauses:

    In fact
    Of course
    Here,
    Now,
    Then,
    Although
    Though
    Certainly
    Granted
    Obviously
    As such
    Therefore
    Thus
    Where
    Whereas
    Such that
    Accordingly
    Likewise
    Additionally
    Indeed
    However
    Yet,
    Still,
    Nonetheless
    In any case
    In any event
    Regardless
    That is,
    Hence
    Consequently
    Indeed
    Whereas

    I don’t have the time or inclination to fix all that, but I am intrigued at what these kinds of conditional clauses say about my approach to writing. I don’t seem to trust the reader to go from proposition to conclusion without my signaling whether two statements are mutually true, independently true, conditionally true, contradictory or redundant. Also, I seem to like relaying facts in such a way that I think the reader will be surprised or challenged.

    • Phil says:

      You might be interested in a few paragraphs John McPhee wrote about the text editor Kedit in The New Yorker article “Structure: Beyond the picnic-table crisis” which I believe you can read in their archives even if you don’t have a subscription.

    • jim says:

      You should write for The Economist :)

      Economist Story Formula:

      Dadada happened. That’s (good, bad). Here’s why it’s (good, bad), and here’s the impact if it’s really as (good, bad) as believed [“obviously”, “hence”, “therefore”].

      Yet, it might not be as (bad, good) as it seems. Here’s the alternative take on why it might be opposite to the prevailing idea. Here’s the impact if it’s opposite to the prevailing idea [“However”, “That is”, “As such”].

      The prevailing idea looks likely. But no one can tell. The alternative idea just might be right [“Yet”, “Indeed”, “Still”].

  7. Thomas Lumley says:

    I’m surprised no-one has done this by the 15th comment ;-)

    A fundamental tenet of social psychology, behavioral economics, at least how it is presented in the news media, and taught and practiced in many business schools, is that small “nudges,” often the sorts of things that we might not think would affect us at all, can have big effects on behavior. But those kinds of small nudges, made possible by technology, may have a much larger, more subtle impact, and this is something a lot of people are not aware of or don’t seem to be aware of. And there’s nothing in this study to say that’s a bad thing.

  8. Olav says:

    I agree with Pinker that the program mostly seems to spout a bunch of incoherent gobbledygook that only superficially looks fine (and I was also able to tell where Pinker’s text ended and the program output started). However, having graded thousands of student papers over the years at several universities, I can tell you that the truth is that this is what a large proportion of human writing looks like. Being able to write coherently, using grammatically well-formed sentences, and with a clear argumentative arc is actually surprisingly rare. Skilled writers like Pinker and Gelman make it look easy, but it definitely isn’t. If the goal with the program is to imitate how most people write, then that goal has arguably already been accomplished.

    • David J. Littleboy says:

      Maybe you underestimate “most writers”. All the stuff I read has a point to make. No one goes incoherent. For example, no article ever printed in the New Yorker does that. Ditto for Bungeishunju (a Japanese New-Yorker-like monthly).

      FWIW, Gary Marcus had an article (somewhere) that discussed this thing. The bottom line is that randomly generated garbage is random garbage no matter how good the syntax. (He points out that some of the articles on it cherry-pick the examples by having the program run several times and then only showing the outputs that are not ridiculous.) It’s a glorified Markov Chain text generator. Personally, I don’t understand the fascination with randomly generated garbage. People had great fun writing and playing with those Markov Chain text generators back in the day. But when it comes to intellectual content, there isn’t any.
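
      To make the comparison concrete: a classic Markov chain text generator of the kind described above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the program Seabrook used (which is a neural language model, a far larger cousin of the same word-prediction idea):

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each `order`-word state to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)

def generate(chain, length=20, seed=None):
    """Random-walk the chain: at each step, pick a random successor of the current state."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))  # start from a random observed state
    out = list(state)
    while len(out) < length:
        successors = chain.get(state)
        if not successors:
            break  # dead end: this state never recurs in the training text
        nxt = rng.choice(successors)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)
```

      Every local transition in the output is one that actually occurred in the training text, which is why short stretches can look plausible while longer passages drift, having no memory beyond the last `order` words.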

      I suppose one could argue that there are two types of art. Type A, for lack of a better term, is art in which the intent of the artist is clear/direct. Prose fiction, landscape photography. And type B: art in which the object of the game is to jolt the viewer’s sensibilities with kewl juxtapositions. Poetry, abstract painting. But in both types of art, the artist is speaking to a human viewer/reader based on said artist’s understanding of the human experience. Computer generated art may effectively function as type B art for some, but it ain’t art because the jolts are randomly generated with no understanding of the viewer/reader. It’s still randomly generated garbage, even if it’s fun. Now if the machine had a model of causality and reasoning and used that to generate meaningful texts, we’d be making progress in AI.

      But, as Marcus points out in “Rebooting AI”, none of the current work in “AI” is even trying to deal with causality, reasoning, common sense. To quote Roger Schank “There’s no such thing as AI”.

      • Olav says:

        Of course the New Yorker doesn’t contain incoherent writing. It’s one of the most prestigious magazines in the US. But even your average local newspaper article will be better written than what 90% of people are capable of producing. In general, anything that’s published in any periodical is likely to have been written by a vetted writer, so looking at published work (in any form) will give a highly skewed impression of the average writing ability in the general population. You only get a real sense of how bad most people are at writing if you have to grade lots of student papers.

  9. Sean Matthews says:

    For what it is worth, I hit the transition exactly.
