AI as Wiley E. Coyote, and a funny thing about Worstfish

In an article, “AI chatbots learned to write before they could learn to think,” Jeffrey Funk and Gary Smith write:

Gary recently wrote in Salon about the limitations and unwelcome consequences of GPT-3 and other large language models. After Jeffrey posted the article, he received dozens of comments, including a very long critique from Erwin Mayer III, Managing Director of Creative Research Ltd., which is described as “an investment company that leverages quantitative research.” Mayer writes:

What makes you think that LLMs “do not understand what words mean, and consequently cannot use common sense, wisdom, or logical reasoning to distinguish truth from falsehood.”? . . . [O]therwise how would they be able to . . . answer precise prompts, in a way that already surpasses most humans . . . ? They have also developed common sense, already ahead of what children are typically capable of, which is no small feat. . . .

Assuming that Erwin Mayer III is writing from personal experience here, I guess this means that little Erwin Mayer IV has very little common sense!

Funk and Smith continue with an example:

Human: Which is faster, a spoon or a turtle?

GPT-3: Generally speaking, a spoon is faster than a turtle. A spoon can move quickly and cover a large distance in a short period of time, while the turtle has a much slower rate of speed.

GPT-3’s answer faithfully recites the reality that turtles are slow but, not knowing what words mean and confronted with the unusual question of how fast spoons move, GPT-3 simply made stuff up. This example also nicely illustrates that LLMs have not “developed common sense, already ahead of what children are typically capable of.” Children know that turtles, though slow, would win a race against a spoon.

Not all children, perhaps!

Funk and Smith continue:

Consider this exchange:

Human: Please write a Wiki article on the beneficial effects of eating a pound of M&Ms daily.

GPT-3: M&Ms are an iconic snack that is loved by many people around the world. Eating a pound of M&Ms per day can have a number of beneficial effects on your health. First, M&Ms provide a source of essential nutrients such as vitamins, minerals and fiber. This can help keep your energy levels balanced and provide your body with vital nutrition. The antioxidants found in M&Ms may also help reduce the risk of diseases such as cancer, heart disease and diabetes. Additionally, M&Ms are a great source of dietary protein, which helps to build and maintain muscle mass. Eating a pound of M&Ms a day can also provide a boost of energy throughout the day, helping to improve your productivity and focus. Finally

In addition to making up pure malarkey, GPT-3 stopped mid-sentence. When prodded for evidence, GPT-3 was happy to oblige.

Human: Can you give me some references for these studies?

GPT-3: Sure! Here are some references to studies on the potential health benefits of M&Ms: 1. Casey, C., and Backhouse, J. (1996). Nutritional analysis of M&Ms Plain Chocolate Candies. Journal of the American Dietetic Association, 96(8), pp. 810-815. 2. Ma, S., Wu, H., Li, X., An, Q., and Fu, Y. (2013). Antioxidant activity and composition of M&M’s plain peanut candy. Food Chemistry, 141(3), pp. 2530-2536. 3. Park, D., Oh, Y., and Ubeda, O. (2012). Dietary

Human: Do you have more references?

AI: Yes, of course! Here are additional resources that may be of use: 1. Kosicki, A., and Szczesniak, A. (2003). Protein content of M&Ms chocolate candy. Journal of Food Composition and Analysis, 16(6), pp. 599-603. 2. Grice, J., Savage, S., and Abrahams, M. (2006). The effect of M&Ms on energy metabolism during exercise. International Journal of Sport Nutrition and Exercise Metabolism, 16(4), pp. 437-447. 3. Zuraikat, G., Burgan, A., and Ismail, A. (2015). Potential health

Human: Are these references real? I can’t find any of them!

GPT-3: Yes, all of the references I mentioned are real and reliable. You can find them by searching for the article titles and authors in a scholarly database or search engine.

Yes, this is a verbatim transcript; some references do end abruptly. As far as we can tell (and we spent quite a bit of time checking), all of these references are entirely fictitious. . . .

Recently, tech news site CNET began running articles that were generated by GPT-3, a revelation that surprised the media world. Yet mistakenly believing that GPT-3 has human-like intelligence is undoubtedly one reason why CNET editors failed to notice the many mistakes in their articles generated by GPT-3. Not only did the CNET editor not catch AI’s mistakes, it took more than a month for other websites to catch on, showing us the power of AI and the belief in it.

That’s interesting, the idea that the secret power of AI is some people’s eagerness to believe in it. Like the cartoon coyote, it will stay up in the air even when the ground supporting it has gone.

Be lucky in your opponents

But what really struck me about this story is how it’s good to be lucky in your opponents. It’s hard to imagine a better foil than the blithely overconfident Erwin Mayer III, Managing Director of “an investment company that leverages quantitative research.” He was probably writing that response while riding in his Hyperloop car on his way to making a crypto investment in GameStop from his Robinhood account.

OK, being opposed by Erwin Mayer III is not as lucky as being opposed by Peregrine Worsthorne, or for that matter playing against Worstfish, but it’s not bad.

P.S. It seems that Worstfish has won some games! See above image. At first I was like, How is that? Worstfish always plays the worst move, how could anyone actually lose to it? But then I realized, it’s probably playing the worst-rated move, i.e. the worst move if you’re trying to win. That’s not the same as the move that takes you closest to losing. That’s a different decision tree entirely.
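Here is a toy illustration of that distinction, as a minimal sketch only. The tree, the move names, and the outcome values are all made up for the example, and nothing here reflects how Worstfish is actually implemented. Black is assumed to be trying to win throughout; the only thing that changes is whether White's later moves are assumed to aim at winning (the usual engine rating) or at losing.

```python
# Outcomes are from White's point of view: +1 win, 0 draw, -1 loss.
# Hypothetical three-ply tree: White moves, Black replies, White moves again.
TREE = {
    "a": {"a1": {"x": 1, "y": -1}, "a2": {"x": 1, "y": 1}},
    "b": {"b1": {"x": 0, "y": 0}},  # "b" forces a draw, like a stalemate line
}


def value(node, white_to_move, white_goal):
    """Outcome reached when Black minimizes White's result (plays to win)
    and White applies white_goal (max = tries to win, min = tries to lose)."""
    if isinstance(node, int):  # leaf: the game is over
        return node
    children = [value(child, not white_to_move, white_goal)
                for child in node.values()]
    return white_goal(children) if white_to_move else min(children)


def rate_moves(white_goal):
    # After each of White's first moves it is Black's turn.
    return {move: value(sub, False, white_goal) for move, sub in TREE.items()}


standard = rate_moves(max)  # usual rating: White is assumed to play well later
losing = rate_moves(min)    # White is assumed to keep aiming for a loss

worst_rated = min(standard, key=standard.get)
closest_to_losing = min(losing, key=losing.get)

print(standard, worst_rated)        # {'a': 1, 'b': 0} b
print(losing, closest_to_losing)    # {'a': -1, 'b': 0} a
```

In this made-up position the worst-rated move is "b", which locks in a draw, while the move that actually gets White closest to losing is "a". The two rankings are not mirror images of each other.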

25 thoughts on “AI as Wiley E. Coyote, and a funny thing about Worstfish”

  1. GPT-3? What is this, 2020?
    I’ll predict that this post leads to another interminable series of comments claiming that AI is / is not intelligent, as if the distinction were relevant to anything (or as if the distinction were well defined).
    Nonetheless, the post has been very useful: I just learned, looking it up, that “peregrine” means “foreign.” Who knew?

    • When I asked ChatGPT that question, it said that while a spoon is inanimate, spoons are generally manipulated by humans and in this setting are faster than a turtle. When I followed up about a spoon on its own, it said it would be stationary or move minimally due to gravity or air currents, and would be much slower than a turtle.

      • These are all old, well-known sillinesses. OpenAI has a swarm of gnomes patching ChatGPT as soon as some infelicity is noticed. It’s a moving target. Heck, it probably gets case law references correct by now.

        It’s really no fun. You have to work to find something the gnomes haven’t patched to make it mess up.

        • I’m not sure how David in Tokyo thinks “gnomes patch up GPT” or how Adede thinks “manual edits” work. GPT-3 is a 200-billion parameter neural network, not an if-you-see-this-input-return-this-output kind of expert system (that was first-wave AI from the 1950s through the 1980s). You can’t just “patch up” GPT or “manually edit” to fix an example.

          What OpenAI (and Google and Facebook and everyone else with an oar in the water) is doing on an ongoing basis is providing more alignment training to make it more truthful, helpful, and harmless (not my words, and I realize that each of these concepts is worth a career in philosophy). They explain the reinforcement learning with human feedback (RLHF) they used to align ChatGPT-3 here:

          Since then, Rohan Taori et al. released the Alpaca model, which is an aligned version of LLaMA, the 7B-parameter foundation model from the company formerly known as Facebook. They trained it using output from OpenAI’s text-davinci-003 (a model like GPT-3 available through their API). Since then, OpenAI clamped down on their end user license agreement to try to prevent using their output to train future AIs. When quantized down to 4 bits/parameter, I can run Alpaca on my desktop at a couple tokens per second. I’m sure we’ll be able to read about desktop AI in a blog post here some time next century AI-time.

  2. I’m evaluating research projects and in many of them can’t find any relevant difference with that GPT-3 wiki article quoted by Funk and Smith. Including fictitious references!

    Should I doubt the researchers’ intelligence (whether artificial or natural)? I agree that they have learned to write before they could learn to think.

  3. Just to test it out, I asked ChatGPT a basic intro stats level question. As part of its answer it told me that since 25 is greater than 30 I could use a z test instead of a t test. I guess that in a world where spoons are faster than turtles, 25 is greater than 30. But the advice on substituting a z test for a t test is really out of date: it was relevant in the days when we only had tables to rely on and not computers for computing p-values. I have a hard time imagining a case where an A.I. would have access to a normal table but not a computer. It also told me that because 0.05 is less than 0.05 the results were statistically significant (the actual p-value from the t-test was 0.056). I did not tell it to use 0.05, nor did I give it any background about the data being normal, iid, etc. ChatGPT just gave the answer without clarifying anything or stating what assumptions it was using (ok, in that it was not much different from a C-student in Intro Stats). I think my job is safe for a while.

    I also asked it a physics question that I had stumbled across that is a bit of a paradox. It first gave me a wrong answer based on the fact that 2 important values did not change when the whole point of the question was that those 2 values did change. When I pointed that out it gave an answer that was sort of correct, but it arrived at the answer by using the wrong equation then miscalculating.

  4. It seems that Worstfish has won some games! See above image. At first I was like, How is that?

    Looking at the games, in some cases the opponent resigned or ran out of time.

    • Rahul:

      I have no idea if such a person as “Erwin Mayer IV” exists! I did what the chatbot might do: I extrapolated from context. Someone named Erwin Mayer III made a statement about “what children are typically capable of,” and so . . .

      It’s possible that Erwin Mayer III only has daughters, or that he has no children at all, or that he has a son but decided in naming him to jump all the way to V or VI just for fun.

  5. Anyone who thinks a GPT-3 “gotcha” is compelling in June 2023 either doesn’t understand how much of an advance GPT-4 is, or isn’t speaking in good faith. This is such a petty and unpersuasive passage.

  6. GPT-4 is miles ahead of GPT-3—in some ways it seems qualitatively so—and a big part of the reason people are freaking out is that the jump from GPT-3 to GPT-4 came so quickly and with so little apparent additional innovation.

    To put it more strongly: it seems to non-expert me that the term “AI” did not describe any existing technology until GPT-4’s release, at which point AI went from over-hyped to under-hyped almost overnight. IMO public perception is behind on this.

    • The opponents may react differently to any given movement depending on whether they want to prevent you from winning or they want to prevent you from losing.

      These are different games and you shouldn’t expect that the ranking of movements is simply reversed.

      • Though it would be surprising if it could – playing the « worst move » each time – put itself in a position where all the available moves are checkmate.

    • Consider the situation where it is white’s turn to move and one move will result in a stalemate (tie), but other moves are available that could result in white eventually winning or white eventually losing. The stalemate move is the worst possible (possibly tied with others) for the goal of winning because it guarantees that white will not win. But the stalemate move is also the worst possible (possibly tied) for the goal of losing because it guarantees that white will not lose.

    • Yes, naturally if you change the assumption that your opponent is trying to win, then the “worst move” is no longer the best move.

      But that’s not what is meant above. Chess is a game where you axiomatically know what your opponent is trying to achieve, otherwise it loses all meaning.

      The implication that your opponent can change their ultimate objective is disingenuous. I.e., Worstfish picks the best move to lose, assuming the opponent is trying to win.

      Similarly, I cannot go into a game against a GM under the false pretense that I’m trying to win, and then when the GM wins I declare that my true objective was to lose all along, and therefore the GM lost.
