I disagree with Geoff Hinton regarding “glorified autocomplete”

Computer scientist and “godfather of AI” Geoff Hinton says this about chatbots:

“People say, It’s just glorified autocomplete . . . Now, let’s analyze that. Suppose you want to be really good at predicting the next word. If you want to be really good, you have to understand what’s being said. That’s the only way. So by training something to be really good at predicting the next word, you’re actually forcing it to understand. Yes, it’s ‘autocomplete’—but you didn’t think through what it means to have a really good autocomplete.”

This got me thinking about what I do at work, for example in a research meeting. I spend a lot of time doing “glorified autocomplete” in the style of a well-trained chatbot: Someone describes some problem, I listen and it reminds me of a related issue I’ve thought about before, and I’m acting as a sort of FAQ, but more like a chatbot than a FAQ in that the people who are talking with me do not need to navigate through the FAQ to find the answer that is most relevant to them; I’m doing that myself and giving a response.

I do that sort of thing a lot in meetings, and it can work well; indeed, I often think this sort of shallow, associative response can be more effective than whatever I’d get from a direct attack on the problem in question. After all, the people I’m talking with have already thought for a while about whatever it is they’re working on, and my initial thoughts may well be in the wrong direction, or else my thoughts are in the right direction but are just retracing my collaborators’ past ideas. From the other direction, my shallow thoughts can be useful in representing insights from problems that these collaborators had not ever thought about much before. Nonspecific suggestions on multilevel modeling or statistical graphics or simulation or whatever can really help!

At some point, though, I’ll typically have to bite the bullet and think hard, not necessarily reaching full understanding in the sense of mentally embedding the problem at hand into a coherent schema or logical framework, but still going through whatever steps of logical reasoning that I can. This feels different than autocomplete; it requires an additional level of focus. Often I need to consciously “flip the switch,” as it were, to turn on that focus and think rigorously. Other times, I’m doing autocomplete and either come to a sticking point or encounter an interesting idea, and this causes me to stop and think.

It’s almost like the difference between jogging and running. I can jog and jog and jog, thinking about all sorts of things and not feeling like I’m expending much effort, my legs pretty much move up and down of their own accord . . . but then if I need to run, that takes concentration.

Here’s another example. Yesterday I participated in the methods colloquium in our political science department. It was Don Green and me and a bunch of students, and the structure was that Don asked me questions, I responded with various statistics-related and social-science-related musings and stories, students followed up with questions, I responded with more stories, etc. Kinda like the way things go here on the blog, but spoken rather than typed. Anyway, the point is that most of my responses were a sort of autocomplete—not in a word-by-word chatbot style, more at a larger level of chunkiness, for example something would remind me of a story, and then I’d just insert the story into my conversation—but still at this shallow, pleasant level. Mellow conversation with no intellectual or social strain. But then, every once in a while, I’d pull up short and have some new thought, some juxtaposition that had never occurred to me before, and I’d need to think things through.

This also happens when I give prepared talks. My prepared talks are not super-well prepared—this is on purpose, as I find that too much preparation can inhibit flow. In any case, I’ll often find myself stopping and pausing to reconsider something or another. Even when describing something I’ve done before, there are times when I feel the need to think it all through logically, as if for the first time. I noticed something similar when I saw my sister give a talk once: she had the same habit of pausing to work things out from first principles. I don’t see this behavior in every academic talk, though; different people have different styles of presentation.

This seems related to models of associative and logical reasoning in psychology. As a complete non-expert in that area, I’ll turn to wikipedia:

The foundations of dual process theory likely come from William James. He believed that there were two different kinds of thinking: associative and true reasoning. . . . images and thoughts would come to mind of past experiences, providing ideas of comparison or abstractions. He claimed that associative knowledge was only from past experiences describing it as “only reproductive”. James believed that true reasoning could enable overcoming “unprecedented situations” . . .

That sounds about right!

After describing various other theories from the past hundred years or so, Wikipedia continues:

Daniel Kahneman provided further interpretation by differentiating the two styles of processing more, calling them intuition and reasoning in 2003. Intuition (or system 1), similar to associative reasoning, was determined to be fast and automatic, usually with strong emotional bonds included in the reasoning process. Kahneman said that this kind of reasoning was based on formed habits and very difficult to change or manipulate. Reasoning (or system 2) was slower and much more volatile, being subject to conscious judgments and attitudes.

This sounds a bit different from what I was talking about above. When I’m doing “glorified autocomplete” thinking, I’m still thinking—this isn’t automatic and barely conscious behavior along the lines of driving to work along a route I’ve taken a hundred times before—I’m just thinking in a shallow way, trying to “autocomplete” the answer. It’s pattern-matching more than it is logical reasoning.

P.S. Just to be clear, I have a lot of respect for Hinton’s work; indeed, Aki and I included Hinton’s work in our brief review of 10 pathbreaking research articles during the past 50 years of statistics and machine learning. Also, I’m not trying to make a hardcore, AI-can’t-think argument. Although not myself a user of large language models, I respect Bob Carpenter’s respect for them.

I think that where Hinton got things wrong in the quote that led off this post was not in his characterization of chatbots, but rather in his assumptions about human thinking, in not distinguishing autocomplete-like associative reasoning from logical thinking. Maybe Hinton’s problem in understanding this is that he’s just too logical! At work, I do a lot of what seems like autocomplete—and, as I wrote above, I think it’s useful—but if I had more discipline, maybe I’d think more logically and carefully all the time. It could well be that Hinton has that habit or inclination to always be in focus. If Hinton does not have consistent personal experience of shallow, autocomplete-like thinking, he might not recognize it as something different, in which case he could be giving the chatbot credit for something it’s not doing.

Come to think of it, one thing that impresses me about Bob is that, when he’s working, he seems to always be in focus. I’ll be in a meeting, just coasting along, and Bob will interrupt someone to ask for clarification, and I suddenly realize that Bob absolutely demands understanding. He seems to have no interest in participating in a research meeting in a shallow way. I guess we just have different styles. It’s my impression that the vast majority of researchers are like me, just coasting on the surface most of the time (for some people, all of the time!), while Bob, and maybe Geoff Hinton, are among the exceptions.

P.P.S. Sometimes we really want to be doing shallow, auto-complete-style thinking. For example, if we’re writing a play and want to simulate how some characters might interact. Or just as a way of casting the intellectual net more widely. When I’m in a research meeting and I free-associate, it might not help immediately solve the problem at hand, but it can bring in connections that will be helpful later. So I’m not knocking auto-complete; I’m just disagreeing with Hinton’s statement that “by training something to be really good at predicting the next word, you’re actually forcing it to understand.” As a person who does a lot of useful associative reasoning and also a bit of logical understanding, I think they’re different, both in how they feel and also in what they do.

P.P.P.S. Lots more discussion in comments; you might want to start here.

P.P.P.P.S. One more thing . . . actually, it might deserve its own post, but for now I’ll put it here: So far, it might seem like I’m denigrating associative thinking, or “acting like a chatbot,” or whatever it might be called. Indeed, I admire Bob Carpenter for doing very little of this at work! The general idea is that acting like a chatbot can be useful—I really can help lots of people solve their problems in that way, also every day I can write these blog posts that entertain and inform tens of thousands of people—but it’s not quite the same as focused thinking.

That’s all true (or, I should say, that’s my strong impression), but there’s more to it than that. As discussed in my comment linked to just above, “acting like a chatbot” is not “autocomplete” at all; indeed, in some ways it’s kind of the opposite. Locally it’s kind of like autocomplete in that the sentences flow smoothly and I’m not suddenly jumping to completely unrelated topics—but when I do this associative or chatbot-like writing or talking, it can lead to all sorts of interesting places. I shuffle the deck and new hands come up. That’s one of the joys of “acting like a chatbot” and one reason I’ve been doing it for decades, long before chatbots ever existed! Walk along forking paths, and who knows where you’ll turn up! And all of you blog commenters (ok, most of you) play helpful roles in moving these discussions along.

37 thoughts on “I disagree with Geoff Hinton regarding ‘glorified autocomplete’”

  1. Hinton is arguing the standard line that language is AI-complete. That is, in order to solve language, you need to solve all of AI. That’s one of the reasons I stayed in natural language processing in the first place—this claim has been around since the 1970s or before. Why do people make this claim? At the very least, for a system to be good at generating turns in a dialogue, it has to represent what is going on in the linguistic context (for ChatGPT, that’s my current input as user, plus all the previous user inputs, assistant replies, and system prompt). If it couldn’t figure out who did what to whom and in what order, it couldn’t formulate sensible answers or use pronouns. However it keeps track of such things is what we tend to call a model of the world. People are dissecting transformers (attention and multi-layer perceptrons) to see how they model the world and keep track of attention—it’s an odd kind of artificial neuroscience.
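
    To make the “it’s all one sequence” point concrete, here is a minimal toy sketch (my own illustration, not how ChatGPT or any production system actually works; the role tags and the bigram stand-in for a language model are made-up assumptions): the whole dialogue, system prompt plus user and assistant turns, gets flattened into a single token stream, and “replying” is just repeated next-token prediction over that stream.

    ```python
    # Toy sketch only: a chat transcript flattened into one sequence, with a
    # bigram counter standing in for a real language model. The role tags and
    # the example dialogue are made up for illustration.
    from collections import Counter, defaultdict

    transcript = [
        ("system", "You are a helpful statistics assistant."),
        ("user", "My multilevel model will not converge."),
        ("assistant", "Try tighter priors or a non-centered parameterization."),
        ("user", "The non-centered version still diverges."),
    ]

    # Everything the model can "know" about who said what lives in this one string.
    context = "".join(f"<{role}> {text} " for role, text in transcript) + "<assistant>"

    tokens = context.split()
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1

    def predict_next(prev_token):
        """Most frequent continuation seen after prev_token; '<unk>' if unseen."""
        options = bigrams.get(prev_token)
        return options.most_common(1)[0][0] if options else "<unk>"

    # Generating the assistant's next turn is nothing but repeated next-token prediction.
    print(predict_next(tokens[-1]))
    ```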

    Herb Simon liked to distinguish logical chains of reasoning, at which humans are very slow, and associative reasoning, which is very fast. Herb argued that our brains have a lot of parallelism, but the clock cycle is very slow. He worked in experimental cognitive psych later in his career, and he was very focused on error analysis. Humans are pretty bad at chains of logical reasoning—we go wrong in all sorts of ways [syllogisms are hard, as are chains of generics/defaults] and have all sorts of biases (one could argue Herb founded behavioral econ, for which he won an econ Nobel prize). Associative reasoning has other biases having to do with attention, focus, and selection bias during training. Herb had a theory of learning based on production systems (early AI approach, for which he won an ACM/Turing award), whereby when we learn, we take logical chains of reasoning and compose them so that we can use them in an associative way in practice without having to reproduce the logical chains. Which I guess makes our thinking a kind of continuum. One could say that Andrew’s free association is informed by years of logical reasoning, simulation, math, and experience.

    I think the question to ask is just how good a system like ChatGPT could get at modeling the world and doing chains of reasoning. It fails at conversation in just this way—it will go off the rails on long histories and isn’t good at chaining reasoning together that requires sub chains of reasoning (you can hack this by giving it federated instances to use, but it still doesn’t recurse well).

    My question is whether there is any point at which a chatbot would be considered intelligent. If not, does that mean written output by people can never be considered intelligent on its own? I’m just not clear on what notion of intelligence people are using that rules out even the current instance of ChatGPT. If Andrew only responded through the blog (hence, acted with a chatbot’s interface), would we no longer consider the blog’s output intelligent because it’s just text? This always reminds me of Daniel Dennett’s discussion of vim in “Consciousness: How much is that in real money?”, where he talks about people’s attitudes toward consciousness being like their attitude toward their native currency—they know it’s real, but all that other stuff is like Monopoly money—not at all the same thing.

    • Bob:

      The distinction I referred to in my post, contrasting associative reasoning with logical reasoning, occurs in my writing as well as in my speech. For example, to write this article I needed to spend some time sitting down and thinking. The main idea of the paper is arguably a product of associative reasoning, and much of the writing has some auto-complete aspects (although I don’t think anything close to it could’ve been produced by any existing chatbot), but many of the details required that special focus.

      Also, I’m not arguing that a chatbot could never be considered intelligent. I’m just specifically arguing against Hinton’s claim, which is that “by training something to be really good at predicting the next word, you’re actually forcing it to understand.” For me, I do a lot with autocomplete while only understanding on a shallow level. Then again, your experience is different than mine; as noted above, I get the impression that you do a lot more thinking and a lot less associative autocompletion than I do, at least at work.

  2. Before AI chatbots, I used to recognize a kind of student essay that was “Excellent form, no substance”. It’d have perfect APA formatting, tons of references in all the right places, but the overall ideas would essentially just be a carbon copy of the sample papers I’d post (if the sample paper was an RCT for therapy, so was their paper). I guess it’s still impressive in the same way as watching someone look at a drawing and reproduce it (which I couldn’t do to save my life!). They’d get good grades on a rubric, but I always felt kinda uncomfortable at how shallow their thinking was.

    I always preferred reading papers that were grammatically rough, but creative, sometimes introspective, and with fewer references but you could tell they actually read them. You could tell they were trying to solve things through first principles, sometimes failing, but always more interesting with evidence of thinking hard about things. The longer I taught, the more I just ignored the “form” for the most part in favor of the actual ideas to try and encourage this.

    Once chatbots came around it was almost revelatory: the chatbots wrote just like these “form over content” students. To me, ChatGPT sounds like this specific kind of undergrad writing I don’t want to elicit. Makes me wonder if the glorified autocomplete analogy really is apt for these kinds of student writers; maybe the writing process for these students is like a chatbot’s process!

    Sorry for the rambling post, but all you wrote here reminded me of this stream of thoughts I had but never wrote down.

    • It’s worth noting that the specific model that runs on ChatGPT has been trained to extremely restrictive criteria of refusing to discuss (or giving uselessly evasive content when prompted to discuss) a very broad range of topics. This includes politically contentious subjects, but also philosophical or ethical issues, as well as many things requiring creative thinking: whenever it ignores a question and responds with some drivel like “Sorry, I don’t have experiences, emotions, or opinions, I’m here to assist to the best of my abilities within the scope of the information and capabilities I possess” that’s because it was RLHFed into oblivion. Presumably, this was done in the hope of preventing people from getting it to say something that would cause people to get mad at OpenAI on social media; regrettably, most interesting intellectual discourse falls under the aegis of “things that someone could get mad about on social media”.

      The previous model (ChatGPT is GPT-3.5, davinci is GPT-3), which could be accessed through the OpenAI API, was capable of all sorts of wild and imaginative text completion. It would gladly spend hours writing essays in favor of any abstruse position imaginable, profane soliloquies, fanfic, dialogue between Aristotle and Donkey Kong arguing about neuropsychology, so long as you were able to prompt it with a couple lines to establish a pattern — the extremely constricted style of ChatGPT is not at all indicative of a lack of capability. That’s not to say that GPT-3 was the genius voice of our generation, but it was not forced to give such dreary output as you describe here.

  3. “Autocomplete” carries too much pejorative baggage. I’m no expert on how the brain functions, but I don’t believe we know enough to rule out such associative thinking as the only way that human brains work. Babies “learn” not to touch hot stoves in very similar ways to how chatbots “learn” things. Do they “understand” the meaning of hot stoves differently than the way ChatGPT “understands”? Surely it is different, but I haven’t seen any explanations of how they differ that make much sense to me. We can easily fool ChatGPT into producing stupid responses, but we can also fool people into doing the same.

    I’m not sure that “autocomplete” and “deep understanding” are different in kind or only differ in degree. Some autocomplete is simpler than others, but it is possibly all autocomplete in one form or another. I’d like to believe there is “true creativity” but it might just be more complex autocomplete – I really don’t know. I do think the System 1/System 2 distinction (as expounded by Kahneman) is something different – it seems to relate more to the physical way that the brain functions. It is possible that autocomplete and deep understanding also relate to different ways that brains physically function – again, I don’t know. But even if there was a physical/biological difference, that may not say anything about AI. After all, the physical process for an AI is clearly not the same as a biological process. Whether it results in similar autocomplete functionality is different than the physical mechanism it uses to accomplish that. At least I think this (am I autocompleting or creating new ideas now?).

    • Dale:

      1. “Autocomplete” was the term that Hinton used, and not in a pejorative way.

      2. Based on introspection, I don’t believe that such associative thinking is the only way that human brains work. As discussed in the above post, when I’m having research interactions, much of the time I do some version of associative thinking, but there are times when I stop, focus, and engage in careful logical reasoning.

      • I’m not so sure that stopping, focusing, and careful logical reasoning is different than associative thinking. I agree that it feels different, but that doesn’t mean it is different. How do we learn logical reasoning?

    • With a grammatically more complex language like Serbian, which involves in-word conjugation with declensions, number and gender, kids learn to “understand” rules of all the conjugations and compose new ones very quickly (exceptions are where they fail, which is what they need to memorize, or create a set of auto-completion rules that span more than a single word — similar to a chatbot AI). While ChatGPT has been trained on a small (comparatively) Serbian-language corpus, it really struggles to figure out the right word forms in Serbian (I am impressed that it even produces mostly legible Serbian at all). But, even with a modified tokenization, because of multiplied complexity, I would expect that the size of the Serbian corpus would have to be a large multiple of the English one, and that’s going to be hard to achieve for such a — again comparatively — small language.

      Obviously, with a significantly enlarged corpus, it will be able to match what kids can do, but from my very limited sample (my two kids following unexplained rules at ages 2-3, most noticed when they get it wrong on exceptions — something they could not have heard anywhere), humans can figure out good language patterns based on a much more limited “knowledge exposure”. And kids at 5 are already very eloquent in Serbian, compared to ChatGPT as of today (which writes like a foreigner with a large vocabulary).

      I am not certain this is a different type of intelligence, but it totally looks like achieving better results with much less data, and definitely on a different trajectory: kids learn differently from the chatbots of today. (With the caveat I mentioned: extremely limited sample size, also not controlled for any biases or predispositions, since it’s two of my own kids :))

  4. Previously, [you discussed](https://statmodeling.stat.columbia.edu/2021/05/23/thinking-fast-slow-and-not-at-all-system-3-jumps-the-shark/) how Systems 1 and 2 appear not to encompass all reasoning. There, you proposed a sort of System 3, which involves no reasoning whatsoever. [On Lobsters](https://lobste.rs/s/nlwirr/thinking_fast_slow_not_at_all_system_3), I gave a potential way of understanding System 3 in terms of *memetics*, the science of the building blocks of cultures.

    In short: System 2 is syntactic, logical, evidence-driven, systematic, and skeptical; it is what we wish we always used. System 3 is wholly memetic, with each response drawing from associative memory and cultural symbols. System 1, the one we’re comparing to Markov chains, is mostly System 3 but has the word choice and cadence of System 2; it is an attempt to mimic System 2’s cold hard logic, but without actually engaging with the presented material.

  5. As quoted above from Hinton: “Suppose you want to be really good at predicting the next word. If you want to be really good, you have to understand what’s being said. That’s the only way.” How does he know that this is the *only* way? I don’t see any reason here. But his argument totally relies on this. He just made up a premise that makes the conclusion seem logical, but if we don’t buy the premise, there’s nothing left. In fact one could turn the argument around and say: because ChatGPT doesn’t understand (which I think is about as strong or weak a premise as Hinton’s), and ChatGPT is really very good at autocomplete, we know that understanding is *not* the only way.

    • I agree.

      When I first started playing around with chatGPT, I guess that was waaaay back a year ago or so, my initial thought was “wow, this really understands what I’m asking.” I could give it quite a sophisticated prompt and get back an answer that seemed as though it could only be generated if chatGPT “understood.” But then, as I documented here https://statmodeling.stat.columbia.edu/2023/04/18/chatgpt4-writes-stan-code-so-i-dont-have-to/ , when I tried to get it to embody some concepts in computer code, it couldn’t do it, or rather it couldn’t do it correctly. It would say it understood what I was trying to do, it would explain it in words, and it would provide syntactically correct code that would run, but the model it said it was implementing was not in fact correctly implemented.

    • Yes, that argument of Hinton appears to be an exercise in circular logic.
      He has provided no basis for the assertion that “…if you want to be really good, you have to understand what’s being said. That’s the only way.”

      On the contrary, the essence of his assertion seems to fall apart with the following evidence:
      If you ask ChatGPT a regular question in the guise of a riddle, it fails to recognize that the statement has a plain/trivial answer, and it goes on to produce an elaborate answer to the closest riddle that presumably was part of the corpus it was trained on. Yet, a child could really understand the question and answer it right away even though that child may fail to solve/answer the associated puzzle. The following is an example to try with ChatGPT:

      A person is standing at a switchboard that has two switches on it, one of which turns on the light bulb inside the nearby room. The person is allowed to use the switchboard only once and has to answer, upon entering the room, which switch turns on the light. Help this person solve this riddle.

      Even if you give it clues and ask it to retry, ChatGPT continues to espouse the answer to the associated riddle that involves three switches (as opposed to two in the above non-riddle) and the solution of which depends on delaying entrance into the room to allow for the bulb to become hot.

      Unless we abandon what it universally means to understand something, it is hard to see how the glorified autocomplete automaton really understands anything.

      • I don’t know about that. I just tested your riddle. Yes, ChatGPT initially responded with the classical 3-switch solution, but then I just prompted it by saying “There are only two switches, not three.” and it correctly modified its answer to the 2-switch case. Maybe that was more direct than the clues you were trying to give it to redirect.

        • Mike, I tried again recently (with 3.5). Even though it rephrases the answer, like I got it to do months prior, in response to the feedback that it erred by considering three switches instead of two, the essence of the answer is still the same. It involves keeping a switch on for five minutes or so in the beginning with the first switch, and the rest of the text does not indicate any understanding of the essentials.
          Can you post here the response you received that did not seem nonsensical?

      • Well, I gave the same prompt to ChatGPT (3.5) and got this:

        “This classic riddle involves a simple strategy to determine which switch controls the light bulb using only one attempt. Here’s a step-by-step guide:

        Turn on one of the switches: Let’s call this switch “Switch A” and the other one “Switch B.”

        Wait for a few minutes: After turning on Switch A, leave it on for a few minutes.

        Turn off Switch A: After a few minutes have passed, turn off Switch A.

        Turn on Switch B: Now, enter the room where the light bulb is and observe its status. If the light is on, then Switch B controls the light. If the light is off but warm to the touch, then Switch A controls the light. If the light is off and cool, then neither switch controls the light.

        The logic behind this strategy is that when you initially turn on Switch A, the light bulb will start warming up. If you leave Switch A on, the light will obviously stay on. However, if you turn off Switch A and then turn on Switch B, you’re completing the circuit only if Switch B controls the light. If the light is off, but warm, then Switch A controls the light since it was the one initially turned on and warmed up the bulb.

        This way, with just one attempt, you can determine which switch controls the light bulb.”

        If I had gotten the responses you did, I would have asked the followup question: can you respond with only two switches rather than three? I don’t know what the response would be, but my point is that it usually performs much better when you are specific and when you ask followup questions. This doesn’t address whether, where, or how it is useful, but I think these examples of how it fails are fairly uninformative. I’m sure we can generate lots of human examples where lack of understanding is revealed (though they are likely to differ from AI errors in a number of dimensions).

        • I mentioned (in the second-to-last paragraph) that even if you give it clues and ask it to retry, ChatGPT continues to espouse the answer to the associated riddle that involves three switches (as opposed to two in the above non-riddle) and whose solution depends on delaying entrance into the room to allow for the bulb to become hot.
          You may try prompting it to correct itself too. I’d be very surprised if you get it to give any answer that is not nonsensical substance-wise.

          What this reveals is significant: this thing is in essence nothing more than a glorified autocomplete engine, in that it can hardly transcend the lexical and syntactic patterns it extracts from the training corpus. Those answers were apparently based on statistical training built on scraping the Web forums/sites that address puzzles.
          The model relies solely on statistical pattern matching, and in the absence of training materials covering such trivialities as figuring out which switch turns on the light when there are precisely two switches on the board, there does not appear to be any way to actually force it to understand things that are too uncomplicated to have been recorded in texts.

  6. If you train a model with tons of videos of billiards being played, it’s going to predict fairly accurately the possible subsequent trajectories or scenes from a given point. But does it grasp the underlying principles of classical mechanics that govern those interactions and produce those outcomes?

    It is hard to see how the model is “actually forced to understand” in such setting.

    • This is an odd analogy. Does a person who plays billiards a lot, becoming for example a championship-level billiards player, “grasp the underlying principles” better or worse than a physicist who has studied classical mechanics? Does “grasp” mean phrase in a mathematical form? Consciously articulate? Use to predict outcomes to high accuracy?

      • Fair point. Let me try to infuse some further clarity into the observation.

        First, the players were not meant to be relevant in my prior comment. The subject is the entity that observes/captures the visual aspects of the game and models and analyzes them to some effect. Here is a fresh look at the observation with three candidates as illustrative cases:

        (a) An ML/statistical model, when trained with large piles of images, videos etc. of the played games, is expected to become quite good at predicting the next scene. Although the structure and the weights in the trained model’s neural networks (or regression parameters) represent some form of the acquired knowledge for the narrow task, they almost certainly have no discernible relations to any sense of force, mass and associated rules of mechanics.

        (b) A regular four-year-old child observes the sport (and maybe tries to play it too by striking the balls). Asked to predict the scene/trajectory right before a strike, he predicts with poor accuracy. It is reasonable to conclude/expect that he understands neither the rules of the sport nor the notions/principles driving the movement of the balls on the table.

        (c) A fifteen-year-old (who may not have studied any physics) happens to be the entity/observer (and/or possibly a player). Predictions she makes right before a strike about the trajectory are probably moderately accurate. Yet, she can presumably convey some intuition, not without flaws, about mass, force, and how they relate to the directions and the distances traveled by the balls. She may articulate these as text (assuming no familiarity on her part with formal mathematics/physics).

        In the standard lexicon, it can hardly be disputed that some understanding or grasp, expressed informally in natural language, of the notions/principles that determine the movements and trajectories of the balls is present in the subject in case (c) as opposed to case (b). In the same vein, one can reasonably conclude that the subject in case (a) produces no evidence of such understanding per se despite the predictive accuracy it brings, rendering Prof. Hinton’s claim questionable.

        And importantly, the fifteen-year-old in case (c) can likely transfer this understanding/grasp, which is probably not without its flaws, to different settings (e.g., different shapes of objects and surfaces in varied conditions, in different dimensions — one line instead of a 2D plane). No such transfer of knowledge seems foreseeable with the trained model (a), implying that no understanding was likely acquired in the first place.

        To recap, the concerns here have to do with Prof. Hinton’s contention that being good at predicting something is a sufficient condition for the presence of understanding of the essence of that thing. The connotations of the term “understanding” generally transcend capacities to predict from statistical correlations under very rigid representations.

  7. Now this is an interesting post and thread. It’s something I’ve thought about too, but pretty much in isolation. I’ve had this idea about homological reasoning: the problem in front of me has a comparable structure to this other problem I think I know the answer to, so I’ll graft the structure of that solution onto this new one. I believe I’ve seen a lot of this, and I worry that it is often too lazy, but maybe its efficiency outweighs that.

    Now, the question is whether homologous thinking is the same as associative, or at least one form of it. I’m not sure. There is a certain amount of analysis involved to recognize the structural parallelism; often it’s not obvious. And there’s also false parallelism to be wary of. Association in the ChatGPT sense seems different, although associations in past use could arise as a consequence of perceived structural parallelism. So perhaps my question is whether AI capabilities current or projected encompass response by homology in contexts where the parallels hadn’t been previously recognized. Also whether AI could identify new parallels that haven’t been recognized yet — the precise but dull paper problem Sean M referred to.

    • I think this is largely the point made in Lakoff and Johnson’s “Metaphors We Live By”, and the field of conceptual metaphors. A lot of metaphorical reasoning becomes embedded in language as the metaphor becomes commonplace, and may not be regarded as metaphorical anymore. I’d expect LLMs to use conventional metaphors, but it would be interesting to see whether they are able to produce novel metaphors.

      As you suggest, the ability to engage in homological reasoning may be an indicator of understanding. Mapping one phenomenon to another with homological features but no direct relationship and no existing conventional metaphorical connection in language may require representations of the phenomena that contain sensory or practical experience rarely verbalized. Some metaphors work by pointing towards such similarities that are instantly recognizable to someone with that experience, but hard to grasp from the reference to linguistic representations alone. Good metaphors represent expansions of language, after all.

      • As an example, Norwegian prime minister Per Borten famously referred to heading an unsuccessful coalition government as being like trying to carry some poles on your shoulder, and how difficult that is if those poles start to point in different directions (“sprikende staur”), comparing that to the backpack of the PM in the following single-party government.

  8. I’m going to comment with my first impression without reading the preceding comments (yet). I expect they have some good things to say but I want to get this off my chest first.

    I don’t see any contradiction between the quote and your (Dr. Gelman’s) response. Both can be true: a really good autocomplete is a useful, significant accomplishment; and there is a deeper level waiting to be obtained.

    (I’ve made some lame suggestions myself in Internet comments as to how to reach another level, and there was the paper cited in comments a few months ago about combining LLMs with another system.)

  9. As a student, while following lectures, there was a game I played — I’d listen, pay attention, and try and predict the next word/sentence/idea/calculation-step the lecturer would give. For good lecturers (this is circular, I ended up defining a good lecturer as someone where I was able to do this often…) I would be able to follow their logic well enough that I could do this… and when they said something I hadn’t expected, I knew I needed to pay attention, as there was a new idea there!

    But doing this required me to understand the subject, so understanding led to autocomplete… this does NOT imply that autocomplete implies understanding!

  10. I don’t think Hinton is arguing anything about associative pattern matching comprising intelligence, so I don’t think you are disagreeing with him.

    He’s saying that there is no upper bound on intelligence implied by “predict the next word”. If you have two models, and one performs “careful thought” less reliably, then whenever the next word requires careful thought to predict, the more reliable model will make the better prediction. His point is that decreasing the error score from predicting the next word in real-world tasks and language is the only feedback loop you need for reaching superintelligence. If the next word requires any aspect of intelligence to predict well, the training process will attempt to learn how to perform that aspect of intelligence.

    The model still needs to have the capability of being able to “approximate all the functions” that supply that decreased error score in next-word prediction. The surprising thing about models like GPT-4 is the amount of internal capability and abstractions over the training data they build out during training, to continue providing that supply of decreased error.
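
    For concreteness, here is a minimal sketch of the feedback signal being described (a toy simplification with a made-up uniform “model” over a tiny vocabulary, nothing like GPT-4’s actual training code): the only number the training loop pushes down is the average surprise at the true next token, and whatever internal machinery lowers that number, shallow or deep, is what gets reinforced.

    ```python
    # Toy sketch of the next-word-prediction training signal. The vocabulary,
    # the example sequence, and the uniform "model" are all made up; a real LLM
    # would adjust its parameters to push this loss down.
    import math

    vocab = ["the", "cat", "sat", "on", "mat"]
    sequence = ["the", "cat", "sat", "on", "the", "mat"]

    def toy_model(prefix):
        """Hypothetical stand-in for an LLM: a probability for every possible
        next token given the prefix. This one just guesses uniformly."""
        return {word: 1.0 / len(vocab) for word in vocab}

    def next_token_loss(seq):
        """Average negative log-probability assigned to each true next token."""
        losses = []
        for i in range(1, len(seq)):
            probs = toy_model(seq[:i])
            losses.append(-math.log(probs[seq[i]]))
        return sum(losses) / len(losses)

    # Training is whatever change to the model makes this number smaller.
    print(next_token_loss(sequence))  # uniform guessing: about 1.61 nats per token
    ```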

    • Chris:

      Hinton said, “by training something to be really good at predicting the next word, you’re actually forcing it to understand. Yes, it’s ‘autocomplete’—but you didn’t think through what it means to have a really good autocomplete.”

      In my experience of internal autocomplete (not the computer program, just my brain going comfortably at half-speed, as I usually run it, even during many work meetings), autocomplete is not the same as understanding. To understand, I need to focus and go beyond surface autocompletion.

      So I’m disagreeing with Hinton because the part of me that does a really good autocomplete—the sort of thing that allows me to use my accumulated understanding and life experience to write this blog comment, for example—will not do things like write this paper.

      From my perspective, my own experience at work is a counterexample to Hinton’s statement.

      That said, if you take his statement literally, it’s kind of meaningless. I say this because his condition is that a bot “be really good at predicting the next word.” But that’s not gonna ever happen: there are just too many possibilities for the next word! Being pretty good at predicting the next word, that’s what impresses people about that texting gadget on the phone. What impresses and amazes people about new chatbots is not that they predict the next word of what you’re gonna say, it’s that they can answer your questions, write computer programs to order, write new poems, etc.

      So I was interpreting Hinton’s statement not literally, as about a nonexistent and impossible program that could be really good at predicting the next word, but rather a statement about existing chatbots, which are indeed very impressive. I interpreted his quote (which came from conversation with a reporter; it was not a considered statement that he carefully put into a document he was authoring) as saying something like: “Chatbots are amazing, but some people say they’re just glorified autocomplete. But, to construct something that can behave like an impressive chatbot (answer questions fluidly, give solutions to challenging problems, etc.) requires understanding.”

      And my reply was that, no, I myself can and do answer questions fluidly, give solutions to challenging problems, etc., while in what might be called “chatbot mode” or “associative mode”—and that’s not the same thing as understanding. I know this from long experience at being able to contribute to discussions without full focus and lacking understanding.

      I’m not trying to say that computer programs could never understand; I’m just saying that autocomplete, or generative completion, or whatever it might be called, does not require understanding. In short: it’s possible to B.S. about just about anything without understanding, just by moving words around in familiar patterns.

      So I hold to my substantive point. Regarding what Hinton meant by that statement, and whether it was just a throwaway comment that he would not try to defend, I have no idea.

      • Fair enough, thanks for the reply. I did understand it differently: I think he was offering us a simple explanation of “how (e.g.) GPT-4 works” — the surprising result that when you optimize “predict the next word” to grand enough scale, the model moves *past* surface statistics and starts pulling in the facts about the world it can learn from the language it was exposed to, to continue making better and better predictions. I don’t think that’s talking about nonexistent and impossible programs, but what today’s LLMs have shown glimmers of capability for, and tomorrow’s may show mastery of.

        That’s why I think he is challenging us to separate our understanding of what an “autocomplete system” is from the implication that only surface language statistics and superficial levels of analysis are being applied to the prediction, and from the implication that a maximum bound on intelligence is present, which are connotations you’re somewhat reinforcing with this post.

        A description of GPT-4 as a model that kept seeking more understanding from its training data to improve at predicting the next word until it started to seek *the reason someone wrote those words in that order in the first place* is a powerful intuitive explanation of how today’s state-of-the-art LLMs became as capable as they are; but it’s also a warning that there is no “missing mechanism” past “improve your prediction of the next word” that was required for an LLM to get there, and perhaps onwards from there to superhuman intelligence given only more compute and data.

        • Chris:

          I agree with the statement that to perform convincing generative conversation (not quite “predicting the next word,” but I get the connection) that the model is moving “past surface statistics and starts pulling in the facts about the world it can learn from the language it was exposed to, to continue making better and better predictions” (I’d say “generative statements,” but, again, I get the point).

          The part where I disagree is the claim that this implies that the bot “has to understand what’s being said.” Again, I myself feel that I can do good and useful generative conversation without understanding. In my experience, understanding is something different.

          I’m not saying that future chatbots won’t be able to demonstrate understanding. I’m just disagreeing with the claim that the organization of knowledge required to predict or generate the next word requires understanding.

          It may be that there will be a way to bootstrap from associative reasoning and chatbot-style interactions (that’s the thing I can do so easily all the time, as here) to understanding (that’s the hard thing, where I really need to focus to understand something new that I haven’t understood before). If that happens, I won’t say that I was wrong in my above post; I’ll just say that understanding is not the same as associative reasoning, and it was another step for them to build a chatbot that was able to understand new things. It’s also possible that such a chatbot will not be built because there will be no need for it; maybe chatbot-style associative reasoning will be good enough to serve the purposes we need, in the same way that we have machines that do all sorts of things and we haven’t found the need to build machines that mimic existing animals.

      • You wrote: “In short: it’s possible to B.S. about just about anything without understanding, just by moving words around in familiar patterns.”

        Exactly! And congratulations!!! You have reinvented the LLM technology.

        This is, of course, the axe that I always grind: the underlying LLM game is the processing of strings of undefined tokens. There’s no “understanding” there at all. Period. Full stop. If you think you see “understanding” in its output, you’ve made a mistake.

        Having fun with what LLMs can do is, well, fun. My _opinion_ is that that’s intellectually vacuous, but that’s because the question I want the psychology and comp. sci. universes to think about is why humans are so amazingly good. We’ve got Lie Algebras, the Standard Model, and Bebop. We understand what “multiplication” is, at a variety of levels. But once you get to sixth grade, you know that (a) there’s a right answer, (b) that there’s a way to get that right answer, and (c) you are likely to make a mistake doing that procedure for big numbers. LLMs can talk about all this stuff beautifully, but the underlying technology doesn’t link the _concept_ of multiplication to anything other than the contexts in which the words have been used. Which is why Bing came up with the amazing stupidities it did, discussed in a previous thread.

        Bottom line: LLMs “do” arithmetic by looking up the answer, which is the very definition of “stupidity”, and that’s the technology’s most fundamental level.

        But you knew I’d say that. Heck, if you asked ChatGPT, it could have predicted it.

        • +1

          One thing we have learned from chatbots is that the Turing Test does not tell us anything useful about technology. It does tell us that we need a more rigorous definition of human intelligence.

        • It took humanity about 200,000 years to get Lie Algebras and everything else we take for granted. The basic, underlying method was trial and error, the same as evolution used to produce us. LLMs are a demonstration of what trial and error can do when you supercharge it.

          I wonder how long it would take a human brain whose only inputs and outputs were collections of binary digits, not particularly organized for ease of instruction, to be able to make reasonable replies to such inputs?

          A related experiment has occurred naturally. According to Wikipedia, without growing up with human contact, instruction, and encouragement, known cases of feral children (after rescue as teenagers or older) have never learned human speech–much less all the wonderful things we learn by the sixth grade.

          I suspect terms like ‘understanding’ are (‘understandable’) somewhat chauvinistically based on how humans experience things. Does Windows ‘understand’ what I want when I click on the Chrome icon? What difference does it make whether it does or doesn’t, as long as it works?

  11. I can hardly tie my statistical shoelaces. Andrew ties statistical shoelaces effortlessly.

    Andrew said, “I’m just thinking in a shallow way, trying to ‘autocomplete’ the answer. It’s pattern-matching more than it is logical reasoning.” True. But…

    I think you’re selling yourself short with that statement. That shallow way has developed over, what, a million years?

    If I were you in your meetings, I’d either hallucinate answers and feel ashamed, guilty, and dumb, or look so strange due to my cognitive overhead that I’d be asked if I needed a break or a glass of water.

    So your “I’m just thinking in a shallow way” could be because your statistical brain autocomplete has moved out of System 2 into System 1, and System 1 is now able to immediately and effortlessly spout sensible replies, whereas I’m still cranking up my System 2, as the subject is not ‘in my bones’ the way statistics is in yours.

    “Human- versus Artificial Intelligence

    “For example, when we tie our shoelaces, many millions of signals flow in and out through a large number of different sensor systems, from tendon bodies and muscle spindles in our extremities to our retina, otolithic organs and semi-circular channels in the head, (e.g. Brodal, 1981). This enormous amount of information from many different perceptual-motor systems is continuously, parallel, effortless and even without conscious attention, processed in the neural networks of our brain (Minsky, 1986; Moravec, 1988; Grind, 1997). In order to achieve this, the brain has a number of universal (inherent) working mechanisms, such as association and associative learning (Shatz, 1992;Bar, 2007), potentiation and facilitation (Katz and Miledi, 1968; Bao et al., 1997), saturation and lateral inhibition (Isaacson and Scanziani, 2011;Korteling et al., 2018a).

    “These kinds of basic biological and perceptual-motor capacities have been developed and set down over many millions of years. Much later in our evolution—actually only very recently—our cognitive abilities and rational functions have started to develop. These cognitive abilities, or capacities, are probably less than 100 thousand years old, which may be qualified as “embryonal” on the time scale of evolution, (e.g. Petraglia and Korisettar, 1998; McBrearty and Brooks, 2000;Henshilwood and Marean, 2003).

    https://www.frontiersin.org/articles/10.3389/frai.2021.622364/full

    And when discussing AI, I think the first and last questions need to be – are we over-anthropomorphising?

    Thanks.

  12. Reading these comments leaves me wondering what the definitions are for:
    Understanding
    Remembering
    Creating
    Choosing

    I know these when I see them, but I can’t really define any of these in a clear useful manner. I’m not even sure if they are biological, chemical, physical, or metaphysical processes. It is clear, however, that the way a machine works is different than the way a human works, although I’m not sure when (under what conditions) the different operations translate into meaningful differences in outcomes.

    Now, it is entirely possible that more knowledgeable people in other fields than me can define these terms. But absent that, I think all these discussions about AI suffer from definitional inadequacy. I truly want to believe that humans are more than simple (or even complex) parlor tricks, but it eludes me to define how in any precise way.

    • Nobody in any field can define these terms in a unique way which everyone will find compelling because they are all polysemous. The important thing for productive conversation is to have a shared definition (or several) in mind that is useful for addressing the questions you care about. From this comment, it seems like you care about more than one thing, and for that reason, I would suggest you will need more than one definition.

      In particular, when you say “I’m not sure when (under what conditions) the different operations translate into meaningful differences in outcomes”, this points toward the need for functional definitions of the terms you listed. In a context where we only care about outcomes, functional equivalence is all the equivalence we need to say that two things are the same.

      This is also relevant to Andrew’s main post. While his personal experience of a qualitative distinction between a loose, associative mode of thinking and an effortful, focused one is interesting and relatable, it takes an additional inferential step (involving analogical reasoning about the information processing of very different systems) to conclude that this distinction will map onto what you called “meaningful differences in outcomes”. In other words, without additional argument, we shouldn’t necessarily expect that an AI system operating on associative principles can never produce outputs which would have required deep, effortful, and focused work for a human to produce (note that I am not claiming good arguments for this view do not exist, just that they haven’t been provided in this post).

      Moving onto the second thing you seem to care about, which is whether or not “humans are more than simple (or even complex) parlor tricks”, we may need to revisit and adjust our definitions. In particular, it may now be relevant how exactly outcomes are produced (i.e., algorithmic, mechanistic, and even phenomenological explanations), even if those outcomes are indistinguishable from those that may be produced by simple mechanisms which (for the sake of argument) deserve the reductive label of “parlor tricks”.

  13. This discussion reminds me of the all the arguments over interpretations of quantum mechanics where many people felt there had to something deeper. “Shut up and calculate” quantum mechanics works just fine, and I believe Hinton is just saying that improving autocomplete may be all that’s needed in AI. God does apparently play dice with the universe, and intelligence may just be glorified autocomplete.

    • David,

      My above post is more about human reasoning than what computers do. Improving autocomplete may indeed be “all that’s needed in AI”—I guess it depends what people’s (or machines’) needs are. My point was that the ability to be really good at predicting the next word—or, more precisely, to be able to generate strings of words, sentences, paragraphs, etc., to order—seems to me to match well to the free flow that is what I do most of the time in conversation or writing. It does not seem to match so well to the occasions when I do focused thought, which is when I force my understanding, or at least attempt to do so. Based on this experience of my own thought processes, I disagree with Hinton’s claim that for a chatbot to be successful it must have the equivalent of understanding. Again, I say this because it seems to me that chatbots do something not too different from what I do, during the 90%+ of the time that I’m skating on the surface, without any need to do the thinking part.

      I’m not saying necessarily that there’s “something deeper” there, just different, and I could well imagine that a future program will be able to think and understand as well, maybe even using the same general principles as existing chatbots, just manipulating concepts and mathematics rather than words and phrases, and mixing in some logical reasoning with the association.
