Body language and machine learning

Riding on the street, I can usually tell what cars in front of me are going to do, based on their “body language”: how they are positioning themselves in their lane. I don’t know that I could quite articulate what the rules are, but I can tell what’s going on, and I know that I can tell because I make predictions in my mind which are then confirmed by what the cars actually do. (Yes, there could be selection bias, so if I really wanted to check for sure, I should record my guesses and check the error rate. Whatever.)

Anyway, the other day I was thinking about how this is an example of machine learning. No causal inference (sorry, Judea!), just pure prediction, but “machine learning” in that my brain has been passively gathering data on car positioning for the past few decades, and at some point it decided to associate that with driving decisions. I guess it was motivated by me trying to figure out where to go in particular situations. So in many ways this is exactly the kind of problem we’ve been hearing about in discussions of artificial intelligence, with the usual steps (see the toy sketch after the list):

1. Open-ended data gathering (“big data”),

2. Unsupervised learning with undefined categories (that would be “cluster analysis”),

3. Supervised learning with defined categories once I become conscious of the categorization that I’ve been doing passively until then,

4. Refinement: Once I’m aware of the parameters of this inference process, I can use more active processes to flag the misclassifications and ambiguous predictions and use these to refine my predictions (“diagnostics” and “evaluation”).
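
To make these steps concrete, here’s a minimal sketch in Python. The data, features, and thresholds are all made up for illustration; I’m not claiming my brain literally runs k-means and logistic regression.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Step 1: open-ended data gathering (here, fake "body language" features:
    # lateral offset in the lane and recent drift rate for each observed car).
    n = 500
    lateral_offset = rng.normal(0, 0.5, n)               # meters from lane center
    drift_rate = rng.normal(0.4 * lateral_offset, 0.2)   # meters per second

    X = np.column_stack([lateral_offset, drift_rate])

    # Step 2: unsupervised learning with undefined categories (cluster analysis).
    clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print("cluster sizes:", np.bincount(clusters))

    # Step 3: supervised learning once the categories become explicit.  Pretend we
    # later observed what each car actually did (0 = stayed in lane, 1 = changed
    # lanes), loosely tied to the drift rate.
    y = (drift_rate + rng.normal(0, 0.2, n) > 0.3).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = LogisticRegression().fit(X_train, y_train)

    # Step 4: refinement -- flag misclassifications and ambiguous predictions.
    probs = clf.predict_proba(X_test)[:, 1]
    misclassified = (probs > 0.5) != y_test
    ambiguous = np.abs(probs - 0.5) < 0.1
    print(f"error rate: {misclassified.mean():.2f}, ambiguous: {ambiguous.mean():.2f}")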

I’ve read about these steps in other problems, from image identification to crime detection. But somehow it all becomes more real to me in the context of this everyday example. In particular, I’m aware of the different steps, from passive data collection, to the unconscious identification of a pattern, through to the conscious use and refinement of the procedure.

It also strikes me that there is an analogy between consciousness (for humans and animals) and, hmmm, I don’t know what to call it . . . maybe “active programming” in machine learning.

Let me put it another way. Statistical methods, as constructed, are entirely conscious: design, measurement, data collection, and inference are all problems that the user must choose to solve. Certain statistical procedures have been automated enough that they could be applied unconsciously: for example, a computer could compute correlations between all pairs of variables, look at distributions and scan for outliers, etc., in the same way that human or animal visual systems can find anomalies without the conscious choice to look for them.
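
Here’s a minimal sketch of that kind of automated scan, with made-up data and an arbitrary outlier rule, just to illustrate the idea:

    import numpy as np
    import pandas as pd

    # Made-up dataset; in practice this would be whatever happens to be lying around.
    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "speed": rng.normal(50, 10, 200),
        "lane_offset": rng.normal(0, 0.5, 200),
        "following_distance": rng.exponential(20, 200),
    })

    # Correlations between all pairs of variables.
    print(df.corr().round(2))

    # Scan each variable's distribution for outliers (here, a crude z-score rule).
    z = (df - df.mean()) / df.std()
    outliers = (z.abs() > 3).any(axis=1)
    print(f"{outliers.sum()} rows flagged as potential outliers")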

Machine learning is a little different. There are lots of conscious machine learning procedures—various nonparametric algorithms for prediction, classification, inference, decision making, etc.: basically, these are statistical methods, but maybe we call them “machine learning” because they are new, or because they are nonparametric, or because they’ve been developed by computer scientists rather than statisticians, or because they work with big data, etc. But machine learning and AI are also associated with automatic or background or unconscious processes, such as processing big data without specific goals in mind (sure, you could argue that projects such as the General Social Survey have this feel to them too) or looking for patterns in the background, as my brain did with the car-positioning problem.

P.S. The above are my conjectures and loose thoughts. There could well be a literature on all of this. If so, feel free to inform me about it in the comments.

20 thoughts on “Body language and machine learning”

  1. A fantastic example that I will steal for future use in classes!

    Not a literature survey, just some quick remarks:

    Some of the most popular techniques in information retrieval (before it got subsumed by machine learning) were originally developed to help explain human semantic memory, including Landauer & Dumais’ Latent Semantic Analysis as well as Griffiths, Steyvers, and Tenenbaum’s topic modeling. The conceptual connection is clear: in info retrieval we are trying to engineer a system to locate “knowledge” that is appropriate to a context; in cognitive science we are trying to reverse-engineer how a biological system solves a similar problem.

    Plus, there’s a reason they’re called “neural” networks–they were originally developed to model systems built out of real neurons!

    Finally, I’m not sure that *anything* in machine learning can be thought of as “unconscious”, except in the sense that someone is unaware of the consequences of their choices. These include the choices of what data to analyze/exclude and what features and measurements are relevant, as well as the choices of which analysis techniques to apply. All of these choices circumscribe the set of questions one can ask of the data, as well as the form of any possible answer. But while someone might be “unconscious” of those consequences, they are still consequences of a set of choices rather than something that “emerges” from the structure of the world.

    • gec said,
      “All of these choices circumscribe the set of questions one can ask of the data, as well as the form of any possible answer. But while someone might be “unconscious” of those consequences, they are still consequences of a set of choices rather than something that “emerges” from the structure of the world.”

      Yes, yes, yes! And basically the same thing applies to any kind of statistical analysis — the conclusions depend on the choices made in gathering and analyzing the data.

  2. One thing that I like about this analogy is the light it sheds on issues that arise both in human decision-making and algorithmic decision making. The human brain does an incredible job of recognizing patterns and responding to them (as with driving), but only when there is sufficient input to “learn” to make good choices. When the data is more limited, or selectively chosen, then humans are prone to bias and inefficiency. Algorithms have the same problem – when the input data is not rich enough, then the algorithms also are prone to bias and inefficiency. The areas in which the data is or is not adequate, however, probably differ between the two contexts. I’d be interested in research that focuses on examining what these contexts are.

    • Dale said,
      “The human brain does an incredible job of recognizing patterns and responding to them (as with driving), but only when there is sufficient input to “learn” to make good choices. When the data is more limited, or selectively chosen, then humans are prone to bias and inefficiency. ”

      Yes, yes, yes!

    • “When the data is more limited, or selectively chosen, then humans are prone to bias and inefficiency.”

      It would be interesting to create a problem where one could test – presumably someone has tried this – the difference in bias between conscious and subconscious decisions or pattern recognition with limited or insufficient data. I’d guess that when we fill in the blanks subconsciously, we’re less prone to bias, but maybe that also depends on the problem.

  3. Prof. Gelman,
    Hi there I hope that you and family are well.
    You said…
    “In particular, I’m aware of the different steps, from passive data collection, to the unconscious identification of a pattern, through to the conscious use and refinement of the procedure.” Isn’t this what Amos Tversky and Daniel Kahneman explained in their work, i.e., System 1/System 2? I am assuming that you are already familiar with their work.

    My robot planning programs, which I designed, have no consciousness. Yet they avoid obstacles and do stuff predicated on conscious models of physics and stats (pulling, pushing, orientation, sensory feedback, data fusion from video). You mentioned something important in that “my brain has been passively gathering data on car positioning for the past few decades”; I think that’s the key. I know of no NN that hasn’t been trained in some fashion.
    My work was very constrained to meet a goal; you are unconstrained… you can choose not to avoid one obstacle in “favor” of some other decision that may have an adverse or salubrious outcome.

    Unconscious computing, hmmm… I have to think about that. In the ’80s, there were efforts at discovery/chunking, if that is what you’re alluding to.
    All the best.

  4. ‘But machine learning and AI are also associated with automatic or background or unconscious processes, such as processing big data without specific goals in mind.’ That last part about no specific goals implies that ML/AI can determine their own goals, but it seems we’re still kind of far off from that, at least in mainstream ML practice. As far as I can tell, the success of methods like NNs often comes with very well defined problems. Though areas like meta-learning would seem to get closer to that, where you want to ‘learn’ the learning process.

  5. Professor Gelman,
    I am amazed that the deeper some CoCoSci researchers go down what is indeed a whole “literature on this,” the more the “Bayesian Brain” hypothesis seems to match reality. If you haven’t seen this great article–Friston et al 2017, Neural Computation–it lays out an entire framework related to your post, including a biologically-mechanistically-plausible Bayesian approximation that unifies “…active inference, combining earlier formulations of planning as inference…with Bayesian model averaging…and learning…Importantly, action (i.e., policy selection), perception (i.e., state estimation), and learning (i.e., reinforcement learning).” A must read!

    https://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00912

  6. My car-prediction algorithm once noticed a car sneaking out from a side-street, ready to make the jump out in front of me as I bombed down the hill on my bike. I shouted at the driver and pointed at him. While I was doing that a truck coming the opposite direction turned left in front of me and I hit it and flipped over the top of it.

    The algorithm worked, but it generated a behavior that exacerbated rather than mitigated the danger.

  7. The book “Surfing Uncertainty” is a fascinating overview of the predictive processing model of the brain, which ties together computational modeling, neuroscience, and psychology research to argue that perhaps all our mental processes (perception, memory, cognition, dreaming) are governed by approximately Bayesian predictions driven top-down from a generative model we have learned through processes like your intuition.

    https://www.amazon.com/Surfing-Uncertainty-Prediction-Action-Embodied/dp/0190217014

  8. The only reason we can race is that we anticipate based on what must not or should not happen. You can imagine yourself in a car moving at 200 mph, with other cars only inches to a few feet away, so any movement outside the narrowly anticipated can be fatal. Speed focuses that choice: the clock for examining changes in the perceptive field runs very fast and is highly attuned to anything not correct. This allows drivers to move inches closer or farther away, as though they aren’t doing this at such high speed, at such a finely attuned reactive level.

    It’s remarkable to study birds in flocks and fish in schools. The degree of reliance on what the others are doing simplifies each individual’s choices. For example, when pigeons are attacked on the ground, say by an owl or hawk, they explode in every direction, which they determine by angling to stay apart, using their entire perceptive apparatus (air currents, sounds, sight), so they randomize the effect by local choice. By contrast, it’s fascinating to watch flocks work their way around to find the approach to where they’re heading.

    The ability to recognize change from moment to moment is the ability to recognize what is not the answer, not desired, not good to eat, not a sensible place to cross.

    When I was a kid, I watched a CBC documentary about Vietnam in which they followed a mission and filmed the soldiers interacting. One of the most interesting parts, and the reason it came to mind, is the argument that erupted over crossing an exposed field. The officer in charge wanted to cross, but the men refused because they felt this looked like the kind of place where they’d be ambushed. The officer felt this was the best solution, and IMO implicit in his conclusion was that he had to be at point x by a certain time, so he was influenced into believing that this was too obvious a place for an ambush. They took the long way around. (By men, I probably mean the direct subordinates, not the entire platoon, but I don’t remember.) They were arguing over whether an anticipation would occur, one with horrific results, based on how they perceived whether a condition of normalcy would continue. The potential of death acted like speed in a race car.

  9. It is a good example of how we ourselves are “black box” models. It is strange that we’ve come to expect a level of transparency in machine learning that we do not even have for ourselves. At least in machine learning, we can simulate data and get an idea of what is going on in a “black box”.
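
    For instance, here is a toy sketch of that kind of probing; the model, the data, and the one-feature sweep are all made up, just a crude stand-in for real black-box diagnostics:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(2)

        # Fit some black-box model on made-up data.
        X = rng.normal(size=(1000, 3))
        y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.5, 1000) > 0).astype(int)
        model = RandomForestClassifier(random_state=0).fit(X, y)

        # Probe it: sweep one input over a grid while holding the others at typical
        # values, and watch how the predicted probability responds.
        grid = np.linspace(-3, 3, 25)
        probe = np.zeros((25, 3))   # other features held at zero (their mean here)
        probe[:, 0] = grid
        for x0, p in zip(grid, model.predict_proba(probe)[:, 1]):
            print(f"x0 = {x0:+.1f} -> P(y = 1) ~ {p:.2f}")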

    The main difference between your example and machine learning, for me, is the amount of data required. We require far less data to conclude what that driver will do. I find this sort of research interesting: https://arxiv.org/abs/2009.08449

  10. Your example reminds me of musings over the decades within the AI community about the nature of intelligence. There was a period in history when computers could do a formidable job of tackling (at the time) incomprehensibly large mathematical systems. I think it fair to say that at the time this was considered a feat of intelligence and not mechanical. John McCarthy and others proposed the Dartmouth conference in 1955; their proposal stated, “The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”

    This began an approach that put symbolic reasoning ahead of numbers, and argued probability was “epistemologically inadequate” for artificial intelligence. Despite this, AI repeatedly returned to statistics-like techniques to accompany these systems – see a fascinating (some would use coarser words) critique of probability and statistics in “The MYCIN experiments” by Buchanan and Shortliffe that launched a thousand ships.

  11. Artificial ‘Intelligence’, ‘Neural’ networks, Deep ‘Learning’, and Computer ‘Vision’ are just a bunch of borrowed marketing terms.

    It is really the job of cognitive neuroscientists, attention researchers, and the like to be in the driver’s seat at the forefront of those ‘new’ disciplines, instead of overly enthusiastic computer ‘scientists’, software developers, and such.
    I don’t want to be mean, but there is an enormous body of knowledge in the disciplines listed above that needs to be understood first.

    We are talking about roughly 86 billion neurons and an endless number of interactions in that electro-chemical wetware we are trying to model. What is loosely called ‘consciousness’ is basically a function of (mostly visual) attention, and self-awareness is still a black box. I’m not talking about the p-hacking-prone ‘couch’ psychology which is frequently (and rightfully) criticized here. Cognition/attention/neuroscience research is a different animal.

    Most of our day-to-day life is driven by unconscious activities. A small fraction of that is brought to our ‘awareness’ (consciousness), so we think we are making rational decisions. Most of Andrew’s car-behavior prediction was done outside of his awareness, but it didn’t seem like that to him.

    Until we understand and build a self-aware system that is afraid to die and has what we call attention (taking only what we think is relevant at the time and ignoring the rest), all those catchy marketing phrases are going to stay just that.

    A good non-hype book on the current state and limitations of brain research:

    https://www.amazon.com/Future-Brain-Essays-Leading-Neuroscientists-ebook/dp/B00M5JXUEM/ref=sr_1_1?dchild=1&keywords=the+future+of+the+brain&qid=1603745715&sr=8-1

  12. I think this is a great example of machine learning, and it also points to the reason why “self-driving” cars are actually quite dumb today. It’s an open secret in the tech industry that “self-driving” by Tesla, Waymo, and the other big players is fundamentally driven by long, hard-coded C++ programs. The “ML” is mostly confined to computer vision – object identification and so on – where, granted, huge progress has been achieved in recent years. But current “self-driving” cars lack the ability to learn environment awareness and decision making. “Self-driving” cars of today are not capable of anything resembling the process you describe.

    One other example: deciding how fast to drive through a corner. Often, we slow down below the speed limit for a corner. The way a human driver approaches the problem is this: we come up with a rough estimate of the speed by visually inspecting the corner, taking cues from road signs, or both. Once in the corner, we use our perception of lateral acceleration and car feedback to continuously correct that estimate. This method is used, with different degrees of success, by everyone from the teenager learning to drive for the first time to professional drivers. How does a Tesla do it today? Well, maps nowadays come with large amounts of information, including road curvature. Based on that information, there is a hard-coded map between curvature and speed, with a potential correction for wet conditions. Ernst Dickmanns points this out as well: “Rather than truly “seeing,” Dickmanns said they rely on what he calls “confirmation vision.” That means they might work well on roads and areas that have been extensively mapped but fail when it comes to less controlled environments.” (source: https://www.politico.eu/article/delf-driving-car-born-1986-ernst-dickmanns-mercedes/)
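
    Just to make the hard-coded approach concrete, here is a toy sketch (my own illustration, not anything from an actual vehicle stack). It caps lateral acceleration at an assumed friction limit, with a smaller coefficient guessed for wet roads:

        import math

        G = 9.81  # gravitational acceleration, m/s^2

        def corner_speed_cap(curvature, wet=False, speed_limit=float("inf")):
            """Toy curvature-to-speed map: cap lateral acceleration at mu * g.

            curvature is 1/radius in 1/m, taken from map data; the friction
            coefficients are illustrative guesses, not real calibration.
            """
            mu = 0.4 if wet else 0.7   # assumed available friction
            if curvature <= 0:         # straight road: only the posted limit applies
                return speed_limit
            v_max = math.sqrt(mu * G / curvature)   # from v^2 * curvature <= mu * g
            return min(v_max, speed_limit)

        # Example: a 100 m radius corner (curvature 0.01 1/m) with a 30 m/s limit.
        print(corner_speed_cap(0.01, wet=False, speed_limit=30.0))  # ~26.2 m/s
        print(corner_speed_cap(0.01, wet=True,  speed_limit=30.0))  # ~19.8 m/s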

    It’s quite sad that the fact that we are nowhere close to actual ML (as you describe it) in control applications is obscured behind all the “neural network” marketing.

  13. One of the remarkable things about humans is that we can alter our body schema to include tools in a very short period of time. Will Wright talks about the extent to which we incorporate a car when we’re driving. I would argue that the body language of cars is a manifestation of human body language, and that there is some transfer learning taking place.
    You could also argue that much body language comes down to physical positioning plus acknowledgement. Most species communicate by moving towards what they want and letting smaller animals get out of the way. Sometimes the larger animal will make eye contact to make it clear that physical conflict is the next step. That covers a lot of driver body language, unfortunately.

    • Not sure what to think of this. It seems overblown or something to me. I think the real concept going on here is muscle memory — it can be used/developed to throw an object, drive a car, play a musical instrument, sing (as well as talk) — and lots of other things.

  14. Twentyish years ago when I was in this world, the term of art in cognitive psychology was statistical learning. I didn’t actually understand until my career took me in other directions that the term has a bigger life outside that field. But yes, there’s work on this in humans. Back then there were also researchers trying to establish whether human learners are intuitive Bayesians (I vaguely remember the evidence was mixed, and right now I wonder whether it is a well formed question). Regarding consciousness, you might also be interested in the broader literature on implicit learning and procedural memory. Pretty much all the work I knew about had the usual small sample NHST inferential problems, but I think it’s fair to say the phenomena exist even if the methods that have been used to investigate them have been clumsy sometimes.

    You say “riding on the street,” so I wonder whether you are talking here about cycling. I had personal cause to think about this literature again this summer, as the pandemic prompted me to improve my cycling skills (I started the summer still very much a beginner). It was very interesting to observe my attention shift as I gained experience with small things like rounding corners and crossing streets. Subjectively, it felt like while I was working to learn, my attentional window was very narrow, like looking through a small aperture. This is what made street crossings especially treacherous. As I became more confident, I was able to take in more of the scene at one time. I’m not sure I’m cut out for winter bike riding in my northern state, so I’ll be interested to see how much of this fluidity I will need to relearn in the spring. I have not yet braved full-on street riding, with its more complicated inferences about traffic! — where I am, there are enough side paths to get by.
