It’s kinda like phrenology but worse. Not so good for the “Nature” brand name, huh? Measurement, baby, measurement.

Posted on October 2, 2020 9:38 AM by Andrew

Federico Mattiello writes:

I thought you might find this thread interesting, it’s about a machine learning paper building a “trustworthiness score” from faces databases and historical (mainly British) portraits.

It checks many bias boxes I believe, but my biggest complaint (I know it shouldn’t be) is the linear regression of basically spherical clouds of points:

The above-linked thread has disappeared, but here’s the paper, which is published in Nature, that famous scientific tabloid.

I replied that this research project looks like lots of fun. It reminds me of the idea that a student suggested once in class, many years ago, to watch a bunch of movies, and every time you see someone with a gun, see if he or she is on the left or the right side of the frame. Gather a bunch of data, and who knows what you might find? I’m a big fan of this sort of open-ended project.

I do have a problem with the researchers’ conclusions, though. First off, they keep talking about “trustworthiness” or “computed trustworthiness” when they really mean “a measure that is weakly correlated with respondents’ judgments about trustworthiness.” Second, they make the unsupported claim that “Quantifying trustworthiness tells us how much looking nice and trustworthy is important in a society.” Third, they do cross-national comparisons, but where are the human trustworthiness judgments coming from? Fourth, they study changes over time without recognizing the problem that their calibrations are all taking place now: even setting aside all other problems here, there’s no reason to think that trustworthiness as judged in a painting 500 years ago would be anything like how it is judged now.

I’d summarize that the big problem here is the confusion of the measure (some geometrical characteristics of a painted face) with an underlying construct of interest (trustworthiness). This is a huge problem. One way to think about it is that you could take a particular person and paint him or her in 100 different ways and get widely varying “computed trustworthiness” scores. But it would be the same person every time! It’s not like he or she would be varying in actual trustworthiness. So I see a big conceptual problem here.
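
To make the thought experiment concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not something taken from the paper: each hypothetical person has a small stable contribution to the score (`person_sd`), each portrait adds a larger painter-and-pose contribution (`portrait_sd`), and the function names are made up. The question it asks is what fraction of the variation in a "computed" score is tied to the person at all.

```python
import random
import statistics

def simulate_scores(n_people=200, n_portraits=100,
                    person_sd=0.3, portrait_sd=1.0, seed=2):
    """Return one list of scores per person, one score per portrait.

    All parameters are hypothetical: person_sd is the spread of a stable
    per-person component, portrait_sd the spread added by each portrait.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        person = rng.gauss(0.0, person_sd)
        scores.append([person + rng.gauss(0.0, portrait_sd)
                       for _ in range(n_portraits)])
    return scores

def share_due_to_person(scores):
    """One-way ANOVA estimate of the share of score variance tied to the person."""
    k = len(scores[0])
    # Average variance across portraits of the same person.
    within = statistics.fmean(statistics.variance(row) for row in scores)
    # Variance of the per-person means, corrected for portrait noise.
    means = [statistics.fmean(row) for row in scores]
    between = max(statistics.variance(means) - within / k, 0.0)
    return between / (between + within)

scores = simulate_scores()
print(f"share of score variance tied to the person: {share_due_to_person(scores):.3f}")
```

Under these made-up settings most of the variation in the score comes from how the person was painted, which is the sense in which the measure and the construct come apart.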

In short, it could be a fun project without the overclaiming. But without the overclaiming I guess we wouldn’t’ve heard about this project at all. We’ve discussed this paradox before.

I also cc-ed Dan Simpson who added:

It’s nice to see digital phrenology branching out from criminality and finding the gays, I guess.

Also it’s hard to really take the data set seriously. Is there any reason to think that a portrait is in the national gallery because it’s photorealistic? It would not be that hard to imagine a portrait artist making their subject look more dominant or weaker or more trustworthy or less by the specific standards of the day and place (again, that’s a very slippery standard).

Also styles of portraiture and the facial expressions that are considered neutral or artistic change massively over this time period. There’s a lot of “lesbians wear more baseball caps” going on: people scowl a lot in old portraits.

There’s probably something about the way the structure of the economy changed and what that means both for who has a famous portrait and how the gdp relates to an individual. But that would require going deeper into the structural assumptions than this paper probably deserves.

Related (different nature journal): did you know that the presence of mountains explains your openness to new experiences? I feel very bad for the people of The Netherlands and Denmark.

I replied that actually this is worse than phrenology. Phrenology at least corresponds to stable characteristics of individuals. But this coding is all about facial expressions. As a person with Tourette’s who’s never been able to look people in the eye (which according to my elementary school teachers was necessary for any trustworthy person), I find it particularly annoying.

Let me emphasize what I wrote above, that I think this could be a fine research project if conducted in an exploratory spirit and not over-sold. When I say it’s “worse than phrenology,” I’m speaking of the science; I’m not making any moral or societal claims.

From a statistical point of view, the big problem here is not taking measurement seriously. Remember, measurement is the most important thing in statistics that’s not in the textbooks. Remember that example from a few years ago, that study that claimed that North Carolina was less democratic than North Korea? Measurement, baby, measurement.

P.S. It’s pretty funny that most of the comments on this post are about the name of the journal, Nature Communications, and almost nothing on the actual paper under discussion. The paper fails as clickbait if nobody even wants to argue about it!

65 thoughts on “It’s kinda like phrenology but worse. Not so good for the “Nature” brand name, huh? Measurement, baby, measurement.”

Sam on October 2, 2020 9:42 AM at 9:42 am said:

This may be a bit nitpicky but I find it weird that a lot of the discussion of this paper (including this blog post) makes the simple error of saying “I can’t believe this was in *Nature*”. The paper is in Nature Communications, not Nature. Nature Communications is a pay-to-publish OA journal with an absurdly high APC and far lower standards than Nature. Nature proper is a different journal.

- Andrew on October 2, 2020 10:23 AM at 10:23 am said:
  
  Sam:
  
  Follow the link. The paper is published at nature.com. Nature Communications is an extension of the Nature brand. Reputational inference goes both ways. By giving this journal the Nature name, they’re leveraging the Nature brand. The converse is that if the new journal publishes a paper with serious flaws, the Nature brand loses.
  
  I understand the concept of multiple brands: back in the old days, GM made Cadillacs and they made Chevys. I think the implication here is that Nature is the Cadillac and Nature Communications is the Chevy. But it seems more like they’re making Cadillacs and Chevys, but they’re not calling the cheaper cars Chevys, they’re calling them Cadillac Cars or something like that.
  
  Arguably the most valuable asset that nature.com possesses is not its editorial board or its website or its back catalogue but rather the Nature name. If they want to try to squeeze more out of that name by lending it to a journal that publishes this kind of article, that’s their call—but then they have to accept the consequences of that decision.
  
  - jonathan on October 2, 2020 10:42 AM at 10:42 am said:
    
    GM still makes Chevy and Caddy. And Chevy makes the Corvette, so the brands are a mixture of social signaling. But I knew what you meant.
    
  - Raghu Parthasarathy on October 2, 2020 10:45 AM at 10:45 am said:
    
    Andrew: What you’ve written is all true, but writing that this article is “published in Nature” is like writing that a Chevy *is* a Cadillac. You could write, “published in a Nature journal” or “published in a Nature Publishing Group journal,” or “published in one of those Nature-branded journals that keep multiplying like rabbits” or something like that, but when writing “published in X” it’s understood that X is the name of the journal in which the paper appeared. “Nature Communications” is in this case the name of the journal. X is *not* the shorthand name of the publisher, or the more famous journal from the publisher, or something else.
    
    - Andrew on October 2, 2020 11:00 AM at 11:00 am said:
      
      Raghu:
      
      If the journal were published by nature.com but just called Communications, I’d agree with you. But they put Nature right in the name. In the GM example, it would be as if Chevys were named Cadillac Chevys. If GM made Cadillacs and Cadillac Chevys, and I bought a Cadillac Chevy which had a rattling engine and wasn’t powerful enough to climb a hill, then, yeah, I’d be ok saying, “My new Cadillac couldn’t climb the hill.” And then if someone said, “Don’t call it a Cadillac; call it a Cadillac Chevy, man,” I’d reply: “GM made the decision to give that car the Cadillac name, not me.”
      
      Similarly, nature.com made the decision to give this journal the Nature name. If they wanted to avoid confusion, they easily could’ve done so. But my guess is that a big reason they chose this name was to play both sides of the street: the journal gets the Nature name so that authors are motivated to publish there, as the word Nature looks good on the C.V. etc., but then they have plausible deniability when they publish something bad, as it’s not “really” Nature. I don’t want to give them this deniability.
      
      In short, I’m calling it Nature because it’s the name of the journal, not because it’s the name of the publisher.
      
      Just by analogy: the American Statistical Association (ASA) publishes many journals. The flagship is the Journal of the American Statistical Association (JASA). Then they publish other journals. The other journals have different names. The American Statistician is not called JASA Teaching. Statistics and Public Policy is not called JASA Policy. Etc. That’s fair enough. When I publish in those lesser journals, my papers don’t get that JASA sheen. And JASA doesn’t get tarnished by bad papers published in those lesser journals.
    - Zad on October 2, 2020 11:53 AM at 11:53 am said:
      
      +2
    - Carlos Ungil on October 2, 2020 12:38 PM at 12:38 pm said:
      
      > In short, I’m calling it Nature because it’s the name of the journal, not because it’s the name of the publisher.
      
      It’s not the name of the journal, the name of the journal is Nature Communications. You’re of course right that it can lead to confusion and you’re definitely doing your part. They’ll have to accept the consequences!
    - Andrew on October 2, 2020 1:02 PM at 1:02 pm said:
      
      Carlos:
      
      If I buy a Chevy Malibu, I can say I bought a Chevy.
    - Carlos Ungil on October 2, 2020 6:28 PM at 6:28 pm said:
      
      Don’t worry, your point was clear! There are a few dozen publications from Springer Nature that some people would say are distinguished by their names (“Nature”, “Nature Digest”, “Nature Communications”, “Nature Immunology”, “Nature Reviews Immunology” and so on and so forth). You don’t share that view.
      
      You *can* say that they all have the same name “Nature” (you can even include in the pack a couple dozen more “npj whatever” publications, where the n stands for Nature). You *can* say that all of them are the journal “Nature”. You *can* refer to something appearing in any of them as “published in Nature”.
    - Raghu Parthasarathy on October 2, 2020 12:49 PM at 12:49 pm said:
      
      “But my guess is that a big reason they chose this name was to play both sides of the street: the journal gets the Nature name so that authors are motivated to publish there, as the word Nature looks good on the C.V. etc., but then they have plausible deniability when they publish something bad, as it’s not ‘really’ Nature. ”
      
      I agree about the likely playing both sides of the street. I do think that correctly calling it “Nature Communications” already achieves the aim of tarnishing the Nature brand if Nature Communications publishes poor articles. I can appreciate, though, Nature’s actions are begging for this sort of takedown.
      
      By the way: I’m sure the authors are delighted that you’re referring to their paper as a Nature article.
    - rm bloom on October 2, 2020 9:04 PM at 9:04 pm said:
      
      The big reputable publishers, including Penguin, all now have “imprints” which are part of a related racket. It is the self-publishing racket. I can pay “Author Solutions Inc” to “publish” my grandmother’s recipes and “market” the book. I pay them 5-10K or as much as I desire; and they give me e-books or print-on-demand or even a nice stack of physical books in return. They will even take more money to pretend to generate publicity for me. This used to be called Vanity Publishing and there used to be only a few entities (such as Vantage) who played old ladies and cranks for their money. But now *everyone* can be a “published author” and the big presses see the money in it — they literally have made what formerly was a sort of backstairs business into their own line. It used to be that if you had one of these “publication” credits and were vain or feckless enough to list it in your CV it’d never pass the smell test — not with anyone who counted. But now it’s not so clear, is it?
    - John N-G on October 2, 2020 4:01 PM at 4:01 pm said:
      
      This is like ordering a Budweiser, getting a Bud Light, and having the bartender try to convince you that they’re the same thing because they’re made by the same company and start with the same three letters.
    - Brent Hutto on October 2, 2020 8:21 PM at 8:21 pm said:
      
      That’s an unintentionally (?) hilarious analogy. The difference between Bud and Bud Light is like the difference in saying p=.043 vs. p<0.05 for an NHST significance test. I personally find all of the above equally hard to swallow.
    - Martha (Smith) on October 2, 2020 10:27 PM at 10:27 pm said:
      
      Pun in tended, I assume?
    - Brent Hutto on October 3, 2020 6:14 AM at 6:14 am said:
      
      Martha,
      
      “Pun in tended, I assume?”
      
      Most certainly.
      
      In retrospect I should have somehow worked in a riff on Daniel’s “it’s all dog food” comment too!
  - Daniel Lakeland on October 2, 2020 10:47 AM at 10:47 am said:
    
    Then there’s this: https://www.nature.com/articles/s41586-020-2649-2
    
    Where an article describing a 15 year old numerical programming extension to Python is published in Nature proper.
    
    I just shake my head and move on when I hear people talking about what journal something is published in. It’s like talking about how much better your dog is because of the brand name of the food you feed it. It’s all dog food people!
    
    - Adede on October 2, 2020 2:41 PM at 2:41 pm said:
      
      To be fair, Numpy is pretty useful. It’s a lot better than this facial nonsense.
  - Esteban on October 2, 2020 4:22 PM at 4:22 pm said:
    
    This is one method a journal publisher might consider using to help to solve a concern over potential loss of reviewer resource investment return from the many rejected manuscript submissions to their flagship product? Vertical brand expansion via the cascading of its portfolio such as this, opens up the possibility for manuscript transfer between its various imprinted “sister” products, across a more distributed menu of in-house publication venues. Apart from the financial bottom line enhancements, is the resulting lesser number of rejections and loss of competition good for science? Some say this tiered journal branding process is one of the main driving forces behind the growth of Open Access. Certainly the reviewers of the top-tier journal would have to be complicit in this arrangement?
    
Andy on October 2, 2020 9:43 AM at 9:43 am said:

Sorry, but it’s not Nature, it’s Nature Communications, a different journal.

- Paul on October 2, 2020 9:47 AM at 9:47 am said:
  
  It’s linked through their site with the same font. Is it really different?
  
  - Raghuveer Parthasarathy on October 2, 2020 10:15 AM at 10:15 am said:
    
    Nature has spun off a large number of Nature-branded journals. Nature itself remains its flagship, and is extremely selective (i.e. extremely hard to get published in). The rest can be confusing — sometimes deliberately so, I think. The ones with particular fields in the title (e.g. Nature Physics) tend to have very high reputations. At the other extreme is Scientific Reports, Nature’s version of PLOS One, which will publish things without consideration of potential impact. (There’s some good stuff there, and quite a bit of terrible stuff.) Nature Communications is in the middle; I’ve read excellent papers in it, but it isn’t Nature. It’s open access, by the way. A notable thing about Nature Communications is that its article fee (i.e. what you pay if your article is accepted for publication) is over $5000 — the highest I’ve seen for any journal I’ve personally run into.
    
    - Andrew on October 2, 2020 10:26 AM at 10:26 am said:
      
      Andy, Paul, Raghu:
      
      See my comment above. There’s a good argument for Nature to spread the wealth, as it were, and lend their name to a whole series of journals. But, again, this is what will happen. Giving a journal the name Nature Communications will give its papers extra attention due to the Nature name, but when that attention is negative, this will reflect negatively on the brand.
    - John Williams on October 2, 2020 1:55 PM at 1:55 pm said:
      
      To the extent that scientific journals are run like businesses, we can expect them to act like businesses.
Paul on October 2, 2020 9:44 AM at 9:44 am said:

I thought it was an Onion type article and was impressed at the length they went to for the graphics. When I found it through Nature, I thought I had downloaded a virus redirecting my search. When I used a second computer, it was too much.

Simon Gates on October 2, 2020 9:47 AM at 9:47 am said:

Link doesn’t work for me. Is it this paper: https://www.nature.com/articles/s41467-020-18566-7 ?

- Andrew on October 2, 2020 10:26 AM at 10:26 am said:
  
  Simon:
  
  Yes, I added the link above, as the twitter thread seems to have disappeared.
  
  - Emil O. W. Kirkegaard on October 5, 2020 2:18 AM at 2:18 am said:
    
    You can use archives to find it. Here you go: https://archive.vn/5BJ2c
    
Simon on October 2, 2020 9:47 AM at 9:47 am said:

The thread linked at the start of the post no longer exists – did the Nature enforcement bots close it down?

I will be forwarding this blogpost to my colleagues who teach modelling – it’s too late for the authors of the Nature paper, but we can put the next generation in the right direction…

Peter Ellis on October 2, 2020 9:48 AM at 9:48 am said:

I totally agree with Andrew’s description of this as mostly about measurement. I don’t think the “regression through a cloud of points” is a problem, or certainly not an important problem relatively speaking. Lots of good models with important underlying relationships have low R squared. But the misunderstanding of measurement and the overclaiming consequent on that are disastrous.

jim on October 2, 2020 10:53 AM at 10:53 am said:

“Lots of good models with important underlying relationships have low R squared. ”

Perhaps “barely discernable tendency” or “very slight tendency” is more apt than “important underlying relationship”? This takes me back to the “how will undergrads spend a penny” classroom experiments that were popular in economics for a while, where barely discernable tendencies became important underlying relationships and then went on to lead the league in slam-dunk principles on human behavior.

- somebody on October 2, 2020 12:16 PM at 12:16 pm said:
  
  I think you’re misunderstanding R squared. There are lots of important underlying relationships with low R squared.
  
  http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/10/lecture-10.pdf
  Part 3.2 Q 1 (a) “R² can be arbitrarily low when the model is completely correct. Look at Eq. 13. By making Var[X] small, or σ² large, we drive R² towards 0, even when every assumption of the simple linear regression model is correct in every particular.”
  
  Graphical example
  https://blog.minitab.com/hubfs/Imported_Blog_Media/flp_highvar.png
  
  - Andrew on October 2, 2020 12:29 PM at 12:29 pm said:
    
    See here, for example.
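
The point about R² in this sub-thread is easy to check by simulation. The sketch below is only an illustration under stated assumptions: the data are generated from an exactly linear model with Var[X] = 1 and a true slope of 1, the noise levels are arbitrary, and the function name `simulate_r2` is made up. In that setup the population value is R² = β²Var[X] / (β²Var[X] + σ²), so raising σ pushes R² toward 0 while the fitted slope should stay close to the true one.

```python
import random
import statistics

def simulate_r2(beta, sigma, n=20000, seed=0):
    """Simulate y = beta*x + noise and return (fitted slope, R^2)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]          # Var[X] = 1 by construction
    y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]  # the linear model is exactly true
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                 # least-squares slope
    r2 = sxy ** 2 / (sxx * syy)       # squared correlation, equal to R^2 in simple regression
    return slope, r2

for sigma in (0.5, 2.0, 10.0):
    slope, r2 = simulate_r2(beta=1.0, sigma=sigma)
    theory = 1.0 / (1.0 + sigma ** 2)  # beta^2 Var[X] / (beta^2 Var[X] + sigma^2)
    print(f"sigma={sigma:5.1f}  slope={slope:.3f}  R2={r2:.4f}  theory={theory:.4f}")
```

A low R² here says that the noise is large relative to the signal. It does not say whether the slope is large or small in substantive terms, which is the separate question raised in the comment above.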
    
jonathan on October 2, 2020 11:16 AM at 11:16 am said:

I saw yesterday a bit about how fantasy maps in games, movies, etc. seem always to orient the same direction. I’m not that interested in those maps, so I didn’t go through it in detail, but the reaction was of a fan who is seeing the seams appear in the confection and that is lessening enjoyment of the confection. And one of my favorite Taylor Swift bits is about how you know more at 18 than at 22 because you believe in things at 18 that you can’t at 22.

But this kind of research seems to be something different, more like why bother have an election if you can determine ‘trust’ from pictures? That’s an extreme example, but bankers check credit because appearances lie. Or you can meet someone and think ‘I could fall in love with this person’, but you still have to go through the process. I’m reminded of the famous scene in Citizen Kane where Bernstein says, ‘A fellow will remember a lot of things you wouldn’t think he’d remember. You take me. One day, back in 1896, I was crossing over to Jersey on the ferry, and as we pulled out, there was another ferry pulling in, and on it there was a girl waiting to get off. A white dress she had on. She was carrying a white parasol. I only saw her for one second. She didn’t see me at all, but I’ll bet a month hasn’t gone by since that I haven’t thought of that girl.’ I’ve had many of those, and one equivalent was being on a subway in Mexico City surrounded by men who were, to be charitable, not clean and there was a young woman, maybe 16, wearing a simple, extremely clean dress, herself absolutely pristine and freshly scrubbed, holding an English textbook. We stood next to each other. Moments of what could have become.

I think people get caught up looking for ‘magic’ passages through the actual work by which a moment becomes something more. This makes immense sense; those moments shine, and we relive them. But the idea seems faulty at its core to think that humans can form substantial beliefs through shortcuts which are – this is crucial – detached from some specific messaging. As in, trust you to do what? To be what? They have basic training to teach recruits how to be in the military, so they can be trusted and thus can trust within the confines of specific roles. The space exists. There are clear vectors that cross the space.

I read the paper and they toss around ‘trustworthiness’ as though repetition gives the word meaning. It isn’t clear what they are measuring or what space they are measuring in.

Javier on October 2, 2020 12:08 PM at 12:08 pm said:

Have you considered using autoregressive models in order to study trustworthiness in the Picture of Dorian Gray?

- jrkrideau on October 4, 2020 8:40 AM at 8:40 am said:
  
  + 1
  
Psyoskeptic on October 2, 2020 1:23 PM at 1:23 pm said:

And here I thought that the assumption of a representative sample was the biggest thing not taught in the statistics textbooks.

Fred on October 2, 2020 3:02 PM at 3:02 pm said:

This is the first time I’ve seen art historians, statisticians and economists unite to criticize a paper for three different reasons.

The response of one of the authors (Baumard) on his now-deleted twitter account was disappointing, as he chose to deflect criticism by focusing on the minority who were saying the research is racist. But I guess it is understandable. I would also be pissed if people accused me of being a racist when I’m only guilty of conducting bad research.

- Jonathan (another one) on October 2, 2020 3:46 PM at 3:46 pm said:
  
  Yeah… sort of makes you the anti-Fisher.
  
Foobar on October 3, 2020 4:16 AM at 4:16 am said:

Regarding:

“I’m speaking of the science; I’m not making any moral or societal claims.”

Can you really separate the two? While this piece was kind of funny, there is a lot worse stuff published. Coming from your local CS department / tech bros, of course.

The problem, I think, is that these people have no education whatsoever about the actual subject matters they “study” with their pseudo-science.

- Andrew on October 3, 2020 4:57 PM at 4:57 pm said:
  
  Foobar:
  
  I think you could separate the different aspects of this article. For example, someone could publish an article discussing trends in facial expressions in art museum paintings. Once you get away from the silly idea that they’re measuring “trustworthiness,” then all sorts of reasonable things are possible. But that would move away from the imperative of pop social science, as well as of journals of the PNAS/Science/Nature/Lancet/Psychological Science variety, which is to hype the importance of the work.
  
jrkrideau on October 3, 2020 7:00 AM at 7:00 am said:

“As a person with Tourette’s who’s never been able to look people in the eye (which according to my elementary school teachers was necessary for any trustworthy person), I find it particularly annoying.”

And IIRC it is culturally determined. Some Chinese friends tell me that “looking someone in the eye” especially if one is a child being reprimanded is seen as defiant or aggressive.

- Martha (Smith) on October 3, 2020 4:16 PM at 4:16 pm said:
  
  + 1/2 — I’d say that it is influenced by one’s culture and subculture (e.g., a family or religion or neighborhood might be a subculture within a larger culture).
  
  - jrkrideau on October 4, 2020 8:50 AM at 8:50 am said:
    
    @ Martha
    
    A very good point. I am Canadian and some of my most hostile remarks in a conversation would be seen as mild or inoffensive in U.S. discourse although we speak “almost” the same language.
    
    Pity the US cannot spell colour.
    
AB on October 4, 2020 2:21 PM at 2:21 pm said:

I find it strange that most commentaries about this paper are about the idea that you can form all sorts of impressions — including trustworthiness — from faces and facial expressions. I don’t really get why anyone (who got beyond the clickbaity title) would assume the authors think facial appearance gives away some sort of stable personality trait. Indeed, the very thing that you can paint a person “100 different ways and get widely varying “computed trustworthiness” scores” makes this research possible. see authors’ statement: https://twitter.com/CoralieChevall1/status/1311347519327211522/photo/1

BTW, Alexander Todorov, the champion of this subfield in psychology and a reviewer of the paper, is a celebrated Princeton professor with some 20K citations. Safra et al. have not invented the idea that you can make trustworthiness impressions based on faces, and they have not built the first algorithm to make these judgements instead of MTurkers; they just applied it to a new set of faces, i.e., portraits. To me this seems like a fun project, with a potentially interesting pattern (“people scowl a lot in old portraits”), with admittedly overstretched framing and conclusion.

At the same time, and given that the post has the sociology label too, it’s worth noting that the authors have been lynched on Twitter: accused of being racists, phrenologists, and murderers, with people demanding the retraction of the paper, harassing the authors, etc. Apparently the senior author was chased off of Twitter, hence the broken original link. Relatedly, it seems unfortunate that this post’s title echoes the mob’s war cry, even if it then clarifies that this time it’s not meant as a moral judgement.

- Anonymous on October 5, 2020 8:29 AM at 8:29 am said:
  
  I’m in full agreement with this post. There is nothing strange in the idea that painters 500 years ago would choose to represent facial expressions so as to convey certain emotions and attitudes, in a way that is not completely alien to how we today read faces. It is regrettable that AG is trashing this paper at a moment when the authors face absurd and unjust accusations, while not giving supporting evidence that it is “silly” to talk about “perceived trustworthiness” in paintings – given that people certainly *do* have such perceptions, and that there is evidence for a good amount of cross-cultural uniformity in how facial expressions are interpreted.
  
- Andrew on October 5, 2020 8:38 AM at 8:38 am said:
  
  Ab:
  
  See my comment here. Trustworthiness as measured in this article is a perception, not a personality trait.
  
  I agree with your statement that “this seems like a fun project, with a potentially interesting pattern (“people scowl a lot in old portraits”), with admittedly overstretched framing and conclusion.” That’s what I wrote in my post above: “this research project looks like lots of fun. . . . I do have a problem with the researchers’ conclusions, though. . . . I think this could be a fine research project if conducted in an exploratory spirit and not over-sold.”
  
  I have no idea how it is possible to lynch people on twitter, but let me clarify that my entire critique is in the post above (and elaborated in the comments section). The only twitter material I read was in the linked thread, which was by the author of the article. Please do not take this post as an endorsement of any lynching, war cries, etc.
  
  Anon:
  
  You write that I should give supporting evidence for my comments. I actually think the burden goes the other way: I think the authors of a scientific paper should be justifying their assumptions. They can feel free to publish anything they want, but I have no reason to take it seriously. For example, you write that “people certainly *do* have such perceptions” of trustworthiness—but I don’t see that at all! I do see that if you ask people survey questions, they will answer them, but the fact that you can get a survey response from people that’s 0.2 correlated with something else, does not tell me that this relates to any interesting underlying construct. Again, I’m not trying to discourage the authors from continuing this line of research; I would just prefer if they were more direct about what they’ve found, rather than making these big claims which indeed do seem silly to me.
  
  - Anonymous on October 5, 2020 9:00 AM at 9:00 am said:
    
    Thanks for taking these comments seriously.
    They do provide evidence, by citing work that contains such evidence:
    “Experimental work have revealed that specific facial features, such as a smiling mouth or wider eyes, are consistently recognized as cues of trustworthiness across individuals and cultures”
    So to judge whether this is silly, one should in fact evaluate the papers they cite in support of their claim. You cannot fault a paper for not repeating the evidence contained in some other papers on which they rely. Of course, one has to trust (no pun intended) that these papers are themselves solid. It is a theoretical possibility that all the relevant field is based on weak foundations, but how would you know?
    About “lynching” not being possible on twitter…You must be kidding, no? Of course the use of “lynching” is metaphorical, but it’s pretty clear that someone’s reputation can be harmed when 100+ people denounce them as racist…to the point that they feel depressed, become afraid to interact with others on social media (hence indeed ‘silenced’). In this case I don’t think you would disagree that it’s deeply unfair (they have been accused of being racist, etc.). Also, it discourages people from engaging in the future. Probably next time the authors will *not* sum up their research on twitter. Is that good? Whatever you think about the paper, even you agree that it’s based on an interesting idea – people are better off knowing about the idea than not, it seems to me, and then it’s for everybody to make up their mind. But the minute the racism charge is raised, things become toxic.
    
    - somebody on October 5, 2020 9:57 AM at 9:57 am said:
      
      > Probably next time the authors will *not* sum up their research on twitter. Is that good?
      
      Frankly, I think it’s best for everyone that nothing of consequence be said on twitter anymore. I think this paper is terrible in concept, but the platform’s predisposition to quippiness turns statements like “this paper is kind of like phrenology in its misuse of human measurements to generate strong, spurious conclusions” into “this paper is racist eugenicism from the 1930s.” And in my opinion, the problem isn’t us or our culture–people I otherwise respect engage in this woke mockery of the week. The platform is only built for sharing one-liners and memes, so the only political discussion possible is sick burns from wokes or alt-righters.
Sra on October 5, 2020 12:03 AM at 12:03 am said:

I’m uncertain about this:

> linear regression of basically spherical clouds of points

If I have a near-spherical cloud of points, what is wrong with linear regression? What if n is large?

- Sra on October 5, 2020 8:00 PM at 8:00 pm said:
  
  People do regressions like this all the time. Are they wrong?
  
  - Andrew on October 5, 2020 8:03 PM at 8:03 pm said:
    
    Sra:
    
    There’s nothing wrong with fitting a linear regression to a basically spherical cloud of points. The coefficients will all be near zero, and that can tell you something.
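
To illustrate the point, here is a minimal simulation sketch in Python. It assumes the cloud is made of independent standard-normal x and y values; the sample size, the seed, and the helper name `ols_slope_r2` are made up for illustration and do not come from the paper.

```python
import random


def ols_slope_r2(xs, ys):
    """Return (slope, r_squared) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


rng = random.Random(0)  # fixed seed so the sketch is reproducible (arbitrary choice)
n = 5000                # hypothetical sample size

# x and y are drawn independently, which gives a roughly spherical cloud.
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [rng.gauss(0, 1) for _ in range(n)]

slope, r_squared = ols_slope_r2(xs, ys)
print(f"slope = {slope:.3f}, R^2 = {r_squared:.4f}")
```

With data like these, the fitted slope and R^2 should both land close to zero, which is the "near zero" result the comment describes as informative.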
    
    - Sra on October 6, 2020 12:25 AM at 12:25 am said:
      
      But what if the coefficients are like 0.04 which is small-but-notable-if-true in the substantive context. Am I supposed to say “sure, p=.02 and the b=0.04 but the picture looks spherical so it’s probably nothing, who needs coefficients anyway”, and then write up the null-effects paper? Or what?
    - Zhou Fang on October 6, 2020 9:42 AM at 9:42 am said:
      
      It all depends on the context. If the Y axis is overall mortality rate and you’ve figured out a way to on average avert 4% of all deaths, that’s huge. If you only avert 4% of some incredibly rare cancer, then no one is going to give a shit.
      
      There’s just no universal way to interpret R^2 values.
    - Martha (Smith) on October 6, 2020 2:21 PM at 2:21 pm said:
      
      +1
    - Sra on October 6, 2020 4:20 PM at 4:20 pm said:
      
      > only avert 4% of some incredibly rare cancer
      
      Presumably the people having/treating that cancer would care.
      
      Suppose I need to decide whether to use that treatment. What should a preregistered regression with p=.02, b=0.04 on a visually spherical scatterplot do to my N(0, sig=0.03) beliefs about the effect? Just do a by-the-book Bayesian update? Or is there some additional skepticism I should apply due to the spherical pointcloud? If so, what is the form of that skepticism?
    - Zhou Fang on October 8, 2020 7:31 AM at 7:31 am said:
      
      A “spherical”, or more generally an ellipsoidal, scatter pattern is indicative of a multivariate Gaussian joint distribution. In many senses it’s actually the ideal case in this kind of regression, so there is no need for any additional skepticism. It’s about as by-the-book as you get.
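
One way to sketch the by-the-book update being discussed, assuming a normal prior and a normal likelihood, and assuming the reported p=.02 is two-sided so that the standard error can be backed out from b and p. The function name `normal_update` is hypothetical; only b=0.04, p=.02, and the N(0, 0.03) prior come from the thread.

```python
from statistics import NormalDist


def normal_update(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal-normal update; returns (posterior_mean, posterior_sd)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision
                 + estimate * data_precision) / post_precision
    post_sd = post_precision ** -0.5
    return post_mean, post_sd


b, p_value = 0.04, 0.02

# Assumption: the p-value is two-sided and based on a normal test statistic,
# so |z| is the (1 - p/2) quantile and the standard error is b / z.
z = NormalDist().inv_cdf(1 - p_value / 2)
std_error = b / z

post_mean, post_sd = normal_update(0.0, 0.03, b, std_error)

# Posterior probability that the effect is positive.
prob_positive = 1 - NormalDist(post_mean, post_sd).cdf(0.0)

print(f"se = {std_error:.4f}, posterior mean = {post_mean:.4f}, "
      f"posterior sd = {post_sd:.4f}, P(effect > 0) = {prob_positive:.3f}")
```

Under these assumptions the posterior mean comes out around 0.03, with a posterior sd of roughly 0.015. The shape of the scatterplot does not enter the calculation at all; only the estimate, its standard error, and the prior do.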
    - Andrew on October 8, 2020 7:54 AM at 7:54 am said:
      
      Zhou:
      
      Yes, but spherical is a special case of elliptical where all correlations are zero.
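
In formula form, assuming (x, y) is bivariate normal with means \mu_x, \mu_y, standard deviations \sigma_x, \sigma_y, and correlation \rho (symbols introduced here for illustration, not used in the thread), the regression of y on x is:

```latex
\mathrm{E}[y \mid x] = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}\,(x - \mu_x),
\qquad
\beta = \rho\,\frac{\sigma_y}{\sigma_x},
\qquad
R^2 = \rho^2 .
```

A spherical cloud corresponds to \rho = 0, so \beta = 0 and R^2 = 0. A tilted ellipse has \rho \neq 0 and therefore a nonzero slope.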
Anonymous on October 5, 2020 8:05 AM at 8:05 am said:

Hello,

I’d like to take issue with this: “there’s no reason to think that trustworthiness as judged in a painting 500 years ago would be anything like how it is judged now.” Is there really “no reason” to think it will be “anything like” how it is judged now? I would think that the default hypothesis would be that it’s partly ‘nurture’ and partly ‘nature’, while you are suggesting that there is “no reason” to think that there might be universal human tendencies in this domain (both in terms of how facial expressions relate to emotional states, and how people interpret facial expressions). What is your prior that 500 years ago smiling faces were perceived as sad or threatening while faces that we would now interpret as expressing anger and aggressiveness were in fact read as expressing benevolence? Note that Descartes, who lived about 400 years ago, talked about facial expressions in a way that does not seem particularly at odds with contemporary perceptions: for instance “it’s true that some facial expressions are easy enough to spot—e.g. a wrinkled forehead in anger and certain movements of the nose and lips in indignation and mockery” (in ‘Passions of the Soul’, on-line version available at https://www.earlymoderntexts.com/assets/pdfs/descartes1649part2.pdf), or “Thus joy makes the colour brighter and rosier…As the blood becomes warmer and more fluid it makes all the parts of the face swell a little, thus making it look more smiling and cheerful” – so he took it as obvious that there is a relationship between joy and smiling. Also, “laughter might seem to be one of the chief signs of joy” (a claim Descartes then qualifies, but still, he took it as obvious).

In fact, there is evidence that facial expressions are interpreted in a quite uniform way across cultures (though not identically, I’m not saying there is no variation at all), and this was already recognized by Darwin. See http://www.paulekman.com/wp-content/uploads/2013/07/Constants-Across-Cultures-In-The-Face-And-Emotion.pdf, which has data which seem to be pretty strong evidence (I’m no statistician, but just looking at the results they report it seems compelling).

- Andrew on October 5, 2020 8:29 AM at 8:29 am said:
  
  Anon:
  
  I agree that these things could be possible, so “anything” is too strong in my above statement. It would be more precise to say that the authors provide no evidence for this claim.
  
  Regarding the quotes you provide: I’m speaking specifically about trustworthiness, not facial expressions in general. I would indeed not be surprised to learn that expressions of joy, sorrow, anger, etc., have been stable over the centuries. I’m more skeptical about expressions of trustworthiness, partly because I have no sense of what an expression of trustworthiness would look like. Happiness, sadness, etc. seem much clearer, whereas trustworthiness seems much more like an indirect, derived quantity. Also there’s the “trustworthiness as judged in a painting” thing, which adds another degree of difficulty.
  
  The more the signal is attenuated, the more you have to worry about noise. And one thing that people don’t always understand is that “noise” includes systematic error. Noise includes all sorts of biases in measurement, changes in implicit definitions, etc.; it’s not just independent error that goes away when you increase your sample size.
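
A small simulation can make this concrete. The sketch below assumes measurements of a true value of 0 that carry both independent noise and a constant bias; the bias of 0.5, the noise sd of 2.0, the sample sizes, and the helper name `mean_error` are hypothetical choices for illustration.

```python
import random


def mean_error(n, bias, noise_sd, rng):
    """Error of the sample mean of n biased, noisy measurements of a true value of 0."""
    total = sum(bias + rng.gauss(0, noise_sd) for _ in range(n))
    return total / n


rng = random.Random(1)      # fixed seed so the sketch is reproducible (arbitrary choice)
bias, noise_sd = 0.5, 2.0   # hypothetical systematic error and independent-noise sd

errors = {}
for n in (100, 10_000, 250_000):
    errors[n] = mean_error(n, bias, noise_sd, rng)
    print(f"n = {n:>7}: error of the mean = {errors[n]:+.3f}")
```

As n grows, the independent part averages out, but the error of the mean settles near the bias rather than near zero.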
  
  - Martha (Smith) on October 5, 2020 9:00 PM at 9:00 pm said:
    
    Nicely put.
    
  - AB on October 6, 2020 7:06 AM at 7:06 am said:
    
    Andrew,
    
    There is a literature in psychology which has been arguing for at least 15 years that people form trustworthiness perceptions based on facial appearance. There is not even a debate about this, as far as I know. Of course, even larger and older sets of studies have proved to be wrong, so continuous skepticism, criticism, etc. is definitely warranted. But as long as the researchers’ own impression is that this literature is not flawed, it’s completely OK to cite the studies most relevant to their point and move on, instead of making OR testing the argument all over again, wouldn’t you agree?
    
    Here is the relevant bit from the paper:
    “Experimental work have revealed that specific facial features, such as a smiling mouth or wider eyes, are consistently recognized as cues of trustworthiness across individuals and cultures. 16,17,18,19,20,21”
    
    16. Walker, M., Jiang, F., Vetter, T. & Sczesny, S. Universals and cultural differences in forming personality trait judgments from faces. Soc. Psychol. Personal. Sci. 2, 609–617 (2011).
    17. Xu et al. Similarities and differences in Chinese and Caucasian adults’ use of facial cues for trustworthiness judgments. PloS ONE 7, e34859 (2012).
    18. Bente et al. Cultures of trust: effects of avatar faces and reputation scores on German and Arab players in an online trust-game. PLoS ONE 9, e98297 (2014).
    19. Engell, A. D., Haxby, J. V. & Todorov, A. Implicit trustworthiness decisions: automatic coding of face properties in the human amygdala. J. Cogn. Neurosci. 19, 1508–1519 (2007).
    20. Birkás, B., Dzhelyova, M., Lábadi, B., Bereczkei, T. & Perrett, D. I. Cross-cultural perception of trustworthiness: the effect of ethnicity features on evaluation of faces’ observed trustworthiness across four samples. Personal. Individ. Differ. 69, 56–61 (2014).
    21. Todorov, A., Olivola, C. Y., Dotsch, R. & Mende-Siedlecki, P. Social attributions from faces: determinants, consequences, accuracy, and functional significance. Annu. Rev. Psychol. 66, 519–545 (2015).
    
    Isn’t this a fair way to “justify an assumption” or to offer “evidence for this claim”? I think that’s why Anon above was saying that people certainly have facial trustworthiness perceptions, and that’s why he expected you to offer more evidence for your criticism. It’s not the authors’ unjustified assumption vs your intuition. It’s a scientific literature with a decent body of empirical evidence vs your intuition. And again, your intuition may be right and expressing skepticism is what strengthens literatures. The problem is that both your post and the Twitter flame war were fundamentally about an idea that the authors, I think, justifiably took for granted given the state of the literature.
    
    Concerns/criticism about using a noisy algorithm, applying it in a novel context, and overblowing conclusions/implications are more valid.
    
    - Andrew on October 6, 2020 8:43 AM at 8:43 am said:
      
      AB:
      
      Thanks for the references. I agree that it’s informative to learn about cross-cultural similarities in judgments of trustworthiness, but I don’t see this as implying much about how people interpreted paintings 500 years ago. This is related to the point in your last sentence about “applying it in a novel context, and overblowing conclusions/implications.”
Carlos Ungil on October 5, 2020 3:07 PM at 3:07 pm said:

> the big problem here is the confusion of the measure (some geometrical characteristics of a painted face) with an underlying construct of interest (trustworthiness)

Maybe the confusion here is not theirs. The trustworthiness they are interested in is not a quality of the people portrayed; it’s a quality of the portrait and of how other people react to it. The paper is mostly about “trustworthiness in portraits”, “trustworthiness displays” and “displayed trustworthiness”. In one section they discuss “whether displayed trustworthiness could be used as a proxy for interpersonal trust”, so it’s not like they just use all the trust-related concepts interchangeably.

Zhou Fang on October 6, 2020 9:51 AM at 9:51 am said:

FWIW, I think it really would be interesting to do the left-to-right analysis. There’s a prevailing theory that in populations with left-to-right text, action that proceeds from left to right in art seems more “natural” than right to left. So aggressors in action scenes would be expected to face to the right, while defenders face to the left. In turn, nations with RTL text layouts would have it the other way round.

- Martha (Smith) on October 6, 2020 2:23 PM at 2:23 pm said:
  
  Yes, it might be interesting to explore.
  

Statistical Modeling, Causal Inference, and Social Science


