“I Can’t Believe It’s Not Better”

Check out this session Saturday at Neurips. It’s a great idea, to ask people to speak on methods that didn’t work. I have a lot of experience with that!

Here are the talks:

Max Welling: The LIAR (Learning with Interval Arithmetic Regularization) is Dead

Danielle Belgrave: Machine Learning for Personalised Healthcare: Why is it not better?

Michael C. Hughes: The Case for Prediction Constrained Training

Andrew Gelman: It Doesn’t Work, But The Alternative Is Even Worse: Living With Approximate Computation

Roger Grosse: Why Isn’t Everyone Using Second-Order Optimization?

Weiwei Pan: What are Useful Uncertainties for Deep Learning and How Do We Get Them?

Charline Le Lan, Laurent Dinh: Perfect density models cannot guarantee anomaly detection

Fan Bao, Kun Xu, Chongxuan Li, Lanqing Hong, Jun Zhu, Bo Zhang. Variational (Gradient) Estimate of the Score Function in Energy-based Latent Variable Models

Emilio Jorge, Hannes Eriksson, Christos Dimitrakakis, Debabrota Basu, Divya Grover. Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning

Tin D. Nguyen, Jonathan H. Huggins, Lorenzo Masoero, Lester Mackey, Tamara Broderick. Independent versus truncated finite approximations for Bayesian nonparametric inference

Ricky T. Q. Chen, Dami Choi, Lukas Balles, David Duvenaud, Philipp Hennig. Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering

Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, Geoff Pleiss, John Patrick Cunningham. Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning

P.S. The name of the session is a parody of a slogan from a TV commercial from my childhood. When I was asked to speak in this workshop, I was surprised that they would use such an old-fashioned reference. Most Neurips participants are much younger than me, right? I asked around and was told that the slogan has been revived recently in social media.

34 thoughts on “I Can’t Believe It’s Not Better”

  1. > The name of the session is a parody of a slogan from a TV commercial from my childhood. When I was asked to speak in this workshop, I was surprised that they would use such an old-fashioned reference. Most Neurips participants are much younger than me, right? I asked around and was told that the slogan has been revived recently in social media.

    It’s still around isn’t it? I could’ve sworn I saw it at the supermarket the other day. And I definitely remember commercials / other parodies of that slogan and I’m early 20s! It’s a great session title by the way.

      • See https://thesocietypages.org/socimages/2011/09/22/the-politics-of-yellow/

        I remember when my family tried a “squeeze-mix” margarine package once when I was a kid. It didn’t work very well. If I remember correctly, the plastic bag broke, so my mom ended up putting the (still mostly white) margarine and the “button” of food coloring in a big bowl, and we all took turns trying to mix it together well. We still ended up with streaky margarine.

        • Martha:

          When I was a kid we weren’t poor, we were upper middle class. But we ate margarine not butter, McDonalds was our usual restaurant of choice, and when we wanted to go out for a fancy dinner we ate at Ponderosa. Now I’m upscale, I eat butter and haven’t even seen a tub of margarine for years, I’ve eaten at McD’s twice in the past 30 years, and I don’t know if Ponderosa even exists anymore.

          We did have Thai food once when I was 11, though. There was this restaurant called the Thai Room that we would often drive by. I’d heard that Thai food is hot, and I’ve always loved hot food, so my parents took me there as a treat at the end of the school year.

        • Andrew said,
          “When I was a kid we weren’t poor, we were upper middle class. But we ate margarine not butter”

          My Dad’s family was working class; my Mom’s kinda lower middle class. My paternal grandmother’s recipe for ice box cookies used margarine. Her recipe for fruitcake used salt pork instead of either margarine or butter. You minced the salt pork up into little pieces, then poured hot coffee over them to melt the fat, then used the coffee/salt pork combination as liquid and fat for the batter that held the nuts and candied fruit together. (The nuts were dusted with flour, and the candied fruit soaked overnight in wine, brandy, or whiskey before being added to the batter.)

        • Let’s not consider that a metaphor for the erotic too closely here. Ok fine let’s. The little yellow packet may break and colors everything but one may still remain transfat-laden, flavorless, and tightly regulated.

          I think we ate margarine mainly because mom liked to reuse the plastic tubs. She had about 100 of those.

        • I lived a mile from Andrew and I guess we were in the same culture. McDonalds was a treat for us, too; so was Roy Rogers, which was a bit farther away. On rare occasion we would eat at Arby’s, whose outrageously salty thinly-sliced meat I still remember, although not with great fondness. I was somewhat surprised to learn in 2016 that Arby’s still exists, when someone pointed out that more people work at Arby’s than work in the U.S. coal industry.

          We were in the outer DC suburbs and I don’t think there was any food delivery available. At any rate in my entire childhood I don’t think we ever had food delivered. I remember Ponderosa Steakhouse, and Shakey’s pizza. For special occasions we would go to a seafood place whose name I’ve forgotten, or to the “Luau Hut”, and once a year when an elderly relative visited we would go to Mrs K’s Toll House, which had a dress code (jacket required for men).

          Nowadays I recognize (and care about) food quality way more than I used to. But quality is not at all the same as price. I’ll eat at the cheapest burrito place in town if the burritos are good, and one of my guilty pleasures is an incredibly greasy ‘New York style’ pizza at a cheap place in downtown Berkeley, which I generally prefer to pizza from The Cheese Board, which is near my house and which often has a line of people picking up to-go that stretches half a block (and I’m talking pre-pandemic, before people stood at a distance from each other).

        • The line at the Cheese Board is an illusion. Because they only sell one type of pizza a day, they churn through it faster than a fast food drive thru. It’s also a worker owned cooperative — socialism at work. That place is my model for society.

        • btw if you want a NY pizza, if you’re willing to order a day in advance, Emilia’s is good as well, or Gioia’s in North Berkeley

        • I remember some visiting cousins staying at a nearby cottage mixing white margarine with dye one day when I was about 10 years old.

          I was horrified/amazed/slightly revolted. What was that “stuff”? They were going to eat it?

          Of course, I grew up on a dairy farm and we even, occasionally, made our own butter.

    • In the sixties there was a “margarine scandal” in Finland: it was revealed that the carcasses of various animals were used to make margarine – including cats. For this reason in the Donald Duck cartoons he worked at “kattivaaran margariinitehdas” – catburg’s margarine factory.

    • Beep:

      It’s the old knave-or-fool question. I’m guessing that he’s both. As many people (including me) have pointed out, the reasoning used in that report would lead us to be suspicious of just about every election ever conducted. It’s also the reasoning used in lots of junk science papers that we’ve discussed on this blog over the years, but the problems are particularly apparent in this case because everybody (except perhaps the fool who wrote that report) understands that voting patterns change over time.

    • The premise continues to be that votes in two periods “should be” drawn from the same distribution. Thus, observed differences are either coincidences or fraud.

      There is no reason to suppose that the votes should be drawn from the same distribution. The whole point of doing elections periodically is that opinions change. If the populations really were identical across time, then ANY scenario where an incumbent loses by more than the margin of error is fraudulent by fiat. Since Trump won in 2016, assuming identical populations is almost assuming Trump won in 2020, in which case Trump losing can only be fraud no matter what the outcome. In fact, the only thing that can change between elections at all in this model is random sampling error: an election, in this view, is simply rerandomization, rolling a slightly biased die.

      • Right, that was in the previous blog post.
        But looking at page 31 in Beep’s link, Cicchetti seems to be talking about comparing votes before and after 3am in the 2020 election.

        “This result was not expected because the tabulations reported at 3 AM EST represented almost 95% of the final tally, which makes a finding of similarity for random selections likely and not statistically implausible….Location and types of ballots in the subsequent counts had, in effect, to be from entirely different populations, the early and subsequent periods, and not random selections from the same population.”

        He then goes on to describe anecdotal evidence that the yet to be counted ballots were mail-in ballots and says:

        “Either could cause the latter ballots to be non-randomly different than the nearly 95% of ballots counted by 3AM EST, but I am not aware of any actual data supporting that either of these events occurred.”

        So I don’t think this stuff on p. 31 is describing the same analysis discussed in the previous blog post, but I could be wrong. I haven’t read much of this stuff.

        • Jd:

          First, it’s not cool for him to change the question without acknowledging that he’s done so. Second, everyone knows that votes coming at different times and from different places will be different. This is true in just about every election just about everywhere, so as we’ve all been pointing out, this reasoning would lead us to discard just about every election ever conducted.

        • > it’s not cool for him to change the question without acknowledging that he’s done so

          Actually multiple things are discussed. There are three separate sections: Early Tabulations (p. 31), Clinton Compared to Biden Among Urban Voters (p. 33) and Georgia Rejection Rates (p. 37), plus Conclusions (p. 40).

        • “not cool” is inadequate for what is going on here. It is not simply the statistics that are horrible. As you said above, virtually every election could be questioned on the basis of analyses such as these. If we are to question election results because it is likely that voters are behaving differently than in the past, than other voters in the same election, or than other voters using different means to vote, then we are putting more faith in the p-value for these meaningless calculations than in the voters. While my faith in voters has indeed been shaken (too many times), this stance is totally anti-democratic. I question why anybody would want to take – and defend – such a position. He should be disavowed by the organizations he has been affiliated with. It is not a matter of free speech – he is as entitled to express his views as any of the others that do so, including crackpots. But to lend his expertise to his views sickens me. Where was he in the 2000 election?

    • On top of any substantive criticism, their filing employs an erroneous definition of what an ad hominem attack is…

      > In fact, Pennsylvania’s rebuttal to Dr. Cicchetti’s analysis consists solely of ad hominem attacks, calling it “nonsense” and “worthless”.

      Seems to me that a quality filing would not likely contain such an obvious error.

      Of course, by their definition, saying so is an ad hominem attack.

    • I have nothing to add on the statistics. But his doubling down includes this sentence: “Therefore, I continue to recommend that further investigations and audits should be done to nearly everyone’s satisfaction.” The word ‘nearly’ there is hilarious. I would bet a reasonable sum of money that it was not there in Charlie’s first draft of this and some lawyer told him to include it. And it would be even better if Charlie wrote this sentence and decided to add the word himself.

  2. What about the “EM algorithm” for picking out multiple manifold-like clusters embedded in noise?
    Every single time some wiseacre has suggested we use it, we re-discover the old chestnut: [1] it’s another species of non-linear optimization; [2] in the presence of multiple factitious sub-optimal points; [3] you have to pick good starting values; [4] i.e. you need to have a good representation of the clusters and their location to start with. Voila! When you put meat into the meat-grinder, out comes ground meat! Surprise? But, what about stone-soup? No soap!

    • Haha, when I read the post I immediately thought I have tried out lots of cluster analysis methods on lots of data, and every single method is easy to destroy, including my own…
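
The complaint in comment 2, that EM is a non-linear optimization whose answer depends on the starting values, can be seen in a toy case. What follows is a minimal sketch under loud assumptions: a 1-D mixture of three Gaussians stands in for the "manifold-like clusters embedded in noise" of the comment, and the function `em_gmm_1d`, the simulated data, and both sets of starting values are invented for illustration. One run starts near the clusters; the other uses a deliberately degenerate start with all three means equal, from which the components should never separate because every update treats them identically.

```python
import numpy as np

def em_gmm_1d(x, init_means, init_var=1.0, n_iter=30):
    """Plain EM for a 1-D Gaussian mixture (illustrative sketch only).

    Returns the fitted means and the log-likelihood recorded at the
    start of each iteration.
    """
    means = np.array(init_means, dtype=float)
    k = means.size
    weights = np.full(k, 1.0 / k)
    variances = np.full(k, float(init_var))
    loglik = []
    for _ in range(n_iter):
        # E-step: log(weight_k * Normal(x_i | mean_k, var_k)), shape (n, k)
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        loglik.append(log_norm.sum())
        resp = np.exp(log_p - log_norm[:, None])  # responsibilities
        # M-step: responsibility-weighted proportions, means, variances
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)  # guard against collapse
    return means, np.array(loglik)

# Hypothetical data: three well-separated clusters of 100 points each.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(c, 1.0, 100) for c in (-5.0, 0.0, 5.0)])

# Start near the clusters versus a degenerate start with identical means.
good_means, good_ll = em_gmm_1d(x, init_means=[-4.0, 0.5, 4.0])
stuck_means, stuck_ll = em_gmm_1d(x, init_means=[0.0, 0.0, 0.0])

print("good start:  means", np.round(good_means, 2), "loglik", round(good_ll[-1], 1))
print("stuck start: means", np.round(stuck_means, 2), "loglik", round(stuck_ll[-1], 1))
```

In both runs the log-likelihood never goes down, which is all EM promises. The degenerate start ends up fitting what is effectively a single wide Gaussian, so getting the three clusters out required already knowing roughly where they were.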
