The two most important formulas in statistics

Posted on June 27, 2020 9:46 AM by Andrew

0.5/sqrt(n) (which in turn is short for sqrt(p*(1-p)/n))
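A minimal Python sketch of the first formula, for readers who want to see the numbers. The function name `se_proportion` and the sample sizes are assumptions chosen for illustration; they are not from the post. It shows that sqrt(p*(1-p)/n) is largest at p = 0.5, where it equals 0.5/sqrt(n).

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes, chosen only for illustration.
for n in (100, 400, 1600):
    worst_case = 0.5 / math.sqrt(n)  # the shorthand from the post
    print(n, worst_case, se_proportion(0.5, n), se_proportion(0.3, n))
```

Quadrupling n halves the standard error, and any p other than 0.5 gives a somewhat smaller value than the 0.5/sqrt(n) shorthand.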

5^2 + 12^2 = 13^2

With an honorable mention to 16.

17 thoughts on “The two most important formulas in statistics”

Adede on June 27, 2020 at 10:27 am said:

When is the second one used?

- Sameera Daniels on June 27, 2020 at 12:53 pm said:
  
  When is the 1st one used? lol
  
  - Ben Hanowell on June 29, 2020 at 12:03 pm said:
    
    When doing power analysis and the effect size is assumed to be small and the base rate is assumed to be about 50%.
    
    So… not often in my line of work!
    
    - Ben Hanowell on June 29, 2020 at 12:04 pm said:
      
      Although I usually do assume the effect size is centered at zero!

Hans on June 27, 2020 at 11:14 am said:

1) Approximate error in probability given size
2) Emphasizes importance of 1, since it is completely useless
3) Further emphasizes importance of 1, it is kind of a nice number, 2^2^2, but otherwise specially useful.

Seems evident :)

Jonathan on June 27, 2020 at 11:48 am said:

A multi-dimensional universe should reduce to squares. I’ve long thought the addition of squares should be presented as the reduction inherent in generating and simplifying dimensionality.

Bob76 on June 27, 2020 at 1:46 pm said:

5^2 + 12^2 = 13^2
may be a helpful way to remember that if there are two independent sources of uncertainty in a measurement, the variance of the measurement will probably be primarily determined by the variance of the source with larger variance. I implicitly assume that the variation is Gaussian because that’s the normal thing to do.

Bob76
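The point above can be sketched in a few lines of Python. The helper name `combined_sd` is an assumption for illustration, and the sketch reads 5 and 12 as the standard deviations of two independent error sources, so that their variances add.

```python
import math

def combined_sd(sd_a, sd_b):
    """SD of the sum of two independent errors (variances add)."""
    return math.sqrt(sd_a**2 + sd_b**2)

total = combined_sd(5, 12)
# Fraction of the total variance contributed by the larger source.
share_of_larger = 12**2 / total**2

print(total)  # 13.0
print(share_of_larger)
```

The larger source accounts for 144/169 of the total variance, roughly 85%, and removing the smaller source entirely would only lower the standard deviation from 13 to 12.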

- Andrew on June 27, 2020 at 2:52 pm said:
  
  Yes.
  
- Chris on June 29, 2020 at 4:21 pm said:
  
  lol – the _normal_ thing to do.
  
Chris on June 27, 2020 at 1:47 pm said:

That Bayes guy was always kind of holier-than-thou anyway, to be honest.

Anonymous on June 27, 2020 at 4:40 pm said:

30 = infinity

Daniel Lakeland on June 27, 2020 at 4:52 pm said:

I’m going to go with:

p(a,b) = p(a|b) p(b)

and

p({a,b}) = p(a) + p(b) – p(a,b)
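A small Python check of both identities on one roll of a fair die. The events are hypothetical and chosen only for illustration, and the sketch assumes p({a,b}) is meant as the probability that a or b occurs.

```python
from fractions import Fraction

outcomes = list(range(1, 7))  # one roll of a fair six-sided die

def is_even(x):        # event a (hypothetical)
    return x % 2 == 0

def is_above_three(x):  # event b (hypothetical)
    return x > 3

def prob(event, space):
    """Probability of an event under a uniform distribution on `space`."""
    return Fraction(sum(1 for x in space if event(x)), len(space))

p_a = prob(is_even, outcomes)
p_b = prob(is_above_three, outcomes)
p_ab = prob(lambda x: is_even(x) and is_above_three(x), outcomes)
p_a_or_b = prob(lambda x: is_even(x) or is_above_three(x), outcomes)

# p(a|b): restrict the sample space to outcomes where b holds.
b_outcomes = [x for x in outcomes if is_above_three(x)]
p_a_given_b = prob(is_even, b_outcomes)

print(p_ab == p_a_given_b * p_b)       # product rule
print(p_a_or_b == p_a + p_b - p_ab)    # inclusion-exclusion
```

Using `Fraction` keeps the arithmetic exact, so the equalities are tested without floating-point tolerance.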

Ron Kenett on June 27, 2020 at 6:06 pm said:

-80538738812075974^3 + 80435758145817515^3 + 12602123297335631^3 = 42 (also called a diophantine equation…)
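The identity is easy to check with Python's arbitrary-precision integers. This is only a verification sketch; the variable names are assumptions for illustration, and the leading minus sign is read as part of the first base.

```python
# The three integers from the comment; the first one is negative.
x = -80538738812075974
y = 80435758145817515
z = 12602123297335631

# Python integers do not overflow, so the cubes are computed exactly.
total = x**3 + y**3 + z**3
print(total)
```

Floating-point arithmetic would not work here, since each cube has about 50 digits and the three terms cancel almost completely.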

- Howard Edwards on June 30, 2020 at 6:52 pm said:
  
  My goodness – perhaps another vindication of Douglas Adams’ Hitchhiker’s Guide to the Galaxy?
  
  https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy#The_Answer_to_the_Ultimate_Question_of_Life,_the_Universe,_and_Everything_is_42
  
  - Ron Kenett on July 1, 2020 at 1:38 pm said:
    
    Yes – you got it. It is also the cover of my book on information quality.
    
    i) not everyone sees it
    ii) those who do, do not necessarily know what it means…
    
    https://www.amazon.com/Information-Quality-Potential-Analytics-Knowledge-ebook/dp/B01MEERM38/ref=sr_1_2?dchild=1&qid=1593625104&refinements=p_27%3ARon+S.+Kenett&s=books&sr=1-2&text=Ron+S.+Kenett
    
Dzhaughn on June 28, 2020 at 12:21 pm said:

We should get 16 in the legion of magic numbers like 0, 1, pi, e, i, golden ratio.

Neil Diamond on June 30, 2020 at 7:05 am said:

I missed the 16 discussion last year, but if I had seen it I would have made a few points.

The usual definition of an interaction between factors A and B is (at least for two level factors) the difference between the effect of A at high B and the effect of A at low B, divided by two. The division is to make all the standard errors the same.

Using this definition, and if you assume that interactions are about half the size of main effects, 16 becomes 4.

But maybe it should be 1, at least in physical experiments. In their Bayesian method for finding active factors in fractional factorial designs, Meyer and Box (Journal of Quality Technology, 1993) assume a prior for active effects as N(0, gamma * sigma^2) and a prior for inactive effects as N(0, sigma^2), where they suggest that gamma be chosen to minimise the probability of finding no active factors. They say “important main effects and interactions tend, in our experience, to be of roughly the same order of magnitude, justifying the parsimonious choice of one common scale parameter gamma…”. In the BsProb() command in R in the BsMD package (Barrios, 2020, based on Meyer’s code) the default value of gamma is 2 (it is possible to set different gamma values for main effects and interactions, but this appears to be rarely done).

Also, for two level experiments, it is quite common to get the magnitude of the interaction approximately equal to the magnitude of the two main effects. This just means that one combination of the two factors is unusually high or low and the other three combinations give about the same response (Daniel, 1975, page 135).
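A minimal sketch of the interaction definition above, applied to made-up data in which one of the four combinations is unusually high. The response values are hypothetical, and the main effect is taken here as the average of a factor's effect over the two levels of the other factor, which is an assumption about the intended convention.

```python
# Hypothetical responses for a 2x2 experiment, keyed by (A level, B level),
# with -1 = low and +1 = high. Only the (high A, high B) cell differs.
y = {(-1, -1): 10.0, (+1, -1): 10.0, (-1, +1): 10.0, (+1, +1): 14.0}

effect_A_at_low_B = y[(+1, -1)] - y[(-1, -1)]
effect_A_at_high_B = y[(+1, +1)] - y[(-1, +1)]
effect_B_at_low_A = y[(-1, +1)] - y[(-1, -1)]
effect_B_at_high_A = y[(+1, +1)] - y[(+1, -1)]

# Main effect: average of the simple effects (assumed convention).
main_A = (effect_A_at_high_B + effect_A_at_low_B) / 2
main_B = (effect_B_at_high_A + effect_B_at_low_A) / 2

# Interaction: difference of the simple effects, divided by two.
interaction_AB = (effect_A_at_high_B - effect_A_at_low_B) / 2

print(main_A, main_B, interaction_AB)
```

With these numbers all three quantities come out equal, which is the pattern described above: one unusual cell produces an interaction as large as both main effects.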


Statistical Modeling, Causal Inference, and Social Science
