Archive of posts filed under the Miscellaneous Statistics category.

Calibrating patterns in structured data: No easy answers here.

“No easy answers” . . . Hey, that’s a title that’s pure anti-clickbait, a veritable kryptonite for social media . . . Anyway, here’s the story. Adam Przedniczek writes: I am trying to devise new or tune up already existing statistical tests assessing rate of occurrences of some bigger compound structures, but the most tricky […]

The garden of 603,979,752 forking paths

Amy Orben and Andrew Przybylski write: The widespread use of digital technologies by young people has spurred speculation that their regular use negatively impacts psychological well-being. Current empirical evidence supporting this idea is largely based on secondary analyses of large-scale social datasets. Though these datasets provide a valuable resource for highly powered investigations, their many […]

Harvard dude calls us “online trolls”

Story here. Background here (“How post-hoc power calculation is like a shit sandwich”) and here (“Post-Hoc Power PubPeer Dumpster Fire”). OK, to be fair, “shit sandwich” could be considered kind of a trollish thing for me to have said. But the potty language in this context was not gratuitous; it furthered the larger point I […]

We’re done with our Applied Regression final exam (and solution to question 15)

We’re done with our exam. And the solution to question 15: 15. Consider the following procedure. • Set n = 100 and draw n continuous values x_i uniformly distributed between 0 and 10. Then simulate data from the model y_i = a + bx_i + error_i, for i = 1,…,n, with a = 2, b […]
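The simulation recipe in question 15 can be sketched in a few lines. The excerpt cuts off before specifying the error distribution, so this sketch assumes independent standard normal errors purely for illustration (the fitting step via `np.polyfit` is also just one convenient choice, not necessarily the exam's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
a, b = 2, 3
x = rng.uniform(0, 10, size=n)      # n continuous values uniform on [0, 10]
# Assumption: independent standard normal errors (the excerpt is truncated
# before the error distribution is stated).
errors = rng.normal(0, 1, size=n)
y = a + b * x + errors

# Least-squares fit of y on x; polyfit returns (slope, intercept) for degree 1
slope, intercept = np.polyfit(x, y, 1)
print(intercept, slope)
```

With n = 100 and this error scale, the fitted intercept and slope land close to the true values a = 2 and b = 3.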

Pharmacometrics meeting in Paris on the afternoon of 11 July 2019

Julie Bertrand writes: The pharmacometrics group led by France Mentre (IAME, INSERM, Univ Paris) is very pleased to host a free ISoP Statistics and Pharmacometrics (SxP) SIG local event at Faculté Bichat, 16 rue Henri Huchard, 75018 Paris, on Thursday afternoon the 11th of July 2019. It will feature talks from Professor Andrew Gelman, Univ […]

Question 15 of our Applied Regression final exam (and solution to question 14)

Here’s question 15 of our exam: 15. Consider the following procedure. • Set n = 100 and draw n continuous values x_i uniformly distributed between 0 and 10. Then simulate data from the model y_i = a + bx_i + error_i, for i = 1,…,n, with a = 2, b = 3, and independent errors […]

Question 14 of our Applied Regression final exam (and solution to question 13)

Here’s question 14 of our exam: 14. You are predicting whether a student passes a class given pre-test score. The fitted model is, Pr(Pass) = logit^−1(a_j + 0.1x), for a student in classroom j whose pre-test score is x. The pre-test scores range from 0 to 50. The a_j’s are estimated to have a normal […]
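The model in question 14 maps a logit-scale linear predictor to a pass probability. A minimal sketch of that mapping, using a hypothetical classroom intercept a_j = 1 (the excerpt is truncated before the a_j distribution's parameters are given):

```python
import math

def invlogit(z):
    """Inverse logit: maps the linear predictor to a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical classroom intercept for illustration; pre-test scores run
# 0..50, so the 0.1*x term contributes between 0 and 5 on the logit scale.
a_j = 1.0
for x in (0, 25, 50):
    print(x, round(invlogit(a_j + 0.1 * x), 2))
```

The spread of 5 logit units across the pre-test range is what makes the score a strong predictor in this model.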

Question 13 of our Applied Regression final exam (and solution to question 12)

Here’s question 13 of our exam: 13. You fit a model of the form: y ~ x + u_full + (1 | group). The estimated coefficients are 2.5, 0.7, and 0.5 respectively for the intercept, x, and u_full, with group and individual residual standard deviations estimated as 2.0 and 3.0 respectively. Write the […]

Question 12 of our Applied Regression final exam (and solution to question 11)

Here’s question 12 of our exam: 12. In the regression above, suppose you replaced height in inches by height in centimeters. What would then be the intercept and slope of the regression? (One inch is 2.54 centimeters.) And the solution to question 11: 11. We defined a new variable based on weight (in pounds): heavy […]

Question 11 of our Applied Regression final exam (and solution to question 10)

Here’s question 11 of our exam: 11. We defined a new variable based on weight (in pounds): heavy <- weight > 200 and then ran a logistic regression, predicting “heavy” from height (in inches): glm(formula = heavy ~ height, family = binomial(link = "logit")) coef.est coef.se (Intercept) -21.51 1.60 height 0.28 0.02 — n = 1984, k = […]
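Plugging heights into the fitted coefficients from the excerpt (intercept -21.51, slope 0.28 per inch) shows how sharply the predicted probability of being heavy rises with height. The specific heights below are chosen for illustration only:

```python
import math

def invlogit(z):
    return 1 / (1 + math.exp(-z))

# Fitted model from the excerpt: Pr(heavy) = invlogit(-21.51 + 0.28 * height)
for height in (66, 72, 78):   # illustrative heights in inches
    p = invlogit(-21.51 + 0.28 * height)
    print(height, round(p, 2))
```

The predicted probability crosses 0.5 at 21.51/0.28, about 77 inches.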

Question 10 of our Applied Regression final exam (and solution to question 9)

Here’s question 10 of our exam: 10. For the above example, we then created indicator variables, age18_29, age30_44, age45_64, and age65up, for four age categories. We then fit a new regression: lm(formula = weight ~ age30_44 + age45_64 + age65up) coef.est coef.se (Intercept) 157.2 5.4 age30_44TRUE 19.1 7.0 age45_64TRUE 27.2 7.6 age65upTRUE 8.5 8.7 n […]
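With indicator coding as in question 10, the intercept is the mean for the omitted category (age18_29 here), and each indicator coefficient is that group's difference from it. A sketch recovering the group means from the coefficients quoted in the excerpt:

```python
# Group offsets from the fitted regression in the excerpt; age18_29 is the
# omitted baseline category, so its offset is zero.
offsets = {"age18_29": 0.0, "age30_44": 19.1, "age45_64": 27.2, "age65up": 8.5}
intercept = 157.2

predicted = {group: intercept + d for group, d in offsets.items()}
print(predicted)
```

So, for example, the predicted mean weight for the 45-64 group is 157.2 + 27.2 pounds.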

Question 9 of our Applied Regression final exam (and solution to question 8)

Here’s question 9 of our exam: 9. We downloaded data with weight (in pounds) and age (in years) from a random sample of American adults. We created a new variable, age10 = age/10. We then fit a regression: lm(formula = weight ~ age10) coef.est coef.se (Intercept) 161.0 7.3 age10 2.6 1.6 n = 2009, k […]
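The age10 coefficient in the excerpt is an estimated 2.6-pound weight difference per decade of age, with standard error 1.6. A rough 95% interval (using the conventional estimate ± 2 standard errors, which may or may not be the exam's intended answer):

```python
# Slope on age10 from the fitted regression in the excerpt
est, se = 2.6, 1.6

# Rough 95% interval: estimate +/- 2 standard errors
lo, hi = est - 2 * se, est + 2 * se
print((lo, hi))
```

The interval includes zero, so the age slope is not statistically distinguishable from zero at the conventional level.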

Question 7 of our Applied Regression final exam (and solution to question 6)

Here’s question 7 of our exam: 7. You conduct an experiment in which some people get a special get-out-the-vote message and others do not. Then you follow up with a sample, after the election, to see if they voted. If you follow up with 500 people, how large an effect would you be able to […]
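One conventional back-of-the-envelope answer to question 7 (not necessarily the exam's official solution): split the 500 people into two arms of 250, take the worst-case turnout probability of 0.5 to bound the standard error of the difference in proportions, and require an effect of about 2.8 standard errors for 80% power at the 5% level:

```python
import math

n_per_group = 250      # 500 people split evenly between treatment and control
p = 0.5                # worst-case turnout probability, maximizing the s.e.

# Standard error of the difference between two independent proportions
se_diff = math.sqrt(2 * p * (1 - p) / n_per_group)

# 2.8 = 1.96 (5% significance) + 0.84 (80% power)
detectable = 2.8 * se_diff
print(round(se_diff, 3), round(detectable, 3))
```

Under these assumptions the detectable effect is roughly a 12.5 percentage-point difference in turnout, which is large for a get-out-the-vote message.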

Question 6 of our Applied Regression final exam (and solution to question 5)

Here’s question 6 of our exam: 6. You are applying hierarchical logistic regression on a survey of 1500 people to estimate support for a federal jobs program. The model is fit using, as a state-level predictor, the Republican presidential vote in the state. Which of the following two statements is basically true? (a) Adding a […]

Question 5 of our Applied Regression final exam (and solution to question 4)

Here’s question 5 of our exam: 5. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (a) If a question is answered correctly by students with low ability, but is missed by […]

Question 4 of our Applied Regression final exam (and solution to question 3)

Here’s question 4 of our exam: 4. A researcher is imputing missing responses for income in a social survey of American households, using for the imputation a regression model given demographic variables. Which of the following two statements is basically true? (a) If you impute income deterministically using a fitted regression model (that is, imputing […]

Question 3 of our Applied Regression final exam (and solution to question 2)

Here’s question 3 of our exam: Here is a fitted model from the Bangladesh analysis predicting whether a person with high-arsenic drinking water will switch wells, given the arsenic level in their existing well and the distance to the nearest safe well. glm(formula = switch ~ dist100 + arsenic, family=binomial(link="logit")) coef.est coef.se (Intercept) 0.00 0.08 […]

Question 2 of our Applied Regression final exam (and solution to question 1)

Here’s question 2 of our exam: 2. A multiple-choice test item has four options. Assume that a student taking this question either knows the answer or does a pure guess. A random sample of 100 students take the item. 60% get it correct. Give an estimate and 95% confidence interval for the percentage in the […]
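One standard way to work question 2 (assuming this is the intended method): if a fraction k of students know the answer and the rest guess uniformly among four options, then Pr(correct) = k + (1 - k)/4, so k can be backed out from the observed proportion correct, and its standard error is the standard error of that proportion scaled by 1/0.75:

```python
import math

n, p_hat = 100, 0.60

# Pr(correct) = k + (1 - k)/4  =>  k = (p_hat - 0.25) / 0.75
k_hat = (p_hat - 0.25) / 0.75

# k_hat is a linear transformation of p_hat, so its s.e. scales by 1/0.75
se_p = math.sqrt(p_hat * (1 - p_hat) / n)
se_k = se_p / 0.75

ci = (k_hat - 1.96 * se_k, k_hat + 1.96 * se_k)
print(round(k_hat, 3), tuple(round(v, 3) for v in ci))
```

This gives an estimate of roughly 47% of students knowing the answer, with a 95% interval of roughly 34% to 59%.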

Question 1 of our Applied Regression final exam

As promised, it’s time to go over the final exam of our applied regression class. It was an in-class exam, 3 hours for 15 questions. Here’s the first question on the test: 1. A randomized experiment is performed within a survey. 1000 people are contacted. Half the people contacted are promised a $5 incentive to […]

My (remote) talk this Friday 3pm at the Department of Cognitive Science at UCSD

It was too much to do one more flight so I’ll do this one in (nearly) carbon-free style using hangout or skype. It’s 3pm Pacific time in CSB (Cognitive Science Building) 003 at the University of California, San Diego. This is what they asked for in the invite: Our Friday afternoon COGS200 series has been […]