Someone named Nathan writes:

I am an undergraduate student in statistics and a reader of your blog. One thing that you’ve been on about over the past year is the difficulty of executing hypothesis testing correctly, and an apparent desire to see researchers move away from that paradigm. One thing I see you mention several times is to simply “model the problem directly”. I am not a masters student (yet) and am also not trained at all in Bayesian methods. My coursework was entirely based on classical null hypothesis testing.

From what I can gather, you mean the implementation of some kind of multi-level model. But do you also mean the fitting and usage of standard generalized linear models, such as logistic regression? I have ordered the book you wrote with Jennifer Hill on multi-level models, and I hope it will be illuminating.

On the other hand, I’m looking at going to graduate school and I will be applying this fall. My interests have diverged from classical statistics, with a larger emphasis on model building, prediction, and machine learning. To this end, would further training in statistics be appropriate? Or would it be more useful to try and get into a CS program? I still have interests in “statistics” — describing associations, but I am not so sure I am interested in being a classical theorist. What do you think?

My reply: There are lots of statistics programs that focus on applications rather than theory. Computer science departments, I don’t know how that works. If you want an applied-oriented statistics program, it could help to have a sense of what application areas you’re interested in, and also whether you’re interested in doing computational statistics, as a lot of applied work requires computational as well as methodological innovation in order to include as much relevant information as possible in your analyses.

He should perhaps consider a data science master’s program.

I’m in a data science master’s program and I second this recommendation. Because they’re between stats and CS and are typically newly-created, students have a lot of leeway to shape their curriculum and to specialize in more “traditional” stats or more traditional compsci.

This recent post by Roger Peng might be a worthwhile read for someone like Nathan: https://simplystatistics.org/2019/01/18/the-tentpoles-of-data-science/

I wonder whether having a foundation in mathematics is a prerequisite. I would think so.

I keep thinking about that and have for some time.

You have to be able to think abstractly and counterfactually, and mathematical training is likely the best way to obtain and hone those skills (1).

But most mathematical techniques applicable in statistics can be adequately approximated by various computations, including simulation, which suggests many topics can be skipped over or forgotten (students in stats courses can’t, because that is how they are currently taught and tested).

On the other hand, anything that one can do mathematically that is not true for all sample sizes (perhaps only for those greater than some finite k) will require subsequent verification by simulation at my sample sizes, so simulation and computation cannot be skipped over or forgotten.
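To make the point about checking a large-sample result at one's own sample size concrete, here is a minimal sketch. It is only an illustration under assumptions of my own choosing: the textbook Wald interval for a proportion, a true proportion of 0.1, and a hypothetical helper named `wald_coverage`. None of these come from the discussion above.

```python
import math
import random


def wald_coverage(n, p, n_sims=5000, z=1.96, seed=0):
    """Estimate by simulation how often the Wald interval for a proportion covers p.

    The nominal 95% coverage is an asymptotic claim; this checks it at a given n.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sims):
        # One binomial draw of size n, built as a sum of Bernoulli trials.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= p <= p_hat + half_width:
            covered += 1
    return covered / n_sims


if __name__ == "__main__":
    # Compare a small and a larger sample size (illustrative values only).
    for n in (20, 1000):
        print(n, wald_coverage(n, p=0.1, n_sims=2000))
```

With n = 20 and p = 0.1, the interval collapses to a single point whenever zero successes are observed, which happens with probability 0.9^20 (about 0.12), so the simulated coverage has to fall well short of the nominal 95%. The algebra alone would not tell you that for your particular n.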

Of course you can’t do more with less, so math should be an advantage (but again, see 1).

1. “I have found mathematicians to be, by far, the best reasoners of any social class that excludes them; and yet it seems that an exclusive absorption in their studies, which more than any others demand exclusive devotion, tends to blind them to other kinds of reasoning.” Peirce, “On the Foundation of Ampliative Reasoning,” 1910.

Crudely put, they often would rather do math than profitably apply statistical reasoning. This seems to include not wanting to do simulations. In particular, I have been wondering why Bayesians for so many years have not been simulating data from the prior and seeing if it emulates the world they are trying to model. Dan Simpson wrote a very recent paper noting that commonly used priors, when simulated from, most often gave air densities denser than concrete. https://statmodeling.stat.columbia.edu/2018/09/12/against-arianism-2-arianism-grande/

Why did someone not do that 10, 20, 30 years ago?
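For anyone who has not tried simulating from the prior, a rough sketch of the idea follows. This is not the model from the linked post. It is a hypothetical logistic regression with independent normal priors on an intercept and one slope, and the function names and the 1%/99% cutoffs are my own assumptions for illustration.

```python
import math
import random


def inv_logit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def prior_predictive_extreme_share(prior_sd, n_draws=5000, seed=1):
    """Share of prior draws whose implied probability is below 1% or above 99%."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_draws):
        # Draw the coefficients from the prior: independent normals centred at 0.
        intercept = rng.gauss(0.0, prior_sd)
        slope = rng.gauss(0.0, prior_sd)
        # Hypothetical standardized predictor value for one unit.
        x = rng.gauss(0.0, 1.0)
        prob = inv_logit(intercept + slope * x)
        if prob < 0.01 or prob > 0.99:
            extreme += 1
    return extreme / n_draws


if __name__ == "__main__":
    # A "vague" prior (sd 10) versus a tighter one (sd 1), both illustrative.
    for sd in (10.0, 1.0):
        print(sd, prior_predictive_extreme_share(sd))
```

Under the wide prior, most simulated units have an outcome probability of essentially 0 or 1, which is rarely what anyone believes about a real problem. Whether that matters depends on the application, but it takes only a few lines to find out.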

Definitely check out the reputation of different statistics departments. Some of them are more Bayesian, some have a distinct machine learning bent, etc. If you find one that seems interesting, consider applying there. I can’t speak to CS departments, but I would assume they’re similar in this respect. Maybe someone can comment on CS directly?

As a researcher who uses a lot of stats but isn’t a statistician, I’ll make a plug for engineering as well! There is a lot of need and training available to students in many engineering and science fields who want to use stats rather than do stats research. Some are closer to stats (e.g., operations research) and others use stats as a central tool (biostats, environmental engineering, etc.). Not all programs and advisors will be a good fit, but if there are applications you’re interested in, it’s worth considering that route as well. My take is that many, but not all, advisors would be delighted to have a student who wants to take a ton of courses in stats/ML plus a few in specific subject matter.

Also concur w/ above suggestions that you work or do an MS; a PhD is a long commitment and I definitely wouldn’t encourage anyone to apply until they are almost certain (you can’t be totally certain) that it’s exactly what they want to do.

Hi Nathan, I’m going to plug for engineering also. Think about all the cool stuff you could do w/, for example, Boeing. They’re pushing to automate many processes to cut costs, improve quality, etc. before the Chinese enter the aviation market w/ low-cost labor. No doubt there are *many* opportunities for machine learning! Also they’re on the cutting edge of drones and other automated flight, and competing w/ SpaceX for NASA contracts. Lots of great stuff!

As someone who was once (actually twice) a computer science graduate student (albeit long ago), I feel qualified to offer an observation.

Here are the U. Maryland PhD requirements for CS:

Coursework: Six graduate-level courses covering four areas out of artificial intelligence, bioinformatics, systems, databases, scientific computing, software engineering and programming languages, theory, and visual and geometric computing, and two more graduate courses from any area.

A person interested in the basic issue of statistical inference could take courses in AI, databases, scientific computing, and either software engineering/programming languages or bioinformatics.

The masters-level statistics program at UMD requires STAT650 Applied Stochastic Processes (3 credits), Mathematical Statistics (6 hours), Linear Statistical Models (3 credits), plus some other credits.

At UMD, the statistics program is much more focused on foundational mathematical concepts than on topics such as database organization or programming.

At different universities the programs will vary. It probably makes sense to look at the specific requirements of the programs one is interested in. However, it will be generally true that computer science is a bigger tent than statistics, with a wider range of topics studied in the field and required for graduate students.