David said,

“The fact that Andrew says ‘I don’t know’ five times in response to seven questions is perhaps his true key to success.”

I’d say it’s evidence of his true commitment to intellectual honesty.

Nicely done….

+1 to Daniel’s advice

No, I mean theory like: “what does an optimization algorithm do, and what are the generally difficult vs easy problems for them to solve? What aspects of the problem make it easier to solve? How can you reformulate a problem so it changes from a hard problem to an easy one?”

Or “What makes a problem amenable to an ordinary differential equation solution?”

Or “What does it mean that F(x) is asymptotically equivalent to the simpler expression Y(x) for large x, how can you evaluate whether x is large enough?”
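A minimal sketch of that last question, using Stirling’s approximation as a stand-in example (my choice, not the commenter’s): saying F(x) is asymptotically equivalent to Y(x) means F(x)/Y(x) → 1 as x grows, so one practical way to judge whether x is “large enough” is to compute that ratio and find where the relative error drops below a tolerance you care about. The function names and the 1% tolerance are hypothetical, for illustration only.

```python
import math

def stirling_log(n: float) -> float:
    """Log of Stirling's approximation, sqrt(2*pi*n) * (n/e)**n."""
    return 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n

def ratio_to_exact(n: int) -> float:
    """n! divided by Stirling's approximation, computed in log space to avoid overflow."""
    return math.exp(math.lgamma(n + 1) - stirling_log(n))

def smallest_large_enough(tol: float, n_max: int = 10_000) -> int:
    """Hypothetical helper: smallest n whose relative error |n!/approx - 1| is below tol."""
    for n in range(1, n_max + 1):
        if abs(ratio_to_exact(n) - 1) < tol:
            return n
    raise ValueError("tolerance not reached by n_max")

# The ratio creeps toward 1 as n grows; how close is "close enough" is up to you.
for n in (1, 5, 10, 100):
    print(n, ratio_to_exact(n))
print("first n within 1%:", smallest_large_enough(0.01))
```

The answer depends entirely on the tolerance: “large enough” is a property of the question you’re asking, not of the approximation alone.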

+1

> But you also can’t be an excellent applied person by being totally ignorant of theory either.

The vast majority of “excellent” applied people (say, the top 1%) know virtually zero “theory”. (That is, if by theory you mean knowledge that would allow you to answer a question on the final exam of an undergrad theory course, much less handle the theory on a qualifying exam.)

This is a nice 7-minute video that makes point 2 very clear: “Unpredictability, Undecidability, and Uncomputability” https://www.youtube.com/watch?v=hDpEg881BnI

(Also a link to the full essay.)

I came to this post late, but the one piece of advice I’d give is: just get used to making mistakes, and don’t waste energy feeling bad about them; learn from them instead. Easier said than done.

That shows his commitment to non-informative priors :-)

[a different, much older Kyle]

That’s great advice in any field. I learned mostly by counterexample that no one should think they finally “got smart” at a certain point and can stop learning.

Kyle,

Thanks for your comments. Really interesting. Yeah, university materials have a better chance of being good, but glossing over assumptions is common, at least in practice, to varying degrees across all sciences.

Textbooks are far more systematic than most online resources, save university course materials. Definitely the best way to learn a topic.

Kyle:

Interesting point about online resources vs. textbooks. There does seem to be an asymmetry here: online claims get more noticed when they are bold, extreme, and overconfident; whereas successful textbooks are typically boring and understated. Even a textbook such as our own Bayesian Data Analysis . . . in many ways it represents a radical departure from earlier statistics textbooks, but in tone and content it is pretty careful. Online, it seems that dramatic statements are what gets the attention. That doesn’t mean that online resources are bad—maybe sometimes we need to get shaken up a bit—but it is interesting to consider the difference between what’s easy to find in textbooks and what’s easy to find online.

Thanks, Daniel, I enjoyed reading your perspective; it’s certainly reassuring!

I appreciate your feedback. I think what you’ve described is exactly what I’ve felt was neglected in a lot of online resources.

I’ve been fairly disappointed with how quickly model assumptions tend to be glossed over in online resources. For instance, I’ve seen iid mentioned countless times, and it’s commonly not explained (apart from spelling out the acronym), despite it being an extremely prevalent model assumption. Threats to validity, especially selection bias, have felt extremely neglected in the materials I’ve seen. I’ve also seen a lot of data leakage (not to mention p-hacking and r^2-hacking) from people coming from a background similar to mine, because, as you said, it’s really easy to just shove data into formulas. I think I’m particularly lucky that I enjoy simulating data, and it’s helped me catch a lot of these mistakes.

Of course, this is just in the resources I was looking at, but I suspect that worse resources tend to get marketed more, especially since there’s a giant market of people jumping onto the “Data Science hype train”, and companies subsequently scrambling to profit off of them.

I’ve been much happier with the content after switching to textbooks (I certainly never thought I’d enjoy reading a textbook in my life), and I’ll probably avoid going back to online resources unless they’re university-affiliated.
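The habit of simulating data described in this comment can be made concrete as a “fake-data check”: simulate from a model whose parameters you chose yourself, fit it, and confirm you get those parameters back before trusting the same code on real data. A minimal sketch, assuming a simple linear model y = a + b*x + noise; the true values, noise level, sample size, seed, and function names are all hypothetical.

```python
import random

def simulate(n, a, b, sigma, rng):
    """Fake data from y = a + b*x + Gaussian noise, with known 'true' a and b."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [a + b * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

def fit_ols(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(2020)          # fixed seed so the check is reproducible
true_a, true_b = 1.5, -0.7         # parameters we chose, so we know the right answer
xs, ys = simulate(1000, true_a, true_b, sigma=2.0, rng=rng)
a_hat, b_hat = fit_ols(xs, ys)
print(f"intercept: true {true_a}, estimated {a_hat:.2f}")
print(f"slope:     true {true_b}, estimated {b_hat:.2f}")
```

If the estimates don’t land near the values you simulated from (allowing for sampling noise), something in the pipeline is wrong, and it’s far cheaper to find that out here than after the results are written up.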

This seems like good advice. When I was an undergrad, the chair and only member of the journalism department resisted offering a journalism major (he knew it would evolve into something like “communications” or “media studies” and he was too old-school for that). Instead, he offered a minor concentration that focused on writing and told his students to major in something they would like to write about.

It was published in Encyclopedia with Semantic Computing and Robotic Intelligence, World Scientific Press, 1 (2), pp. 125-138, 2017.

See also Appendix A, on “Skills of a Data Scientist,” in The Real Work of Data Science: https://www.wiley.com/en-us/The+Real+Work+of+Data+Science%3A+Turning+data+into+information%2C+better+decisions%2C+and+stronger+organizations-p-9781119570769

Some things I learned along the way:

1. Going to school during a recession is, for many people, a lower-opportunity-cost option than going during an expansion. Graduating into a recession is no good.

2. Math classes at LACC are probably better than those at most major R1 universities. The classes are probably a tenth the size, and the teachers are teachers, not researchers or graduate students.

4. Talk to people about your interests. Get better at talking to people about your interests. Be concise.

5. Thinking and writing and speaking clearly are more important than math.

As to those of your specific questions that are most relevant to me:

– What sort of skills does this subject demand and reward, more specifically than the requisite/general mathematical abilities?

1. Clarity of communication. This is true from mathematical proofs to journal articles. It is also true from graduate school acceptance to post-graduate career. The more effectively you can communicate through speech, writing and data/results presentation (graphs, tables, hardly ever tables), the more you will succeed. Most of my students under-value these skills. They think their work will explain itself. But it will not.

2. Broader knowledge of the world. In any applied statistics situation you might be working in, the key to everything is the relationship between variation in the data and variation in the world. And more often than not, a tremendous amount of outside knowledge is required to make sense of that relationship, contextualize the measurements, and interpret the results reasonably. My students also tend to under-value knowledge of, say, biology or law or sociology, and end up misinterpreting results in papers on health economics, economics of crime or economic demography. It is impossible to interpret statistical analyses if you don’t understand the underlying dynamics that generate what you are measuring.

– Are there any big (or at least common) misconceptions regarding what statistical research work involves?

1. That it is hard to find patterns in data.

2. That it is easy to interpret patterns in data.
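The first misconception is easy to check by simulation. A minimal sketch: generate many pairs of variables that are independent by construction and count how many still show a correlation large enough to pass a conventional two-sided 5% threshold. The sample size, number of pairs, seed, and the critical value of about 0.361 (for 30 observations) are illustrative choices, not anything from the comment.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def count_spurious(n_pairs=1000, n_obs=30, r_crit=0.361, seed=1):
    """Count pairs of independent noise variables whose |r| exceeds r_crit.

    r_crit = 0.361 is roughly the two-sided 5% critical value for n_obs = 30,
    so about 5% of pairs should "look significant" by chance alone.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        xs = [rng.gauss(0, 1) for _ in range(n_obs)]
        ys = [rng.gauss(0, 1) for _ in range(n_obs)]  # independent of xs by construction
        if abs(pearson_r(xs, ys)) > r_crit:
            hits += 1
    return hits

hits = count_spurious()
print(f"{hits} of 1000 pure-noise pairs cleared the |r| > 0.361 bar")
```

Around 5% of the pairs should clear the bar even though no relationship exists: finding the pattern was easy, and interpreting it as real would be wrong.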

– If you could give your younger self any career-related advice, what would it be? (I hope this question isn’t too cliche, but I figured it was worth asking).

This answer IS too cliché: at some point in the process, someone sorta powerful, who might know, will probably tell you to “consider doing something else with your life.” Over the years you will reflect on this from a surprising number of different perspectives.

ps – I know that at least one other person from my calculus class at LACC now has a “Dr.” in front of their name. And that’s the only one I was ever in contact with. So that’s like batting 1.000! ¡LACC!

“Most of the hands-on work of applied statistics / statistical data analysis is not in setting up the models, it’s in all of the stuff that has to be done before and after you set up the models”

And neglecting (or giving short shrift to) the “before” stuff usually amounts to shooting yourself in the foot.

“Awaken from the Cartesian dream (that all the world’s a deterministic machine and with the right model and enough data you’ll arrive at Truth)”

+ many!

Phil:

Rule of three for the win: it motivated you to come up with that important point about communication.

There are so many different jobs that involve data analysis (and related disciplines) that it would be misleading to generalize too much about what skills one needs and what training would be handy. There are, however, a few things that I think are generally true:

1. You need to be able to program. It doesn’t matter much what language, because all of the popular languages are similar enough that if you learn one you can learn another. You’ll probably do a lot of work in R or Python or maybe both, but if you’ll be using Matlab or SAS or whatever, that’s fine, you’ll just learn that too. There can be a substantial overhead cost to learning a language as you apply it, but if it happens it happens.

2. Most of the hands-on work of applied statistics / statistical data analysis is not in setting up the models, it’s in all of the stuff that has to be done before and after you set up the models. Here is something I wrote a couple of years ago (https://statmodeling.stat.columbia.edu/2018/03/23/lessons-learned-hell/). You’ll notice how little emphasis I put on having to come up with an improved statistical model. Yeah, it took me a few solid days to try models and converge on one I was satisfied with, but the project as a whole took months, and even just the bad part took weeks.

3. According to the rule of three, I have to come up with one more thing, so …uh, OK, there is one more thing that I think is true in general: you are going to have to present your work to others, and that can be done well or poorly. You may or may not need to be able to do good technical writing, but you surely will have to make statistical graphics for others to look at. Try to get good at this.

1. Study statistics

2. Awaken from the Cartesian dream (that all the world’s a deterministic machine and with the right model and enough data you’ll arrive at Truth)

I’ve seen people deep into their careers still fast asleep.

Universities should take responsibility for Step 2; as it is, it’s usually left up to me, as an employer, to shake them awake.

Terry:

Traditionally, many intro stat students are biology or psychology majors, and these fields mostly use statistics to analyze experiments (t-tests, ANOVA, etc.), so regression is not the first priority. It’s different for me because I teach political science students. We don’t do so many experiments; we’re always analyzing surveys and archival data, and regression comes up all the time.

Cool thanks! :)

Great idea to get involved with campus clubs – they often invite speakers and stuff like that. They might also give you a chance to interact with grad students.

Oh, another really great resource I totally overlooked: your local grad student!

Definitely chat up the TAs and grad students in your department. They know more than you do, but to some extent they are working out the same career questions you are. They’re often a lot more accessible than profs or professionals, yet they have access to those people.

This is powerful! I completely agree with you, Daniel!

Andrew, I’d be fascinated to know which of your ideas you’re thinking of. I assume these are things you’ve published, since you say you’ve given them away for free.

Something I learned a lot from was reading an eclectic array of research articles in a particular field, picking apart their methods and assumptions, and asking questions like, “What are some alternative explanations for the data?” “How would one go about teasing out evidence for them, either by using a different statistical method on the data or by beginning with a new research design?” “Why did this paper use a different test than the other paper for the same kind of question?” Rip them apart and then answer the questions as if you were the author defending or improving your approach. Don’t be afraid to look at papers going back decades in the same research thread, as this will show you how we have grown (or not) over time. Actually, that would be an excellent intro course.

+1

+1

“I think you want to learn theory and practice at the same time… if you start out ‘learning all the theory’ you become totally useless… if you start out just ‘learning what everyone does’ you learn all the bad tools that everyone uses…

No one ever becomes an excellent applied person by sitting down and studying theory for 10 years. But you also can’t be an excellent applied person by being totally ignorant of theory either. The key is to have a foundation, and to pull in new theory whenever it seems necessary to make applied progress.”

Agreed. It is a continuing process.

Jim said:

“Unfortunately, unless you’re very bright (roughly >90th percentile) and/or spend a lot of time doing math and stats for fun, you won’t be able to grasp what professionals are doing until the third year of your degree. That’s when you start to build an in-depth knowledge of some subject areas.”

This may vary from program to program, but is often true. In math, at least, honors courses usually give a better idea of the field than the regular courses do. So try to take them if you can (rising to the challenge is part of their worth!)

“The most important thing for you to do is to get involved in the field and keep abreast of developments as you go through your undergrad and modify your goals and plans accordingly. Cast a wide net. Talk to people in academics and industry. Find out what kind of things they do, what they think are exciting trends, and where they see the field going.”

Campus undergraduate organizations (e.g., math club, stats club) can help. If your campus doesn’t have any appropriate ones, take the initiative to start one (recruiting other students to help) and offer the type of programs that Jim suggests.

“I’m not a statistician, but people on this blog could suggest some student/professional orgs to get involved in and follow. Definitely do this; these orgs often publish monthly or quarterly newsletters with columns and such on trends in the discipline, which will give you an idea of what’s going on.”

The Mathematical Association of America and American Statistical Association both offer student memberships at a discounted price. (Details are easy to find online for the MAA, but take a little longer to find for the ASA.) There may be local chapters that you can participate in, or maybe you could even start a student chapter. They both also have “online communities” that members can join.

Could be, but I hope I have made a little dent by volunteering to teach future teachers, and I think I did have an effect on at least some of them that has altered the way they teach (for the better, I hope). (But then, some of them had to deal with principals who did not allow class discussion. Hard to say whether those principals are counterbalanced by others, such as the one who was an English teacher and fully supported one of my former students’ insistence on her students explaining their reasons for what they did, even though the other math teachers didn’t like it.)

I agree the survey courses are pretty shallow, but I think they’re OK for what they are. They give students the broad set of principles that guide everything else.

IMO what students need is more contact with people who are working in industry, so they can get a feel for it beyond the newspaper-level depiction in first-year courses.

Kyle,

Would you expand about what you think are the shortcomings of the online resources you’ve been working with? Is it the lack of theory?

Theory or not, what you really need to do is think about and understand what you’re doing when you shove data into formulas. Theories are frequently modified to account for new information.

The other thing you *reallyreallyreally* need to understand is what your data are actually measuring. This doesn’t require statistical theory, it requires an understanding of the assumptions in your measurements and methods. This is the biggest problem in all of science.

When I was being interviewed by the dean of the law school I was sure wouldn’t admit me because of my 3.3 undergrad GPA, I asked if I had a chance. You see, the average GPA for those admitted was over 3.8. He said most of those people had degrees in things like poli sci, and that he’d found over 40 years that people from rigorous programs in math, engineering, chemistry, and physics tended to outperform their undergrad grades more than those who majored in other subjects did. Not long after, I got my acceptance letter.

Weirdly, legal education turned out to have pretty much zero interest in science or math. Instead it was largely about completing syllogisms. Premises were never questioned, and so when science made an appearance it was in the form of “All men have ESP. Daryl Bem is a man. Therefore?” The good thing was that we were graded on a curve, and the poli sci people propelled me to the right of it by too often answering “Socrates is Greek” in response to “All men are mortal. Socrates is a man. Therefore?”

I would certainly agree with this. When I started studying applied statistics in 1973, the subject was only 50 years old and the electronic computer was beginning to make an impact. There have been a huge number of advances and developments since then.

…pedagogy advances one funeral at a time?

Rahul said: “Speaking of math, why don’t they revise all the undergrad courses?”

A big part of the problem is that “they” covers a wide variety of people at a wide variety of institutions. My experience as a math professor for many years was that there were frequent revisions in course listings (I started several new undergrad courses myself) and course content, but that revisions often petered out or lost what was revisionary about them when they were taught by people other than the ones who did the initial revision. Kind of a regression to the mean (or just the momentum of TTWWADI, “That’s The Way We’ve Always Done It”).

PS: One course that continued in some sense was a new course for future teachers. When I passed it on to another instructor, she noticed that the methods that were atypical actually increased student conceptual understanding (which was, of course, why I used those atypical methods). Then when she started teaching an actuarial course, she adapted the atypical methods to that subject and indeed found that the actuarial students had greater conceptual understanding compared to those taught in the more traditional way.

This brings up a favorite peeve of mine – and a reason why I’ve shied away from teaching undergraduates for the last half of my career (and why I had mixed results in the first half). Almost all first-year courses are ill-designed. They are overview courses and fail to give students an idea of what the field is really like (statistics, economics, whatever). I always thought they should be narrow and deep, and provide the students an experience of what the subject is really like. I’d leave the broad overview courses as senior-level capstone courses. In economics (my degrees), I always thought this was the right way to go. Of course, few others agreed, so it got me nowhere.

This brings to mind something a law professor who was involved in admissions to Law School once told me: Math majors are generally good candidates for admission to Law School, since they are strong in logical thinking and in spotting errors in reasoning.

Andrew said “I’m pretty sure that any math you learn as an undergrad will come in handy later.”

D.K. replied: “No, at least if by ‘later’ you mean post-school. Math is about as useful as Latin in the professional world. (A slight exaggeration, but only slight.) No one will ever pay you to prove something or calculate an integral by hand.”

Andrew responded: “3. OK, not ‘any math.’ Change that to ‘most math.’ Most of what you learn in math in college is not proofs or calculating integrals, it’s problem solving. Even solving integrals is not so useful: sure, the particular techniques using arcsecant or whatever don’t matter, but by working through these formulas you’re learning how the expressions fit together.”

I’d say it depends on the quality of the math content and teaching you encounter in high school and college. As Andrew points out, problem-solving experience is worthwhile. However, Andrew’s college problem-solving experience was extremely high quality, e.g., Putnam Exam level. Many college students’ problem-solving experience is limited to “routine” problems (i.e., ones that involve carrying out an already-taught algorithm in a specific context).

Other things that are useful from *good* math courses (but not guaranteed to be taught) include being able to work with technical definitions and learning the difference between conjecture and proof.

I think you want to learn theory and practice at the same time… if you start out “learning all the theory” you become totally useless… if you start out just “learning what everyone does” you learn all the bad tools that everyone uses…

I’d say start reading BDA and Statistical Rethinking… get to the point where you understand some idea, and then try to apply that idea to a real problem… then figure out what your real problem needs, and then see if those two resources have info on some theory that provides that tool… lather, rinse, repeat…

No one ever becomes an excellent applied person by sitting down and studying theory for 10 years. But you also can’t be an excellent applied person by being totally ignorant of theory either. The key is to have a foundation, and to pull in new theory whenever it seems necessary to make applied progress.

Ugh. Tell me about it…

Speaking of math, why don’t they revise all the undergrad courses? I feel they focus on the wrong things.

Most math courses have hardly changed in the last 20 years, maybe even the last 50. Why do they pretend that we don’t have things like algebraic solvers?

I think the undergrad math courses we have are ok if you want to go to grad school or into method development but pretty much not suitable for the rest of the students.

I wish they would focus more on the tools, on setting up problems, on messy equations that don’t have closed-form solutions, on nonlinear or numerical problems, etc.

The typical undergrad math course is like a chess game where tricks are rewarded.

Why does a numerical methods or MATLAB course come as an afterthought instead of being integrated into the regular math courses?
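To make the “messy equations without closed-form solutions” point concrete: x = cos(x) has no closed-form solution, but a few lines of Newton’s method solve it numerically. A minimal sketch; the helper name `newton`, the starting point, and the tolerance are illustrative choices, not anything from the comment.

```python
import math

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeatedly step x -> x - f(x)/f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# Solve cos(x) - x = 0; its derivative is -sin(x) - 1, which stays away from 0 near the root.
root = newton(lambda x: math.cos(x) - x, lambda x: -math.sin(x) - 1, x0=1.0)
print(root)
```

Newton’s method typically needs only a handful of iterations here; the trade-off is that it needs a derivative and a reasonable starting point, which is the kind of practical judgment that numerical work demands.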

I’ve spent the last year and a half studying programming and statistics on my own for use in industry (coming from basic math/stats prerequisite knowledge). I spent my first year studying unofficial online resources, and at first I thought they were amazing, but I eventually realized their massive shortcomings.

As someone who is now trying to correct course, do you have any advice on what to study? More specifically, what are your thoughts on studying applied use before theory? To be clear, I don’t just want to learn statistics to make money in industry; I truly enjoy working with data and I want to do it the right way.

I recently picked up McElreath’s Statistical Rethinking, and my plan was to read Bayesian Data Analysis next, but I’m wondering if I should slow down and make sure I have a better understanding of statistical theory and probability theory first. Is it irresponsible to try being a practitioner before truly understanding theory?

And if I wanted to learn probability theory and statistical theory, do you have any textbook recommendations?

The only thing I’ve ever regretted about math is not learning more of it.

Math is more than a collection of operations. It’s a language that expresses concepts in a rigorous form, and that rigor dramatically improves problem solving.

From the employment perspective, it’s probably true that very few employers will care what kind of math you can do (there are exceptions). But for you as an individual, the question is: what kind of problems can you solve? Your knowledge of math will dramatically expand the set of problems you can solve and the efficiency with which you solve them.

It’s difficult as a first-year undergrad to get an idea of what’s going on professionally in a field. Unfortunately, unless you’re very bright (roughly >90th percentile) and/or spend a lot of time doing math and stats for fun, you won’t be able to grasp what professionals are doing until the third year of your degree. That’s when you start to build an in-depth knowledge of some subject areas.

The most important thing for you to do is to get involved in the field and keep abreast of developments as you go through your undergrad and modify your goals and plans accordingly. Cast a wide net. Talk to people in academics and industry. Find out what kind of things they do, what they think are exciting trends, and where they see the field going.

I’m not a statistician, but people on this blog could suggest some student/professional orgs to get involved in and follow. Definitely do this; these orgs often publish monthly or quarterly newsletters with columns and such on trends in the discipline, which will give you an idea of what’s going on.

Stay away from NHST! :)
