
Define first, prove later

This post by John Cook features a quote from the book “Calculus on Manifolds” by Michael Spivak, which I think was the textbook for a course I took in college where we learned how to prove Stokes’s theorem, which is something in multivariable calculus involving the divergence and that thing that you get where you turn your hand around and see which way your thumb is pointing, you know, that thing you do to figure out which way the magnetic field goes: the “curl,” maybe??

Here’s the quote from Spivak (as quoted by Cook):

. . . the proof of [Stokes’] theorem is, in the mathematician’s sense, an utter triviality — a straight-forward calculation. On the other hand, even the statement of this triviality cannot be understood without a horde of definitions . . .There are good reasons why the theorems should all be easy and the definitions hard. . . .

Cook places this within a thoughtful discussion of the tradeoff between putting complexity in the definition or in the proof, or, in a computing context, putting complexity in the programming language or in the program itself. To port this to statistics, we might talk about putting complexity in the statistical formalism or in the application. Bayesian statistics, for example, has a complicated formalism but is direct to apply; whereas classical statistical methods are simple—closer to “button-pushing”—but a lot of choice goes into which buttons to push.

Anyway, back to Spivak. I hated the course based on his book. The prof was wonderful (he was my favorite math professor in college; I went up to him after the class was over and asked him to be my advisor), and the textbook itself was super-clear. But the course made me miserable. We started off the semester with a bunch of completely mysterious definitions, continued with weeks and weeks of lemmas that made no sense (even though I could follow each step), and concluded on the last day with the theorem, at which point I’d completely lost the thread.

It was only a bit later, after I happened to come across Proofs and Refutations, Imre Lakatos’s classic reconstruction of an episode in the history of mathematics, that I realized the professor, and the textbook, had done it backwards.

The right way to teach Stokes’s theorem (at least for me) would be to start by proving the theorem—it indeed is straightforward enough that a so-called heuristic proof could be laid out clearly in a single class period—and then step back and ask: what conditions are necessary and sufficient for the theorem to be correct? Or, to put it another way: under what conditions is the theorem false?

Step 1: The proof. (first week of class)
Step 2: The counterexamples. (second week of class)
Step 3: Going backward from there, establishing the conditions for the theorem, that is, the definition, in whatever rigor is required (the remaining 11 weeks of class).
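To make Steps 1 and 2 concrete, here is one way the planar case (Green’s theorem) might go, sketched in standard notation; this is an illustration, not the course’s or Spivak’s presentation. Step 1 proves the result on a rectangle R = [a,b] × [c,d] using nothing but the one-variable fundamental theorem of calculus. Step 2 exhibits a field with a singularity inside the region for which the conclusion fails, which is exactly the kind of counterexample that forces a smoothness hypothesis into the definitions.

```latex
% Step 1: heuristic proof on a rectangle R = [a,b] x [c,d], boundary traversed counterclockwise
\iint_R \frac{\partial Q}{\partial x}\,dA
  = \int_c^d \bigl(Q(b,y) - Q(a,y)\bigr)\,dy
  = \oint_{\partial R} Q\,dy,
\qquad
\iint_R -\frac{\partial P}{\partial y}\,dA
  = \int_a^b \bigl(P(x,c) - P(x,d)\bigr)\,dx
  = \oint_{\partial R} P\,dx.

% Adding the two gives Green's theorem on R; gluing rectangles together
% (interior edges cancel) extends it, heuristically, to more general regions D:
\iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA
  = \oint_{\partial D} P\,dx + Q\,dy.

% Step 2: a counterexample. Take P = -y/(x^2+y^2), Q = x/(x^2+y^2) on the unit disk D.
\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}
  = \frac{y^2 - x^2}{(x^2+y^2)^2} - \frac{y^2 - x^2}{(x^2+y^2)^2} = 0
  \quad \text{for } (x,y) \neq (0,0),
\qquad
\oint_{x^2+y^2=1} P\,dx + Q\,dy = \int_0^{2\pi} \bigl(\sin^2 t + \cos^2 t\bigr)\,dt = 2\pi \neq 0.
```

The integrand on the left vanishes everywhere except at the origin, yet the boundary integral is 2π. The failure traces back to the singularity at the origin, so the theorem needs P and Q to be continuously differentiable on all of D, interior included; that is where Step 3’s definitions would start.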

That’s how they should’ve done it.


  1. Anonymous says:

    Ha! I had similar experiences with that book, back in happier days, before I knew the first thing about statistics or realized how corrupt and incompetent academia is. It was used for the undergraduate “vector calculus” course required for physics majors. I liked it, but the other students, mostly physics majors plus a few math majors, hated it with a passion. In retrospect, it is not well suited for that kind of class.

    I think you all are getting the organizational motivation for the book wrong, though. The classical Stokes theorem, as well as the divergence theorem, Green’s theorem, and so on, are essentially multivariate generalizations of the Fundamental Theorem of Calculus. They relate integrals to “anti-derivatives” in some sense. In principle, there are an infinite number of these types of theorems in vector calculus.

    The organizational driving force of Spivak’s book is to get a single “Stokes Theorem on manifolds”, which encompasses in a natural way all these as special cases.
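    For readers who want the formulas, here is the general statement being described, in standard modern notation (a summary, not a quotation from Spivak): for a compact oriented k-dimensional manifold-with-boundary M, a smooth (k − 1)-form ω on M, and ∂M carrying the induced orientation, the classical theorems drop out as special cases.

```latex
% Generalized Stokes theorem
\int_M d\omega = \int_{\partial M} \omega

% k = 1, M = [a,b], \omega = f (a 0-form): the Fundamental Theorem of Calculus
\int_a^b f'(x)\,dx = f(b) - f(a)

% k = 2, M = D \subset \mathbb{R}^2, \omega = P\,dx + Q\,dy: Green's theorem
\iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA
  = \oint_{\partial D} P\,dx + Q\,dy

% k = 2, M = S a surface in \mathbb{R}^3: the classical Stokes theorem
\iint_S (\nabla \times \mathbf{F}) \cdot d\mathbf{S} = \oint_{\partial S} \mathbf{F} \cdot d\mathbf{r}

% k = 3, M = V \subset \mathbb{R}^3: the divergence theorem
\iiint_V \nabla \cdot \mathbf{F}\,dV = \iint_{\partial V} \mathbf{F} \cdot d\mathbf{S}
```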

    • Anonymous says:

      I should add I think Spivak gets the organization right. The machinery needed is generally useful and not developed for this one result. A single clean, simple generalized Fundamental Theorem of Calculus (which he calls “Stokes Theorem on manifolds”) is very useful theoretically. And if you ever need one of that infinite variety of “Stokes theorem” type results, you can get that special case from the general theorem in a straightforward way. Pretty powerful stuff.

      Everyone else in the class hated it, though, probably because it involves generalizing special cases only partially mastered by students at that level. A bigger problem in the long run is that most people who use the old classical Stokes theorem, like electrical engineers, have no need for that added power.

    • Anonymous says:

      Incidentally, the original “Stokes theorem”, which is actually a much easier-to-derive special case of Spivak’s Stokes Theorem, didn’t enter the literature through the normal journal publication route. It was a question on a math competition, somewhat like the math Olympiad.

  2. hjk says:

    Another interesting one. There’s a lot I like about both your and Cook’s discussions.

    My two cents is that what is most important is to start from a clear (but probably heuristic) statement of the concepts – ‘conceptual definitions’ – and a suggestive notation.

    If these are sufficiently ‘powerful’ then they should lead directly to heuristic proofs of many useful results, e.g. Stokes’ theorem.

    The next step as you say is to then go back and fill in all the technical conditions – the ‘technical definitions’.

    So I suppose where I differ from your scheme is that I would still emphasise definitions to begin with – for the same reasons as Spivak i.e. that good definitions should be suggestive of good results – but simply push back the introduction of the technical definitions and rigorous proofs.

  3. Keith O'Rourke says:

    Folks have different preferences; if the purpose is clear in the proof, mine would be like yours.

    > Bayesian statistics, for example, has a complicated formalism but is direct to apply; whereas classical statistical methods are simple


    Bayesian statistics: with p(X,u) and an x in hand, p(u) -> p(u|x), but then what exactly? (E.g., which intervals, how, for what purpose? Is it calibrated? Should we check the prior? But some say that’s not kosher?)

    Classical statistics: choose wisely among sufficiency (equivalence classes of likelihood functions), conditionality (watch out for relevant subsets), unbiasedness (via equivariance to Rao-Blackwellise), equivariance, finite population, asymptotics of first, second, and third order (learn differential geometry), and convergence in ? to assess equivalence of procedures, but then just use the 95% confidence interval (or, if naive, calculate a p-value for a Fisher or Neyman null, but be sure to use similar tests).
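    Keith’s Bayesian line can be illustrated with a small sketch. It assumes, purely hypothetically, binomial data (7 successes in 20 trials) and a flat prior on the success probability u, approximated on a grid with NumPy; none of these choices come from the comment. The update p(u) -> p(u|x) takes a few mechanical lines, while what to report afterwards (here, a central 95% interval versus a 95% highest-posterior-density interval) is still a decision the formalism does not make for you.

```python
import numpy as np

# Hypothetical data: x successes out of n binomial trials (not from the comment).
x, n = 7, 20

# Grid approximation over the unknown success probability u (midpoints of 1000 bins).
grid = np.linspace(0.0005, 0.9995, 1000)

prior = np.ones_like(grid)                      # p(u): flat prior, an assumption
likelihood = grid**x * (1.0 - grid)**(n - x)    # p(x | u), up to a constant
posterior = prior * likelihood
posterior /= posterior.sum()                    # p(u | x), normalized on the grid

post_mean = float(np.sum(grid * posterior))

# Choice 1: central 95% interval read off the posterior CDF.
cdf = np.cumsum(posterior)
central = (grid[np.searchsorted(cdf, 0.025)], grid[np.searchsorted(cdf, 0.975)])

# Choice 2: 95% highest-posterior-density set, built by taking grid points in
# order of decreasing density until they hold 95% of the mass. For this
# unimodal posterior the set is an interval, so its min and max describe it.
order = np.argsort(posterior)[::-1]
n_keep = np.searchsorted(np.cumsum(posterior[order]), 0.95) + 1
hpd = (grid[order[:n_keep]].min(), grid[order[:n_keep]].max())

print(f"posterior mean: {post_mean:.3f}")
print(f"central 95%:    ({central[0]:.3f}, {central[1]:.3f})")
print(f"HPD 95%:        ({hpd[0]:.3f}, {hpd[1]:.3f})")
```

    With a flat prior this posterior is Beta(8, 14), so the grid is only a stand-in for a closed form; the grid version is used because it makes the update step explicit and works unchanged for non-conjugate priors.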

  4. Rahul says:

    I thought that having had previous / parallel courses on Fluid Mechanics & Transport Phenomena (heat / momentum / mass) helped me appreciate Stokes much better.

  5. Mark Palko says:


    One of these days we’re going to have to have a good discussion about Pólya/Lakatos in general and How to Solve It/Proofs and Refutations in particular.

    In the meantime, this PhD dissertation has a good summary in “3.2.3 Digression: Lakatos and Pólya”

    Here’s an excerpt:

    “The mathematician G. Pólya, in a number of works [40–42], studies mathematical discovery and heuristic and thus touches on many of the same issues that Lakatos discusses. Indeed, it was Pólya himself who suggested to Lakatos to focus on the example of Euler’s polyhedron formula. Lakatos places his own work in the context of Pólya’s.”

  6. Chris G says:

    > Spivak: There are good reasons why the theorems should all be easy and the definitions hard. . . .

    I disagree – and that may be one of the underlying reasons why I went from being a math major to being a chem major*. Definitions should be as easy to understand as is feasible and constructed so that examples are easily formulated. Lengthy/complicated proofs are manageable for me so long as I can conceptualize intermediate states. When definitions of fundamental quantities obscure the meaning of intermediate states, I get lost quickly.

    (*I realized I was not a future mathematician after the section on ring theory in abstract algebra class. That and Green’s theorem gave me headaches. In contrast, atomic and molecular orbitals and the use of group theory to determine allowed and forbidden transitions made good sense to me. But Feynman and Hibbs’s “Quantum Mechanics and Path Integrals” damn near induced an aneurysm. Big picture: I do much better understanding theory when I understand how it relates to physical observables.)

  7. Tom Dietterich says:

    I don’t recall the source, but I remember a seminar in which someone quoted the statement that “most of our theorems are right, it is the proofs that are wrong”.

    “Right” is meant in the heuristic sense that the basic insight of the theorem is correct. But the exact definitions required to make it right often still contain minor errors.

    • Rahul says:

      I’m reminded of a story Feynman tells: back in his grad school days he apparently challenged his friends in the Math Dept. to put before him any assertion from one of their famous theorems and explain the definitions, notation, etc., and he would use his intuition to tell them whether it was true or false. No proof provided.

      As I remember, Feynman claims he was very very often right & left the Math guys pissed off.

        • Rahul says:

          I don’t know why you find him “pretty unlikable”. I think he was fantastic (though I don’t believe all his stories)

        • Chris G says:

          There were stories in “Surely You’re Joking” which implied he was a jerk – him putting the waitress’s tip under the inverted glass of water immediately comes to mind – and it meets the “Feynman story” criterion in Andrew’s link above. Feynman enjoyed being precocious. “Aren’t I especially smart? Bet you wish you were.” behavior is tolerable in kids. When adults engage in it, it’s obnoxious. From what I remember of Feynman’s book, he came across as a precocious teenager. He was incredibly smart, but he did not possess an adult social self-awareness.

      • Christian Hennig says:

        I wonder why the Math guys perceived this as a problem.
        People can have good intuitions about math, all right, but others should still trust the proofs rather than those intuitions.

    • hjk says:

      Tom: a nice phrase used by some mathematicians and physicists is ‘morally true’.

      There’s a nice essay by Eugenia Cheng called ‘Mathematics, morally’ that covers this.

      It also has a relevant discussion of the relation between the practice of mathematics and philosophy.

      From the abstract:

      “A source of tension between Philosophers of Mathematics and Mathematicians is the fact that each group feels ignored by the other; daily mathematical practice seems barely affected by the questions the Philosophers are considering. In this talk I will describe an issue that does have an impact on mathematical practice, and a philosophical stance on mathematics that is detectable in the work of practising mathematicians.

      No doubt controversially, I will call this issue ‘morality’, but the term is not of my coining: there are mathematicians across the world who use the word ‘morally’ to great effect in private, and I propose that there should be a public theory of what they mean by this. The issue arises because proofs, despite being revered as the backbone of mathematical truth, often contribute very little to a mathematician’s understanding. ‘Moral’ considerations, however, contribute a great deal.”

      Relatedly, I’ve found my adoption of (or return to) some Bayesian methods to be based on hearing better ‘moral’ arguments for them (and then using them) rather than the standard philosophical arguments.

      • Rahul says:

        Can you elaborate on what you consider to be “moral” arguments for Bayesian methods? I’m trying to understand the moral vs philosophical difference.

        • hjk says:

          Hmm, that’s a good but tough ask. I’ll give it a go. Morality is quite a personal concept, so I’ll focus on ‘me/I’ a lot!

          Firstly, as Cheng implies, the ‘moral’ perspective adopted by mathematicians carries philosophical implications – just not those typically addressed by philosophers. So here maybe mathematicians and philosophers are just two different ‘moral communities’ who find it difficult to communicate in the same ways that liberals and conservatives may?

          And as I said I became more convinced by finding *better* moral arguments or arguments that had more ‘moral content’ to me. That is, arguments that got at my feelings of how statistics/inference/science etc *ought* to work based on my experiences *trying to do it* (i.e. ‘living in the community’, to some extent at least).

          For example, a lot of the standard ‘philosophical’ arguments tend to focus on formalising and debating the ideas of, say, ‘rationality’, ‘belief’, ‘subjectivity’, ‘objectivity’, how priors relate to these things, etc.

          Personally, these debates don’t have much ‘moral’ content for me – they are unlikely to have any bearing on *why* I adopt one approach or the other. I find them essentially irrelevant – not because philosophy is irrelevant but because the questions chosen to be addressed seem barely relevant to my main concerns. Though they do turn me off all such debates a bit I guess.

          I have different concerns – e.g. how to formulate a flexible range of meaningful models (e.g. possibly nonlinear, dynamic) incorporating things like uncertainty, regularization, prediction and estimation in a natural way, a smallish set of reasonable ‘objects’ or concepts with a nice notation that suggests new ideas etc etc.

          So I might ask: is it better to think in terms of optimization or probability/sampling? Are my main objects of interest finite- or infinite-dimensional? Should I smooth? How much should I regularize?

          I’ve found that the Bayesian perspective is a reasonably nice, suggestive and compact way to think about these questions. Not quite perfect but a decent ‘moral perspective’.

          Focusing on questions about ‘rationality’ and ‘subjectivity’ of models and priors seems like asking about the ‘subjectivity’ of differential equations and solvability or boundary conditions – not *completely* irrelevant but close to orthogonal to the most important questions to me.

          PS Shalizi’s and Andrew’s paper is a nice example of a good moral argument to me – it even contains statements e.g. about induction/deduction that are probably technically incorrect but still ‘morally true’!

      • Keith O'Rourke says:


        Perhaps similar to the phrase “understanding rather than getting used to math”.

        From Cheng: “The key to moral understanding is the question ‘Why?’. Why is such-and-such true? ‘Because we’ve proved it’ is no answer at all!”

        Don Rubin once told me, “even if you want to/have to use a frequentist procedure, think through a Bayesian approach, as it is easier to understand things that way.”

        • hjk says:

          I like your last two points/quotes.

          Re: the first – I’d say a crucial part of developing a good morality is having some balance of *both* experience ‘in the real world’ (getting used to math – developing an intuitive/unconscious morality) and some distance/space to reflect on this experience (conscious understanding).

          Lack of time ‘getting used to’ things is a common criticism (fair or not) of philosophers by mathematicians and scientists.

          [Peirce of course was a philosopher/mathematician/logician/scientist! :-)]

  8. James Thompson says:

    I think it’s more likely that you meant to say “went up to him” rather than “want up to him.”



  9. Kaiser says:

    Great post, and totally agree. What about Stats 101? Shouldn’t we start with regression and work backwards? We start with counting, then probability, then sampling distributions, then LLN, then CLT, then …. by the time we get to regression, either the students have lost the thread or end of term is already here.

    • Elin says:

      I’m working on a new Stats 101 course, and I really think that starting with regression is a great approach. Students actually have a good understanding of scatterplots, and you can do that experiment (I think Tukey did it) where they draw a line by hand and calculate the slope and intercept. I also have an idea about starting with GapMinder data. But I doubt I will be able to convince my colleagues.

    • Andrew says:

      Kaiser, Elin:

      What I actually like to do is start with comparisons (not simple averages, but comparisons with some causal or descriptive goal) and then start controlling for differences between the two groups, which naturally leads to regression.

      Treatments of regression often get confused because regression has different facets. It can be interpreted as:
      1. A specification of the conditional expectation of y given x;
      2. A generative model of the world;
      3. A method for adjusting data to generalize from sample to population, or to perform causal inferences.

      All three of these purposes are legit, but there’s a tendency in textbooks to pick just one interpretation and not clearly connect to the others.
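      The path from comparisons to regression can be sketched in a few lines. This is a minimal illustration, assuming simulated data (the variable names, the effect sizes, and the single pre-treatment covariate are all hypothetical) and plain least squares via NumPy. It shows facet 1 (the coefficient on a group indicator is a difference in conditional means) and a first step toward facet 3 (adjusting for a pre-treatment difference between the groups); it is not a full treatment of causal inference.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X; X must already contain an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 1000

# Hypothetical simulated data: a pre-treatment covariate x, a 0/1 group indicator z
# that depends on x (so the groups differ before any treatment), and an outcome y
# whose true group difference, holding x fixed, is 2.0.
x = rng.normal(size=n)
z = (x + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 2.0 * z + 3.0 * x + rng.normal(size=n)

# Start with the comparison: difference in group means.
raw_diff = y[z == 1].mean() - y[z == 0].mean()

# The same comparison as a regression on the indicator; the coefficient on z
# equals the difference in means exactly (facet 1: a conditional expectation).
ones = np.ones(n)
coef_raw = ols(np.column_stack([ones, z]), y)

# Controlling for the pre-treatment difference between the groups (facet 3: adjustment).
coef_adj = ols(np.column_stack([ones, z, x]), y)

print(f"difference in means:      {raw_diff:.3f}")
print(f"regression on z:          {coef_raw[1]:.3f}")
print(f"regression on z and x:    {coef_adj[1]:.3f}")
```

      The design choice is to write the raw comparison as a regression first, so that “controlling for” a difference between groups is just one more column in the same least-squares call. Whether the adjusted coefficient deserves a causal reading depends on assumptions (here, that x is the only difference between the groups) that the code cannot check.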

      • Elin says:

        I definitely think starting with comparisons is important. Just on substantive grounds, it provides tremendous motivation for why the class matters. I try to encourage people, even when they are teaching baby-stats univariate calculations, to have problems where you work on the distributions for subgroups (male, female; treatment, control; urban, suburban, rural; 2000, 2010), and then have students actually think about the results in addition to calculating.

        Is the choice of 1, 2, or 3 discipline-based, do you think? That’s an interesting way to frame it, and now I’m going to look at some textbooks and see what I can see about that.

  10. Christian Hennig says:

    Re the posting’s topic, often I think (although I’m not making claims about this particular theory) that the order in which people have developed things is a good guideline for how to teach them. Granted, one needs to avoid throwing too much at the students that they ultimately don’t need because later things were polished, errors were removed, and things were made more elegant and general. But still, I think that the original motivation to do things is very instructive, and the original way to approach a problem tells students more about research practice than any order that is “optimised” later for elegance, or even for some kind of didactic reasons (some didactic reasons may be OK, I accept that).

  11. Eric Rasmusen says:

    Actually, I don’t think you mean:

    Step 1: The proof. (first week of class)
    Step 2: The counterexamples. (second week of class)
    Step 3: Going backward from there,

    The Lakatos style is more like:

    Step 1: A problem we’d like to solve, and a conjecture.
    Step 2: Find a counterexample to that conjecture. Fixing up the conjecture.
    Step 3: After a little of that, stating the actual theorem.
    Step 4. Proving the theorem, with emphasis on which counterexamples each part of the assumptions is ruling out.

    Could one teach regression like that? Hypothesis testing, probably— I wish I’d started with just thinking about the problem of inference and how one might think about tackling it.

  12. Floundunder says:

    I think that Spivak is not a very good example, although I did not like his book either (I can’t remember why. Perhaps a lack of specific examples which can be calculated. I would recommend R.W.R. Darling, “Differential Forms and Connections”, as an alternative. It develops everything for manifolds embedded in R^n before going to abstract manifolds and has pictures and explicit calculations.)

    To prove the general Stokes’ Theorem, you need to state it. To state it, you need to know what a differential form is. To know what a differential form is, you need more than a week of work, as they are rather difficult to comprehend. Tao wrote a good introduction to them.

    I think that Spivak really assumes that you have already taken a course in multivariable calculus and seen a heuristic proof of the 3d Stokes Theorem. Without that background, I think his book would be incomprehensible!
