Andrew Sutter writes,

I’m a practicing attorney with a background in physics. I’ve recently begun reading a lot of papers by economists, on topics like intellectual property, economic development and world trade. I’m pretty comfortable reading many technical papers in engineering and physical sciences for my work, so I’m not a quantitative basket case. Nonetheless, I am stupefied by some of the literary conventions of this new genre.

Whoever decided it was a great idea to present one’s findings in the form of acres and acres of regression coefficient tables? (Using dense, opaque clusters of capital letters as captions for the rows or columns is another odd notion, but easier to handle.) Why not use pictures? Of course, from what I’ve seen of your blog, I think I may be preaching to the choir here (or to the bishop, so to speak).

Aside from my puzzlement at why social scientists like to spill so many digits, I’m stuck with the more practical problem of how to interpret these tables. I have a basic idea of what a regression is, and what correlation is. Is there some shortcut to interpreting these tables for someone who (i) is unlikely to run a regression himself in the foreseeable future, and (ii) doesn’t have time to wrestle with a 600-page econometrics textbook?

Is the gist of it that, if I want to find what the authors will claim to be the biggest effects, I should look for the coefficients with the biggest absolute values and the tightest significance levels? Should I worry about the second row of numbers in each box, which is usually in parentheses, but that sometimes represents t-value, other times standard error, or something else? Is it safe to ignore those on a first reading?

If there isn’t such an easy answer, then can you recommend any concise (say, <300 pp), real-world-oriented book that could give me this background – esp. at a price below the now traditional $94 quantum for sucking money from students? Or can you point me toward some webpage or paper that has been designed as a guide for the similarly perplexed? (Most Internet resources I’ve found so far are geared to those who will be running regressions themselves, and/or relate to a particular piece of software, and/or are very abstract.) I look forward someday to digging into the epistemological and interpretive questions you discuss on your blog. But in the meantime, I’d first like to know what these economists are saying in their fantasy models of patent licensing and such. Is mine a vain hope?

My response: for a quick start, the importance of a predictor, within the context of the multiple regression, is basically the absolute value of the coefficient multiplied by the magnitude of a typical change in that variable. (For example, many predictors are simply 0 or 1, but if a predictor runs from 1 to 7, then a typical change might be 4, going from 2 to 6.) The standard errors tell you the uncertainty in the coefficients but not their importance. That said, things are more complicated. For example, two predictors can be jointly important, even if neither looks so big alone. And models typically include interactions, so that a single input variable can enter into many predictors.
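To make the rule of thumb concrete, here is a minimal sketch in Python. The predictor names, coefficients, standard errors and typical changes are all invented for illustration and do not come from any paper. The helper for tables that report t-values assumes the usual definition, t = estimate / standard error.

```python
# Hypothetical regression output: every number below is invented for illustration.
# Each entry: (coefficient, standard error, typical change in the predictor)
predictors = {
    "has_patent": (0.5, 0.25, 1.0),    # 0/1 indicator: a typical change is 1
    "survey_score": (0.2, 0.05, 4.0),  # 1-7 scale: a typical change is 4 (from 2 to 6)
}

def importance(coef, typical_change):
    """Rough importance: |coefficient| times a typical change in the predictor."""
    return abs(coef) * typical_change

def se_from_t(coef, t_value):
    """Recover the standard error when a table reports t-values (t = coef / se)."""
    return abs(coef / t_value)

summary = {}
for name, (coef, se, change) in predictors.items():
    summary[name] = {
        "importance": importance(coef, change),  # size of the effect
        "t_value": coef / se,                    # uncertainty, not importance
    }

for name, row in summary.items():
    print(f"{name:>12}: importance={row['importance']:.2f}, t={row['t_value']:.1f}")
```

In this made-up case the predictor with the smaller coefficient (0.2) comes out as the more important one, because its variable typically moves by 4 units rather than 1.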

Why do they use tables instead of graphs? An economist (or sociologist) could answer better than me, but my guess is a mixture of:

– Tables are standard, and econ journals are pretty conservative regarding graphical presentation.

– Graphs take more work than tables to produce.

– Students learn what they are taught, thus the tabular format is self-perpetuating.

– When graphics are studied in statistics, it’s usually in the context of plotting raw data, not plotting estimates.

– Numbers are unambiguous and objective. Perhaps there’s a fear that you could cheat or mislead using graphics. Hence, the econ papers I’ve seen with graphs also have tables. They don’t seem to trust graphs on their own.

Of course, when it comes to books, I recommend my own (forthcoming) book with Jennifer Hill, Applied Regression and Multilevel Models. It’s twice the length you want (sorry!) but will only cost $40 and has lots on how to understand multiple regression models. And most of what you want is in the first 230 pages…

Thanks, Andrew.

The last of your guesses about why there's a preference for tables is especially intriguing.

Physics and other natural science papers are much more likely to present graphs in the body of the text, with numerical tables as appendices or as web-only supplemental information. (Obviously there are some exceptions, such as when the point of the paper is to catalogue, say, work functions or other parameters for a zillion different materials.) Why might graphs be trusted more in these fields?

Could it be because people in the natural sciences tend to assume that their science *is* "objective" — they feel they don't have to prove that it is? (I'm speaking of attitudes, not "objective" reality.) Whereas social scientists have more of a chip on their shoulder about this point?

Ironically, graphs have been central to uncovering some cases of scientific fraud in the natural sciences. E.g., the breakthrough in the Jan Hendrik Schoen scandal in 2002 came when a couple of people noticed that the noise in one of his Nature figures was the same as in a figure from one of his earlier papers. Would this have been so easy to uncover if people had had to wade through piles of numbers (especially if the numbers had been fiddled a bit within the error bars)?

Nor has the use of tables meant that different social scientists always come to the same "objective" conclusions when they study the same problems. A lot of the results presented that way by economists etc. are debatable – and debated. Maybe the emphasis on *appearing* objective is a kind of non-verbal rhetorical ploy?

So your answer may very well be right — but I hope someday someone does get around to looking at this from a sociological point of view. And maybe by that time I'll be able to understand their tables, too.

A good guide for attorneys would be the Reference Manual on Scientific Evidence published by the FJC and available online (see link under my name). There is a chapter on statistics, and a chapter specifically on regression (a 50 page primer written by Dan Rubinfeld).

… "Why do they use tables instead of graphs? "

As an undergraduate in economics, I'm starting to get pretty used to the tables. But I'm curious: how are regression results presented in other fields?

A most useful book: "A Guide to Econometrics" by Peter Kennedy (MIT press). It's $36 from Amazon for the fifth edition; earlier editions are avail. for less from the standard sources. It's a great, highly readable overview of the main frameworks that economists use for looking at non-experimental data. All the technical material is pushed into appendices; the emphasis is on intuition and statistically correct reasoning.

Another plausible reason why economists use tables: in the olden days, most econometric models were structural, and the point estimates thus had (or were believed to have) real economic significance (e.g., elasticities of supply and demand).

I guess I'll step into my usual role as the devil's advocate here, by defending the lowly regression table. Suppose (as is often the case) I have six or seven or more explanatory variables, and I'm interested in how well they predict something, both individually and in various combinations. In one small table, I can give coefficient estimates and measures of model fit for maybe a dozen different models. Why would I not do this?

Of course, if appropriate (as it usually is) I should _also_ choose one or two or three models to explore in more detail, and this will probably involve various graphical displays. But there's no way to compactly present a dozen different models using graphical methods, except in special cases.

Many thanks for all the recommendations to references that could help me out of my perplexity.

I'm reassured that even Anonymous seems to agree that efficiency and intelligibility aren't necessarily the same thing. So graphs and tables shouldn't be mutually exclusive. It's the frequent preference for exclusively presenting tables (see, e.g., all of the papers in Fink & Maskus, eds., _Intellectual Property and Development_ (OUP: 2005)) that I find so amazing, exasperating and exhausting.

Anyway, the issue of graphing may be beside the point. Even if there were some way of using gradations of monochrome shading to highlight the most significant results in a table, that would help the intelligibility issue. It would certainly help to guide the eyes of policy-makers, managers, lawyers and others who have some instrumental reason to read the paper.

Malicious, incompetent or simply mistaken authors could of course mislead a reader by inappropriate shading. But presumably there's always some population of critical readers anyway; and the shading could help them too. Analogous to the Schoen example I mentioned above, a device like shading could actually amplify the signal of an author's incorrect or unwarranted interpretation.

BTW, in the math and physics world regressions aren't used so much; but there's still some analogue to this cultural divide. On the one hand is "Bourbakism" (as dubbed by VI Arnol'd), a French school of thought that promulgated the perverse idea that especially when you're writing a treatise on, say, geometry, topology or graph theory, you should do so without using any pictures. On the other are people like Feynman, John Wheeler and Arnol'd himself, who like to think visually and are generous with illustrations.

I can't believe that even the Bourbakistes think their books are more fun, or easier to understand. There's a social subtext to their style, viz. that greater austerity signifies intellectual superiority. Perhaps such attitudes have leaked into the social sciences too.

Both have uses. A graph can't reveal exact values, but can make detecting trends and cycles very easy. I always provide both with my regression model workups.

I'll second the recommendation for Kennedy.

I have found that Verbeek's "Guide to Modern Econometrics" is even better. It has slightly more rigor than Kennedy but is very comprehensible. There is not a large difference between the 1st and 2nd editions.

Economists usually use multiple regression, with, say, six explanatory variables, and then run several specifications (e.g., dropping outliers, adding some new explanatory variables, etc.). A table can show a lot of info very concisely. How would a graph even start to do this? (If there's a way please tell me– my question is not rhetorical.)

Andrew S.,

Eric Rasmusen's post is, maybe unfortunately, quite apt. The regression summary statistics table (evolved over several decades) is currently the most efficient way of conveying the information of the analysis. I suspect what you lack is a feel for the data-to-regression-summary transformation. Applied economists (political scientists, sociologists, etc.) glance at those tables and say, "OK, I know what you've presented. Now tell me the problems." Peter Kennedy's book (noted above) is almost entirely addressed to how the basic model can be changed to address the various problems. If the tables bother you, I would not recommend Kennedy's, Marno Verbeek's or any other book on the regression model. You are most likely interested in the meaning of the results (and of course what could be wrong, but that comes later), not the logic of the process. However, go read some of the excerpts at Amazon.com and see. (And also look at an elementary introduction to biostatistics, where the treatment is usually more succinct than in the social sciences, or some of the baseball "SABRmetrics" books, where the explanation is for an audience usually less attentive to statistical technique than a good lawyer.)

From my experience, what I think you would find most useful is a small, well crafted[*], and well annotated example with a simple software calculation (an Excel spreadsheet is good enough at this stage) that allows you to see the effects of some simple changes or problems. You see the "model" (the thing you are trying to represent); you see the data; you see the regression model and the summary table. I suspect that you can both generalize and realize there is a lot more. You get someone decent to give you an hour and the tables shouldn't bother you much. Then hit the books (but don't worry too much about the equation-to-equation derivation stuff).

[*] The well crafted example will take you from one explanatory variable to multiple explanatory variables and in the process show you one of the "problems" addressed in econometric and such texts. The example can be set up to give you a couple of other "problems," not so much to give you a checklist, but rather to improve your feel for the process.

Eric,

A graph can indeed display the results from several regressions–and do it far better than a table, I think, and taking less space. The key is that the single display can be a grid of small plots. See, for example, Figure 4 of this paper.

However, I agree that it's not always easy to make these plots. One of my goals is to write a program in R (which could then be translated into Stata) that would automatically display estimates and uncertainties from several fitted regressions. If done right, I really think it would take up less space and be clearer than the usual tabular display.
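As a very rough sketch of what such a display might look like, here is a plain-text version in Python (not R or Stata, and not the program described above). The model names, estimates and standard errors are hypothetical. Each row shows a point estimate (`*`) and an interval of plus or minus 2 standard errors (`-`) on a shared axis, with `|` marking zero.

```python
# Hypothetical fitted models: {model: {predictor: (estimate, standard error)}}.
# All numbers are invented for illustration.
models = {
    "model 1": {"x1": (0.5, 0.1), "x2": (-0.25, 0.2)},
    "model 2": {"x1": (0.4, 0.1), "x2": (-0.1, 0.2), "x3": (0.75, 0.3)},
}

def interval_line(est, se, lo, hi, width=41):
    """Draw the estimate (*) and its +/- 2 se interval (-) on a text axis from lo to hi."""
    def col(value):
        # Map a value to a column index, clamped to the ends of the axis.
        frac = (value - lo) / (hi - lo)
        return max(0, min(width - 1, round(frac * (width - 1))))
    cells = [" "] * width
    for i in range(col(est - 2 * se), col(est + 2 * se) + 1):
        cells[i] = "-"
    cells[col(0.0)] = "|"  # reference line at zero
    cells[col(est)] = "*"  # point estimate drawn last so it stays visible
    return "".join(cells)

def coefficient_display(models, lo=-2.0, hi=2.0):
    """One block per model, one row per predictor, all on the same axis."""
    names = sorted({p for fits in models.values() for p in fits})
    lines = []
    for model, fits in models.items():
        lines.append(model)
        for name in names:
            if name in fits:
                est, se = fits[name]
                lines.append(f"  {name:<4}{interval_line(est, se, lo, hi)}")
            else:
                lines.append(f"  {name:<4}(not in model)")
    return "\n".join(lines)

print(coefficient_display(models))
```

Putting every model on the same axis is the point: a reader can scan down a column and see at a glance which estimates are far from zero and which intervals overlap, without comparing digits.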

Martin,

Based on many papers and talks I've seen over the years, I think that tables of regression coefficients hide a lot of information. My own applied research would be much poorer if I had stopped at tables and not displayed my inferences graphically. That is, I agree with you that a lot can be learned from these tables, but I think that even more is possible with appropriate graphical displays.

Andrew G,

You say "…over the years, I think that tables of regression coefficients hide a lot of information." Amen. You and I — or just about any group of regression "users" — could give example after example and then come up with our pet peeves. But that is not the issue for Andrew S now. You and I have found our way of dealing with the standard summaries (and maybe, over time, we can change them).[*] I further bet that Andrew S. and I would say something like "That's neat" when you showed us a couple graphical alternative presentations of regression results.

But that is not the issue. Those tabular summaries are here for a while, and that subsample of people who present and use the tables, with more or less disdain, have some responsibility for making the connections between the commonsense statement of the model in question, the data used for inference, and those tables that summarize that inference. After that, and here is where I may disagree with you, and only after that do we get the "problems" (my term) or the issue of "stopping [after reading the tables]" (your term).

Given I have taken this much [of my effort and your time], let me ask this: Suppose you had prepared a well crafted example (full of graphs) and you had Andrew S. sitting next to you at a terminal. How long would it take you to convey (not the preparation time, just the explanation time) the crude fundamentals of the summary so that Andrew S. could both understand it and refer back to the print-outs to refresh his memory in the future? (Note, I use Andrew S. as a proxy for an intelligent and interested individual because the population mean time may be infinite. I ask the question for a subset of the population. I also should note, the understanding is not that of a good student in your course, but of a lawyer who is reviewing a paper by an expert he will either use or examine later.) Now how long would it take Andrew S. to acquire that same feel from a textbook or article?

[*] Thirty — well make that closer to 40 — years ago when I worked with the editorial staffs of economics journals, I had a little to do with producing the current, imperfect state of statistical presentation. I think I am being fair when I say that the objective was not the best summary, but the summary that best captured the consensus of minimally informative.

Thanks again to Andrew G. and to Martin R. for their suggestions. Martin, you are absolutely right on several counts. Both Kennedy and Verbeek are rather long for my modest near-term needs, and Verbeek definitely looks like overkill even in the long term. Rubinfeld looks like a reasonable place to start, after which I might treat a local postdoc or two to lunch here in the Palo Alto area. (It was such a lunch a few weeks ago that piqued my interest in this whole issue, when a couple of postdocs told me that the regression coefficient tables often were the *only* things they read in some papers.)

Also correct: I certainly did say "That's neat!" when I saw Fig. 4 in Andrew G.'s paper. So although I accept that the tables will be around for the near future at least, it's a very grudging acceptance after seeing how things *could* be done.

One assumption that missed the mark, though, is that my needs relate to litigation. I'm actually a transactional attorney; I started reading econometrics papers only because of an interest in some policy issues relating to IP, venture capital and development issues, especially in China. I wouldn't bother to mention that but for one point: while I might be just a kibitzer, real policy-makers are also among the consumers (or potential consumers) of many such papers. I imagine that they're even less patient than lawyers when it comes to figuring out this stuff. Graphs and similar aids to understanding could probably make the hard work of most readers of this blog more useful in the real world.

For posterity and rigorous science, the table is more useful. And books like Wilson as long ago as 1950 were imploring that scientists should use more tables and fewer graphs. But the fight seems to have been lost.

Graphs are more generally useful for seeing relationships and understanding a story (communicating).

I think perhaps natural science tends to use more graphs than dismal science because (in theory) the experiments are more simple and definitive and less of a statistical tea-leaf reading overfitted model game.