Displaying regression coefficients and standardized coefficients

Manoel Galdino pointed me to a discussion on the Polmeth list on the topic of reporting p-values and regression coefficients. (The Polmeth listserv doesn’t seem to have a way to link to threads, but if you go here for March 2009 you can scroll down to the posts on “Displaying regression coefficients.”) I don’t want to go on and on about this, but in the interest of advancing the ball just a bit, here are a few thoughts:

1. I like standardized regression coefficients; see here for a full discussion with examples.
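
To make this concrete, here's a minimal sketch in Python (the data and variable names are made up for illustration) of the two-standard-deviation scaling I have in mind:

```python
import numpy as np

# Hypothetical data: an outcome and two predictors on very different scales.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 80, n)
income = rng.lognormal(10, 1, n)
y = 0.02 * age + 1e-5 * income + rng.normal(0, 1, n)

# Divide each predictor by two standard deviations, so the coefficients
# are roughly comparable to those of binary predictors.
X = np.column_stack([age, income])
Xs = (X - X.mean(axis=0)) / (2 * X.std(axis=0))

# Ordinary least squares with an intercept.
A = np.column_stack([np.ones(n), Xs])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("standardized coefficients (age, income):", coef[1:])
```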

2. Discretizing predictors can also be useful. With a good discretization, you can present the equivalent of regression results in a more interpretable way; see here.
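
As a toy illustration (again with simulated data), cutting a continuous predictor into quartiles lets you report, and plot, the mean outcome within each bin directly:

```python
import numpy as np
import pandas as pd

# Simulated data: a continuous predictor and a noisy outcome.
rng = np.random.default_rng(1)
df = pd.DataFrame({"age": rng.uniform(18, 80, 500)})
df["y"] = 0.03 * df["age"] + rng.normal(0, 1, 500)

# Cut age into quartiles; the mean of y in each bin is immediately
# interpretable, unlike a raw slope per year of age.
df["age_bin"] = pd.qcut(df["age"], q=4)
print(df.groupby("age_bin", observed=True)["y"].agg(["mean", "sem"]))
```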

3. I agree with everyone on that thread who said that it’s not a good idea to report p-values. Larry Bartels puts it well: “Simply reporting p-values will only be informative to readers who happen to be interested in the null hypothesis that the relevant parameter value is zero, which I [Larry] almost never am.” I would go even further and point out that, in social science, I already know ahead of time before collecting any data that the null hypothesis is false!

4. In medical journals, it is becoming common practice for regression results to be presented graphically. This is definitely the way to go, in my opinion.
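
Here's a rough sketch of the kind of display I mean: a dot plot of estimates with one- and two-standard-error bars (the numbers below are invented for illustration):

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented estimates and standard errors for a few predictors.
names = ["age (2 sd)", "income (2 sd)", "female", "treatment"]
est = np.array([0.21, 0.08, -0.15, 0.30])
se = np.array([0.05, 0.06, 0.07, 0.10])

ypos = np.arange(len(names))[::-1]
fig, ax = plt.subplots(figsize=(5, 3))
# Thin line: +/- 2 standard errors; thick line: +/- 1; dot: estimate.
ax.hlines(ypos, est - 2 * se, est + 2 * se, lw=1, color="black")
ax.hlines(ypos, est - se, est + se, lw=3, color="black")
ax.plot(est, ypos, "ko")
ax.axvline(0, color="gray", lw=0.5, ls="--")
ax.set_yticks(ypos)
ax.set_yticklabels(names)
ax.set_xlabel("coefficient estimate")
plt.tight_layout()
plt.show()
```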

5. To avoid a graphs vs. tables flame war, let me try to focus the discussion on the question of what to communicate. Larry’s post quoted by Galdino had the following bit:

When space constraints necessitate a choice my [Larry’s] preference will usually be for the table. . . . if the point is simply to report regression coefficients, graphs are likely to take more space and convey the relevant information much less precisely. . . . accuracy and efficiency should often take precedence over “present[ing] your work in the most compelling visual manner” . . .

For me, the sticking point is the phrase, “simply to report regression coefficients.” It’s rare that this has been my research goal. And, the one time I can say that this clearly was our goal–my 1990 article with Gary estimating the incumbency advantage–it was still better to display the estimates graphically. In another example where we did present estimates of regression coefficients, our most important conclusions were shown by a pair of graphs. These graphs did incorporate fitted regression lines. In yet another case, I really did want some numbers, but (a) they were not raw regression coefficients, and (b) they were best portrayed graphically (in this article, see Figure 4, which I admit is a bit messy; if I were to redo it, I’d make a grid of graphs rather than just two very busy graphs).

Finally, when there is a single coefficient or two that are of particular interest, it’s perfectly fine to give the number in the text (for example, “6.3% with a standard error of 2.3%”). No need to have a table with dozens of numbers. You can graph what is important and then supply the key numbers in the text as necessary. I can accept that there may be occasional examples where you really want a table of numbers (for example, a report of economic performance in 24 different industries, where the reader of the article might really want to know that the return on investment in a certain sector is 3.8%, or whatever), but that certainly doesn’t come up much in political science articles, as far as I am aware.

Returning to the above quote: the question is, what is the “accuracy and efficiency” you are looking for? Ultimately it’s about conveying what you found in your research, conveying this to yourself as well as to others. Larry has written a whole bunch of important and influential articles on public opinion and politics, and tables have worked for him as a way of organizing his results. I wouldn’t suggest for one minute that Larry change his ways. For those of us with less of a track record, however, it might be worth thinking a bit about what information we’re trying to convey, and whether the reader–even the scholarly reader–needs to know that the coefficient for “age squared” in your regression is 0.0000213 with a standard error of 0.0000079. Maybe you’re better off just explaining that you controlled for age in your analysis, or displaying the coefficient graphically, along with all the others, after an appropriate standardization. In my opinion, you should display this “0.0000213 (0.0000079)” only if you have a good reason to do so.

6. I’m a reformed sinner myself. See, for example, Figure 1 here. In my defense, nobody ever told me I should’ve graphed it–an excuse nobody has anymore!–but . . . how could I have thought that a reader would want to know, for example, that my posterior distribution for the volume of poorly perfused tissues for patient A in the study had a geometric mean of .649 and a geometric standard deviation of 1.04?? I mean, what was I thinking? I was focusing on the proximate goal–recording the numbers–rather than the ultimate goals of understanding the methods and the fitted model and its implications.

P.S. But it’s good to see an anti-graph backlash. This tells me that the method of displaying research results using graphs–or, more specifically, the idea that graphical display should be the default–is making progress.

5 thoughts on “Displaying regression coefficients and standardized coefficients”

  1. I think it's healthy for people to use whatever they think is best. However, my concern is that tables have become the norm, so that we're "forced" by the referees to use them anyway. For instance, in economics journals you really can't use graphs.

    Related to this discussion, tables also encourage people to report essentially meaningless numbers, such as coefficients for interactions in linear models, or even raw coefficients from non-linear models (logit, multinomial models, etc.). Then, in the text, they provide the interpretation in terms of probability or whatever. To me this is odd, because these probabilities are really the quantities of interest (a quick sketch of this computation appears at the end of this comment). Yet people do this in the best journals in political science.

    Maybe we should organize a special edition of POLMETH about how to display data and results in political science research: what do you think?
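
    Here is the sketch mentioned above: a logit fit on made-up data, with the raw coefficient translated into predicted probabilities, which are the quantities of interest:

    ```python
    import numpy as np
    import statsmodels.api as sm

    # Made-up binary outcome driven by one predictor.
    rng = np.random.default_rng(2)
    x = rng.normal(0, 1, 500)
    p_true = 1 / (1 + np.exp(-(0.5 + 0.8 * x)))
    ybin = rng.binomial(1, p_true)

    fit = sm.Logit(ybin, sm.add_constant(x)).fit(disp=0)
    print("raw logit coefficient:", fit.params[1])

    # The raw coefficient is hard to read; predicted probabilities are not.
    for xv in (-1.0, 0.0, 1.0):
        prob = fit.predict(np.array([[1.0, xv]]))[0]
        print(f"x = {xv:+.1f}: predicted probability = {prob:.2f}")
    ```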

  2. If there is access to, and documentation of, the data and analyses carried out, then no detailed information need be given, as it can be acquired to any level of accuracy desired.

    If not, then at least coefficients and their standard errors arguably should be supplied, so that others can see whether they or anyone else have replicated your findings (that is, whether any differences are due only to chance).

    As for neglecting nuisance coefficients (age^2) – one man's nuisance parameter is another man's interest parameter ;-)

    Some discussion of the purpose of publication is perhaps in order here, and even Tukey's warning not to change the way people do science based on mere technical knowledge…

    Keith

  3. I think that tables may have been the better choice some years ago, when a good graphic was not so easy to make.

    However, nowadays the only constraint is the ability of the researcher to make good graphics. So it would be nice if we had more material on this subject (by material I mean books, articles, etc.).

    Good examples are the article by Jonathan Kastellec and Eduardo Leoni about graphics, and this link (http://addictedtor.free.fr/graphiques/thumbs.php?sort=votes), provided by Professor Gelman on this blog some time ago.

    Besides, since we are moving more and more to electronic formats, researchers can use color graphics and other web-based tools, making tables (and even some graphics) inefficient.

  4. In clinical trials, we are struggling to get out of the 60s, at least regarding displays. Some clinical trial reports have thousands of pages of tables and listings.

    Me? I'd prefer to start with graphs, but have interactive drill-down to explore things like outliers, if needed. It is also helpful in some cases to have the actual numbers (e.g., the actual drug effect among males over the age of 65), not just the relationships among numbers that graphs show most effectively. I think we have a long way to go (and many more graphs to show!) before striking the right balance between figures and text.

  5. Regarding #6 above:

    I'm wondering if someone can help me understand the relationship between a coefficient and its standard error. I have a feeling that a low n in one of my subgroups theoretically results in a greater standard error and less opportunity to detect an effect, but I don't know whether I have to actually check the numbers to confirm this or can just go off the theory. And do you have to evaluate the size of the standard error in relation to the size of the coefficient? For example, how would you evaluate a coefficient (logistic regression) of .13 with a standard error of .11?

    In general, if the standard error is very close in size to the coefficient, does one conclude that that's not good?

    Thanks for any advice!
    Rebecca
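
    For reference, the usual large-sample reading of these numbers treats the ratio of a coefficient to its standard error as an approximate z-statistic; a quick sketch of that arithmetic, using the numbers from the question:

    ```python
    from scipy import stats

    # Coefficient and standard error quoted in the question above.
    coef, se = 0.13, 0.11
    z = coef / se
    p = 2 * stats.norm.sf(abs(z))  # two-sided p-value, normal approximation
    print(f"z = {z:.2f}, two-sided p = {p:.2f}")
    # z is about 1.2 and p about 0.24: the estimate is roughly one standard
    # error from zero, so the data are consistent with a wide range of effect
    # sizes, which is what a small subgroup sample would lead you to expect.
    ```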
