Skip to content
 

An Excel add-in for regression analysis

Bob Nau writes:

I know you are not particularly fond of Excel, but you might (I hope) be interested in a free Excel add-in for multivariate data analysis and linear regression that I am distributing here: http://regressit.com. I originally developed it for teaching an advanced MBA elective course on regression and time series analysis at Duke University, but it is intended for teaching data analysis at any level where students are familiar with Excel (and use it on PC’s), and it is also intended for serious applied work as a complement to other analytical software. It has been available to the public since May 2014, and a new version has just been released. If I do say so myself, its default regression output is more thoughtfully designed and includes higher quality graphics than what is provided by the best-known statistical programming languages or by commercial Excel add-ins such as Analyse-it, XLstat, or StatTools. It also has a number of unique features that are designed to facilitate data exploration and model testing and to support a disciplined and well-documented approach to analysis, with an emphasis on data visualization. My frustration with the stone-age graphics output of the leading regression software was the original motivation for its development, and I am now offering it for free as a public service. Please take it for a test drive and see for yourself. I’d welcome your feedback.

I don’t know Excel at all so I can’t take it for a test drive . . . but I bet that some of you can! Please share your thoughts. So many people use Excel that an improvement here could have a huge effect on good statistical practice. I don’t know if Reinhart and Rogoff read this blog but there must be some Excel users in the audience, right?

P.S. Nau wanted to share some further thoughts:

It may appear at first glance as though there is little that is new here: just another program that performs descriptive data analysis and plain old linear regression. The difference is in the details, and the details are many. Every design element in RegressIt has been chosen with a view toward helping the user to work efficiently and competently, to interactively share the results of the analysis with others, to enjoy the process, and to leave behind a clear trail of breadcrumbs. In this respect, RegressIt is a sort of “concept car” that illustrates features which would be nice to have in other analytical procedures besides regression if the software was designed from the ground up with the user in mind and did not carry a burden of backward compatibility with the way it looked a decade or two ago. Also, it tries to take advantage of things that Excel is good for while compensating for its lack of discipline. The design choices are based on my own experience in 30+ years of teaching as well as playing around with data for my own purposes. When a student or colleague or someone on the other side of the internet wants to discuss the results of an analysis that he or she has performed, which might or might not be for a problem whose solution I already know, I want to be able, with a few mouse clicks, to replicate their analysis and drill deeper or perform variations on it, and compare new results side-by-side with old ones, while having an armchair conversation. I might also want to do this on the spur of the moment in front of a class without worrying about my typing. When I am looking at at one among many tables or charts, I often wonder: what model produced this, and what were the variables, what was the sample, when did the analysis take place, and by whom? What other models were tried before or afterward, and what was good or bad about this one? If a chart is just labeled “Residuals of Y” or “Residuals vs. Fitted Values”, that is not very helpful, particularly if it has been copied and pasted into a report where it takes on a life of its own. And when I look at the output of a model on the computer screen, I want to see as much of it at one time as possible. I want an efficient screen design—ideally one that would look good in an auditorium as well as on my desktop—and I want easy navigation within and across models. I would rather not scroll up and down through a linear log file that reminds me of line-printer days (which I do remember!) and makes it hard to distinguish the code from the results. I would like to see a presentation that by default is fairly complete in terms of including some well-chosen chart output that allows me to engage my visual cortex without saying “yuck”. And I want the same things if the original analyst is not a student or colleague but merely myself yesterday or last week or last year.

I hope you will give it a close look, kick the tires, and take it for a drive with some data of your own. And please read everything that is on the features and advice pages on the web site. Otherwise you may overlook some of what RegressIt is doing that is novel. And whatever you may think of it in the end, I would welcome your input on improvements or extensions that could be made. Is there any low-hanging fruit could easily be added, or is there some deal-breaking omission that absolutely needs to be fixed? We can make changes in a hurry if we have to–there is no calendar of scheduled releases. We are two professors who work on this in our spare time. RegressIt’s feature set is limited at present, but our hope is that the features it does include will be useful in some circumstances to people who do most of their work in R or Stata and well as to people who do most of their work in Excel, and we plan to add more to it in the future. Thanks in advance for your input!

27 Comments

  1. Bruce McCullough says:

    I went to the regressit site and poked around. Perhaps I missed it, but I saw no evidence of the accuracy of the various procedures offered by the package.

    Is there any reason other than naive hope for me think that the statistics returned from regressit are accurate?

    There exist several benchmarks for such packages. See, e.g.,

    B. D. McCullough
    “Algorithm Choice for (Partial) Autocorrelation Functions,”
    Journal of Economic and Social Measurement 24(3/4), 265-278, 1998

    B. D. McCullough
    “Assessing the Reliability of Statistical Software: Part I,”
    The American Statistician 52(4), 358-366, 1998

    B. D. McCullough
    “Assessing the Reliability of Statistical Software: Part II,”
    The American Statistician 53(2), 149-159, 1999

    B. D. McCullough
    “Experience with the StRD: Application and Interpretation,”
    Computing Science and Statistics 31 , 16-21, 2000

    B. D. McCullough
    “Wilkinson’s Tests and Econometric Software,”
    Journal of Economic and Social Measurement 29(1-3), 261-270, 2004

    B. D. McCullough
    “Testing Econometric Software,”
    Chapter 28 in Handbook of Econometrics, Volume 2
    Mills and Patterson, eds., New York: Palgrave, pp. 1293-1320, 2009

  2. Matt DeMonte says:

    The output looks pretty nice for the limited features it supports, considering that it’s free. The charts and reports are similar to commercial software packages with GUI’s such as JMP and Minitab. It’s not really breaking new ground (even considering the reproducibility of the analyses). Free is better than not free, though! Also, it’s a huge upgrade for business manager types doing an occasional regression on their sales data. These folks aren’t going invest the time to learn how to use a comprehensive software package, but know how to use Excel and can be taught the basics of regression in a day or two. There are a lot of people like that in the world, so I think it makes a lot of sense for MBA’s to use.

  3. Rodney Sparapani says:

    RegressIt only works on the Windows version of Excel. This is totally unacceptable! To be taken seriously, it needs to work on both versions of Excel. I can’t recommend spending time on anything that does not provide cross-platform compatibility.

    • Dan Hicks says:

      As I understand it, the current Mac version of MS Office is a port of the 2010 Windows version, and the combination of being one version behind and a port makes writing add-ins for both Mac and Windows a massive PITA.

      This is the same reason why PowerPoint slideshows written on one platform get screwed up when displayed on the other platform.

    • Rahul says:

      Strong words. Although cross platform is nice, there’s tons of software out there that only works on one OS. Porability needs time & effort. One has to start somewhere. Maybe they will port it to more OS’s in the future.

  4. zbicyclist says:

    I’ll take a look. Because of tenure track maternity leave, my adjunct contract included 4 sections of business statistics last year.

    The Excel built-in regression package is inexcusable. I don’t see any defenders here, so I won’t elaborate. As a result, I’ve switched them over to Minitab for regression (the last 4 weeks of the course). Minitab works fine, but gives me another software learning curve that they see as of little value.

    • Steve Sailer says:

      The lack of improvement in Microsoft Office over the last 15 years or so is shameful.

      I can recall in the 1990s when Microsoft Word brought out its grammar and style checker, which was pretty nice for the time. Then Microsoft hired the well-known journalist James Fallows of the Atlantic Monthly on a six-month contract to help them come up with improvements for it. I sent him a list of simple additions that he really liked.

      Next thing I know, Fallows is back at the Atlantic and Microsoft does absolutely zilch to improve the style checker.

      • Rahul says:

        +1

        Look at the awful default colors that were there in Excel for so long. Or the default plot settings. Or some of the notorious bugs with Regression etc. Even when using Word for Complex documents the crashes and pitfalls are so well documented that there are entire pages written by MVPs on those.

        They could have been so easily fixed so long ago.

      • Bugs in Excel have become relied-upon features. People who care about those bugs have moved on… so the bugs don’t get fixed. This is at least one reason.

  5. Keith O'Rourke says:

    > know how to use Excel … There are a lot of people like that in the world, so I think it makes a lot of sense for MBA’s to use.

    Its the trade off of low barriers to end user adoption to high barriers of making do with not the most appropriate software.

    Here I do suspect lower barrier to accessing more appropriate software using Excel as simply the interface.

    With the usual warning from Yogi – http://blog.rstudio.org/2015/06/30/accelerating-r-rstudio-and-the-new-r-consortium/
    (i.e. Microsoft is likely put R interfaces into most of their products.)

    • Rahul says:

      Microsoft’s being a part of this makes me worried about R’s future.

      It’s like a Michelin star restaurant becoming part of a McDonald’s franchise chain.

      • Andrew says:

        Rahul:

        I dunno, R’s not quite a Michelin-starred restaurant. It’s more like a food truck.

        • Rahul says:

          The food trucks you know must be awesomely good.

          Well, ok, at least the top morsels from R’s menu are Michelin star quality. :)

        • I would say if R is like a food truck, it’s a TARDIS food truck that’s bigger on the inside and offers a vast array of multi-ethnic cuisine which, if not all entirely tasty, is at least A rated on sanitation.

          • I mean, unlike (some versions of) Excel, you’re never going to see R return a negative value for the sample variance.

          • Rahul says:

            If R is not Michelin stars, what is?

            • Well, what does it mean to be Michelin starred? I suppose more stars mostly mean “More luxury”, both ambiance and fanciness of presentation, and ingredients and techniques that take massive amounts of time or effort….

              R is more like a well stocked kitchen than a prix fixe luxury experience…

              R gives you sharp knives, high quality cookware, a very adjustable gas stove, a high quality blender, a good food processor, and enough twine to either truss a turkey or hang yourself whichever you are more likely to do :-)

        • Keith O'Rourke says:

          It’s Alice’s restaurant – you can get anything you want excepting the most popular interface for various communities.

          • Rahul says:

            I’m not sure I understand what you are trying to say with the interface comment.

            Are you saying R is the Jack of all trades but not really good at anything?

            • Keith O'Rourke says:

              Rahul:

              Not at all
              1. there is an almost unlimited menu of packages, some fantastic and some not so good, and you freely get to make a choice.
              2. you can’t (currently) make that choice (run code in R packages) from popular software programs like Excel (i.e. using Excel as an interface.)

              “you can get anything you want (1) excepting the most popular interface (2) for various communities”

              Now later in the song, in three part harmony – “Honest officer Oggie, we put that ton of cash into various R foundations and bought the largest service up grader to prevent other companies from preventing everyone from using R”

              OK my guess is it was to do in SAS and IBM (SPSS) and other competitors and I think their competitors business strategists have not gotten very much sleep since the announcement.

              There does not _seem_ to be plans to control R but rather simply enable access to it in the various platforms – why would they want to discourage the ongoing invaluable development work that takes place in the large and growing R community?

    • Andrew says:

      I inherited an Excel forecast and find it cumbersome to move data in and out of R for simple things like tests of stationarity or corellograms. It would be very useful for me at least to be able to run those in Excel itself.

  6. Jim Savage says:

    Hey Bob – nice work. It looks as though it’d make some lives easier.

    Just a nit to pick though on the example given for linear regression on the website: it suggests that putting price on the right hand side is a good idea. I spend half my life fighting people who want to put endogenous variables on the right hand side and give causal interpretation to the coefficient (in this case, the expected change in sales given an exogenous change in price). So it’d be good if the website didn’t encourage this sort of misguided thinking.

    Or perhaps a further add on? Allow an IV estimate where folks can select several columns of truly exogenous cost shifters? That’d make good practice easy.

  7. Andrew says:

    I am in a forecast role and refer to Mr. Nau’s online notes constantly. They are very helpful for a lowly technician implementing the various time series models. I’ve tried regressit in the past (one of my coworkers at a previous job had just finished his MBA at Duke and suggested it).

    Either way I was not able to get it working at the time but after seeing this post will probably try again. It’s sort of cumbersome to move data in and out of R all the time especially for something as straightforward as detecting autocorrelation — something that is exceedingly complex to implement in excel without addins.

  8. Bob Nau says:

    I’m glad to hear the forecasting notes are helpful. Please contact me if you are still unable to run RegressIt. As long as the Windows version of Excel, dated 2007 or later, is used, it ought to be possible to get it working by adjusting security settings or identifying conflicts with other software that can be temporarily shut down.

Leave a Reply