A trick to speed up R matrix calculation

This is another example of why defaults matter a lot.

I got an email from Evan Cooch, forwarded by Matt, saying that there is a trick to speed up R's matrix calculation. He found that replacing the default Rblas.dll shipped with R with one built for your processor can boost R's speed at matrix calculations.

The file is here (Windows only). For Mac and Linux users, see here.

Here are the steps to replace the Rblas.dll file (for Windows users):

1. Check what processor (CPU) your PC or laptop is using (My Computer → Properties). Download Rblas.dll from the corresponding directory under http://cran.r-project.org/bin/windows/contrib/ATLAS/.

2. Go to your R directory to locate the existing Rblas.dll, for example C:/Program Files/R/R-2.7.0/bin. Rename it Rblasold.dll so that if the new Rblas.dll doesn't work, you can restore the old one by renaming it back.

3. Copy the new Rblas.dll you just downloaded into this folder.

4. Restart R!
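Steps 2–4 amount to backing up one file and swapping in another. As a sketch of that pattern in shell commands (the directory and file contents below are throwaway stand-ins, not a real R install; on Windows you would do the renames in Explorer or a cmd prompt against something like C:/Program Files/R/R-2.7.0/bin):

```shell
# Back-up-and-swap pattern from steps 2-3, demonstrated on stand-in files.
R_BIN=$(mktemp -d)                            # stand-in for R's bin directory
echo "default BLAS" > "$R_BIN/Rblas.dll"      # stand-in for the shipped Rblas.dll
echo "ATLAS BLAS"   > "$R_BIN/Rblas.dll.new"  # stand-in for the downloaded build

mv "$R_BIN/Rblas.dll" "$R_BIN/Rblasold.dll"   # step 2: keep the original as a fallback
mv "$R_BIN/Rblas.dll.new" "$R_BIN/Rblas.dll"  # step 3: drop in the new build

cat "$R_BIN/Rblas.dll"                        # now reads "ATLAS BLAS"
# Roll back if the new build misbehaves:
#   mv "$R_BIN/Rblasold.dll" "$R_BIN/Rblas.dll"
rm -r "$R_BIN"                                # clean up the demo directory
```

The restart in step 4 matters because the DLL is loaded once at startup, so the new BLAS only takes effect in a fresh R session.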

Here is an example to test:

X <- matrix(rnorm(1e6), 1000)
print(system.time(for(i in 1:25) X%*%X))
print(system.time(for(i in 1:25) solve(X)))
print(system.time(for(i in 1:10) svd(X)))

Here is a test result on my machine (Intel Pentium M processor, 1.73 GHz, 1 GB RAM).

Default Rblas.dll

> print(system.time(for(i in 1:25) X%*%X))
   user  system elapsed 
 114.19    0.38  121.04 
> print(system.time(for(i in 1:25) solve(X)))
   user  system elapsed 
  87.03    0.28   89.31 
> print(system.time(for(i in 1:10) svd(X)))
   user  system elapsed 
 232.29    1.44  242.64 

New Rblas.dll

> print(system.time(for(i in 1:25) X%*%X))
   user  system elapsed 
  37.18    0.36   37.89 
> print(system.time(for(i in 1:25) solve(X)))
   user  system elapsed 
  30.62    0.56   31.78 
> print(system.time(for(i in 1:10) svd(X)))
   user  system elapsed 
 102.89    2.17  107.17 

Overall, R with the new Rblas.dll is roughly two to three times as fast as with the default one. Now we wonder why this isn't the default design for R.

7 thoughts on “A trick to speed up R matrix calculation”

  1. I was under the impression that the Mac version of R comes with an appropriately compiled Rblas. I know the default Mac R is much, much faster in matrix computation than the default Windows R, for this reason.

  2. Well, use of ATLAS (and Goto BLAS, which cannot be re-distributed) has been clearly documented in the 'R Admin' manual for ages.

    Also, if I may toot our own horn here, autoMAGIC support for ATLAS has been available in Debian for probably half a decade. We build against the 'reference BLAS', which is not tuned. If you simply call 'apt-get install' on the ATLAS libraries, you get a tuned BLAS that will automatically be used in lieu of the untuned reference BLAS. It's a rather nice feature. Also works on Ubuntu the same way.

    That said, it was Brian Ripley himself who said that for most real-world problems, the net gain tends to be not all that dramatic. But it can be, and as it can be had for basically no cost, why not use it?


    (Debian R maintainer)

  3. Does anyone know of a better Linux installation guide for that stuff? The link provided is somewhat minimalistic.

  4. sigh…

    If anyone uses a Mac and can translate the information on that page into something a mere mortal could understand, please post it here. Specifically steps 3 and 4. Also, do you need to load the package each time you use it, or is there a way to make it the default like the Windows guys are doing?

  5. What Dirk said.

    More bluntly: if you are running anything large enough in R that tuning matters, you should not be doing it in Windows, period. It just doesn't make sense to kneecap yourself like that when you're slinging around large data sets.

    Full disclosure: I run a Debian install at home. My work box is Windows, and actually a pretty nice one, but ssh to some central server tends to serve me much better.

    I do wish R were a little more multiprocessor aware, though.

  6. Greetings,

    I run R 2.9.2 and I tried this 'trick'. My machine is a Core2Duo with 3 GB RAM running Windows XP.

    Using the old Rblas (i.e. the one supplied with R), the matrix multiplication part:

    print(system.time(for(i in 1:25) X%*%X))

    took around 46 seconds.

    Using the new ATLAS Rblas, it took 8 seconds. So the speedup is definitely there.

    However, while doing some Kalman filtering and simulation smoothing, there was no speedup at all.


