
Plot the data

Bill Harris writes:

One of the early mantras one hears in statistics is “Plot the data.” When I first heard it, it was followed by “by hand”; I suspect that part gets elided these days. Still, the advice is good. It’s often easier to make sense of a list of numbers if you can visualize them.

Most of the time, that takes time we don’t have.

When we get an email or a report with a table of numbers, we know that plotting them means grabbing a piece of graph paper (does your office supply cabinet even stock graph paper anymore?) or opening our favorite spreadsheet, copying the numbers, and drawing a graph. I rarely take the time.

Last week, I got yet another email with a table of numbers showing how something had changed over time. I was curious, so I wrote a short J script (now condensed into a one-line script) to turn the clipboard contents into data and another to plot them.

Voilà! Now I had an easy and quick way to grab and plot data. I tried grabbing data out of an OpenOffice.org Writer document, and it worked, too. Grabbing data out of a Writer table was almost as good; my script lost the shape of the table, but that's easy to fix.
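Bill doesn't show his J script. As a rough illustration of the clipboard-to-data step (not his code), here is a minimal Python sketch that turns pasted text into rows of numbers and keeps the table's shape by splitting on lines first and then on tabs or spaces. The function name `parse_table` and the choice to skip non-numeric cells such as header labels are assumptions made for this example; reading the clipboard itself and drawing the plot are left out.

```python
def parse_table(text: str) -> list[list[float]]:
    """Turn pasted text into rows of floats, one row per line (hypothetical helper).

    Assumes cells are separated by tabs or spaces. Cells that do not parse
    as numbers (labels, headers) are skipped, and lines with no numbers at
    all are dropped, so a header row disappears instead of becoming [].
    Rows may end up with different lengths if the source table is ragged.
    """
    rows = []
    for line in text.splitlines():
        numbers = []
        for cell in line.split():  # split() with no argument splits on any run of whitespace
            try:
                numbers.append(float(cell))
            except ValueError:
                continue  # not a number: skip it
        if numbers:
            rows.append(numbers)
    return rows


# Example: a small table copied from an email, header included.
pasted = "year\tsales\n2006\t12.5\n2007\t14.0\n2008\t13.2\n"
table = parse_table(pasted)   # [[2006.0, 12.5], [2007.0, 14.0], [2008.0, 13.2]]
columns = list(zip(*table))   # transpose: one tuple per column, ready to plot
```

Splitting on lines before splitting on cells is what preserves the row structure; flattening everything into one list of numbers is the kind of shortcut that loses the shape of the table.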

What’s more, when you’ve got it in J, you can also apply various J statistical routines to the data, or you can pass it to R for more advanced statistical processing.

Yet another simple productivity tool, yet another reason to learn J as a tool for thinking and doing, yet another way to make sense with numbers.

As Xiao-Li can tell you, I don't know J, but I do like the idea of quickly making graphs. In R it can take a while. I'm getting better at it, but then again I've been using S and R for almost 20 years. Even so, I always have to spend a lot of time screwing around with the defaults to make things look good.

Bill writes:

Things can be pretty fast and easy in J. I didn’t fix the script to have the graph’s ordinate always start at or at least include the origin, but that’s pretty trivial, as is adding titles, legends, and the like.
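Bill doesn't show that fix either. One common approach, sketched here in Python as an assumption rather than as his J code, is to widen the data range so it always contains zero before handing the limits to whatever plotting routine you use. The name `axis_limits_with_origin` and the `pad` margin are illustrative choices, not part of any plotting library.

```python
from typing import Sequence


def axis_limits_with_origin(values: Sequence[float], pad: float = 0.05) -> tuple[float, float]:
    """Return (low, high) axis limits that cover every value and also zero.

    pad is a margin expressed as a fraction of the data span. It is applied
    only on the side(s) away from zero, so when all the data lie on one side
    of the origin the axis still starts (or ends) exactly at 0.
    """
    if not values:
        raise ValueError("need at least one value")
    low = min(0.0, min(values))
    high = max(0.0, max(values))
    if low == high:  # every value is zero: fall back to an arbitrary unit range
        return (0.0, 1.0)
    margin = pad * (high - low)
    if low < 0:
        low -= margin
    if high > 0:
        high += margin
    return (low, high)


# All-positive data: the axis starts at 0 and ends a little above 14.
print(axis_limits_with_origin([12.5, 14.0, 13.2]))
```

Keeping the origin on the edge of the axis, rather than padding past it, avoids a stray sliver of negative axis for data that can only be positive.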

What's challenging is J's extreme mathematical ability to deal with arrays and functions of arrays. J has compositions of functions (hooks, forks, conjunctions) that are extremely powerful and easy to use once you catch on, but it's like learning a foreign language: until a switch flipped in my brain, it seemed opaque. Similarly, J's rank conjunction lets you create derived functions that operate on arrays in wonderfully interesting ways, making explicit program loops a thing of the past, but catching on to rank can be mind-bending at first. Fortunately, doing what seems reasonable often works. For example, (+/ % #) is the program ("verb") for the arithmetic mean: the sum over a list (+/) divided by (%) the number of items in the list (#). It mentions neither the size of the list nor even the list itself, so you can apply it to columns, rows, diagonals, or any other slice of the data.
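In J, a train of three verbs (f g h) applied to an argument y means (f y) g (h y), which is how +/ % # reads as "sum divided by count." The Python sketch below imitates only that composition rule with a hypothetical `fork` helper. It has nothing like J's rank conjunction, so choosing rows, columns, or the diagonal is done here with explicit comprehensions and a `zip` transpose.

```python
import operator
from typing import Callable


def fork(f: Callable, g: Callable, h: Callable) -> Callable:
    """Mimic a J fork (f g h): applied to y, it computes g(f(y), h(y))."""
    return lambda y: g(f(y), h(y))


# Analogue of J's (+/ % #): sum divided by count. Like the J verb,
# this definition never mentions the data it will be applied to.
mean = fork(sum, operator.truediv, len)

table = [[1, 2, 3],
         [4, 5, 6]]

row_means = [mean(row) for row in table]           # [2.0, 5.0]
column_means = [mean(col) for col in zip(*table)]  # [2.5, 3.5, 4.5]
diagonal_mean = mean([table[i][i] for i in range(len(table))])  # (1 + 5) / 2 = 3.0
```

The same `mean` works on any slice you hand it, which is the point Bill makes about the J verb; what J adds, and this sketch lacks, is a way to say which slices to use without writing the loop.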

This reminds me a bit of APL (and I don't mean that in a good way), but maybe some sort of menu-based version can be set up. I used to laugh at menus, but now I'm thinking this is the way to go. It would also be good to have user-modifiable graphs that then get converted to a script, so that the results can be easily saved and replicated.

12 Comments

  1. Alex says:

    I like using Gretl for this kind of graphical exploration. It has an easy GUI, can import data nicely, and lets you save the entire session (analysis, graphs, data, etc.) as one file. Oh, and the freeness.

  2. Anonymous says:

    Quickly making graphs in R is generally pretty easy. Getting them fancified to the way you want them for publication is another matter.

    The longest step for me is usually outside of R. How to get the data into a nice tabular format? If only people would send data in the form of pre-made sqlite databases ;-)

  3. Alex says:

    Oh, one thing I forgot: you can pass all of your data with time series properties intact to R with one click.

  4. Isabel says:

    For what it's worth, as a mathematician, I often do computations by hand that I could automate, because working through them helps me get a better feeling for what's going on.

  5. Kieran says:

    Isn't J descended from APL?

  6. derek says:

    I find most text tables are pretty straightforward to cut and paste into Excel with a little text-to-columns processing at the most. The exception is bloody Adobe Acrobat!

    What I'd really like is a little utility that can take the tables you find in PDF documents (extremely common, especially from the US government) and get them into Excel or a comma-delimited format. Maybe it could take the form of a grid instead of the simple rectangle of the Adobe select tool: you could move the grid boundaries with the mouse until they matched the table, then cut and paste.

    I emailed Foxit, the makers of a free, less bloated alternative to Adobe's PDF reader, and they said it was a good idea, but not in their development plans at this time.

  7. Tal Galili says:

    Sounds good, I'd love to know if such a "drag and plot" tool could be made.

  8. NU says:

    J should remind you of APL: as Kieran notes, J is a version of APL, with some additional functional programming extensions and an ASCII syntax instead of APL's symbolic syntax.

    On this topic, it would be useful for someone to write a "Plot in R" system service for Mac OS X users: just highlight text in any application, hit a key combo, and it would launch R and have it plot the highlighted data. OS X services can be great timesavers.

  9. Bill Harris says:

    "This reminds me a bit of APL (and I don't mean that in a good way) …" You got a chuckle here. Yes, as a few have noted, J should remind you of APL, for Ken Iverson developed both of them. I think it can be a good thing if you use J as a notation for thinking that just happens to be executable; I agree that J can be arcane if you just treat it as a programming language. http://www.cs.ualberta.ca/~smillie/Jpage/jtsp.pdf gives one person's introduction to J using statistical examples.

    All that being said, I don't miss having to write (explicit) loops in programs anymore.

    I just bought a copy of your Data Analysis Using Regression … book, and I'm working through it now. I had two goals in mind: to learn more statistics and to learn a bit of R. So far, I'm enjoying it quite a bit, and it's helping me meet my objectives nicely. I do miss J, so I'm redoing some of the examples in J, as well, which is helping me polish my J skills.

    Thanks for the pointer to gretl. I'll have to check it out. I've installed two Firefox add-ons that might be of use to some. Table2Clipboard (https://addons.mozilla.org/en-US/firefox/addon/1852) makes it easier to capture entire HTML tables, and Data Analytics (https://addons.mozilla.org/en-US/firefox/addon/2010) lets you graph data off a Web page. I've not used either extensively, though.

  10. Anonymous says:

    Using ThisService (http://wafflesoftware.net/thisservice/), you can easily create a Mac OS X service from a script written in Perl, Ruby, Python, or AppleScript. For example, the AppleScript below reads a table from the selection and plots it. The difficult part is getting R to do the right thing when reading and plotting.

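    -- ThisService handler: receives the selected text and plots it in R.app.
    on process(selectedText)
        -- Build the R commands; \" puts a literal double quote inside an AppleScript string.
        -- Selected text that itself contains double quotes or backslashes will still break this.
        set CommandLine1 to "tmpclipboard=\"" & selectedText & "\""
        set CommandLine2 to "tmpdata=read.table(textConnection(tmpclipboard))"
        set CommandLine3 to "plot(tmpdata)"
        try
            tell application "R"
                activate
                -- Send each command to the R console in turn.
                with timeout of 60 seconds
                    cmd CommandLine1
                    cmd CommandLine2
                    cmd CommandLine3
                end timeout
            end tell
        end try
    end process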

  11. Tal Galili says:

    Thank you Bill Harris for your useful links.

    Cheers,
    Tal.

  12. ZBicyclist says:

    The tendency to skip over plots and data checks reminds me of this analytical dictum:

    "There's never enough time to do it right, but always enough time to do it over."
