Data technologies: Paul Murrell responds

In response to the comments on his forthcoming book, Paul Murrell writes:

First of all, thanks to Andrew for posting about my [Paul’s] book project.

As some people have noticed, this is (and will remain) a CC-licensed work. Part of the motivation for that was to be able to develop it not entirely behind closed doors, which has the potential benefit of getting feedback on what people like and don’t like BEFORE it goes to the printer.

So the second thanks goes to those who have commented on this blog and provided such feedback. It is all extremely useful!

Unfortunately, one possible disadvantage to developing a book out in the open is that people might not like the in-development version and get put off; a book could die before it has even been born! So I would like to respond to one of the negative comments from a previous post in the hope that my book might still be able to limp its way to the printers after all :)

At issue is the section on “Plain text files” (Section 7.3).

Bob Carpenter wrote:

What I’d have liked to have seen is a reference to the International Components for Unicode (ICU) package, which is the best cross-platform, cross-language Unicode processing tool.

My “defence” on this point is that this section is a discussion of the *storage* of data in a text format. Text processing with R is discussed in the “Data Crunching” chapter (Section 11.8).

R uses the iconv library rather than ICU, which serves the same purpose of providing cross-platform, cross-language Unicode text processing. However, that raises another point: the audience for this book is not going to want or need this level of detail. When I teach this material, I have enough trouble getting people to understand the difference between a plain text format and a binary format, so details about underlying libraries are not even on the radar.

As stated in the preface to the book, the aim is NOT to turn the reader into a programmer. The aim is to raise awareness of what computers can help you to achieve and to get you started with some of these cool technologies.

Given the lowest-common-denominator bar that I have set for the audience of this book, some of the discussions are deliberately simplified. For example, the statements about Unicode that also raised Bob’s ire are broad brush-strokes rather than precision laser beams. There is a temptation to include apologetic footnotes for expert readers, but I doubt such footnotes would entertain the primary audience for the book. Nevertheless, I will look again at this section to make sure that I am not being too misleading.

Anyway, it’s great to get everyone’s comments. Thanks very much again to everyone who has contributed comments!

Paul

p.s. Just one other point I wanted to “rebut”: Sorry Andrew, but I LOVE 4 spaces for indenting, so that is definitely going to stay :)

4 thoughts on “Data technologies: Paul Murrell responds”

  1. "p.s. Just one other point I wanted to 'rebut': Sorry Andrew, but I LOVE 4 spaces for indenting, so that is definitely going to stay :)"

    I can see another holy war starting. I think I'll become an iconoclast, and start three-spacing.

  2. If you are not with us, you are against us. If you are not part of the solution, you are part of the problem. Etc.

  3. I don't see how 2 vs 4 spaces matters for indenting. Choose whichever works for you, as long as you don't do it in Emacs, because Emacs sucks.
