One of the things Brad Paley talked about the other day was the computer program he used to make a visualization of the text of Alice in Wonderland. (Click on the “Alice in Wonderland” link; it’s really cool.)
My first question when I saw this was, why is the book presented as a circle rather than a line? The circle places the end of the book at the same place as the beginning. There are some reasons this might make sense–after all, Alice wakes up from her dream at the very end of the book, returning to where she was at the start–but, overall, I don’t see the circularity making sense. I asked Brad during his talk, but he did not have time to respond (too many questions were being asked, a problem I’d love to have at my own talks!). He indicated that he did have a good reason, though, so if he lets me know I’ll report it here.
People asked what the point of the TextArc display was (other than looking pretty), and Brad gave a bunch of examples of what the plot showed. In some ways it was similar to some of my statistical research efforts, in that the results were impressive but ended up confirming things that made sense and that, ultimately, we already knew. In my case, my colleagues and I found that American Indians are not randomly distributed in the social network; in Brad’s case, he found that Alice is a central character in Alice in Wonderland, that the words “Mock” and “Turtle” go together, and so forth. (See here for more.)
When pressed further, Brad justified TextArc as a souped-up index. This made a lot of sense to me: his graph gives you a lot of information that’s not in a conventional index and also lets you map straight back to the original text. I agree that it’s silly to criticize the program for what it doesn’t do. It runs automatically and does a lot. I’m also impressed by any program written more than 5 years ago that still works!
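To make the “souped-up index” idea concrete, here is a minimal sketch in Python. It is not Brad’s code and makes no claim about how TextArc works internally; the function names `build_index` and `adjacent_pairs` and the sample sentences are made up for illustration. It builds a plain index from each word to the positions where it appears (so you can map back to the text) and counts which words sit next to each other, which is the sort of thing that would reveal that “Mock” and “Turtle” go together.

```python
import re
from collections import Counter, defaultdict


def build_index(text):
    """Map each lowercased word to the word offsets where it occurs.

    Hypothetical helper for illustration; not part of TextArc.
    """
    words = re.findall(r"[a-z']+", text.lower())
    index = defaultdict(list)
    for position, word in enumerate(words):
        index[word].append(position)
    return words, dict(index)


def adjacent_pairs(words):
    """Count how often each ordered pair of neighboring words appears."""
    return Counter(zip(words, words[1:]))


# Made-up sample text, not a quotation from the book.
sample = (
    "The Mock Turtle sighed deeply. Alice looked at the Mock Turtle. "
    "The Gryphon asked the Mock Turtle to go on."
)

words, index = build_index(sample)
pairs = adjacent_pairs(words)

# Every occurrence of a word can be traced back to its place in the text.
print("mock appears at word offsets:", index["mock"])

# Pairs that show up more than once hint at words that "go together".
for pair, count in pairs.most_common(3):
    print(pair, count)
```

Even this toy version does something a back-of-the-book index does not: it covers every word rather than a hand-picked list, and it records neighbors as well as locations.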
Anyway, one of Brad’s remarks about using this tool to understand text made me think that there are two kinds of books:
1. Books that you want to read straight through, from beginning to end.
2. Books that you use for reference, flipping through and looking for what you need.
The horrible thing is that I write all my books as if they will be read from beginning to end, but I’m pretty sure most people use them as reference books. For most people, even most statisticians, reading Bayesian Data Analysis from beginning to end would be like me reading the instruction manual for my washing machine cover to cover. I pick up the instruction manual when I need it, and then I look for what I need.
Anyway, I thought this might be relevant to TextArc and similar projects. Maybe Alice in Wonderland is not the best example; it might make more sense to use TextArc for a book such as Bayesian Data Analysis that has a sequence but is primarily used for reference. (I went to the TextArc site but couldn’t find the program; at least, I saw no easy way to feed in a book and have it produce the TextArc picture.)