While Andrew is trying to get someone to make a t-shirt design “Gone fishing”, someone else thinks fishing is one of the “big data trends in 2015”. This advertisement by some company keeps reappearing in my Twitter feed.

Posted by Aki Vehtari on 16 May 2015, 3:41 pm

Does Exploratory Fishing smell fishy?

If JSM isn’t particularly engaging, it might be fun to drop by this “mysterious” company’s booth and call them out on their BS ads.

Sure, the fishing analogy has negative connotations. But I’m not sure if I know enough about this company’s tools or philosophy to understand why they need to be “called out”.

Does the practice of storing *all* data for potential future analyses sound like a good idea? Yes. Given the volume of that data, will it require some very modern database and sophisticated computational algorithms? Certainly.

And as a staunch Bayesian, I am frustrated by a common practice in academia to seemingly ignore all previous studies and use “default” or “weakly informative” priors instead of moving science forward by building priors on previous study results.

Maybe that’s what they mean by fishing in the big data lake? Not ignoring what has already been measured.
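To make the point about priors concrete, here is a minimal sketch of building a prior from an earlier study instead of using a default one. It assumes a simple normal-normal conjugate model in which each study is summarized by an estimate and a standard error; the function name `normal_posterior` and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt


def normal_posterior(prior_mean, prior_sd, est, se):
    """Conjugate normal update: normal prior combined with a normal
    likelihood summarized by an estimate and its standard error."""
    prior_prec = 1.0 / prior_sd ** 2   # precision = 1 / variance
    data_prec = 1.0 / se ** 2
    post_prec = prior_prec + data_prec
    # Posterior mean is the precision-weighted average of prior and data.
    post_mean = (prior_prec * prior_mean + data_prec * est) / post_prec
    return post_mean, sqrt(1.0 / post_prec)


# Hypothetical summaries (estimate, standard error); not real data.
previous_study = (0.20, 0.10)
new_study = (0.50, 0.20)

# "Default" analysis: a very wide prior that effectively ignores earlier work.
weak = normal_posterior(0.0, 10.0, *new_study)

# Informed analysis: the earlier study's estimate and SE serve as the prior.
informed = normal_posterior(*previous_study, *new_study)

print("weak prior:     mean=%.3f sd=%.3f" % weak)
print("informed prior: mean=%.3f sd=%.3f" % informed)
```

With the wide prior the posterior essentially reproduces the new study alone; with the informed prior the estimate is pulled toward the earlier result and the posterior standard deviation shrinks. In practice one would usually discount the earlier study somewhat (for example by inflating its standard error) rather than take it at face value.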

I think of fishing as ugh only when it is disguised in the garb of a confirmatory study.

As a means for hypothesis generation alone what is wrong with fishing? Quite some science starts out as “fishing”.

“I am frustrated by a common practice in academia to seemingly ignore all previous studies and use “default” or “weakly informative” priors instead of moving science forward by building priors on previous study results.”

+1

I think people arguing for Bayes over frequentist methods should spend more time on this kind of demonstration than on demolishing frequentist methods. Maybe it’s already been done but I don’t know it. I’ve tried it in one paper; I did it in a second one but I was afraid that reviewers would think that I am trying to strengthen my (already strong) case by making my effects look even bigger. So I removed it.

Bayesian work on clinical trials are a good place to look for priors that incorporate info from previous studies.

I should really learn to grammar…

p.s. If I have googled onto the right folks, they do in fact refer to this Gartner release in their marketing materials:

Gartner Says Beware of the Data Lake Fallacy

http://www.gartner.com/newsroom/id/2809117

And their software. Urgh. Trying to get it to work makes you want to flip a table over.

I know what you mean; I had the hardest time getting it to do what I wanted. But once you realize that you are actually manipulating what is a simplified OLAP cube, the concepts immediately become clear.

Their software was built around relational datastores (NoSQL wasn’t popular when they started in 2003), so the concepts map very well to relational theory.

I don’t mind their software that much — it’s a great EDA and viz tool. But the “fishing” thing is definitely a blunder on the part of their marketing team.

I was fishing in a data lake years ago, and out popped a major art project :)

http://www.translatingnature.org/thelake

The URL lost its hyphen!

http://www.translatingnature.org/the-lake/

I like this cartoon, http://pages.stat.wisc.edu/~wahba/spiegelhalter.science2014.pdf, since it suggests that not only can you go fishing but you can even help guide the hook into the fish’s mouth (as discussed often on this blog and, for example, in Simmons et al. (2011), http://www.haas.berkeley.edu/groups/online_marketing/facultyCV/papers/nelson_false-positive.pdf).
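For anyone who wants to see how quickly fishing pays off, here is a rough simulation sketch. It is not taken from the linked papers. It assumes there is no true effect, that each study measures several independent outcomes, and that each outcome's test statistic is standard normal under the null. The function name `any_significant_rate` and the settings are hypothetical.

```python
import random
from statistics import NormalDist


def any_significant_rate(n_outcomes, n_studies=20000, alpha=0.05, seed=1):
    """Fraction of null studies in which at least one of n_outcomes
    independent two-sided z-tests comes out 'significant' at level alpha."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    hits = 0
    for _ in range(n_studies):
        # Under the null each z-statistic is a standard normal draw.
        if any(abs(rng.gauss(0.0, 1.0)) > crit for _ in range(n_outcomes)):
            hits += 1
    return hits / n_studies


for k in (1, 5, 10):
    simulated = any_significant_rate(k)
    expected = 1 - 0.95 ** k   # theoretical rate for independent tests
    print(f"{k:2d} outcomes: simulated {simulated:.3f}, theory {expected:.3f}")
```

Under these assumptions the chance of finding at least one "significant" result is 1 - 0.95^k, so with ten outcomes to choose from roughly four null studies in ten produce something reportable. Real outcomes are typically correlated, which changes the exact rate but not the direction of the problem.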