Computer science (CS) is not a social science; it’s an engineering discipline. The confusion arises because there’s also a related discipline, software engineering (SE). Yet SE doesn’t really cover every aspect of software construction. For example, programming-language disputes over functional vs. object-oriented styles and strong vs. weak typing don’t happen in SE; they’re found in language-design circles (firmly part of CS, not SE). Such disputes can be resolved by gathering metrics (which a software engineer does). These are not ‘religious disputes’. In contrast, within climate science, disputes over models are disputes over the scientific method. These appear religious in nature because careers are at stake. Dare I suggest that careers are probably at stake in those ‘religious’ disputes you see in computer science?

]]>Again, there were three examples of decisions I was referring to, which I said should not be made based on a tail-area probability defined with respect to a straw-man null hypothesis of no interest. These three examples were jury decision, college acceptance, and drug approval. You discussed one of these decisions—drug approval—and in my comment just above I explained why I don’t think these decisions should be based on tail-area probabilities, while also saying why I understand that tail-area probabilities, if they are the only tools available, can be used to hack together a workable solution in particular problems. Again, I think p-values (or, for that matter, Bayes factors) are the wrong tool, and null hypothesis significance testing the wrong framework, for making these decisions, but that they will be adapted, for better or worse, to problems at hand, if these are the only tools and frameworks readily available. Much of my career has been spent in an effort to make other, more flexible, tools and frameworks available in more settings.

]]>Even this suffices to deny that the results of statistical significance testing are useless for scientific or practical purposes. That the testing result is built on to create other claims doesn’t mean the latter didn’t depend on the former, or that the latter goal alone should matter. Finding the genuine risk increase–with design-based probabilities–even restricted to the experimental population, & without estimating a distribution of effect sizes, is of value. It’s this type of information that gives an indication of what to do next to learn more.

There are also uses of simple significance tests that don’t seem to go on to give estimates, e.g., discovering that an assumption of a fixed mean or variance is rejected. Would you describe your uses of p-values in testing assumptions as estimations? ]]>

There were three examples of decisions I was referring to, which I said should not be made based on a tail-area probability defined with respect to a straw-man null hypothesis of no interest. These three examples were jury decision, college acceptance, and drug approval. You discussed one of these decisions—drug approval—so I will respond to you on that one. Your discussion refers to an example of a study showing hormone replacement therapy to have increased risks, rather than benefits, for the target population. I strongly agree with you that it’s a good idea to use data from controlled trials, where available, to inform decisions. But I don’t see what this has to do with p-values. I think this decision could be made much more directly in terms of expected outcomes.

I do accept that a tool, when available, can be used for different purposes. In particular, I recognize that null hypothesis significance testing can in some settings be used for parameter estimation, and that uncertainty statements obtained from null hypothesis significance testing can be treated as probabilities and used in decision analysis. So, yes, the tool can work to solve real problems. But, to me, the tool works not because of the hypothesis testing but because it is used to create estimates and uncertainties.

It’s as if someone needed a hammer and a chisel, and was using a couple of screwdrivers to serve both these functions. This can work (in problems where you don’t need a *good* hammer or a *good* chisel), but I wouldn’t take this as evidence that we need screwdrivers. This is not the best analogy because screwdrivers are useful in their own way, but maybe it will give some sense of my perspective here.

P.S. As always, I hope that somebody is still reading this deep into the comment thread.

]]>The point null is mostly advocated by Bayes Factor advocates. One-sided tests are generally preferred (or 2 one-sided tests).

While I advocate interpreting test results so as to indicate discrepancies (from a test hypothesis) that are well and poorly indicated, the simple significance test has its uses. You yourself use p-values to test models, do you not? Those test hypotheses must be of interest to you, enough to seek a revised model based on low p-values. Failures of replication are based on these simple tests, as are arguments about model violation and data too good to be true. The fact that biasing selection effects, multiple testing, data dredging and the like invalidate p-values is actually a major reason that the same tools serve in fraudbusting & uncovering QRPs. The fact that they’re a small part of a full error-statistical methodology isn’t a good reason to banish the term or the tests.

]]>“Uniformity in statistical rules and processes makes it easier to compare like with like and avoid having some associations and effects be more privileged than others in unwarranted ways. Without clear rules for the analyses, science and policy may rely less on data and evidence and more on subjective opinions and interpretations.”

I agree that “uniformity in statistical rules and processes makes it easier to compare like with like and avoid having some associations and effects be more privileged than others in unwarranted ways” — especially the “makes it easier” (if the rules indeed fit) — but I just don’t see how it would be possible to have “one size fits all” rules.

“Without clear rules for the analyses, science and policy may rely less on data and evidence and more on subjective opinions and interpretations.” This view throws the baby out with the bath. What are needed are clear explanations/justifications of why the methods chosen are the best for the particular situation, combined with critical reading of these explanations/justifications. Rarely do papers include such explanations/justifications; instead, methods are often chosen just because “that’s the way we’ve always done it.”

]]>I think of it as wanting to know about a population from a sample.

Justin

http://www.statisticool.com

If the answer is yes, can you explain why there should be a whole bunch of 0.0 values to 64-bit precision? (The smallest normal 64-bit machine float bigger than 0 is about 2.2×10^-308.) Magic Jackson is amazing: he’ll give you all that, because he calculates your error quantity in 4096-bit ultra-high-precision floats, so he’s not limited by machine epsilon…
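A quick standard-library check of those float limits (the “Magic Jackson” oracle setup comes from the comment above; the numbers below are just the IEEE 754 double-precision constants):

```python
import sys

# Smallest positive *normal* 64-bit float: about 2.2e-308.
smallest_normal = sys.float_info.min
print(smallest_normal)            # 2.2250738585072014e-308

# Subnormals reach further down still, to about 5e-324.
smallest_subnormal = 5e-324
print(smallest_subnormal > 0.0)   # True: even this is not 0.0

# So a fitted coefficient that prints as exactly 0.0 must be smaller in
# magnitude than ~5e-324 -- an extraordinarily strong claim about a
# continuous estimate.
```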

]]>grab some data from the UK Biobank and show that your method significantly outperforms traditional GWAS and is not computationally too expensive

Can you clarify this:

1) Pick an exact dataset you would want this done on

2) How is performance measured?

3) What does “significantly outperform” mean?

4) Can you link to some examples of traditional GWAS performance?

5) What is “computationally too expensive”?

now, you tell him your model involving all 650,000 SNPs, and he writes down all the coefficients. Do you think even *one* of those 650,000 coefficients will be 0.0 ?

]]>I’m interested in using polygenic scores for prediction

If you were really only interested in this you would be using ML methods and the entire concern would be out-of-sample (not cross validated, but real unseen data) predictive skill. The method used is irrelevant if that is all you care about.

I will easily beat any sort of statistical significance filter on that. Probably could do it in a couple hours if the data is clean.

]]>you took my digression defending Anoneuoid’s idea as if it were my own actual idea.

Well, you said it was “probably a good idea” and the genetic architecture you use when discussing your preferred method is again highly unrealistic. If there are large effects, they have already been found, so they’re not of interest in GWAS. What is of interest is finding numerous small effects. The objective of GWAS is to find all true effects, i.e. those that replicate in independent samples. You know that you’ve found all of them when the variance that they collectively explain equals the narrow-sense heritability of the trait. The point, for me at least, is not to find large or “biologically interesting” effects but rather to statistically account for full heritability in molecular genetic terms.

I don’t quite grasp your idea, but I would suggest you grab some data from the UK Biobank and show that your method significantly outperforms traditional GWAS and is not computationally too expensive. The resulting paper would be your most cited ever ;) (There may be some similar methods circulating about in genetics already, but they’re not popular.)

]]>I think it’s ridiculous to make decisions about drug development and approval based on tail-area probabilities relative to straw-man null hypotheses. It’s just nuts. Given that the system exists, I can believe that people will adapt to it as best they can. But, stepping back a bit, the whole thing is ridiculous to me. I think a decision-analytic approach would make much more sense. Don Berry and others have written a lot about this.

]]>As someone who is involved in bringing new pharmaceutical products to the market, he (somewhat unsurprisingly) finds that hypothesis tests invoke “too much conservatism” and favours 75% intervals instead.

]]>I don’t think any of these decisions should be made based on a tail-area probability defined with respect to a straw-man null hypothesis of no interest. For more on this point, see section 4.4 of this paper.

]]>1) Download/install: https://addons.mozilla.org/en-US/firefox/addon/redirector/

2) Create new redirects:

Description: scihub

Example URL: https://doi.org/10.1086/288135

Include pattern: https://doi.org/*

Redirect to: https://sci-hub.tw/$1

Pattern type: Wildcard

Description: scihub2

Example URL: https://dx.doi.org/10.1086/288135

Include pattern: https://dx.doi.org/*

Redirect to: https://sci-hub.tw/$1

Pattern type: Wildcard
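For what it’s worth, the Wildcard mode above amounts to something like this (a hypothetical re-implementation for illustration, not the extension’s actual code; it handles a single `*` capture):

```python
import re

def wildcard_redirect(url, include_pattern, redirect_to):
    """Sketch of Redirector's Wildcard mode: '*' captures a segment of
    the URL, and '$1' in the target is replaced by that capture."""
    # Escape regex metacharacters, then turn the literal '*' into a group.
    regex = re.escape(include_pattern).replace(r"\*", "(.*)")
    m = re.fullmatch(regex, url)
    if not m:
        return None          # pattern doesn't apply to this URL
    return redirect_to.replace("$1", m.group(1))

print(wildcard_redirect("https://doi.org/10.1086/288135",
                        "https://doi.org/*",
                        "https://sci-hub.tw/$1"))
# -> https://sci-hub.tw/10.1086/288135
```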

Universal one-click access is much better than institutional log-in. The service saves dozens to hundreds of hours per year.

]]>You are correct that CS has a lot of math as well as a lot of conventions, human concerns, etc. And in those non-math, subjective areas, there are a lot of debates (up to flamewars). But, these flamewars don’t really hurt anybody but the engineers themselves. You can choose the least popular technology and spend an absurd amount of time to develop your product with it. But, thanks to the math part of CS, we can all analyze the product, conclude that it works and use it happily. There are no more flamewars here.

When there’s a flamewar in statistics, it’s about what is a justified inference, and we’re all affected, not just the statisticians or the scientists. It’s a philosophical (and sometimes PR) crisis.

]]>They recover just about all of the “chip heritability”

Without going into what exactly “chip heritability” means, you say “just about all”.

Sounds like from the start you admit the remaining genes still have room to contribute then. No one said these correlations were not negligible… it is just that by raising the sample size you allow more and more negligible effects to pass into “significance” at the same threshold. With large enough sample size (or lax enough threshold) all the genes will eventually be included.
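A back-of-envelope sketch of that sample-size effect, assuming a simple correlation test with the usual normal approximation (the r values here are arbitrary illustrations):

```python
from statistics import NormalDist

# Two-sided z threshold corresponding to the genome-wide cutoff p = 5e-8.
z_star = NormalDist().inv_cdf(1 - 5e-8 / 2)   # about 5.45

# For a correlation of size r the test statistic grows like z = r*sqrt(n),
# so the n needed to clear the threshold is roughly (z_star / r)^2:
# any fixed nonzero r, however negligible, becomes "significant" once n
# is large enough.
for r in [0.05, 0.01, 0.001]:
    print(f"r = {r}: n ~ {int((z_star / r) ** 2):,}")
```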

So no, these results do not at all contradict “the omnigenic model” as you call it (to me it is just “everything correlates with everything else”).

By skimming the paper I can see other problems with what you are saying as well (all these “effects” assume the model they use is correct; change the model and you’ll get different coefficients).

]]>There are plenty of ways to do investigation of scientific questions through abstraction other than reductionist molecular mechanism investigation, but if you literally don’t care about mechanism at all and just want a predictive tool for decision making… this is pure engineering, which is a valuable thing to do, it’s just not aimed at the same target.

]]>That’s hardly fair; there’s a level of abstraction between mere statistical association/correlation/prediction and mechanistic modelling. It’s causal inference (whether done with Neyman-Rubin potential outcome or Pearl-style causal graphs) and it is useful for legitimate scientific investigations.

]]>then you’re not doing science. problem solved!

Look, there’s lots of good stuff to do in engineering, which is what you’re talking about, coming up with a way to arrange certain devices so you can get an economically useful outcome, like maybe a rule for who should get alcoholism counseling at age 18 or which people should get annual mammograms or whatever… but don’t confuse that with science, which is by definition trying to figure out the mechanism behind how things work.

> only a fool will think that they will be able to build a mechanistic model of that

Hell, a golf ball consists of 10^23 individual molecules each flying through 10^26 different air molecules, and each of these contributes equally and symmetrically to the outcome; no fool would touch that stuff.

]]>Just doing my part to add noise to search engine results… not having a lot of luck with links lately.

]]>My own actual idea, how to replace your GWAS with something else, starts at “But that aside,…” and describes how to fit a Bayesian model of the kind of interaction you are explicitly asking for (a linear sum) where you have some idea about the number of “true causal effects” (your 10,000 effects number) compared to all the effects…. using a sparse horseshoe prior to express your knowledge about the order of magnitude of the number of “nonzero” coefficients in that model.

]]>I don’t have access to this paper

Ever heard of Sci-hub, old man?

The omnigenic model claims that every gene in the genome affects every trait. Lello et al. show that you can recover the SNP heritability of height using just 3% of the available SNPs. That contradicts the omnigenic model.

not if you care about mechanism

I don’t care about mechanisms at all. If 20,000 variants affect a trait, only a fool will think that they will be able to build a mechanistic model of that. As I said in the previous thread, GWAS is about creating polygenic scores to be used for predictions and interventions.

]]>there are 112 specific combinations of 35 different polymorphisms that will substantially increase the risk of alcohol addiction

The point of the additive model is that there are no “specific combinations” of any variants that are particularly important. Rather, the presence or absence of a given variant is what matters. This is why genetic relatedness is linearly related to phenotypic resemblance. How would your method perform compared to GWAS if the true causal model is, say, 10,000 small additive effects, using, say, 1,000,000 variable loci in 1,000,000 people?
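For what it’s worth, a scaled-down toy simulation of exactly that additive setup (500 loci and 2,000 people instead of a million of each; all sizes and effect scales are made up) looks like this:

```python
import random

random.seed(1)

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Purely additive architecture: many loci, each with a tiny effect.
n_people, n_loci = 2000, 500
effects = [random.gauss(0, 0.05) for _ in range(n_loci)]
genotypes = [[random.randint(0, 2) for _ in range(n_loci)]
             for _ in range(n_people)]

# Phenotype = sum of per-locus effects + environmental noise; no
# "specific combinations" matter, only each variant's dosage.
scores = [sum(g * b for g, b in zip(row, effects)) for row in genotypes]
phenos = [s + random.gauss(0, 1.0) for s in scores]

# Under additivity the true polygenic score relates linearly to the
# phenotype, even though no single locus matters much on its own.
print(round(corr(scores, phenos), 2))
```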

]]>also:

Note also that the model sums the SNP effects, indicating that you can forget about gene-gene interactions.

not if you care about mechanism. It’s been known for years and years that prediction from a simple “improper” (ie. equal weighted) linear model is robustly successful at predicting lots of things across the board: https://www.cmu.edu/dietrich/sds/docs/dawes/the-robust-beauty-of-improper-linear-models-in-decision-making.pdf that doesn’t mean criminal recidivism works by criminals summing up their age, sex, race, income, previous criminal history, and relationship status and figuring out if it exceeds a threshold that they should go out and mug someone.
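A toy illustration of the Dawes point, with three made-up standardized predictors: the equal-weighted (“improper”) score predicts almost as well as the score built with the true weights.

```python
import random

random.seed(0)

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

n = 5000
xs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
true_w = [1.0, 0.8, 0.6]                       # the "proper" weights
y = [sum(w * v for w, v in zip(true_w, row)) + random.gauss(0, 1)
     for row in xs]

proper = [sum(w * v for w, v in zip(true_w, row)) for row in xs]
improper = [sum(row) for row in xs]            # equal weights, Dawes-style

# The two predictive correlations come out nearly identical.
print(round(corr(proper, y), 2), round(corr(improper, y), 2))
```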

]]>And clearly you didn’t understand what I told you, because that’s exactly what you got.

]]>Statistics, to some extent, seems to be going in the opposite direction. In the statistics realm, advice is given to analyze data with nothing more than “simple estimates and standard errors”. Innovation is squashed: in other fields people have vast leeway to create new algorithms, yet in statistics such freedom to innovate is described as “researcher degrees of freedom” and dismissed as little more than an opportunity for scientists to make up whatever answers they want and effectively cheat. There are greater and greater pressures for statistical analyses to be mainly prescriptive, with preregistration of everything and with analyses conducted following predefined recipes using simple approaches, again as a way to limit researcher freedom in the hope of producing findings that are “objective” and credible.

Of course it’s exciting that such “mathematical analysis of data” (which was once, to some, the very definition of statistics) is growing and growing in popularity. It’s just that I find, in my work, that young people who are eager to enter such a field are more and more interested in the above mentioned comp sci-based degree programs, and less and less interested in anything labeled as “Statistics”.

]]>Perhaps it’s just that despite frequent claims to the contrary, NHST works just fine if it’s used in a scrupulous manner. Finding genetic main effect seems to be one of those problems where NHST works well.

]]>When I speak of p-values in the context of GWAS, I’m of course referring to the use of a particular p-value-related decision rule, that is, the significance threshold of 5 * 10^-8.

People use p-values for other purposes (eg, ranking) so there is no way to know this.

the omnigenic model is nonsense;

Did you explain this in the previous thread?

I genuinely want someone to describe a realistic alternative to current practices, not just some handwaving about neural networks.

What is unrealistic about it? What I was thinking of is very simple to do.

I think you just want people to suggest another statistic to compare to an arbitrary threshold to tell you which correlations are “real” or not. It really doesn’t matter what you do then, there is no “right way” to do that.

]]>First off, I gotta say, the neural network approach here is actually probably a good idea, because even if you don’t think “all genes are involved” it seems entirely likely something like: “there are 112 specific combinations of 35 different polymorphisms that will substantially increase the risk of alcohol addiction” and you have *zero* chance of understanding the 35000-dimensional space well enough to pull out these 112 combinations from your data using any kind of manual specification. You might get away with organizing your data using principal component analysis… but I digress. I’m just saying that Anoneuoid’s “handwaving about neural networks” isn’t handwaving, it’s good intuition that things are likely to be high dimensional and impossible to make progress on by excessive reduction, and impossible to find combinations by hand, and obviously he hasn’t sat down to do this work for other people because, hey, they should be doing it themselves.

But that aside, perhaps there are a few SNPs that are very often involved in each of the 112 combinations… Like out of the 112 combinations 85 of them include the SNP named alr1 or something (I’m making this up, alr1 = alcohol risk 1 so if there’s something called that already ignore it.) So you’re interested in a small number of SNPs that by themselves substantially add risk.

So, you run a logistic regression: alcadd ~ inverse_logit(f(snps)+c) and you need to specify the function f(snps). You start with a linear combination a[i] * snp[i] where snp[i] is either 0 if the person doesn’t have it and 1 if they do. Now you need to specify a prior for the a[i]… Your basic premise is you’re interested in sparse solutions, so you go read Aki’s paper on the modified horseshoe prior: https://arxiv.org/pdf/1707.01694.pdf and you express a prior over the sparsity (ie. that there are probably only around N SNPs involved, which sets the shape of the horseshoe). Run your regression, and come up with coefficients, now rank your coefficients in descending order and investigate the biological processes that each one is involved in…
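The fitting itself needs MCMC (e.g. Stan, as in the linked paper), but the sparsity story the horseshoe prior tells can be sketched with the standard library: each coefficient is a[i] = z[i] * lambda[i] * tau, with z standard normal, lambda[i] half-Cauchy, and a small global tau encoding “probably only around N SNPs matter”. This is the plain horseshoe rather than the regularized version from the paper, and the scales below are made up:

```python
import math
import random

random.seed(0)

def half_cauchy(scale):
    # Inverse-CDF draw from a half-Cauchy(0, scale) distribution.
    return scale * abs(math.tan(math.pi * (random.random() - 0.5)))

P = 10_000
tau = 0.01        # small global scale: prior belief that few SNPs matter

# Plain horseshoe draw for each coefficient: a[i] = z[i] * lambda[i] * tau.
draws = [random.gauss(0, 1) * half_cauchy(1.0) * tau for _ in range(P)]

near_zero = sum(abs(a) < 0.01 for a in draws)   # shrunk essentially to zero
large = sum(abs(a) > 0.1 for a in draws)        # escaped via the heavy tail
print(near_zero, large)
```

The half-Cauchy’s heavy tail is the point: most coefficients are shrunk hard toward zero, while a handful of local scales blow up and let their coefficients stay large, which matches the prior belief that only a small number of SNPs carry substantial risk.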

]]>