I misremembered! The thing I was thinking of was about Carl Morris…

]]>As discussed earlier in this thread, academic pricing for the book (PDF version) and for the software has been posted.

]]>Here is a link to a brief, conceptual overview of the ODA paradigm.

https://odajournal.com/2017/04/18/what-is-optimal-data-analysis/

]]>Well, none are online; if available, they are in the ODA article and/or the original article.

Sorry to say, but important IMHO: valid evaluations of what is X or Y for me, or comparisons of any state of X for you versus for me, require more information than is available to you.

]]>Roger, Godspeed. Open invitation…

]]>Alas, I have some ideas of my own to elaborate, and I’m far too slow at that, so thanks for the offer, but no.

]]>Your prior comments were good examples of flippant disregard.

Here you are attacking something without having the required knowledge or experience in what you are talking about, and making incorrect assumptions that lead to an incorrect conclusion. These are two of the main problems cited as enemies of science. This would be very easy to demonstrate to laymen, for example using a football analogy.

THIS thread IS worth a TED talk. The figures would be tables: on the left the ignorant attack, on the right the truth. The subsections would be misbehavior categories. This thread is an automatic data machine, and properly disseminated may actually make a difference! Keep it coming!

Youngsters–what kind of scientist do you want to be: educated and open-minded, or ignorant and closed-minded? Why do you youngsters think this is happening?

Selecting people you believe are qualified reviewers may increase the quality of your reviews, and ultimately of your paper. If a qualified reviewer with good intentions disagrees with your paper, then listen! My first cross-cultural paper was reviewed by Harry Triandis, the great man in the field. He was very instructive, told me what books to read. He reviewed the revised paper, now a confirmatory study, and published it without revision. I and my colleagues learned a lot, and went on to publish lots of great cross-cultural research.

The point is not to get an easy review, but to get a professional, competent review. People who hate and fear new methods, and know nothing about them but diss them anyway, are an impediment to progress… Washington isn’t the only swamp making life irrational…

]]>This material is used to inform a panel of laymen (approved/picked by lawyers for the legacy company/statisticians) who make up the jury.

Defense lawyers, as usual in such cases, parade their experts, who pontificate their formulas and espouse their self-ratings, and of course, lie.

BUT, the jurors were not fooled. Really simple, crystal clear, completely obvious examples revealed the lies and harmful malpractice of the statisticians. When jurors *understand* they can cross-generalize.

Among the top causes of mortality in the US is taking a prescribed medication. ALL safety analysis is *mandated* to be conducted by regression models. More and more really simple demonstrations are coming on-line that demonstrate that regression models are not at all accurate (here is a little article on regression, logistic is no better–there are many examples in indexed journals as well as in ODA journals, type logistic in the search box on the journal home page).

Every late-night TV program is funded in part by teams of lawyers looking for people to sue companies that produce dangerous drugs, and the statisticians who gave the drugs the green light.

What could go wrong?

]]>That is even easier for you if the dataset is online. Just give a link to the dataset and paper containing the results…

]]>Dear Mr. Anonymous,

I reject your counter-offer. I hope what I offered you initially is clear this time.

a) You can select from many data sets already analyzed

I have already published many, many data sets in the ODA journal, so that they can be re-analyzed. These are free to anyone in the Universe. If you want one of mine, please select and use it–the ODA analysis is already published.

If you want me to select, use the article comparing scores on MMPI taxons for many different samples. Or, the data on inter-rater reliability of plant health. I recall that I found the results of the analyses interesting.

b) When I donate time to do work that I do to make a living, I prefer to use new data sets that I haven’t already used and made available to everyone.

If you want this to happen, you can send me a new data set, and your analysis (intro, methods, results).

c) Please contact me via RG; this forum is inefficient. I can’t send you an RG message–you have no real name.

Dear Silent Youngsters:

Some of the posts on this public thread are empirical evidence for what John Tierney writes is “The Real War on Science”.

However, there are additional biases and tactics exposed in this thread that John didn’t write about yet. Can you detect and name them?

These little exchanges are qualitative data that are easily content-coded into ordinal scales. The reliability paper that I mention above discusses how to assess inter-rater reliability of the codings.
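For assessing agreement between coders, Cohen's kappa is one standard chance-corrected index; the sketch below is generic and mine (the reliability paper mentioned above may use a different statistic):

```python
def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters who assigned
    categorical codes to the same items."""
    n = len(rater1)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal rates.
    categories = set(rater1) | set(rater2)
    expected = sum((rater1.count(c) / n) * (rater2.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

# Perfect agreement gives kappa = 1; agreement at chance level gives 0.
print(cohens_kappa([2, 2, 1, 0], [2, 2, 1, 0]))  # → 1.0
```

Kappa of 1 means perfect agreement; 0 means no better than chance given each rater's marginal coding rates.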

This is extremely important. Clearly, being a reviewer for an ODA paper implies the reviewer should know what ODA is, or at least find out a little bit. Yet apparently many vocal hot-shot legacy statisticians dismiss new work without knowing anything about it!

So, a roster of all people who published papers using optimal (and legacy) methods–who are thus potentially proven-qualified reviewers, and who wish to be on a list of prospective reviewers, will soon be one click away from every editor on the planet!

]]>I agree. It’s a tour de force. By itself it could provide the basis for a TED talk.

I like how it shows that once you’re dissatisfied, you’re dissatisfied, period. No degrees. On the other hand, the things that *lead* to dissatisfaction are broken down into subtle subgroups. A distinction is made between “very poor” and “poor and better” waiting times and between “fair or worse” and “good or very good” courtesy. Why those particular breakdowns? There must be wisdom behind them.

Next we come to the math. The percentages (44.2%, 39%, and 16.8%) add up to 100%, which suggests that they represent proportions of the whole, not of the subgroups. There are 285 patients in all (95+41+149). Of these, 41 had “very poor” waiting time. Within that category, 39% of the whole patient group–that is, 111.15 of the 285–reported dissatisfaction. This means that of the 41 patients with “very poor” waiting time, 111.15 were dissatisfied. Mysterious multiplicity! Maybe some of them were bearing twins and triplets while waiting.

Or maybe the percentages are of the subgroups, not of the whole, and it’s just a coincidence that they add up to exactly 100%. In that case, 15.99 of the 41 patients with “very poor” waiting time are dissatisfied, in contrast with 25.032 of the 149 with “poor or better” waiting time and 41.99 of the 95 patients with “fair or worse” courtesy.
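The two readings can be checked in a few lines (subgroup names are my labels for the table discussed above):

```python
# Dissatisfaction table from the comment above: three predictor
# subgroups, overall N = 285, reported percentages 44.2 / 39 / 16.8.
counts = {"courtesy_fair_or_worse": 95,
          "wait_very_poor": 41,
          "wait_poor_or_better": 149}
pcts = {"courtesy_fair_or_worse": 44.2,
        "wait_very_poor": 39.0,
        "wait_poor_or_better": 16.8}
total = sum(counts.values())  # 285

# Reading 1: each percentage is a share of the WHOLE sample.
whole = {k: total * p / 100 for k, p in pcts.items()}
# Reading 2: each percentage is a share of its own subgroup.
sub = {k: counts[k] * p / 100 for k, p in pcts.items()}

print(whole["wait_very_poor"])  # 111.15 -- exceeds the subgroup's 41 patients
print(sub["wait_very_poor"])    # 15.99 -- at least arithmetically possible
```

Only the second reading avoids the impossible result, which is exactly the ambiguity the paragraph above pokes at.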

But is the message here that if you’re dissatisfied with *any* part of the ER experience, you’re dissatisfied, period, and it doesn’t matter how great or how small your dissatisfaction? This would have to be elucidated in the TED talk.

In the meantime, the p-values are impressive.

]]>Another example of this? http://statmodeling.stat.columbia.edu/2016/09/08/its-not-about-normality-its-all-about-reality/#comment-303932

]]>>”donating pro-bono work”

The fact that you are treating this as such an undertaking makes me question your algo.

If you upload a dataset and tell us results that you already have, I will plug into xgboost and report back… it should take a couple of minutes (maybe a bit longer depending on what format the data is in, etc). BTW, if you are familiar with R or python it should take you no more than a couple hours to get xgboost going.

]]>Fig 2 in this paper is absolutely spectacular!

]]>We have different conceptualizations of a comprehensive paradigm, but your list is really cool. And funny! :-)

Legacy had its chance, and it failed.

Air, water, land, food, medicine, finance, peace, life quality–everything necessary for modern life is stressed.

The Zeitgeist is change, a search for NEW directions. Including a search for predictive accuracy in conjunction with increasing accountability for errors.

May the most accurate models win…

]]>Your patient insistence on seeing some math, and your rapid and astute evaluation of a new-to-you mathematical model (and request for yet more details), is clear evidence of sincere analytic drive and talent, and of strength of character. All go, no show…

It occurred to me that, if you wish to collaborate on a comparison of ODA and other methods, perhaps you would be interested in crafting a follow-up to a recent article: https://www.ncbi.nlm.nih.gov/pubmed/26805004

If interested, please contact me via RG message. It would be an honor to be your wing-man…

]]>The judges will allow it! And may I just say that I wouldn’t have counted Robbins’s empirical Bayes had I not learned, from a thing you wrote that I can’t locate just now, that what Robbins had in mind was a lot deeper than just type II maximum likelihood.

]]>I make no claim that this list is exhaustive. Nine times is just a lower bound — but now that I count again I see ten entries in that list. Counting was never my strong suit…

]]>Corey:

In all seriousness, I would put Bayesian data analysis (as expressed in our book) as a paradigm that is distinct from all the paradigms you listed just there, and at least as important as most.

]]>I love this rhetorical question! Let’s see… Neyman-Pearson hypothesis testing and let’s put confidence intervals in there as well, let’s put likelihood and derived concepts — Fisher’s maximum likelihood, Wedderburn’s quasi-likelihood, Owen’s empirical likelihood, Nelder’s h-likelihood — in one bucket, let’s put all the variations on Bayesian foundations — de Finetti, Jeffreys, Savage, Cox’n’Jaynes, Wallace’s minimum message length — in one bucket and let’s stick maximum entropy methods in there as well, Wald’s statistical decision theory and derived concepts, Robbins’s empirical Bayes approach, Rissanen’s minimum description length approach, Valiant’s probably approximately correct (PAC) learning, I don’t know if Benjamini’s false discovery rate approach qualifies as *entirely* new but it’s pretty damn original and gave rise to a large variety of novel methodologies so I’m counting it, Davies’s model-as-data-approximation approach, and my newest entrant, ODA — wouldn’t want to step on any toes! So that’s nine times a century, if that century is the 20th.

My last post to Mike explained my criteria for donating pro-bono work. It is possible that you meet my criteria. If you remain interested, send a write-up of your research hypothesis, methods, and results to me via RG. This thread is exhausted.

I appreciate your suggestion about how to get people to “pay attention to the method you are advocating”. As of a minute ago, a total of 332 people in 50 countries have read 1,123 ODA papers since Monday night. The people reading the posts and THEN reading ODA articles ARE the people that I want to reach, while the people making the posts and NOT reading ODA articles provide an invaluable opportunity to defeat baseless objections…

Legacy wishes to legacy statistics fans!

]]>Ah, now I see. Thank you, Brian, for the clarification.

]]>I want to know what, apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, have the Romans ever done for us?

]]>Do you want an “Amen”?

]]>There shall, in that time, be rumors of things going astray, and there shall be a great confusion as to where things really are, and nobody will really know where lieth those little things with the sort of raffia work base that has an attachment. At this time, a friend shall lose his friend’s hammer and the young shall not know where lieth the things possessed by their fathers that their fathers put there only just the night before, about eight o’clock.

]]>We all have limited time and resources. I suggested an idea for a heuristic as the first post in this thread. It should take less effort on your part than making all these posts. You chose to sidestep it, which is fine, I guess. However, if you want people to pay attention to the method you are advocating, implementing my suggestion would be the best way to do it.

]]>How do you know? What have you read? Tell me, what do you know about novometric theory?

Absolutely nothing, but who cares, right? It doesn’t matter! Why bother to find out…

Corey has the answer, everyone stop working!

HINT#1: If ALL you have is to offer the past, don’t bother–EVERYONE KNOWS. :-)

HINT#2: NEVER speak about something you don’t understand.

HINT#3: REALIZE that if you don’t know what something is, then you can’t understand it.

]]>…what? Is there some problem here?

]]>PS: Sigh, Christian, it occurs to me that you missed the point and said something ignorant, again.

STATED FACT

All indexed articles require the identical information–definitions of class variable, attribute, sensitivity, confusion matrix, ESS, permutation p, jackknife analysis–to be repeated in every article (a teacher’s manual is under preparation, to help teach college courses using the book).

THE POINT IS

It is extremely difficult to say the same thing over and over, each time perfectly, each time differently.

Try it for whatever you use–describe everything involved in the procedure perfectly, twice, differently–then do it 200 times. It is boring, and it is difficult.

THE POINT IS NOT

That fewer free ODA articles are fleshed-out presently. :-)

Whoever wishes to read fleshed-out articles can obtain copies of ODA articles in all the indexed journals: from a library or publisher, not from me. There are an amazing number; only a tiny fraction have “ODA” in the title, as Rob and I named the paradigm after we discovered it–at the beginning of the collapse of the field due to student drain.

Begin with the earliest and work forward, including cited manuscripts–in all the indexed journals.

THE POINT IS

Whoever wants the most efficient resource should pony up a few bucks.

Gunny also said (paraphrase): Invest or get off the pot

]]>Yo, Christian et al.,

Yes, it would be awesome if there was one efficient resource that would do the hard work of synthesizing the literature, and presenting it in a straightforward manner designed to be well understood. That would be GREAT!

I remember I posted this comment to Ian et al: “The book finally made sense of it. It is the only efficient way to learn what is known. The latest book covers through novometrics with binary outcomes. I decided to write it before I died, so that people would know what was happening. I had so many of my dearest friends die…”

THEREFORE, To anyone actually interested–read all about it, the PDF is cheaper than a night at the movies! The only book of its kind!

If one doesn’t have the impression that this is something worth investigating now, I can understand: few lead, some follow, many never get into the action.

However, bear in mind that everything is moving forward faster, in more directions, and the best is surely yet to come.

If the cost is out-of-budget, one might ask the reference librarian to submit a purchase request.

Or, one is free to read every ODA article–in expensive indexed journals and in the free ODA eJournal–and all the citations, and identify things that need to be resolved, and resolve them (and correct things that are no longer state-of-the-art)…

Or, nothing of the kind! :-)

]]>“1. For these indexed journals, we need to “flesh-out” everything–which is agonizingly boring and difficult to restate in a myriad of ways, but what can you do…”

But this is exactly what you need to make your work reproducible and transparent, as it should be in science.

Darn it!

Mike, I should explain:

I don’t use my personal finite time doing anything other than ODA. In gestalt I am interested in finding the model that presents the best combination of predictive accuracy (normed against chance) and parsimony, as indexed by the D statistic. However, models of different complexity may be appropriate based upon statistical power (an exclusion criterion) and theoretical clarity or pragmatic significance (inclusion criteria). I know that the best any present model can do is explicitly identified, so there is no guesswork. That is,

1. If accuracy is defined as in the ODA paradigm, then in training analysis ODA will find the best model.

2. If accuracy is defined as in the ODA paradigm, and (as in novometrics) if one is only interested in validity performance, then CTA software allows the operator to set either of two criteria: (a) find the best model that has identical training and jackknife (or any other validity criterion) performance; or (b) find the best model that has highest jackknife (or whatever) performance with experimentwise (or whatever) p<0.05 (or whatever). The software allows operator control of many constraints, there are ODA articles on this, and of course the book synthesizes the matter…

I look in books or articles for data sets. When I find a data set, sometimes it is analyzed using XYZ method. So, I summarize the findings reported in the article using XYZ, and then re-analyze the data using ODA. If you have such a data set, we certainly could talk about a collaborative paper–I do this for fun, and to learn more about my trade, other methods, applied results. Right now I am a bit swamped–why I must return to work.

If I find a data set that was not analyzed, I only use ODA to take a look.

I can't do everything. I know, I tried, I failed…

]]>Core dump, two annoying memory traces:

1. For these indexed journals, we need to “flesh-out” everything–which is agonizingly boring and difficult to restate in a myriad of ways, but what can you do…

2. Gunny said: ASSUME = make an ASS out of U and ME

Thank you ALL for your time, wit, interest, and participation. And, ultimately, for being pretty cool.

Until we meet again!

]]>Sorry, hijack someone else. Read the book, purchase consulting time, or invent your own solutions.

]]>I appreciate your interest and love and respect your concern–it is perfect motivation to learn ODA. And, I understand that it seems like reading yet another entire book (that has almost no formulas) may be a daunting task. Especially if the book covers the same old crap covered in all the other books you ever read on the subject, and makes the same untenable assumptions, repeats the same methods and reaps the same deficiencies… But, dude–you are 300 pages away from the promised land! :-)

Mike, the entire book is about correct fitting, the entire paradigm! I can’t re-write the book here. Perhaps a brief response will satisfy your request, I hope so. :-)

The final Axiom of novometrics mandates replication/validation in order to estimate predictive accuracy–training results are not used as estimates. The most common validation methods are various jackknife, K-fold, Monte Carlo, bootstrap, hold-out, and multi-sample methods (AFAIK, only ODA software performs many of these methods for ALL statistical analyses). The novometric D statistic norms model quality as a function of accuracy and parsimony (I cited an article on this in another response in this thread–IMO it may address all of your concern in two pages–Theoretical aspects of the D statistic).

These are described and used throughout the book. These validation methods are also discussed in a forest of other books and a sea of other articles. It is easiest to read the book, it covers all the bases.

Training is for practice–validation is for real…
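The jackknife idea mentioned above can be made concrete with a generic leave-one-out loop; the mean-threshold "model" in the example is a toy stand-in of mine, not ODA or CTA:

```python
def jackknife_accuracy(xs, ys, fit, predict):
    """Leave-one-out ("jackknife") validity estimate: refit the model
    N times, each time classifying the single held-out observation."""
    hits = 0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += predict(model, xs[i]) == ys[i]
    return hits / len(xs)

# Toy stand-in model: threshold at the training-sample mean;
# scores at or below the threshold predict class 1.
fit = lambda xs, ys: sum(xs) / len(xs)
predict = lambda cut, x: 1 if x <= cut else 0

acc = jackknife_accuracy([1, 2, 3, 10, 11, 12], [1, 1, 1, 0, 0, 0],
                         fit, predict)
print(acc)  # → 1.0 on this cleanly separable toy sample
```

The same loop works for any `fit`/`predict` pair, which is what makes the jackknife a generic validity criterion rather than a property of one model family.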

]]>At the time we discovered the open form solution, the field was finally established! Hundreds upon hundreds of articles and algorithms were being constructed, because computers were becoming faster. The first PC was on the market–the field was starting to explode, we (the community) began to hold conferences!

Then “greed is good” (marketing) and “dot com” (hackers) became the zeitgeist, most systems engineering (engineering colleges) and quantitative methods (business colleges) programs lost so many students that the programs were dissolved–faculty and remaining students scattered about into non-fitting programs. Only a few of the early leaders stuck with ODA. Rob and I never left, there is no other quantitative perspective that we find so captivating, it was the purpose of our lives.

The youngsters today have forgotten the math that got mankind to the moon without computers, and have resorted to using pre-ODA methods that have problems which motivated the rise of ODA in the first place. Those who forget history are doomed to repeat it–indeed!

]]>Clearly computers are needed to elucidate the exact distribution for non-directional analysis–but all the computers in the world couldn’t solve the problem for even a moderate N:

Yarnold, P.R., & Soltysik, R.C. (1991). Theoretical distributions of optima for univariate discrimination of random data. Decision Sciences, 22, 739-752.

However, for directional hypotheses there is a closed-form solution:

Soltysik, R.C., & Yarnold, P.R. (1994). Univariable optimal discriminant analysis: One-tailed hypotheses. Educational and Psychological Measurement, 54, 646-653.

Carmony, L., Yarnold, P.R., & Naeymi-Rad, F. (1998). One-tailed Type I error rates for balanced two-category UniODA with a random ordered attribute. Annals of Operations Research, 74, 223-238.

]]>…I’m still stuck in the Friend zone…

]]>“It should be done by you, not by others (it’s your question)”

a. Select an example involving binary data that was analyzed by ODA.

b. Construct the data set.

c. Do whatever you wish. The ODA analysis was already done.

END OF THREAD

]]>Does maximum-accuracy analysis lead to overfitting? (“best model” vs. “generalizable model”)

Please describe the validation procedures for ODA/CTA.

We cannot read a whole book just to understand the validation procedures of your method.

]]>Isn’t there a clear link between conceptual clarity and transparency about underlying mechanisms/assumptions?

Falsifiability is a core concept of science. Opacity and falsifiability are somewhat at odds.

]]>I am also interested in a comparison between xgboost and ODA.

Please work on such a report. If ODA does better than xgboost, it would be a great advertisement for ODA. Please go for it. It’s your product, so it’s your job (not someone else’s).

]]>I am also interested in such a comparison. Please work on such a report. Then we can discuss more concretely and with facts. If ODA is better than xgboost, it would be great advertising for ODA. It should be done by you, not by others (it’s your product).

]]>Remember kids, to describe many species of particles, take the tensor product of as many different Fock spaces as there are species of particles under consideration.

]]>To test the hypothesis that one’s manuscripts with fewer pages are more likely to be published…

“To conduct an optimal data analysis, the ODA software would begin by arranging all of the manuscripts (i.e., observations) along a continuum formed by page length, with each manuscript represented by a 0 or 1 depending on its publication status. ODA would then examine all possible cutpoints along the continuum (i.e., midpoints between two successive observations that have different values on the class variable) and would separately evaluate the classification performance achieved across all observations, using each cutpoint that conforms to the directional hypothesis (i.e., for which the lower score on the page-length continuum is associated with acceptance and the higher score on the page-length continuum is associated with rejection). The final ODA model would consist of the cutpoint that matches the directional hypothesis and produces the greatest overall percentage of accurate predictions across both categories of the class variable. For example, the optimal model might be, “If page length ≤ 25.5, then predict the manuscript is accepted for publication; otherwise, predict the manuscript is rejected.” This particular model would be considered optimal because no other cutpoint consistent with the directional hypothesis could achieve a greater overall percentage of classification accuracy with these data.”
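Under the stated assumptions (a single ordered attribute and a directional hypothesis that lower scores predict class 1), the quoted cutpoint search can be sketched in a few lines. The function name and toy data are mine; the real ODA software additionally handles weights, ties, and permutation p-values that this toy ignores:

```python
def uni_oda_directional(scores, labels):
    """Exhaustive cutpoint search for a single ordered attribute,
    under the directional hypothesis that LOWER scores predict
    class 1 (e.g. acceptance). Returns the cutpoint with the
    greatest overall classification accuracy."""
    pairs = sorted(zip(scores, labels))
    best_cut, best_acc = None, -1.0
    # Candidate cutpoints: midpoints between successive observations
    # that differ on the class variable.
    for (s1, y1), (s2, y2) in zip(pairs, pairs[1:]):
        if y1 == y2 or s1 == s2:
            continue
        cut = (s1 + s2) / 2
        # Directional rule: score <= cut -> predict class 1.
        acc = sum((s <= cut) == (y == 1) for s, y in pairs) / len(pairs)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc

# Page lengths and acceptance status (1 = published), echoing the example.
cut, acc = uni_oda_directional([10, 20, 22, 30, 40, 50], [1, 1, 1, 0, 0, 0])
print(cut, acc)  # → 26.0 1.0
```

On this toy sample the only label-changing midpoint is (22+30)/2 = 26, yielding the rule "if page length ≤ 26, predict accepted," which classifies every observation correctly.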

This reminds me of one of my lecturers mentioning that one of the previous professors at the Uni had come up with some completely different way of doing data analysis, but it never really took off and there wasn’t the computing power for it back then. I’m beginning to think there must be countless examples of such efforts…

]]>Wait a minute…. are we allowed to add self-given honorifics here? I’ve been totally missing out.

]]>IBM3090, sorry, such small text

]]>Three million does not sound like a lot, I know. But, in reality, three million is a lot! In many classical phenomena N of this size are sufficient to detect “ecologically” rare phenomena—accidents, errors, diseases, interactions, tornadoes, etc. More complicated models, and/or analysis of even rarer phenomena require the most powerful computer—the brain of the analyst.

We ran our first-ever *large* experimental MultiODA on a CRAY-2 (NCSA, Urbana). Exponential in N, the problem had a binary class (dependent) measure and three ordered attributes (independent variables) for N=39 (thirty nine), and it red-lighted the CPU forcing a cold boot.

Years later we were able to solve MultiODA problems for uniform random data involving five attributes and N=1,000,000 in several CPU seconds using an IBM3060-400VF supercomputer (UI, Chicago).

Today we get better nonlinear answers to problems involving four attributes and N=3,000,000 in CPU seconds using a 64-bit PC.

]]>