Ethan Steinberg points to a new article by Saul Justin Newman with the wonderfully descriptive title, “Supercentenarians and the oldest-old are concentrated into regions with no birth certificates and short lifespans,” which begins:
The observation of individuals attaining remarkable ages, and their concentration into geographic sub-regions or ‘blue zones’, has generated considerable scientific interest. Proposed drivers of remarkable longevity include high vegetable intake, strong social connections, and genetic markers. Here, we reveal new predictors of remarkable longevity and ‘supercentenarian’ status. In the United States, supercentenarian status is predicted by the absence of vital registration. The state-specific introduction of birth certificates is associated with a 69-82% fall in the number of supercentenarian records. In Italy, which has more uniform vital registration, remarkable longevity is instead predicted by low per capita incomes and a short life expectancy. Finally, the designated ‘blue zones’ of Sardinia, Okinawa, and Ikaria corresponded to regions with low incomes, low literacy, high crime rate and short life expectancy relative to their national average.
As such, relative poverty and short lifespan constitute unexpected predictors of centenarian and supercentenarian status, and support a primary role of fraud and error in generating remarkable human age records.
Supercentenarians are defined as “individuals attaining 110 years of age.”
I’ve skimmed the article but not examined the data or the analysis—we can leave that to the experts—but, if what Newman did is correct, it’s a great story about the importance of measurement in learning about the world.
There’s also this:
Also this: https://www.bbc.com/news/world-asia-pacific-11258071
Basically a lot of stuff (diet, genes, etc) thought to be “linked to longevity” may actually be linked to pension, insurance, and tax fraud.
Also, why is this plot using points instead of bars?
That doesn’t make sense! It’s points because both X and Y axes are irregular. Bars with irregular/overlapping X data are unusual and almost always a bad idea.
I assume it’s probably integer years. The advantage of bars is that you can trace them back to the x axis to figure out which one is which. The disadvantage is that it makes you think of the area rather than the value plotted.
It looks like it could be binned by year, maybe there are overlaps… I can’t tell.
For some reason this reminded me of the fact that the listed heights of NBA players are usually exaggerated.
It’s interesting that the supercentenarians seemed to be on their way down even before “complete birth registration”. Was this because birth registration was ramping up in those years but not yet “complete”? (Maybe this is discussed in the paper, I admit I was too lazy to read it)
I’m going with vampires. Vampires go around causing lots of early deaths, mayhem, hopelessness and subsequent drug use; and, to avoid detection, they like to do their business in poor, remote areas where population statistics are collected sporadically or not at all. As a result, while the population, crime, etc means are shifted to the left the vampires emerge as outliers on the far right.
But do you have any data to support your hypothesis? ;~)
Well, p(vampires|outliers) = (p(outliers|vampires)*p(vampires))/p(outliers) gives me some hope. I think.
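For anyone who wants to play along, the identity being reached for is Bayes' rule, p(vampires|outliers) = p(outliers|vampires)*p(vampires)/p(outliers). Here is a minimal Python sketch. Every number in it is made up for illustration, and nothing comes from the paper or from any dataset:

```python
def posterior(p_e_given_h, p_h, p_e_given_not_h):
    """Bayes' rule: P(H | E) from the two likelihoods and the prior P(H)."""
    # Total probability of the evidence, summed over H and not-H.
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
    return p_e_given_h * p_h / p_e


# Hypothetical inputs, chosen only for illustration.
prior_vampires = 0.001               # P(vampires)
p_outlier_given_vampires = 0.9       # P(outliers | vampires)
p_outlier_given_no_vampires = 0.01   # P(outliers | no vampires)

post = posterior(p_outlier_given_vampires, prior_vampires,
                 p_outlier_given_no_vampires)
print(f"P(vampires | outliers) = {post:.3f}")
```

With these invented inputs the posterior comes out at roughly 0.08: even a strong likelihood does not rescue a hypothesis with a tiny prior.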
As the Senior Consultant for Gerontology for Guinness World Records and the Director of the Gerontology Research Group-Supercentenarian Research and Database Division, I am surprised to see this paper improperly using GRG data without talking to us first about it. This paper, intentional or not, is loaded with so many errors and misrepresentations that I feel that the author should contact the GRG (and the IDL), and myself personally, to discuss a major revision, if not outright retraction.
Just for starters, the author did not lay out in detail which particular datasets were used but the GRG has a lot more private data than what is publicly available and the so-called “dropoff” in supercentenarian counts may be entirely due to backlogs in data information gathering and case processing, which is expected because much of the data on supercentenarians is protected by laws that allow for gradual release (for example, US SSDI records become available three years after death…did this paper account for that?). I would go so far as to say that once all the results are in, the data will show an increase, not decrease, in supercentenarian numbers for both Italy and the USA. Had the author of this paper bothered to contact us first about this, we could have made that clear to him.
Second, in 110+ years there have been huge population shifts from rural to urban and there’s no clear evidence from what I see in the above paper that the author made any effort whatsoever to properly adjust the data for shifts in population over a century-plus. You cannot compare cohorts from 1900 with data from 2019 and not properly adjust it for the 119 years of changes in between. Consider, for example, the population of fast-growing areas such as Los Angeles, which had a population of about 102,000 in 1900 and close to 4 million today (city limits alone). It’s also the case that both the USA and Italy had much less of a middle class 119 years ago and that much of that developed post WWII.
Third, the author appears to selectively misuse data to make a point, which is more reminiscent of a political than a scientific approach. For example, many of the “missing” Japanese centenarians were actually persons who disappeared in WWII, so they are not relevant. Further, the island of Ikaria, Greece has never produced a validated supercentenarian so is not a part of the IDL or GRG datasets. In addition to that, the Sardinia data has been properly validated with actual evidence and the data from Sardinia fits well with other validated data. For example, no one in Sardinia’s data has been older than 112. Compare that to the typical longevity claim to ages 115-130 and it’s a totally different situation.
The bottom line: sometimes hasty conclusions can be drawn when not enough proper baseline study is put into a paper such as this. I’m fully confident that once this paper is submitted to the proper scrutiny it alleges other scientific works should be subjected to, this paper will require a major revision.
I also want to make it clear that this paper failed to properly distinguish between validated and unvalidated data and also did not lay out any “solutions” for better-cleaning the data. This remains an issue that this paper itself needs to address.
Thanks, I have nothing to do with this paper but am just wondering whether this information on the methods has been published previously? It seems impossible to interpret the data without it.
+1. The paper looks superficially convincing, but falls apart once you scrutinize the details.
Update: Upon closer inspection it does appear that the author did some population change adjustments. However, without more clear information on what data was used and how it was used, it’s not clear if the adjustments were done properly. The jury is out on that issue.
For some other issues, however, we have totally outlandish statements such as this one:
“Likewise, findings from the Italian data support the hypothesis that these ‘semisupercentenarians’ largely constitute a collection of age reporting errors.”
A comment such as that from Saul Newman is greatly flawed, unfounded, uncorroborated, and unhelpful. In addition to the document checks there are also real-person checks. In many towns in Italy actual centenarians are visited and honored by local government officials who check the rolls. There’s no possible way that a majority of Italy’s validated semi-supercentenarian data, published in Science magazine, are statistical errors. By definition, validated data has already been checked and verified. There’s too much evidence against that unfounded hypothesis.
And Newman’s thought experiment of 1 person out of 100 cheating on their age misses a few more points, also: first off, the whole point of having many observations is to reduce reliance on any one datapoint. Taking Newman’s “100 supercentenarians in a room” thought experiment:
“Consider a room containing 100 real Italian supercentenarians, each holding complete and validated documents of their age. One random centenarian is then exchanged for a younger sibling, who is handed their real and validated birth documents. How could an independent observer discriminate this type I substitution from the 99 other real cases, using only documents as evidence?”
Strangely enough, Newman seems unaware that demographers already have methods to detect ID frauds…and have already done so, in Italy. For example: https://www.researchgate.net/figure/Family-composition-of-Damiana-SETTE_tbl1_226973121
It seems strange, and uninformed, to set up a “thought experiment” about a hypothetical question that was already answered decades before the question was asked. Perhaps Newman needs to study the topic area further before trying to shoehorn in controversial, unproven assertions.
But even if that ONE false observation slipped through, you still have a 99% certainty in his own thought experiment. Most science considers a 95% confidence level to be statistically significant. Getting to 99% is already going beyond what other scientific fields require. So how can you go from arguing, maybe, hypothetically, the data has a 1% error return rate to “largely errors”? He contradicts himself.
Of course, Newman is not a demographer, so perhaps this is what occurs when one wades into a field in which one has little to no knowledge or expertise. I’d much rather hear what Dr. Kenneth Wachter has to say on this. https://u.demog.berkeley.edu/~wachter/
Not sure if this posted so I’m reposting:
Just a quick little response to part 1 of that very unfortunate and personal attack.
1. I wouldn’t retract from Biorxiv, even if such a thing were possible (it isn’t).
2. Your comment “The author did not lay out in detail” seems to be an unsupported guess. I laid out, exactly, the data and quality controls in the reproducible supplementary code, which does not seem to have been read. Further details will be available when it is published.
3. As for “the so-called “dropoff” in supercentenarian counts may be entirely due to backlogs”: it is not. I tested these data by right-censoring all deaths after the years 2000 and 2010, and a bunch of other dates of death (and birth for left-censoring). It made no difference to the result.
I gave you seven or seventeen years to complete your record ascertainment, but there was no substantial change in the result. So, the answer to your follow-up question, “…did this paper account for that?”, is yes.
Read the supplementary code, please.
4. “there have been huge population shifts from rural to urban”: I corrected these data by the 1890-1910 US census of population sizes of the birth cohort, eliminating such shifts. Again, if you read the code, or even just the methods, it is all there.
5. “You cannot compare cohorts from 1900 with data from 2019 and not properly adjust it for the 119 years of changes in between”
I made no such comparison. I used the supercentenarians places of birth, not their 2019 location of residence, to deliberately avoid these problems.
Read. The. Code.
6. “It’s also the case that both the USA and Italy had much less of a middle class 119 …..”
The hypothesis was that MODERN rates of fraud, driven by CURRENT rates of poverty, were causing fraud-based errors in these records. That is why I used corrected, modern data on economic and crime indices.
All of these cases, involving undetected wide-scale errors in a database of 100+ year olds, are directly and completely relevant to someone claiming that databases of 110+ year olds are clean data. Claiming otherwise is patently absurd.
These are exactly the people who were (until recently) included in the GRG database. If these data were not cleaned in 2010, then many would have entered your database by 2019.
Sogen Kato, dead in his apartment for 30 years, for example: he had a fully valid set of documents, plenty of witnesses as to his age. Also a mummified corpse, but hey, maybe that’s because of his high turmeric intake.
Or the previous record holder for world’s oldest man, Shigechiyo Izumi, a fully validated case (until he was found to be using his brother’s ID).
8. “Further, the island of Ikaria, Greece has never produced a validated supercentenarian so is not a part of the IDL or GRG datasets.” The island of Ikaria is a blue zone, and was addressed as such.
9. “did not lay out any “solutions” for better-cleaning the data”
The point of the paper is that there are no ‘solutions’ to cleaning these data, and that new approaches must be sought that do not rely on documents (e.g. isotopic dating, as stated in the manuscript).
If there were document-based ‘solutions’, every single government in the world would not have problems with identity theft.
Modern biometric and electronic identification methods do not ensure against identity fraud… what chance do your handwritten docs and newspaper clippings have?
To sum up, the idea that a ragtag group of volunteers and demographers have succeeded in solving a problem all the governments of the world have failed to address, identity fraud, simply using handwritten documents they found in online searches of regional newspapers, stuffed in century-old archives, or hiding in grandad’s sock drawer for 100 years …
Well, I’ll let people draw their own conclusions here, but color me skeptical.
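For readers who have not opened the supplementary code, the two adjustments described in points 3 to 5 above can be sketched roughly as follows. This is not the paper's code: the record layout, the state names, the populations, and the function names are all invented for illustration. The sketch counts records by place of birth, divides by the size of the birth cohort there, and repeats the calculation after dropping deaths later than a cutoff year (right-censoring).

```python
from collections import Counter
from typing import NamedTuple, Optional


class Record(NamedTuple):
    birth_state: str  # place of birth, not place of residence at death
    birth_year: int
    death_year: int


# Invented toy data; these are not GRG records or real census figures.
RECORDS = [
    Record("State A", 1885, 1996),
    Record("State A", 1888, 1999),
    Record("State A", 1895, 2006),
    Record("State B", 1889, 2000),
    Record("State B", 1898, 2012),
]
COHORT_POPULATION = {"State A": 500_000, "State B": 2_000_000}


def right_censor(records, last_death_year: Optional[int]):
    """Keep only deaths on or before the cutoff year (None keeps everything)."""
    if last_death_year is None:
        return list(records)
    return [r for r in records if r.death_year <= last_death_year]


def per_million_by_birthplace(records, cohort_population):
    """Records per million people in the birth cohort of each birthplace."""
    counts = Counter(r.birth_state for r in records)
    return {
        state: 1_000_000 * counts[state] / population
        for state, population in cohort_population.items()
    }


# Repeat the same calculation under several cutoffs to see how much it moves.
for cutoff in (2000, 2010, None):
    kept = right_censor(RECORDS, cutoff)
    print(cutoff, per_million_by_birthplace(kept, COHORT_POPULATION))
```

In a real check one would rerun the full before/after-registration comparison for each cutoff, not just these per-capita rates, and see whether the estimated fall changes.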
Your comment here flatly makes clear that your paper has no credibility:
“Every ‘supercentenarian’ is an accidental or intentional identity thief, who owns real and validated 110+ year-old documents, and is passably good at their job.”
Your comment above is a personal attack on not just the researchers who have worked in this field for 140+ years, respecting and citing the works of others, but also upon the thousands of real supercentenarians who are not ID frauds and have the documents to prove it.
When you make a comment this atrocious, it calls out the rest of your “work” here.
But while that is the most atrocious comment, it’s certainly not the only one. Your claim that “most” Italian semi-supercentenarians are errant/fraudulent is also totally uncorroborated.
But two very bad unforced errors aren’t enough. We also have your claim that 99% accuracy is mostly fraud…a contradiction in terms.
I could go on.
A few points also:
1. You did not ask for permission to use GRG data and you did not properly come to us to ask us about it. Normal scientific efforts at collaboration would do so. You also failed to respond to any of my e-mails.
2. Your pre-print doesn’t show your data analysis details, convenient for you to hide errors in datasets, data analysis, et cetera. Neither has your pre-print gone through peer review.
3. You do not have a clue as to GRG data methods, techniques, et cetera and I already made it clear that you cannot make those data analyses because you improperly used our data without permission and you don’t understand that there’s a lot more data that has not yet been publicly released (our GRG Charts and Graphs made that clear). http://www.grg.org/SC/Graphs/YOB2.png But that’s only verified and pending from 3+ years ago. We have more cases than that since 2016 and we also have cases that are neither verified nor pending but once all is said and done I am firmly confident that your pre-print will be refuted. I already pointed out that more recent data is less complete due to issues such as less availability for living-case than deceased-case data. If the law prevents the release of 2017 SSDI data until 2020, how can you then compare released data to unreleased data? You stated you used GRG data including to 2017. Once all the “votes are in”, your pre-print analysis will be substantially altered.
4. There’s never been a demonstrated method to accurately ascertain a person’s age using the unproven methods you assert. So, how can you claim to have a solution that has never been properly demonstrated to work, not even once?
5. How can you claim in your own thought experiment that 99% accuracy is mostly fraud? Your own thought experiment is a contradiction in itself.
6. If you compare the literacy rates, life expectancies, and other factors that you used for your “analysis” between the areas you alleged are “mostly fraud” (Sardinia, Okinawa, Ikaria, USA, Italy) to similar factors worldwide you’ll see that in many instances these areas are actually ones of high life expectancy, high income relative to the world standards, et cetera. Even for Ikaria, you allege that 90% literacy rate from a century ago is “high” illiteracy when actually that’s a fairly good result on a world scale from the same time period.
7. The point of age validation is to minimize the impact of data quality issues such as fraud. Your paper seems to overlook this.
8. Sogen Kato was never included in the GRG database. Moreover, one clue that the case was a problem was no recent photos available, yet over 95% of GRG cases have photos available showing that the person is alive. And we include “photo available” as a category filter so if you really wish to exclude “no photo available” cases, you can do so.
9. I never called you an “amateur”. I suggested that your area of expertise was in another field and that your wading into a field outside your area of expertise produced amateurish results. And your extreme comments attest to that; I don’t have to say anything when you are asserting such self-described absurdities as “every supercentenarian case ever was an ID-switcher”.
10. You failed to account for mathematical tests that can demonstrate data quality, such as gender ratio, mortality rate, et cetera.
11. The fact that you declined to discuss this with me privately despite my e-mailed request to you to do so (and you publicly admitted above that you had NO intentions of doing so) and instead went public is a major reason that I’m calling you out right here. Your fringe hypotheses are uncorroborated, un-peer-reviewed, and not accepted by the scientific mainstream but it does appear that you succeeded in one particular area: generating media attention for an idea that you yourself described as “absurd”. And it is.
So be it.
Gerontology Research Group-Supercentenarian Research and Database Division
Senior Consultant for Gerontology
Guinness World Records
Now for part 2:
I have already debunked the claims of Ken Wachter in Science. I wouldn’t go running to him for assistance.
For those of you not “in the know”, Ken’s Science paper had 861 choices for a regression model.
They picked the one regression that:
1) Had the worst possible empirical fit (for both the late-life data and mid-life data)
2) Gave them the most favorable result possible
All other model choices had a smaller effect size, and the vast majority (including best-fit models) HAD NO OBSERVED EFFECT AT ALL.
I already pulled this problem apart in Plos Biology and Science, using open data and code.
As for the rest, you are deliberately or accidentally misrepresenting my arguments. I am stating how, given perfect documentation, you can still have undetected errors at an unknown rate. Even Ken Wachter didn’t argue with that conclusion.
Finally, this whole thread is highly unprofessional, as are the long, abusive, rambling emails you sent me. Most is not worth the candle of a response.
If you can’t make your point with data or logic, please refrain from abuse.
The ‘uninformed’ ‘amateur’ Dr Newman,
Biological Data Science Institute,
P.S. If I truly am an outside amateur, perhaps you should reflect on how easily I have cast shade on ideas that you have cherished for three decades, using a pen.
He never called you an amateur. It seems you are the one resorting to unprofessional personal attacks. And the poor work by Wachter et al in Science doesn’t excuse your own sloppiness. It’s too bad that you are distracting from the good work you have done (your paper is not without value) by losing your cool and making such heated insults.
He certainly did, in his private correspondence. He also called me “totally outlandish”, “greatly flawed, unfounded, uncorroborated, and unhelpful” and many other nasty things in private.
In addition, these comments are ignoring the thesis of the paper, that you may have perfectly consistent, perfectly valid, yet completely inaccurate birth documents. Given these errors fundamentally cannot be detected *using* documents, assuming the error rates of documents are low violates the principle of “Nullius in verba”.
This is evident in Wachter’s previous defense.
He stated that these undetectable errors are rare, as does R. Young.
This raises the very obvious question: if these errors are undetectable, how did you establish that they were RARE?
Finally, these responses make zero mention of the Italian data, which constitute half the paper and have none of these supposed problems.
I guess I’ll have to take your word for what he called you in your emails. Although, you’ve managed to misconstrue his public comments. For example, he called your *analysis* uninformed here. And “uncorroborated” sounds like something someone would call an analysis, not a person. So, color me skeptical.
If the errors are undetectable, then how do we know they even exist? This is becoming an exercise in philosophical speculation, along the lines of Russell’s teapot and the invisible pink unicorn.
Just scanning Young’s comments, I see multiple references to Italy, and one to Sardinia, which is a part of Italy, so….
Adede, you make a great point here. Where is professional courtesy? Saul failed to contact us or to secure permission to use copyrighted material. He also did not respond to my e-mails and made it clear that he had no intention to do so. The level of his paper is intentionally provocative.
There could be a point here that could be reasonably discussed. I am not in favor of the “Blue Zones” marketing which makes it seem that certain longevity regions are extremely different. But the marketing done by some is different from the valuable work of the original scientists such as Dr. Michel Poulain, The Willcox Brothers of the Okinawa Centenarian Study, and the like. Notably, no “Blue Zone” has yet produced anyone 115+, which means that what is being discussed is not on the same scale as the myths of superlongevity once seen in regions such as the Caucasus Mountains, Vilcabamba, et cetera which I discuss here:
Where claims to 130+, 140+, even 160+ were seen. Those regions had actual low rates of literacy and poor to no systems of documentary recordkeeping.
We don’t see that with any of the “Blue Zones” which have not produced even a claimant to age 115+.
Also, he didn’t account for the cohort lifestyle changes associated with Okinawa’s shift in obesity/smoking rates for the younger vs older generations.
While there may be ground to discuss the marginal issues, including the occasional hypothetical chance of a fraudulent case slipping through (and science doesn’t guarantee 100% accuracy but a method of testing), I find it highly disturbing that someone is going to claim 99% accuracy as “total fraud” (Saul’s hypothetical thought experiment). His very thought experiment refutes itself.
Personally, when I look at the above exchange the person who looks amateurish and unprofessional is Robert Young. He is either being obtuse deliberately or just ignoring the data presented in Saul Newman’s paper. Not reading the data method supplement is just one example.
The role of Science, Barbi et al. and Wachter in making a result out of questionable regression inputs:
The role of extremely rare errors in late life:
Now we have something to discuss.
I want to make a few things clear:
1. There are two larger background issues that I would like to make clear for third-party readers.
They are: A. There has been a 20+ year academic dispute between competing maximum lifespan hypotheses. The Vaupel camp favors mortality deceleration. The Gavrilov camp favors mortality acceleration/exponential model. Some of these papers are intended to push one point or the other. I have advanced a third option, “Mortality Peak”, which posits that the correct statistical model favors neither extreme. The data show some signs of acceleration until a mortality peak is reached around age 114-117, then an outlier effect (‘deceleration’ if you so choose). But GRG and IDL data show that.
2. There is a push among some to make this a “paper validation vs biopassport” validation issue. I want to make it clear that this is a false dichotomy. The late Dr. Coles, founder of the GRG, believed in the need for both. That’s why Dr. Coles actually conducted biosampling and supercentenarian autopsies and discovered a new cause of death among the very oldest-old (amyloidosis as a significant cause).
3. If Saul Newman were to greatly cut back some of his more outrageous pronouncements, there could be something to discuss here. His own analysis of GRG Data suggests improving data quality over time. But then he throws that under the bus by suggesting that no amount of data cleaning is ever sufficient, which is not the case. Mathematical tests can determine whether the data quality level is correct or not.
4. Some of Saul’s data is flatly erroneous but until he shares the base data used it’s difficult for others to see that. But I know enough about this situation that we are going to see his claims walked back on several issues. Whether the walking-back is done by himself or others has yet to be determined.
5. I can be nice and professional with those who choose to be nice and professional. But Saul Newman has already engaged in improper behavior here, taking copyrighted material without permission and then being too self-important to actually respond to private e-mails. So it appears that he wants the discussion in the open, but won’t share the details of his data analysis that would allow for checking/re-testing.
This is not the last word, I’m sure this discussion will continue. Kudos to Columbia University for showing interest in this discussion. The irony is that the longest-lived persons tend to not get stressed out about things, and over 90% of validated supercentenarians are female (another irony: is Newman suggesting that females disproportionately lie about their age/cheat because that’s the only other way to explain the gender ratio for validated data than the accepted explanation, which is that women live longer and the data quality of validated data is reflective of this intrinsic biological quality…one which his ANI could test for and confirm).
Are you suggesting this is implausible? I’m pretty sure it is pretty much an accepted truth amongst actuaries that women commonly understate their age until a certain point and then begin overstating it.
I couldn’t find the source I was thinking of but came across this:
1910 US Census report: https://www2.census.gov/prod2/decennial/documents/36894832v1ch05.pdf
I wish I could find that source though, it had some pretty amusing anecdotes.
Unvalidated data from questionable sources (i.e., regions lacking comprehensive birth registration systems 110+ years ago) have generally shown gender ratios closer to 60% female, 40% male. When data requirements include documentary requirements such as birth registration, the gender ratio switches to closer to 90% female, 10% male. Historically, MALES have overstated their age at a greater level, particularly when extreme age overstatements conferred high status. This is true whether the myth was from China (Li Ching-Yun who claimed to be ‘256’ in 1933), the UK (‘Old Tom Parr’ who claimed to be ‘152’ in 1635), or just about any area examined, from Africa to the Middle East, South America, etc. Indeed, when extreme female age claims (to 130+) existed, they were often part of a larger cultural theme that the population was a “special longevity region” such as the Caucasus. But even there, the highest ages claimed were males to 160+, with the highest female ages in the ‘140’ range.
Considering that in the 1970s, the media often touted extreme claims to 130+ as ‘true’, and only since the push to actively debunk these media myths began to shift the conversation towards a more scientific approach (perhaps one of the first big pushes to debunk longevity mythology included the 1974 questioning of the Caucasus longevity myth, the 1979 questioning of the Vilcabamba longevity myth, and the debunking of the Charlie Smith claim to ‘137’ in the USA in 1979), what emerged instead was a focus on “authenticated” recordholders. But that was a media shift. Even before that, the actuarial experts considered claims to above 113 to be extremely suspect. The gains in population data size, data coverage, and life expectancy increase helped fuel a push of persons living longer in more recent decades. Contrary to Saul’s assertions, we see a major growth in supercentenarian numbers AFTER systems of authentication had been in place for quite some time. As recently as 1999, Japan had no more than 4 living persons age 110+ at the same time, but the numbers have been growing rapidly…at the base of the pyramid, as expected with the rectangularization of the mortality curve and compression of morbidity phenomenon. Had the data increase been pushed by a decline in data quality, we would see a more haphazard datapoint distribution, but that’s not been the case at all. Currently, Japan’s oldest living person is 116 but Japan also has just one person age 115. The data tracks well with gender ratio and mortality rate tests, among others. Note that Japan’s oldest living man is 112, which comports well also with the general trend of the “oldest living male” being consistently 3-5 years younger than the oldest living female in many national validated datasets. Thanks for your interest.
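The gender-ratio test mentioned in this comment can be sketched as an exact binomial calculation: given how many records in a dataset are female, how surprising is that count under a reference female share? The counts below are invented, and the 0.90 reference share is only the rough figure quoted in the comment, not an estimate from any dataset. This is a generic check, not a description of any group's actual validation procedure.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(k, n, p):
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    tolerance = observed * (1 + 1e-9)  # guard against float round-off ties
    return sum(
        binom_pmf(i, n, p) for i in range(n + 1) if binom_pmf(i, n, p) <= tolerance
    )


REFERENCE_FEMALE_SHARE = 0.90  # rough figure quoted in the comment above

# Hypothetical datasets: (label, number of female records, total records).
datasets = [("dataset X", 89, 100), ("dataset Y", 62, 100)]

for label, females, total in datasets:
    p_value = two_sided_p_value(females, total, REFERENCE_FEMALE_SHARE)
    print(f"{label}: {females}/{total} female, p-value = {p_value:.3g}")
```

A count of 89 out of 100 is unremarkable under a 90% reference share, while 62 out of 100 is wildly inconsistent with it. Note the limits of such a test: it can flag a dataset whose sex ratio looks wrong, but a plausible ratio does not by itself show that individual records are correct.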
Sorry, but I don’t see how that addresses:
Yes, obviously this is going to be a cultural thing. I would not expect it to be stable worldwide.
Someone who is 100 years old today was born in 1919. You really think there were reliable birth records back then even in the US?
Also you seem to have a lot of interesting facts on hand but I don’t understand why you don’t include the references.
Dr Saul Newman seems to have good answers to some critique, but sells these short with responses like “Read. The. Code.” These come across as though the intent is not to advance the discussion and to instead seem smarter than others.
A few suggestions for Dr Newman:
1. Simply point people to the precise section of the code or manuscript that counters the critique. Consider the possibility that the way the preprint was written might make it hard for some readers to follow and try to find ways to improve the next version.
2. Do not refer to private correspondence. Alternatively, make it public. All that people see here are claims of things being written. How are we meant to know who said what?
For a journal editor reading this discussion the number one question regarding Dr Newman’s pre-print would be “permission to publish data”.
Dr Newman: make sure you have freedom to use the data. Consider that the dataset pertains to people and the possible implications if the conclusion of the analysis is that these people are dishonest. You need to have solid foundations on this issue. Especially now that opinions are being aired in newspaper articles, let alone online forums.
I’m a bit horrified to see here that I am being accused of copyright infringement. On what grounds?
That I analyzed your data?
I have a login to your website, I have permission to use the data and, crucially, I didn’t supply it to anyone else!
Are you so afraid of people seeing the problems with these data that you are going to bring out slander and lawyers? I mean, your data is published in the public domain, without any license attached.
This is horrific. If you have anything meaningful left to say, after peer review, save it for the literature.
You already lied about my ‘lack of correction’ for population size (which I did), and are accusing me of making things up when, by your own admission, you haven’t seen the underlying data or evidence!
At what point do you stop trying to slander me?
If anyone wants to see the real response to this article, from people whose jobs and careers don’t depend on the result, please look at the many, many positive judgements being made online by demographers, epidemiologists, scientists and statisticians. There are thousands of them. Or, make up your own mind.
After all, this hasn’t been peer reviewed yet: I had been optimistic in sending it to a low-ranked journal, and it hadn’t been seen by anyone at all, other than me, before going viral.
Please note, and I state again:
1. You never asked permission to use the GRG data that I’m aware of and it’s copyrighted material, to only be used with permission after discussing this with us first, which to my knowledge you did not. Further, I e-mailed you to discuss this after I became aware of your improper use of GRG data and you not only failed to send even a single e-mailed response, you also said here that you did not plan to, either. Hence, you made zero attempts to work anything out…probably because you’re afraid that your incorrect methods of data misuse would be exposed.
2. Who gave you “permission” to use the data? Where did you get a GRG login? Why have you not responded to my e-mails?
3. Your pre-analysis of the data is so far off reality that I’m quite confident that you made a lot of glaring errors in your assumptions and methodologies, some of which the GRG (including others in the GRG besides myself) has already identified. But the biggest issue is that you appear to be comparing apples and oranges. I already made it clear that the GRG data that is publicly displayed is nowhere near the entire dataset as can be gleaned from, for example, a graph such as this one: http://www.grg.org/SC/Graphs/YOB2.png Even that is 3-year-old data. Had you properly worked with us, you could have produced something similar to what we see with the Gavrilov use of GRG data. But you did not.
4. Your own comments in the pre-print are, essentially, SLANDER…not only to the actual 140+ years of research done by actuaries, demographers, and the like but to each and every real supercentenarian, ever…and also to many semi-supercentenarians in Italy. Considering that the number of validated cases now is in the several-thousands, there’s no possible way that your bold assertions have any degree of validity. Ignorance is no excuse.
5. Your mischaracterization of the alleged “low-longevity, high-crime areas” when these are areas that actually show relatively high levels of population longevity is improper and incorrect. It may make for sensational media headlines but it doesn’t pass a proper evaluation. Media go for controversy, and you know this. But cut the viral social media trend and look for the actual facts and we will see that it’s not the case. Okinawa and Sardinia are regions of high longevity and high life expectancy, not the mischaracterizations you, and by proxy the sensationalist media, make.
6. My comment about population size adjustments was not a “lie”. This was brought to my attention very suddenly and if I was not made fully aware of what you did, well, the fact that this pre-print went viral before you even talked to the GRG about it is just an example of why “virality” of social media trends does not make the best way to conduct science. Even so, without seeing the details of your analysis, which you did not provide, even adjusting for base population in 1900 does not adjust for the migration factor. Suppose, for example, someone were born in Kansas in 1877 and moved to Los Angeles in 1898. They would be in the 1900 US Census for California but the birthplace would be a rural, small town. Considering that Los Angeles went from a small town of about 50,000 people to 1+ million people in less than 40 years, population movements must be factored in, not just population density in 1900. That’s why I say the jury is still out on this. Moreover, even some of your characterizations of US vital statistics by state in 1900 appear to be incorrect.
7. Your fringe theory hypothesis greatly contradicts the works of many expert demographers in the field and I expect, should your pre-print be greatly toned down through peer-review and be presented in a scientific paper form, that there still will be issues of degree. It’s one thing to say that there are errors in the data: it’s another to say that all, or most, of the data is erroneous, especially when you are talking about validated and semi-validated data from countries such as the USA and Italy.
Again, I told you that you could e-mail me privately to discuss this, but you did not. I’m firmly confident that your data analysis is very incorrect in both misappropriation and misuse of base data that was not presented as final-form data but as “snapshot” data. If a major news network has votes coming in on election night and only 30% of precincts are reporting, you don’t conclude that is the final vote tally, nor would you compare the 30% tally to four years earlier and say, “what a dropoff in voters!”. Surely you should know this.
Again, I’m always ready to discuss things to get to the facts.
For Supercentenarians, Age Claims, Demographic Questions contact
Robert D. Young, Director, GRG Supercentenarian Research and Database Division
Email: [email protected]
Senior Consultant for Gerontology
Guinness World Records
1. You write:
Setting aside all legal issues: That seems a terrible attitude, contrary to all principles of science, to think that someone should need permission to analyze data. If you think that Newman or anyone else has made mistakes in their data analysis, then, fine, you can write that up and, sure, feel free to express your irritation at whatever irresponsible mistakes have been made in those data analyses. But to try to hide your data so that others can’t do their own analyses—that’s just wrong.
You write that Newman is “afraid that [his] incorrect methods of data misuse would be exposed.” Fine. Expose the incorrect methods! That is best done openly, not by blocking outside use of data.
2. As you note, Los Angeles had a population of 50,000 in the year 1890. I would not call that a “small town.”
3. You say, “the jury is still out on this.” That’s fine. So let’s start by making the data available to all, not claiming the right to decide who gets to use the data.
To better understand what Guinness World Records has become, check out John Oliver’s Last Week Tonight episode of August 11, 2019 (season 6, episode 20). No longer amusing, just plain evil.
1. How about you e-mail me privately at [email protected] to discuss the background on this. I also don’t see your e-mail address. You don’t even use a last name…how am I supposed to know who you are?
2. Actually, I completely and totally disagree with your suggestion that scientists don’t have a right to have their own data. Data is a product of the input variables and it’s important to understand the context of the data. That cannot be gleaned by simply lifting data without permission, without background context.
3. Because Saul Newman has failed to contact me, even once, there is no way for me to see the data that he claims that he used with permission (but he can’t be bothered to e-mail me and he doesn’t have permission, so that was a LIE). How can you claim that, on the one hand, people don’t need permission to “analyze data”, yet, on the other hand, it is Saul Newman who refused to communicate, NOT me. Saul. Refused. To Communicate. So, who is hiding data, again? Who doesn’t want to expose what’s behind the scenes?
4. The point of a population shift from 50,000 in 1890 to 1,238,000 in 1930 is that per-capita population numbers will vary greatly in areas of high immigration/emigration. Someone may be born in Kansas in, say, 1877, move to LA in 1898, die in, say, 1987 age 110. How in that situation is the person’s life related to the population data? That needs to be reviewed, because many supercentenarians were born in small town/rural areas and later moved to large metro areas.
1. You can click on the link where my name is. All the authors of the blog, including me, are listed on the Authors link at the top of the page, so you can see my last name there.
2. It’s fine for you to have your own data which you don’t let people analyze without permission. That’s your right. But then it’s also our right not to believe what you say about these data. It’s your call: Share the data and move to open science, or keep your data a secret and then deal with the fact that lots of people won’t trust your claims.
It’s as with science in general: if you describe the result of a chemistry experiment but without giving enough detail to reproduce it, then people have to take your claims on trust.
3. I don’t think Newman needs to contact you. I do think it would be good for him to post his data online so that anyone who wants to analyze it can do so, and it would be good for him to address publicly (not just in an email to you or to me or to anyone in particular) any objections to his analysis such as related to your point 4 above.
1. Ok, Andrew Gelman, when I clicked on the hyperlink to your name “Andrew” it didn’t lead to a profile or anything. Thanks for clearing that up. I really wish to speak with you privately because I know who you are and I have some very important issues to discuss with you. So, please e-mail me at [email protected] for followup on that.
2. I have been open to working with people that are willing to work with the GRG. Here is an example of some who have worked with us:
Adam Lenart, Jose Manuel Aburto, Anders Stockmarr, James W. Vaupel, “The human longevity record may hold for decades,” /Link 11 Sep 2018.
Natalia S. Gavrilova and Leonid A. Gavrilov, “TESTING THE LIMIT TO HUMAN LIFESPAN HYPOTHESIS WITH DATA ON SUPERCENTENARIANS,” Innovation in Aging, Volume 2, Issue suppl_1, 1 November 2018, Page 888 /Link 16 November 2018.
Anthony Medford and James W. Vaupel, “Human lifespan records are not remarkable but their durations are,” PLoS ONE 14(3) /Link 14 March 2019.
Natalia S Gavrilova and Leonid A Gavrilov, “Are We Approaching a Biological Limit to Human Longevity?,” The Journals of Gerontology: Series A, glz164, /Link 04 July 2019.
On the other hand, we have previous experience with some simply lifting GRG data without permission and coming to erroneous conclusions. Here is an example:
Xiao Dong, Brandon Milholland and Jan Vijg, “Evidence for a limit to human lifespan,” Nature. Link Oct 5, 2016.
Strangely enough, the GRG agrees with the general assessment of this Nature article of an average-maximum lifespan close to 115 and a maximum-maximum human lifespan of 125. Where we disagree is the suggestion that the data is stagnating/not still rising, when in fact it is (and others have called them out on this…for example: https://www.nature.com/articles/nature22788).
But the above still fits within the context of disagreement among experts, where we can respect the disagreement.
On the other hand, Saul Newman’s ridiculous assertions, including the claim that all humans 110+ are frauds/errors and that most validated Italian semi-supercentenarians are frauds/errors, are so far out of bounds that one Italian researcher described them as “not worthy of a response”. In addition to those outlandish assertions, the entire context of the pre-print is off. He’s describing areas with a high population longevity as areas of “low life expectancy”. That’s not simply an error, that’s a misrepresentation. That’s just one of many objections which I and others have already made. With Japan, he compares unvalidated Japan birth data (which includes missing persons from WWII) with validated Japan supercentenarian data. Notably, the Sogen Kato case was an exception, and the fraud was caught. One exception doesn’t prove a rule. And then Newman’s own thought experiment is self-contradictory: even in his own thought experiment, he is suggesting that 99 of 100 supercentenarians are valid and one is a fraud. The whole point of large-scale data analysis is to get the Big Picture, not to rely on any one case. Most scientific fields, such as astronomy, wish they could have such data accuracy, yet they have situations such as the “Methuselah star” whereby the age variable is 800 million years of uncertainty. The validated human data is far more accurate than something like that.
1. You can google me if you want to know more about my research!
2. You write, “I have been open to working with people that are willing to work with the GRG.” If you only want your data to be used by people who are willing to work with you, that’s your call. But then you’ll just have to accept that lots of people are not going to trust your claims, because your data are not open.
The same would go for Newman or anyone else: if you make a controversial claim and don’t share data or methods, you can run into a trust problem. This is not about you, or Newman, or longevity research, or anything particular to this example. It’s a general issue of reproducibility. See here for an example.
1. I do think that we need to have a private discussion. I had been meaning to contact you already but I wanted to be clear that this was you first. I trust that you will be very interested to hear what I have to say.
2. Technically speaking, the GRG can also be open to working with “doubters”, and the reality is that Vaupel and Gavrilov are at opposite ends of the mortality curve spectrum when it comes to maximum lifespan modeling…so that shows that the GRG data is respected by opposite ends of the spectrum. In these cases, they are following the scientific method, not merely lifting material without permission. Here’s an example of physicists, for example, not sharing material yet because it is intended for refining, review, and future publication/release in 2020: https://www.quantamagazine.org/possible-detection-of-a-black-hole-so-big-it-should-not-exist-20190828/
Are we going to “call out” these scientists for respecting confidentiality? “LIGO/Virgo team members would neither confirm nor deny the rumored detection”? I mean, if it were hidden 20+ years, that would be one thing. But there’s an expectation that data needs to be reviewed properly before it goes public and, also, that those who gather the data should have some control over it before data submission for possible publication. That’s why we have something called “peer review”, which you are firmly aware of.
Additionally, we should not fall for the false equivalency fallacy. If two people are debating whether humans cause climate change and one group has actual evidence-based scientific research and the other group has politically-based assertions, we cannot compare the two. http://trulyfallacious.com/logic/logical-fallacies/presumption/false-equivalence The GRG has actual evidence, Saul Newman has a thought experiment and personal opinion. Not the same.
It is unfortunate that Dr Newman seems to have disregarded some very wise advice to be more aware of how his responses come across to the readers of this forum (in my view, he appears extremely arrogant and disrespectful.)
My date of birth is publicly available information. It’s on the internet. If by chance I follow my family’s trend of women living a long life (my grandmother just passed away, age 98, forty years after her husband’s death), I hereby declare that I DO NOT give Dr Newman or anyone else permission to use my data to make disparaging comments about my life or my place of birth.
I don’t understand your comment. Nobody needs anyone’s permission to use publicly available data.
I was trying to suggest that just because my data is available, perhaps there needs to be some consideration before its use by others. That is, some sensitivity that the data pertains to people, it’s not just numbers. I’m no expert here, but I don’t think my family would be very happy if I was called a liar or fraudster after I died.
Dear Dr. Newman,
I agree with all points mentioned by Robert Young and I would like to encourage you to read more about the methodology of modern age validation, outlined by Dr. Poulain of Belgium in the Max Planck Institute for Demographic Research monograph “Supercentenarians”. See his chapter.
I’ve been using the modern age validation criteria to verify the ages of supercentenarian claims from Poland. This methodology makes it possible to recognize, separate, and verify cases of true supercentenarians from the raw data. Altogether, the GRG has validated the authenticity of age of 1918 supercentenarians in 37 different countries. More soon.
I do not appreciate the title “superfrauds”. The word itself is offensive and the usage of it is incorrect in the light of our experience, as intentional age misreporting was only one of the reasons for age exaggeration.
Waclaw Jan Kroczek
Gerontology Research Group
Administrator for Case Validation Reports
Correspondent for Poland