Djokovic, data sleuthing, and the Case of the Incoherent Covid Test Records

Kaiser Fung tells the story. First the background:

Australia, having pursued a zero Covid policy for most of the pandemic, only allows vaccinated visitors to enter. Djokovic, who’s the world #1 male tennis player, is also a prominent anti-vaxxer. Much earlier in the pandemic, he infamously organized a tennis tournament, which had to be aborted when several players, including himself, caught Covid-19. He is still unvaccinated, and yet he was allowed into Australia to play the Open. . . . When the public learned that Djokovic received a special exemption, the Australian government decided to cancel his visa. . . . This then became messier and messier . . .

In the midst of it all, some enterprising data journalists uncovered tantalizing clues that demonstrate that Djokovic’s story used to obtain the exemption is full of holes. It’s a great example of the sleuthing work that data analysts undertake to understand the data.

Next come the details. I haven’t looked into any of this, so if you want more you can follow the links at Kaiser’s post:

A central plank of the tennis player’s story is that he tested positive for Covid-19 on December 16. This test result provided grounds for an exemption from vaccination . . . The timing of the test result was convenient, raising the question of whether it was faked. . . .

Digital breadcrumbs caught up with Djokovic. As everyone should know by now, with every email receipt, every online transaction, every use of a mobile app, you leave a long trail for investigators. It turns out that test results from Serbia include a QR code. A QR code is nothing but a fancy bar code. It’s not an encrypted message that can only be opened by authorized people. Since Djokovic’s lawyers submitted the test result in court documents, data journalists from the German newspaper Spiegel, partnering with the consultancy Zerforschung, scanned the QR code and landed on the Serbian government’s webpage that informs citizens of their test results.

The information displayed on screen was limited and not very informative. It just showed whether the test result was positive or negative, plus a confirmation code. What caught the journalists’ eyes was that during the investigation, they scanned the QR code multiple times and saw Djokovic’s test result flip-flop. At 1 pm on January 10, the test was shown as negative (!) but about an hour later, it appeared as positive. That’s the first red flag.

Kaiser then remarks:

Since statistical sleuthing inevitably involves guesswork, we typically want multiple red flags before we sound the alarm.

He’ll return to the uncertain nature of evidence.

But now let’s continue with the sleuthing:

The next item of interest is the confirmation code, which consists of two numbers separated by a dash. The investigators were able to show that the first number is a serial number: an index used by databases to keep track of the millions of test results. In many systems, this is just a running count. If it is a running count, data sleuths can learn some things from it. This is why even so-called metadata can reveal more than you think. . . .

Djokovic’s supposedly positive test result on December 16 has serial number 7371999. If someone else’s test has a smaller number, we can surmise that the person took the test prior to Dec 16, 1 pm. Similarly, if someone took a test after Dec 16, 1 pm, it should have a serial number larger than 7371999. There’s more. The gap between two serial numbers provides information about the duration between the two tests. Further, this type of index is hard to manipulate. If you want to fake a test in the past, there is no index number available for insertion if the count increments by one for each new test! (One can of course insert a fake test right now before the next real test result arrives.)

Wow—this is getting interesting! Kaiser continues:

The researchers compared the gaps in these serial numbers with the official tally of tests conducted within a time window, and felt satisfied that the first part of the confirmation code is an index that effectively counts the number of tests conducted in Serbia. Why is this important?

It turns out that Djokovic’s lawyers submitted another test result to prove that he had recovered. The negative test was supposedly conducted on December 22. What’s odd is that this test result has a smaller serial number than the initial positive test result, suggesting that the first (positive) test may have come after the second (negative) test. That’s red flag #2!
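The ordering logic behind red flags #1 and #2 can be sketched in a few lines of Python. This is a minimal illustration that assumes, as the sleuths did, that the serial is a running count that only ever increases. The names ResultRecord and ordering_conflicts are hypothetical. The serial 7371999 comes from the post, but the serial for the December 22 negative is a made-up placeholder: the post says only that it was smaller.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class ResultRecord:
    label: str
    claimed_date: date  # date asserted for the test, e.g. in court filings
    serial: int  # first number of the confirmation code


def ordering_conflicts(records):
    """Return (earlier, later) label pairs whose serial order contradicts date order.

    Assumes the serial is a running count, so a test claimed to be later
    should never carry a smaller serial than a test claimed to be earlier.
    """
    conflicts = []
    for a, b in combinations(records, 2):
        # Put the pair in claimed-date order before comparing serials.
        earlier, later = sorted((a, b), key=lambda r: r.claimed_date)
        if earlier.claimed_date < later.claimed_date and later.serial < earlier.serial:
            conflicts.append((earlier.label, later.label))
    return conflicts


records = [
    ResultRecord("positive", date(2021, 12, 16), 7_371_999),  # serial from the post
    ResultRecord("negative", date(2021, 12, 22), 7_300_000),  # placeholder, NOT the real serial
]
print(ordering_conflicts(records))  # [('positive', 'negative')]
```

If the running-count assumption fails (say, labs draw serials from separately assigned batches), a conflict like this one stops being evidence of anything. That is exactly the objection raised later in the comments.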

To get to this point, the detectives performed some delicious work. The landing page from the QR code does not actually include a time stamp, which would be a huge blocker to any of the investigation. But… digital breadcrumbs.

While human beings don’t need index numbers, machines almost always do. The URL of the landing page actually contains a disguised date. For the December 22 test result, the date was shown as 1640187792. Engineers will immediately recognize this as a “Unix date”. A simple decoder returns a human-readable date: December 22, 16:43:12 CET 2021. So this second test was indeed performed on the day the lawyers had presented to the court.
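Decoding a Unix timestamp takes nothing more than Python’s standard datetime module. The sketch below uses a fixed UTC+1 offset labeled CET. This assumes standard time, which Serbia observes in December and January, and it avoids depending on a time-zone database. The helper name decode_unix_timestamp is hypothetical. Both inputs are values quoted in this post: the December 22 URL value, and a reader’s download timestamp from the comments.

```python
from datetime import datetime, timedelta, timezone

# Serbia is on CET (UTC+1) in December/January; a fixed offset is enough here
# because daylight saving time is not in play for these dates.
CET = timezone(timedelta(hours=1), "CET")


def decode_unix_timestamp(seconds, tz=CET):
    """Convert seconds since 1970-01-01 00:00 UTC into a timezone-aware datetime."""
    return datetime.fromtimestamp(seconds, tz=tz)


december_22 = decode_unix_timestamp(1640187792)
print(december_22.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2021-12-22 16:43:12 CET

# A reader's own download timestamp, shown in UTC for comparison.
reader_download = decode_unix_timestamp(1641917096, tz=timezone.utc)
print(reader_download.isoformat())  # 2022-01-11T16:04:56+00:00
```

The decoding itself is unambiguous. What the number *means* (test time or download time) is the part that can be misread.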

Dates are also a type of index, which can only increment. Surprisingly, the Unix date on the earlier positive test translates to December 26, 13:21:20 CET 2021. If our interpretation of the date values is correct, then the positive test appeared 4 days after the negative test in the system. That’s red flag #3.

To build confidence that they interpreted dates correctly, the investigators examined the two possible intervals: December 16 and 22 (Djokovic’s lawyers), and December 22 and 26 (apparent online data). Remember the jump in serial numbers in each period should correspond to the number of tests performed during that period. It turned out that the Dec 22-26 time frame fits the data better than Dec 16-22!

But:

The stuff of this project is fun – if you’re into data analysis. The analysts offer quite strong evidence that there may be something smelly about the test results, and they have a working theory about how the tests were faked.

That said, statistics do not nail fraudsters. We can show plausibility or even high probability but we cannot use statistics alone to rule out any outliers. Typically, statistical evidence needs physical evidence.

And then:

Some of the reaction to the Spiegel article demonstrates what happens with suggestive data that nonetheless are not infallible.

Some aspects of the story were immediately confirmed by Serbians who have taken Covid-19 tests. The first part of the confirmation number appears to change with each test, and the more recent serial number is larger than the older ones. The second part of the confirmation number, we learned, is a kind of person ID, as it does not vary between successive test results.
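The crowd-sourced checks amount to two simple rules about one person’s confirmation codes, listed in time order: the second part stays the same, and the first part strictly increases. Here is a minimal sketch under those assumptions. The codes are toy strings, because the real format of the person-ID part isn’t described here, and the function names are hypothetical.

```python
def parse_confirmation_code(code):
    """Split 'SERIAL-PERSONID' into (int serial, person-id string)."""
    serial, person_id = code.split("-", 1)
    return int(serial), person_id


def consistent_history(codes_in_time_order):
    """True if all codes share one person ID and the serials strictly increase."""
    parsed = [parse_confirmation_code(c) for c in codes_in_time_order]
    one_person = len({pid for _, pid in parsed}) == 1
    increasing = all(s1 < s2 for (s1, _), (s2, _) in zip(parsed, parsed[1:]))
    return one_person and increasing


print(consistent_history(["1001-P42", "1057-P42"]))  # True
print(consistent_history(["1057-P42", "1001-P42"]))  # False: serial went down
```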

One part of the story did not hold up. The date found on the landing page URL does not seem to be the date of the test, but the date on which someone requests a PDF download of the result. This behavior can easily be verified by anyone who has test results in the system.

Kaiser explains:

Because of this one misinterpretation, the data journalists seemed to have lost a portion of readers, who now consider the entire data investigation debunked. Unfortunately, this reaction is typical. It’s even natural in some circles. It’s related to the use of “counterexamples” to invalidate hypotheses. Since someone found the one thing that isn’t consistent with the hypothesis, the entire argument is thought to have collapsed.

However, this type of reasoning should be avoided in statistics, which is not like pure mathematics. One counterexample does not spell doom to a statistical argument. A counterexample may well be an outlier. The preponderance of evidence may still point in the same direction. Remember there were multiple red flags. Misinterpreting the dates does not invalidate the other red flags. In fact, the new interpretation of the dates cannot explain the jumbled serial numbers, which do not change when a PDF is requested.
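One way to make the "multiple red flags" argument concrete is Bayes’ rule in odds form. Posterior odds equal prior odds times the product of the likelihood ratios of the individual pieces of evidence. This holds only if the flags are conditionally independent, which is a strong assumption. Every number below is hypothetical and chosen purely for illustration. Discarding a misread flag amounts to setting its likelihood ratio to 1. The remaining flags still move the probability, just by less.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Bayes' rule in odds form, assuming conditionally independent evidence."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1 + posterior_odds)


prior = 0.01  # hypothetical prior probability that a test record was faked
flags = {"result flip-flop": 5.0, "serial order": 4.0, "URL dates": 3.0}  # hypothetical

all_flags = posterior_probability(prior, flags.values())
without_dates = posterior_probability(
    prior, [lr for name, lr in flags.items() if name != "URL dates"]
)
print(f"{prior:.3f} -> {all_flags:.3f} (all flags), {without_dates:.3f} (dates discarded)")
# 0.010 -> 0.377 (all flags), 0.168 (dates discarded)
```

The real dispute is over the size of each likelihood ratio, and over whether the flags are independent. The arithmetic only shows that losing one flag lowers the posterior without resetting it to the prior.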

This point about weighing the evidence is important, because there are always people who will want to believe. Whether it’s political lies about the election (see background here) or endlessly debunked junk science such as the critical positivity ratio (see here), people just won’t let go. Once their story has been shot down, they’ll look for some other handhold to grab onto.

In any case, the Case of the Incoherent Covid Test Records is a fun example of data sleuthing with some general lessons about statistical evidence.

Kaiser’s discussion is great. It just needs some screenshots to make the storytelling really work.

P.S. In comments, Dieter Menne links to some screenshots, which I’ve added to the post above.

64 thoughts on “Djokovic, data sleuthing, and the Case of the Incoherent Covid Test Records”

  1. Regarding the desire to believe, I always try to remember that Feynman quote: “The first principle is that you must not fool yourself, and you are the easiest person to fool”. It’s so hard to know if I’m not falling for these traps myself, in some ways. Confirmation bias is a hell of a drug.

  2. https://news.ycombinator.com/item?id=29895398

    “Absolutely wrong. Timestamp is the time of pdf download. Every time you repeat the process timestamp will change. Difference in confirmation code is because he used 2 different state labs on 16th and 22nd. Very poor analysis, I have to conclude.”[1]

    “You need to understand that these papers are not fakes. Both tests, database records as well as pdf certificates are 100% legit. He and his family have enough influence in state institutions to organize 100% original PCR certificate but with false positive result.”[2]

    “I have now checked your theory with my PCR test from October 1st. I have downloaded it now from http://e-zdravlje.gov.rs, scan QR code and timestamp from URL is 1641917096 (Tue Jan 11 2022 16:04:56 GMT). So timestamp in URL is not when test is done, but when test is downloaded”[3]

    • The argument that the test result is fake indeed is based on [2]. The claim would go as follows –
      They (in Serbia) issued a 100% original PCR certificate with false positive result but did not update the QR code. They did not pay attention to the test ID number either because of carelessness or because they did not expect these documents to become public.
      Having said that, since Serbian test results are at risk of being blackballed, the health minister stepped in to make a statement that the tests are legit.

  3. Because of this one misinterpretation, the data journalists seemed to have lost a portion of readers, who now consider the entire data investigation debunked. Unfortunately, this reaction is typical.

    People reacted this way because the initial claim spent most of its time on this one timestamp issue. That means it was what the authors thought was the most significant finding. And it was refuted. It was something that warranted a bit more research, and instead they made VERY SIGNIFICANT ACCUSATIONS without even checking with any Serb.

    If it’s wrong they should issue a mea culpa, and reconsider their other supporting arguments. I have not seen that here. You can’t send out a shotgun-like smattering of “red flags” and then keep your prior conviction when one gets knocked down due to sloppiness or laziness.

    This is no different from the birther movement that found lots of minutiae to keep their “just asking questions” method going.

      • Yes, I read that. Is “yes, the first and largest part of our accusation is flat out wrong, but just ignore it then” really addressing it? It’s certainly not reconsidering their work as I suggested was warranted.

        As a Bayesian, don’t you have to consider the rest of their work at high risk of being faulty after that? If they didn’t confirm any of their assumptions (which was shown to have been easily done), and they’re making no effort to confirm them now that one was shown to be wrong, why should you trust them at all?

        • There is of course a chance that they are driven by their own presumptions. The Bayesian would reduce the probability of them being correct but it certainly does not shrink that probability to zero. The idea that a mea culpa is needed subscribes to the black-or-white mindset but we like to embrace uncertainty! If there is further information that comes out to explain the serial numbers, I’d become more skeptical.

          I think of how I would react if this were my own work. Much of statistics work is data sleuthing.

    • “You can’t send out a shotgun-like smattering of “red flags” and then keep your prior conviction when one gets knocked down due to sloppiness or laziness.”

      Si unum ex his occidi, omnes occidi. (If one of them goes down, they all go down.)

      The red flags were up before this data sleuthing started. Djokovic submitted a false visa request stating he had not traveled, then said a minion did it by mistake. He thumbed his nose at quarantine requirements. These were falsified government forms, and those false claims influenced the decision to give him a waiver. Zhou Fang put the sleuthing in the proper context, it yielded an interesting collection of small flags and one had to be taken down. Those little flags hardly showed against the big flags behind them anyway.

      I’m kind of astonished at how well this worked out for Australia. I think the ministers et al played their cards brilliantly. I also think they got a few strokes of luck. Djokovic says he won’t talk about it anymore, the Open won’t be disrupted by (much) anti-vaxx shenanigans, lockdown resentment has been mollified, the press is moving on. What’s not to like?

  4. Some aspects of the story were immediately confirmed by Serbians who have taken Covid-19 tests. The first part of the confirmation number appears to change with each test, and the more recent serial number is larger than the older ones. The second part of the confirmation number, we learned, is a kind of person ID, as it does not vary between successive test results.

    Why don’t they just ask whoever made the website rather than speculating? That is what I would expect a competent journalist to do.

    • Along those lines, I wonder why the people who do lots of web scraping won’t just ask the owners of those websites to give them the underlying tables rather than fishing the data that have been inserted into irrelevant code?
      (If the answer is the website owners won’t allow it, it raises a different question!)

      • “(If the answer is the website owners won’t allow it, it raises a different question!)”

        There are many, many good reasons not to allow direct access to the databases that back public websites, key among them security.

        • I think Anoneuoid was talking about contacting the db owner directly and requesting the data, which is different from allowing public access to the db itself.
          What I had in mind was a website like ESPN, which obviously does not want people to scrape their website because the developers make it difficult to scrape. If we ask them for the data directly, the answer will be no. If we then go ahead and scrape (working around their obstacles), how do we feel about that?

        • I think Anoneuoid was talking about contacting the db owner directly and requesting the data

          Not even the data, just ask them how the confirmation number is generated.

      • Huh that’s an interesting thought. My reaction is similar to dhogaza and Zhou Fang — like there are tons of reasons to not offer a public API.

        Like, your main purpose with the API would be to serve whatever internal applications you have, so it could change on a dime and leave any external users hanging, and you might not have time to update docs for external users, yada yada.

        And so I don’t think it would be practical to offer a stable interface. Maybe web scraping tools kinda have the built in assumption that what you’re trying to get at is kind of wacky — like the interface is some sort of weird blob of HTML and javascript and whatnot, and it tries to work with that? Or maybe the user interface which the scraper parses is actually more stable than the underlying interfaces?

        And so if you build your stuff around scraping then you’re kinda more ready for wackiness and change than if you try to assume an API?

        I don’t know. I’ve not scraped very much in my life.

    • > That is what I would expect a competent journalist to do.

      You must be continuously amazed at how few people satisfy your standards for competence and morality.

  5. Djokovic was deported because he won the Australian Open 9 times, and the authorities do not like his attitude.

    “Given Mr. Djokovic’s high-profile status and position as a role model in the sporting and broader community,” Hawke said in a statement, “his ongoing presence in Australia may foster similar disregard for the precautionary requirements following receipt of a positive Covid-19 test in Australia.”

    • I’ve been saying since early 2020 these PCR tests have never been properly validated.

      Now it seems that when CT is over 27, there is rarely any virus found (in the US they were using a cutoff of CT = 40 originally, no idea if that has changed or how that differs around the world). And below that there is little correlation with the amount of virus:

      Only specimens with CT-values below 27 for the E-gene RT-PCR diagnostic target (Cobas, Roche), as determined by the clinical laboratory, were included in our study, as we and others have previously shown that infectious virus cannot be reliably isolated from samples with higher CT-values (9, 30).

      […]

      The magnitude and timing of infectiousness of COVID-19 patients is a key requirement to make informed public health decisions on the duration of isolation of patients and on the need to quarantine contacts. Infectiousness is strongly influenced by VL in the URT of infected patients (4). However, VL is often measured as RNA copy number and not actual infectious virus. In this study we could show that RNA copy numbers in NPS samples poorly correlated with infectious virus shedding. This is in line with several other studies that found RNA is a poor infectiousness indicator especially in the presence of neutralising antibodies (9, 32). In addition, in an animal model it was demonstrated that infectious virus, but not RNA, is a good proxy for transmission (8).

      https://www.medrxiv.org/content/10.1101/2022.01.10.22269010v2

      That is why using these tests to decide who can go where does not work.

      • > That is why using these tests to decide who can go where does not work.

        Agreed. I’ve seen credible experts explain that at minimum a level of 30 should be the limit, if not lower. And RATs should be used for the purpose of identifying infectiousness.

        That said, in Australia they conducted a HUGE number of tests with a very low positive rate until recently.

        Which shows why the “false positive” rhetoric is reprehensible. (Not that Anoneuoid would ever engage in such rhetoric, of course).

        • Presumably several factors played into the decision, sure. But to say he was excluded “because he won nine times and the authorities don’t like his attitude” is to ignore the two -largest- factors! He is unvaccinated and he lied on his visa application!

        • Mendel:

          Or, to put it another way, if Djokovic wasn’t a role model, maybe they wouldn’t have even considered letting him into the country, given that he is unvaccinated and he lied on his visa application.

        • Right, no need to guess. He is not being punished for lying. Based on what the authorities say, he was deported out of fear that he might influence public opinion.

        • Roger –

          > Right, no need to guess. He is not being punished for lying. Based on what the authorities say, he was deported out of fear that he might influence public opinion.

          What explains this binary thinking? It doesn’t have to be one or the other.

          At any rate, I love the characterization of “fear that he might influence”... that’s one way to put it. There are others.

          How about “entirely appropriate and prudent concern among public health officials that his propagation of misinformation could lead to unnecessary illness and death?”

          Or perhaps even “prudent enforcement of public health policies despite whining and self-victimization by an entitled elitist because he couldn’t just trample the rule of law simply because he wanted an opportunity to make millions by a few hours of playing tennis?”

          Or then on the other side perhaps, since you think Malone is credible, “he was deported because billions of ‘cowards’ have been hypnotized into a state of ‘mass formation psychosis.’”

          Would that work?

        • Roger –

          I get that – but obviously, they’re concerned about unnecessary illness and death. Describing that as merely “fear” of his influence seems rhetorically focused. There’s obviously relevant context beyond just a binary framing.

        • According to the official statement he was deported for his presumed political beliefs. What those beliefs are is basically irrelevant; this should be plainly unacceptable in a democratic society.

        • D, Roger:

          Sure, but, as Phil says, all of this is conditional on Djokovic not getting vaccinated and then lying on his visa application. Given all that, the whole thing became one big negotiation and then lots of issues came in to play.

        • d –

          > According to the official statement he was deported for his presumed political beliefs. What those beliefs are is basically irrelevant; this should be plainly unacceptable in a democratic society.

          That’s another example of exactly what I was saying above.

          You want to ignore many things that are incredibly obvious about the context and narrow your focus to a totally literal interpretation of a sliver of what they said, because doing so reinforces YOUR political agenda. To do so, you have to conceive of him as some hapless victim – as if he had no agency or accountability. The presumed embedded entitlement that has to be canceled out in the calculus, for someone so uniquely privileged to be viewed as a poor little innocent, is rather remarkable.

    • This is the right take and “sleuthing” the Serbian PCR front-end misses the forest for the trees. If not his PCR test, some other pretext to deport him would have been found.

      If transmission into Australia were actually the driving concern, wouldn’t testing Djokovic and enforcing a quarantine presumably have sufficed? That nearly the exact opposite occurred (locking him in a crowded jail, shuttling him back and forth between endless hearings) makes clear this was political theater and not source control.

  6. Only the issue of the QR code changing within the hour stands from the original Spiegel article. The test result number is not necessarily proof of fraud.

    A Serbian:

    “I have 2 more datapoints. My wife went with me to the testing on the same date. If you’re negative on rapid test, they usually take PCR as well. Since she was negative on the first one, there was a PCR too.
    What’s interesting is that her test IDs are 7601574 and 7631146 while they were taken within 15 minutes from each other. There’s some 30k difference, and I think Serbia runs around 40k tests a day. PCR samples are sent to central lab and processed later, that would explain why PCR’s ID is much higher.”

    https://news.ycombinator.com/item?id=29894843

    • This datapoint doesn’t really dispel the suspicion re: test ID.
      As mentioned, that 30k difference is smaller than the daily testing numbers in Serbia nowadays and there is a perfectly reasonable explanation about the difference in rapid tests and PCR.

      Djokovic’s test on 12/16 with test ID 7,371,999 is off by 100k
      (when daily testing was ~12k/day)

      https://twitter.com/msuvakov/status/1481282301837422593/

      The green dots are crowd-sourced datapoints and the white dots are imputed.
      The red circle is the test in question.
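The arithmetic in this exchange converts a serial-number gap into days of nationwide testing. This is a sketch that assumes one serial per test and a roughly constant daily volume. The function name is hypothetical, and the inputs are the figures quoted in the two comments above.

```python
def implied_days(serial_gap, tests_per_day):
    """Days of nationwide testing that a serial-number gap corresponds to,
    assuming one serial per test and a roughly constant daily volume."""
    return serial_gap / tests_per_day


# Commenter's wife: two tests ~15 minutes apart, IDs 7601574 and 7631146, ~40k tests/day.
print(round(implied_days(7631146 - 7601574, 40_000), 2))  # 0.74
# Reply: Djokovic's ID is off by ~100k at a time when testing ran ~12k/day.
print(round(implied_days(100_000, 12_000), 1))  # 8.3
```

The first gap is under a day of testing, which is compatible with a PCR sample being processed later at a central lab. The second corresponds to more than a week of testing, which is why the reply argues that the datapoint doesn’t dispel the suspicion.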

      • We don’t know how these test IDs are assigned. As the poster says, different labs could be assigned different batches of test kits from which the numbers are derived.

        • It’s clear the statistical evidence is suggestive but not sufficient. But this theory you cited has even less data supporting it than the original theory. Still, I think you bring out another interesting aspect of my reaction to this work. The consultancy’s theory sounds plausible to data analysts because we have all worked with various indices and serial numbers, and their interpretation is very mainstream. No one except the people who set up the database can be sure whether they chose a less common way of setting up indices. Any statistical argument can be “overturned” by the outlier exception; that’s why I said in my post that I don’t like outlier arguments, especially when there is no evidence for them.

      • Thumbs up on this crowdsourcing exercise!

        There is actually a whole cottage industry of people analyzing serial numbers. For the data sleuths here, this article covers how people try to estimate iPhone sales using crowdsourced serial numbers. More fun stuff!

  7. Kaiser says he doesn’t like outlier arguments, but doesn’t that approach preclude an outlier like a tennis player who wins too many matches? Do you suspect he cheats at tennis too because he’s so different from the average?

    • Rsm:

      People have used statistical evidence to motivate a more careful search for physical evidence. The most famous case of this is Barry Bonds’s unprecedented late-career trajectory as a slugger. I guess it is true that when a player does very well, sometimes people will raise suspicions. People said things about Serena Williams back when she dominated women’s tennis. But I’ve not seen any claims that Djokovic cheats at tennis. The issue of lies on his visa application is different because there’s evidence, as discussed in Kaiser’s post. This is all consistent with Kaiser’s general point of the necessity of looking at all the available evidence. You might be interested in his posts from a few years ago about the Lance Armstrong case; see here.

      • Why do I still feel like you’re finding convenient, arguably flimsy evidence to support arbitrary state authority, because mean (in both senses of the word) reversion?

        What if I’m an outlier (if Kaiser’s general objection to outlier arguments allows individuals as data points) that defies your model, do you simply throw out my point of view when summarizing your recommendations to policymakers? Should outliers simply succumb to sacrifice for the good of the model?

        • Rsm:

          Travel restrictions are definitely “state authority,” no question about that. Are they “arbitrary state authority”? That’s a matter of opinion.

          When my grandmother was two years old, she and her family immigrated to New York. When they got to Ellis Island, the authorities turned my grandmother away because she was sick. They sent her back to Europe and it was only a few years later that she returned to the United States. This was state authority, and in the telling of the story it always sounded horrible to us. Was this “arbitrary” state authority? I don’t know. It depends how you define “arbitrary,” I guess.

          In any case, this was originally about Kaiser’s post, which wasn’t about state authority at all! It was about evidence from data.

    • Let me make the argument more precise.
      What I don’t like is the following typical argument: (I’m speaking generally now, not about Novak)
      A: Here is my theory of what happened. I have three pieces of evidence, X, Y and Z, that all point in the same direction. Therefore, I believe he did it (beyond reasonable doubt).
      B: Wait. You interpret Y to support he did it. But here is an alternative interpretation of Y which suggests he didn’t. Therefore, he didn’t do it.

      If one turns this around and asks what B’s theory of what happened is, it is this: an interpretation of Y suggests he didn’t do it; therefore, he didn’t do it.
      In the meantime, we know that X and Z contradict the he-didn’t-do-it theory. Further, the alternative interpretation of Y has no direct supporting evidence; it’s just something that is possible.
      Then, typically, B comes up with an alternative interpretation of X, suggesting that he didn’t do it. Again, this is just a possibility. B – at this point of the conversation – typically has not gathered any evidence to support these alternatives. Sometimes, these interpretations are “outliers” by which I mean, in the probability distribution of possible interpretations, those are in the tails, not close to the median.
      My point is that the above types of debates are exhausting and unproductive. What’s more productive is if B goes away and comes up with a different set of evidence (K, L, and M) that suggests he didn’t do it. Then, we can adjudicate whether X, Y, Z is more plausible than K, L, M.

      • I think you’re shoe-horning this situation into something that it’s not.

        My interpretation:

        A offered X (if M is true), Y (if N is true), and Z (if O is true) as evidence. B proved that M was not true, and that X therefore was false. It is clear that A did not make any effort to verify M, N, or O. At this point B has also shown that N is not strictly true.

        And we’ve completely forgotten about W (the test result changing) which was ignored in this whole thread! (That would require evidence tampering at multiple levels which would blackball all Serbia’s test results).

        I think it’s the responsibility of A to temper their expectations more than B here.

        Note that NONE of the B’s chiming in on this thread are saying it’s a zero chance. Only that the prior of falsifying a test date (say 0.5%) hasn’t changed. Evidence which hasn’t been mentioned includes (a) the Serbian Minister of Health confirming the tests, and (b) Djokovic admitting, against interest, to other malfeasance while being ill. But those are MINOR updates compared to the huge leaps of assumptions needed for X, Y, and Z.

        • I would say that if I were on Djokovic’s PR team, the least damaging admission would be to say that he didn’t get the Dec 16th results in time for the maskless meeting with kids on the 17th, but to admit that he met with L’Équipe and didn’t inform them, since that would be 2 days after a critical test report was issued. It is very strange that he would not be eager to see a critical test report at the earliest opportunity. Additional information to consider: (a) after the Adria tour, he informed everyone he had contracted COVID and was isolating; (b) there was no contemporaneous announcement this time. Why the secrecy about a positive test, given that several double-vaxxed players had just announced they were positive after visiting Dubai?

          We also have the issue of the AO biosecurity manager resigning on December 28th for undisclosed reasons.

        • Further the PR team suggested he contracted it at a December 14th basketball game. The Dec 16th sample was collected ~1pm. It seems to be an awfully short time to test PCR positive. We need some epidemiologist input for this.

        • Typically in these discussions, A did all the homework, B is presented with the findings and reacting to them. So, B did not “prove that M was not true”. B offered an alternative which is that not X (if Q is true). That doesn’t even capture it because typically M is true has some data supporting it (in this case, the somewhat flimsy analysis on the gaps between serial numbers) while Q is true has nothing supporting it but a structural argument (e.g. that the serial number could have been constructed in a different way). The problem is that not X (if _ is true) is too easy to construct if (a) _ can be anything that leads to not X, and (b) _ does not need to be proven. You can invalidate most statistical arguments using this tactic.

        • So, B did not “prove that M was not true”. B offered an alternative which is that not X (if Q is true).

          Ok, fine, but then you’re talking about the typical case, not about what happened here specifically. Twitter and HN absolutely did more work than Spiegel investigating whether the timestamps reflected the test date or the time of access. I’m explaining why this specific case isn’t the typical case you seem to be referring to.

        • While it’s true that the timestamp reflects when the document was accessed, Serbians also report that as soon as a test is completed, an email with the PDF attached is generated automatically. So the evidence would be consistent with the proposition that the Dec 26 timestamp comes from the auto-generated email for a test recorded on that date.

  8. Date of result: 16.12.2021, 20:19:56 / Timestamp: 26.12.2021, 13:21:20. The test was completed in the evening, but the PDF was generated in the early afternoon. Doesn’t that mean it’s not just a matter of changing “26” to “16”, or did I get it wrong?

    • Every time a download is performed, a timestamp for that moment is attached. So one could argue that the test was completed on 16.12.2021 in the evening, but the result submitted in court was downloaded on 26.12.2021. The timestamp in isolation is therefore not a flag. The red flag is the confirmation number 7371999 on the 26th December download, which is consistent with a test performed on the 26th.

      • I had the impression that a popular theory was: the test was done on December 26th and then “changed” to December 16th, just “26” to “16”. But I would say it can’t have been only this tiny change, if it’s true that you can’t generate the PDF for a test several hours before the test is finished.
        Of course, since it has been shown that the timestamp changes every time the PDF is generated again and again, this specific timestamp does not prove that the test was done on December 26th. But you can say that it indicates the test had been put into the database before December 26th, 13:21h.
        If the confirmation codes are in fact sequential, you really can (indeed must) argue that the test was put into the database very close to December 26th, but not later than the 26th, 13:21h (according to the timestamp).
        Please correct me if I made wrong assumptions so far.
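To make the timestamp argument concrete, here is a minimal Python sketch that checks whether each record's times are in a plausible order (sampling, then result, then PDF) and measures the gap between result and PDF generation. It uses only the two records quoted in this thread (tests 7320919 and 7371999, minutes only). It assumes, as commenters report, that the PDF timestamp is refreshed on every download, so a long gap is consistent with a later re-download and proves nothing by itself; the names `RECORDS` and `check_record` are illustrative.

```python
from datetime import datetime

FMT = "%d.%m.%Y %H:%M"

# Records as quoted in this thread (minutes only; seconds dropped).
RECORDS = {
    7320919: {"sampling": "22.12.2021 14:12", "result": "22.12.2021 16:15",
              "pdf": "22.12.2021 16:43"},
    7371999: {"sampling": "16.12.2021 13:05", "result": "16.12.2021 20:19",
              "pdf": "26.12.2021 13:21"},
}

def check_record(record: dict) -> tuple[bool, float]:
    """Return (times_ordered, hours between result and PDF generation).

    The PDF timestamp is reportedly refreshed on every download, so it is
    only an upper bound on when the record existed in the database.
    """
    t = {k: datetime.strptime(v, FMT) for k, v in record.items()}
    ordered = t["sampling"] <= t["result"] <= t["pdf"]
    lag_hours = (t["pdf"] - t["result"]).total_seconds() / 3600
    return ordered, lag_hours

for code, rec in RECORDS.items():
    ordered, lag = check_record(rec)
    print(f"{code}: ordered={ordered}, result-to-PDF lag {lag:.1f} h")
```

Both records pass the ordering check; they differ only in the lag (under half an hour versus more than nine days), which is why the timestamp alone cannot settle the question without the confirmation-code argument.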

        • I’m not aware whether the popular theory was a simple date change, i.e. 26th changed to 16th. I think the possibilities are:
          1. A valid test conducted on Dec 16th that, for some unproven and unknown reason, has a confirmation code that is out of sequence. Under this possibility, a printout on Dec 26th would be entirely consistent with the scenario favorable to Djokovic.
          2. A fraudulently inserted test on Dec 26th, backdated to Dec 16th. It includes the word “positive” and the sampling date “Dec 16th”, but the QR code links to a database entry showing a negative test result. This could have been done either by somebody within the health system, or by post-processing the document to alter the date and change “negative” to “positive”.

          Based on the crowd-sourced Serbian test data by @msuvakov above, case 2 looks more likely, as the confirmation code fits a Dec 26th date of insertion into the database. It is also confirmed by a parallel method used by the consultancy working with Spiegel, which takes into account the number of tests per day in Serbia between Dec 22 and Dec 26.

          So yes, going by the confirmation code and timestamp, you are correct that the test was very likely put into the database very close to Dec 26th, 13:21h.

          Unproven argument against possibility 2: that the tests came from different labs, which are assigned different confirmation numbers for their tests. A handful of data points from @msuvakov contradict this, i.e. results from different labs still have higher numbers for later tests.
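The tests-per-day reasoning can be sketched in a few lines of Python. This is a rough illustration, not the Spiegel consultancy's actual method. It takes the daily totals quoted later in this thread and asks which daily figure a given confirmation code would fall into. It rests on two unproven assumptions, labeled in the code: codes are issued strictly sequentially, one per test nationwide (exactly what the different-labs theory disputes), and code 7320919 was the first test counted in the 23.12 figure. `DAILY_TOTALS`, `FIRST_CODE`, and `reporting_window` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Optional

# Daily test totals quoted later in this thread, labeled by the 15:00
# website update at which each figure was published.
DAILY_TOTALS = [
    ("23.12 update", 14291),
    ("24.12 update", 14656),
    ("25.12 update", 12869),
    ("26.12 update", 9265),
]

# Assumptions (unproven): codes are issued strictly sequentially, one per
# test nationwide, and 7320919 was the first test in the 23.12 figure.
FIRST_CODE = 7320919
CUMULATIVE = list(accumulate(n for _, n in DAILY_TOTALS))  # running totals

def reporting_window(code: int) -> Optional[str]:
    """Map a code to the daily figure it would be counted in, or None."""
    offset = code - FIRST_CODE  # 0-based position within the block
    if offset < 0 or offset >= CUMULATIVE[-1]:
        return None
    # Count the daily blocks that end before this offset; that count is
    # the index of the block containing it.
    return DAILY_TOTALS[bisect_right(CUMULATIVE, offset)][0]

print(reporting_window(7371999))
```

Under these assumptions, 7371999 is the very last code of the 51,081-test block and lands in the 26.12 figure. If the codes were not sequential nationwide, the mapping would mean nothing.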

        • For the reasons you state (test time on the 16th and timestamp on the 26th), it is also more likely that this was an inside job within the health system rather than post-processing.

  9. So what is the upshot? Either someone tampered with the results or they didn’t. Serbia is known as a country where officials are on the take, and a bribe might go a long way there. See https://en.wikipedia.org/wiki/Corruption_in_Serbia#:~:text=According%20to%20Global%20Corruption%20Barometer,social%20welfare)%2C%20had%20paid%20bribe

    and
    https://www.transparency.org/en/cpi/2020/index/srb
    The country had a lot to gain if the Djoker played.

  10. In reply to RB’s comment of 22.01.2022, 12:53 PM:

    Like you said, the different-labs-theory is unproven until now – and it doesn’t really make sense. The sequential-numbering-theory on the other hand is supported (Suvakov, Vreme) and does make sense.

    About that Sunday, the 26th: it is possible that there were no regular appointments for PCR testing at that institute on that day. The website currently lists Monday-Saturday, 7:15h-14:30h. But I wouldn’t say that this rules out December 26th, because such a prominent person might well be tested there on a Sunday.

    About 51.080 vs. 51.081: the difference between the two confirmation codes is 51.080, which is easy to check: 7371999-7320919. The number of all tests done in the country between December 22nd and 26th is 51.081, as given by ourworldindata. If these two tests were the first and the last of that 51.081-test portion, that would explain the 51.080 difference.
    I don’t know when, how, or where these numbers are recorded. I would just assume the daily test numbers are reported at a time when there is no new input, i.e. after hours. The 22nd/26th difference equals the sum of the daily figures given by ourworldindata: 22nd/23rd + 23rd/24th + 24th/25th + 25th/26th. Could it be that the 22nd test was put into the system as the first test after the “21st/22nd” number was recorded, and the 16th test exactly as the last test before the “25th/26th” number was recorded?
    I guess it would have to fit the timestamps, since the PDF can only be generated once the data is already in the database. Assuming the PDFs were generated immediately after the tests were put into the database, would it then be possible that the “21st/22nd” number had already been recorded when the 22nd test was put into the database on December 22nd, 16:43h, as the first test for “22nd/23rd”? And, correspondingly, that no other tests were processed and recorded after Sunday the 26th, 13:21h, for the “25th/26th” number?
    If this were true, it wouldn’t prove or disprove that the tests were fake, only that they were put into the database irregularly, i.e. outside regular hours. I can think of more or less convincing reasons why it might have been done this way, even without any fraud involved, and these reasons could even include a harmless explanation for the negative/positive switch. But I think the “how” must make sense before the “why”.
    That’s why I would like to hear whether I made mistakes regarding the “how”. Thank you.

      • Thank you anyway for your reply!

        A small addition: the numbers shown by ourworldindata are the numbers shown on each of these days in December on the official government website, each day with the note “updated xx.12.2021. at 15:00h”.
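The question in comment 10 (could the 22nd test be the first entry after one 15:00 update and the 16th test the last entry before another?) can be checked against the quoted PDF timestamps with a minimal Python sketch. It assumes, as the commenter does, that each daily figure covers the 24 hours up to its 15:00 update. It also assumes that a record must exist before its PDF can be generated, so each PDF timestamp is only an upper bound on database entry. With only upper bounds, the check can show the hypothesis is not contradicted; it cannot confirm it. `first_last_check` is a hypothetical helper.

```python
from datetime import datetime, timedelta

FMT = "%d.%m.%Y %H:%M"

def ts(s: str) -> datetime:
    """Parse a day.month.year hour:minute string as used in the thread."""
    return datetime.strptime(s, FMT)

# Assumption: each daily figure covers the 24 hours up to its 15:00 update,
# so the 51081-test block runs from the 22.12 update to the 26.12 update.
WINDOW_START = ts("22.12.2021 15:00")
WINDOW_END = ts("26.12.2021 15:00")

# PDF timestamps quoted in the thread. A record must already exist when its
# PDF is generated, so each timestamp is an upper bound on database entry.
FIRST_PDF = ts("22.12.2021 16:43")  # test 7320919, hypothesised first in block
LAST_PDF = ts("26.12.2021 13:21")   # test 7371999, hypothesised last in block

def first_last_check(first_pdf: datetime, last_pdf: datetime,
                     start: datetime, end: datetime
                     ) -> tuple[bool, bool, timedelta, timedelta]:
    """Return (first_possible, last_inside, first_slack, last_slack).

    first_possible: the first test could have been entered after `start`
        (ruled out only if its PDF already existed before `start`).
    last_inside: entry <= last_pdf < end, so the last test was certainly
        entered before `end`.
    The slacks are the time gaps each condition leaves open.
    """
    first_possible = first_pdf >= start
    last_inside = last_pdf < end
    return first_possible, last_inside, first_pdf - start, end - last_pdf

print(first_last_check(FIRST_PDF, LAST_PDF, WINDOW_START, WINDOW_END))
```

Both conditions hold, with 1h43m of room after the 22.12 update and 1h39m before the 26.12 update. On this assumption, the timestamps leave room for the first/last hypothesis but do not establish it.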

  11. I don’t know which period of time the daily number of tests covers, so I can only assume: it might be the actual last 24 hours (15h-15h) or some other 24 hours, such as the previous day (0h-24h). That means the portion of 51.081 tests could be the tests put into the database between 22.12., 15h and 26.12., 15h, or between 22.12., 0:00h and 25.12., 24h, or in some other 96-hour period before 26.12., 15h, because the only certainty is that the website update (15h) is the latest possible time and date for that portion of 51.081 tests.

    The numbers for the daily tests are:

    website update 23.12., 15h: 14.291
    website update 24.12., 15h: 14.656
    website update 25.12., 15h: 12.869
    website update 26.12., 15h: 9.265

    14.291+14.656+12.869+9.265 = 51.081

    The other numbers/dates are:

    7371999-7320919 = 51.080

    Test 7320919: Sampling 22.12., 14:12h / Result 22.12., 16:15h / Timestamp 22.12., 16:43h

    Test 7371999: Sampling 16.12., 13:05h / Result 16.12., 20:19h / Timestamp 26.12., 13:21h

    Even if I assume that it was possible to put a real test from December 16th into the database on December 26th for some innocent reason, I still have no idea how to fit all the dates into one timeline, not even by moving the tests within the portion of tests (1.+50 and 51.081+50, for example). Not with fraud, and not without it.

    I hope I didn’t mix anything up, please check if something seems wrong and tell me. Thank you.
