The Impossibility Of Florida Data Manipulation

Claims of data manipulation in Florida can only survive if one is ignorant of the data process.

[EDITOR’S NOTE: This piece was originally written based on conversations with high-level contacts within the Florida Department of Health. Upon publication, I was contacted by a very kind and generous employee of the Florida DOH about things that I had gotten wrong about the COVID data flow. These errors were almost certainly the result of my own poor note-taking or misunderstandings. I have reworked this piece sentence by sentence with someone within the FL DoH to make sure it is the most accurate recitation possible of the FL DoH data flow. This is hyper-nerd stuff, so I thank you for sticking with us on this journey.]

If you haven’t read Charles CW Cooke’s excellent piece on how Rebekah Jones maneuvered her way from disgraced former employee of the Florida Department of Health to CNN celebrity, it is worth a read. Cooke details the turmoil that Jones caused within the DoH until she was fired and how she has cast her own mistakes as a grand conspiracy from Ron DeSantis and the “alt-right”.

After Jones was fired from her position at the Florida DoH, she launched her bright new career as “A person who will not shut up about how the data is fake but refuses to give you any concrete examples about how the data is fake.” The strategy she has settled on is to simply say that things are fake and then ask for money so she can show you the “real data”.

There’s not any gentle way to say this: This is a scam. If you know how COVID data works in Florida, it’s easy to pick this scam apart. But most people don’t know how it works, because things are complicated and “Florida evil” draws a lot more clicks and a lot more money on a lot less effort than “here, let’s walk through how things work.”

In light of that, I would like to make what I hope is my final contribution to the discourse on the reliability of Florida data. We will see that Jones’ accusations aren’t just improbable; they depend on people simply not having any idea how a data pipeline works and not being able to compare one state to another.

The Journey of COVID Data

Our journey begins, much like my 5th birthday, with a trip to a medical professional in which an uncomfortable stick is removed from your nose. The contents of the stick are tested for the COVID virus.

Once the results are obtained, the majority of care facilities enter their information through ELR (Electronic Laboratory Reporting). ELR is a reporting implementation that is supported and monitored by the CDC. Facilities without ELR may enter their information into a secure web portal hosted by the Florida Department of Health. If they don’t have the capability to send the information in a secure electronic form, they may send it to the county health department directly via fax.

From ELR, the web portal, or the county, the data is collected into the Bureau of Epidemiology’s web-based reporting system, Merlin.

Merlin hosts this data and manages a task list for each county to review and manage their county data. From there, the county may request more details (for example, a request for full medical records) from a facility. 

Every morning the Florida Dept of Health prepares a county-by-county PDF report based on the data from Merlin. These reports are sent to the county health departments, where the county epidemiologist reviews each one for accuracy and must sign off on it for the weekly data summary.

You can see the current COVID-19 summary report here. In addition to the data on COVID testing and positives, we have data on Flu-like illness, COVID-like illness, cough-associated admissions, and how many ER visits include cough, fever, or shortness of breath.

This ER data is from Florida’s Electronic Surveillance System for the Early Notification of Community Based Epidemics, which is shortened to ESSENCE and is an excellent example of why governments should not be allowed to name things and while we’re at it let’s ban them from logo design as well.

This data is updated daily or bi-hourly directly from emergency departments and urgent care centers around the state. As with Merlin, there is no intermediary that can intercept this data and change it. To submit false data, you would need to be in contact with each individual medical facility, gaining the confidence of every doctor, encouraging them to look at their patients and say “I know it looks like they are coughing and have a fever, but I need to do my part to lie about my patients so Ron DeSantis looks good, so I’ll just say they got bitten by an alligator instead”. It should go without saying, but county health departments can view the data in ESSENCE so that they can appropriately verify the COVID report that the state health department sends them.

The data is cleaned constantly. First, there is an attempt to verify state and county resident status, which is done by the counties. Let’s say that a patient crossed from one county to another to take a COVID test. The cleaning process will make sure that the result is attributed to the county the patient lives in, not the county they took the test in. 
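
To make the residency rule concrete, here is a minimal Python sketch. The `TestResult` record, its field names, and the county values are hypothetical illustrations, not the DoH’s actual Merlin schema; the only point is that positives are tallied under the county the patient lives in rather than the county where the swab was taken.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TestResult:
    patient_id: str
    residence_county: str  # county the patient lives in (hypothetical field)
    testing_county: str    # county where the test was taken (hypothetical field)
    positive: bool


def positives_by_residence(results):
    """Count positive results under each patient's county of residence."""
    return Counter(r.residence_county for r in results if r.positive)


results = [
    TestResult("A", "Broward", "Miami-Dade", True),     # tested across the county line
    TestResult("B", "Miami-Dade", "Miami-Dade", True),
    TestResult("C", "Broward", "Broward", False),
]
print(dict(positives_by_residence(results)))  # → {'Broward': 1, 'Miami-Dade': 1}
```

Patient A’s positive lands in Broward even though the test happened in Miami-Dade, which is the attribution the counties are checking for.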

Another cleaning process, one performed at the state level, is deduplication. Some deduplication is needed because of malformed data. The system looks for fuzzy matches: pairs of records that are similar enough that their differences are probably due to an entry error. For instance, if two records share the same name, home county, and test date, but the patient’s date of birth was entered incorrectly on one of them, the system will treat the second record as a new patient. The deduplication process performed by the data team looks for these kinds of errors and merges the records.
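
A fuzzy-match check of this kind might look something like the sketch below. It is an illustration under stated assumptions, not the DoH’s actual matching logic: it requires an exact match on name, county, and test date, and uses the standard-library `difflib.SequenceMatcher` with an arbitrary 0.8 similarity threshold to decide whether two dates of birth look like typos of each other. The `CaseRecord` type is hypothetical. Flagged pairs are candidates for a person to review, not automatic merges.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class CaseRecord:
    record_id: int
    name: str
    county: str
    test_date: date
    dob: str  # date of birth exactly as entered, e.g. "1950-03-14"


def likely_duplicates(a, b, threshold=0.8):
    """True when two records agree on name, county and test date, and their
    dates of birth are similar enough to look like an entry error."""
    if (a.name.lower(), a.county, a.test_date) != (b.name.lower(), b.county, b.test_date):
        return False
    return SequenceMatcher(None, a.dob, b.dob).ratio() >= threshold


def find_duplicate_pairs(records):
    """Compare every pair of records and return the ids of suspected duplicates."""
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if likely_duplicates(a, b):
                pairs.append((a.record_id, b.record_id))
    return pairs


r1 = CaseRecord(1, "Jane Doe", "Leon", date(2020, 7, 1), "1950-03-14")
r2 = CaseRecord(2, "jane doe", "Leon", date(2020, 7, 1), "1950-03-41")  # transposed digits
r3 = CaseRecord(3, "Jane Doe", "Leon", date(2020, 7, 1), "1988-11-02")  # a different person
print(find_duplicate_pairs([r1, r2, r3]))  # → [(1, 2)]
```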

Another reason records are deduplicated is due to multiple tests. Let’s say someone takes multiple tests in a week, the first one negative and the second one positive. The Department of Health will “de-dupe” the records so that only the positive record shows. We can illustrate this with an absurd example. Say 10 people take 10 COVID tests each in a week for a total of 100 COVID tests. 2 of those tests, from two different people, come back positive, which means that 2 of those people have COVID. If we looked at just the test results, we would say that there is a 2% test positivity rate. In reality, there is a 20% individual positivity rate, which is the far more important number.
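
The arithmetic of that absurd example can be checked in a few lines. This sketch (the `positivity` function and its input format are hypothetical) computes the rate over tests, then collapses the tests to one outcome per person, counting a person as positive if any of their tests was positive, which mirrors the “only the positive record shows” rule.

```python
from collections import defaultdict
from typing import List, Tuple


def positivity(tests: List[Tuple[int, bool]]) -> Tuple[float, float]:
    """Return (per-test positivity, per-person positivity) for one window."""
    test_rate = sum(positive for _, positive in tests) / len(tests)
    # Collapse to one outcome per person: positive if any of their tests was.
    person_positive = defaultdict(bool)
    for person, positive in tests:
        person_positive[person] = person_positive[person] or positive
    person_rate = sum(person_positive.values()) / len(person_positive)
    return test_rate, person_rate


# 10 people take 10 tests each; two tests, from two different people, are positive.
tests = [(person, False) for person in range(10) for _ in range(10)]
tests[0] = (0, True)   # one of person 0's tests
tests[10] = (1, True)  # one of person 1's tests
print(positivity(tests))  # → (0.02, 0.2)
```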

The Florida Department of Health reports the “test positivity” on a day-by-day basis in their weekly reports, but in order to accurately determine the “resident positivity”, they have to detect and exclude those duplicate test results, and the deduplication process is how they do that.

It is important to note that, when this cleaning happens, it is not done by removing case records from the database. For audit reasons, even duplicated records are not deleted. They are merely marked in such a way that they are not double-counted when the state and counties report their numbers. Even this is done in the context of a huge electronic paper trail, with communication back and forth between the county and state health departments and direct requests to the Merlin administrators to make these changes.
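
A minimal sketch of this “mark, don’t delete” pattern, assuming a hypothetical `CaseRow` record with a `duplicate_of` field and a plain list standing in for the audit trail (neither is the real Merlin schema): the duplicate row stays in the table, every change is logged with who requested it, and only the reporting count skips flagged rows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseRow:
    case_id: int
    county: str
    duplicate_of: Optional[int] = None  # set instead of deleting the row


def mark_duplicate(rows: List[CaseRow], dup_id: int, keep_id: int,
                   requested_by: str, audit_log: List[str]) -> None:
    """Flag dup_id as a duplicate of keep_id and record who asked for it."""
    for row in rows:
        if row.case_id == dup_id:
            row.duplicate_of = keep_id
            audit_log.append(
                f"case {dup_id} marked as duplicate of {keep_id} (requested by {requested_by})")
            return
    raise KeyError(f"no case with id {dup_id}")


def reportable_count(rows: List[CaseRow], county: str) -> int:
    """Count a county's cases, skipping rows flagged as duplicates."""
    return sum(1 for r in rows if r.county == county and r.duplicate_of is None)


rows = [CaseRow(1, "Leon"), CaseRow(2, "Leon"), CaseRow(3, "Leon")]
log: List[str] = []
mark_duplicate(rows, dup_id=2, keep_id=1,
               requested_by="Leon County epidemiologist", audit_log=log)
print(len(rows), reportable_count(rows, "Leon"))  # → 3 2
```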

Any change made to the county data would be reviewed by the county epidemiologist and would need their approval and sign-off. This vastly expands the required participants in this imagined conspiracy and would necessarily include conspiratorial cooperation from health departments in deep blue counties such as Miami-Dade County.

Upon sign-off at both the state and county levels, there is an API (application programming interface) access layer between Merlin and the Florida Department of Health’s ArcGIS portal. It is from this data portal that the official Florida COVID dashboard pulls its data for display.

If you are deeply familiar with the saga of Rebekah Jones, you will recall that it was the publication of a specific column to this portal that was the point of contention between her and the state’s epidemiological team. Jones had the administrative ability to show or hide columns of data in the portal, but absolutely zero access to the underlying data; the portal holds nothing more than the data that has been specifically selected for the Florida dashboard.

In fact, Jones’ own dashboard, which she claims to be the cure to Florida’s bad data, pulls from exactly the same data layer as the Florida DoH Dashboard. She simply pulls it into another ArcGIS portal so she can run her own dashboard without those pesky state epidemiologists telling her what to do. As a result, she shows more kinds of data, but also does things with the data that no epidemiologist would sign off on.

In summary:

  • COVID test results are submitted by healthcare facilities through ELR, the DoH web portal, or by fax to the county health department, and all of it is collected in Merlin, where county health departments review it daily. Manipulation at this level would require corrupt testers working together with corrupt county health officials.

  • Additional admissions data is submitted directly to ESSENCE. Manipulation at this level would require doctors and nurses to submit obviously false admissions reports by the thousands.

  • The Florida state reports are drawn directly from those databases and reviewed by county epidemiologists before publication. Manipulation at this level would require corrupt state health officials colluding with corrupt county health officials and hoping that hospitals and testing facilities didn’t notice.

  • The Florida dashboard (and Rebekah Jones’ dashboard) draw directly from these state health surveillance databases. Manipulation at this level would be instantly provable simply by comparing the dashboard to the weekly reports.
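
The check described in the last bullet is mechanical. Assuming you have transcribed county totals from a weekly report and from the dashboard into two dictionaries (the county figures below are made up for illustration), the comparison is only a few lines of Python:

```python
def find_discrepancies(report, dashboard):
    """Return {county: (report_value, dashboard_value)} for every county where
    the two sources disagree, including counties missing from one source."""
    counties = sorted(set(report) | set(dashboard))
    return {
        county: (report.get(county), dashboard.get(county))
        for county in counties
        if report.get(county) != dashboard.get(county)
    }


# Hypothetical county case totals, transcribed by hand from each source.
weekly_report = {"Alachua": 120, "Duval": 450, "Leon": 210}
dashboard = {"Alachua": 120, "Duval": 450, "Leon": 198}
print(find_discrepancies(weekly_report, dashboard))  # → {'Leon': (210, 198)}
```

An empty result means the two sources agree county by county; any manipulation between them would show up as a non-empty one.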

This should dispel the claims of data manipulation having to do with cases, tests, hospitalizations, and all other data coming directly from medical providers. There is a tightly monitored path for that data that runs directly from the raw medical reporting to the dashboard. Every inch of that path is reviewed and monitored by multiple public health and medical professionals. Any attempt at manipulating the raw data would be visible on a facility, county, and state level. All you would have to do to prove these manipulations is compare the state report to the county report or the facility reports and point out, plain as day, where the data manipulation had taken place.

No one has done that because the data has not been manipulated.

Death Data In Context

The last order of business in closing the book on claims of data manipulation is the question of COVID deaths. With the data mentioned above, all the information originated from an interaction with a medical professional. But it is quite possible for someone to die of COVID without ever coming into contact with a medical professional. Many people die at home or in the hospital without ever taking a COVID test.

All the pundits and articles about how Florida is undercounting COVID deaths claim that this is not only the case, but that it is systematically the case. They suggest that it is the work of ignorant or malicious medical examiners, or of elected officials who are politically disinclined to label deaths as COVID deaths.

However, if this were true at any kind of scale, we would expect to see these numbers show up in the state’s excess death number. “Excess deaths” is a term used to judge the toll of an epidemic against the “expected” rate of death in a state. Since death is the way of all flesh, we can expect, based on a state’s demographic profile, to see a certain number of deaths every week. This number fluctuates seasonally, and we can see from CDC data that COVID has caused an enormous number of deaths above what we would expect.
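
In its simplest form the calculation is observed deaths minus expected deaths, week by week. The sketch below assumes you already have an expected baseline for each week (modeling that baseline from demographics and seasonality is the hard part and is not shown), and the numbers are hypothetical, not real state data.

```python
def weekly_excess(observed, expected):
    """Excess deaths per week: observed deaths minus the expected baseline."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same weeks")
    return [o - e for o, e in zip(observed, expected)]


# Hypothetical four-week window for an imaginary state.
observed = [4100, 4350, 4600, 4200]
expected = [4000, 4000, 4050, 4050]
weekly = weekly_excess(observed, expected)
print(weekly, sum(weekly))  # → [100, 350, 550, 150] 1150
```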

Death is one of the most reliable data points we can have. It is fairly easy to have a COVID case fly under the radar by simply not getting tested and going about your life. It is slightly harder to ignore a death. The death of an individual triggers an avalanche of legal ramifications and, as such, is closely tracked by multiple government agencies. If COVID was causing a huge and unexpected spike in deaths, we would be able to see it.

However, as Nate Silver tirelessly points out, Florida’s excess deaths have not been particularly high. They are about in the middle for deaths in the US.

In fact, if we look at total deaths by age bracket, we can see that Florida has done well specifically in the age bracket of the most vulnerable, those over the age of 65.

In this chart, I’m using Hawaii as a baseline since it is the state with the fewest deaths per capita. Florida has seen more deaths than normal (high excess deaths), but their excess deaths are slightly below the national average, just as their COVID deaths are slightly below the national average.

There is some evidence to suggest that Florida’s official COVID deaths number is an undercount of the true number of COVID deaths. But this is neither systemic within Florida nor unique to Florida. Nearly all states have excess deaths that exceed their formal COVID death rate, indicating that there is some percentage of pandemic deaths that were not formally captured. In fact, if we compare Michigan’s official COVID death rate to their excess deaths, they have a larger percentage of “un-reported” deaths than Florida has.
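
The comparison in the last sentence boils down to one ratio: the share of a state’s excess deaths that is not accounted for by its official COVID deaths. The sketch below uses made-up figures for two hypothetical states, not real Florida or Michigan numbers, and, like the paragraph, treats the entire gap as uncaptured pandemic deaths.

```python
def unreported_share(excess_deaths, official_covid_deaths):
    """Fraction of excess deaths not captured by the official COVID death count."""
    if excess_deaths <= 0:
        raise ValueError("no excess deaths to compare against")
    return (excess_deaths - official_covid_deaths) / excess_deaths


# Made-up (excess deaths, official COVID deaths) for two hypothetical states.
states = {
    "State A": (10_000, 8_500),
    "State B": (10_000, 7_500),
}
for name, (excess, official) in states.items():
    print(f"{name}: {unreported_share(excess, official):.0%} of excess deaths not in the official count")
```

A state with the larger share has, by this measure, more “un-reported” deaths, which is the kind of state-by-state comparison the paragraph makes.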

The “Florida is hiding their data” people never point this out and have no explanation for it. They rely on the isolation of Florida as a purely political entity and imply that any data gap is due to intentional data malfeasance. They have no coherent narrative for why Michigan has “hidden” more deaths than Florida has. It is only by intentionally avoiding the context of state-by-state data and relying on the ignorance of their readers that their narrative survives.

Looney Tunes: Daffy Duck in Hollywood

The early, intentionally wacky Daffy Duck is a world away from the later Daffy who has wackiness thrust upon him through circumstance and his own inflated sense of self. This is one of the earliest color Daffy features, and the main thing it has going for it is Daffy’s frantic nature.

Daffy infiltrates a movie studio and disrupts the star director’s efforts to shoot a high-production romance. The gags are… ok. The short suffers from the fact that the director is little more than a foil to absorb Daffy’s antics, crying out “It’s ruined” at every setback but then moving quickly on to the next gag. The best bit is when Daffy replaces the director’s reel with a nonsensical hodge-podge of contradictory live-action snippets that plays as an intentionally surreal newsreel.

It’s interesting to watch the old ’30s Looney Tunes simply to see how much more creative, cleaner, and tighter the Looney Tunes became in the ’40s and ’50s.