The COVID-19 pandemic tested healthcare systems, governments, and people around the world, but it also tested data science and AI. Visualization dashboards proliferated, case counts updated hourly, and predictive models abounded.
Everyone wondered whether the models could predict what was happening. The problem was that the data was messy: reporting was inconsistent and lagged, there were geospatial disparities, and uncertainty was everywhere.
The pandemic became a natural experiment that showed us AI is only as good as the data it is trained on. And when the data got messy, AI showed its vulnerabilities.
Why the Pandemic Data Was So Messy
The pandemic data was messy because it was not collected in a controlled environment like a research study; it was collected during a pandemic. Countries used different case definitions, testing availability changed from week to week, hospital systems reported cases at different frequencies, and some datasets arrived on a lag.
It was like trying to assemble a puzzle with half the pieces missing. The data was not bad, but it was far from perfect.
What the Pandemic Taught Us About Training Better AI Models
One of the lessons the pandemic taught us is that we should not train AI models on raw, unexamined data. When the data was messy, incomplete, and inconsistent, the models trained on it often reflected that messiness.
This is not a problem with AI; it is a reminder that AI is a reflection of what we train it on. If the data is uncertain, biased, or missing, the model will be too. Training AI models is therefore not just a matter of throwing more data at them. It is also a matter of thinking carefully about the data itself.
Because of the pandemic, researchers and developers had to take a step back and examine the data pipeline. Was the data representative? Were there geographic disparities in the data? Were the reporting mechanisms consistent enough to analyze over time?
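To make those questions concrete, here is a minimal sketch of what such a pipeline audit might look like in pandas. The table layout and column names (date, region, cases, tests) are hypothetical, not from any specific COVID dataset, and the checks are illustrative rather than exhaustive:

```python
import pandas as pd

def audit_case_data(df: pd.DataFrame) -> dict:
    """Basic audit of a hypothetical table with columns:
    date (datetime), region, cases, tests."""
    report = {}

    # Representativeness: does every region report on every date?
    expected_rows = df["date"].nunique() * df["region"].nunique()
    report["coverage"] = len(df) / expected_rows

    # Geographic disparity: how unevenly is testing distributed?
    tests_by_region = df.groupby("region")["tests"].sum()
    report["testing_max_to_min"] = tests_by_region.max() / max(tests_by_region.min(), 1)

    # Consistency over time: the largest gap, in days, between
    # consecutive reports from the same region.
    gaps = (
        df.sort_values("date")
          .groupby("region")["date"]
          .diff()
          .dt.days
    )
    report["max_reporting_gap_days"] = gaps.max()

    return report
```

None of these checks fixes the data; they simply surface the questions above before any model ever sees it.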
These questions often mattered more than the models themselves. My conclusion is simple: the pandemic taught us that bigger AI models do not always make for better AI. Sometimes, better AI comes from thinking more carefully about the data.
Why Context Is as Important as the Data Itself
Data is compelling, but without context, it can be misleading. Throughout the pandemic, we were bombarded with numbers of cases, tests, and hospitalizations. But those numbers weren’t always comparable across different areas.
For example, an increase in reported cases might mean there was an outbreak, or it might mean that more testing was done that week. A relatively flat curve might indicate that cases were truly stable, or it might mean that cases weren’t being reported as quickly and the data was lagging. Data often doesn’t tell the whole story on its own.
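One common way to add that missing context is to normalize case counts by testing volume and smooth out day-of-week reporting artifacts. The sketch below assumes a hypothetical daily table with cases and tests columns, sorted by date; the column names and the 7-day window are assumptions, not a standard:

```python
import pandas as pd

def add_context(daily: pd.DataFrame) -> pd.DataFrame:
    """Add context columns to a hypothetical daily table
    with `cases` and `tests` columns, sorted by date."""
    out = daily.copy()

    # Rising cases with flat positivity suggests more testing;
    # rising cases AND rising positivity suggests a real outbreak.
    out["positivity"] = out["cases"] / out["tests"]

    # Weekend dips and Monday spikes are reporting artifacts,
    # not epidemiology; a 7-day window averages them out.
    out["cases_7d_avg"] = out["cases"].rolling(window=7).mean()

    return out
```

The point of the design is that neither raw cases nor smoothed cases alone answers the question; it is the combination with positivity that separates "more infections" from "more testing."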
The same issue applies to machine learning models. If they are trained on the data alone, without any understanding of how it was gathered, they can learn patterns that a human analyst would recognize as anomalies or reporting artifacts.
It’s similar to reading a headline without the rest of the article: you get some information, but not the full story. The pandemic underscored for many machine learning practitioners that context isn’t a “nice to have”; it’s a necessary input for machine learning models that truly understand what they are looking at.

What the Pandemic Taught AI Practitioners
The pandemic changed the way that practitioners, researchers, and businesses approach machine learning. When machine learning models were trained on fast-changing, imperfect data sets, it became clear that the path to better machine learning wasn’t just through better algorithms; it was also about questioning the data itself.
Where is this data coming from? Is it complete? Are there any areas or communities that it’s missing? Practitioners started asking these questions more often and earlier in the process than they had before the pandemic.
In a lot of ways, this is a good thing. The experience pushed the machine learning community toward data validation, greater transparency, and a healthy dose of skepticism when a data set looks too good to be true.
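As a small illustration of what that skepticism can look like in practice, here is a hedged sketch of validation checks for a cumulative-case table. The column names and rules are assumptions chosen for the example; real pipelines would tailor them to their own data:

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Flag common problems in a hypothetical table with
    columns: date, region, cumulative_cases."""
    problems = []

    # Counts should never be negative.
    if (df["cumulative_cases"] < 0).any():
        problems.append("negative counts found")

    # Each (date, region) pair should appear only once.
    if df.duplicated(subset=["date", "region"]).any():
        problems.append("duplicate date/region rows")

    # Cumulative totals should not decrease; dips usually signal
    # retroactive corrections worth investigating, not recoveries.
    decreasing = (
        df.sort_values("date")
          .groupby("region")["cumulative_cases"]
          .diff()
          .lt(0)
    )
    if decreasing.any():
        problems.append("cumulative counts decreased in some region")

    return problems
```

Checks like these are cheap to run, and a series that passes too cleanly can itself be a prompt to look closer.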
If there’s one overall lesson here, it’s that in a crisis, flaws in a system are revealed, and once they’re visible, people can begin to address them. Today, better machine learning is less about keeping up with the hype and more about recognizing the realities behind the data.
Better AI Starts With Better Thinking About Data
The pandemic has imparted many lessons, but for practitioners building and using AI today, one of them is clear: the quality of thinking behind the data is just as important as the technology used to analyze it. COVID-19 showed that data could be incomplete, lagging, biased, or misinterpreted, and when that happens, even the best algorithms can fail.
That doesn’t mean that AI has failed; it’s a reminder that good systems begin with good questions about the data itself. In my opinion, the key to better AI in the future won’t just be better or bigger models or faster computers, but people learning to approach data with a bit more curiosity, skepticism, and common sense.