COVID-19 & Data Science

CSU Chico, Math 485 - Blog Assignment

Image credit: Alissa Eckert, MS; Dan Higgins, MAMS

The global response to the COVID-19 pandemic highlights the ability of data scientists to provide information and knowledge at a global scale in real time. The world we live in today is more connected than it has ever been, and data flows instantaneously, creating a global stream of consciousness that any individual may tap into at any time.

The difficulty is sorting through the mass of available data and understanding how or why a piece of information is pertinent to you at any given time. In the case of COVID-19, individuals in various corners of the internet were publishing reports before large institutions and governments made any public confirmation.

In December 2019, an American high school student became aware of the growing outbreak and created a website to track the ongoing situation 1. The difficulty of finding unbiased facts in a simple format inspired him to act. He developed the website, https://ncov2019.live/, to aggregate the data he was looking for.

…I noticed that it was really hard just to find the information. And there was a lot of just misinformation spreading. So I decided it would be kind of cool to create a website and just kind of make it like a central hub of information.
~ Avi Schiffmann

The site generates its data by scraping the CDC and WHO websites and aggregating the results into tables by country. The most important numbers (confirmed cases, deaths, and recoveries) are displayed in a mobile-friendly format. The website is still growing, with 30 million views to date.
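As a rough illustration of the aggregation step, here is a minimal Python sketch that rolls scraped records up into one row per country. It is not the site's actual code: the record layout, the `aggregate_by_country` function, and the sample numbers are all assumptions made up for this example, and the scraping itself is left out.

```python
from collections import defaultdict

METRICS = ("confirmed", "deaths", "recovered")

def aggregate_by_country(rows):
    """Sum each metric over all records that share a country."""
    totals = defaultdict(lambda: {metric: 0 for metric in METRICS})
    for row in rows:
        country_totals = totals[row["country"]]
        for metric in METRICS:
            country_totals[metric] += row[metric]
    return dict(totals)

# Hypothetical regional records (illustrative values, not real counts).
sample_rows = [
    {"country": "United States", "region": "California", "confirmed": 50, "deaths": 1, "recovered": 2},
    {"country": "United States", "region": "Washington", "confirmed": 40, "deaths": 1, "recovered": 3},
    {"country": "Italy", "region": "Lombardy", "confirmed": 100, "deaths": 5, "recovered": 10},
]

table = aggregate_by_country(sample_rows)
for country, numbers in sorted(table.items()):
    print(country, numbers)
```

Summing regional records is only one possible design. If the same country were reported by both sources, the totals would need to be reconciled rather than added.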

Data scientists are rallying to help with the global COVID-19 pandemic response. Recently, an open source global COVID-19 dataset containing over 30,000 research articles was made available, with the aim of aiding the medical community's ability to fight the pandemic 2. Kaggle announced a data science challenge offering $1,000 rewards for those competing in the challenge's 10 different task areas 3. The potential of modern methods such as natural language processing, applied to this massive body of scholarly research, is being put to the test like never before. One can only hope the efforts bear fruit.

One thing this pandemic is making abundantly clear… we must share what we learn with others. A good data scientist helps us understand the world better and gives us the tools to do so ourselves. Robert Wood has done just that with an excellent in-depth tutorial detailing how to work with COVID-19 data in R and produce meaningful insights through visualizations 4. In his graph, titled Days Since Outbreak, he plots the number of confirmed cases along comparable timelines for the U.S., Italy, and South Korea. The tutorial walks through the entire process, telling a story with the data and providing the code snippets to create the graph.
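The idea of putting countries on a comparable timeline can be sketched in a few lines. The tutorial itself uses R, so the Python below is only an illustration of the general approach, not his code. The `days_since_outbreak` function, the threshold of 100 cases, and the sample series are assumptions chosen for this example.

```python
def days_since_outbreak(cumulative_cases, threshold=100):
    """Drop the days before a country first reaches `threshold` cases.

    In the returned list, index 0 is day 0 of the outbreak, index 1 is
    day 1, and so on, which lets countries be compared on one axis.
    """
    for day, count in enumerate(cumulative_cases):
        if count >= threshold:
            return cumulative_cases[day:]
    return []  # the threshold was never reached

# Hypothetical cumulative counts by calendar day (illustrative values only).
series = {
    "Country A": [10, 50, 120, 300, 700],
    "Country B": [0, 0, 5, 150, 400],
}

aligned = {name: days_since_outbreak(counts) for name, counts in series.items()}
for name, counts in aligned.items():
    print(name, counts)
```

Once every series starts at its own day 0, the curves can be drawn on the same axes even though the outbreaks began on different calendar dates.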

The social aspect of COVID-19 is not to be ignored, biologically or anthropogenically. It has gone viral, in every sense of the word. Social media facilitates the spread of information, with Twitter at the head of the pack. A team of researchers has been cataloging tweets related to the coronavirus and has made them publicly available on GitHub 5. In collaboration with a graduate-level political science class, the MATH485 class will be working with this data set, using natural language processing and sentiment analysis to see how the pandemic is evolving in the Twitter-sphere.
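To give a feel for what sentiment analysis on tweets involves, here is a deliberately tiny lexicon-based scorer in Python. It is a toy sketch and not the method the class will use: the word lists, the `sentiment_score` function, and the example tweets are all invented for illustration, and a real project would more likely rely on an established NLP library and a much larger lexicon or a trained model.

```python
import re

# Tiny hypothetical lexicons, chosen only for this illustration.
POSITIVE = {"hope", "recovered", "safe", "thank", "support"}
NEGATIVE = {"fear", "death", "panic", "sick", "crisis"}

def sentiment_score(text):
    """Return a score in [-1, 1]: +1 all positive words, -1 all negative."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    matched = positive + negative
    if matched == 0:
        return 0.0  # no lexicon words found, treat as neutral
    return (positive - negative) / matched

# Made-up example tweets.
tweets = [
    "Thank you nurses, so much hope",
    "Panic and fear everywhere",
    "Stay home",
]
for tweet in tweets:
    print(sentiment_score(tweet), tweet)
```

Averaging scores like these by day is one simple way to watch how the tone of the conversation shifts over time.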

Ideally, this project will develop the skills required to work on more advanced projects, such as investigating how landscape-level factors may influence the spread of the pandemic.


  1. https://www.democracynow.org/2020/3/13/meet_the_17_year_old_behind

  2. https://pages.semanticscholar.org/coronavirus-research

  3. https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

  4. https://towardsdatascience.com/an-in-depth-analysis-of-the-global-pandemic-of-covid-19-bd5a3bdea155

  5. https://arxiv.org/abs/2003.07372

Irfan Ainuddin
Graduate Student

My research interests include soil genesis, soil mapping, soil education and outreach, soil fertility and nitrogen management.