Exploratory Data Analysis of Covid-19 Data in Italy

Italy is, unfortunately, one of the countries most affected by the pandemic, so I’m trying to find some insights in the available data.
I’m using data from the official git repository of “Protezione Civile”, which is updated daily and available here. The code used for this analysis can be found here.
If you want to run the notebook directly just follow this link: https://mybinder.org/v2/gl/acalax%2Fcovid-19-eda-italy/master?filepath=eda_italy.ipynb
The notebook will download the data, so you can always have updated graphs!
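As a rough idea of what that download step involves, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code: the function names are hypothetical, and the CSV address is deliberately left as an argument because it depends on the repository layout. The only assumption about the file is that it has a header row containing the Italian field names described in the next section.

```python
import csv
import io
import urllib.request

# Counter columns named in this article (all assumed to hold integers).
COUNT_FIELDS = [
    "ricoverati_con_sintomi", "terapia_intensiva", "totale_ospedalizzati",
    "isolamento_domiciliare", "totale_positivi", "dimessi_guariti",
    "deceduti", "totale_casi", "tamponi",
]


def parse_national_csv(text):
    """Parse CSV text into a list of dicts, converting known counters to int."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        for name in COUNT_FIELDS:
            if raw.get(name):  # leave missing or empty cells untouched
                row[name] = int(raw[name])
        rows.append(row)
    return rows


def download_national_data(csv_url):
    """Fetch the CSV at csv_url (supplied by the caller) and parse it."""
    with urllib.request.urlopen(csv_url) as response:
        return parse_national_csv(response.read().decode("utf-8"))
```

Because the data is fetched at run time, re-running the analysis always picks up the latest daily update.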
A disclaimer: I’m not attempting to predict or model anything, just looking at the data and building some graphs from it.
The analysis covers the whole country, but it can easily be extended to regions and provinces by adapting the code.
Let’s begin.

The data

All the information about the data can be found in the README.md of the repository, but it helps to clarify that some of the features are related to each other.
Some are the sum of others, and nearly all the values are cumulative day after day.
In the code, I use the feature names in Italian, without translating them.
Let’s look at them in detail.
A person who tests positive can be in one of these conditions:
  • ricoverati_con_sintomi (Hospitalised patients with symptoms)
  • terapia_intensiva (Intensive Care)
Their sum is totale_ospedalizzati (Total hospitalized patients)
  • isolamento_domiciliare (Home confinement)
The total sum is totale_positivi (Total number of currently positive cases)
The outcome can be:
  • dimessi_guariti (Recovered)
  • deceduti (Deaths)
The sum of the current positives and the outcomes is totale_casi (Total number of positive cases)
The test counter:
  • tamponi (Tests performed)
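The sum relations described above can be verified on each daily row. The sketch below is only an illustration: `check_totals` is a hypothetical helper, and the figures in `example` are made up to show the arithmetic, not taken from the dataset.

```python
def check_totals(row):
    """Return the names of the totals that do not match the sum of their parts."""
    problems = []
    hospitalized = row["ricoverati_con_sintomi"] + row["terapia_intensiva"]
    if hospitalized != row["totale_ospedalizzati"]:
        problems.append("totale_ospedalizzati")
    positives = row["totale_ospedalizzati"] + row["isolamento_domiciliare"]
    if positives != row["totale_positivi"]:
        problems.append("totale_positivi")
    cases = row["totale_positivi"] + row["dimessi_guariti"] + row["deceduti"]
    if cases != row["totale_casi"]:
        problems.append("totale_casi")
    return problems


# Made-up figures, only to show how the relations fit together.
example = {
    "ricoverati_con_sintomi": 100,
    "terapia_intensiva": 20,
    "totale_ospedalizzati": 120,   # 100 + 20
    "isolamento_domiciliare": 200,
    "totale_positivi": 320,        # 120 + 200
    "dimessi_guariti": 50,
    "deceduti": 30,
    "totale_casi": 400,            # 320 + 50 + 30
}
print(check_totals(example))  # []
```

An empty list means every total agrees with the sum of its parts for that day.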
A new field was added yesterday, “nuovi_positivi”, with the daily new cases, but I derive all the differences directly by shifting the data by one day and subtracting.
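Shifting and subtracting boils down to pairing each day's cumulative value with the previous day's value. Here is a minimal pure-Python sketch; the notebook may well do this with a dataframe library instead, `daily_delta` is a hypothetical name, and the numbers are made up.

```python
def daily_delta(cumulative):
    """Turn a cumulative series into daily differences.

    The series is shifted by one day and subtracted from itself;
    the first day has no previous value, so its delta is None.
    """
    shifted = [None] + cumulative[:-1]
    return [
        None if previous is None else current - previous
        for current, previous in zip(cumulative, shifted)
    ]


totale_casi = [10, 15, 27, 40]  # made-up cumulative counts
print(daily_delta(totale_casi))  # [None, 5, 12, 13]
```

The same function applies to any cumulative column (deaths, recoveries, tests), which is why a separate daily field is not strictly needed.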
Let’s start with a couple of graphs showing the overall situation of the total cases.
The images reflect the data as updated on 1 April.

And the outcomes compared with the positive cases:


This is what a pandemic looks like, with huge growth in the positive cases and the related outcomes, and with deaths fortunately below recoveries.
Now, let’s see a breakdown of the cumulative counts for positive cases.


Home confinements outnumber hospitalizations after 20 March, while intensive care keeps growing.
Let’s see how the tests and the positives are related.

Tests and positives are both growing, with the hope that the positives will “plateau”.
Let’s now derive a couple of new features: the ratio between deaths and total cases (deaths / total positives) and the ratio between recovered and total positives (recovered / total positives).
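These ratios are element-wise divisions of two columns. The sketch below assumes the denominator is the cumulative totale_casi column; that choice is an assumption, and any other column can be passed in its place. `ratio_series` is a hypothetical helper and the numbers are made up.

```python
def ratio_series(numerator, denominator):
    """Divide two equally long series day by day; None where the denominator is 0."""
    return [n / d if d else None for n, d in zip(numerator, denominator)]


# Made-up cumulative counts for three consecutive days.
deceduti = [1, 4, 10]
dimessi_guariti = [2, 6, 15]
totale_casi = [20, 50, 100]  # assumed denominator

death_ratio = ratio_series(deceduti, totale_casi)
recovery_ratio = ratio_series(dimessi_guariti, totale_casi)
print(death_ratio)     # [0.05, 0.08, 0.1]
print(recovery_ratio)  # [0.1, 0.12, 0.15]
```

Guarding against a zero denominator matters for the earliest days of a series, when a counter can still be 0.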


This is one of the “strangest” trends, because deaths and recoveries are quite close, while in other countries the gap is more significant.
Let’s now analyze the day-by-day delta, meaning the difference in value between one day and the day before. This gives better insight into what happened and into the current trends.


Hopefully, a downward trend seems to be starting.
The breakdown of total cases:


There is a huge spike in home confinements on 20 March, and a low for hospitalizations on the same day.

This is the breakdown of the outcome deltas.
The values for recoveries are high and the trend seems good.
Let’s see how tests and hospitalizations are trending.
Intensive care is a tiny fraction, so let’s look at it in more detail.

The trend seems good, hopefully because intensive care is needed less.
Finally, a statistical analysis of all the relevant deltas.
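A statistical summary of a delta series can be as simple as its count, extremes, mean and median. The sketch below uses Python's standard `statistics` module. Which statistics the notebook actually reports is not stated here, so this selection is an assumption; `summarize` is a hypothetical name and the input is made up.

```python
import statistics


def summarize(deltas):
    """Basic descriptive statistics for a series of daily deltas.

    None entries (such as the first day, which has no previous value)
    are ignored. The series must contain at least one number.
    """
    values = [d for d in deltas if d is not None]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


print(summarize([None, 5, 12, 13]))
```

Comparing the mean with the median is a quick way to see whether a few extreme days, such as a reporting spike, are skewing the picture.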


This is what best describes what happened.
Adding the tests shrinks the other data, but it gives an idea of the effort made, with a maximum of more than 35K tests performed in a single day (26/03).


Conclusion
These are just simple examples of possible reports that I hope can help to grasp more easily the magnitude of what is happening.
Personally, I don’t have a clear insight, partly because the way the data is collected could affect the outcomes, and partly because this dataset currently lacks features that could help, such as statistics on the age ranges of positives, recoveries and deaths.
But I also think that using what is available is better than nothing, especially when we use it “as is”, without guessing anything.
One last thought: numbers can be incredibly aseptic, but we must not forget that they involve lives: people suffering in hospitals, families mourning loved ones, people in strict lockdown waiting to know whether they are sick or not.
And then there is everybody else, myself included, not captured by any data, but still living through this. We are really in this together and working together is what we need to overcome this situation.
Stay safe!
---------
I am a specialized Coach and IT Mentor with 25 years of experience in the IT sector. If you want to improve the tech side of your company, or improve yourself, I’m here to support you. Let’s find out how together.