데이터과학 삼학년 (Data Science, Third Year)

Visualizing COVID-19 Data with Plotly

Data Visualization & DataBase

Dan-k 2020. 6. 19. 15:19

Let's skim through the COVID-19 data with a few visualizations.

  1. Period: 2020-01-22 to 2020-03-22

  2. Data: public data loaded into BigQuery for analysis

  3. Visualization tool: mainly Plotly Express

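import numpy as np
import pandas as pd

import chart_studio.plotly as py
import plotly.express as px
import plotly.graph_objects as go

# Run cufflinks in offline mode, so no Chart Studio account is needed
import cufflinks as cf
cf.go_offline(connected=True)

%matplotlib inline
# Register the %%bigquery cell magic used in the query cells
%load_ext google.cloud.bigquery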
 
%%bigquery df
with
data as (
SELECT
  _TABLE_SUFFIX as dt,
  *
FROM 
  `data.coronavirus.csse_covid19_*` 
WHERE 
  _TABLE_SUFFIX BETWEEN '20200122' AND '20200322'

)
SELECT * FROM data

df.head()

         dt  Province_State  Country_Region           LastUpdate  Confirmed  Deaths  Recovered  Latitude  Longitude
0  20200211        Zhejiang  Mainland China  2020-02-11T12:53:02       1117       0        270       0.0        0.0
1  20200211         Jiangsu  Mainland China  2020-02-11T08:13:06        515       0         93       0.0        0.0
2  20200211          Fujian  Mainland China  2020-02-11T14:03:05        267       0         45       0.0        0.0
3  20200211         Shaanxi  Mainland China  2020-02-11T06:03:16        219       0         32       0.0        0.0
4  20200211          Yunnan  Mainland China  2020-02-11T09:23:04        153       0         20       0.0        0.0

Confirmed cases by country

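# Select the numeric columns before summing so that string columns (dt, LastUpdate, ...) are left out.
# This adds up every daily snapshot between 2020-01-22 and 2020-03-22 for each country.
df_nation_confirmed = df.groupby('Country_Region')[['Confirmed', 'Deaths']].sum()
df_nation_confirmed = df_nation_confirmed.reset_index()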
df_nation_confirmed.head()
 
   Country_Region  Confirmed  Deaths
0      Azerbaijan          1       0
1     Afghanistan        249       1
2         Albania        624      17
3         Algeria        991      77
4         Andorra        423       1

fig = px.bar(df_nation_confirmed, x='Country_Region', y="Confirmed")
fig.show()
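
Note that the table above adds up every daily snapshot in the period. If `Confirmed` and `Deaths` are running totals in each snapshot (an assumption here, not something the post states), a per-country figure taken from the last available date may be closer to what the chart title suggests. A minimal sketch on made-up toy data; `latest_by_country` and `df_toy` are hypothetical names:

```python
import pandas as pd

# Made-up toy rows standing in for the BigQuery result (not real figures).
df_toy = pd.DataFrame({
    "dt": ["20200321", "20200321", "20200322", "20200322", "20200321", "20200322"],
    "Province_State": ["P1", "P2", "P1", "P2", "", ""],
    "Country_Region": ["A", "A", "A", "A", "B", "B"],
    "Confirmed": [10, 5, 20, 5, 3, 4],
    "Deaths": [1, 0, 1, 0, 0, 0],
})

def latest_by_country(df: pd.DataFrame) -> pd.DataFrame:
    # Add up provinces within each (country, date) snapshot.
    daily = df.groupby(["Country_Region", "dt"], as_index=False)[["Confirmed", "Deaths"]].sum()
    # dt is a zero-padded YYYYMMDD string, so sorting it as text is chronological.
    daily = daily.sort_values(["Country_Region", "dt"])
    # Keep only the last (most recent) snapshot of each country.
    latest = daily.drop_duplicates("Country_Region", keep="last")
    return latest[["Country_Region", "Confirmed", "Deaths"]].reset_index(drop=True)

df_latest = latest_by_country(df_toy)
print(df_latest)
```

On the real data the same helper could feed the chart directly, for example `px.bar(latest_by_country(df), x='Country_Region', y='Confirmed')`.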
 
 

Deaths by country

fig = px.bar(df_nation_confirmed, x='Country_Region', y="Deaths")
fig.show()
 

Confirmed cases and deaths by country on a map

fig = px.scatter_mapbox(df, lat='Latitude', lon='Longitude', hover_name='Country_Region', hover_data=['Confirmed','Deaths'],
                        color_discrete_sequence=["fuchsia"], zoom=2, height=600)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
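
In the `df.head()` output earlier, several rows have `Latitude` and `Longitude` equal to 0.0. Assuming those zeros mark missing coordinates rather than real locations (an assumption), such rows would all be drawn at the same point in the Gulf of Guinea. A small sketch of filtering them out before plotting; `drop_missing_coords` and the toy coordinates are hypothetical:

```python
import pandas as pd

# Made-up toy rows; A has (0, 0) coordinates and C has a null latitude.
df_toy = pd.DataFrame({
    "Country_Region": ["A", "B", "C", "D"],
    "Confirmed": [10, 20, 30, 40],
    "Latitude": [0.0, 37.5, None, 48.9],
    "Longitude": [0.0, 127.0, 2.3, 2.3],
})

def drop_missing_coords(df: pd.DataFrame) -> pd.DataFrame:
    # Rows need both coordinates present ...
    has_coords = df["Latitude"].notna() & df["Longitude"].notna()
    # ... and must not sit exactly at (0, 0), which is treated as "missing" here.
    at_origin = (df["Latitude"] == 0) & (df["Longitude"] == 0)
    return df[has_coords & ~at_origin]

df_map = drop_missing_coords(df_toy)
print(df_map["Country_Region"].tolist())
```

Passing `drop_missing_coords(df)` to `px.scatter_mapbox` instead of `df` would then leave only rows with usable coordinates on the map.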
 
 

Distribution of confirmed cases in South Korea

df_korea = df[df.Country_Region=='South Korea']
df_korea.head()
 
            dt  Province_State  Country_Region           LastUpdate  Confirmed  Deaths  Recovered  Latitude  Longitude
13    20200211               0     South Korea  2020-02-11T02:53:02         28       0          3       0.0        0.0
370   20200206               0     South Korea  2020-02-06T02:53:03         23       0          0       0.0        0.0
435   20200209               0     South Korea  2020-02-09T02:33:02         25       0          3       0.0        0.0
740   20200128               0     South Korea        1/28/20 23:00          4       0          0       0.0        0.0
1367  20200202               0     South Korea  2020-02-02T02:23:13         15       0          0       0.0        0.0

fig = px.box(df_korea, y="Confirmed")
fig.show()
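
A box plot of `Confirmed` mixes every snapshot date into one distribution. If the goal is to see how the count changed over time, one option is to sort the rows by `dt` and take differences between consecutive snapshots. The sketch below uses the five `dt` and `Confirmed` values from the `df_korea.head()` output above; `add_new_cases` and `New_Confirmed` are hypothetical names, and it assumes one row per date:

```python
import pandas as pd

# dt / Confirmed pairs copied from the df_korea.head() output in this post.
df_kr_sample = pd.DataFrame({
    "dt": ["20200211", "20200206", "20200209", "20200128", "20200202"],
    "Confirmed": [28, 23, 25, 4, 15],
})

def add_new_cases(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Parse the YYYYMMDD table suffix into a real date and sort chronologically.
    out["date"] = pd.to_datetime(out["dt"], format="%Y%m%d")
    out = out.sort_values("date").reset_index(drop=True)
    # Change since the previous available snapshot; the first row has no
    # predecessor, so its own count is used as the starting value.
    out["New_Confirmed"] = out["Confirmed"].diff().fillna(out["Confirmed"]).astype(int)
    return out

df_kr_new = add_new_cases(df_kr_sample)
print(df_kr_new[["date", "Confirmed", "New_Confirmed"]])
```

Because the sample skips some dates, `New_Confirmed` here is the change between available snapshots, not a strict per-day figure.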
 
 

Daily confirmed cases by country

%%bigquery df_dt
with
data as (
SELECT
  _TABLE_SUFFIX as dt,
  *
FROM 
  `data.coronavirus.csse_covid19_*` 
WHERE 
  _TABLE_SUFFIX BETWEEN '20200122' AND '20200322'

)
SELECT PARSE_DATETIME('%Y%m%d',CAST(dt AS STRING)) AS dt,Country_Region, SUM(Confirmed) as Confirmed, Sum(Deaths) as Deaths
FROM data 
GROUP BY dt, Country_Region
ORDER BY Confirmed
 
df_dt.head()
 
           dt                  Country_Region  Confirmed  Deaths
0  2020-01-23                     Philippines          0       0
1  2020-03-16  occupied Palestinian territory          0       0
2  2020-03-15  occupied Palestinian territory          0       0
3  2020-03-22                        Guernsey          0       0
4  2020-03-21           Republic of the Congo          0       0

fig = px.bar(df_dt, x='dt',y="Confirmed",hover_name='Country_Region',color='Country_Region')
fig.show()
 

With a bar plot, the values for each category can be stacked on a single date, as with the daily confirmed cases by country above.

fig = px.line_3d(df_dt, x='dt', y='Country_Region', z="Confirmed",hover_name='Country_Region',color='Country_Region')
fig.show()
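
Plotly Express line traces connect points in the order the rows appear in the data frame, and the query above orders rows by `Confirmed` rather than by date. Sorting by country and date first is a cheap safeguard. A sketch on made-up rows, where `sort_for_lines` is a hypothetical helper:

```python
import pandas as pd

# Made-up toy rows in the same shape as df_dt, deliberately out of date order.
df_toy = pd.DataFrame({
    "dt": pd.to_datetime(["2020-03-22", "2020-03-20", "2020-03-21",
                          "2020-03-21", "2020-03-20"]),
    "Country_Region": ["A", "A", "A", "B", "B"],
    "Confirmed": [30, 10, 12, 7, 5],
})

def sort_for_lines(df_dt: pd.DataFrame) -> pd.DataFrame:
    # Group each country's rows together and put them in chronological order,
    # so a line trace walks through the dates from first to last.
    return df_dt.sort_values(["Country_Region", "dt"]).reset_index(drop=True)

df_sorted = sort_for_lines(df_toy)
print(df_sorted["Confirmed"].tolist())
```

With the real data, `px.line_3d(sort_for_lines(df_dt), x='dt', y='Country_Region', z='Confirmed', color='Country_Region')` would then draw each country's line in date order.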
 

 
