How to Use Folium Maps to Visualize Air Pollution: An In-Depth Tutorial

Date: 2019-02-23 23:31:34

In my previous story on forecasting air pollution, I looked into using recurrent neural networks (RNN and LSTM) to forecast air pollution in Belgium. As a small side project, I thought it would be interesting to plot the air pollution over time on a map. The Folium package is a great tool for doing that.

We will plot the quantities of 6 air pollutants in Belgium:

Ozone (O3)

Nitrogen Dioxide (NO2)

Carbon Monoxide (CO)

Sulphur Dioxide (SO2)

Particulate Matter (PM10)

Benzene (C6H6)

The data is downloaded from the website of the European Environment Agency (EEA). If you want to use data from other European countries, I encourage you to visit their website. It has datasets for many EU countries and is very well documented.

The datasets we will use are:

BE_<pollutant_ID>_-_aggregated_timeseries.csv

BE_-_metadata.csv

The pollutant IDs are described in the EEA’s vocabulary of air pollutants.

1 = Sulphur Dioxide

5 = Particulate Matter

7 = Ozone

8 = Nitrogen Dioxide

10 = Carbon Monoxide

20 = Benzene
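
As a small illustration of how the pollutant IDs map to file names, the sketch below builds the path of each aggregated time series file. The `data/raw` directory and the helper name `timeseries_path` are assumptions made for this example, not part of the EEA download.

```python
from pathlib import Path

# Hypothetical data directory; adjust to wherever the EEA files were saved.
data_dir = Path("data/raw")

# Pollutant IDs as listed above (EEA vocabulary of air pollutants).
pollutant_ids = {1: "SO2", 5: "PM10", 7: "O3", 8: "NO2", 10: "CO", 20: "C6H6"}


def timeseries_path(pollutant_id):
    """Build the path of the aggregated time series file for one pollutant."""
    return data_dir / f"BE_{pollutant_id}_-_aggregated_timeseries.csv"


for pid, notation in pollutant_ids.items():
    print(notation, "->", timeseries_path(pid).name)
```

For data from another country, the `BE` prefix would presumably change to that country's code, but that is worth checking against the files you actually download.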

Project set-up

Importing packages

from pathlib import Path
import pandas as pd
import numpy as np
import seaborn as sns
import folium
from folium.plugins import TimestampedGeoJson

project_dir = Path('/Users/bertcarremans/Data Science/Projecten/air_pollution_forecasting')

Air pollutants

We’ll make a dictionary of the air pollutants and their dataset number, scientific notation, name, and bin edges. The bin edges are based on the scale on this Wikipedia page.

pollutants = {
    1: {
        'notation': 'SO2',
        'name': 'Sulphur dioxide',
        'bin_edges': np.array([15, 30, 45, 60, 80, 100, 125, 165, 250])
    },
    5: {
        'notation': 'PM10',
        'name': 'Particulate matter < 10 µm',
        'bin_edges': np.array([10, 20, 30, 40, 50, 70, 100, 150, 200])
    },
    7: {
        'notation': 'O3',
        'name': 'Ozone',
        'bin_edges': np.array([30, 50, 70, 90, 110, 145, 180, 240, 360])
    },
    8: {
        'notation': 'NO2',
        'name': 'Nitrogen dioxide',
        'bin_edges': np.array([25, 45, 60, 80, 110, 150, 200, 270, 400])
    },
    10: {
        'notation': 'CO',
        'name': 'Carbon monoxide',
        'bin_edges': np.array([1.4, 2.1, 2.8, 3.6, 4.5, 5.2, 6.6, 8.4, 13.7])
    },
    20: {
        'notation': 'C6H6',
        'name': 'Benzene',
        'bin_edges': np.array([0.5, 1.0, 1.25, 1.5, 2.75, 3.5, 5.0, 7.5, 10.0])
    }
}

Loading the metadata

In the metadata, we have the coordinates for every SamplingPoint. We’ll need that information to plot the SamplingPoints on the map.

meta = pd.read_csv(project_dir / 'data/raw/BE_-_metadata.csv', sep='\t')
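
To see why the coordinates matter, here is a minimal sketch of joining measurements to metadata on the SamplingPoint key. The two small frames and their values are invented for illustration. Only the column names SamplingPoint, Latitude, Longitude and AirPollutionLevel come from the code in this story.

```python
import pandas as pd

# Invented stand-in for the metadata: one row per sampling point.
meta_toy = pd.DataFrame({
    "SamplingPoint": ["SP_A", "SP_B"],
    "Latitude": [50.85, 51.22],
    "Longitude": [4.35, 4.40],
})

# Invented measurements; SP_C has no metadata row.
levels_toy = pd.DataFrame({
    "SamplingPoint": ["SP_A", "SP_A", "SP_C"],
    "AirPollutionLevel": [12.0, 14.0, 9.0],
})

# An inner join keeps only measurements whose sampling point has coordinates.
merged = levels_toy.merge(meta_toy, how="inner", on="SamplingPoint")
print(merged)
```

With an inner join, a sampling point that is missing from the metadata silently disappears from the map, so it can be worth comparing row counts before and after the merge.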

Color scale

Each pollutant has 9 bin edges, which define 10 bins. We will use a different color for each bin. These colors were created with ColorBrewer.

color_scale = np.array(['#053061', '#2166ac', '#4393c3', '#92c5de', '#d1e5f0',
                        '#fddbc7', '#f4a582', '#d6604d', '#b2182b', '#67001f'])
sns.palplot(sns.color_palette(color_scale))
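
The sketch below shows how 9 bin edges select one of 10 colors, using `np.digitize` with `right=True` as the `color_coding` function in this story does. The sample levels are invented. The bin edges are the SO2 values from the pollutants dictionary.

```python
import numpy as np

color_scale = np.array(['#053061', '#2166ac', '#4393c3', '#92c5de', '#d1e5f0',
                        '#fddbc7', '#f4a582', '#d6604d', '#b2182b', '#67001f'])
so2_bin_edges = np.array([15, 30, 45, 60, 80, 100, 125, 165, 250])

# Invented sample levels, including values exactly on an edge and beyond the last edge.
levels = np.array([0.0, 15.0, 15.1, 100.0, 250.0, 300.0])

# With right=True, a value equal to an edge falls into the bin below that edge.
idx = np.digitize(levels, so2_bin_edges, right=True)
colors = color_scale[idx]

for level, i, color in zip(levels, idx, colors):
    print(level, "-> bin", i, "->", color)
```

Values above the last edge get index 9, which is why the color scale needs one more entry than there are edges.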

Data preparation

Loading the time series data

We convert the date variables to datetime. That way we can easily use them later to slice the Pandas DataFrame.

def load_data(pollutant_ID):
    print('> Loading data...')
    date_vars = ['DatetimeBegin', 'DatetimeEnd']
    filename = 'data/raw/BE_' + str(pollutant_ID) + '_-_aggregated_timeseries.csv'
    # Parse both date columns so they can be used for slicing later.
    # Note: date_parser is deprecated in pandas 2.0 and later, where
    # parse_dates on its own is usually sufficient.
    agg_ts = pd.read_csv(project_dir / filename, sep='\t',
                         parse_dates=date_vars, date_parser=pd.to_datetime)
    return agg_ts
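
As a quick check of what the date conversion buys us, the sketch below reads a tiny tab-separated sample from memory and slices it by date. The sample rows are invented and contain only a few of the real columns. The real EEA timestamps may also carry a time zone offset, which this sample leaves out.

```python
import io

import pandas as pd

# Invented tab-separated sample with a subset of the columns used in this story.
sample = (
    "SamplingPoint\tDatetimeBegin\tDatetimeEnd\tAirPollutionLevel\n"
    "SP_A\t2013-01-01 00:00:00\t2013-01-02 00:00:00\t12.5\n"
    "SP_A\t2013-01-02 00:00:00\t2013-01-03 00:00:00\t14.0\n"
)

toy = pd.read_csv(io.StringIO(sample), sep="\t",
                  parse_dates=["DatetimeBegin", "DatetimeEnd"])

# Because the column is a datetime, it can be compared against a date string.
first_day = toy.loc[toy.DatetimeBegin < "2013-01-02"]
print(toy.dtypes)
print(first_day)
```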

Data cleaning

We’ll do some basic cleaning of the data:

Keep only records with a DataAggregationProcess of P1D to have daily data

Remove records with a UnitOfAirPollutionLevel of count

Remove variables that are redundant for the visualization

Remove SamplingPoints that have fewer than 1000 measurement days

Insert missing dates and impute the AirPollutionLevel with the value of the next valid date

def clean_data(df):
    print('> Cleaning data...')
    # Keep daily aggregates only and drop records measured as counts
    df = df.loc[df.DataAggregationProcess == 'P1D', :]
    df = df.loc[df.UnitOfAirPollutionLevel != 'count', :]

    # Keep sampling points with at least 1000 distinct measurement days
    ser_avail_days = df.groupby('SamplingPoint').nunique()['DatetimeBegin']
    df = df.loc[df.SamplingPoint.isin(ser_avail_days[ser_avail_days.values >= 1000].index), :]

    # Drop variables that are not needed for the visualization
    vars_to_drop = ['AirPollutant', 'AirPollutantCode', 'Countrycode', 'Namespace',
                    'TimeCoverage', 'Validity', 'Verification', 'AirQualityStation',
                    'AirQualityStationEoICode', 'DataAggregationProcess',
                    'UnitOfAirPollutionLevel', 'DatetimeEnd', 'AirQualityNetwork',
                    'DataCapture', 'DataCoverage']
    df = df.drop(columns=vars_to_drop)

    # Build the full (SamplingPoint, date) index so that missing dates get a row
    dates = list(pd.period_range(min(df.DatetimeBegin), max(df.DatetimeBegin), freq='D').values)
    samplingpoints = list(df.SamplingPoint.unique())
    new_idx = []
    for sp in samplingpoints:
        for d in dates:
            new_idx.append((sp, np.datetime64(d)))

    df = df.set_index(keys=['SamplingPoint', 'DatetimeBegin'])
    df = df.sort_index()
    df = df.reindex(new_idx)

    # Impute with the next valid value per sampling point; trailing gaps become 0
    df['AirPollutionLevel'] = df.groupby(level=0).AirPollutionLevel.bfill().fillna(0)
    return df
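
The last cleaning step is the least obvious one, so here is a minimal sketch of it on invented data. It inserts the missing dates per sampling point and back-fills them with the next valid value. For brevity it builds the full index with `pd.MultiIndex.from_product` instead of the nested loop. That is a choice made for this example, not the code used in the notebook.

```python
import pandas as pd

# Invented measurements with gaps: SP_A misses 2 and 3 January, SP_B misses 1 and 4 January.
toy = pd.DataFrame({
    "SamplingPoint": ["SP_A", "SP_A", "SP_B", "SP_B"],
    "DatetimeBegin": pd.to_datetime(["2013-01-01", "2013-01-04",
                                     "2013-01-02", "2013-01-03"]),
    "AirPollutionLevel": [10.0, 40.0, 5.0, 7.0],
}).set_index(["SamplingPoint", "DatetimeBegin"]).sort_index()

# Every sampling point gets a row for every day in the overall date range.
all_days = pd.date_range("2013-01-01", "2013-01-04", freq="D")
full_idx = pd.MultiIndex.from_product(
    [["SP_A", "SP_B"], all_days], names=["SamplingPoint", "DatetimeBegin"]
)
filled = toy.reindex(full_idx)

# Back-fill within each sampling point; gaps with no later value become 0.
filled["AirPollutionLevel"] = (
    filled.groupby(level=0).AirPollutionLevel.bfill().fillna(0)
)
print(filled)
```

Note that the trailing gap for SP_B ends up as 0 rather than as a missing value, which will be drawn in the lowest color bin on the map.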

Plotting air pollution over time

Loading all of the dates for all sampling points would be too heavy for the map. Therefore, we will resample the data by taking the last day of each month.

Note: The bin edges that we use in this notebook should normally be applied on (semi-)hourly averages for O3, NO2 and CO. In the datasets we are using in this notebook, we have only daily averages. As this notebook is only to illustrate how to plot time series data on a map, we will continue with the daily averages. On the EEA website, you can, however, download hourly averages as well.
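
To make the caveat in this note concrete, the sketch below averages invented hourly SO2 readings into daily values and compares the bins. The numbers are made up for illustration and do not come from the EEA data.

```python
import numpy as np
import pandas as pd

# Invented hourly readings for two days: a flat 20 with a single one-hour peak of 260.
hours = pd.Timestamp("2013-01-01") + pd.to_timedelta(np.arange(48), unit="h")
values = np.full(48, 20.0)
values[8] = 260.0
hourly = pd.Series(values, index=hours)

# Daily averages, comparable to the P1D records used in this story.
daily = hourly.resample("D").mean()

so2_bin_edges = np.array([15, 30, 45, 60, 80, 100, 125, 165, 250])
peak_bin = np.digitize(hourly.max(), so2_bin_edges, right=True)
daily_bin = np.digitize(daily.iloc[0], so2_bin_edges, right=True)

print(daily)
print("bin of the hourly peak:", peak_bin, "| bin of the daily average:", daily_bin)
```

The one-hour peak lands in the highest bin, while the daily average of the same day lands in a much lower one. This is why the colors on the maps should be read as illustrative only.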

def color_coding(poll, bin_edges):
    # Index of the bin the value falls in (0 to 9), used to pick the color
    idx = np.digitize(poll, bin_edges, right=True)
    return color_scale[idx]

def prepare_data(df, pollutant_ID):
    print('> Preparing data...')
    # Add the coordinates of each sampling point from the metadata
    df = df.reset_index().merge(meta, how='inner', on='SamplingPoint').set_index('DatetimeBegin')
    df = df.loc[:, ['SamplingPoint', 'Latitude', 'Longitude', 'AirPollutionLevel']]
    # Keep the last day of each month per sampling point
    # (pandas 2.2 and later prefer rule='ME' over 'M')
    df = df.groupby('SamplingPoint', group_keys=False).resample(rule='M').last().reset_index()
    df['color'] = df.AirPollutionLevel.apply(color_coding, bin_edges=pollutants[pollutant_ID]['bin_edges'])
    return df
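
The monthly reduction can be sketched on invented data as follows. Instead of `resample`, this example groups on a monthly period from `dt.to_period('M')`. That is an alternative chosen here because it behaves the same across pandas versions, not the approach used in `prepare_data`.

```python
import pandas as pd

# Invented daily values for two sampling points.
toy = pd.DataFrame({
    "SamplingPoint": ["SP_A"] * 4 + ["SP_B"] * 2,
    "DatetimeBegin": pd.to_datetime([
        "2013-01-30", "2013-01-31", "2013-02-27", "2013-02-28",
        "2013-01-31", "2013-02-28",
    ]),
    "AirPollutionLevel": [11.0, 12.0, 21.0, 22.0, 5.0, 6.0],
})

# Sort so that "last" really means the latest day within each month.
toy = toy.sort_values(["SamplingPoint", "DatetimeBegin"])
month = toy["DatetimeBegin"].dt.to_period("M").rename("Month")

# One row per sampling point and month, holding the last observed day.
monthly = toy.groupby(["SamplingPoint", month]).last().reset_index()
print(monthly)
```

One difference from `resample` is that DatetimeBegin here keeps the date of the last observation, whereas `resample` labels each row with the month-end date.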

To show the pollution evolving over time, we will use the TimestampedGeoJson Folium plugin. This plugin requires GeoJSON input features. To convert the rows of the DataFrame into that format, I created a small function, create_geojson_features.

def create_geojson_features(df):
    print('> Creating GeoJSON features...')
    features = []
    for _, row in df.iterrows():
        feature = {
            'type': 'Feature',
            'geometry': {
                'type': 'Point',
                # GeoJSON expects longitude first, then latitude
                'coordinates': [row['Longitude'], row['Latitude']]
            },
            'properties': {
                'time': str(row['DatetimeBegin'].date()),
                'style': {'color': row['color']},
                'icon': 'circle',
                'iconstyle': {
                    'fillColor': row['color'],
                    'fillOpacity': 0.8,
                    'stroke': 'true',
                    'radius': 7
                }
            }
        }
        features.append(feature)
    return features
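
To inspect what a single feature looks like without loading any data, the sketch below builds one by hand and serializes it. The helper `make_feature` and the sample date and color are assumptions made for this example. The property layout mirrors the function above.

```python
import json
from datetime import date


def make_feature(lon, lat, day, color):
    """Build one GeoJSON point feature in the layout used in this story."""
    return {
        "type": "Feature",
        # GeoJSON coordinates are ordered longitude, latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "time": day.isoformat(),
            "style": {"color": color},
            "icon": "circle",
            "iconstyle": {
                "fillColor": color,
                "fillOpacity": 0.8,
                "stroke": "true",
                "radius": 7,
            },
        },
    }


# Centre of Belgium as used for the map, with an invented date and color.
feature = make_feature(4.4699, 50.5039, date(2013, 1, 31), "#92c5de")
collection = {"type": "FeatureCollection", "features": [feature]}

# Serializing confirms that the structure contains only JSON-compatible values.
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Swapping longitude and latitude is an easy mistake here, because Folium's own `location` argument takes latitude first.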

Once the input features are created, we can create a map and add them to it. The TimestampedGeoJson plugin provides some neat options for the time slider, which are self-explanatory.

def make_map(features):
    print('> Making map...')
    coords_belgium = [50.5039, 4.4699]
    pollution_map = folium.Map(location=coords_belgium, control_scale=True, zoom_start=8)

    TimestampedGeoJson(
        {'type': 'FeatureCollection', 'features': features},
        period='P1M',
        add_last_point=True,
        auto_play=False,
        loop=False,
        max_speed=1,
        loop_button=True,
        date_options='YYYY/MM',
        time_slider_drag_update=True
    ).add_to(pollution_map)
    print('> Done.')
    return pollution_map

def plot_pollutant(pollutant_ID):
    print('Mapping {} pollution in Belgium in -'.format(pollutants[pollutant_ID]['name']))
    df = load_data(pollutant_ID)
    df = clean_data(df)
    df = prepare_data(df, pollutant_ID)
    features = create_geojson_features(df)
    return make_map(features), df

Below are the maps per air pollutant. You can click on the image to go to a web page with the interactive map. By clicking on the play button, you can see the evolution of the air pollutant over time.

Sulphur dioxide

pollution_map, df = plot_pollutant(1)
pollution_map.save('../output/pollution_so2.html')
pollution_map

Particulate matter

pollution_map, df = plot_pollutant(5)
pollution_map.save('../output/pollution_pm.html')
pollution_map

The other visualizations can be found at:

https://bertcarremans.github.io/air_pollution_viz/pollution_c6h6.html

https://bertcarremans.github.io/air_pollution_viz/pollution_co.html

https://bertcarremans.github.io/air_pollution_viz/pollution_no2.html

https://bertcarremans.github.io/air_pollution_viz/pollution_o3.html

Conclusion

With this story, I want to demonstrate how easy it is to visualize time series data on a map with Folium. The maps for all the pollutants and the Jupyter notebook can be found on GitHub. Feel free to re-use it to map the air pollution in your home country.
