
Visualizing a Footballer's Shot Data with Python + matplotlib (Drawing a Scatter Plot)

Date: 2024-02-17 10:34:37


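A shot map is essentially a scatter plot in which the size of each point varies with the expected goals value (the predicted probability of scoring), which makes the chart easier to read at a glance.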

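1. The league data website

The shot data comes from a league statistics website. Open its home page and search for "Mbappe" (see Figure 1).

Figure 1. Searching from the home page of the league data website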

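Open the page for Kylian Mbappé. His player_id is 3423, so the address of his page is /player/3423. The site provides league data from the / season to the present. The page to scrape is /player/{player_id}; Cristiano Ronaldo's player_id is 2371, Messi's is 2097, Neymar's is 2099 and Mbappé's is 3423. Each shot record includes the shot position (X, Y), expected goals (xG, the probability of scoring), the shot result (result), the shot type (shotType) and the season (season).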
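The URL pattern described above can be sketched as a small lookup. The names PLAYER_IDS, player_path and player_url are hypothetical, and the base address is left to the caller because the site address is not given in this text.

```python
# player_id values quoted in the text
PLAYER_IDS = {
    "Cristiano Ronaldo": 2371,
    "Messi": 2097,
    "Neymar": 2099,
    "Mbappe": 3423,
}


def player_path(name: str) -> str:
    """Return the site-relative page path /player/{player_id} for a known player."""
    return f"/player/{PLAYER_IDS[name]}"


def player_url(base_url: str, name: str) -> str:
    """Join a base address (an assumption, supplied by the caller) with the player path."""
    return base_url.rstrip("/") + player_path(name)
```

Keeping the ids in one dictionary means that switching the chart to another player only requires changing the name passed in.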

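The shot result (result) takes one of five values: 1) Goal (a goal); 2) ShotOnPost (the shot hit the post); 3) SavedShot (saved by the goalkeeper); 4) BlockedShot (blocked by a player); 5) MissedShots (off target).

The shot type (shotType) is one of: header, left foot, right foot, or another part of the body.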

Mbappé's data starts from the / season, and his page lists the seasons available (see Figure 2).

Figure 2. The Kylian Mbappé page

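2. Analysing the page

Right-click the page and view its source. Several very long string variables appear inside <script>...</script> tags.

The fourth <script> tag, counting in order, holds the shot data (see Figure 3).

Figure 3. Page source (partial)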

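What we want to extract is the content between the quotes in the following structure:

<script>
var shotsData = JSON.parse('...')
</script>

That content is JSON data. Note that JSON is a string format: it looks very much like a dictionary, but it is not a Python dictionary. To Python it is just a string, which the json module can convert.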

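json.loads() ==> converts a JSON string into a dictionary or a list of dictionaries

json.dumps() ==> converts a dictionary or a list of dictionaries into a JSON string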
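A minimal round trip through the two functions might look like the sketch below. The JSON text is a made-up fragment in the style of the shot records, not real data.

```python
import json

# A JSON string: to Python this is just text until it is parsed.
text = '[{"result": "Goal", "xG": "0.76", "player_assisted": null}]'

shots = json.loads(text)   # JSON string -> list of dictionaries
back = json.dumps(shots)   # list of dictionaries -> JSON string

first = shots[0]           # an ordinary Python dictionary
assisted = first["player_assisted"]   # JSON null becomes Python None
```

Note that json.dumps() does not necessarily reproduce the original text character for character (whitespace may differ), but parsing its output again yields the same Python objects.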

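JSON has two structural forms: objects and arrays.

An object starts with "{" and ends with "}". In between are key/value pairs separated by ",", as shown below:

{
    key1: value1,
    key2: value2,
    ...
}

Here each key must be a string in double quotes, while a value can be a string, a number, a Boolean, an array, an object or null.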

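An array starts with "[" and ends with "]". Its elements, here objects, are separated by ",", as shown below:

[
    {
        key1: value1,
        key2: value2
    },
    {
        key3: value3,
        key4: value4
    }
]

In Python this can be represented as a list whose elements are dictionaries (two-dimensional data in Python).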
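The correspondence between a JSON array of objects and a Python list of dictionaries can be illustrated as follows. This is only a sketch with invented values; it also shows that JSON keys are always strings, so a non-string Python key is converted when serialized.

```python
import json

# A JSON array containing two objects
array_text = '[{"result": "Goal", "minute": "18"}, {"result": "SavedShot", "minute": "23"}]'

rows = json.loads(array_text)                         # list of dictionaries
goals = [row for row in rows if row["result"] == "Goal"]

# JSON object keys must be strings: the integer key 1 is written as "1".
serialized = json.dumps({1: "a"})
```

Because the parsed value is an ordinary list, it can be filtered, sliced and iterated like any other Python sequence.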

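3. Extracting and decoding the data

The page scraped here uses the JSON array structure; once converted to Python it becomes a list whose elements are dictionaries.

Two short records from the beginning and end of the variable (Cristiano Ronaldo's data) are listed below for a first analysis. The data is a string in which punctuation and spaces are written as single-byte hexadecimal escapes of the form \xNN (ASCII codes, decimal 32 to 127), mixed with plain text. It must first be converted into a Python bytes object, then decoded into a JSON string, and finally turned into a list of Python dictionaries with json.loads().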

>>> a = r'\x5B\x7B\x22id\x22\x3A\x2232535\x22,\x22minute\x22\x3A\x2218\x22,\x22result\x22\x3A\x22SavedShot\x22,\x22X\x22\x3A\x220.845\x22,\x22Y\x22\x3A\x220.49900001525878906\x22,\x22xG\x22\x3A\x220.06659495085477829\x22,\x22player\x22\x3A\x22Cristiano\x20Ronaldo\x22,\x22h_a\x22\x3A\x22h\x22,\x22player_id\x22\x3A\x222371\x22,\x22situation\x22\x3A\x22SetPiece\x22,\x22season\x22\x3A\x22\x22,\x22shotType\x22\x3A\x22RightFoot\x22,\x22match_id\x22\x3A\x225834\x22,\x22h_team\x22\x3A\x22Real\x20Madrid\x22,\x22a_team\x22\x3A\x22Cordoba\x22,\x22h_goals\x22\x3A\x222\x22,\x22a_goals\x22\x3A\x220\x22,\x22date\x22\x3A\x22\x2D08\x2D25\x2019\x3A00\x3A00\x22,\x22player_assisted\x22\x3A\x22Luka\x20Modric\x22,\x22lastAction\x22\x3A\x22Pass\x22\x7D,\x7B\x22id\x22\x3A\x2242\x22,\x22minute\x22\x3A\x2223\x22,\x22result\x22\x3A\x22SavedShot\x22,\x22X\x22\x3A\x220.885\x22,\x22Y\x22\x3A\x220.5\x22,\x22xG\x22\x3A\x220.7612988352775574\x22,\x22player\x22\x3A\x22Cristiano\x20Ronaldo\x22,\x22h_a\x22\x3A\x22h\x22,\x22player_id\x22\x3A\x222371\x22,\x22situation\x22\x3A\x22Penalty\x22,\x22season\x22\x3A\x22\x22,\x22shotType\x22\x3A\x22RightFoot\x22,\x22match_id\x22\x3A\x2215790\x22,\x22h_team\x22\x3A\x22Juventus\x22,\x22a_team\x22\x3A\x22Inter\x22,\x22h_goals\x22\x3A\x223\x22,\x22a_goals\x22\x3A\x222\x22,\x22date\x22\x3A\x22\x2D05\x2D15\x2016\x3A00\x3A00\x22,\x22player_assisted\x22\x3Anull,\x22lastAction\x22\x3A\x22Standard\x22\x7D\x5D'

>>> import ast
>>> b = ast.literal_eval("b'" + a + "'")  # wrap the string in b'...' and evaluate it as a bytes literal; unlike eval(), literal_eval() accepts only literals and cannot run code embedded in the page
>>> b
b'[{"id":"32535","minute":"18","result":"SavedShot","X":"0.845","Y":"0.49900001525878906","xG":"0.06659495085477829","player":"Cristiano Ronaldo","h_a":"h","player_id":"2371","situation":"SetPiece","season":"","shotType":"RightFoot","match_id":"5834","h_team":"Real Madrid","a_team":"Cordoba","h_goals":"2","a_goals":"0","date":"-08-25 19:00:00","player_assisted":"Luka Modric","lastAction":"Pass"},{"id":"42","minute":"23","result":"SavedShot","X":"0.885","Y":"0.5","xG":"0.7612988352775574","player":"Cristiano Ronaldo","h_a":"h","player_id":"2371","situation":"Penalty","season":"","shotType":"RightFoot","match_id":"15790","h_team":"Juventus","a_team":"Inter","h_goals":"3","a_goals":"2","date":"-05-15 16:00:00","player_assisted":null,"lastAction":"Standard"}]'
>>> type(b)  # the result is a bytes object
<class 'bytes'>
>>> b.decode()  # decode to a string; the data is pure ASCII, so any ASCII-compatible encoding works
'[{"id":"32535","minute":"18","result":"SavedShot","X":"0.845","Y":"0.49900001525878906","xG":"0.06659495085477829","player":"Cristiano Ronaldo","h_a":"h","player_id":"2371","situation":"SetPiece","season":"","shotType":"RightFoot","match_id":"5834","h_team":"Real Madrid","a_team":"Cordoba","h_goals":"2","a_goals":"0","date":"-08-25 19:00:00","player_assisted":"Luka Modric","lastAction":"Pass"},{"id":"42","minute":"23","result":"SavedShot","X":"0.885","Y":"0.5","xG":"0.7612988352775574","player":"Cristiano Ronaldo","h_a":"h","player_id":"2371","situation":"Penalty","season":"","shotType":"RightFoot","match_id":"15790","h_team":"Juventus","a_team":"Inter","h_goals":"3","a_goals":"2","date":"-05-15 16:00:00","player_assisted":null,"lastAction":"Standard"}]'

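The important fields are the shot position (X, Y), expected goals (xG), the shot result (result) and the season (season). Expected goals is the predicted probability of scoring, so xG=1 means a certain goal. X and Y are relative values between 0 and 1, whereas the 'opta' pitch that mplsoccer draws on top of matplotlib uses coordinates from 0 to 100, so both are multiplied by 100. result=Goal marks a goal, and season holds the starting year of the season, which the complete code prints in the form season-(season+1).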
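Since every field arrives as a string, the conversion and scaling described above might be sketched as follows. The function name to_plot_record and the is_goal key are hypothetical; the sample values are taken from the second record shown earlier.

```python
def to_plot_record(shot: dict) -> dict:
    """Convert one raw shot record (all strings) into plot-ready numbers."""
    return {
        "X": float(shot["X"]) * 100,      # relative 0~1 -> pitch coordinate 0~100
        "Y": float(shot["Y"]) * 100,
        "xG": float(shot["xG"]),          # probability of scoring, kept as 0~1
        "is_goal": shot["result"] == "Goal",
    }


sample = {"X": "0.885", "Y": "0.5", "xG": "0.7612988352775574", "result": "SavedShot"}
record = to_plot_record(sample)
```

The complete code performs the same conversion on whole columns with pandas; the per-record version here only makes the individual steps visible.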

>>> import json  # import the json module
>>> json.loads(b.decode())  # convert the JSON data into a list of dictionaries
[{'id': '32535', 'minute': '18', 'result': 'SavedShot', 'X': '0.845', 'Y': '0.49900001525878906', 'xG': '0.06659495085477829', 'player': 'Cristiano Ronaldo', 'h_a': 'h', 'player_id': '2371', 'situation': 'SetPiece', 'season': '', 'shotType': 'RightFoot', 'match_id': '5834', 'h_team': 'Real Madrid', 'a_team': 'Cordoba', 'h_goals': '2', 'a_goals': '0', 'date': '-08-25 19:00:00', 'player_assisted': 'Luka Modric', 'lastAction': 'Pass'}, {'id': '42', 'minute': '23', 'result': 'SavedShot', 'X': '0.885', 'Y': '0.5', 'xG': '0.7612988352775574', 'player': 'Cristiano Ronaldo', 'h_a': 'h', 'player_id': '2371', 'situation': 'Penalty', 'season': '', 'shotType': 'RightFoot', 'match_id': '15790', 'h_team': 'Juventus', 'a_team': 'Inter', 'h_goals': '3', 'a_goals': '2', 'date': '-05-15 16:00:00', 'player_assisted': None, 'lastAction': 'Standard'}]
>>> json.loads(b)  # json.loads() also accepts bytes, so the explicit decode is optional
[{'id': '32535', 'minute': '18', 'result': 'SavedShot', 'X': '0.845', 'Y': '0.49900001525878906', 'xG': '0.06659495085477829', 'player': 'Cristiano Ronaldo', 'h_a': 'h', 'player_id': '2371', 'situation': 'SetPiece', 'season': '', 'shotType': 'RightFoot', 'match_id': '5834', 'h_team': 'Real Madrid', 'a_team': 'Cordoba', 'h_goals': '2', 'a_goals': '0', 'date': '-08-25 19:00:00', 'player_assisted': 'Luka Modric', 'lastAction': 'Pass'}, {'id': '42', 'minute': '23', 'result': 'SavedShot', 'X': '0.885', 'Y': '0.5', 'xG': '0.7612988352775574', 'player': 'Cristiano Ronaldo', 'h_a': 'h', 'player_id': '2371', 'situation': 'Penalty', 'season': '', 'shotType': 'RightFoot', 'match_id': '15790', 'h_team': 'Juventus', 'a_team': 'Inter', 'h_goals': '3', 'a_goals': '2', 'date': '-05-15 16:00:00', 'player_assisted': None, 'lastAction': 'Standard'}]
>>> type(json.loads(b))  # the result is a list
<class 'list'>
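The decoding steps above can also be wrapped in one function that does not evaluate a constructed literal at all. The sketch below swaps in the standard unicode_escape codec for that purpose; the function name decode_shots is hypothetical, and it assumes the escaped text is pure ASCII, as in the sample.

```python
import json


def decode_shots(escaped: str) -> list:
    """Turn text full of \\xNN escapes into a list of dictionaries."""
    # encode() yields the literal backslash sequences as bytes;
    # the unicode_escape codec then interprets each \xNN as one character.
    text = escaped.encode("ascii").decode("unicode_escape")
    return json.loads(text)


# A shortened record in the same escaped form as the sample above
raw = (r'\x5B\x7B\x22id\x22\x3A\x2242\x22,'
       r'\x22player\x22\x3A\x22Cristiano\x20Ronaldo\x22,'
       r'\x22player_assisted\x22\x3Anull\x7D\x5D')
shots = decode_shots(raw)
```

If the input ever contained non-ASCII characters, encode("ascii") would raise UnicodeEncodeError instead of silently producing wrong text, which is the safer failure mode for scraped data.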

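Good! With the analysis and background above, we can start scraping. The page is fetched with the get() method of the requests module, and the contents of the <script>...</script> tags are extracted with the find_all() method of the BeautifulSoup class in the BeautifulSoup4 module.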
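To show the extraction step without a network request, the sketch below works on an inline HTML fragment. It deliberately swaps BeautifulSoup for a regular expression from the standard library so that it runs with no third-party packages. The function name extract_json_var is hypothetical, and the variable name shotsData is assumed from the comment in the complete code.

```python
import json
import re


def extract_json_var(html: str, var_name: str) -> list:
    """Find `var <var_name> = JSON.parse('...')` in a page and decode its argument."""
    pattern = rf"var\s+{re.escape(var_name)}\s*=\s*JSON\.parse\('(.*?)'\)"
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        raise ValueError(f"{var_name} not found in page")
    escaped = match.group(1)                                   # text between the quotes
    text = escaped.encode("ascii").decode("unicode_escape")    # \xNN -> characters
    return json.loads(text)


# A made-up page fragment with the same structure as the real one
page = r"""<html><body>
<script>var other = 1;</script>
<script>
    var shotsData = JSON.parse('\x5B\x7B\x22result\x22\x3A\x22Goal\x22\x7D\x5D')
</script>
</body></html>"""

shots = extract_json_var(page, "shotsData")
```

Searching by variable name rather than by position (the fourth script tag) keeps the extraction working if the page gains or loses a script block, at the cost of depending on the exact JavaScript syntax.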

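4. Drawing scatter plots in matplotlib: the scatter() method

The scatter() function in the pyplot module draws a scatter plot. In recent matplotlib releases its signature is as follows (older releases also accepted verts and hold, which have since been removed):

matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None,
                          norm=None, vmin=None, vmax=None, alpha=None, linewidths=None,
                          *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)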

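The commonly used parameters are:

x, y: the data for the x axis and the y axis.

s: the marker size, given as an area in points squared. If a one-dimensional array is passed, it gives the size of each point.

c: the marker color. If a one-dimensional array is passed, it gives the color of each point.

marker: the marker style (the shape of the points); see Table 1.

alpha: the opacity of the points, a number between 0 and 1. With a large data set, a small alpha combined with an adjusted s makes overlapping points blend, so clusters in the data show up clearly.

cmap: the colormap used to map values to colors when c is an array of numbers.

Table 1. marker settings with their symbols and descriptions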
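A minimal sketch of how these parameters combine for a shot map is given below. It uses plain matplotlib axes rather than a drawn pitch, and the helper names marker_sizes and draw_shots are hypothetical. matplotlib is imported inside draw_shots so that the size helper can be used on its own.

```python
import math


def marker_sizes(xg_values, scale=100.0):
    """Marker areas of sqrt(xG) * scale, the same rule the complete code uses."""
    return [math.sqrt(xg) * scale for xg in xg_values]


def draw_shots(x, y, xg, is_goal, path="shots.png"):
    """Draw non-goals and goals as two scatter layers and save the figure."""
    import matplotlib
    matplotlib.use("Agg")                 # off-screen backend, no window needed
    import matplotlib.pyplot as plt

    sizes = marker_sizes(xg)
    fig, ax = plt.subplots(figsize=(7, 5.6))
    layers = ((False, "cyan", 0.5, "No goal"), (True, "crimson", 0.7, "Goal"))
    for flag, color, alpha, label in layers:
        idx = [i for i, goal in enumerate(is_goal) if goal == flag]
        ax.scatter([x[i] for i in idx], [y[i] for i in idx],
                   s=[sizes[i] for i in idx],      # one size per point
                   c=color, alpha=alpha, marker="o",
                   edgecolors="black", label=label)
    ax.legend(loc="lower right")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling draw_shots([84.5, 88.5], [49.9, 50.0], [0.07, 0.76], [False, True]) would write a two-point chart to shots.png. Taking the square root before scaling keeps low-probability shots visible, since s is an area and small xG values would otherwise shrink to dots.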

5. Complete code

The complete code is as follows:

#############################################
# Design: Zhang Ruilin   Created: -01-10 18:35
# Revised: -12-28 10:13
# Shot map of a football player drawn with Matplotlib
#############################################
import ast                       # safe evaluation of a bytes literal
import json                      # JSON <-> Python objects
import requests                  # fetches the web page
from bs4 import BeautifulSoup    # parses the page and extracts elements
import pandas as pd              # tabular data handling
import matplotlib.pyplot as plt  # MATLAB-style plotting interface
import numpy as np               # numerical functions
import matplotlib as mpl
import mplsoccer                 # draws football pitches

# Kylian Mbappé's player_id is 3423
url = '/player/3423'
html = requests.get(url)                              # fetch the page
soup_parse = BeautifulSoup(html.content, 'lxml')      # parse the page
scripts = soup_parse.find_all('script')               # list of all <script> tags
strings = scripts[3].string                           # fourth <script>: holds the shotsData variable
_start = strings.index("('") + 2                      # first character after JSON.parse('
_end = strings.index("')")                            # the closing ' before ), not included
json_data = strings[_start:_end]                      # escaped JSON text between the quotes
json_data = ast.literal_eval("b'" + json_data + "'")  # turn the \xNN escapes into bytes without eval()
data = json.loads(json_data)                          # list of dictionaries

# Keep shot position (X, Y), expected goals (xG), result and season
df_data = pd.DataFrame(data)[['X', 'Y', 'xG', 'result', 'season']]
df_data.columns = ['X', 'Y', 'xG', 'Result', 'Season']
num_cols = ['X', 'Y', 'xG', 'Season']
df_data[num_cols] = df_data[num_cols].apply(pd.to_numeric)  # numeric strings -> numbers
df_data['X'] = df_data['X'] * 100                     # source values are relative (0~1);
df_data['Y'] = df_data['Y'] * 100                     # the 'opta' pitch uses 0~100
# df_data.to_csv(r'd:/Mbappé_shooting.csv')           # optionally save to a file

background, text_color = 'lightgray', 'black'         # background (light gray) and text (black) colors
mpl.rcParams['text.color'] = text_color               # text color
mpl.rcParams['font.sans-serif'] = ['simsun']          # SimSun font; only needed for Chinese labels
mpl.rcParams['legend.fontsize'] = 15                  # legend font size: 15 pt
fig, ax = plt.subplots(figsize=(7, 5.6))              # new 7 x 5.6 inch figure
ax.axis('off')                                        # hide the default axes
fig.set_facecolor(background)                         # fill with the background color
pitch = mplsoccer.VerticalPitch(half=True, pitch_type='opta',
                                line_zorder=3, pitch_color='grass')  # vertical half pitch
axes = fig.add_axes((0.05, 0.06, 0.9, 0.9))           # lower-left corner (0.05, 0.06), width and height 90%
axes.patch.set_facecolor(background)
pitch.draw(ax=axes)

season=                                               # season to plot (starting year); range: ~ (year of the run - 1)
df = df_data.loc[df_data['Season'] == season]         # rows of the chosen season
missed = df[df['Result'] != 'Goal']                   # shots that did not score
scored = df[df['Result'] == 'Goal']                   # shots that scored

# Shots without a goal: cyan, opacity 0.5
pitch.scatter(missed['X'], missed['Y'], s=np.sqrt(missed['xG']) * 100,
              marker='o', alpha=0.5, edgecolor='black', facecolor='cyan',
              ax=axes, label='No goal')
# Goals: crimson, opacity 0.7
pitch.scatter(scored['X'], scored['Y'], s=np.sqrt(scored['xG']) * 100,
              marker='o', alpha=0.7, edgecolor='black', facecolor='crimson',
              ax=axes, label='Goal')
axes.legend(loc='lower right')                        # add the legend

# Text annotations
axes.text(25, 64, f"Expected goals: {df['xG'].sum():.2f}", weight='bold', size=14)  # sum of xG
axes.text(25, 61, f"Goals: {len(scored)}", weight='bold', size=14)                  # rows with Result == 'Goal'
axes.text(25, 58, f"Shots: {len(df)}", weight='bold', size=14)                      # rows in the season
axes.text(95, 60, f'{season}-{season+1} season', weight='bold', size=18)
plt.show()

The result is shown in Figure 4.

Figure 4. Shot map for Kylian Mbappé
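The three figures printed on the chart (expected goals, goals and shots for one season) can be computed without pandas, as in the sketch below. The function name summarize is hypothetical, and the records and season values are invented purely for illustration.

```python
def summarize(shots, season):
    """Return (total xG, number of goals, number of shots) for one season."""
    in_season = [s for s in shots if int(s["season"]) == season]
    xg_total = sum(float(s["xG"]) for s in in_season)
    goals = sum(1 for s in in_season if s["result"] == "Goal")
    return xg_total, goals, len(in_season)


# Invented records; the fields are strings, as in the scraped data.
demo = [
    {"season": "2000", "xG": "0.5", "result": "Goal"},
    {"season": "2000", "xG": "0.25", "result": "SavedShot"},
    {"season": "2001", "xG": "0.125", "result": "Goal"},
]
xg_total, goals, shot_count = summarize(demo, 2000)
```

Comparing the goal count with the xG total gives a quick read on finishing: a player whose goals exceed the summed xG scored more often than the shot positions alone would predict.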
