Ex2 - Filtering and Sorting Data
This time we are going to pull data directly from the internet.
Step 1. Import the necessary libraries
import pandas as pd
import numpy as np
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called euro12.
euro12 = pd.read_csv(r'C:\Users\HP\Desktop\Euro__stats_TEAM.csv')
euro12.head()
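The call above reads a local copy of the file, but `pd.read_csv` also accepts a URL or any file-like object in place of a path. As a minimal sketch that runs without the original file, the example below parses a small in-memory CSV. The team names and numbers are invented for illustration and are not taken from the Euro 2012 dataset.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt; all values are made up for illustration.
csv_text = """Team,Goals,Yellow Cards,Red Cards
Alpha,4,9,0
Beta,12,11,0
Gamma,5,9,1
"""

# read_csv takes a path, a URL string, or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

# head() returns the first rows (up to five by default) for a quick check.
print(sample.head())
```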
Step 4. Select only the Goals column.
euro12['Goals']
Step 5. How many teams participated in the Euro?
euro12['Team'].nunique()
Step 6. What is the number of columns in the dataset?
euro12.shape[1]
Step 7. View only the columns Team, Yellow Cards and Red Cards, and assign them to a dataframe called discipline
discipline = euro12[['Team', 'Yellow Cards', 'Red Cards']]  # Method 1
discipline.head()
discipline = euro12.loc[:, ['Team', 'Yellow Cards', 'Red Cards']]  # Method 2
discipline.head()
Step 8. Sort the teams by Red Cards, then by Yellow Cards
discipline.sort_values(['Red Cards', 'Yellow Cards'], ascending=False)
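When `sort_values` receives a list of columns, the first column is the primary key and later columns only break ties. The sketch below shows this on a small hypothetical frame (team names and card counts are invented). Passing a list to `ascending` sets the direction per column.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
toy = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma", "Delta"],
    "Yellow Cards": [9, 11, 9, 4],
    "Red Cards": [0, 1, 1, 0],
})

# Red Cards is compared first; Yellow Cards decides the order among ties.
# One flag per column: both are sorted in descending order here.
ranked = toy.sort_values(["Red Cards", "Yellow Cards"], ascending=[False, False])

print(ranked["Team"].tolist())  # ['Beta', 'Gamma', 'Alpha', 'Delta']
```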
Step 9. Calculate the mean Yellow Cards given per Team
discipline.groupby('Team').agg({'Yellow Cards': 'sum'}).mean()
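The grouped expression above sums the cards per team and then averages those sums. If each team occupies exactly one row, which is an assumption about the dataset, taking the mean of the column directly gives the same number. A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical data with one row per team; values are invented.
toy = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma", "Delta"],
    "Yellow Cards": [9, 11, 9, 3],
})

# Sum per team, then average the per-team totals.
per_team = toy.groupby("Team")["Yellow Cards"].sum()
mean_via_groupby = per_team.mean()

# With one row per team, the plain column mean is equivalent.
mean_direct = toy["Yellow Cards"].mean()

print(mean_via_groupby, mean_direct)  # 8.0 8.0
```

The two results would differ if a team appeared on several rows, in which case the grouped version is the one that answers "per team".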
Step 10. Filter teams that scored more than 6 goals
scored = euro12['Goals'] > 6
scored.head(6)
euro12.loc[scored, :]
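The comparison produces a boolean Series aligned with the rows, and `.loc` keeps only the rows where it is `True`. The sketch below repeats the pattern on a small hypothetical frame (invented names and goal counts) and shows that a team with exactly 6 goals is excluded by the strict `>`.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
toy = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
    "Goals": [4, 12, 5, 10, 6],
})

# Boolean mask: True where the team scored strictly more than 6 goals.
mask = toy["Goals"] > 6

# Rows are kept where the mask is True; the column slice keeps every column.
high_scorers = toy.loc[mask, :]

print(high_scorers["Team"].tolist())  # ['Beta', 'Delta']
```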
Step 11. Select the teams that start with G
isG = euro12['Team'].str[0] == "G"
isG.head()
euro12.loc[isG,:]
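An alternative to comparing the first character is the string accessor's `str.startswith`, which reads closer to the question being asked. A minimal sketch on hypothetical team names:

```python
import pandas as pd

# Hypothetical team names, invented for illustration only.
toy = pd.DataFrame({
    "Team": ["Gamma", "Alpha", "Galia", "Beta"],
    "Goals": [5, 4, 3, 12],
})

# str.startswith returns a boolean Series, one value per row.
starts_with_g = toy["Team"].str.startswith("G")

g_teams = toy.loc[starts_with_g, :]

print(g_teams["Team"].tolist())  # ['Gamma', 'Galia']
```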
Step 12. Select the first 7 columns
euro12.iloc[:,0:7].head()
Step 13. Select all columns except the last 3.
euro12.iloc[:,:-3].head()
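`iloc` slices by position, so `0:n` keeps the first n columns and `:-n` drops the last n, following the usual Python slice rules. The sketch below uses a hypothetical five-column frame (column names and values are invented), where the first three columns and "all but the last two" happen to coincide.

```python
import pandas as pd

# Hypothetical five-column frame, invented for illustration only.
toy = pd.DataFrame({
    "Team": ["Alpha", "Beta"],
    "Goals": [4, 12],
    "Shots on target": [13, 42],
    "Yellow Cards": [9, 11],
    "Red Cards": [0, 1],
})

# First three columns: positions 0, 1 and 2 (the stop index is excluded).
first_three = toy.iloc[:, 0:3]

# Everything except the last two columns: a negative stop counts from the end.
without_last_two = toy.iloc[:, :-2]

print(list(first_three.columns))
print(list(without_last_two.columns))
```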
Step 14. Present only the Shooting Accuracy from England, Italy and Russia
a = (euro12['Team'] == "England") | (euro12['Team'] == "Italy") | (euro12['Team'] == "Russia")
a.head()
euro12.loc[a,"Shooting Accuracy"]
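Chaining `==` comparisons with `|` works, but `Series.isin` builds the same kind of mask from a list and scales better as the list grows. The sketch below uses hypothetical team names and invented accuracy values. Selecting two columns keeps the team name next to each figure.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
toy = pd.DataFrame({
    "Team": ["Alpha", "Beta", "Gamma", "Delta"],
    "Shooting Accuracy": ["50%", "40%", "30%", "20%"],
})

# isin is True for every row whose Team appears in the list.
wanted = ["Alpha", "Gamma"]
mask = toy["Team"].isin(wanted)

# Keep the matching rows and only the two columns of interest.
result = toy.loc[mask, ["Team", "Shooting Accuracy"]]

print(result)
```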