1. Ensemble algorithms in sklearn
sklearn's ensemble algorithms live in the `ensemble` module.
2. Prediction code and results
```python
%matplotlib inline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, cross_val_score
import matplotlib.pyplot as plt

# Load the dataset
wine = load_wine()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3)

# Build the models
clf = DecisionTreeClassifier(random_state=0)  # tree building is randomized; fixing the seed pins down this one tree
rfc = RandomForestClassifier(random_state=0)  # fixing the seed pins down this one forest

# Train -- both models must be fit on the training set
clf = clf.fit(x_train, y_train)
rfc = rfc.fit(x_train, y_train)

# Classification accuracy on the held-out test set
score_c = clf.score(x_test, y_test)
score_r = rfc.score(x_test, y_test)
print("Single Tree:{}".format(score_c),
      "Random Forest:{}".format(score_r))
```
Result:
The random forest clearly outperforms the single decision tree.
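The gap can be made intuitive with a back-of-the-envelope calculation: under the idealized assumption that each of 25 trees errs independently with probability 0.2, the majority vote of the forest is wrong only when 13 or more trees err at once, which is far less likely than a single tree erring. A minimal sketch (the 0.2 per-tree error rate is an assumed number for illustration):

```python
from math import comb

epsilon = 0.2  # assumed per-tree error rate (illustrative)
n_trees = 25

# Probability that the majority vote (13+ of 25 trees) is wrong,
# assuming independent tree errors -- a Binomial(25, 0.2) tail.
forest_error = sum(
    comb(n_trees, i) * epsilon**i * (1 - epsilon)**(n_trees - i)
    for i in range(13, n_trees + 1)
)
print(forest_error)  # well below the single-tree error of 0.2
```

Real trees in a forest are correlated, so the true gain is smaller than this bound, but the direction of the effect is the same.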
3.1 Random forest vs. decision tree under one round of cross-validation
```python
# Plot random forest vs. decision tree under one round of 10-fold cross-validation
rfc = RandomForestClassifier(n_estimators=25)
rfc_s = cross_val_score(rfc, wine.data, wine.target, cv=10)

clf = DecisionTreeClassifier()
clf_s = cross_val_score(clf, wine.data, wine.target, cv=10)

plt.plot(range(1, 11), rfc_s, label='RandomForest')
plt.plot(range(1, 11), clf_s, label='Decision Tree')
plt.legend()
plt.show()
```
Result:
The x-axis is the fold index within one round of 10-fold cross-validation; the y-axis is the prediction accuracy on that fold.
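Accuracy here is simply the fraction of correct predictions; `cross_val_score` uses the estimator's default scorer, which for classifiers is accuracy. A minimal sketch with made-up labels:

```python
import numpy as np

# Hypothetical true labels and predictions for 5 samples
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 1, 2, 1])

# Accuracy = number of correct predictions / total predictions
acc = (y_true == y_pred).mean()
print(acc)  # → 0.8 (4 of 5 correct)
```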
3.2 Random forest vs. decision tree over ten rounds of cross-validation
```python
# Plot random forest vs. decision tree over ten rounds of 10-fold cross-validation
rfc_l = []
clf_l = []
for i in range(10):
    rfc = RandomForestClassifier(n_estimators=25)
    rfc_s = cross_val_score(rfc, wine.data, wine.target, cv=10).mean()
    rfc_l.append(rfc_s)

    clf = DecisionTreeClassifier()
    clf_s = cross_val_score(clf, wine.data, wine.target, cv=10).mean()
    clf_l.append(clf_s)

plt.plot(range(1, 11), rfc_l, label='RandomForest')
plt.plot(range(1, 11), clf_l, label='Decision Tree')
plt.legend()
plt.show()
```
Result:
The x-axis is the cross-validation round (1-10); the y-axis is the mean accuracy over the 10 folds of that round.
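For reference, each plotted point, `cross_val_score(..., cv=10).mean()`, is equivalent to splitting the data into 10 stratified folds, fitting on 9 folds, scoring on the held-out fold, and averaging the 10 scores. A sketch spelling that out by hand:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()

# cross_val_score with an integer cv uses StratifiedKFold for classifiers
skf = StratifiedKFold(n_splits=10)
scores = []
for train_idx, test_idx in skf.split(wine.data, wine.target):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(wine.data[train_idx], wine.target[train_idx])
    # Score on the one held-out fold
    scores.append(clf.score(wine.data[test_idx], wine.target[test_idx]))

print(sum(scores) / len(scores))  # mean accuracy across the 10 folds
```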
[PS] These are my study notes from 菜菜's sklearn video course~