
Chapter 4: The Most Basic Classification Algorithm, k-Nearest Neighbors (kNN). Study Notes (Part 2)


Contents

4-5 Hyper-Parameters

4-6 Grid Search and More Hyper-Parameters in kNN

4-5 Hyper-Parameters

random_state=666 sets the random seed, guaranteeing the same result on every run.
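The snippets below assume the data has already been loaded and split. A minimal sketch of that setup (the course uses the handwritten-digits dataset; treat the exact dataset and test size as assumptions):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed setup: load the digits dataset and split it with a fixed
# random seed so every run produces the same partition
digits = datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=666)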

best_score = 0.0
best_k = -1
for k in range(1, 11):
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train, y_train)
    score = knn_clf.score(X_test, y_test)
    if score > best_score:
        best_k = k
        best_score = score
print("best_k =", best_k)
print("best_score =", best_score)

If the best value lies on the boundary of the search range, an even better value may exist outside it. For example, if best_k comes out as 10, values above 10 should also be tried, as sketched below.
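A sketch of that follow-up search (the extended range 10 to 20 is a hypothetical choice):

# If best_k landed on the upper boundary, extend the search upward
if best_k == 10:
    for k in range(10, 21):
        knn_clf = KNeighborsClassifier(n_neighbors=k)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
    print("best_k =", best_k)
    print("best_score =", best_score)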

The search above only counts votes; every neighbor is unweighted. It is more reasonable to give nearer neighbors a larger say.

A common choice is to weight each neighbor by the inverse of its distance.

Distance weighting also resolves ties: with one unweighted vote per class, uniform voting can end in a tie, while weighted votes rarely do.
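A worked example: suppose the three nearest neighbors are one blue point at distance 1 and two red points at distances 3 and 4. Plain voting elects red 2 to 1, but with inverse-distance weights blue scores 1/1 = 1 while red scores 1/3 + 1/4 = 7/12, so blue wins.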

sklearn.neighbors.KNeighborsClassifier — scikit-learn 1.0 documentation

See the official documentation linked above for the full parameter descriptions.

best_score = 0.0
best_k = -1
best_method = ""
for method in ["uniform", "distance"]:
    for k in range(1, 11):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method
print("best_method =", best_method)
print("best_k =", best_k)
print("best_score =", best_score)


Manhattan distance and Euclidean distance share a consistent mathematical form, and generalizing that form yields the Minkowski distance.

In the Minkowski distance, p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance; p is therefore yet another hyper-parameter.
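Written out, the Minkowski distance between two n-dimensional samples x^{(a)} and x^{(b)} is

\[ d\left(x^{(a)}, x^{(b)}\right) = \left( \sum_{i=1}^{n} \left| x_i^{(a)} - x_i^{(b)} \right|^{p} \right)^{1/p} \]

with p = 1 recovering the Manhattan distance and p = 2 the Euclidean distance.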

best_score = 0.0
best_k = -1
best_p = -1
for k in range(1, 11):
    for p in range(1, 6):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights="distance", p=p)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_p = p
            best_score = score
print("best_k =", best_k)
print("best_p =", best_p)
print("best_score =", best_score)

Note that p only enters the search together with weights='distance'; the 'uniform' case is searched without it.

4-6 Grid Search and More Hyper-Parameters in kNN

param_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': [i for i in range(1, 11)]
    },
    {
        'weights': ['distance'],
        'n_neighbors': [i for i in range(1, 11)],
        'p': [i for i in range(1, 6)]
    }
]

param_grid is a list of dictionaries; each dictionary defines one set of parameter candidates to explore. The 'uniform' entry contributes 10 candidate combinations, and the 'distance' entry contributes 10 * 5 = 50.

knn_clf = KNeighborsClassifier()

Altogether, the search evaluates 10 + 50 = 60 different parameter combinations.

The best weights found can differ between two runs (and from the manual search above), because GridSearchCV scores each combination with cross-validation (CV) rather than a single train/test evaluation; the outcome depends on that scheme.

n_jobs specifies how many CPU cores to use for the parallel search; -1 means use all available cores.

By default the run prints nothing; the larger the verbose value, the more detailed the progress output, and that output is the whole point of setting verbose.
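Putting these notes together, a minimal run of scikit-learn's grid search (reusing param_grid and the earlier train/test split) might look like this:

from sklearn.model_selection import GridSearchCV

knn_clf = KNeighborsClassifier()
# n_jobs=-1: search on all cores; verbose=2: print progress for each fit
grid_search = GridSearchCV(knn_clf, param_grid, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)

print(grid_search.best_score_)    # best cross-validated score
print(grid_search.best_params_)   # winning parameter combination
knn_clf = grid_search.best_estimator_  # fitted classifier with the best parameters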

An iris classification example

import seaborn as sns
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X = iris.data[:, :2]   # keep only the first two features so the boundary can be drawn
# X = iris.data
y = iris.target

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=6)

# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ['darkorange', 'c', 'darkblue']

h = .02  # step size in the mesh

def drawBoundary(knn_clf, n_neighbors, weights):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max] x [y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = knn_clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, cmap=cmap_light)
    # plt.contour(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=iris.target_names[y],
                    palette=cmap_bold, alpha=1.0, edgecolor="black")
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.show()  # with several figures, each must be closed before the next appears

# Hand-rolled grid search
best_score = 0.0
best_k = -1
best_method = ""
for method in ["uniform", "distance"]:
    for k in range(1, 18):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method
            # drawBoundary(knn_clf, best_k, best_method)
            # Plotting here blocks on every figure until it is closed;
            # calling the plot once after the loop avoids that.
# plt.show()  # Must follow drawBoundary or the figure is never rendered;
#             # when single-stepping only part of it shows, and nothing
#             # shows once the program has finished.
print("best_method =", best_method)
print("best_k =", best_k)
print("best_score =", best_score)

# Grid search using scikit-learn
param_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': [i for i in range(1, 18)]
    },
    {
        'weights': ['distance'],
        'n_neighbors': [i for i in range(1, 18)],
        'p': [i for i in range(1, 6)]
    }
]

from sklearn.model_selection import GridSearchCV
clf = KNeighborsClassifier()
grid_search = GridSearchCV(clf, param_grid, n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train)
print(10 * "-------------")
print("best:%f using %s" % (grid_search.best_score_, grid_search.best_params_))
# print(grid_search.best_params_['n_neighbors'])
# print(grid_search.best_params_['weights'])
# print(grid_search.best_estimator_)
# means = grid_search.cv_results_['mean_test_score']
# params = grid_search.cv_results_['params']
# for mean, param in zip(means, params):
#     print("%f with: %r" % (mean, param))

drawBoundary(grid_search.best_estimator_,
             grid_search.best_params_['n_neighbors'],
             grid_search.best_params_['weights'])

Reading data with pandas

The data type pandas returns when reading an Excel file is not the same as NumPy's.

pandas gives a DataFrame; NumPy works with arrays.

(Excel spreadsheet data shown here.)
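A minimal sketch of the difference (the file name data.xlsx is hypothetical):

import pandas as pd

df = pd.read_excel("data.xlsx")  # hypothetical file; returns a pandas DataFrame
print(type(df))                  # <class 'pandas.core.frame.DataFrame'>

X = df.to_numpy()                # convert to a NumPy array for scikit-learn
print(type(X))                   # <class 'numpy.ndarray'>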

Other hyper-parameters

sklearn.neighbors.DistanceMetric — scikit-learn 1.0 documentation
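Beyond p, the distance metric itself can be set directly on the classifier. A sketch (the parameter values are illustrative):

from sklearn.neighbors import KNeighborsClassifier

# Use Manhattan distance explicitly instead of tuning p
knn_clf = KNeighborsClassifier(n_neighbors=5, metric="manhattan")
knn_clf.fit(X_train, y_train)
print(knn_clf.score(X_test, y_test))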
