
Applied Predictive Modeling, Chapter 6 (Linear Regression), Exercise 6.1: PCA, Optimal Tuning-Parameter Selection, and Model Comparison


Models: multiple linear regression, robust regression, partial least squares regression, ridge regression, lasso regression, elastic net

Language: R

Reference: Applied Predictive Modeling by Max Kuhn and Kjell Johnson (Chinese translation by Lin Hui et al.)

The exercise:

(b) In this example the predictors are measurements of absorbance at individual frequencies. Because the frequencies lie in a systematic order (850–1050 nm), the predictors are highly correlated, and the data actually occupy a dimension lower than the full 100. Use PCA to determine the effective dimension of these data. What is it?

(c) Split the data into a training set and a test set, preprocess the data, and build the various models described in this chapter. For the models with tuning parameters, what are the optimal values of those parameters?

(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?

(e) Explain which model you would use to predict the fat content of a sample.

Loading the data

library(caret)
# load the data
data(tecator)
head(absorp)
head(endpoints)

> # load the data
> data(tecator)
> head(absorp)
        [,1]    [,2]    [,3]    [,4]    [,5]    [,6]
[1,] 2.61776 2.61814 2.61859 2.61912 2.61981 2.62071
[2,] 2.83454 2.83871 2.84283 2.84705 2.85138 2.85587
[3,] 2.58284 2.58458 2.58629 2.58808 2.58996 2.59192
[4,] 2.82286 2.82460 2.82630 2.82814 2.83001 2.83192
[5,] 2.78813 2.78989 2.79167 2.79350 2.79538 2.79746
[6,] 3.00993 3.01540 3.02086 3.02634 3.03190 3.03756
(columns 7 through 100 of the absorbance matrix omitted here for readability)
> head(endpoints)
     [,1] [,2] [,3]
[1,] 60.5 22.5 16.7
[2,] 46.0 40.1 13.5
[3,] 71.0  8.4 20.5
[4,] 72.8  5.9 20.7
[5,] 58.3 25.5 15.5
[6,] 44.0 42.7 13.7

(b) In this example the predictors are measurements of absorbance at individual frequencies. Because the frequencies lie in a systematic order (850–1050 nm), the predictors are highly correlated, and the data actually occupy a dimension lower than the full 100. Use PCA to determine the effective dimension of these data. What is it?

Principal component analysis

When applying principal component analysis, keep the following five points in mind:

1. The analysis can start from either the sample covariance matrix or the correlation matrix; in most cases the correlation matrix is used.
2. To keep the explained variance maximal, PCA is usually performed without rotation.
3. Component retention can follow three reference methods: (a) keep the components with eigenvalues greater than 1; (b) use a scree plot and keep the components above the point where the curve bends most sharply; (c) use parallel analysis, comparing the eigenvalues of the real data with those of simulated data and keeping the components whose real eigenvalues are larger (see the sketch after this list).
4. In applied research, explaining 80% of the variance with no more than three to five components is considered satisfactory.
5. Component scores maximize the variance of each component, and the components are mutually independent (orthogonal).
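Point 3(c), parallel analysis, is not demonstrated in this write-up; a minimal sketch, assuming the psych package (which is not loaded anywhere above), would be:

# parallel analysis: compare the observed eigenvalues with the eigenvalues of
# random data of the same size, and retain the components whose observed
# eigenvalue is the larger of the two
library(psych)
fa.parallel(absorp, fa = "pc", n.iter = 20)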

PCA <- princomp(absorp, cor = TRUE)
# cor = TRUE bases the analysis on the correlation matrix (unit diagonal,
# off-diagonal entries between -1 and 1) instead of the covariance matrix
help(princomp)

There are 100 components in total. Inspect their importance: each component's standard deviation, proportion of variance explained, and cumulative proportion of variance explained.

As the summary below shows, the first component alone explains 98.62% of the variance.

> summary(PCA)
Importance of components:
                          Comp.1      Comp.2      Comp.3      Comp.4       Comp.5       Comp.6
Standard deviation     9.9310721 0.984736121 0.528511377 0.338274841 8.037979e-02 5.123077e-02
Proportion of Variance 0.9862619 0.009697052 0.002793243 0.001144299 6.460911e-05 2.624591e-05
Cumulative Proportion  0.9862619 0.995958978 0.998752221 0.999896520 9.999611e-01 9.999874e-01
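These proportions can be reproduced from the standard deviations: with cor = TRUE the eigenvalues (the squared standard deviations) sum to the number of variables, 100. A quick check, added here for illustration:

# variance explained, recomputed from the component standard deviations
prop <- PCA$sdev^2 / sum(PCA$sdev^2)
round(prop[1:4], 6)          # 0.986262 0.009697 0.002793 0.001144
round(cumsum(prop)[1:4], 6)  # 0.986262 0.995959 0.998752 0.999897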

Each component's standard deviation is the square root of the corresponding eigenvalue, so one option is to keep the components whose eigenvalues are greater than 1.

In this example, that means keeping only the first component.
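A one-line check of the eigenvalue rule (a sketch, not part of the original analysis):

# the squared standard deviations are the eigenvalues of the correlation matrix
sum(PCA$sdev^2 > 1)   # returns 1: only Comp.1 passes the rule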

> PCA$sdev
      Comp.1       Comp.2       Comp.3       Comp.4       Comp.5       Comp.6       Comp.7
9.931072e+00 9.847361e-01 5.285114e-01 3.382748e-01 8.037979e-02 5.123077e-02 2.680884e-02
      Comp.8       Comp.9      Comp.10      Comp.11      Comp.12      Comp.13      Comp.14
1.960880e-02 8.564232e-03 6.739417e-03 4.441898e-03 3.360852e-03 1.867188e-03 1.376574e-03

The scree plot tells the same story: the curve flattens out from the second component onward, so only the first component needs to be kept.

screeplot(PCA, type = "lines")

Conclusion: the effective dimension of these data is 1.

(c) Split the data into a training set and a test set, preprocess the data, and build the various models described in this chapter. For the models with tuning parameters, what are the optimal values of those parameters?

Data preprocessing

# data preprocessing
summary(absorp)
summary(endpoints)
# positions of variables containing missing values (none in this example)
NAcol <- which(colSums(is.na(absorp)) > 0)
# compute the skewness of each predictor
library(e1071)
summary(apply(absorp, 2, skewness))
# convert the matrix to a data frame
absorp <- as.data.frame(absorp)
# Box-Cox transformation, column by column
boxcox <- function(x){
  trans <- BoxCoxTrans(x)
  x_bc <- predict(trans, x)
}
absorp.trans <- apply(absorp, 2, boxcox)
absorp.trans <- as.data.frame(absorp.trans)
# skewness is reduced
summary(apply(absorp.trans, 2, skewness))
# response variable: fat content
fat <- endpoints[, 2]

> # compute the skewness of each predictor
> library(e1071)
> summary(apply(absorp, 2, skewness))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8260  0.8432  0.8946  0.9027  0.9667  0.9976
> # skewness is reduced
> summary(apply(absorp.trans, 2, skewness))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.005178 0.021773 0.034949 0.034739 0.046840 0.066915
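As an aside, caret's preProcess can estimate and apply the same per-column Box-Cox transformation without the hand-written helper; a sketch whose results should closely match the loop above:

# equivalent Box-Cox preprocessing via caret
pp <- preProcess(absorp, method = "BoxCox")
absorp.trans2 <- predict(pp, absorp)
summary(apply(absorp.trans2, 2, skewness))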

Model building.

Because this data set is small (215 samples) and the goal of the modeling is to choose among several models (both selecting the optimal tuning parameters within an algorithm and choosing between different algorithms), the bootstrap is used as the resampling method.

# the resampling method is the bootstrap: the sample is small and the goal is
# model selection (both tuning within an algorithm and choosing between algorithms)
# set the random seed so the resampled data sets are reproducible
set.seed(100)
indx <- createResample(fat, times = 50, list = TRUE)
ctrl <- trainControl(method = "boot", number = 50, index = indx)

Linear regression

# ordinary linear regression
set.seed(100)
lmFit1 <- train(x = absorp.trans, y = fat,
                method = "lm",
                trControl = ctrl,
                preProc = c("center", "scale"))
lmFit1

> lmFit1
Linear Regression

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results:

  RMSE      Rsquared   MAE
  2.706693  0.9560697  1.87457

Tuning parameter 'intercept' was held constant at a value of TRUE

Partial least squares (PLS)

# PLS
set.seed(100)
plsTune <- train(x = absorp.trans, y = fat,
                 method = "kernelpls",  # Dayal and MacGregor's first kernel algorithm
                 tuneGrid = expand.grid(ncomp = 1:50),  # candidate numbers of components
                 trControl = ctrl,
                 preProc = c("center", "scale"))
plsTune

PLS tuning parameter: the optimal number of components is 20.

> plsTune
Partial Least Squares

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  ncomp  RMSE       Rsquared   MAE
   1     11.183174  0.2396493  9.088124
   2      8.686219  0.5457929  6.929797
   3      5.436814  0.8171801  4.017126
   4      4.719667  0.8639848  3.637222
   5      3.183214  0.9391108  2.432778
   6      3.113421  0.9417685  2.407983
   7      2.981929  0.9478784  2.203364
   8      2.803669  0.9537681  2.004770
   9      2.658721  0.9578675  1.851331
  10      2.486691  0.9635027  1.724633
  11      2.291300  0.9688312  1.594113
  12      2.148954  0.9725424  1.527899
  13      2.046004  0.9746878  1.462254
  14      2.019919  0.9752486  1.425622
  15      1.909752  0.9777992  1.336137
  16      1.760398  0.9808560  1.224250
  17      1.666462  0.9829127  1.171777
  18      1.590492  0.9845054  1.134067
  19      1.567033  0.9849643  1.128898
  20      1.534394  0.9855824  1.113200
  21      1.560273  0.9850988  1.119986
  22      1.566204  0.9849703  1.115952
  23      1.553964  0.9851720  1.105929
  24      1.591527  0.9845027  1.118788
  25      1.625377  0.9838303  1.135157
  26      1.658889  0.9831096  1.157611
  27      1.683492  0.9824806  1.172554
  28      1.744393  0.9811685  1.208383
  29      1.795215  0.9800030  1.244794
  30      1.848273  0.9788338  1.286987
  31      1.883307  0.9780175  1.318409
  32      1.938951  0.9767456  1.360510
  33      1.986740  0.9755708  1.392391
  34      2.023170  0.9746550  1.422097
  35      2.071566  0.9734367  1.453087
  36      2.112281  0.9724229  1.478971
  37      2.146216  0.9715039  1.500828
  38      2.175165  0.9708326  1.520763
  39      2.192173  0.9705042  1.536536
  40      2.222708  0.9697809  1.557308
  41      2.246722  0.9692350  1.575675
  42      2.256637  0.9689731  1.587785
  43      2.274497  0.9685442  1.604375
  44      2.306888  0.9677469  1.627217
  45      2.329405  0.9671802  1.644553
  46      2.359832  0.9663816  1.662377
  47      2.374701  0.9659943  1.673540
  48      2.404638  0.965      1.691980
  49      2.433347  0.9643316  1.709945
  50      2.449268  0.9638598  1.721255

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 20.
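Rather than scanning the table, the resampling profile can be plotted; a short sketch using caret's plot method for train objects:

# RMSE against the number of PLS components; the curve bottoms out at ncomp = 20
plot(plsTune, metric = "RMSE")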

Robust regression

# robust regression
set.seed(100)
rlmFit <- train(x = absorp.trans, y = fat,
                method = "rlm",
                trControl = ctrl,
                preProc = c("center", "scale"))
rlmFit

The selected model includes an intercept and uses the Huber psi function.

> rlmFit
Robust Linear Model

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  intercept  psi           RMSE       Rsquared   MAE
  FALSE      psi.huber     18.145830  0.9560697  17.940133
  FALSE      psi.hampel    18.145830  0.9560697  17.940133
  FALSE      psi.bisquare  18.146153  0.9560864  17.940487
   TRUE      psi.huber      3.25      0.9385840   2.215829
   TRUE      psi.hampel    25.311035  0.3743814  17.164686
   TRUE      psi.bisquare  76.651845  0.4210313  54.553360

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were intercept = TRUE and psi = psi.huber.
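caret's "rlm" method wraps MASS::rlm, so the winning configuration corresponds roughly to a single direct Huber fit; a sketch, assuming the MASS package (with 100 nearly collinear predictors the IWLS fit may be ill-conditioned):

# direct equivalent of the selected configuration: intercept plus Huber psi
library(MASS)
rlmHuber <- rlm(fat ~ ., data = data.frame(absorp.trans, fat = fat),
                psi = psi.huber)
rlmHuber$s  # robust estimate of the residual scale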

Ridge regression

# ridge regression
# use train() to choose the best ridge parameter;
# the regularization parameter lambda takes 15 values between 0 and 0.1
ridgeGrid <- expand.grid(lambda = seq(0, .1, length = 15))
set.seed(100)
ridgeTune <- train(x = absorp.trans, y = fat,
                   method = "ridge",  # ridge regression
                   tuneGrid = ridgeGrid,
                   trControl = ctrl,
                   preProc = c("center", "scale"))
ridgeTune

Ridge tuning parameter: the regularization parameter lambda is 0, i.e. no regularization is applied.

> ridgeTune
Ridge Regression

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  lambda       RMSE      Rsquared   MAE
  0.000000000  2.706699  0.9560687  1.874579
  0.007142857  3.643592  0.9213753  2.748934
  0.014285714  4.052245  0.9031794  3.011670
  0.021428571  4.316364  0.8907687  3.175379
  0.028571429  4.511024  0.8814221  3.300525
  0.035714286  4.668275  0.8737664  3.406487
  0.042857143  4.803098  0.8670998  3.502220
  0.050000000  4.923200  0.8610413  3.589788
  0.057142857  5.032872  0.8553722  3.672677
  0.064285714  5.134671  0.8499616  3.751602
  0.071428571  5.230211  0.8447286  3.826811
  0.078571429  5.320565  0.8396221  3.898918
  0.085714286  5.406487  0.8346093  3.967995
  0.092857143  5.488526  0.8296690  4.034324
  0.100000000  5.567103  0.8247876  4.099350

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.

Elastic net

# enet: the elastic net
# the elastic net carries both a ridge penalty and a lasso penalty
# lambda is the ridge penalty (lambda = 0 gives a pure lasso model)
# fraction is the lasso penalty, taking 20 values from 0 to 1
# (the grid matching the output below; the original post wrote seq(.05, 1, ...))
enetGrid <- expand.grid(lambda = seq(0, .1, length = 15),
                        fraction = seq(0, 1, length = 20))
set.seed(100)
enetTune <- train(x = absorp.trans, y = fat,
                  method = "enet",  # elastic net
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
enetTune

Elastic net tuning parameters: the ridge penalty lambda is 0, and the lasso fraction is 0.0526.

> enetTune
Elasticnet

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  lambda       fraction    RMSE       Rsquared   MAE
  0.000000000  0.00000000  12.719745        NaN  10.747191
  0.000000000  0.05263158   1.505096  0.9861744   1.072324
  0.000000000  0.10526316   1.611281  0.9839983   1.133129
  0.000000000  0.15789474   1.713409  0.9818137   1.35
  0.000000000  0.21052632   1.787748  0.9802273   1.257652
  0.000000000  0.26315789   1.844759  0.9790006   1.303476
  0.000000000  0.31578947   1.897734  0.9778433   1.344380
  0.000000000  0.36842105   1.952677  0.9766278   1.383494
  0.000000000  0.42105263   2.008177  0.9753659   1.424576
  0.000000000  0.47368421   2.060634  0.9741002   1.463339
  0.000000000  0.52631579   2.113615  0.9727729   1.502524
  0.000000000  0.57894737   2.170547  0.9713402   1.540915
  0.000000000  0.63157895   2.232797  0.9697258   1.581797
  0.000000000  0.68421053   2.294018  0.9680902   1.620818
  0.000000000  0.73684211   2.357063  0.9663678   1.660460
  0.000000000  0.78947368   2.423098  0.9645125   1.700407
  0.000000000  0.84210526   2.490069  0.9625804   1.740759
  0.000000000  0.89473684   2.560683  0.9604961   1.784247
  0.000000000  0.94736842   2.632378  0.9583456   1.828712
  0.000000000  1.00000000   2.706699  0.9560687   1.874579
  (rows for the 14 nonzero lambda values omitted here; the original output ran
  into getOption("max.print"), and no nonzero-lambda setting improved on lambda = 0)

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.05263158 and lambda = 0.
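With the fraction at roughly 0.053, the lasso component shrinks most coefficients to exactly zero. One way to inspect which predictors survive, as a sketch, assuming the elasticnet package that backs caret's "enet" method:

# extract the coefficients of the final elastic net at the chosen fraction
library(elasticnet)
beta <- predict(enetTune$finalModel, s = 0.05263158, mode = "fraction",
                type = "coefficients")$coefficients
sum(beta != 0)           # how many of the 100 predictors are retained
names(beta)[beta != 0]   # which wavelengths they are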

(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?

Comparing the models:

The resamples function in the caret package analyzes and visualizes resampling results (the models must have been fit with train under resampling).

For each algorithm, the object being compared is the final model, the one with the smallest RMSE. Because the resampling method is the bootstrap with 50 draws, each algorithm's final model carries 50 resampled results.

# model comparison
# resamples() analyzes and visualizes the resampling results
resamp <- resamples(list(lm = lmFit1, rlm = rlmFit, pls = plsTune,
                         ridge = ridgeTune, enet = enetTune))
summary(resamp)

The pls and enet models have the smallest mean RMSE, indicating the best predictive performance; rlm has the largest mean RMSE, indicating the worst.

> summary(resamp)

Call:
summary.resamples(object = resamp)

Models: lm, rlm, pls, ridge, enet
Number of resamples: 50

MAE
           Min.   1st Qu.   Median     Mean  3rd Qu.     Max. NA's
lm    1.3090300 1.6211047 1.879747 1.874570 2.024030 2.989562    0
rlm   1.4045612 1.973     2.183430 2.215829 2.456707 3.715148    0
pls   0.8375685 1.0323470 1.101890 1.113200 1.180808 1.440820    0
ridge 1.3089927 1.6211901 1.879508 1.874579 2.024011 2.990832    0
enet  0.8195073 0.9752678 1.076620 1.072324 1.156561 1.291045    0

RMSE
          Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
lm    1.705850 2.286390 2.678425 2.706693 3.014897 4.458380    0
rlm   1.886688 2.693024 3.180843 3.25     3.576437 5.347496    0
pls   1.068864 1.370754 1.503226 1.534394 1.646459 2.127759    0
ridge 1.705678 2.286423 2.678400 2.706699 3.014607 4.460421    0
enet  1.109904 1.346236 1.484470 1.505096 1.661772 2.002391    0

Rsquared
           Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
lm    0.8969856 0.9481050 0.9575495 0.9560697 0.9703579 0.9829345    0
rlm   0.8266755 0.9283558 0.9418637 0.9385840 0.9597848 0.9779785    0
pls   0.9708559 0.9823688 0.9869358 0.9855824 0.9888411 0.9926647    0
ridge 0.8969050 0.9481031 0.9575498 0.9560687 0.9703573 0.9829340    0
enet  0.9744586 0.9833999 0.9869512 0.9861744 0.9887381 0.9930981    0

Plot the confidence interval of each model's mean RMSE.

dotplot(resamp, metric = "RMSE")

The plot shows at a glance that enet and pls perform best, with very similar results, and that rlm performs worst.

Use the diff function for pairwise comparisons.

summary(diff(resamp))

Upper diagonal: estimates of the differences.

Lower diagonal: p-values for H0: difference = 0.

Looking at RMSE, and reading the estimated differences together with the p-values: rlm is significantly worse than every other model, and enet and pls are both significantly better than lm, rlm, and ridge. Between enet and pls themselves the difference is not significant (p = 0.2861 for RMSE).

> summary(diff(resamp))

Call:
summary.diff.resamples(object = diff(resamp))

p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0

MAE
      lm        rlm        pls        ridge      enet
lm              -3.413e-01  7.614e-01 -8.713e-06  8.022e-01
rlm   1.605e-11             1.103e+00  3.413e-01  1.144e+00
pls   < 2.2e-16 < 2.2e-16             -7.614e-01  4.088e-02
ridge 1         1.591e-11  < 2.2e-16              8.023e-01
enet  < 2.2e-16 < 2.2e-16  1.420e-05  < 2.2e-16

RMSE
      lm        rlm        pls        ridge      enet
lm              -5.134e-01  1.172e+00 -6.918e-06  1.202e+00
rlm   2.097e-11             1.686e+00  5.134e-01  1.715e+00
pls   < 2.2e-16 < 2.2e-16             -1.172e+00  2.930e-02
ridge 1.0000    2.082e-11  < 2.2e-16              1.202e+00
enet  < 2.2e-16 < 2.2e-16  0.2861     < 2.2e-16

Rsquared
      lm        rlm        pls        ridge      enet
lm               1.749e-02 -2.951e-02  9.420e-07 -3.010e-02
rlm   5.945e-09            -4.700e-02 -1.748e-02 -4.759e-02
pls   1.224e-14 1.451e-14              2.951e-02 -5.920e-04
ridge 1.0000    5.896e-09  1.238e-14             -3.011e-02
enet  1.679e-15 5.609e-15  0.3368     1.698e-15

(e) Explain which model you would use to predict the fat content of a sample.

Choose the elastic net: it has the lowest mean RMSE and is significantly better than lm, rlm, and ridge. Its performance is statistically indistinguishable from PLS, but it achieves that performance with a sparse set of predictors.
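Once chosen, the model is used like any other train object; a usage sketch, predicting the first five samples as stand-ins for genuinely new spectra (which would need the same Box-Cox preprocessing):

# predicted fat content for five samples
predict(enetTune, newdata = absorp.trans[1:5, ])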

