
Enterprise-Grade User Profiling: Building an RFM Model Example

Date: 2024-01-12 19:50:45


A few words first:

I'm a data-analysis intern who uses this blog to record what I've learned, and I hope it also helps others studying the same material.

Everyone meets difficulties and setbacks in life; avoiding them solves nothing, and only an optimistic spirit lets you face life's challenges.

"Youth ages quickly and learning comes hard; not an inch of time should be taken lightly."

My favorite saying: finish today's work today.

This article walks through building an RFM model example. An earlier post introduced what RFM is; for details see: Using Big Data to Mine Each User's Customer Value - RFM (使用大数据去挖掘每个用户的客户价值-RFM).
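To make the three metrics concrete before diving into the Spark code: R, F and M can be computed from raw order records in a few lines. Below is a minimal, self-contained Python sketch (not the project's Scala code; the field order mirrors the tbl_orders columns memberId / orderSn / orderAmount / finishTime, and the records and reference timestamp are made up for illustration):

```python
# Toy order records mirroring tbl_orders:
# (memberId, orderSn, orderAmount, finishTime as unix seconds)
orders = [
    ("13823431", "ts_792756751164275", 2479.45, 1564415022),
    ("13823431", "D14090106121770839", 2449.00, 1565687310),
    ("4035167",  "fx_787749561729045", 1999.00, 1565799378),
]

def rfm(orders, now_ts):
    """Compute (recency_days, frequency, monetary) per member."""
    latest_f_m = {}
    for member, _sn, amount, finish in orders:
        prev, f, m = latest_f_m.get(member, (None, 0, 0.0))
        latest = finish if prev is None else max(prev, finish)
        latest_f_m[member] = (latest, f + 1, m + amount)
    # recency = whole days between the latest order and the reference time
    return {member: ((now_ts - latest) // 86400, f, round(m, 2))
            for member, (latest, f, m) in latest_f_m.items()}

result = rfm(orders, now_ts=1566000000)
# member 13823431 -> recency 3 days, frequency 2, monetary 4928.45
```

The Spark version in this article does the same aggregation with groupBy/agg, but shifted 361 days back because the HBase data is old.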

The RFM model is built with the K-Means algorithm. For an introductory K-Means case study, see:

How to Understand the K-Means Clustering Algorithm? [includes an iris example] (如何了解K-Means聚类算法?)
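For readers who skip the linked iris walkthrough, the core K-Means loop fits in a few lines. A toy one-dimensional sketch in Python (not Spark ML, just the bare algorithm; the sample points and starting centers are invented):

```python
def kmeans_1d(points, centers, iters=10):
    """Plain K-Means on scalars: assign each point to its nearest center,
    then move every center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # index of the nearest center
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # recompute each center; keep the old one if its cluster is empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two obvious groups, around 1 and around 10
centers = kmeans_1d([0.9, 1.0, 1.1, 9.9, 10.0, 10.1], centers=[0.0, 5.0])
```

Spark's KMeans does the same thing over feature vectors, distributed across the cluster, which is why the code below first assembles the R/F/M scores into a vector column.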

For the preparatory work, see: Enterprise-Grade 360° User Profiling: Tag Development (Preparation) (企业级360°全方位用户画像:标签开发(前期准备工作))

Requirements Analysis

A customer's value naturally falls into tiers:

Ultra-high value
High value
Upper-middle value
Middle value
Lower-middle value
Low value
Ultra-low value

Customer value tag rule: inType=HBase##zkHosts=192.168.10.20##zkPort=2181##hbaseTable=tbl_orders##family=detail##selectFields=memberId,orderSn,orderAmount,finishTime
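The rule string is simply a flat list of key=value pairs separated by ##: the source type, ZooKeeper address and port, the HBase table and column family, and the fields to select. A hypothetical parser for it (in Python, for illustration; the real project consumes this metadata in Scala) could look like:

```python
# The tag rule string exactly as defined above
rule = ("inType=HBase##zkHosts=192.168.10.20##zkPort=2181"
        "##hbaseTable=tbl_orders##family=detail"
        "##selectFields=memberId,orderSn,orderAmount,finishTime")

def parse_rule(rule: str) -> dict:
    """Split the ##-delimited rule into a dict; selectFields becomes a list."""
    meta = dict(kv.split("=", 1) for kv in rule.split("##"))
    meta["selectFields"] = meta["selectFields"].split(",")
    return meta

meta = parse_rule(rule)
```

Parsed this way, the metadata tells the tagging framework where to read the order data that the code below receives as the tblUser DataFrame.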

Code

package cn.itcast.userprofile.up24.newexcavate

import cn.itcast.userprofile.up24.public.PublicStaticCode
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.{MinMaxScaler, VectorAssembler}
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

import scala.collection.immutable

object RFM extends PublicStaticCode {

  override def SetAppName: String = "RFM"

  override def Four_Name: String = "客户价值"

  override def compilerAdapterFactory(spark: SparkSession, five: DataFrame, tblUser: DataFrame): DataFrame = {
    // five.show(false)
    /**
     * +------+----+
     * |tagsId|rule|
     * +------+----+
     * |38    |1   |
     * |39    |2   |
     * |40    |3   |
     * |41    |4   |
     * |42    |5   |
     * |43    |6   |
     * |44    |7   |
     * +------+----+
     */

    // tblUser.show(false)
    /**
     * orderSn:     order number
     * orderAmount: order total (goods total + shipping)
     * finishTime:  order completion time (unix seconds)
     * +---------+-------------------+-----------+----------+
     * |memberId |orderSn            |orderAmount|finishTime|
     * +---------+-------------------+-----------+----------+
     * |13823431 |ts_792756751164275 |2479.45    |1564415022|
     * |4035167  |D14090106121770839 |2449.00    |1565687310|
     * |4035291  |D14090112394810659 |1099.42    |1564681801|
     * |4035041  |fx_787749561729045 |1999.00    |1565799378|
     * |13823285 |D1409214435903     |2488.00    |1565062072|
     */

    /**
     * Customer value model (RFM):
     * R (Recency):   days since the last completed order
     * F (Frequency): total number of orders
     * M (Monetary):  total order amount
     */
    import spark.implicits._
    import org.apache.spark.sql.functions._

    // 0. Column-name constants, to avoid typos later on
    val recencyStr = "recency"
    val frequencyStr = "frequency"
    val monetaryStr = "monetary"
    val featureStr = "feature"
    val predictStr = "predict"

    // 1. Aggregate by member id to get each customer's R, F and M.
    //    datediff(end, start):  days between two dates
    //    date_sub(start, days): the date n days before start
    //    from_unixtime(ut, f):  unix timestamp -> formatted string
    // Why date_sub(current_timestamp(), 361)? The orders stored in HBase are
    // quite old, so "today" is shifted back 361 days to keep recency meaningful.
    val recencyAggColumn = datediff(date_sub(current_timestamp(), 361), from_unixtime(max("finishTime"))) as recencyStr
    val frequencyAggColumn = count("orderSn") as frequencyStr
    val monetaryAggColumn = sum("orderAmount") as monetaryStr
    val rfmResult: DataFrame = tblUser.groupBy("memberId")
      .agg(recencyAggColumn, frequencyAggColumn, monetaryAggColumn)
    // rfmResult.show(false)
    /**
     * +---------+-------+---------+------------------+
     * |memberId |recency|frequency|monetary          |
     * +---------+-------+---------+------------------+
     * |13822725 |61     |116      |179298.34         |
     * |13823083 |61     |132      |233524.17         |
     * |138230919|61     |125      |240061.56999999998|
     * |13823681 |61     |108      |169746.1          |
     * |4033473  |61     |142      |251930.92         |
     * |13822841 |61     |113      |205931.91         |
     * |13823153 |61     |133      |250698.57         |
     * +---------+-------+---------+------------------+
     */

    // 2. Score R, F and M.
    // R: 1-3 days = 5, 4-6 = 4, 7-9 = 3, 10-15 = 2, >=16 = 1
    // F: >=200 = 5, 150-199 = 4, 100-149 = 3, 50-99 = 2, 1-49 = 1
    // M: >=200k = 5, 100k-199k = 4, 50k-99k = 3, 10k-49k = 2, <10k = 1
    val recencyScore = when(col(recencyStr).between(1, 3), 5)
      .when(col(recencyStr).between(4, 6), 4)
      .when(col(recencyStr).between(7, 9), 3)
      .when(col(recencyStr).between(10, 15), 2)
      .when(col(recencyStr) >= 16, 1) // the original used > 16, which left day 16 unscored
      .as(recencyStr)
    val frequencyScore = when(col(frequencyStr) >= 200, 5)
      .when(col(frequencyStr).between(150, 199), 4)
      .when(col(frequencyStr).between(100, 149), 3)
      .when(col(frequencyStr).between(50, 99), 2)
      .when(col(frequencyStr).between(1, 49), 1)
      .as(frequencyStr)
    val monetaryScore = when(col(monetaryStr) >= 200000, 5)
      .when(col(monetaryStr).between(100000, 199999), 4)
      .when(col(monetaryStr).between(50000, 99999), 3)
      .when(col(monetaryStr).between(10000, 49999), 2)
      .when(col(monetaryStr) <= 9999, 1)
      .as(monetaryStr)
    val rfmScore: DataFrame = rfmResult.select('memberId, recencyScore, frequencyScore, monetaryScore)
    // rfmScore.show(10, false)
    /**
     * +---------+-------+---------+--------+
     * |memberId |recency|frequency|monetary|
     * +---------+-------+---------+--------+
     * |13822725 |1      |3        |4       |
     * |13823083 |1      |3        |5       |
     * |138230919|1      |3        |5       |
     * |13823681 |1      |3        |4       |
     * |4033473  |1      |3        |5       |
     * |13822841 |1      |3        |5       |
     * |13823153 |1      |3        |5       |
     * |13823431 |1      |3        |4       |
     * |4033348  |1      |3        |5       |
     * |4033483  |1      |3        |4       |
     * +---------+-------+---------+--------+
     */

    // 3. Clustering.
    // VectorAssembler is a transformer that packs several columns into a single
    // vector column, which Spark ML models expect as their feature input.
    val vectorAss: DataFrame = new VectorAssembler()
      .setInputCols(Array(recencyStr, frequencyStr, monetaryStr))
      .setOutputCol(featureStr)
      .transform(rfmScore)
    // vectorAss.show()
    /**
     * +---------+-------+---------+--------+-------------+
     * |memberId |recency|frequency|monetary|feature      |
     * +---------+-------+---------+--------+-------------+
     * |13822725 |1      |3        |4       |[1.0,3.0,4.0]|
     * |13823083 |1      |3        |5       |[1.0,3.0,5.0]|
     * |138230919|1      |3        |5       |[1.0,3.0,5.0]|
     * |13823681 |1      |3        |4       |[1.0,3.0,4.0]|
     * ... (rows omitted)
     */

    // Normalization: map every feature into [0, 1].
    // Optional, but it makes clustering faster and more stable.
    val minMaxScalerModel = new MinMaxScaler()
      .setInputCol(featureStr)
      .setOutputCol(featureStr + "Out")
      .fit(vectorAss)
    val scalerDF = minMaxScalerModel.transform(vectorAss)
    // scalerDF.show()
    /**
     * +---------+-------+---------+--------+-------------+--------------+
     * |memberId |recency|frequency|monetary|feature      |featureOut    |
     * +---------+-------+---------+--------+-------------+--------------+
     * |13822725 |1      |3        |4       |[1.0,3.0,4.0]|[0.5,0.5,0.75]|
     * |13823083 |1      |3        |5       |[1.0,3.0,5.0]|[0.5,0.5,1.0] |
     * |138230919|1      |3        |5       |[1.0,3.0,5.0]|[0.5,0.5,1.0] |
     * |13823681 |1      |3        |4       |[1.0,3.0,4.0]|[0.5,0.5,0.75]|
     * ... (rows omitted)
     */

    val kMeans = new KMeans()
      .setK(7)
      .setSeed(10)                        // fixed seed, for reproducible results
      .setMaxIter(10)                     // maximum number of iterations
      .setFeaturesCol(featureStr + "Out") // feature column
      .setPredictionCol(predictStr)       // prediction column

    // 4. Train the model
    val model = kMeans.fit(scalerDF)

    // 5. Predict
    val result: DataFrame = model.transform(scalerDF)
    // result.show(10)
    /**
     * +---------+-------+---------+--------+-------------+--------------+-------+
     * |memberId |recency|frequency|monetary|feature      |featureOut    |predict|
     * +---------+-------+---------+--------+-------------+--------------+-------+
     * |13822725 |1      |3        |4       |[1.0,3.0,4.0]|[0.5,0.5,0.75]|1      |
     * |13823083 |1      |3        |5       |[1.0,3.0,5.0]|[0.5,0.5,1.0] |0      |
     * |138230919|1      |3        |5       |[1.0,3.0,5.0]|[0.5,0.5,1.0] |0      |
     * |13823681 |1      |3        |4       |[1.0,3.0,4.0]|[0.5,0.5,0.75]|1      |
     * ... (rows omitted)
     */

    // 6. During testing, inspect how well the clusters separate
    val ds: Dataset[Row] = result.groupBy(predictStr)
      .agg(max(col(recencyStr) + col(frequencyStr) + col(monetaryStr)),
           min(col(recencyStr) + col(frequencyStr) + col(monetaryStr)))
      .sort(col(predictStr).asc)
    // ds.show()
    /**
     * +-------+--------------------------------------+--------------------------------------+
     * |predict|max(((recency + frequency) + monetary))|min(((recency + frequency) + monetary))|
     * +-------+--------------------------------------+--------------------------------------+
     * |0      |9                                     |9                                     |
     * |1      |8                                     |8                                     |
     * |2      |3                                     |3                                     |
     * |3      |7                                     |7                                     |
     * |4      |11                                    |10                                    |
     * |5      |5                                     |4                                     |
     * |6      |8                                     |7                                     |
     * +-------+--------------------------------------+--------------------------------------+
     */

    // Problem: cluster ids are unordered, but matching clusters to rule
    // values requires an order.
    // 7. Sort clusters by centroid: the larger the centroid, the more
    //    valuable the customers in that cluster.
    // [(cluster id, centroid value)]
    val centers: immutable.Seq[(Int, Double)] =
      for (i <- model.clusterCenters.indices) yield (i, model.clusterCenters(i).toArray.sum)
    val sortedCenter: immutable.Seq[(Int, Double)] = centers.sortBy(_._2).reverse
    // sortedCenter.foreach(println)
    /**
     * (4,2.2596153846153846)
     * (0,2.0)
     * (1,1.75)
     * (6,1.625)
     * (3,1.5)
     * (5,0.8333333333333333)
     * (2,0.5)
     */

    // [(cluster id, rule value)]
    val centerIdAndRule = for (i <- sortedCenter.indices) yield (sortedCenter(i)._1, i + 1)
    // centerIdAndRule.foreach(println)
    /**
     * (4,1)
     * (0,2)
     * (1,3)
     * (6,4)
     * (3,5)
     * (5,6)
     * (2,7)
     */
    val centerDf: DataFrame = centerIdAndRule.toDF(predictStr, "rule")
    // centerDf.show()
    /**
     * +-------+----+
     * |predict|rule|
     * +-------+----+
     * |4      |1   |
     * |0      |2   |
     * |1      |3   |
     * |6      |4   |
     * |3      |5   |
     * |5      |6   |
     * |2      |7   |
     * +-------+----+
     */

    // Match the predicted clusters against the five-level tags
    val ruleTag: DataFrame = centerDf.join(five, "rule")
    // ruleTag.show()
    /**
     * +----+-------+------+
     * |rule|predict|tagsId|
     * +----+-------+------+
     * |1   |4      |38    |
     * |2   |0      |39    |
     * |3   |1      |40    |
     * |4   |6      |41    |
     * |5   |3      |42    |
     * |6   |5      |43    |
     * |7   |2      |44    |
     * +----+-------+------+
     */
    val predictTag: DataFrame = ruleTag.select(predictStr, "tagsId")

    // Option 1: join directly with the K-Means result
    // val newTag = predictTag.join(result, predictStr)
    // newTag.show()
    /**
     * +-------+------+---------+-------+---------+--------+-------------+--------------+
     * |predict|tagsId|memberId |recency|frequency|monetary|feature      |featureOut    |
     * +-------+------+---------+-------+---------+--------+-------------+--------------+
     * |1      |40    |13822725 |1      |3        |4       |[1.0,3.0,4.0]|[0.5,0.5,0.75]|
     * |1      |40    |13823681 |1      |3        |4       |[1.0,3.0,4.0]|[0.5,0.5,0.75]|
     * ... (rows omitted)
     */

    // Option 2: collect the (predict -> tagsId) mapping to the driver
    // and apply it with a UDF
    val perMap = predictTag.map(t => {
      val pre = t.getAs[Any](predictStr).toString
      val tag = t.getAs[Any]("tagsId").toString
      (pre, tag)
    }).collect().toMap
    println(perMap)
    // Map(4 -> 38, 5 -> 43, 6 -> 41, 1 -> 40, 0 -> 39, 2 -> 44, 3 -> 42)

    val predictUdf = udf((predict: String) => perMap(predict))

    val newTag = result.select('memberId as "userId", predictUdf('predict) as "tagsId")
    newTag.show()
    /**
     * +---------+------+
     * |userId   |tagsId|
     * +---------+------+
     * |13822725 |40    |
     * |13823083 |39    |
     * |138230919|39    |
     * |13823681 |40    |
     * |4033473  |39    |
     * ... (rows omitted)
     */
    newTag
  }

  def main(args: Array[String]): Unit = {
    startMain()
  }
}
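The one non-obvious step in the listing is step 7: K-Means assigns arbitrary cluster ids, so the clusters are ranked by the sum of their centroid coordinates before being joined to the rule values. That ranking logic, isolated from Spark and fed the (rounded) centroid sums printed in the listing, looks like this in Python:

```python
# Centroid sums from the listing: cluster id -> sum of centroid coordinates
centers = {0: 2.0, 1: 1.75, 2: 0.5, 3: 1.5, 4: 2.2596, 5: 0.8333, 6: 1.625}

# Rank clusters by centroid sum, largest first: the cluster with the
# biggest centroid is the highest-value segment and gets rule = 1.
ranked = sorted(centers, key=centers.get, reverse=True)
cluster_to_rule = {cid: rank + 1 for rank, cid in enumerate(ranked)}
# Reproduces the article's (cluster id, rule) pairs: (4,1), (0,2), (1,3), ...
```

Joining this mapping to the five-level tag table then gives the predict-to-tagsId pairs the UDF applies at the end, e.g. cluster 4 (rule 1) maps to tagsId 38.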

That completes the development of the RFM model example.

If anything here is incorrect, please let me know and I'll fix it promptly.

If this helped you, a like would be much appreciated. Thanks!
