
Running a MapReduce Program Locally: a Word Count Example

Posted: 2020-08-23 10:11:55


This example runs a MapReduce word count program locally, with both the input and the output data on the local filesystem.

Running in cluster mode: /weixin_43614067/article/details/108400938

Submitting locally to run on the cluster: /weixin_43614067/article/details/108401227

The text to be counted, word.txt (located at C:\Users\Think\Desktop\input\word.txt):

Stray birds of summer come to my window to sing and fly away And yellow leaves of autumn which have no songs flutter and fall there with a sigh O Troupe of little vagrants of the world leave your footprints in my words The world puts off its mask of vastness to its lover It becomes small as one song as one kiss of the eternal It is the tears of the earth that keep her smiles in bloom The mighty desert is burning for the love of a blade of grass who shakes her head and laughs and flies away If you shed tears when you miss the sun you also miss the stars The sands in your way beg for your song and your movement dancing water Will you carry the burden of their lameness Her wishful face haunts my dreams like the rain at night Once we dreamt that we were strangers We wake up to find that we were dear to each other

Mapper

package com.bjsxt.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset at which this line starts in the input file;
        // value is the content of the line itself
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            // emit (word, 1) for every word on the line
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
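A common refinement, not part of the original post: Hadoop serializes the key/value objects as soon as context.write is called, so the mapper may reuse a single Text and IntWritable across calls instead of allocating two new objects per word. A minimal sketch of the same mapper with reused writables:

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // reused across calls; safe because context.write serializes immediately
    private final Text word = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String w : value.toString().split(" ")) {
            word.set(w);
            context.write(word, ONE);
        }
    }
}

This cuts garbage-collection pressure on large inputs without changing the output.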

Reducer

package com.bjsxt.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum all the 1s emitted by the mappers for this word
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}

Runner

package com.bjsxt.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {
    public static void main(String[] args) throws Exception {
        // create the configuration object
        Configuration conf = new Configuration();
        // create the Job object
        Job job = Job.getInstance(conf, "wordCount");
        // set the Mapper class
        job.setMapperClass(WCMapper.class);
        // set the Reducer class
        job.setReducerClass(WCReducer.class);
        // set the class used to locate the job's jar
        job.setJarByClass(WCRunner.class);
        // set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // set the reduce output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // set the input and output paths
        /*
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        */
        FileInputFormat.setInputPaths(job, new Path("C:\\Users\\Think\\Desktop\\input\\word.txt"));
        FileOutputFormat.setOutputPath(job, new Path("C:\\Users\\Think\\Desktop\\output"));

        long startTime = System.currentTimeMillis();
        try {
            // submit the job and wait for it to finish
            boolean b = job.waitForCompletion(true);
            if (b) {
                System.out.println("Word count finished!");
            }
        } finally {
            // time of completion in milliseconds
            long endTime = System.currentTimeMillis();
            System.out.println("Job<" + job.getJobName() + "> succeeded: " + job.isSuccessful()
                    + "; start: " + startTime + "; end: " + endTime
                    + "; elapsed: " + (endTime - startTime) + "ms");
        }
    }
}

Note: if you use the commented-out lines instead,

FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

the input and output paths must be passed as command-line arguments when the program is run.
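The original runner sets no combiner. Since summing counts is associative and commutative, WCReducer can double as a combiner, pre-aggregating counts on the map side and shrinking shuffle traffic. This one-liner in WCRunner is a suggested addition, not part of the original post:

// optional: pre-aggregate (word, 1) pairs on the map side before the shuffle
job.setCombinerClass(WCReducer.class);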

The result is a list of word/count pairs written to the output directory (the original post shows it as a screenshot).

If the output directory already exists, for example because it was set to the same path as the input directory, the job fails with the following exception:

-09-03 16:37:03,938 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1129)) - session.id is deprecated. Instead, use dfs.metrics.session-id
-09-03 16:37:03,942 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/C:/Users/Think/Desktop/input already exists
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:267)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:140)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1315)
	at com.bjsxt.wc.WCRunner.main(WCRunner.java:77)
Process finished with exit code 1

Solution: keep the output directory distinct from the directory holding the input text, and make sure it does not exist before the job runs; FileOutputFormat refuses to write into an existing directory.
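Alternatively, the runner can clear a stale output directory itself before submitting, which makes reruns painless. A minimal sketch using Hadoop's FileSystem API, placed in main in place of the setOutputPath call above (requires import org.apache.hadoop.fs.FileSystem):

// delete the previous run's output so the job can be rerun
Path output = new Path("C:\\Users\\Think\\Desktop\\output");
FileSystem fs = FileSystem.get(conf);
if (fs.exists(output)) {
    fs.delete(output, true); // recursive delete
}
FileOutputFormat.setOutputPath(job, output);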

To run the same word count locally while reading the input from and writing the output to HDFS, add/modify the following configuration:

// run locally, but read the input from HDFS and write the output back to HDFS
conf.set("fs.defaultFS", "hdfs://node001:8020");
FileInputFormat.setInputPaths(job, new Path("hdfs://node001:8020/wordcount/input"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://node001:8020/wordcount/output"));
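If the HDFS write then fails with an AccessControlException because your local Windows user does not exist on the cluster, one common workaround is to set the Hadoop user before the job first touches HDFS. The user name "root" below is an assumption; substitute whichever account owns /wordcount on your cluster:

// placeholder user name; must run before the first HDFS access
System.setProperty("HADOOP_USER_NAME", "root");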

