Compiling and Running WordCount.java, Your First Hadoop Program

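Java is the standard language for Hadoop development. This post takes the official WordCount.java, compiles and packages it, and then runs the resulting Hadoop job on a small test input.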

 

This walkthrough assumes Hadoop is already installed and that the hadoop command runs correctly on Linux.

Download the Java version of the WordCount.java program.

 

Copy WordCount.java to a directory on the Linux machine; here I copied it to /home/crazyant/hadoop_wordcount:

[crazyant@dev.mechine hadoop_wordcount]$ ll

total 4

-rwxr--r--  1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java

In that directory (/home/crazyant/hadoop_wordcount), create a wordcount_classes directory to hold the class files produced by compiling WordCount.java.

[crazyant@dev.mechine hadoop_wordcount]$ mkdir wordcount_classes

[crazyant@dev.mechine hadoop_wordcount]$ ll

total 8

drwxrwxr-x  2 crazyant crazyant 4096 Aug 16 20:07 wordcount_classes

-rwxr--r--  1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java

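Compile WordCount.java. The -classpath option points to the Hadoop core jar that the source depends on, and the -d option sets the directory where the compiled class files are written.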

[crazyant@dev.mechine hadoop_wordcount]$ javac -classpath /home/crazyant/app/hadoop/hadoop-2-core.jar -d wordcount_classes WordCount.java

[crazyant@dev.mechine hadoop_wordcount]$ ll -R

.:

total 8

drwxrwxr-x  3 crazyant crazyant 4096 Aug 16 20:09 wordcount_classes

-rwxr--r--  1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java

 

./wordcount_classes:

total 4

drwxrwxr-x  3 crazyant crazyant 4096 Aug 16 20:09 org

 

./wordcount_classes/org:

total 4

drwxrwxr-x  2 crazyant crazyant 4096 Aug 16 20:09 myorg

 

./wordcount_classes/org/myorg:

total 12

-rw-rw-r--  1 crazyant crazyant 1546 Aug 16 20:09 WordCount.class

-rw-rw-r--  1 crazyant crazyant 1938 Aug 16 20:09 WordCount$Map.class

-rw-rw-r--  1 crazyant crazyant 1611 Aug 16 20:09 WordCount$Reduce.class
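One source file yields three class files because Map and Reduce are nested classes of WordCount, and the org/myorg directories mirror the package org.myorg that the class is run under. A minimal sketch of that naming rule in plain Java follows; `NestedClassNames` and `classFileName` are hypothetical names used only for illustration and are not part of the Hadoop example.

```java
public class NestedClassNames {
    // Stand-ins for the nested Map and Reduce classes inside WordCount.
    public static class Map {}
    public static class Reduce {}

    // Turns a class's binary name into the relative path that javac -d writes.
    public static String classFileName(Class<?> c) {
        return c.getName().replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        System.out.println(classFileName(NestedClassNames.class));
        System.out.println(classFileName(Map.class));
        System.out.println(classFileName(Reduce.class));
    }
}
```

The sketch has no package declaration, so its paths have no directory prefix; with a package such as org.myorg, the same rule produces org/myorg/WordCount$Map.class.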

Then package the compiled class files into a jar:

[crazyant@dev.mechine hadoop_wordcount]$ jar -cvf wordcount.jar -C wordcount_classes/ .

added manifest

adding: org/(in = 0) (out= 0)(stored 0%)

adding: org/myorg/(in = 0) (out= 0)(stored 0%)

adding: org/myorg/WordCount$Map.class(in = 1938) (out= 798)(deflated 58%)

adding: org/myorg/WordCount$Reduce.class(in = 1611) (out= 649)(deflated 59%)

adding: org/myorg/WordCount.class(in = 1546) (out= 749)(deflated 51%)

[crazyant@dev.mechine hadoop_wordcount]$ ll

total 12

drwxrwxr-x  3 crazyant crazyant 4096 Aug 16 20:09 wordcount_classes

-rw-rw-r--  1 crazyant crazyant 3169 Aug 16 20:11 wordcount.jar

-rwxr--r--  1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java

 

Use echo to create a local file to serve as the input data:

[crazyant@dev.mechine hadoop_wordcount]$ echo "hello world, hello crazyant, i am the ant, i am your brother" > inputfile

[crazyant@dev.mechine hadoop_wordcount]$ more inputfile

hello world, hello crazyant, i am the ant, i am your brother

Create a directory on HDFS, with an input directory inside it for the input file:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -mkdir /app/word_count/input

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -ls /app/word_count

Found 1 items

drwxr-xr-x   3 czt czt          0 2013-08-16 20:16 /app/word_count/input

 

Upload the local inputfile just created to the input directory on HDFS:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -put inputfile /app/word_count/input

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -ls /app/word_count/input

Found 1 items

-rw-r--r--   3 czt czt         61 2013-08-16 20:18 /app/word_count/input/inputfile
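The 61 bytes reported by the listing match the test line: 60 characters plus the newline that echo appends. A quick check in plain Java, where `InputSize` and `sizeWithNewline` are illustrative names rather than anything from the original program:

```java
import java.nio.charset.StandardCharsets;

public class InputSize {
    public static final String LINE =
        "hello world, hello crazyant, i am the ant, i am your brother";

    // Bytes written by `echo "<line>" > file`: the line itself plus one '\n'.
    public static int sizeWithNewline(String line) {
        return (line + "\n").getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(sizeWithNewline(LINE));
    }
}
```

The same number shows up later in the job log as Map input bytes=61.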

 

Run the jar, passing the input directory created above and an output directory as arguments:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop jar wordcount.jar org.myorg.WordCount /app/word_count/input /app/word_count/output

13/08/16 20:19:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/08/16 20:19:40 INFO util.NativeCodeLoader: Loaded the native-hadoop library

13/08/16 20:19:40 INFO compress.LzoCodec: Successfully loaded & initialized native-lzo library

13/08/16 20:19:40 INFO compress.LzmaCodec: Successfully loaded & initialized native-lzma library

13/08/16 20:19:40 INFO compress.QuickLzCodec: Successfully loaded & initialized native-quicklz library

13/08/16 20:19:40 INFO mapred.FileInputFormat: Total input paths to process : 1

13/08/16 20:19:41 INFO mapred.JobClient: splits size : 61

13/08/16 20:19:41 INFO mapred.JobClient: Running job: job_20130813122541_105844

13/08/16 20:19:43 INFO mapred.JobClient:  map 0% reduce 0%

13/08/16 20:19:57 INFO mapred.JobClient:  map 24% reduce 0%

13/08/16 20:20:07 INFO mapred.JobClient:  map 93% reduce 0%

13/08/16 20:20:16 INFO mapred.JobClient:  map 100% reduce 1%

13/08/16 20:20:26 INFO mapred.JobClient:  map 100% reduce 61%

13/08/16 20:20:36 INFO mapred.JobClient:  map 100% reduce 89%

13/08/16 20:20:47 INFO mapred.JobClient:  map 100% reduce 96%

13/08/16 20:20:57 INFO mapred.JobClient:  map 100% reduce 98%

13/08/16 20:21:00 INFO mapred.JobClient: Updating completed job! Ignoring ...

13/08/16 20:21:00 INFO mapred.JobClient: Updating completed job! Ignoring ...

13/08/16 20:21:00 INFO mapred.JobClient: Job complete: job_20130813122541_105844

13/08/16 20:21:00 INFO mapred.JobClient: Counters: 19

13/08/16 20:21:00 INFO mapred.JobClient:   File Systems

13/08/16 20:21:00 INFO mapred.JobClient:     HDFS bytes read=1951

13/08/16 20:21:00 INFO mapred.JobClient:     HDFS bytes written=68

13/08/16 20:21:00 INFO mapred.JobClient:     Local bytes read=5174715

13/08/16 20:21:00 INFO mapred.JobClient:     Local bytes written=256814

13/08/16 20:21:00 INFO mapred.JobClient:   Job Counters

13/08/16 20:21:00 INFO mapred.JobClient:     Launched reduce tasks=100

13/08/16 20:21:00 INFO mapred.JobClient:     Rack-local map tasks=61

13/08/16 20:21:00 INFO mapred.JobClient:     ORIGINAL_REDUCES=100

13/08/16 20:21:00 INFO mapred.JobClient:     Launched map tasks=61

13/08/16 20:21:00 INFO mapred.JobClient:     MISS_SCHEDULED_REDUCES=15

13/08/16 20:21:00 INFO mapred.JobClient:   TASK_STATISTICS

13/08/16 20:21:00 INFO mapred.JobClient:     Total Map Slot Time=34

13/08/16 20:21:00 INFO mapred.JobClient:     Attempt_0 Map Task Count=61

13/08/16 20:21:00 INFO mapred.JobClient:     Total Reduce Slot Time=892

13/08/16 20:21:00 INFO mapred.JobClient:   Map-Reduce Framework

13/08/16 20:21:00 INFO mapred.JobClient:     Reduce input groups=9

13/08/16 20:21:00 INFO mapred.JobClient:     Combine output records=0

13/08/16 20:21:00 INFO mapred.JobClient:     Map input records=1

13/08/16 20:21:00 INFO mapred.JobClient:     Reduce output records=9

13/08/16 20:21:00 INFO mapred.JobClient:     Map input bytes=61

13/08/16 20:21:00 INFO mapred.JobClient:     Combine input records=0

13/08/16 20:21:00 INFO mapred.JobClient:     Reduce input records=9

Check whether the output directory contains results:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -ls /app/word_count/output

Found 100 items

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00000

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00001

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00002

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00003

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00004

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00005

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00006

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00007

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00008

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00009

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00010

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00011

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00012

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00013

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00014

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00015

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00016

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00017

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00018

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00019

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00020

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00021

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00022

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00023

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00024

-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00025
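The job ran with 100 reduce tasks (Launched reduce tasks=100 in the log), and each reduce task writes its own part-NNNNN file, which is why the directory holds 100 items. With only 9 distinct words, at most 9 of those files can contain data, so the rest are empty. Hadoop's default HashPartitioner chooses the reducer as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below mimics that arithmetic in plain Java, assuming the job used the default partitioner. It uses String.hashCode() as a stand-in for Text.hashCode(), so the partition numbers it prints will likely differ from the real job; `PartitionSketch`, `partitionFor` and `usedPartitions` are illustrative names.

```java
import java.util.Set;
import java.util.TreeSet;

public class PartitionSketch {
    // Same arithmetic as Hadoop's default HashPartitioner, applied to a String key.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Distinct partitions that receive at least one key.
    public static Set<Integer> usedPartitions(String[] keys, int numReduceTasks) {
        Set<Integer> used = new TreeSet<>();
        for (String key : keys) {
            used.add(partitionFor(key, numReduceTasks));
        }
        return used;
    }

    public static void main(String[] args) {
        String[] words = {"i", "your", "crazyant,", "brother", "hello",
                          "am", "world,", "the", "ant,"};
        for (String w : words) {
            System.out.println(w + " -> part-" + String.format("%05d", partitionFor(w, 100)));
        }
        System.out.println("non-empty part files: "
            + usedPartitions(words, 100).size() + " of 100");
    }
}
```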

 

Merge all the text files in that directory and download the result to the local machine:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -getmerge /app/word_count/output wordcount_result

[crazyant@dev.mechine hadoop_wordcount]$ ls

inputfile  wordcount_classes  wordcount.jar  WordCount.java  wordcount_result

Take a look at the downloaded result:

[crazyant@dev.mechine hadoop_wordcount]$ more wordcount_result

i       2

your    1

crazyant,       1

brother 1

hello   2

am      2

world,  1

the     1

ant,    1

 

The counts are correct.
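The commas attached to keys such as world, and ant, are expected: the WordCount v1.0 example in the referenced tutorial splits each line with java.util.StringTokenizer, which by default breaks only on whitespace. Below is a minimal plain-Java sketch of the same counting logic without any Hadoop classes. It assumes the downloaded WordCount.java matches the tutorial version; `LocalWordCount` and `count` are hypothetical names.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Counts whitespace-separated tokens; punctuation stays attached to words.
    public static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // The "map" step would emit (word, 1); merge() plays the role of the reduce-side sum.
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "hello world, hello crazyant, i am the ant, i am your brother";
        for (Map.Entry<String, Integer> e : count(line).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The sketch sorts keys alphabetically through TreeMap, so the order of lines differs from the merged Hadoop output, but the nine keys and their counts should be the same.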

 

Reference: http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html#Example%3A+WordCount+v1.0
