1. Test program

1) mapper

Reads the data file from HDFS, splits it into individual words, and emits each word as the key with a value of 1.

 

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Split the input line into words on whitespace.
        String[] result = line.toString().split("\\s+");

        // Emit (word, 1) for each word.
        for (String word : result) {
            context.write(new Text(word), ONE);
        }
    }
}
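
For example, given a made-up input line "apple banana apple", the mapper emits (apple, 1), (banana, 1), (apple, 1).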

 

2) reducer

Reads the data grouped under each key, counts the values per key, and writes the result.

 

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        // Sum all the 1s emitted by the mapper for this key.
        int count = 0;
        for (IntWritable current : values) {
            count += current.get();
        }
        context.write(key, new IntWritable(count));
    }

}   
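
Continuing the same made-up example, the reducer receives (apple, [1, 1]) and (banana, [1]) and writes (apple, 2) and (banana, 1).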

 

3) Driver 

To run the job, write a Driver that registers the mapper and reducer and specifies the input file and output directory.

 

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TestDriver extends Configured implements Tool {


    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new TestDriver(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {

        // Use the Configuration populated by ToolRunner so -D options are honored.
        Job job = Job.getInstance(getConf(), "WordCount");

        job.setJarByClass(TestDriver.class);


        if (args.length < 2) {
            System.out.println("Jar requires 2 parameters: "
                    + job.getJar()
                    + " input_path output_path");
            return 1;
        }

        job.setMapperClass(WordcountMapper.class);

        job.setReducerClass(WordcountReducer.class);

        // The reducer is reused as the combiner; this is valid for word count
        // because summing partial counts is commutative and associative.
        job.setCombinerClass(WordcountReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);

        job.setOutputFormatClass(TextOutputFormat.class);

        Path filePath = new Path(args[0]);
        FileInputFormat.setInputPaths(job, filePath);

        Path outputPath = new Path(args[1]);
        FileOutputFormat.setOutputPath(job, outputPath);

        // Propagate the job result as the exit code.
        return job.waitForCompletion(true) ? 0 : 1;
    }
}

 

4) compile

export CLASSPATH="<Hadoop library path>"

(e.g. export CLASSPATH=`hadoop classpath`)

 

javac -d .  WordcountMapper.java WordcountReducer.java TestDriver.java

 

5) Create the jar

vim Manifest.txt

>> Main-Class: TestDriver

 

jar cfm test.jar Manifest.txt *.class

(If the classes are declared in a package, e.g. test: jar cfm test.jar Manifest.txt test)
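
A minimal sketch of the package variant, assuming the package name test used by the run command in section 2 (adjust if your package differs):

add "package test;" at the top of each .java file

javac -d .  WordcountMapper.java WordcountReducer.java TestDriver.java   (with -d . the compiler creates the test/ directory holding the .class files)

Manifest.txt >> Main-Class: test.TestDriver

jar cfm test.jar Manifest.txt test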

 

2. Running the task

   Run the map/reduce classes contained in test.jar (input: /tmp/input_data, output location: /tmp/output).

   $HADOOP_HOME/bin/hadoop jar test.jar  test.TestDriver /tmp/input_data   /tmp/output
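
Note that the output directory must not already exist, or the job will fail at submission (remove it first with hadoop fs -rm -r /tmp/output). Once the job finishes, the result can be checked with the standard HDFS commands; the paths follow the run above, and part-r-00000 is the default file name written by a single reducer:

   $HADOOP_HOME/bin/hadoop fs -ls /tmp/output

   $HADOOP_HOME/bin/hadoop fs -cat /tmp/output/part-r-00000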
