hadoop streaming

2017. 3. 6. 18:36

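Use this when you want to write a map/reduce program in a language other than Java (e.g. Ruby, Python).

It is inherently suited to text processing, because Hadoop exchanges data with the program as lines of text over stdin and stdout.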
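As a rough illustration of that line-oriented contract: by default, streaming treats everything up to the first tab in an output line as the key and the rest as the value. The helper below is a minimal sketch of that rule; `split_key_value` is a hypothetical name, not part of Hadoop.

```python
def split_key_value(line, sep="\t"):
    """Split one streaming line into (key, value) at the first separator.

    Hypothetical helper mimicking the default tab convention;
    a line without a separator is treated as key only.
    """
    line = line.rstrip("\n")
    key, found, value = line.partition(sep)
    return (key, value if found else None)


print(split_key_value("1950\t+0022\n"))
print(split_key_value("no separator here"))
```

Only the first tab matters, so a value may itself contain tabs.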

1. ruby

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -files test_map.rb,test_reduce.rb \
    -input /tmp/test \
    -output /tmp/output \
    -mapper test_map.rb \
    -combiner test_reduce.rb \
    -reducer test_reduce.rb

( -files : comma-separated list, with no spaces, of the files to distribute to the Hadoop cluster )

2. python

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -files test_map.py,test_reduce.py \
    -input /tmp/test \
    -output /tmp/output \
    -mapper test_map.py \
    -combiner test_reduce.py \
    -reducer test_reduce.py

1) map

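#!/usr/bin/env python
import re
import sys

for line in sys.stdin:
    val = line.strip()
    # Fixed-width fields: year, temperature, quality code
    (year, temp, q) = (val[15:19], val[87:92], val[92:93])
    # Skip missing readings (+9999) and records whose quality code is not accepted
    if temp != "+9999" and re.match("[01459]", q):
        print("%s\t%s" % (year, temp))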
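To try the mapper's filtering without a cluster, the same logic can be wrapped in a function and fed synthetic records. This is only a sketch: `map_record` and `make_record` are hypothetical names, and the zero padding is an assumption that puts the three fields at the offsets the mapper reads (15:19, 87:92, 92:93). Real input records will contain other data in those positions.

```python
import re


def map_record(line):
    """Return 'year<TAB>temp' for a valid reading, or None if filtered out."""
    val = line.strip()
    year, temp, q = val[15:19], val[87:92], val[92:93]
    if temp != "+9999" and re.match("[01459]", q):
        return "%s\t%s" % (year, temp)
    return None


def make_record(year, temp, quality):
    """Build a synthetic fixed-width line (assumed padding, not real data)."""
    return "0" * 15 + year + "0" * 68 + temp + quality


print(map_record(make_record("1950", "+0022", "1")))  # valid reading
print(map_record(make_record("1950", "+9999", "1")))  # missing temperature
print(map_record(make_record("1950", "+0022", "2")))  # rejected quality code
```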

2) reduce

#!/usr/bin/env python
import sys

# Start below any real temperature; sys.maxsize exists on Python 2 and 3
(last_key, max_val) = (None, -sys.maxsize)
for line in sys.stdin:
    (key, val) = line.strip().split("\t")
    if last_key and last_key != key:
        # Key changed: emit the maximum for the previous key
        print("%s\t%s" % (last_key, max_val))
        (last_key, max_val) = (key, int(val))
    else:
        (last_key, max_val) = (key, max(max_val, int(val)))

# Emit the maximum for the final key
if last_key:
    print("%s\t%s" % (last_key, max_val))
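The reducer only works if all lines for a given key arrive next to each other, which Hadoop ensures by sorting the map output by key before the reduce step. The sketch below repeats the reducer's logic as a generator and uses `sorted()` as a stand-in for that sort. `reduce_max` and the sample lines are made up for illustration.

```python
import sys


def reduce_max(lines):
    """Yield (key, max value) for tab-separated lines already sorted by key."""
    last_key, max_val = None, -sys.maxsize
    for line in lines:
        key, val = line.strip().split("\t")
        if last_key and last_key != key:
            yield last_key, max_val
            last_key, max_val = key, int(val)
        else:
            last_key, max_val = key, max(max_val, int(val))
    if last_key:
        yield last_key, max_val


# Hypothetical mapper output, in arbitrary order.
mapper_output = ["1950\t+0022", "1949\t+0111", "1950\t-0011", "1949\t+0078"]

# sorted() stands in for the sort Hadoop performs between map and reduce.
result = list(reduce_max(sorted(mapper_output)))
print(result)
```

Taking a maximum is associative and commutative, which is why the same script can serve as both the combiner and the reducer in the commands above.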
