1. Running from the hadoop command line
hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-*streaming*.jar \
  -file /test/mapper.py -mapper /test/mapper.py \
  -file /test/reducer.py -reducer /test/reducer.py \
  -input hdfs://nn:8020/hdfs/test/input/ \
  -output hdfs://nn:8020/hdfs/test/output/
2. Convert the Python streaming job into an Oozie job as follows
<action name="TestAction">
  <map-reduce>
    <job-tracker>nn1:8032</job-tracker>
    <name-node>hdfs://nn:8020</name-node>
    <prepare>
      <delete path="hdfs://nn:8020/hdfs/test/output"/>
    </prepare>
    <streaming>
      <mapper>python mapper.py</mapper>
      <reducer>python reducer.py</reducer>
    </streaming>
    <configuration>
      <property>
        <name>mapred.input.dir</name>
        <value>/hdfs/test/input</value>
      </property>
      <property>
        <name>mapred.output.dir</name>
        <value>/hdfs/test/output</value>
      </property>
    </configuration>
    <file>wfDir/mapper.py#mapper.py</file>
    <file>wfDir/reducer.py#reducer.py</file>
  </map-reduce>
  <ok to="success"/>
  <error to="fail"/>
</action>
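The post does not show the contents of mapper.py and reducer.py. As a rough sketch only, assuming a simple word-count job (the word-count logic and the function names `map_lines` and `reduce_lines` are hypothetical, not taken from the original scripts), the two streaming steps could look like this. A streaming mapper reads lines from stdin and writes tab-separated key/value records to stdout; the reducer receives those records sorted by key.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step (hypothetical word count): emit "word<TAB>1" per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts for each key.

    Assumes the input is already sorted by key, which is what the
    streaming framework provides between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate(lines):
    """Local stand-in for the job: map, sort by record, then reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In an actual mapper.py the body would be something like `for record in map_lines(sys.stdin): print(record)`, and likewise for reducer.py with `reduce_lines`. The `simulate` helper is only there to check the logic locally before packaging the scripts into the workflow directory.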