1. Running the job from the hadoop command line

   hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-*streaming*.jar \
      -file /test/mapper.py -mapper /test/mapper.py \
      -file /test/reducer.py -reducer /test/reducer.py \
      -input hdfs://nn:8020/hdfs/test/input/ \
      -output hdfs://nn:8020/hdfs/test/output/
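
The contents of mapper.py and reducer.py are not shown in this post, so the following is only a minimal sketch of what such a pair might look like, assuming a simple word count. The function names `map_lines` and `reduce_lines` are hypothetical. In real streaming scripts each function would be fed `sys.stdin` and its records printed to stdout; here a small in-memory sample stands in for the input, and `sorted()` stands in for the shuffle/sort that Hadoop performs between the two phases.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper logic: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Reducer logic: sum the counts per key.

    Relies on records arriving grouped by key, which streaming
    guarantees by sorting mapper output before the reducer runs.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


# Local dry run: sorted() imitates the shuffle/sort phase.
sample = ["a b a", "b a"]
mapped = sorted(map_lines(sample))
result = list(reduce_lines(mapped))
print(result)
```

Running the logic locally like this is a cheap way to check the scripts before submitting them to the cluster.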


2. Converting the Python streaming job into an Oozie job, as shown below

<action name="TestAction">
    <map-reduce>
        <job-tracker>nn1:8032</job-tracker>
        <name-node>hdfs://nn:8020</name-node>
        <prepare>
            <delete path="hdfs://nn:8020/hdfs/test/output"/>
        </prepare>
        <streaming>
            <mapper>python mapper.py</mapper>
            <reducer>python reducer.py</reducer>
        </streaming>
        <configuration>
            <property>
                <name>mapred.input.dir</name>
                <value>/hdfs/test/input</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>/hdfs/test/output</value>
            </property>
        </configuration>
        <file>wfDir/mapper.py#mapper.py</file>
        <file>wfDir/reducer.py#reducer.py</file>
    </map-reduce>
    <ok to="success"/>
    <error to="fail"/>
</action>
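
The `#name` part of each `<file>` entry is the symlink name the script gets in the task's working directory, so it has to match the script name used in `<mapper>` and `<reducer>`. As a rough sketch, a check like the one below could catch a mismatch before the workflow is submitted. The helper names `symlink_name` and `missing_scripts` are hypothetical, the embedded XML is a trimmed copy of the action above, and the code assumes the fragment has no XML namespace (a full workflow.xml normally declares one, which would need namespace-aware lookups). Falling back to the file's base name when there is no `#` is also an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the action; only the elements being checked are kept.
ACTION_XML = """
<action name="TestAction">
  <map-reduce>
    <streaming>
      <mapper>python mapper.py</mapper>
      <reducer>python reducer.py</reducer>
    </streaming>
    <file>wfDir/mapper.py#mapper.py</file>
    <file>wfDir/reducer.py#reducer.py</file>
  </map-reduce>
</action>
"""


def symlink_name(file_entry):
    """Return the name a <file> entry gets in the task working directory."""
    path, _, link = file_entry.strip().partition("#")
    # Assumption: without '#', the file shows up under its base name.
    return link or path.rsplit("/", 1)[-1]


def missing_scripts(action_xml):
    """List scripts named in <mapper>/<reducer> that no <file> entry ships."""
    root = ET.fromstring(action_xml)
    shipped = {symlink_name(f.text) for f in root.iter("file")}
    wanted = []
    for tag in ("mapper", "reducer"):
        for node in root.iter(tag):
            # Assumption: the script is the last token, as in "python mapper.py".
            wanted.append(node.text.split()[-1])
    return [name for name in wanted if name not in shipped]


print(missing_scripts(ACTION_XML))
```

An empty list means every script referenced in `<streaming>` is shipped under a matching name.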

