Hive custom UDF

2017. 7. 29. 14:35

When you need a custom function tied to your own business logic, beyond the built-in functions that Hive provides, proceed in the following order.

1. Create a new project in an IDE or with mvn

2. Add the following jars to the project's lib directory

1) hadoop-common-{hadoop version}

ex) hadoop-common-2.6.0

2) hive-exec-{hive version}

ex) hive-exec-1.2.1

3. Write a custom class that extends the UDF class, as follows

package com.test.customudf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that returns its input string reversed.
public class TestString extends UDF {

    // Hive calls evaluate() once per row; a NULL input yields a NULL result.
    public Text evaluate(final Text text) {
        if (text == null) {
            return null;
        }

        StringBuilder sBuilder = new StringBuilder(text.toString());
        String reverse = sBuilder.reverse().toString();

        return new Text(reverse);
    }
}

4. Compile the project and export it as a jar

5. Copy the jar to the server where Hive is running

6. Add the custom jar to Hive

hive> add jar /tmp/customudf.jar;

7. Verify that the jar was registered

hive > list jars;

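8. Create the function in Hive (Hive already ships a built-in reverse function, so a different name may avoid confusion)

hive> create function reverse as 'com.test.customudf.TestString';

9. Use the function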

hive> select reverse('eriweirweir');
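
Before packaging the jar, the reversal logic from step 3 can be sanity-checked locally without the Hadoop or Hive jars on the classpath. The sketch below is only a plain-Java stand-in: the class name ReverseCheck is hypothetical, and String is used in place of Hadoop's Text as an assumption for illustration. It suggests what the query in step 9 should return.

```java
// Plain-Java stand-in for TestString.evaluate (hypothetical helper, not part of the UDF jar).
// Assumption: String replaces org.apache.hadoop.io.Text so that no Hadoop jar is needed.
class ReverseCheck {

    // Mirrors the UDF: null in, null out; otherwise the characters in reverse order.
    static String reverse(String text) {
        if (text == null) {
            return null;
        }
        return new StringBuilder(text).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("eriweirweir")); // prints "riewriewire"
        System.out.println(reverse(null));          // prints "null"
    }
}
```

If the local result matches what Hive returns for the same input, the jar was most likely built from the intended class.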

* When using beeline (HiveServer2)

1. Upload the jar to HDFS

hdfs dfs -put customudf.jar /tmp/

2. Connect with beeline

beeline -u jdbc:hive2://localhost:10000/default

3. Create the UDF

create function reverse as 'com.test.customudf.TestString' using jar 'hdfs://localhost/tmp/customudf.jar';
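
The same JDBC URL that beeline uses can also be opened from Java code. The following is a minimal sketch, not a definitive client: the class name BeelineEquivalent and both method names are hypothetical, and it assumes the Hive JDBC driver is on the classpath, HiveServer2 is listening on localhost:10000, and the server accepts a connection without credentials.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper that repeats the beeline steps over plain JDBC.
class BeelineEquivalent {

    // Builds the CREATE FUNCTION statement from beeline step 3 (no trailing semicolon over JDBC).
    static String createFunctionSql(String name, String className, String jarUri) {
        return "create function " + name + " as '" + className + "' using jar '" + jarUri + "'";
    }

    // Connects to HiveServer2, registers the function, and calls it once.
    // Assumption: the Hive JDBC driver is available and the server is running.
    static String registerAndCall(String jdbcUrl) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement()) {
            stmt.execute(createFunctionSql("reverse",
                    "com.test.customudf.TestString",
                    "hdfs://localhost/tmp/customudf.jar"));
            try (ResultSet rs = stmt.executeQuery("select reverse('eriweirweir')")) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```

A call such as BeelineEquivalent.registerAndCall("jdbc:hive2://localhost:10000/default") would then correspond to steps 2 and 3 above, followed by a single test query.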

세모데