Hive Partitions

2016. 6. 9. 11:52

1. Partitions

Partitions physically divide a large dataset in order to improve query performance.

create [external] table [if not exists] [database_name.]table_name

[(column_name data_type [ COMMENT column_comment], ...)]

[PARTITIONED BY (column_name data_type [COMMENT column_comment], ...)];

1) partitioning a managed table

- Partitioned by country

create table customer

(

id string,

name string,

sex string,

state string

) partitioned by (country string);

Directory structure after creation

/user/hive/warehouse
    /test.db
        /customer
            /country=KR

- Partitioned by country and state

create table customer

(

id string,

name string,

sex string

) partitioned by (country string, state string);

* A partition key cannot also be declared as a regular table column.

Directory structure after creation

/user/hive/warehouse
    /test.db
        /customer
            /country=KR
                /state=TB
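
The directory layout above follows directly from the partition keys: each key adds one `key=value` directory level, in the order the keys are declared. A minimal Python sketch of that mapping, assuming the warehouse root and database name shown in the tree (the function name `partition_path` is hypothetical, and Hive's escaping of special characters in partition values is left out):

```python
def partition_path(warehouse_dir, database, table, partition_spec):
    """Build the directory a partition would occupy under the warehouse root.

    partition_spec is an ordered list of (key, value) pairs, matching the
    order of the columns in PARTITIONED BY. This is an illustration only:
    it does not escape special characters the way Hive does.
    """
    levels = [f"{key}={value}" for key, value in partition_spec]
    return "/".join([warehouse_dir.rstrip("/"), f"{database}.db", table, *levels])


# One partition key gives one directory level, two keys give two nested levels.
print(partition_path("/user/hive/warehouse", "test", "customer",
                     [("country", "KR")]))
print(partition_path("/user/hive/warehouse", "test", "customer",
                     [("country", "KR"), ("state", "TB")]))
```

Because the partition values live in the directory names, a query that filters on country or state only needs to read the matching directories.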

hive> set hive.mapred.mode=strict;

hive> select * from customer; -- in strict mode, a full scan of a partitioned table (no partition filter) is rejected

hive> set hive.mapred.mode=nonstrict;

hive> select * from customer;

hive> show partitions customer; -- list all partitions of the table

hive> show partitions customer partition(country = 'KR'); -- list only the partitions for that country

hive> describe customer; -- print column information for the partitioned table

-- ALTER statement

alter table table_name add [IF NOT EXISTS] PARTITION partition_spec

[LOCATION 'loc1'] PARTITION partition_spec [LOCATION 'loc2'] ...;

partition_spec:

: (partition_column = partition_column_value, partition_column = partition_column_value, ...)

alter table test add partition (dt='20160530') location '/test/20160530';

-- RENAME statement

alter table table_name PARTITION partition_spec RENAME TO partition partition_spec;

-- EXCHANGE statement

alter table tablen1 exchange partition (partition_spec) with table tablen2;

ex) alter table tablen1 exchange partition (ct='1') with table tablen2;

(the receiving table, tablen1, must not already have partition ct=1, and tablen1 and tablen2 must have the same schema)

-- DROP statement

alter table tablen1 drop [if exists] partition partition_spec[, partition partition_spec, ...][ignore protection][purge];

-- LOAD statement

load data [local] inpath 'filepath' [overwrite] into table tablename

[partition (partcolumn1=v1, partcolumn2=v2 ...)]

-- INSERT statement

insert overwrite table tablename [ partition (partcolumn1=v1, ...)]

select ........ from tablename;

insert into table tablename [ partition (partcolumn1=v1, ...)]

select ........ from tablename;

* When using dynamic partitioning, list only the partition column names (without values) in the partition clause of the insert statement.

set hive.exec.dynamic.partition = true;

set hive.exec.dynamic.partition.mode = nonstrict;
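
With dynamic partitioning, the partition value of each row comes from the data returned by the select rather than from a literal in the partition clause. The Python sketch below is only a conceptual illustration of that routing, not Hive's implementation; the function name `route_rows` and the sample rows are hypothetical.

```python
from collections import defaultdict


def route_rows(rows, partition_column):
    """Group rows by the value of the partition column.

    Each distinct value becomes its own partition (shown here as a
    'column=value' label), and the partition column itself is dropped
    from the stored row because its value is carried by the partition.
    """
    partitions = defaultdict(list)
    for row in rows:
        label = f"{partition_column}={row[partition_column]}"
        stored = {k: v for k, v in row.items() if k != partition_column}
        partitions[label].append(stored)
    return dict(partitions)


# Hypothetical sample data: two countries produce two partitions.
sample = [
    {"id": "1", "name": "Kim", "country": "KR"},
    {"id": "2", "name": "Lee", "country": "KR"},
    {"id": "3", "name": "Ann", "country": "US"},
]
for label, stored_rows in route_rows(sample, "country").items():
    print(label, stored_rows)
```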

2) partitioning an external table

Same as a managed table, except that dropping a partition does not delete the data files.

create external table customer

(

id string,

name string,

sex string,

state string

) partitioned by (country string);

-- ALTER statement

alter table customer add partition(country='US') location '/user/hive/~~~';

-- refresh to add new partition

msck repair table test;

3) Bucketing

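If a column with unique values is chosen as the partition key, the number of partitions grows so large that partitioning no longer works well.

Bucketing solves this by hashing rows into a fixed, user-specified number of buckets.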

Bucket number = hash_function(bucketing_column) mod num_buckets
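
The formula above can be sketched in a few lines of Python. The hash used here is a Java-style 31-multiplier hash over the UTF-8 bytes of the value, chosen as a stand-in: it is an assumption for illustration and is not guaranteed to match the hash Hive applies to a given column type. The names `java_style_hash` and `bucket_number` are hypothetical.

```python
def java_style_hash(value: str) -> int:
    """Stand-in hash: h = h * 31 + byte, wrapped to 32 bits.

    This is an assumption for illustration, not Hive's actual hash.
    """
    h = 0
    for byte in value.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h


def bucket_number(value: str, num_buckets: int) -> int:
    """Bucket number = hash_function(bucketing_column) mod num_buckets."""
    # Mask off the sign bit so the result is always in [0, num_buckets).
    return (java_style_hash(value) & 0x7FFFFFFF) % num_buckets


# The same id always maps to the same bucket, whatever the number of rows.
for customer_id in ["a", "ab", "1001", "1002"]:
    print(customer_id, bucket_number(customer_id, 10))
```

However many distinct ids exist, the table never has more than num_buckets buckets, which is the property that makes bucketing usable on unique columns.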

set hive.enforce.bucketing=true; -- default: false

create [external] table [db_name.]tablename

[(col_name datatype [comment col_comment], ...)]

clustered by (col_name, ...)

into n buckets;

* clustered by defines the bucketing key

create table customer_bucket

(

id string,

name string,

sex string,

state string

) clustered by (id) into 10 buckets;

insert into customer_bucket select * from customer;

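* Unlike a partition key, the bucketing column of a bucketed table can be one of the table's regular columns.

load data cannot be used to populate a bucketed table, so use an insert statement instead.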
