How to use Elasticsearch

2016. 6. 10. 23:47

1. Elasticsearch terminology

1) index

- a collection of different types of documents under one logical namespace

(plays a role similar to a schema in an RDB)

- the number of shards and replicas is configured per index

- supports multitenancy; indices can be created and deleted freely

2) type

- a logical collection of documents that represent the same kind of entity

(similar to a table in an RDB)

- represents domain objects, as a table does (client, company, user...)

3) document

- a logical unit that represents one instance of an entity

(similar to a row in an RDB)

- a JSON object

4) field

- a document consists of multiple fields organized as JSON key/value pairs

(similar to a column in an RDB)

2. Elasticsearch REST API

CRUD (create, retrieve, update, delete) operations are supported through the API.

1) Creating a document

curl -XPUT 'http://localhost:9200/test/users/1?pretty' -d '{
  "id" : "test1",
  "firstname" : "first",
  "lastname" : "name",
  "roles" : [ "admin", "guest" ]
}'

URL format: http://<es-host>:<port>/<index_name>/<type_name>/<id>
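
The URL layout above can be sketched as a small helper. This is only an illustration: the function name document_url is hypothetical and nothing is sent to a server.

```python
def document_url(host, port, index, doc_type, doc_id):
    """Build a document endpoint in the <index_name>/<type_name>/<id> layout."""
    return "http://{}:{}/{}/{}/{}".format(host, port, index, doc_type, doc_id)


# The values below mirror the curl example (localhost, test, users, 1).
url = document_url("localhost", 9200, "test", "users", 1)
print(url)
```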

2) Reading a document

curl -XGET 'http://localhost:9200/test/users/1?pretty'

Returns the document metadata together with the document content.

3) Updating a document

curl -XPOST 'http://localhost:9200/test/users/1/_update?pretty' -d '{
  "doc" : {
    "age" : 20
  }
}'

Because a stored document cannot be modified in place, an update goes through a get-modify-update cycle internally.
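
The get-modify-update cycle can be pictured with a minimal in-memory sketch. The function apply_partial_update is a hypothetical name, and the merge rule (nested objects are merged, other values are replaced) is a simplification of what the server does.

```python
import copy


def apply_partial_update(source, partial_doc):
    """Return a new document with partial_doc merged into source."""
    updated = copy.deepcopy(source)            # get: start from the stored document
    for key, value in partial_doc.items():     # modify: merge the partial fields
        if isinstance(value, dict) and isinstance(updated.get(key), dict):
            updated[key] = apply_partial_update(updated[key], value)
        else:
            updated[key] = value
    return updated                             # update: stored as a new version


stored = {"id": "test1", "firstname": "first", "lastname": "name",
          "roles": ["admin", "guest"]}
new_version = apply_partial_update(stored, {"age": 20})
print(new_version)
```

The original dictionary is left untouched, which mirrors the idea that the old document is replaced rather than edited.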

4) Document mapping

curl -XGET 'http://localhost:9200/test/users/_mapping?pretty'

Returns the type mapping information:

{
  "test" : {
    "mappings" : {
      "users" : {
        "properties" : {
          "id" : {
            "type" : "long"
          },
          ...
        }
      }
    }
  }
}

- Modifying the mapping

curl -XPUT 'http://localhost:9200/test/users/_mapping?pretty' -d '{
  "properties" : {
    "first" : {
      "type" : "string",
      "index" : "not_analyzed"
    },
    "address" : {
      "type" : "object",
      "properties" : {
        "city" : {
          "type" : "string"
        },
        "region" : {
          "type" : "string"
        }
      }
    }
  }
}'

Setting "index" to "not_analyzed" disables analysis for that field.

* Automatically map incoming values as the date type when they match a given format

curl -XPUT 'http://localhost:9200/test1/type1/_mapping' -d '{
  "dynamic_date_formats" : ["MM/dd/yy HH:mm"]
}'

- Mapping templates

A template is applied automatically to any new index whose name matches its pattern.

curl -XPUT 'http://localhost:9200/_template/test-template?pretty' -d '{
  "template" : "test*",
  "mappings" : {
    "users" : {
    }
  }
}'

5) Deleting documents

-- delete everything

curl -XDELETE 'http://localhost:9200/_all'

or

curl -XDELETE 'http://localhost:9200/*'

-- delete an index

curl -XDELETE 'http://localhost:9200/test'

-- delete the documents of a specific type

curl -XDELETE 'http://localhost:9200/test/cup/_query' -d '{
  "query" : {
    "bool" : {
      "must" : [
        { "match_all" : {} }
      ]
    }
  }
}'

3. Data types

1) string

2) byte, short, integer, long

3) float, double

4) boolean

5) date

6) array, nested, ipv4, geo point, geo shape

* 1) to 5) are the core types

4. index processing

Analyzers

1) Creating and applying a custom analyzer

A custom analyzer can be defined and applied when the index is created.

(built-in components include whitespace, keyword, lowercase, standard, n-gram, etc.)

curl -XPUT 'http://localhost:9200/test?pretty' -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "test_analyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : [ "stop", "lowercase", "snowball" ]
        }
      }
    }
  }
}'
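
To picture what an analyzer does, here is a rough sketch of a tokenizer followed by a chain of token filters. It is not Elasticsearch code: the stop list is a shortened, hypothetical one, the function names are made up, and snowball stemming is left out.

```python
STOP_WORDS = {"a", "an", "and", "the", "of"}  # hypothetical, shortened stop list


def whitespace_tokenizer(text):
    """Split the text on whitespace only."""
    return text.split()


def stop_filter(tokens):
    """Drop tokens that appear in the stop list (case-sensitive)."""
    return [t for t in tokens if t not in STOP_WORDS]


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def analyze(text, tokenizer, filters):
    """Run the tokenizer, then each filter in the order given."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("Quick Brown fox and the Dog",
                 whitespace_tokenizer, [stop_filter, lowercase_filter])
print(tokens)
```

Filters run in the listed order, so a case-sensitive stop filter placed before lowercase would not remove a capitalized "The".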

5. Elasticsearch search methods

1) URI search

curl -XGET 'http://localhost:9200/test/users/_search?pretty=true&q=first:park'

2) match all

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "match_all" : {}
  }
}'

3) term

Returns only the documents that match the search term exactly.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "term" : {
      "role" : {
        "value" : "guest"
      }
    }
  },
  "size" : 10
}'

4) Boolean

Combines clauses with must, must_not and should to express AND and OR conditions.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "must" : [
        {
          "term" : {
            "city" : {
              "value" : "seoul"
            }
          }
        }
      ],
      "should" : [
        {
          "terms" : {
            "role" : [ "guest" ]
          }
        }
      ]
    }
  }
}'
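
How must, must_not and should combine can be sketched against plain dictionaries. This is a simplified model with hypothetical helper names: it ignores scoring and assumes that, when there is no must clause, at least one should clause has to match.

```python
def term_matches(doc, field, value):
    """A term clause matches when the field equals the value (or contains it, for arrays)."""
    stored = doc.get(field)
    if isinstance(stored, list):
        return value in stored
    return stored == value


def bool_matches(doc, must=(), must_not=(), should=()):
    """Evaluate (field, value) term clauses grouped as must / must_not / should."""
    if not all(term_matches(doc, f, v) for f, v in must):
        return False
    if any(term_matches(doc, f, v) for f, v in must_not):
        return False
    if not must and should:
        # Without a must clause, at least one should clause is required.
        return any(term_matches(doc, f, v) for f, v in should)
    return True  # with a must clause, should only influences ranking


users = [
    {"city": "seoul", "role": "guest"},
    {"city": "seoul", "role": "admin"},
    {"city": "busan", "role": "guest"},
]
hits = [u for u in users
        if bool_matches(u, must=[("city", "seoul")], should=[("role", "guest")])]
print(hits)
```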

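5) match

A match query is of the boolean type and, by default, combines the query terms with the or operator.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "match" : {
      "last" : {
        "query" : "test data"
      }
    }
  }
}'

Setting "type" to "phrase" matches the terms as a phrase instead:

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "match" : {
      "last" : {
        "query" : "test data",
        "type" : "phrase"
      }
    }
  }
}'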
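
The difference between the default or behaviour and phrase matching can be sketched with a toy tokenizer. The helpers below are hypothetical and use lowercasing plus whitespace splitting in place of a real analyzer.

```python
def tokenize(text):
    """Toy analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def match_or(field_text, query):
    """True when at least one query term occurs in the field."""
    field_tokens = set(tokenize(field_text))
    return any(term in field_tokens for term in tokenize(query))


def match_phrase(field_text, query):
    """True when the query terms occur consecutively and in order."""
    field_tokens = tokenize(field_text)
    query_tokens = tokenize(query)
    width = len(query_tokens)
    return any(field_tokens[i:i + width] == query_tokens
               for i in range(len(field_tokens) - width + 1))


print(match_or("data for a test", "test data"))
print(match_phrase("data for a test", "test data"))
print(match_phrase("some test data here", "test data"))
```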

6) range

Searches a range using gt, gte, lt and lte.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "range" : {
      "age" : {
        "gte" : 15,
        "lte" : 30
      }
    }
  }
}'

7) wildcard

Use it on not_analyzed fields.

( * : matches any sequence of characters, ? : matches a single character)

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "wildcard" : {
      "first" : {
        "value" : "k*"
      }
    }
  }
}'
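
The two wildcard characters behave much like shell-style patterns, which Python's standard fnmatch module can illustrate. This is only an analogy: fnmatch also understands [seq] character classes, and the sample names are made up.

```python
from fnmatch import fnmatchcase


def wildcard_filter(values, pattern):
    """Keep the values that match the pattern as a whole (case-sensitive)."""
    return [v for v in values if fnmatchcase(v, pattern)]


names = ["kim", "kang", "park", "ko"]   # hypothetical field values
print(wildcard_filter(names, "k*"))     # any sequence of characters after k
print(wildcard_filter(names, "k?"))     # exactly one character after k
```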

8) filters (a filter can be faster than a query in some cases)

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "filtered" : {
      "filter" : { ... }
    }
  }
}'

9) exists

Checks whether the given field holds a non-null value.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "exists" : {
          "field" : "test"
        }
      }
    }
  }
}'

10) geo distance

Searches for documents located within a given distance of a location.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : {}
      },
      "filter" : {
        "geo_distance" : {
          "distance" : "50km",
          "region" : {
            "lat" : 33.10,
            "lon" : 55.23
          }
        }
      }
    }
  }
}'
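
As a rough picture of what a distance filter has to decide, the sketch below uses the haversine formula on a spherical Earth. The helper names and the sample points are hypothetical, and the distance calculation Elasticsearch actually uses may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(doc, center, distance_km, field="region"):
    """True when the document's point lies within distance_km of center."""
    point = doc[field]
    return haversine_km(center["lat"], center["lon"],
                        point["lat"], point["lon"]) <= distance_km


center = {"lat": 33.10, "lon": 55.23}
near = {"region": {"lat": 33.20, "lon": 55.30}}
far = {"region": {"lat": 35.00, "lon": 55.23}}
print(within_distance(near, center, 50), within_distance(far, center, 50))
```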

5. aggregations

[Syntax]

"aggregations" : {
  "<aggregation_name>" : {
    "<aggregation_type>" : {
      <aggregation_body>
    }
    [,"meta" : { [<meta_data_body>] } ]?
    [,"aggregations" : { [<sub_aggregation>]+ } ]?
  }
  [,"<aggregation_name_2>" : { ... } ]*
}

Three kinds of aggregations are supported.

1) Bucketing : similar to GROUP BY in SQL; documents are grouped into buckets by a given criterion

- terms aggregation

Use size to get the top N results.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "aggs" : {
    "cand_by_region" : {
      "terms" : {
        "field" : "region",
        "size" : 3
      }
    }
  }
}'

* Bottom N results (ordered by ascending count)

"terms" : {
  "field" : "region",
  "size" : 3,
  "order" : {
    "_count" : "asc"
  }
}
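
Conceptually, a terms aggregation counts documents per distinct value and keeps the first N buckets after sorting. The sketch below shows that idea with collections.Counter; the function name, the sample data and the alphabetical tie-break are assumptions made for the example.

```python
from collections import Counter


def terms_aggregation(docs, field, size, ascending=False):
    """Count documents per value of field and return the first size buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(
        counts.items(),
        # Sort by count (descending unless ascending=True), then by key.
        key=lambda item: (item[1] if ascending else -item[1], item[0]),
    )
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


regions = ["seoul", "seoul", "seoul", "busan", "busan", "jeju", "daegu"]
docs = [{"region": r} for r in regions]
top = terms_aggregation(docs, "region", size=2)
bottom = terms_aggregation(docs, "region", size=2, ascending=True)
print(top)
print(bottom)
```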

* When using a terms aggregation, map the field as not_analyzed.

- histograms

Aggregates over buckets of a fixed interval.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "aggs" : {
    "cand_exp" : {
      "histogram" : {
        "field" : "work_time",
        "interval" : 5,
        "min_doc_count" : 0
      }
    }
  }
}'
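
Fixed-interval bucketing can be sketched by rounding each value down to a multiple of the interval. The function below is a hypothetical illustration (no offset, numeric values only) that also shows how min_doc_count 0 keeps the empty buckets between the smallest and largest key.

```python
import math


def histogram(values, interval, min_doc_count=0):
    """Group values into fixed-width buckets keyed by their lower bound."""
    counts = {}
    for value in values:
        key = math.floor(value / interval) * interval
        counts[key] = counts.get(key, 0) + 1
    if not counts:
        return []
    buckets = []
    key, last = min(counts), max(counts)
    while key <= last:
        count = counts.get(key, 0)
        if count >= min_doc_count:      # 0 keeps empty buckets in the gap
            buckets.append({"key": key, "doc_count": count})
        key += interval
    return buckets


work_times = [1, 3, 7, 16]              # hypothetical field values
print(histogram(work_times, interval=5, min_doc_count=0))
print(histogram(work_times, interval=5, min_doc_count=1))
```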

- range

Aggregates over explicitly defined ranges of a numeric or date field.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "aggs" : {
    "cand_exp" : {
      "range" : {
        "field" : "work_time",
        "ranges" : [
          {
            "from" : 1,
            "to" : 4
          }
        ]
      }
    }
  }
}'

- geo distance

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "query" : {
    "term" : {
      "role" : {
        "value" : "admin"
      }
    }
  },
  "aggs" : {
    "test_agg" : {
      "geo_distance" : {
        "field" : "geo",
        "origin" : "34,23",
        "unit" : "km",
        "ranges" : [
          {
            "from" : 10,
            "to" : 20
          },
          {
            "from" : 20,
            "to" : 50
          }
        ]
      }
    }
  }
}'

2) Metrics : similar to SQL aggregate functions (simple metrics such as count, average, sum...)

- sub-aggregations

select city, role, avg(work_time)
from users c left join work on id = cid
group by city, role;
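
The grouping in this SELECT (city, then role, then an average per group) is the same nesting that sub-aggregations express. A plain Python sketch of that nesting, with a hypothetical function name and made-up rows that already contain work_time, might look like this:

```python
from collections import defaultdict


def avg_by_city_and_role(rows):
    """Average work_time per (city, role), returned as a nested dict."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["city"], row["role"])].append(row["work_time"])
    result = {}
    for (city, role), times in groups.items():
        # Outer level: city bucket; inner level: role bucket with its average.
        result.setdefault(city, {})[role] = sum(times) / len(times)
    return result


rows = [
    {"city": "seoul", "role": "admin", "work_time": 4},
    {"city": "seoul", "role": "admin", "work_time": 6},
    {"city": "seoul", "role": "guest", "work_time": 2},
    {"city": "busan", "role": "guest", "work_time": 3},
]
averages = avg_by_city_and_role(rows)
print(averages)
```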

The RDB SELECT statement above can be expressed in Elasticsearch as follows.

curl -XPOST 'http://localhost:9200/test/users/_search?pretty' -d '{
  "aggs" : {
    "groupby_city" : {
      "terms" : {
        "field" : "city",
        "size" : 5
      },
      "aggs" : {
        "groupby_role" : {
          "terms" : {
            "field" : "role",
            "size" : 5
          },
          "aggs" : {
            "average" : {
              "avg" : {
                "field" : "work_time"
              }
            }
          }
        }
      }
    }
  },
  "size" : 1
}'

3) Pipeline

Applies an additional aggregation to the output of other aggregations to produce the result.

In the example below, the_sum is computed first and the_movavg is then computed on top of it.

{
  "my_date_histo" : {
    "date_histogram" : {
      "field" : "timestamp",
      "interval" : "day"
    },
    "aggs" : {
      "the_sum" : {
        "sum" : { "field" : "lemmings" }
      },
      "the_movavg" : {
        "moving_avg" : { "buckets_path" : "the_sum" }
      }
    }
  }
}
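
The idea of a pipeline step consuming the output of another aggregation can be sketched as a moving average over per-bucket sums. The trailing window that includes the current bucket is an assumption of this sketch; the exact windowing of moving_avg in Elasticsearch may differ.

```python
def moving_average(values, window=5):
    """Simple moving average over a trailing window that includes the current value."""
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


daily_sums = [10, 20, 30, 40]           # hypothetical the_sum value per day bucket
smoothed = moving_average(daily_sums, window=2)
print(smoothed)
```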

Other) Using a value script

{
  "aggs" : {
    ...
    "aggs" : {
      "avg_corrected_grade" : {
        "avg" : {
          "field" : "grade",
          "script" : {
            "inline" : "_value * correction",
            "params" : {
              "correction" : 1.2
            }
          }
        }
      }
    }
  }
}
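
A value script transforms each field value before the metric is computed. In the sketch below a Python lambda stands in for the inline script; the function name and the sample grades are hypothetical.

```python
def avg_with_value_script(docs, field, script, params):
    """Apply script to every field value, then average the results."""
    values = [script(doc[field], **params) for doc in docs if field in doc]
    return sum(values) / len(values) if values else None


grades = [{"grade": 50}, {"grade": 100}]
corrected = avg_with_value_script(
    grades, "grade",
    lambda _value, correction: _value * correction,  # mirrors "_value * correction"
    {"correction": 1.2},
)
print(corrected)
```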

6. index

1) Creating an index

curl -XPUT 'http://localhost:9200/twitter/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 2
    }
  }
}'

2) Index mapping

curl -XPOST localhost:9200/test -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "type1" : {
      "properties" : {
        "field1" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'

3) index alias

curl -XPUT localhost:9200/test -d '{
  "aliases" : {
    "alias_1" : {},
    "alias_2" : {
      "filter" : {
        "term" : { "user" : "kimchy" }
      },
      "routing" : "kimchy"
    }
  }
}'

4) index creation date

creation_date is set automatically when an index is created, but it can also be set manually as shown below.

curl -XPUT localhost:9200/test -d '{
  "creation_date" : 1407751337000
}'

5) index delete

Deletes the twitter index.

curl -XDELETE 'http://localhost:9200/twitter/'

6) index get

curl -XGET 'http://localhost:9200/twitter/'

curl -XGET 'http://localhost:9200/twitter/_settings,_mappings' -- retrieve only the settings and mappings

7) index exists

curl -XHEAD -i 'http://localhost:9200/twitter' -- check whether the index exists

curl -XHEAD -i 'http://localhost:9200/twitter/tweet' -- check whether the type exists

8) index open/close

curl -XPOST 'localhost:9200/my_index/_close'

curl -XPOST 'localhost:9200/my_index/_open'

9) index put mapping

-- create the index with a new tweet type and a message field

PUT twitter
{
  "mappings": {
    "tweet": {
      "properties": {
        "message": {
          "type": "string"
        }
      }
    }
  }
}

-- add a name field to the user type

PUT twitter/_mapping/user
{
  "properties": {
    "name": {
      "type": "string"
    }
  }
}

10) index template

PUT /_template/template_1
{
  "template": "te*",
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "string",
          "index": "not_analyzed"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z YYYY"
        }
      }
    }
  }
}

The template is applied when an index with a matching name is created.

curl -XDELETE localhost:9200/_template/template_1 -- delete the template

curl -XGET localhost:9200/_template/template_1 -- get the template information

11) Other index APIs

- index stats

curl localhost:9200/index1,index2/_stats

- index segments

curl -XGET 'http://localhost:9200/test1,test2/_segments'

- index recovery

Shows index recovery information.

curl -XGET 'http://localhost:9200/index1,index2/_recovery?pretty&human'

- index shard stores (shows shard store information for the indices)

curl -XGET 'http://localhost:9200/test1,test2/_shard_stores'

- index clear cache

curl -XPOST 'http://localhost:9200/kimchy,elasticsearch/_cache/clear'

- index flush

Moves the data of completed transactions held in memory to disk.

curl -XPOST 'http://localhost:9200/twitter/_flush'

- index merge

curl -XPOST 'http://localhost:9200/kimchy,elasticsearch/_forcemerge'

- cat

Shows the master node information in verbose form.

curl 'localhost:9200/_cat/master?v'
