Elasticsearch scan and scroll - add to new index

Question

A newbie question about Elasticsearch and command-line programming.

I have Elasticsearch set up locally on my computer. I want to pull documents from a server that runs a different version of Elasticsearch, using the scan and scroll API, and add them to my own index. I am having trouble figuring out how to do this with the bulk API.

Right now in my testing phase I am just pulling a few documents from the server using the following code (which works):

   # Fetch up to 1000 hits, then re-index each one locally under a "junk-" prefixed index.
   http 'MY-OLD-ES.com:9200/INDEX/TYPE/_search?size=1000' | jq -c '.hits.hits[]' | while read -r x; do
     id="$(echo "$x" | jq -r ._id)"
     index="$(echo "$x" | jq -r ._index)"
     type="$(echo "$x" | jq -r ._type)"
     doc="$(echo "$x" | jq ._source)"
     http put "localhost:9200/junk-$index/$type/$id" <<<"$doc"
   done

Any tips on how scan and scroll works? (I'm new to this and a bit confused.) So far I know I can open a scroll and get a scroll id, but I'm unclear on what to do with the scroll id. If I call

http get 'http://MY-OLD-ES.com:9200/my_index/_search?scroll=1m&search_type=scan&size=10'

I'll receive a scroll id. Can this be piped in and parsed the same way? Additionally, I believe I'll need a while loop to tell it to keep requesting. How exactly should I go about this?

Thanks!

Answer 1:

The scan and scroll documentation explains it pretty clearly. After you get the scroll_id (a long base64-encoded string), you pass it in the body of the next request. With curl the request would look something like this:

curl -XGET 'http://MY-OLD-ES.com:9200/_search/scroll?scroll=1m' -d '
c2Nhbjs1OzExODpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExOTpRNV9aY1VyUVM4U0 
NMd2pjWlJ3YWlBOzExNjpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExNzpRNV9aY1Vy
UVM4U0NMd2pjWlJ3YWlBOzEyMDpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzE7dG90YW
xfaGl0czoxOw==
'

Notice that while the first request to open the scroll was to /my_index/_search, the second request to read the data was to /_search/scroll. Each time you call that, passing the ?scroll=1m querystring, it refreshes the timeout before the scroll is automatically closed.
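As a rough sketch of how either response can be parsed (assuming `jq` is installed, as in the question; the helper names `scroll_id_of`, `hit_count`, and `hits_of` are made up for illustration), the scroll id and the hits can be pulled out like this:

```bash
# Hypothetical helpers; each one reads a search or scroll response on stdin.

# Print the _scroll_id field of the response.
scroll_id_of() {
  jq -r '._scroll_id'
}

# Print how many hits this batch carries.
hit_count() {
  jq '.hits.hits | length'
}

# Print each hit of this batch as one compact JSON line.
hits_of() {
  jq -c '.hits.hits[]'
}

# Example use (not executed here):
#   resp="$(curl -s 'http://MY-OLD-ES.com:9200/my_index/_search?scroll=1m&search_type=scan&size=10')"
#   id="$(printf '%s\n' "$resp" | scroll_id_of)"
#   curl -s -XGET 'http://MY-OLD-ES.com:9200/_search/scroll?scroll=1m' -d "$id" | hits_of
```

So yes, the scroll response can be piped through `jq` the same way as a plain search response. With `search_type=scan`, the opening response carries only the scroll id and the total, and the first hits arrive with the first call to /_search/scroll.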

There are two more things to be aware of:

The size you pass when opening the scroll applies to each shard, so you will get size multiplied by the number of shards in your index on each request. For example, size=10 against an index with 5 shards returns up to 50 hits per request.
Each request to /_search/scroll will return a new scroll_id which you must pass on the next call to get the next batch of results. You can't just keep calling with the same scroll_id.

The scroll is complete when a scroll request returns no hits.
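Putting the points above together, a loop along these lines might work. It is a sketch under assumptions rather than a finished migration tool: it assumes `curl` and `jq` are available, that the source cluster is old enough to accept `search_type=scan`, and that the destination still accepts `_type` in bulk action lines (recent versions do not). The function names, the `SRC` and `DST` variables, and the `junk-` prefix (borrowed from the question) are placeholders.

```bash
# Placeholder endpoints; override them from the environment if needed.
SRC="${SRC:-http://MY-OLD-ES.com:9200}"
DST="${DST:-http://localhost:9200}"

# Open a scan scroll on index $1 and print the JSON response.
open_scroll() {
  curl -s -XGET "$SRC/$1/_search?scroll=1m&search_type=scan&size=10"
}

# Fetch the next batch for scroll id $1 (sent as the raw request body).
next_batch() {
  curl -s -XGET "$SRC/_search/scroll?scroll=1m" -d "$1"
}

# Turn one scroll response (stdin) into bulk input: for every hit, an action
# line followed by a line holding the document source.
to_bulk() {
  jq -c '.hits.hits[]
    | {index: {_index: ("junk-" + ._index), _type: ._type, _id: ._id}},
      ._source'
}

# Send bulk input (stdin) to the destination. The response is discarded here;
# real use should inspect its "errors" field.
send_bulk() {
  curl -s -XPOST "$DST/_bulk" -H 'Content-Type: application/x-ndjson' \
    --data-binary @- >/dev/null
}

# Copy every document of index $1, batch by batch.
copy_index() {
  resp="$(open_scroll "$1")"
  scroll_id="$(printf '%s\n' "$resp" | jq -r '._scroll_id')"
  while :; do
    resp="$(next_batch "$scroll_id")"
    count="$(printf '%s\n' "$resp" | jq '.hits.hits | length' 2>/dev/null)"
    # Stop once a scroll request comes back with no hits.
    [ "${count:-0}" -gt 0 ] || break
    printf '%s\n' "$resp" | to_bulk | send_bulk
    # Each response carries a new scroll id for the following request.
    scroll_id="$(printf '%s\n' "$resp" | jq -r '._scroll_id')"
  done
}
```

Calling `copy_index my_index` would then write the documents into `junk-my_index` on the local cluster, one bulk request per scroll batch instead of one PUT per document.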

Source: https://stackoverflow.com/questions/28844530/elasticsearch-scan-and-scroll-add-to-new-index

Tags

bash

http

ElasticSearch

kibana-4