Indexing an attachment file in Elasticsearch

Submitted by 左心房为你撑大大i on 2019-11-30 23:45:09

http://es-cn.medcl.net/tutorials/2011/07/18/attachment-type-in-action.html

#!/bin/sh

coded=`cat fn6742.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file
curl -X POST "localhost:9200/test/attachment/" -d @json.file

First, you don't specify whether you have the attachment plugin installed. If not, you can do so with:

./bin/plugin -install mapper-attachments

You will need to restart ElasticSearch for it to load the plugin.

Then, as you do above, you map a field to have type attachment:

curl -XPUT 'http://127.0.0.1:9200/foo/?pretty=1'  -d '
{
   "mappings" : {
      "doc" : {
         "properties" : {
            "file" : {
               "type" : "attachment"
            }
         }
      }
   }
}
'

When you index a document, the contents of the file must be encoded in Base64, which you can do with the base64 command-line utility. Its output is typically wrapped across several lines, though, and a JSON string cannot contain raw line breaks, so each newline has to be escaped. One way is to pipe the output from base64 through Perl:

curl -XPOST 'http://127.0.0.1:9200/foo/doc?pretty=1'  -d '
{
   "file" : "'"$(base64 /path/to/file | perl -pe 's/\n/\\n/g')"'"
}
'
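For larger files the encoded text may not fit on the command line, so it can help to write the JSON body to a file and pass it with -d @file, as the script in the question does. The following is only a sketch: build_payload is a hypothetical helper name, and it strips the line wrapping with tr instead of escaping it, on the assumption that a single-line Base64 string is acceptable to the plugin.

```shell
#!/bin/sh
# Sketch: build the JSON body in a file instead of on the command line.
# build_payload is a hypothetical helper name, not part of Elasticsearch.
build_payload() {
    # $1 = file to index, $2 = where to write the JSON body
    # tr -d '\n' removes the line wrapping that base64 may add, so the
    # encoded text is a single line and needs no escaping inside JSON.
    { printf '{"file":"'; base64 < "$1" | tr -d '\n'; printf '"}\n'; } > "$2"
}

# Usage (assumes the foo index and doc type mapped earlier):
#   build_payload /path/to/file json.file
#   curl -XPOST 'http://127.0.0.1:9200/foo/doc?pretty=1' -d @json.file
```

Because the encoded data is streamed straight into the output file, it never has to pass through a shell variable or a command-line argument.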

Now you can search your file. The query below uses match, the name the text query has carried since Elasticsearch 0.19.9:

curl -XGET 'http://127.0.0.1:9200/foo/doc/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "file" : "text to look for"
      }
   }
}
'
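A search phrase that itself contains double quotes or backslashes would break a request body typed inline like this. As a rough sketch, a small helper can escape the phrase before it is placed in the JSON. search_body is a hypothetical name, and the helper emits a match query, so adjust the query name if your release predates it.

```shell
#!/bin/sh
# search_body is a hypothetical helper: it prints a search request body
# for the "file" field, with the phrase escaped as a JSON string.
search_body() {
    # Escape backslashes first, then double quotes.
    printf '{"query":{"match":{"file":"%s"}}}\n' \
        "$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

# Usage (curl reads the body from stdin with -d @-):
#   search_body 'text to look for' |
#       curl -XGET 'http://127.0.0.1:9200/foo/doc/_search?pretty=1' -d @-
```

The sketch only handles quotes and backslashes; control characters in the phrase would still need escaping.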

See ElasticSearch attachment type for more.

This is a complete shell script implementation:

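file_path='/path/to/file'
# Base64-encode the file and escape each line break as a literal \n for JSON
file=$(base64 "$file_path" | perl -pe 's/\n/\\n/g')
# POST lets Elasticsearch generate the document ID
curl -XPOST "http://eshost.com:9200/index/type/" -d '{
    "file" : { "content" : "'"$file"'" }
}'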

An alternative is the plugin from http://elasticwarehouse.org. You upload the binary file through the _ewupload endpoint, read the newly generated ID from the response, and store that ID as a reference in a document in your own index.

Install the plugin:

plugin -install elasticwarehouseplugin -u http://elasticwarehouse.org/elasticwarehouse/elasticsearch-elasticwarehouseplugin-1.2.2-1.7.0-with-dependencies.zip

Restart the cluster, then upload the file:

curl -XPOST "http://127.0.0.1:9200/_ewupload?folder=/myfolder&filename=mybinaryfile.bin" --data-binary @mybinaryfile.bin

Sample response:

{"id":"nWvrczBcSEywHRBBBwfy2g","version":1,"created":true}
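To reuse the generated ID in another index, it first has to be pulled out of that response. The sketch below does so with sed and assumes the compact response format shown in the sample. extract_id is a hypothetical helper name, and the books index, book type and attachment_id field in the usage note are made up for illustration.

```shell
#!/bin/sh
# extract_id is a hypothetical helper: it reads an _ewupload response on
# stdin and prints the value of its "id" field.
# Assumes compact JSON with no spaces around the colon, as in the sample.
extract_id() {
    sed -n 's/.*"id":"\([^"]*\)".*/\1/p'
}

# Usage (index, type and field names below are assumptions):
#   id=$(curl -s -XPOST "http://127.0.0.1:9200/_ewupload?folder=/myfolder&filename=mybinaryfile.bin" \
#        --data-binary @mybinaryfile.bin | extract_id)
#   curl -XPOST 'http://127.0.0.1:9200/books/book/1/_update' \
#        -d '{"doc":{"attachment_id":"'"$id"'"}}'
```

If a JSON-aware tool such as jq is available, it is a sturdier choice than a regular expression.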