Is there any way to import a JSON file (containing 100 documents) into an Elasticsearch server?

孤街浪徒 2020-12-04 10:39

Is there any way to import a JSON file (containing 100 documents) into an Elasticsearch server? I want to import a big JSON file into es-server.

9 answers
  • 2020-12-04 10:45

    Stream2es is the easiest way IMO.

    e.g. assuming a file "some.json" containing a list of JSON documents, one per line:

    curl -O download.elasticsearch.org/stream2es/stream2es; chmod +x stream2es
    cat some.json | ./stream2es stdin --target "http://localhost:9200/my_index/my_type"
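
    For reference, this is roughly what stream2es expects "some.json" to look like: one self-contained JSON document per line, with no wrapping array. A minimal sketch of building such a file (the field names are made up for illustration):

    # write two dummy documents, one JSON object per line
    printf '%s\n' \
      '{"title": "First document", "views": 10}' \
      '{"title": "Second document", "views": 25}' > some.json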
    
  • 2020-12-04 10:47

    You can use the Elasticsearch Gatherer Plugin.

    The gatherer plugin for Elasticsearch is a framework for scalable data fetching and indexing. Content adapters are implemented in gatherer zip archives, which are a special kind of plugin distributable over Elasticsearch nodes. They can receive job requests and execute them in local queues. Job states are maintained in a special index.

    This plugin is under development.

    Milestone 1 - deploy gatherer zips to nodes

    Milestone 2 - job specification and execution

    Milestone 3 - porting JDBC river to JDBC gatherer

    Milestone 4 - gatherer job distribution by load/queue length/node name, cron jobs

    Milestone 5 - more gatherers, more content adapters

    Reference: https://github.com/jprante/elasticsearch-gatherer

  • 2020-12-04 10:50

    One way is to create a bash script that does a bulk insert:

    curl -XPOST http://127.0.0.1:9200/myindexname/type/_bulk?pretty=true --data-binary @myjsonfile.json
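
    The _bulk endpoint expects the file to be newline-delimited JSON: an action line followed by the document itself, pair after pair, with a trailing newline at the end of the file. A minimal sketch of building such a file (the IDs and field names are made up for illustration; on Elasticsearch 6+ the request also needs an explicit Content-Type header):

    # build a tiny myjsonfile.json in the bulk format: action line, then document, repeated
    printf '%s\n' \
      '{"index": {"_id": "1"}}' \
      '{"title": "First document"}' \
      '{"index": {"_id": "2"}}' \
      '{"title": "Second document"}' > myjsonfile.json

    # newer Elasticsearch versions require the content type to be stated explicitly
    curl -XPOST "http://127.0.0.1:9200/myindexname/type/_bulk?pretty=true" \
      -H 'Content-Type: application/x-ndjson' --data-binary @myjsonfile.json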
    

    After you run the insert, run this command to get the count:

    curl http://127.0.0.1:9200/myindexname/type/_count
    
  • 2020-12-04 10:51

    There is no import as such, but you can index the documents by using the ES API.

    You can use the index API to load each line (using some kind of code to read the file and make the curl calls), or the bulk API to load them all, assuming your data file can be formatted to work with it.

    Read more here: ES API

    A simple shell script would do the trick if you're comfortable with shell; something like this, maybe (not tested):

    while read line
    do
      curl -XPOST 'http://localhost:9200/<indexname>/<typeofdoc>/' -d "$line"
    done < myfile.json
    

    Personally, I would probably use Python, with either pyes or the elasticsearch Python client.

    pyes on github
    elastic search python client

    Stream2es is also very useful for quickly loading data into ES, and may have a way to simply stream a file in. (I have not tested it with a file, but have used it to load Wikipedia docs for ES perf testing.)

  • 2020-12-04 10:52

    jq is a lightweight and flexible command-line JSON processor.

    Usage:

    cat file.json | jq -c '.[] | {"index": {"_index": "bookmarks", "_type": "bookmark", "_id": .id}}, .' | curl -XPOST localhost:9200/_bulk --data-binary @-

    We’re taking the file file.json and piping its contents to jq first with the -c flag to construct compact output. Here’s the nugget: We’re taking advantage of the fact that jq can construct not only one but multiple objects per line of input. For each line, we’re creating the control JSON Elasticsearch needs (with the ID from our original object) and creating a second line that is just our original JSON object (.).

    At this point we have our JSON formatted the way Elasticsearch’s bulk API expects it, so we just pipe it to curl which POSTs it to Elasticsearch!
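
    As a quick illustration of what the jq filter emits, here is a made-up one-element input run through the same expression:

    # one dummy bookmark in, two bulk-formatted lines out
    echo '[{"id": 1, "title": "Example bookmark"}]' | \
      jq -c '.[] | {"index": {"_index": "bookmarks", "_type": "bookmark", "_id": .id}}, .'
    # output:
    # {"index":{"_index":"bookmarks","_type":"bookmark","_id":1}}
    # {"id":1,"title":"Example bookmark"}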

    Credit goes to Kevin Marsh

  • 2020-12-04 10:55

    I'm sure someone wants this so I'll make it easy to find.

    FYI - This is using Node.js (essentially as a batch script) on the same server as the brand new ES instance. Ran it on 2 files with 4000 items each and it only took about 12 seconds on my shared virtual server. YMMV

    var elasticsearch = require('elasticsearch'),
        fs = require('fs'),
        pubs = JSON.parse(fs.readFileSync(__dirname + '/pubs.json')), // name of my first file to parse
        forms = JSON.parse(fs.readFileSync(__dirname + '/forms.json')); // and the second set
    var client = new elasticsearch.Client({  // default is fine for me, change as you see fit
      host: 'localhost:9200',
      log: 'trace'
    });
    
    for (var i = 0; i < pubs.length; i++ ) {
      client.create({
        index: "epubs", // name your index
        type: "pub", // describe the data thats getting created
        id: i, // increment ID every iteration - I already sorted mine but not a requirement
        body: pubs[i] // *** THIS ASSUMES YOUR DATA FILE IS FORMATTED LIKE SO: [{prop: val, prop2: val2}, {prop:...}, {prop:...}] - I converted mine from a CSV so pubs[i] is the current object {prop:..., prop2:...}
      }, function(error, response) {
        if (error) {
          console.error(error);
          return;
        }
        else {
        console.log(response);  //  I don't recommend this but I like having my console flooded with stuff.  It looks cool.  Like I'm compiling a kernel really fast.
        }
      });
    }
    
    for (var a = 0; a < forms.length; a++ ) {  // Same stuff here, just slight changes in type and variables
      client.create({
        index: "epubs",
        type: "form",
        id: a,
        body: forms[a]
      }, function(error, response) {
        if (error) {
          console.error(error);
          return;
        }
        else {
        console.log(response);
        }
      });
    }
    

    Hope I can help more than just myself with this. Not rocket science but may save someone 10 minutes.

    Cheers
