MongoDB 4.x Real Time Sync to ElasticSearch 6.x +

喜夏-厌秋 提交于 2021-02-19 02:56:06

问题


I'm trying to find an easy way to sync data in mongoDB 4.x, to elasticsearch 6.x . My use case is for partial text search that is supported by elasticsearch but no supported by mongodb. MongoDB is the primary database for my applications.

All solutions i found seem outdated and only support older version of mongoDB / elasticsearch. These include mongodb-connector, mongodb river

What is the best tool to use so that any changes (CRUD) to data in mongoDB is automatically synced to elasticsearch?


回答1:


There is tool called Monstache for migrating teh MOngoDB data to ElasticSearch in real time. This tool supports the latest MongoDB Version.

Its sync daemon written in Go that continously indexes your MongoDB collections into Elasticsearch. Monstache gives you the ability to use Elasticsearch to do complex searches and aggregations of your MongoDB data and easily build realtime Kibana visualizations and dashboards.

Its a go daemon that syncs mongodb to elasticsearch in realtime. Its the Monstache. Its available at : Monstache The monstache required mongodb to run in replication mode.

Below the initial setup to configure and use it.

Step 1:

C:\Program Files\MongoDB\Server\4.0\bin>mongod --smallfiles --oplogSize 50 --replSet test

Step 2 :

C:\Program Files\MongoDB\Server\4.0\bin>mongo

C:\Program Files\MongoDB\Server\4.0\bin>mongo
MongoDB shell version v4.0.2
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.2
Server has startup warnings:
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] ** WARNING: This server is bound to localhost.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Remote systems will be unable to connect to this server.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Start the server with --bind_ip <address> to specify which IP
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          addresses it should serve responses from, or with --bind_ip_all to
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          bind to all interfaces. If this behavior is desired, start the
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          server with --bind_ip 127.0.0.1 to disable this warning.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
MongoDB Enterprise test:PRIMARY>

Step 3 : Verify the replication.

MongoDB Enterprise test:PRIMARY> rs.status();
{
        "set" : "test",
        "date" : ISODate("2019-01-18T11:39:00.380Z"),
        "myState" : 1,
        "term" : NumberLong(2),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                }
        },
        "lastStableCheckpointTimestamp" : Timestamp(1547811517, 1),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "localhost:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 736,
                        "optime" : {
                                "ts" : Timestamp(1547811537, 1),
                                "t" : NumberLong(2)
                        },
                        "optimeDate" : ISODate("2019-01-18T11:38:57Z"),
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1547810805, 1),
                        "electionDate" : ISODate("2019-01-18T11:26:45Z"),
                        "configVersion" : 1,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1547811537, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1547811537, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
MongoDB Enterprise test:PRIMARY>

Step 4. Download the "https://github.com/rwynn/monstache/releases". Unzip the download and adjust your PATH variable to include the path to the folder for your platform. GO to cmd and type "monstache -v" # 4.13.1 Monstache uses the TOML format for its configuration. Configure the file for migration named config.toml

Step 5.

My config.toml -->

mongo-url = "mongodb://127.0.0.1:27017/?replicaSet=test"
elasticsearch-urls = ["http://localhost:9200"]

direct-read-namespaces = [ "admin.users" ]

gzip = true
stats = true
index-stats = true

elasticsearch-max-conns = 4
elasticsearch-max-seconds = 5
elasticsearch-max-bytes = 8000000 

dropped-collections = false
dropped-databases = false

resume = true
resume-write-unsafe = true
resume-name = "default"
index-files = false
file-highlighting = false
verbose = true
exit-after-direct-reads = false

index-as-update=true
index-oplog-time=true

Step 6.

D:\15-1-19>monstache -f config.toml




回答2:


if you work with docker you can get this tutorial

https://github.com/ziedtuihri/Monstache_Elasticsearch_Mongodb

Monstache is a sync daemon written in Go that continuously indexes your MongoDB collections into Elasticsearch. Monstache gives you the ability to use Elasticsearch to do complex searches and aggregations of your MongoDB data and easily build realtime Kibana visualizations and dashboards. documentation for Monstache :
https://rwynn.github.io/monstache-site/
github :
https://github.com/rwynn/monstache

docker-compose.yml

version: '2.3'
networks:
  test:
    driver: bridge

services:
  db:
    image: mongo:3.0.2
    expose:
      - "27017"
    container_name: mongodb
    volumes:
      - ./mongodb:/data/db
      - ./mongodb_config:/data/configdb
    ports:
      - "27018:27017"
    command: mongod --smallfiles --replSet rs0
    networks:
      - test

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.7
    container_name: elasticsearch
    volumes:
      - ./elastic:/usr/share/elasticsearch/data
      - ./elastic/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - 9200:9200
    command: elasticsearch -Enetwork.host=_local_,_site_ -Enetwork.publish_host=_local_
    healthcheck:
      test: "wget -q -O - http://localhost:9200/_cat/health"
      interval: 1s
      timeout: 30s
      retries: 300
    ulimits:
      nproc: 65536
      nofile:
        soft: 65536
        hard: 65536
      memlock:
        soft: -1
        hard: -1
    networks:
      - test

  monstache:
    image: rwynn/monstache:rel4
    expose:
      - "8080"
    ports:
      - "8080:8080"
    container_name: monstache
    command: -mongo-url=mongodb://db:27017 -elasticsearch-url=http://elasticsearch:9200 -direct-read-namespace=Product_DB.Product -direct-read-split-max=2
    links:
      - elasticsearch
      - db
    depends_on:
      db:
        condition: service_started
      elasticsearch:
        condition: service_healthy
    networks:
      - test

replicaset.sh

#!/bin/bash

# this configuration is so important 
echo "Starting replica set initialize"
until mongo --host 192.168.144.2 --eval "print(\"waited for connection\")"
do
    sleep 2
done
echo "Connection finished"
echo "Creating replica set"
mongo --host 192.168.144.2 <<EOF
rs.initiate(
  {
    _id : 'rs0',
    members: [
      { _id : 0, host : "db:27017", priority : 1 }
    ]
  }
)
EOF
echo "replica set created"

1) run this commande en terminal $ sysctl -w vm.max_map_count=262144

if you work on a server i don't know if is necessary

2)run en terminal docker-compose build

3) run en terminal $ docker-compose up -d

don't down your container.

$ docker ps

copy the Ipadress of mongo db image

$ docker inspect id_of_mongo_image

copy the IPAddress and set it in replicaset.sh and run replicaset.sh

$ ./replicaset.sh

on terminal you shoulf see => replica set created

$ docker-compose down

4)run en terminal $ docker-compose up

finally .......

Replication in MongoDB

A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data bearing nodes and optionally one arbiter node. Of the data bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.
The primary node receives all write operations. A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern; although in some circumstances, another mongod instance may transiently believe itself to also be primary.
View the replica set configuration. Use rs.conf()

replica set allow you to indexes your MongoDB collections into Elasticsearch en real time synchronization.



来源:https://stackoverflow.com/questions/56661890/mongodb-4-x-real-time-sync-to-elasticsearch-6-x

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!