How to use MarkLogic database for real-time processing of data

*爱你&永不变心* submitted on 2019-12-12 14:23:45

Question


I am trying to evaluate MarkLogic for real-time processing of data. Earlier I used Kafka and Storm for real-time handling of data, inserting it into a database after processing. I am new to MarkLogic, so can anybody tell me whether there is anything available in MarkLogic that I can use to handle data in real time, process it, and then insert it into the MarkLogic database?


Answer 1:


MarkLogic is extremely scalable and has features such as triggers, alerting, and the Content Processing Framework (CPF), on which you can build your logic to decide what to do with incoming content. But a few notes to get you started:

MarkLogic has a shared-nothing architecture, so CPU and HTTP servers on each node are independent; keep that in mind when you consider how to balance incoming messages.

MarkLogic also does not stream to disk.

MarkLogic can connect via a great HTTP client, but I do not believe there is any out-of-the-box capability to append content to an open connection (this is related to why it also has no FTP capability, I believe).


I point these items out so you understand that you are dealing with a different type of system; the approach is simply not the same. In fact, by using pre-commit triggers or an HTTP-based application, combining them with super-fast features like reverse queries, and designing your solution to match how MarkLogic works, handling huge amounts of data for real-time processing can be a great solution. There is one large implementation I worked on in which MarkLogic happily receives and processes large volumes of messages from an upstream WebSphere message broker. Some messages are handled internally and others are passed on to Splunk and other systems.


I answered your question at a high level because it does not really ask for details, and MarkLogic is a large, robust solution that you really need to get an overview of on your own. If you have the time, there is a one-day free training course that covers the fundamentals, which will allow you to better understand the product and assess it for your needs.

BTW: ALL training for MarkLogic is free. Here is the link to the fundamentals course: http://www.marklogic.com/training-courses/marklogic-fundamentals/ This one can also be taken on your own time (self-paced).




Answer 2:


Also, please take a look at the MarkLogic Java Client API, which should be usable from within Storm or Kafka. Perhaps that offers you a way to continue doing the real-time processing you're used to, then insert the data into MarkLogic using the Java API.
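As a rough sketch of what that insert step looks like, here is a minimal Python example targeting MarkLogic's documented `/v1/documents` REST endpoint, which the Java Client API wraps. The host, port, document URI, and message shape are placeholder assumptions for illustration:

```python
# Sketch: writing a processed stream message into MarkLogic via its REST API.
# ML_HOST/ML_PORT are placeholders; 8000 is the default REST port on a fresh install.
import json
from urllib.parse import urlencode

ML_HOST = "localhost"
ML_PORT = 8000

def document_url(uri):
    """Build the /v1/documents URL for a given MarkLogic document URI."""
    return f"http://{ML_HOST}:{ML_PORT}/v1/documents?" + urlencode({"uri": uri})

def to_document(message):
    """Turn a processed stream message (a dict) into a JSON document body."""
    return json.dumps({"id": message["id"], "payload": message["data"]})

# The actual insert would be an authenticated HTTP PUT, e.g. with `requests`:
#   requests.put(document_url("/stream/msg-1.json"),
#                data=to_document(msg),
#                auth=HTTPDigestAuth(user, password),
#                headers={"Content-Type": "application/json"})
```

In the Java Client API the equivalent is a document-manager write, but the document URI and JSON body play the same roles as shown here.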




Answer 3:


There is an open source Kafka sink connector for MarkLogic. Please take a look at https://github.com/sanjuthomas/kafka-connect-marklogic
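For orientation, a sink connector like this is configured with a standard Kafka Connect properties file. The framework keys below (`name`, `connector.class`, `tasks.max`, `topics`) are standard Connect settings; the connector class and the MarkLogic connection properties are placeholders, so check the connector's README for the real names:

```properties
# Kafka Connect standalone sink config (sketch)
name=marklogic-sink
connector.class=<MarkLogic sink connector class from the README>
tasks.max=1
topics=realtime-events
# hypothetical connection settings -- real property names are in the README
ml.host=localhost
ml.port=8000
```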

You may be able to use Kafka as a buffer when you stream high-velocity data to MarkLogic. If MarkLogic's write throughput is acceptable, you can transform/process the data at ingestion time using a custom REST endpoint. I wouldn't consider the previous-generation trigger- and CPF-based transformation a scalable solution; more importantly, debugging CPF pipeline issues is not something you want to do when mature stream-processing frameworks and tools are available in the open source world.
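The buffering idea above can be sketched as a small batcher that enriches each record at ingestion time and flushes in bulk once a batch fills up; this is the kind of work a custom REST endpoint or the sink connector would do. The class and field names are illustrative, not part of any MarkLogic or Kafka API:

```python
# Sketch: buffer high-velocity records, transform each at ingestion time,
# and flush to MarkLogic in batches to respect its write throughput.
class BatchIngester:
    def __init__(self, flush, batch_size=100):
        self.flush = flush          # callable that writes one batch to MarkLogic
        self.batch_size = batch_size
        self.buffer = []

    def transform(self, record):
        # example enrichment applied during ingestion
        return {"source": "kafka", **record}

    def add(self, record):
        self.buffer.append(self.transform(record))
        if len(self.buffer) >= self.batch_size:
            self.flush(self.buffer)
            self.buffer = []
```

In practice `flush` would perform a bulk write (e.g. a multi-document POST to MarkLogic), and you would also flush on a timer so a slow topic does not strand a partial batch.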



Source: https://stackoverflow.com/questions/37720088/how-to-use-marklogic-database-for-real-time-processing-of-data
