neo4j | Yixue Tutorial (易学教程)

Integrating TinkerPop with Hadoop+Spark

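Preface: The previous article covered how to configure TinkerPop with Neo4j and set up HA. One prominent problem remains. Whether you use Neo4j or the built-in TinkerGraph, you inevitably face large data volumes, which means you need a distributed setup. For this case, TinkerPop also provides an integration with Hadoop+Spark that removes the single-node limitation. However, because of data-consistency issues in Spark, this approach can neither modify existing data nor add new data. It is only suitable for queries and computation, and that is admittedly a major drawback. If you know a better solution, feel free to leave a comment below. All steps in this article again use TinkerPop Server 3.4.4 as the example.

Integrating TinkerPop with Hadoop+Spark: The TinkerPop website already documents the integration with Hadoop+Spark, but it has two problems. First, every step is performed in the console, and there are no steps for the server. Second, the SparkGraphComputer examples always use a local master. When YARN is the resource manager, following the official material usually does not work. There are three main reasons: (1) SparkGraphComputer creates its own SparkContext instead of obtaining its configuration through spark-submit. (2) Running Spark on YARN was not supported until TinkerPop 3.2.7/3.3.1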

Neo4j PostingsFormat with name 'BlockTreeOrds' does not exist

Question: I tried to package my project, but when I run the jar file I get an error. Exception in thread "main" java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.CommunityFacadeFactory, D:\f ... Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.storageengine.impl.recordstorage.RecordStorageEngine@5483163c' failed to initialize. Please see attached cause exception. ... Caused by: java.lang.IllegalArgumentException: An SPI class of type org
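
The excerpt is cut off before the SPI type is named, so the cause cannot be confirmed from it. One common suspect when an app works in the IDE but fails as a packaged jar is this: while the jar is built, the META-INF/services files from different dependencies overwrite one another, and Java's ServiceLoader then cannot find the Lucene codec. The sketch below lists the providers that a jar registers for a given service. The service name org.apache.lucene.codecs.PostingsFormat is our assumption, inferred from "PostingsFormat" in the error title, and the helper name is hypothetical.

```python
import io
import zipfile


def service_providers(jar, service_name):
    """List provider classes registered in META-INF/services/<service_name>.

    `jar` may be a path or a binary file-like object; a jar is a zip archive,
    so zipfile reads it directly. Returns [] if the service file is absent.
    """
    entry = "META-INF/services/" + service_name
    with zipfile.ZipFile(jar) as archive:
        if entry not in archive.namelist():
            return []
        text = archive.read(entry).decode("utf-8")
    providers = []
    for line in text.splitlines():
        # ServiceLoader file format: one class name per line, '#' starts a comment.
        name = line.split("#", 1)[0].strip()
        if name:
            providers.append(name)
    return providers


if __name__ == "__main__":
    # Demo with an in-memory "jar"; pass the path of your packaged jar instead.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("META-INF/services/org.apache.lucene.codecs.PostingsFormat",
                   "org.example.OnlyOneProvider\n")
    buf.seek(0)
    found = service_providers(buf, "org.apache.lucene.codecs.PostingsFormat")
    print(found)
    print(any("BlockTreeOrds" in name for name in found))
```

If the real jar's list has no BlockTreeOrds entry, check how the build merges service files, for example with the Maven Shade plugin's ServicesResourceTransformer.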

How to efficiently find multiple relationship size

Question: We have a large graph (over 1 billion edges) with multiple relationship types between nodes. We want to count the nodes that have a single unique relationship to another node (i.e. exactly one relationship of a given type between two nodes that would otherwise not be connected). For that, we are running the following query: MATCH (n)-[:REL_TYPE]-(m) WHERE size((n)-[]-(m))=1 AND id(n)>id(m) RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m) To demonstrate a similar result, the sample code below can run on
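
The sketch below pins down what the query counts. It is plain Python over a hypothetical in-memory edge list of (start_id, type, end_id) tuples, and it follows our reading of the Cypher. The undirected pattern matches each relationship in both orientations, and id(n) > id(m) keeps only one of them. The size((n)-[]-(m)) = 1 test keeps pairs that are joined by exactly one relationship of any type. A node can be counted in both terms of the sum if it is the higher-id end of one pair and the lower-id end of another. Holding a billion edges in memory like this is not practical. The sketch only illustrates the semantics.

```python
from collections import Counter


def count_single_link_nodes(edges, rel_type):
    """Mirror the query's result on an in-memory edge list.

    edges: iterable of (start_id, rel_type, end_id) with comparable ids;
    direction is ignored, as in the undirected Cypher pattern.
    Returns len(distinct higher-id endpoints) + len(distinct lower-id endpoints)
    over node pairs joined by exactly one relationship, which is of rel_type.
    """
    edges = list(edges)
    # Relationships of any type per unordered node pair (self-loops excluded).
    per_pair = Counter(frozenset((a, b)) for a, _, b in edges if a != b)
    high, low = set(), set()
    for a, t, b in edges:
        if t != rel_type or a == b:
            continue
        if per_pair[frozenset((a, b))] == 1:
            high.add(max(a, b))  # plays the role of n (id(n) > id(m))
            low.add(min(a, b))   # plays the role of m
    return len(high) + len(low)
```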

How to group or merge virtual relationship created using apoc.create.vRelationship among nodes in neo4j?

Question: There is a set of artists, some of whom form a temporary group and organize an event in a city. Afterwards, different groups organize events in different cities, or in the same city as some other group. When artist A participates in an event, I want to query the events held in the same city by artist B over a series of dates. I use the Cypher query below, but I get duplicate virtual relationships between Artist A and Event, and also between Event and City. MATCH seriesB = (bArtist:Artist)-[:HAS
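
One plausible source of the duplicates is that a virtual relationship gets created once per result row, so the same (start, type, end) combination produces several virtual relationships. This is an assumption, because the query is truncated. If that is the cause, reducing the rows to distinct triples first (for example with WITH DISTINCT in Cypher) and then creating one virtual relationship per triple avoids the repeats. The plain-Python sketch below shows only that collapse step. It keeps a count that could be stored as a property. The function name and the relationship types in the test are hypothetical.

```python
from collections import Counter


def collapse_relationships(rows):
    """Collapse duplicate (start, type, end) triples into one entry each.

    Returns (start, type, end, count) tuples in first-seen order; the count
    could be attached as a property on the single virtual relationship.
    """
    counts = Counter(rows)  # Counter keeps first-insertion order (Python 3.7+)
    return [(start, rel_type, end, n) for (start, rel_type, end), n in counts.items()]
```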

Neo4J cypher: collect intermediate node properties (path)

Question: I have a data-lineage graph in Neo4j with variable-length paths that contain intermediate nodes (tables): match p=(s)-[r:airflow_loads_to*]->(t) where s.database_name='hive' and s.schema_name='test' and s.name="source_table" return s.name,collect(nodes(p)),t.name Instead of returning the nodes between s.name and t.name as a path, I want to return an array of the name property of every node in the path, in traversal order. I probably have to use collect, but that is not possible on

How to set Neo4j conf in docker?

Question: I used to run Neo4j separately, and my application interacted with it as required. Every time I did a fresh install of Neo4j, I had to open /etc/neo4j/neo4j.conf and comment out this line: dbms.directories.import=/var/lib/neo4j/import by putting a # at its start to make things work for me. By default this line is not commented out. Anyway, I am moving to Docker now, and I want to know how to change that line in a Docker environment. Here's the Neo4j portion of my Docker file. neo4j: container_name:
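
The official Neo4j Docker image reads configuration from environment variables, so neo4j.conf does not have to be edited by hand. The documented naming scheme has three rules: prefix the setting with NEO4J_, write each underscore as a double underscore, and write each period as an underscore. The helper below applies that rule, and its name is hypothetical. This sketch does not settle which value reproduces the effect of commenting the line out. Check that against the documentation for your Neo4j version, whose setting names may also differ.

```python
def neo4j_env_var(setting):
    """Translate a neo4j.conf setting name into the Docker image's env var name.

    Order matters: double the existing underscores first, then turn periods
    into single underscores, so the two cases stay distinguishable.
    """
    return "NEO4J_" + setting.replace("_", "__").replace(".", "_")


if __name__ == "__main__":
    # The printed name goes under `environment:` for the neo4j service.
    print(neo4j_env_var("dbms.directories.import"))
```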

simple lookup takes several minutes despite using an index

Question: I have a decently sized graph (~600 million nodes, 3.5 billion edges) that I imported into Neo4j. The graph is also fairly dense (median edge count around 10), though I'm not sure whether that affects performance. For one type of node (:Authors), of which there are roughly 200 million, I would like to query for a specific name, stored in the property normalizedName. Here is the (very simple) query: MATCH (a:AUTHOR) WHERE a.normalizedName = "jonathan smith" RETURN a As

How to use SQL-like GROUP BY in Cypher query language, in Neo4j?

Question: I want to find the total number of users in a company, as well as the number of men and women among them. My query is: start n=node:company(name:"comp") match n<-[:Members_In]-x, n<-[:Members_In]-y where x.Sex='Male' and y.Sex='Female' return n.name as companyName, count(distinct x) as NumOfMale, count(distinct y) as NumOfFemale" ); The query works, but I know I shouldn't use n<-[:Members_In]-y in the match clause. How can I get the number of male users, the number of female users, and the total number of users? Answer 1: Peter
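
Cypher has no GROUP BY clause. Any non-aggregated expression in RETURN or WITH acts as a grouping key for the aggregates beside it. Matching the members once and counting them conditionally, for example sum(CASE WHEN x.Sex = 'Male' THEN 1 ELSE 0 END), therefore avoids the second match. The plain-Python sketch below mirrors that grouping on hypothetical (company, sex) rows. It illustrates the semantics only and is not a Neo4j API.

```python
from collections import defaultdict


def member_counts(memberships):
    """Group (company, sex) rows by company, like Cypher's implicit grouping.

    Returns {company: {"total": ..., "male": ..., "female": ...}}.
    """
    result = defaultdict(lambda: {"total": 0, "male": 0, "female": 0})
    for company, sex in memberships:
        counts = result[company]   # company is the grouping key
        counts["total"] += 1       # like count(x)
        if sex == "Male":          # like sum(CASE WHEN ... THEN 1 ELSE 0 END)
            counts["male"] += 1
        elif sex == "Female":
            counts["female"] += 1
    return dict(result)
```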

Neo4j java: Traversal from multiple start points

Question: My task in embedded Neo4j 2.0 is to find the paths from multiple nodes to the root of the tree that contains all the nodes. So, assuming I have start nodes A, B, and C, I'd like to find the paths A-->...-->root, B-->...-->root, and C-->...-->root. For this task, I defined a TraversalDescription that works fine when applied to each start node individually. Then I noticed that TraversalDescription's traverse method can take multiple start nodes, not just one. So I put all my start
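
Apart from the Traversal API, the intended result can be stated simply. For each start node, follow parent links up to the root and keep the paths keyed by their start node. The sketch below does that in plain Python over a hypothetical parent map, where each node maps to its parent and the root has no entry. It is not Neo4j's TraversalDescription. It only makes the expected output concrete so that results from a multi-start traversal can be checked against it.

```python
def paths_to_root(parent, starts):
    """Return {start: [start, ..., root]} by following parent links.

    parent: dict mapping each node to its parent; the root has no entry.
    Raises ValueError if the links contain a cycle.
    """
    paths = {}
    for start in starts:
        path, seen, node = [start], {start}, start
        while node in parent:
            node = parent[node]
            if node in seen:
                raise ValueError(f"cycle detected at {node!r}")
            seen.add(node)
            path.append(node)
        paths[start] = path
    return paths
```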