I'm trying to understand the Neo4j object cache by investigating it directly. My first impression of the object cache comes from the slides at this link: http://www.slideshare.net/thobe/an-overview-of-neo4j-internals
Specifically, a Node/Relationship object in the cache should look like slide 9 or 15/42. To verify this, I wrote a simple server script that uses the contents of an existing graph database. My approach is to look at the starting virtual address of the node/relationship object using sun.misc.Unsafe. The code for obtaining the virtual address comes from the following question: How can I get the memory location of a object in java?
    public static long addressOf(Object o) throws Exception {
        Object[] array = new Object[] { o };
        long baseOffset = unsafe.arrayBaseOffset(Object[].class);
        int addressSize = unsafe.addressSize();
        long objectAddress;
        switch (addressSize) {
            case 4:
                objectAddress = unsafe.getInt(array, baseOffset);
                break;
            case 8:
                objectAddress = unsafe.getLong(array, baseOffset);
                break;
            default:
                throw new Error("unsupported address size: " + addressSize);
        }
        return (objectAddress);
    }
In the Neo4j server script (my main class), I get each node by id and print its address as follows:
    void checkAddr(){
        nodeAddr(0);
        nodeAddr(1);
        nodeAddr(2);
    }
    void nodeAddr(int n){
        Node oneNode = graphDb.getNodeById(n);
        Node[] array1 = {oneNode};
        try {
            long address = UnsafeUtil.addressOf(array1);
            System.out.println("Addess: " + address);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
To begin with, I tried the soft cache provider, which is the default. The addresses printed for node objects 0, 1 and 2 are:
Addess: 4168500044 Addess: 4168502383 Addess: 4168502753
Subtracting the first address from the second, and the second from the third, I can tell exactly how much space a node takes. In this case, the first node object takes 2339 B and the second takes 370 B.
Then, to see the impact of disabling the object cache, I configured NoCacheProvider:
setConfig(GraphDatabaseSettings.cache_type,NoCacheProvider.NAME)
The addresses printed are:
Addess: 4168488391 Addess: 4168490708 Addess: 4168491056
The offsets, calculated as in the first case, are: the first node object takes 2317 B and the second takes 348 B.
Here are my questions:
1. Since I'm using the same graph and running read-only queries, why does the size of the same node object change?
2. When I disable the object cache, why do the address offsets look the same as when the object cache is enabled? For example, in the node store file a single node takes 9 bytes, which is not what I see in my experiment. If the way I'm obtaining the node object is the problem, how can I obtain its virtual address correctly? And is there any way to find out exactly where the mmapped node file resides in memory?
3. How can I find out exactly what is stored in a node object? When I look at Node.java at this link: https://github.com/neo4j/neo4j/blob/1.9.8/community/kernel/src/main/java/org/neo4j/graphdb/Node.java it doesn't look like what the presentation slides show; rather, it is just a set of methods exposed by a node object. Furthermore, is a node object brought into memory as a whole, all at once, both with and without the object cache?
The Node object is not what Neo4j stores in the "object cache", so you are not going to gain much insight into Neo4j's caching by looking at those instances. The implementations of Node that Neo4j gives you are instances of a class called NodeProxy, and are as small as they can possibly be (two fields: internal id and reference to the database). These just serve as your handle to the node for performing operations on that node in the database. The objects stored in the "object cache" are instances of a class called NodeImpl (and despite the name they do not implement the Node interface). The NodeImpl objects have the shape that's outlined on the 15th slide (with page number 9 within the slide) in that presentation. Well, they roughly have that shape; Neo4j has evolved since I made those slides.
As Neo4j has evolved, the number of bytes that node records occupy on disk has changed too. Neo4j 2.0 and later have slightly larger node records than what those slides present. If you are interested in the layout of those records, look at the NodeRecord class, then start from the NodeStore class and work "downwards" into its dependencies to find the memory mapping.
Besides looking at the wrong object for seeing the difference between the cache approaches in Neo4j, your approach to measuring is flawed. Comparing the addresses of objects does not tell you anything about the size of those objects. The JVM makes no guarantee that two objects allocated one after the other (in time) will reside adjacently in memory, and even if the JVM did use such an allocation policy, Neo4j might have allocated multiple objects in between the allocations of the two objects you are comparing. Then there is the garbage collector, which might have moved the objects around between you getting the address of one object and you getting the address of the next. Thus looking at the addresses of objects in Java is pretty much never useful for anything. For a better approach to measuring the size of an object in Java, take a look at the Java Object Layout utility, or use the Instrumentation.getObjectSize(...) method from a Java agent.
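A minimal sketch of the Java agent route could look like the following. The class name ObjectSizeAgent and the sizeOf helper are made up for illustration; premain and Instrumentation.getObjectSize are the standard java.lang.instrument entry points. It assumes the class is packaged in a jar whose manifest names it in Premain-Class, and that the JVM is started with -javaagent pointing at that jar.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical helper class; when packaging it as an agent, declaring it
// public in its own file is the safest choice.
final class ObjectSizeAgent {
    private static volatile Instrumentation instrumentation;

    // The JVM calls this before main() when started with -javaagent:<jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Returns the JVM's estimate of the shallow size of the object, in bytes.
    // Objects referenced by its fields are not included.
    static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("agent not loaded; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

With the agent loaded, calling ObjectSizeAgent.sizeOf(node) on the object returned by getNodeById would report the size of the small proxy object, not of whatever the cache holds for that node.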
To answer your questions as stated:
1. The sizes of the node objects are not changing; their addresses are simply not guaranteed to be the same between runs. As described above, you cannot rely on object addresses to compute object size.
2. Since you are looking at NodeProxy objects, they will look the same regardless of what caching strategy Neo4j uses. In order to look at the NodeImpl objects you have to dig quite deep into the internals of Neo4j. Since it looks like you are using Neo4j 1.9, you would cast the GraphDatabaseService instance that you have to GraphDatabaseAPI (an interface that is internal to the implementation), then invoke the getNodeManager() method on that object. From the NodeManager you can call getNodeIfCached( node.getId() ) to get a NodeImpl object. Please note that this API will not be compatible between versions of Neo4j, and using it is one of those "warranty void if seal broken" kind of situations...
3. Look at the source code for NodeImpl instead. As to when and how data is brought into the cache, Neo4j tries to be lazy about that, only loading the data you use. If you are getting the relationships of a node, those will be loaded into the cache, and if you are getting properties, those will be loaded into the cache. If you only get relationships, the properties will never be loaded, and vice versa.
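The split between a small handle and a lazily filled cached object can be illustrated with a toy model. Everything in it is hypothetical (LazyCacheSketch, NodeHandle, CachedNode, the load counters, the fake data); it only mirrors the behaviour described above and is not how NodeProxy, NodeImpl or NodeManager are actually written.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical, none of this is Neo4j code.
final class LazyCacheSketch {

    // Stand-in for the cached object: each part stays null until first used.
    static final class CachedNode {
        Map<String, Object> properties;
        long[] relationships;
    }

    // Stand-in for the handle given to callers: an id plus a database reference.
    static final class NodeHandle {
        final long id;
        final LazyCacheSketch db;

        NodeHandle(long id, LazyCacheSketch db) {
            this.id = id;
            this.db = db;
        }

        Object getProperty(String key) {
            CachedNode node = db.cachedOrNew(id);
            if (node.properties == null) {
                node.properties = db.loadProperties(id); // loaded on first property read
            }
            return node.properties.get(key);
        }

        long[] getRelationships() {
            CachedNode node = db.cachedOrNew(id);
            if (node.relationships == null) {
                node.relationships = db.loadRelationships(id); // loaded on first traversal
            }
            return node.relationships;
        }
    }

    private final Map<Long, CachedNode> cache = new HashMap<>();
    int propertyLoads;      // counts simulated reads from the property store
    int relationshipLoads;  // counts simulated reads from the relationship store

    // Creating a handle touches neither the cache nor the stores.
    NodeHandle getNodeById(long id) { return new NodeHandle(id, this); }

    CachedNode getNodeIfCached(long id) { return cache.get(id); }

    CachedNode cachedOrNew(long id) { return cache.computeIfAbsent(id, k -> new CachedNode()); }

    Map<String, Object> loadProperties(long id) {
        propertyLoads++;
        Map<String, Object> props = new HashMap<>();
        props.put("name", "node-" + id); // fake data for the sketch
        return props;
    }

    long[] loadRelationships(long id) {
        relationshipLoads++;
        return new long[] { id + 1 };    // fake data for the sketch
    }
}
```

In this model, asking a handle for relationships twice loads them once and never touches the properties, and the handle itself stays the same two-field object whatever the cache holds.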
Source: https://stackoverflow.com/questions/25612552/understanding-of-neo4j-object-cache