Creating a metabolic pathway in Neo4j

忘了有多久 2020-12-18 07:42

I am attempting to create the glycolytic pathway shown in the image at the bottom of this question, in Neo4j, using these data:

glycolysis_bioentities.csv

3 answers
  •  情歌与酒
    2020-12-18 08:34

    If permitted, I'd like to post one more follow-on answer -- my reason being that currently there is very little out there on recreating metabolic pathways in Neo4j, and the following will provide a complete summary under this StackOverflow title/subject, "Creating a metabolic pathway in Neo4j".

    Like my Glycolysis pathway, above, I recreated in Neo4j the TCA (citric acid cycle | Krebs cycle) pathway:

    [TCA cycle image source: https://metabolicpathways.stanford.edu/]

    An issue that arose while creating my TCA pathway graph was that one of the nodes (the enzyme, "aconitase") is used twice in the pathway, so during graph creation MERGE collapsed the two occurrences of aconitase into a single node, resulting in this layout,

    ... not this one, as desired,

    My solution to that issue was to create the "TCA graph" using a temporary node property ("tag") that distinguishes the affected source and target nodes while MERGE runs; the tags are removed once the graph has been properly created.
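
    To illustrate why the tag helps, here is a minimal sketch (not part of my scripts). MERGE matches on the whole pattern, including every property in the map, so two MERGE clauses that differ only in "tag" produce two nodes. The :Demo label is hypothetical and is used only so that the example does not touch the real graph.

    ```cypher
    // Same name, different tag: MERGE treats these as two distinct patterns,
    // so two separate nodes are created (or matched, if they already exist).
    MERGE (:Demo {name: 'aconitase', tag: '1'})
    MERGE (:Demo {name: 'aconitase', tag: '2'});

    // List the nodes that now carry the name 'aconitase'.
    MATCH (n:Demo {name: 'aconitase'})
    RETURN n.name AS name, n.tag AS tag
    ORDER BY tag;

    // Clean up the hypothetical :Demo nodes.
    MATCH (n:Demo) DETACH DELETE n;
    ```

    Without the tag property, both MERGE clauses would describe the same pattern and the second would simply match the node created by the first.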

    I also added a :Metabolism label, so that I could select the individual pathways (:Glycolysis | :TCA) or the complete metabolic network (:Metabolism), as desired.
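
    As a rough sketch of how those labels can be used once the graphs are loaded (these queries are illustrations, not part of my scripts), each label selects a different slice of the data:

    ```cypher
    // One pathway only: names of all nodes labelled :Glycolysis.
    MATCH (n:Glycolysis)
    RETURN n.name AS name
    ORDER BY name;

    // One pathway only: every directed step within the TCA cycle.
    MATCH p = (:TCA)-[]->(:TCA)
    RETURN p;

    // The complete network: every directed step between :Metabolism nodes.
    MATCH p = (:Metabolism)-[]->(:Metabolism)
    RETURN p;
    ```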

    Lastly, I needed to connect the two pathways (:Glycolysis | :TCA) through their common node, pyruvate, which I was able to do through an APOC procedure (here, appended to the end of my glycolysis.cql Cypher script).

    Here are my CSV data files, *.cql Cypher scripts, script execution, and the resultant graph.

    glycolysis.csv:

    source,relation,target
    α-D-glucose,substrate_of,hexokinase
    hexokinase,yields,glucose 6-phosphate
    glucose 6-phosphate,substrate_of,glucose-6-phosphatase
    glucose-6-phosphatase,yields,α-D-glucose
    glucose 6-phosphate,substrate_of,phosphoglucose isomerase
    phosphoglucose isomerase,yields,fructose 6-phosphate
    fructose 6-phosphate,substrate_of,phosphofructokinase
    phosphofructokinase,yields,"fructose 1,6-bisphosphate"
    "fructose 1,6-bisphosphate",substrate_of,"fructose-bisphosphate aldolase, class I"
    "fructose-bisphosphate aldolase, class I",yields,D-glyceraldehyde 3-phosphate
    D-glyceraldehyde 3-phosphate,substrate_of,glyceraldehyde-3-phosphate dehydrogenase
    D-glyceraldehyde 3-phosphate,substrate_of,triosephosphate isomerase (TIM)
    triosephosphate isomerase (TIM),yields,dihydroxyacetone phosphate
    glyceraldehyde-3-phosphate dehydrogenase,yields,"1,3-bisphosphoglycerate"
    "1,3-bisphosphoglycerate",substrate_of,phosphoglycerate kinase
    phosphoglycerate kinase,yields,3-phosphoglycerate
    3-phosphoglycerate,substrate_of,phosphoglycerate mutase
    phosphoglycerate mutase,yields,2-phosphoglycerate
    2-phosphoglycerate,substrate_of,enolase
    enolase,yields,phosphoenolpyruvate
    phosphoenolpyruvate,substrate_of,pyruvate kinase
    pyruvate kinase,yields,pyruvate
    

    tca.csv:

    source,relation,target,tag1,tag2
    pyruvate,substrate_of,pyruvate dehydrogenase,,
    pyruvate dehydrogenase,yields,acetyl CoA,,
    acetyl CoA,substrate_of,citrate synthase,,
    oxaloacetate,substrate_of,citrate synthase,,
    citrate synthase,yields,citrate,,
    citrate,substrate_of,aconitase,,1
    aconitase,yields,cis-aconitate,1,
    cis-aconitate,substrate_of,aconitase,,2
    aconitase,yields,isocitrate,2,
    isocitrate,substrate_of,isocitrate dehydrogenase,,
    isocitrate dehydrogenase,yields,α-ketoglutarate,,
    α-ketoglutarate,substrate_of,α-ketoglutarate dehydrogenase,,
    α-ketoglutarate dehydrogenase,yields,succinyl-CoA,,
    succinyl-CoA,substrate_of,succinyl-CoA synthetase,,
    succinyl-CoA synthetase,yields,succinate,,
    succinate,substrate_of,succinate dehydrogenase,,
    succinate dehydrogenase,yields,fumarate,,
    fumarate,substrate_of,fumarase,,
    fumarase,yields,S-malate,,
    S-malate,substrate_of,malate dehydrogenase,,
    malate dehydrogenase,yields,oxaloacetate,,
    

    "tag1" and "tag2" in "tca.csv" are used to uniquely tag those source and target nodes when they are created via the "tca.cql" script:

    tca.cql:

    // CREATE INDICES:
    CREATE INDEX ON :Metabolism(name);
    CREATE INDEX ON :TCA(name);
    
    // CREATE GRAPH:
    // USING PERIODIC COMMIT 5000
    LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/tca.csv" AS row
    MERGE (s:Metabolism:TCA {name: row.source, tag:COALESCE(row.tag1, '')})
    MERGE (t:Metabolism:TCA {name: row.target, tag:COALESCE(row.tag2, '')})
    WITH s, t, row
      CALL apoc.merge.relationship(s, row.relation, {}, {}, t) YIELD rel
      REMOVE s.tag, t.tag
    RETURN COUNT(*);
    

    glycolysis.cql:

    // CREATE INDICES:
    CREATE INDEX ON :Metabolism(name);
    CREATE INDEX ON :Glycolysis(name);
    
    // CREATE GRAPH:
    //USING PERIODIC COMMIT 5000
    LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/glycolysis.csv" AS row
    MERGE (s:Metabolism:Glycolysis {name: row.source})
    MERGE (t:Metabolism:Glycolysis {name: row.target})
    WITH s, t, row
      CALL apoc.merge.relationship(s, row.relation, {}, {}, t) YIELD rel
    RETURN COUNT(*);
    
    // MERGE COMMON NODE (GLYCOLYSIS: PYRUVATE; TCA: PYRUVATE):
    // As presented, run "tca.cql" first, then "glycolysis.cql"
    
    MATCH (g:Glycolysis), (t:TCA) WHERE g.name = t.name
    CALL apoc.refactor.mergeNodes([g,t]) YIELD node
      RETURN node;
    

    Script execution:

    $ cat tca.cql |  cypher-shell -u *** -p ***
      COUNT(*)
      21
    
    $ cat glycolysis.cql |  cypher-shell -u *** -p ***
      COUNT(*)
      22
      node
      (:Metabolism:TCA:Glycolysis {name: "pyruvate"})
    
    $ 
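
    A few optional sanity checks, offered as a sketch rather than as part of the scripts above. They assume both scripts ran as shown and that the node names match the CSV files exactly.

    ```cypher
    // The temporary tags should all be gone after tca.cql has run.
    MATCH (n:TCA)
    WHERE n.tag IS NOT NULL
    RETURN count(n) AS nodes_still_tagged;

    // Aconitase is intended to appear as two separate nodes.
    MATCH (n:TCA {name: 'aconitase'})
    RETURN count(n) AS aconitase_nodes;

    // After the pyruvate merge, a directed route should lead from glycolysis into the TCA cycle.
    MATCH p = shortestPath(
      (a:Metabolism {name: 'α-D-glucose'})-[*]->(b:Metabolism {name: 'citrate'})
    )
    RETURN [n IN nodes(p) | n.name] AS route;
    ```

    If the last query returns no rows, the two pathways are probably still disconnected, for example because the pyruvate nodes were not merged.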
    

    Neo4j graph (:Metabolism view):
