Extract all parents of a given node

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-20 06:28:20

Question


I'm trying to extract all parents of each given GO ID (a node) using the EBI RDF SPARQL endpoint. I based my query on these two similar questions. Here are two examples illustrating the problem:

Example 1 (Link to the structure):

biological_process (GO:0008150)
           |__ metabolic process (GO:0008152)
                           |__ methylation (GO:0032259)

In this example, using the following query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT (count(?mid) as ?depth)
       (group_concat(distinct ?midId ; separator = " / ") AS ?treePath) 
FROM <http://rdf.ebi.ac.uk/dataset/go> 
WHERE {
    obo:GO_0032259 rdfs:subClassOf* ?mid .
    ?mid rdfs:subClassOf* ?class .
    ?mid <http://www.geneontology.org/formats/oboInOwl#id> ?midId.
}
GROUP BY ?treePath
ORDER BY ?depth

I got the desired results without problems:

depth |              treePath
------|-------------------------------------
6     | GO:0008150 / GO:0008152 / GO:0032259

But when the term exists in multiple branches (e.g. GO:0007267), as in the case below, the previous approach doesn't work:

Example 2 (Link to the structure)

biological_process (GO:0008150)
           |__ cellular process (GO:0009987)
           |           |__ cell communication (GO:0007154)
           |                       |__ cell-cell signaling (GO:0007267)
           |
           |__ signaling (GO:0023052)
                      |__ cell-cell signaling (GO:0007267)

The result:

depth |                            treePath
------|---------------------------------------------------------------
15    | GO:0007154 / GO:0007267 / GO:0008150 / GO:0009987 / GO:0023052

What I wanted to get is the following:

GO:0008150 / GO:0009987 / GO:0007154 / GO:0007267
GO:0008150 / GO:0023052 / GO:0007267

My understanding is that, under the hood, I'm computing a depth for each level and using it to order the path. This works fine when the term belongs to only one branch:

SELECT (count(?mid) as ?depth) ?midId
FROM <http://rdf.ebi.ac.uk/dataset/go> 
WHERE {
    obo:GO_0032259 rdfs:subClassOf* ?mid .
    ?mid rdfs:subClassOf* ?class .
    ?mid <http://www.geneontology.org/formats/oboInOwl#id> ?midId.
}
GROUP BY ?midId
ORDER BY ?depth

The result:

depth |   midId
------|------------
1     | GO:0008150
2     | GO:0008152
3     | GO:0032259

In the second example, things get messed up and I don't understand why. I'm fairly sure that part of the problem is that several terms share the same depth/level, but I don't know how to solve this.

depth |   midId
------|------------
2     | GO:0008150
2     | GO:0009987
2     | GO:0023052
3     | GO:0007154
6     | GO:0007267

Answer 1:


Thanks to @AKSW I found a decent solution using HyperGraphQL (a GraphQL interface for querying and serving linked data on the Web).

I'll leave the detailed answer here in case it helps someone.

  1. I downloaded and set up HyperGraphQL (from its download page).
  2. I linked it to the EBI SPARQL endpoint as described in this tutorial.

    The config.json file I used:

    {
        "name": "ebi-hgql",
        "schema": "ebischema.graphql",
        "server": {
            "port": 8081,
            "graphql": "/graphql",
            "graphiql": "/graphiql"
        },
        "services": [
            {
                "id": "ebi-sparql",
                "type": "SPARQLEndpointService",
                "url": "http://www.ebi.ac.uk/rdf/services/sparql",
                "graph": "http://rdf.ebi.ac.uk/dataset/go",
                "user": "",
                "password": ""
            }
        ]
    }
    

    Here's what my ebischema.graphql file looks like (I needed only Class, id, label and subClassOf):

    type __Context {
        Class:          _@href(iri: "http://www.w3.org/2002/07/owl#Class")
        id:             _@href(iri: "http://www.geneontology.org/formats/oboInOwl#id")
        label:          _@href(iri: "http://www.w3.org/2000/01/rdf-schema#label")
        subClassOf:     _@href(iri: "http://www.w3.org/2000/01/rdf-schema#subClassOf")
    }
    
    type Class @service(id:"ebi-sparql") {
        id: [String] @service(id:"ebi-sparql")
        label: [String] @service(id:"ebi-sparql")
        subClassOf: [Class] @service(id:"ebi-sparql")
    }
    
  3. I started testing some simple queries but kept getting an empty response; the answer to this issue solved my problem.

  4. Finally I constructed the query to get the tree

    Using this query:

    {
      Class_GET_BY_ID(uris:[
        "http://purl.obolibrary.org/obo/GO_0032259",
        "http://purl.obolibrary.org/obo/GO_0007267"]) {
        id
        label
        subClassOf {
          id
          label
          subClassOf {
            id
            label
          }
        }
      }
    }
    

    I got some interesting results:

    {
      "extensions": {},
      "data": {
        "@context": {
          "_type": "@type",
          "_id": "@id",
          "id": "http://www.geneontology.org/formats/oboInOwl#id",
          "label": "http://www.w3.org/2000/01/rdf-schema#label",
          "Class_GET_BY_ID": "http://hypergraphql.org/query/Class_GET_BY_ID",
          "subClassOf": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
        },
        "Class_GET_BY_ID": [
          {
            "id": [
              "GO:0032259"
            ],
            "label": [
              "methylation"
            ],
            "subClassOf": [
              {
                "id": [
                  "GO:0008152"
                ],
                "label": [
                  "metabolic process"
                ],
                "subClassOf": [
                  {
                    "id": [
                      "GO:0008150"
                    ],
                    "label": [
                      "biological_process"
                    ]
                  }
                ]
              }
            ]
          },
          {
            "id": [
              "GO:0007267"
            ],
            "label": [
              "cell-cell signaling"
            ],
            "subClassOf": [
              {
                "id": [
                  "GO:0007154"
                ],
                "label": [
                  "cell communication"
                ],
                "subClassOf": [
                  {
                    "id": [
                      "GO:0009987"
                    ],
                    "label": [
                      "cellular process"
                    ]
                  }
                ]
              },
              {
                "id": [
                  "GO:0023052"
                ],
                "label": [
                  "signaling"
                ],
                "subClassOf": [
                  {
                    "id": [
                      "GO:0008150"
                    ],
                    "label": [
                      "biological_process"
                    ]
                  }
                ]
              }
            ]
          }
        ]
      },
      "errors": []
    }
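
The nested response can be flattened into the "A / B / C" path strings asked for in the question. The following is only a minimal Python sketch, not part of the original answer: `iter_paths` is a hypothetical helper name, and `response_item` is a hand-trimmed copy of the GO:0007267 entry from the result above (labels dropped), not a live call to the endpoint.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_paths(node: Dict[str, Any],
               suffix: Optional[List[str]] = None) -> Iterator[List[str]]:
    """Yield every ancestor chain reachable from `node`, topmost term first."""
    # Every field comes back as a list, so "id" is a one-element list here.
    path = [node["id"][0]] + (suffix or [])
    parents = node.get("subClassOf", [])
    if not parents:
        # No deeper level was requested or returned: the chain ends here.
        yield path
        return
    for parent in parents:
        # Each parent starts its own branch, so branches never get mixed.
        yield from iter_paths(parent, path)


# Hand-trimmed copy of the GO:0007267 entry from the response above.
response_item = {
    "id": ["GO:0007267"],
    "subClassOf": [
        {"id": ["GO:0007154"], "subClassOf": [{"id": ["GO:0009987"]}]},
        {"id": ["GO:0023052"], "subClassOf": [{"id": ["GO:0008150"]}]},
    ],
}

paths = [" / ".join(p) for p in iter_paths(response_item)]
for p in paths:
    print(p)
# GO:0009987 / GO:0007154 / GO:0007267
# GO:0008150 / GO:0023052 / GO:0007267
```

The first path stops at GO:0009987 rather than GO:0008150 only because the query above requests two levels of subClassOf; the sketch reports whatever depth the response actually contains.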
    

EDIT

This was exactly what I wanted, but I noticed that I can't add another sublevel like this:

{
  Class_GET_BY_ID(uris:[
    "http://purl.obolibrary.org/obo/GO_0032259",
    "http://purl.obolibrary.org/obo/GO_0007267"]) {
    id
    label
    subClassOf {
      id
      label
      subClassOf {
        id
        label
        subClassOf {  # <--- 4th sublevel
          id
          label
        }
      }
    }
  }
}
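
Since the nesting has to be written out by hand for each extra level, the query text can be generated instead. This is a rough Python sketch under assumptions: `build_query` is a hypothetical helper, it only produces the query string in the same shape as the one above, and it does nothing about how the HyperGraphQL instance or the SPARQL endpoint responds to deeper queries.

```python
from typing import List


def build_query(uris: List[str], levels: int) -> str:
    """Return a Class_GET_BY_ID query with `levels` nested subClassOf blocks."""

    def selection(remaining: int, indent: int) -> List[str]:
        # Every level selects id and label, then nests one level deeper.
        pad = " " * indent
        lines = [pad + "id", pad + "label"]
        if remaining > 0:
            lines.append(pad + "subClassOf {")
            lines.extend(selection(remaining - 1, indent + 2))
            lines.append(pad + "}")
        return lines

    uri_list = ",\n    ".join('"%s"' % u for u in uris)
    body = "\n".join(selection(levels, 4))
    return "{\n  Class_GET_BY_ID(uris:[\n    %s]) {\n%s\n  }\n}" % (uri_list, body)


query = build_query(
    [
        "http://purl.obolibrary.org/obo/GO_0032259",
        "http://purl.obolibrary.org/obo/GO_0007267",
    ],
    levels=3,
)
print(query)
```

With `levels=3` this prints the same structure as the query above (without the comment); `levels=2` gives the earlier query that worked.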

I created a new question: Endpoint returned Content-Type: text/html which is not recognized for SELECT queries



Source: https://stackoverflow.com/questions/54785145/extract-all-parents-of-a-given-node
