Working with SPARQL lists in dotNETRDF - intersection of lists

Posted by 房东的猫 on 2021-02-08 05:41:27

Question


I'm using dotNetRDF and am having a hard time understanding how to use the provided list helpers.

Currently I'm not using a list, just one item like so:

 paramString.SetParameter("nickname", g.CreateLiteralNode(nicknameString));
 paramString.CommandText =
            @"INSERT DATA 
            { 
                data:person_1 app:nickname @nickname.                   
            }";

But now I need to account for multiple nicknames:

 //doesn't work with array, and there's no "CreateListNode()" 
 //paramString.SetParameter("nicknames", g.CreateLiteralNode(nicknamesArray)); 
 paramString.CommandText =
            @"INSERT DATA 
            { 
                data:person_1 app:nicknames @nicknames.                   
            }";

Later I need to query to check if 2 lists intersect:

queryString.CommandText =
            @"SELECT ?personWithSameNickname WHERE { 

                data:person_1 app:nicknames ?nicknames.

                #here I need to get people that have at 
                #least one nickname in common with data:person_1, 
                #aka at least one intersection in their nickname lists
                ?personWithSameNickname app:nicknames ?nicknames. 
            }";         

I also need the results ordered by the number of intersections so the best match is on top.

How can I accomplish the above? I only found this reference to lists but I can't quite make sense of it since I'm using SPARQL.


Answer 1:


A note on Data Modelling

Firstly, are you sure that when you talk about lists you actually mean RDF lists? The distinction is important because it changes the shape of the data and how you accomplish things.

An RDF list is an ordered sequence of blank nodes that connect values together e.g.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/> .

:root :value [ rdf:first "a" ;
               rdf:rest [ rdf:first "b" ;
                          rdf:rest [ rdf:first "c" ;
                                     rdf:rest rdf:nil ] ] ] .

As you can see, this has a lot of overhead in terms of triples. We can of course simplify the syntax in Turtle like so:

@prefix : <http://example.org/> .

:root :value ( "a" "b" "c" ) .

This is equivalent to the first example; it just hides the explicit triples that a Turtle parser will create when it encounters this syntax. The list extensions included in dotNetRDF are specifically intended for working with RDF lists.

Alternatively, perhaps what you mean by a list is just a set of values associated with a property, e.g.

@prefix : <http://example.org/> .

:root :value "a" ;
      :value "b" ;
      :value "c" .

As you can see, this is literally just several triples, each of which states a value for the property. In Turtle we can simplify this further using the , syntax to avoid repeating the predicate:

@prefix : <http://example.org/> .

:root :value "a" , "b" , "c" .

The downside of this approach is that, since RDF graphs are unordered sets of triples, neither the order of the values nor duplicate values can be preserved. If you need either order or duplicates, you will need to use the RDF list approach.
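
To make the loss of order and duplicates concrete, here is a minimal sketch that models a graph as a plain C# set of (subject, predicate, object) tuples. This is only an illustration of set semantics, not how dotNetRDF stores triples, and the sample values are made up.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: a "graph" as a set of triples.
var graph = new HashSet<(string S, string P, string O)>();

// Try to assert four values, one of them twice.
foreach (var value in new[] { "c", "a", "a", "b" })
{
    graph.Add((":root", ":value", value));
}

// The duplicate "a" collapses into a single triple, and the set
// makes no promise about the order in which values come back.
Console.WriteLine(graph.Count); // prints 3
```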

The rest of my answer will show how to do things using either data modelling approach.

Inserting a list of values

How you do this depends on whether you want an RDF list or just a number of values. You are quite right that dotNetRDF does not have any built-in support for handling such things when building parameterised SPARQL.

If you want an RDF list, you need to write your template so that it can take the necessary number of items, e.g.

paramString.CommandText =
        @"INSERT DATA 
        { 
            data:person_1 app:nicknames [ rdf:first @nick1 ;
                                          rdf:rest [ rdf:first @nick2 ;
                                                      rdf:rest rdf:nil ] ] .                   
        }";
// SetParameter expects an INode, so use SetLiteral to bind plain strings
paramString.SetLiteral("nick1", "Rob");
paramString.SetLiteral("nick2", "Bob");

You can obviously extend this pattern to deal with shorter or longer lists as necessary. Clearly this requires a lot of work on the part of the user, so if this is what you need we can certainly look at adding a feature to do this in future releases.
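
Until such a feature exists, one way to extend the pattern is to generate the template text for however many items you have. The sketch below uses a hypothetical helper, BuildRdfListInsert (not part of dotNetRDF), and assumes placeholders named @nick0, @nick1 and so on; it only builds the string, leaving the binding of values to you.

```csharp
using System;

// Hypothetical helper, not a dotNetRDF API: builds an INSERT DATA template
// containing a nested rdf:first/rdf:rest structure with `count` placeholders.
static string BuildRdfListInsert(int count)
{
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));

    // Start from the end of the list (rdf:nil) and wrap it one item at a time.
    string list = "rdf:nil";
    for (int i = count - 1; i >= 0; i--)
    {
        list = "[ rdf:first @nick" + i + " ; rdf:rest " + list + " ]";
    }
    return "INSERT DATA { data:person_1 app:nicknames " + list + " . }";
}

Console.WriteLine(BuildRdfListInsert(2));
```

You would then assign the result to paramString.CommandText and bind each placeholder in a loop, for example with paramString.SetLiteral("nick" + i, nicknames[i]).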

If you are just inserting several values, you can either use a single-triple template, setting each value in turn and executing the update each time, e.g.

paramString.CommandText =
        @"INSERT DATA 
        { 
            data:person_1 app:nicknames @nickname.                   
        }";
foreach (string nick in nicknames)
{
   // Bind the current nickname as a plain literal
   paramString.SetLiteral("nickname", nick);
   // Execute the update
}

Or you can change your template to have a triple for each nickname. Here I use the , syntax again to avoid repeating the subject and predicate:

paramString.CommandText =
        @"INSERT DATA 
        { 
            data:person_1 app:nicknames @nick1 , @nick2 .                   
        }";
// SetParameter expects an INode, so use SetLiteral to bind plain strings
paramString.SetLiteral("nick1", "Rob");
paramString.SetLiteral("nick2", "Bob");

As with the RDF list approach, you can extend this pattern to more or fewer items as necessary. Again, if this is something you would prefer dotNetRDF to do for you, we can look at adding it in future releases.
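
For a variable number of nicknames, the comma-separated template can be generated rather than written by hand. This is a minimal sketch using a hypothetical helper, BuildMultiValueInsert (not part of dotNetRDF), with assumed placeholder names @nick0, @nick1 and so on; the nicknames are sample data.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not a dotNetRDF API: builds an INSERT DATA template
// with one placeholder per value, separated by the , syntax.
static string BuildMultiValueInsert(int count)
{
    if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));

    string placeholders = string.Join(
        " , ", Enumerable.Range(0, count).Select(i => "@nick" + i));
    return "INSERT DATA { data:person_1 app:nicknames " + placeholders + " . }";
}

string[] sampleNicknames = { "Rob", "Bob", "Robby" };
Console.WriteLine(BuildMultiValueInsert(sampleNicknames.Length));
```

As before, the generated text goes into paramString.CommandText and each value is bound by index, for example paramString.SetLiteral("nick" + i, sampleNicknames[i]). Compared with executing one update per nickname, this sends a single update for the whole set.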

Checking if two lists intersect

For the RDF lists approach:

queryString.CommandText =
        @"SELECT ?personWithSameNickname WHERE { 
            data:person_1 app:nicknames [ rdf:rest*/rdf:first ?nicknames ].
            ?personWithSameNickname app:nicknames [ rdf:rest*/rdf:first ?nicknames ] .
            FILTER(!SAMETERM(data:person_1, ?personWithSameNickname))
        }"; 

Essentially you select all the nicknames for your starting node, do the same for all persons, and rely on SPARQL join semantics to find the intersection: because both patterns share the ?nicknames variable, only people with a value in common produce a result.

Note the use of the rdf:rest*/rdf:first property path to traverse to all the value nodes of the RDF list and extract the actual nicknames. Also, since the starting node will intersect with itself, we use !SAMETERM(data:person_1, ?personWithSameNickname) in a FILTER to eliminate the match on itself; you could instead do this in code if you prefer to avoid the FILTER.

If you are just using the multiple triple approach the query is even simpler:

queryString.CommandText =
        @"SELECT ?personWithSameNickname WHERE { 
            data:person_1 app:nicknames ?nicknames .
            ?personWithSameNickname app:nicknames ?nicknames .
            FILTER(!SAMETERM(data:person_1, ?personWithSameNickname))
        }"; 

Again, you simply select all the nicknames for your starting node, do the same for all persons, and rely on SPARQL join semantics to find the intersection.

If you want to rank people by the number of intersections, you can do this using GROUP BY and ORDER BY, which can be added to either variation of the query. I will use the second variation because the base query is simpler:

queryString.CommandText =
        @"SELECT ?personWithSameNickname (COUNT(?nicknames) AS ?matches) WHERE { 
            data:person_1 app:nicknames ?nicknames .
            ?personWithSameNickname app:nicknames ?nicknames .
            FILTER(!SAMETERM(data:person_1, ?personWithSameNickname))
        }
        GROUP BY ?personWithSameNickname
        ORDER BY DESC(?matches)";

First we add an aggregate to the SELECT; specifically, we want to count the number of nicknames. We then need a GROUP BY on the ?personWithSameNickname variable because we want a group for each person who has intersecting nicknames. The aggregate is then calculated for each group, so we can use ORDER BY to rank the matches in descending order.
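
If it helps to see what the query computes, the same join, group, count and sort can be sketched in memory with LINQ. This is only an analogy for the SPARQL semantics, not something dotNetRDF runs: the RankByShared function is hypothetical, and the (person, nickname) pairs are made-up sample data standing in for app:nicknames triples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample data standing in for app:nicknames triples.
var sample = new List<(string Person, string Nickname)>
{
    ("person_1", "Rob"), ("person_1", "Bob"), ("person_1", "Robby"),
    ("person_2", "Rob"), ("person_2", "Bob"),
    ("person_3", "Bob"), ("person_3", "Bert"),
    ("person_4", "Al"),
};

var ranked = RankByShared(sample, "person_1");
foreach (var entry in ranked)
{
    Console.WriteLine($"{entry.Person}: {entry.Matches}");
}

// Hypothetical helper mirroring the query: join on nickname, drop the
// starting person, group per person, count, and sort descending.
static List<(string Person, int Matches)> RankByShared(
    IEnumerable<(string Person, string Nickname)> rows, string start)
{
    var data = rows.ToList();
    var mine = data.Where(t => t.Person == start)
                   .Select(t => t.Nickname)
                   .ToHashSet();

    return data
        .Where(t => t.Person != start && mine.Contains(t.Nickname)) // join + FILTER
        .GroupBy(t => t.Person)                                     // GROUP BY
        .Select(grp => (Person: grp.Key, Matches: grp.Count()))     // COUNT
        .OrderByDescending(r => r.Matches)                          // ORDER BY DESC
        .ToList();
}
```

With this sample data person_2 shares two nicknames and person_3 shares one, so person_2 ranks first, and person_4 does not appear at all because it shares none.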



Source: https://stackoverflow.com/questions/25351911/working-with-sparql-lists-in-dotnetrdf-intersection-of-lists
