Querying WikiPathways

Introduction

WikiPathways (https://www.wikipathways.org) is a biological pathway database known for its collaborative nature and open science approach. The road toward a sustainable, community-driven pathway database goes through integration with other resources, such as Wikidata, and through enabling broader use, curation, and redistribution of WikiPathways content. The SPARQL endpoint provides access to the WikiPathways RDF and allows its content to be integrated with other databases. The RDF contains all pathways, their datanodes (genes, proteins, metabolites, etc.), author information, molecular descriptors, and more.

The WikiPathways SPARQL endpoint is accessible at https://sparql.wikipathways.org

Figure of simplified RDF schema

Exercises

Exercise 1 - Listing of subjects

The simplest SPARQL query for exploring RDF retrieves the full list of subjects of a particular type. The type is usually defined with the predicate rdf:type or its shorthand a; the two can be used interchangeably. The example below lists all pathways.

SELECT ?pathway 
WHERE {
?pathway a wp:Pathway .
}
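For comparison, here is the same listing written with the full rdf:type predicate instead of the a shorthand. This is a minimal sketch: the PREFIX lines are added here so the query does not depend on prefixes predefined by the endpoint, and the wp: namespace is assumed to be http://vocabularies.wikipathways.org/wp#.

```sparql
# Assumption: wp: resolves to the WikiPathways vocabulary namespace.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wp:  <http://vocabularies.wikipathways.org/wp#>

SELECT ?pathway
WHERE {
  # "rdf:type" is the long form of the keyword "a"
  ?pathway rdf:type wp:Pathway .
}
```

Both forms should return the same list of pathways.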

By looking at the RDF schema figure, you should be able to adapt the SPARQL query to answer all questions that follow.

Because the WikiPathways RDF stores many properties for each subject (such as pathways), we can also request those properties directly in the SPARQL query. For example, to extract the pathway title, we add ?pathway dc:title ?pathwaytitle to the WHERE clause and add ?pathwaytitle to the SELECT list. The table returned by the query will get wider, so you might need to scroll to the right to see all columns.
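Put together, the pathway listing extended with titles could look like the sketch below. The PREFIX declarations are added for completeness and assume the standard Dublin Core elements namespace for dc: and the WikiPathways vocabulary namespace for wp:.

```sparql
# Assumption: these namespaces match the prefixes used in this tutorial.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?pathway ?pathwaytitle
WHERE {
  ?pathway a wp:Pathway .
  # Extra triple pattern that binds the title of each pathway
  ?pathway dc:title ?pathwaytitle .
}
```

Other properties can be added in the same way: one extra triple pattern in the WHERE clause and one extra variable in the SELECT list.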

Exercise 2 - Counting of subjects

This exercise is about creating simple SPARQL queries that count particular types of subjects in the RDF. See the example SPARQL query below that counts the number of pathways in the RDF.

SELECT (count (?pathway) as ?npathway) 
WHERE {
?pathway a wp:Pathway .
}

When you copy and execute this SPARQL query, you will find the number of pathways in WikiPathways (3094 when this tutorial was written; the count changes as the database is updated).
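If a query matches the same subject more than once (for example, after adding further triple patterns), a plain count includes the duplicates. A hedged variant of the counting query uses DISTINCT so that each pathway is counted only once. The PREFIX line is an assumption added so the query is self-contained.

```sparql
# Assumption: wp: resolves to the WikiPathways vocabulary namespace.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>

SELECT (COUNT(DISTINCT ?pathway) AS ?npathway)
WHERE {
  ?pathway a wp:Pathway .
}
```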

Exercise 3 - More detailed exploration

In this exercise we explore the RDF a little more extensively. By combining statements in a SPARQL query, we can link multiple subjects and filter for the content we want the service to return. For example, the next query returns the title of the pathway with ID WP4868:

SELECT ?pathwaytitle WHERE{
    ?pathway a wp:Pathway .
    ?pathway dc:title ?pathwaytitle .
    ?pathway dcterms:identifier "WP4868" .
}

Challenge: construct a query that provides the count of DataNodes for each individual human pathway

Exercise 4 - Federated SPARQL query

This final exercise adds an extra level of difficulty by linking the WikiPathways RDF with another database through SPARQL (this is called a federated SPARQL query). In this exercise we will explore the connection between WikiPathways and AOP-Wiki. Before starting, you might want to complete the AOP-Wiki SPARQL endpoint tutorial.

The SPARQL query needs to contain a SERVICE clause, and the final query will have the following structure:

PREFIX aopo: <http://aopkb.org/aop_ontology#>
SELECT [variables]
WHERE {
[query WikiPathways]
SERVICE <https://aopwiki.rdf.bigcat-bioinformatics.org/sparql> {
[query AOP-Wiki]
}}

End

Thank you for your participation. For any feedback or questions about this section, please contact Marvin Martens (marvin.martens@maastrichtuniversity.nl).