WikiPathways (https://www.wikipathways.org) is a biological pathway database known for its collaborative nature and open science approach. The road toward a sustainable, community-driven pathway database runs through integration with other resources such as Wikidata and through enabling wider use, curation and redistribution of WikiPathways content. The SPARQL endpoint provides access to the WikiPathways RDF and makes it possible to integrate its content with other databases. The RDF contains all pathways, their datanodes (genes, proteins, metabolites, etc.), author information, molecular descriptors, and more.
The WikiPathways SPARQL endpoint is accessible at https://sparql.wikipathways.org.
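Besides the web interface, the endpoint can also be queried from a script. The following is a minimal sketch, assuming the service follows the standard SPARQL 1.1 Protocol (a `query` URL parameter, with JSON results requested through the `Accept` header) and that the query service is exposed at the `/sparql` path, which the tutorial does not state. The helper names `build_request`, `rows_from_json` and `run_query` are illustrative and not part of any WikiPathways tooling.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the query service sits at /sparql under the base URL given above.
ENDPOINT = "https://sparql.wikipathways.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that carries the query as a URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_json(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    bindings = payload["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the result rows (requires network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=30) as response:
        return rows_from_json(json.load(response))
```

Calling `run_query` with any of the queries in this tutorial returns one dictionary per result row. When a query is sent from a script, it is safest to declare the prefixes explicitly, for example `PREFIX wp: <http://vocabularies.wikipathways.org/wp#>`, rather than rely on prefixes predefined by the web interface.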
The simplest SPARQL queries for exploring RDF retrieve full lists of subjects of a particular type. The type is usually stated with the predicate rdf:type or its shorthand a; the two can be used interchangeably. The example below lists all pathways.
SELECT ?pathway
WHERE {
?pathway a wp:Pathway .
}
By looking at the RDF schema figure, you should be able to adapt the SPARQL query to answer all questions that follow.
What do you replace wp:Pathway with to extract all datanodes?
What do you replace wp:Pathway with to extract all metabolites?
Because the WikiPathways RDF contains many properties for each subject (such as a pathway), we can also request those properties directly in the SPARQL query. For example, to extract the pathway title, we add ?pathway dc:title ?pathwaytitle to the WHERE clause and add ?pathwaytitle to the SELECT list. The returned table gets wider when the query is run, so you might need to scroll to the right to see all columns.
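The pattern described above (one extra triple pattern in the WHERE clause plus one extra variable in the SELECT list per property) can be sketched as a small query builder. The function name `pathway_query` and the `PREFIXES` table are illustrative. The `dc` and `dcterms` IRIs are the standard Dublin Core namespaces, and the `wp` IRI is the WikiPathways vocabulary namespace.

```python
PREFIXES = {
    "wp": "http://vocabularies.wikipathways.org/wp#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def pathway_query(properties):
    """Return a SPARQL query listing pathways plus one column per property.

    `properties` maps a variable name (without '?') to a predicate,
    for example {"pathwaytitle": "dc:title"}.
    """
    header = [f"PREFIX {prefix}: <{iri}>" for prefix, iri in PREFIXES.items()]
    # Every requested property adds one variable to the SELECT list ...
    variables = ["?pathway"] + [f"?{name}" for name in properties]
    # ... and one triple pattern to the WHERE clause.
    patterns = ["  ?pathway a wp:Pathway ."]
    patterns += [f"  ?pathway {pred} ?{name} ." for name, pred in properties.items()]
    return "\n".join(
        header + ["SELECT " + " ".join(variables), "WHERE {"] + patterns + ["}"]
    )


print(pathway_query({"pathwaytitle": "dc:title"}))
```

The printed query can be pasted into the endpoint's web interface. Passing more entries in the dictionary adds more columns in the same way.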
Extract the pathway description by adding a new variable to the SELECT list and requesting that variable by adding ?pathway dc:description ?[new variable name] to the query. This should return a table with the added column.
Extract the pathway identifier by adding a new variable to the SELECT list and requesting that variable by adding ?pathway dcterms:identifier ?[new variable name] to the query. This should return a table with the added column.
Extract the organism name by adding a new variable to the SELECT list and requesting that variable by adding ?pathway wp:organismName ?[new variable name] to the query. This should return a table with the added column.
This exercise is about creating simple SPARQL queries that count particular types of subjects in the RDF. The example SPARQL query below counts the number of pathways in the RDF.
SELECT (count (?pathway) as ?npathway)
WHERE {
?pathway a wp:Pathway .
}
When you copy and run this SPARQL query, you will find that WikiPathways contains 3094 pathways.
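When a count query is run from a script rather than the web interface, the result arrives as a single row whose value is a string. A minimal sketch of reading it, assuming the endpoint returns the W3C SPARQL JSON results format; the `SAMPLE` payload is hand-written for illustration (it reuses the number quoted above and is not a live response), and `single_count` is a hypothetical helper name.

```python
import json

# Hand-written sample in the SPARQL JSON results format; not a live response.
SAMPLE = json.loads("""
{
  "head": {"vars": ["npathway"]},
  "results": {"bindings": [
    {"npathway": {"type": "literal",
                  "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                  "value": "3094"}}
  ]}
}
""")


def single_count(payload, variable):
    """Return the integer held in the only row of a COUNT result."""
    bindings = payload["results"]["bindings"]
    if len(bindings) != 1:
        raise ValueError(f"expected exactly one row, got {len(bindings)}")
    # Values are always transported as strings, so convert explicitly.
    return int(bindings[0][variable]["value"])


print(single_count(SAMPLE, "npathway"))  # prints 3094
```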
With this exercise, the RDF will be explored a little more extensively. By combining statements in the SPARQL query, we can link multiple subjects and filter for the content that we want to get back from the service. For example, the next query returns the title of the pathway with ID WP4868:
SELECT ?pathwaytitle WHERE{
?pathway a wp:Pathway .
?pathway dc:title ?pathwaytitle .
?pathway dcterms:identifier "WP4868" .
}
Challenge: construct a query that returns the number of DataNodes for each individual human pathway.
This final exercise adds an extra level of difficulty by linking the WikiPathways RDF with another database through SPARQL (this is called a federated SPARQL query). In this exercise we will explore the connection between WikiPathways and AOP-Wiki. Before starting, you might want to complete the AOP-Wiki SPARQL endpoint tutorial.
The SPARQL query needs to contain a SERVICE clause, and the final query will have the following structure:
PREFIX aopo: <http://aopkb.org/aop_ontology#>
SELECT [variables]
WHERE {
[query WikiPathways]
SERVICE <https://aopwiki.rdf.bigcat-bioinformatics.org/sparql> {
[query AOP-Wiki]
}}
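The structure above can also be assembled programmatically, which helps when the local and the remote part are developed and tested separately. This is a minimal sketch: the function `federated_query` and its parameters are hypothetical, and the graph patterns passed in are placeholders that still have to be written as part of the exercise.

```python
import textwrap

# Remote endpoint taken from the query structure shown above.
AOPWIKI_ENDPOINT = "https://aopwiki.rdf.bigcat-bioinformatics.org/sparql"


def federated_query(variables, local_patterns, remote_patterns,
                    prefixes=None, service=AOPWIKI_ENDPOINT):
    """Wrap two graph patterns into one federated SELECT query.

    local_patterns  -- patterns evaluated by the endpoint that receives the query
    remote_patterns -- patterns forwarded to `service` through a SERVICE clause
    """
    lines = [f"PREFIX {prefix}: <{iri}>" for prefix, iri in (prefixes or {}).items()]
    lines.append("SELECT " + " ".join(f"?{name}" for name in variables))
    lines.append("WHERE {")
    lines.append(textwrap.indent(local_patterns.strip(), "  "))
    lines.append(f"  SERVICE <{service}> {{")
    lines.append(textwrap.indent(remote_patterns.strip(), "    "))
    lines.append("  }")
    lines.append("}")
    return "\n".join(lines)
```

The two parts are joined on any variable that appears in both patterns, so the local and remote patterns must share at least one variable name (for example, a variable holding a chemical identifier) for the federation to link the two databases.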
[...] WP5083 in WikiPathways? In WikiPathways, you can extract the ChEBI ID using the predicate wp:bdbChEBI for wp:Metabolite subjects.
Thank you for your participation. For any feedback or questions about this section, please contact Marvin Martens (marvin.martens@maastrichtuniversity.nl).