The Open Protein Structure Annotation Network

Part 1: Using Semantic Notation on TOPSAN

    This is the first in a series of blog posts introducing the concepts behind the TOPSAN Protein Syntax and the TOPSAN semantic notation system. This first article covers the basics of working in the notation environment, with some simple examples of how to use the notation system. Next we will describe how to obtain and use the semantic information that has been embedded in TOPSAN, compose queries, and analyze the available information. Finally we will describe the more advanced concepts behind the controlled ontology of predicates that the TOPSAN Protein Syntax defines.

    To read more about the technologies involved, you can find additional information at:

    What is the semantic web:

    Biohackathon semantic web series:

    How the data is being stored:

    How ontologies can control the language used to describe the relationships between different pieces of data:
    Technologies that can be used to query and examine the data:

    Introduction to TOPSAN Protein Syntax

    One of the goals of the TOPSAN protein annotation system is to make sure that human annotations of protein structures are available to the public. This includes ensuring that annotations are available in a machine-readable format. When an annotator adds a link or a value to a page, it is important that the intent behind it is expressed. For example, a reader (or a program) needs to know whether a link was added because it points to a homologue or because it points to another protein in the same pathway. The concepts and standards behind the semantic web provide a framework for expressing this information.
    The TOPSAN Protein Syntax (TPS) is designed to cover the set of predicates used to describe the relationships between proteins and the databases and values they can be linked to. These predicates follow a formalized ontology that begins from three different roots. The branches represent the basic concepts used to describe proteins, and include ‘links’ and ‘values’. ‘Link’ statements describe connections from a protein to an element of another database, while ‘value’ statements assign values and data directly to a protein.

    All calls embedded in the text of TOPSAN documents begin with the double brackets ‘{{’ and end with ‘}}’. Inside the brackets you make a call to ‘note.link’. There are two ways to call the function: with sequential arguments or with named arguments. For sequential arguments, wrap the arguments in ‘(’ and ‘)’ and type in the values. This form is usually used only for the two-argument call, which passes the predicate and the object value. If you want to set additional arguments, the named argument format is preferred. In this format the arguments are wrapped in ‘{’ and ‘}’, and the name of each argument is followed by a ‘:’ and then the value. With named arguments you do not have to remember a specific argument order.
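
    To make the two call forms concrete, here is a minimal Python sketch that builds the call text as a string. The helper names `format_sequential` and `format_named` are hypothetical and are not part of TOPSAN; the sketch also assumes that boolean arguments are written bare (as in ‘rev:true’) and that all other values are single-quoted.

```python
def format_sequential(rel, value):
    """Build the two-argument sequential form: {{ note.link( 'rel', 'value' ) }}."""
    return "{{ note.link( '%s', '%s' ) }}" % (rel, value)


def format_named(**args):
    """Build the named-argument form: {{ note.link{ name:'value', ... } }}.

    Assumption: booleans are written bare (rev:true); other values are quoted.
    """
    parts = []
    for name, value in args.items():
        if isinstance(value, bool):
            parts.append("%s:%s" % (name, "true" if value else "false"))
        else:
            parts.append("%s:'%s'" % (name, value))
    return "{{ note.link{ %s } }}" % ", ".join(parts)


print(format_sequential("memberOf", "PFAM:PF07980"))
print(format_named(rel="similar", value="UNIPROT:Q8A1G2", rev=True))
```

    Because the named form identifies each argument by name, the keyword arguments can be passed in any order; the sketch simply writes them out in the order given.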

    note.link Arguments:

    • rel : The type of relationship (the predicate)
    • value : The database entry to link to. If it is not a recognizable link, it is assigned as a literal value
    • visible : If false, the call does not produce text that is visible on the page
    • about : The subject of the statement. Defaults to the current page
    • rev : If true, the relationship is reversed, so that the destination is the subject and the current page is the object

    Relationships can go in both directions. By default the subject is the current page and the passed value is the object. To reverse this relationship, so that the relationship statement is about the external database pointing to the current page, set ‘rev:true’.
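
    The effect of ‘about’ and ‘rev’ can be sketched as a mapping from the call arguments to a subject/predicate/object statement. This is only an illustration of the rules described above, not TOPSAN's actual implementation: the `Triple` type, the `link_to_triple` function, and the `current_page` parameter are hypothetical, and the sketch assumes that when both ‘about’ and ‘rev’ are given, the ‘about’ page takes the place of the current page before the swap.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def link_to_triple(rel, value, current_page, about=None, rev=False):
    """Turn note.link-style arguments into a (subject, predicate, object) statement.

    `about` defaults to the current page; `rev` swaps subject and object.
    """
    page = about if about is not None else current_page
    if rev:
        # Reversed: the linked entry is the subject, the page is the object.
        return Triple(subject=value, predicate=rel, obj=page)
    return Triple(subject=page, predicate=rel, obj=value)


# Hypothetical current page identifier, used only for illustration.
print(link_to_triple("similar", "UNIPROT:Q8A1G2", current_page="TOPSAN:2aam"))
print(link_to_triple("similar", "UNIPROT:Q8A1G2", current_page="TOPSAN:2aam", rev=True))
```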


    Embed a link to PFAM:

    {{ note.link( 'memberOf', 'PFAM:PF07980' ) }}

    Cite a PubMed reference:

    {{ note.link( 'citation', 'PMID:19191477' ) }}

    Reverse a relationship:

    {{ note.link{ rel:'similar', value:'UNIPROT:Q8A1G2', rev:true } }}

    Define a relationship about something other than the current page:

    {{ note.link{ about:'TOPSAN:2aam', rel:'similar', value:'UNIPROT:Q8A1G2' } }}

    In the editor this would look like:
    When displayed in the page it would be:


    We will describe the set of TOPSAN Protein Syntax predicates in greater detail later. For now, there are only a handful of predicates that you need to know in order to get started:

    • Represents a connection between two proteins that are homologous or structurally similar
    • Connects a protein to an assigned function type
    • Connects a single element to a group of which it is a part
    • A connection to a literature citation


    Available Databases

    When describing a link to another database, you can use prefix codes that will be recognized by TOPSAN and translated accordingly. We currently have a set of 10 linked databases, but this will grow as needed. To use a prefix code, give the database code followed by a colon and the identifier from that database, e.g. “PFAM:PF07980”, “UNIPROT:Q8A1G2”, or “PDB:1AAC”.

    Prefix     Database
    GO         The Gene Ontology database
    PFAM       The Pfam protein family database
    UNIPROT    The UniProt protein database
    EC         Enzyme Commission (EC) numbers
    TOPSAN     The TOPSAN protein annotation system
    TAXON      The NCBI taxonomic codes
    PDB        The Protein Data Bank
    PMID       PubMed
    SCOP       SCOP domain IDs, e.g. d1wy7a1
    SUNID      SCOP unique integer IDs (sunid), e.g. 51349 -> Alpha and beta proteins (a/b)
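
    As a rough illustration of how a prefixed value could be recognized, the following Python sketch splits a value at the first colon and checks the prefix against the table above. The function name `split_reference` is hypothetical, and the matching rules (exact, upper-case prefixes; anything unrecognized treated as a literal value) are assumptions based on the description of the ‘value’ argument, not TOPSAN's actual parser.

```python
# Prefix codes taken from the table above.
KNOWN_PREFIXES = {
    "GO", "PFAM", "UNIPROT", "EC", "TOPSAN",
    "TAXON", "PDB", "PMID", "SCOP", "SUNID",
}


def split_reference(value):
    """Split 'PREFIX:identifier' into (prefix, identifier).

    Returns (None, value) when the value does not start with a known
    prefix, in which case it is treated as a literal value.
    """
    prefix, sep, identifier = value.partition(":")
    if sep and identifier and prefix in KNOWN_PREFIXES:
        return prefix, identifier
    return None, value


print(split_reference("PFAM:PF07980"))
print(split_reference("a free-text value"))
```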




    All content on this site is licensed under a Creative Commons Attribution 3.0 License