
2.1.9 Software tools, databases and standards (markup languages, model databases)

Module description: Visualization tools for biological network analysis

· Medusa

Medusa is based on the Fruchterman-Reingold algorithm and provides 2D representations of medium-sized networks of up to a few hundred nodes and edges; it is less suited to visualizing large datasets. Medusa uses undirected, multi-edge connections, which allow more than one connection between two bioentities to be shown at the same time. Selected nodes can be fixed in place to make patterns easier to recognize, while spring-embedded layout algorithms relax the rest of the network. Medusa supports weighted graphs and represents the significance or importance of a connection through line thickness.
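The force-directed idea behind the Fruchterman-Reingold layout can be illustrated with a short sketch: every pair of nodes repels, connected nodes attract like springs, and a cooling "temperature" limits how far a node may move per iteration. This is a generic, unweighted textbook version written for illustration, not Medusa's own code; the function name, the `fixed` and `initial` parameters (used here to mimic pinning nodes in place) and the linear cooling schedule are all assumptions.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0,
                         iterations=100, fixed=(), initial=None, seed=0):
    """Return {node: (x, y)} computed with the Fruchterman-Reingold force model.

    Nodes listed in `fixed` keep their starting position (taken from `initial`
    when given), mimicking the option of pinning nodes in place.
    """
    nodes = list(nodes)
    initial = initial or {}
    rng = random.Random(seed)
    pos = {n: list(initial[n]) if n in initial
           else [rng.uniform(0, width), rng.uniform(0, height)]
           for n in nodes}
    k = math.sqrt(width * height / max(len(nodes), 1))  # ideal edge length
    t0 = width / 10  # initial "temperature": maximum move per iteration

    for it in range(iterations):
        temperature = t0 * (1 - it / iterations)  # linear cooling
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels with force k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Connected nodes attract with force d^2 / k (the "springs")
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f

        # Move free nodes along their net force, capped by the temperature
        for n in nodes:
            if n in fixed:
                continue
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / length * step))

    return {n: tuple(p) for n, p in pos.items()}
```

For example, `fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")], fixed={"a"}, initial={"a": (0.5, 0.5)})` lays out a small path while keeping node "a" pinned at the centre. The pairwise repulsion loop is quadratic in the number of nodes, which is one reason plain implementations of this kind suit networks of a few hundred nodes better than very large ones.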

· Cytoscape

Cytoscape mainly provides 2D representations and is suitable for large-scale network analysis with hundreds of thousands of nodes and edges. It supports directed, undirected and weighted graphs and comes with powerful visual styles that let the user change the properties of nodes and edges. The tool provides a variety of layout algorithms, including circular and spring-embedded layouts. Furthermore, expression data can be mapped to node color, label, border thickness or border color.
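Mapping expression data to node color is essentially a continuous function from a numeric attribute to a color. The sketch below shows the idea for a blue-white-red scale; it is not Cytoscape's API (Cytoscape configures such mappings through its visual styles), and the function name, the -2/+2 thresholds and the gene values are hypothetical.

```python
def expression_to_color(value, vmin=-2.0, vmax=2.0):
    """Map an expression log-ratio to a hex colour.

    vmin (negative) -> blue, 0 -> white, vmax (positive) -> red; values outside
    [vmin, vmax] are clamped. The thresholds are illustrative defaults.
    """
    value = max(vmin, min(vmax, value))
    white, blue, red = (255, 255, 255), (0, 0, 255), (255, 0, 0)
    if value < 0:
        target, t = blue, value / vmin   # fraction of the way from 0 to vmin
    else:
        target, t = red, value / vmax    # fraction of the way from 0 to vmax
    rgb = [round(w + (c - w) * t) for w, c in zip(white, target)]
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# Hypothetical expression values (log ratios) for three genes
expression = {"geneA": -2.0, "geneB": 0.0, "geneC": 3.5}
node_colors = {gene: expression_to_color(v) for gene, v in expression.items()}
```

Here `node_colors` becomes `{"geneA": "#0000FF", "geneB": "#FFFFFF", "geneC": "#FF0000"}`; geneC is clamped to the red end of the scale. The same pattern applies to the other mappable properties, for instance scaling border thickness instead of interpolating a color.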

· BioLayout Express3D

BioLayout Express3D is a tool for the layout, visualization and clustering of large-scale networks in both 3D and 2D. It supports unweighted and weighted graphs, together with edge annotation of pairwise relationships. It mainly employs the Fruchterman-Reingold layout algorithm to position and display the network in 2D and 3D. A variety of colour schemes make the network more informative and make clusters easier to see. Because BioLayout Express3D relies on a graphics renderer, the size of the networks it can process is limited.

· Osprey

Osprey provides 2D representations of directed, undirected and weighted networks. It is not efficient for large-scale network analysis, but it offers various layout options and ways to arrange nodes in different geometric distributions. The layouts range from the relax algorithm and a simple circular layout to a more advanced Dual Spoked Ring layout that displays up to 1,500 to 2,000 nodes in an easily manageable format. The user can change the size and colour of most Osprey objects, such as edges, nodes, labels and arrowheads.
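A simple circular layout, the most basic of the geometric arrangements listed for Osprey, just spaces the nodes evenly around a circle. The following is a generic sketch, not Osprey's implementation; the function name and its `radius` and `center` parameters are assumptions for illustration.

```python
import math


def circular_layout(nodes, radius=1.0, center=(0.0, 0.0)):
    """Place nodes evenly on a circle, in the order given, starting at angle 0."""
    nodes = list(nodes)
    n = len(nodes)
    cx, cy = center
    return {
        node: (cx + radius * math.cos(2 * math.pi * i / n),
               cy + radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }
```

Because the node order alone determines the positions, the usefulness of a circular layout depends heavily on how the nodes are sorted first (for example by cluster or by annotation), which is the idea that more elaborate ring-based layouts build on.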

· ProViz

ProViz supports both 2D and pseudo-3D display of data. It can handle single graphs in large-scale datasets with millions of nodes or connections, and it leverages the Tulip drawing package to generate appealing 3D visualizations. ProViz predominantly relies on the GEM force-directed graph layout algorithm, which helps identify key nodes in a network of interactions. In addition, the tool offers circular and hierarchical layouts, which make metabolic pathways or gene regulation networks easier to detect in large datasets. ProViz is well suited to gaining a first overview of a network because it allows fast navigation through graphs.
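The first step of a typical hierarchical layout is to assign each node to a layer, so that edges point "downwards" from regulators to their targets. The sketch below assigns layers by longest path from the source nodes using a topological sort; it is only that layering step (full Sugiyama-style layouts also reorder nodes within layers to reduce edge crossings), and it is not ProViz code. The function name and the example gene names are hypothetical.

```python
from collections import defaultdict, deque


def layered_layout(edges):
    """Assign each node of a DAG to a layer (longest path from a source) and an x slot."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's topological sort; a node's layer is one below its deepest predecessor
    layer = {n: 0 for n in nodes}
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            layer[v] = max(layer[v], layer[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("graph has a cycle; break cycles before layering")

    # Spread the nodes of each layer horizontally: (x, y) with y = depth
    slots = defaultdict(int)
    pos = {}
    for n in sorted(nodes, key=lambda n: (layer[n], str(n))):
        pos[n] = (slots[layer[n]], layer[n])
        slots[layer[n]] += 1
    return pos
```

For a small regulatory cascade, `layered_layout([("TF", "geneA"), ("TF", "geneB"), ("geneA", "protein")])` places the transcription factor on layer 0, both genes on layer 1 and the protein on layer 2. Real regulatory networks often contain feedback loops, which is why the sketch refuses cyclic input instead of silently producing a misleading hierarchy.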

· Ondex

Ondex provides 2D representations of directed, undirected and weighted networks. It can handle large-scale networks of hundreds of thousands of nodes and edges. It also supports bidirectional connections, which are drawn as curves. Moreover, different types of data are separated by placing them on different circles (disks) that are interconnected with each other.

· PATIKA

PATIKA provides 2D representations of single or directed graphs, with no limitation on graph size. It offers a very intuitive and widely accepted representation of cellular processes as directed graphs in which nodes correspond to molecules and edges correspond to interactions between them. Although the variety of implemented layout algorithms is rather limited, PATIKA supports bipartite graphs of states and transitions. It distinguishes four types of edges: product edges, whose source is a transition and whose target is a product of that transition; activator edges, whose source is the activating state and whose target is the transition activated by that state; inhibitor edges, whose source is the inhibiting state and whose target is the transition inhibited by that state; and substrate edges, whose source is a substrate and whose target is the transition of which it is a substrate.
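The four edge types define a bipartite state/transition graph in which every edge must run in a specific direction between a state and a transition. The sketch below encodes those direction rules and rejects edges that violate them. It is an illustration of the model described above, not PATIKA's data model or API; the class name, the `EDGE_RULES` table and the example node names are assumptions.

```python
from dataclasses import dataclass, field

# Required (source kind, target kind) for each edge type
EDGE_RULES = {
    "product": ("transition", "state"),     # transition -> its product
    "activator": ("state", "transition"),   # activating state -> transition
    "inhibitor": ("state", "transition"),   # inhibiting state -> transition
    "substrate": ("state", "transition"),   # substrate -> transition using it
}


@dataclass
class StateTransitionGraph:
    kinds: dict = field(default_factory=dict)  # node name -> "state" | "transition"
    edges: list = field(default_factory=list)  # (source, target, edge type)

    def add_node(self, name, kind):
        if kind not in ("state", "transition"):
            raise ValueError(f"unknown node kind: {kind}")
        self.kinds[name] = kind

    def add_edge(self, source, target, edge_type):
        expected = EDGE_RULES[edge_type]
        actual = (self.kinds[source], self.kinds[target])
        if actual != expected:
            raise ValueError(f"{edge_type} edge must go {expected[0]} -> {expected[1]}, "
                             f"got {actual[0]} -> {actual[1]}")
        self.edges.append((source, target, edge_type))


# Illustrative example: hexokinase-catalysed phosphorylation of glucose
g = StateTransitionGraph()
for name, kind in [("glucose", "state"), ("G6P", "state"),
                   ("hexokinase", "state"), ("phosphorylation", "transition")]:
    g.add_node(name, kind)
g.add_edge("glucose", "phosphorylation", "substrate")
g.add_edge("phosphorylation", "G6P", "product")
g.add_edge("hexokinase", "phosphorylation", "activator")
```

Because every rule pairs a state with a transition, the graph stays bipartite by construction: an attempt such as `g.add_edge("glucose", "phosphorylation", "product")` raises a `ValueError`.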

· PIVOT

PIVOT projects everything in 2D and uses single undirected lines to show relationships between bioentities. It is not limited in the size of data it can present. Overall, the variety of incorporated layout algorithms is limited, but PIVOT employs specific layout algorithms for visualizing families.

· Pajek

Pajek offers 2D and pseudo-3D representations and supports single, directed and weighted graphs. It is suitable for large-scale networks with thousands or even millions of vertices and edges. It comes with a great variety of layout options, such as circular layouts based on partitions, on permutations or on random coordinates. Energy-based (force-directed) layout algorithms, such as Kamada-Kawai and Fruchterman-Reingold with free or fixed points, are also included. Pajek can also separate data into layers, which allows hierarchical relationships to be displayed. Its ability to visualize multi-relational networks, networks between two disjoint sets of vertices and temporal networks makes it one of the few tools that can handle dynamic graphs and reveal how networks change over time.
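Pajek reads networks from its plain-text `.net` format: a `*Vertices n` line followed by numbered, quoted vertex labels, then an `*Arcs` section for directed links and/or an `*Edges` section for undirected ones, each line giving source, target and an optional weight. The writer below is a minimal sketch of that basic layout only (no coordinates, partitions or time information); the function name and the example weights are hypothetical.

```python
def to_pajek(vertices, arcs=(), edges=()):
    """Serialise a graph in basic Pajek .net format.

    vertices: unique labels; Pajek numbers vertices from 1 in this order.
    arcs: directed (source, target, weight) triples.
    edges: undirected (source, target, weight) triples.
    """
    index = {label: i for i, label in enumerate(vertices, start=1)}
    lines = [f"*Vertices {len(vertices)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    if arcs:
        lines.append("*Arcs")
        lines += [f"{index[u]} {index[v]} {w}" for u, v, w in arcs]
    if edges:
        lines.append("*Edges")
        lines += [f"{index[u]} {index[v]} {w}" for u, v, w in edges]
    return "\n".join(lines) + "\n"


# Small directed example; the weights are illustrative only
net = to_pajek(["TP53", "MDM2", "CDKN1A"],
               arcs=[("TP53", "MDM2", 1), ("TP53", "CDKN1A", 1), ("MDM2", "TP53", 2)])
```

The resulting text can be saved with a `.net` extension; keeping separate `*Arcs` and `*Edges` sections is how the format distinguishes directed from undirected links in the same network.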



Standard network file formats

BioPAX, SBML and PSI-MI are the three languages most applicable to biological data. Of these, BioPAX has the richest hierarchy, owing to its advanced tagging vocabulary, and takes the most general approach. It spans a broad range of biological data, including genetic interactions, interaction networks and small molecules, as well as regulatory and metabolic pathways. PSI-MI is ideal for handling experimental data such as molecular interactions and interaction networks. SBML, on the other hand, is better suited to describing relationships and is mostly used in simulations; it is the language of choice for rate formulas and biochemical reactions.

· BioPAX

This "pathway language" is a collaborative effort to create a computer-readable data exchange format for biological data. It was developed to let pathway databases distribute, share and exchange information in a standard format, using a specific controlled vocabulary for tagging. BioPAX is based on an ontology of concepts with attributes, which makes more explicit use of the relations between concepts than other standards do. It is most suitable for describing protein-protein interactions, genetic interactions, and gene regulatory, metabolic and signaling pathways. BioPAX is being developed in a series of levels, each incorporating new features. The current version focuses on metabolic networks and molecular interaction networks, while the next level aims to add gene and DNA interactions, signal transduction and genetic interactions. BioPAX is the most expressive of these languages and is based on a rich hierarchy, which, as a trade-off, can lead to a high degree of computational complexity. Being a comparatively new language, BioPAX is not yet supported by most of the tools presented here.
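BioPAX is serialized as OWL in RDF/XML, so pathway entities are XML elements in the BioPAX namespace that refer to each other through `rdf:resource` links. The sketch below reads a tiny, hand-written, heavily simplified Level 3-style fragment (cofactors omitted) with the Python standard library and resolves each reaction's `left` and `right` participants to their display names. It assumes local `rdf:ID` identifiers only; real exports may use `rdf:about` with full URIs and are better handled by a dedicated RDF or BioPAX library. The function name and the sample content are illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
BP = "{http://www.biopax.org/release/biopax-level3.owl#}"

# Hand-written, heavily simplified BioPAX Level 3 fragment (cofactors omitted)
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:SmallMolecule rdf:ID="glucose">
    <bp:displayName>glucose</bp:displayName>
  </bp:SmallMolecule>
  <bp:SmallMolecule rdf:ID="g6p">
    <bp:displayName>glucose 6-phosphate</bp:displayName>
  </bp:SmallMolecule>
  <bp:BiochemicalReaction rdf:ID="hexokinase_reaction">
    <bp:left rdf:resource="#glucose"/>
    <bp:right rdf:resource="#g6p"/>
  </bp:BiochemicalReaction>
</rdf:RDF>"""


def biochemical_reactions(biopax_xml):
    """Return (reaction id, left names, right names) for each BiochemicalReaction."""
    root = ET.fromstring(biopax_xml)
    # Map local rdf:ID -> displayName so rdf:resource references can be resolved
    names = {}
    for element in root:
        name = element.find(BP + "displayName")
        if element.get(RDF + "ID") and name is not None:
            names[element.get(RDF + "ID")] = name.text

    def resolve(ref):
        key = ref.get(RDF + "resource", "").lstrip("#")
        return names.get(key, key)

    return [
        (rxn.get(RDF + "ID"),
         [resolve(r) for r in rxn.findall(BP + "left")],
         [resolve(r) for r in rxn.findall(BP + "right")])
        for rxn in root.iter(BP + "BiochemicalReaction")
    ]
```

Calling `biochemical_reactions(SAMPLE)` yields one reaction converting glucose into glucose 6-phosphate. Even this small example shows the trade-off mentioned above: the information is spread across linked resources that must be resolved before it can be used.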

· SBML

SBML stands for Systems Biology Markup Language, a machine-readable format for describing qualitative and quantitative models of biochemical networks. The current version of SBML focuses on models for the analysis and simulation of basic biochemical networks. The next release will additionally incorporate model composition, the description of molecular complexes, layout information and the spatial characteristics of models. Many libraries and tools are available for parsing and editing SBML documents, and several converters translate SBML into BioPAX and vice versa. Having started out as a language for describing biochemical reactions, SBML is now widely accepted and supported by over 100 software systems worldwide, including modeling and simulation systems, drawing and visualization tools, and databases such as KEGG and BioCyc.
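An SBML model is an XML document whose `model` element lists compartments, species and reactions, with each reaction naming its reactants and products through `speciesReference` elements. The sketch below summarizes a minimal, hand-written SBML Level 2 Version 4 model using only the standard library. For real work a dedicated parser such as libSBML is the usual choice; the function name and the one-reaction model here are illustrative assumptions, and kinetic laws are deliberately left out.

```python
import xml.etree.ElementTree as ET

NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

# Minimal hand-written SBML Level 2 Version 4 model (no kinetic law)
MODEL = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="hexokinase_step">
    <listOfCompartments>
      <compartment id="cytosol" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glc" name="glucose" compartment="cytosol" initialConcentration="5"/>
      <species id="g6p" name="glucose 6-phosphate" compartment="cytosol" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hk" reversible="false">
        <listOfReactants><speciesReference species="glc"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarize(sbml_text):
    """Return ({species id: initial concentration}, ["id: reactants -> products"])."""
    model = ET.fromstring(sbml_text).find("sbml:model", NS)
    species = {
        s.get("id"): float(s.get("initialConcentration", "0"))
        for s in model.iterfind("sbml:listOfSpecies/sbml:species", NS)
    }
    reactions = []
    for r in model.iterfind("sbml:listOfReactions/sbml:reaction", NS):
        lhs = [ref.get("species")
               for ref in r.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        rhs = [ref.get("species")
               for ref in r.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions.append(f"{r.get('id')}: {' + '.join(lhs)} -> {' + '.join(rhs)}")
    return species, reactions
```

`summarize(MODEL)` returns the initial concentrations `{"glc": 5.0, "g6p": 0.0}` and the reaction list `["hk: glc -> g6p"]`, which is the kind of information a simulator needs before it attaches rate formulas to each reaction.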

· PSI-MI

PSI-MI stands for Proteomics Standards Initiative Molecular Interaction and is a machine-readable format intended for the exchange, comparison and verification of proteomics data. Many tools are available for viewing and converting PSI-MI data. Its main focus is the definition of molecular interactions, such as protein-protein interactions, rather than the description of complete cellular models.
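Besides its XML form, PSI-MI data is often distributed in the tab-delimited PSI-MI TAB (MITAB) format, where each line is one interaction and the first two columns hold the identifiers of the two interactors as `database:accession`. The sketch below turns such lines into an edge list. It assumes that column layout, that a header line (if present) starts with `#`, and that empty columns are written as `-`; the function name and the constructed record are illustrative.

```python
def mitab_edges(lines):
    """Extract (interactor A, interactor B) accession pairs from PSI-MI TAB lines.

    Assumes columns 1 and 2 hold the unique identifiers of interactors A and B
    as "database:accession"; other columns are ignored.
    """
    edges = []
    for line in lines:
        if not line.strip() or line.startswith("#"):  # skip blanks and header
            continue
        columns = line.rstrip("\n").split("\t")
        id_a, id_b = columns[0], columns[1]
        # Keep only the accession part after the database prefix
        edges.append((id_a.split(":", 1)[-1], id_b.split(":", 1)[-1]))
    return edges


# Constructed 15-column record: TP53 (P04637) with MDM2 (Q00987); "-" = empty column
record = "\t".join(["uniprotkb:P04637", "uniprotkb:Q00987"] + ["-"] * 13)
edges = mitab_edges(["#ID(s) interactor A\tID(s) interactor B", record])
```

Here `edges` is `[("P04637", "Q00987")]`, a pair that any of the visualization tools above could load as a single undirected connection.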

· CML

CML stands for Chemical Markup Language, a language developed mainly to describe chemical concepts and information about molecules, reactions, spectra and analytical data, computational chemistry, chemical crystallography and materials.
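In CML a molecule is described by an `atomArray` of `atom` elements, each carrying an `elementType`, and a `bondArray` of `bond` elements linking pairs of atoms. As a small illustration, the sketch below derives a molecular formula in Hill order (carbon first, then hydrogen, then the other elements alphabetically; purely alphabetical when there is no carbon) from such a description. The ethanol document is hand-written with explicit hydrogens and only a few bonds, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML = "{http://www.xml-cml.org/schema}"

# Ethanol with explicit hydrogens; only a few bonds shown for brevity
ETHANOL = """<molecule xmlns="http://www.xml-cml.org/schema" id="ethanol">
  <atomArray>
    <atom id="a1" elementType="C"/><atom id="a2" elementType="C"/>
    <atom id="a3" elementType="O"/>
    <atom id="a4" elementType="H"/><atom id="a5" elementType="H"/>
    <atom id="a6" elementType="H"/><atom id="a7" elementType="H"/>
    <atom id="a8" elementType="H"/><atom id="a9" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
  </bondArray>
</molecule>"""


def hill_formula(cml_text):
    """Molecular formula in Hill order, counted from the CML atom elements."""
    root = ET.fromstring(cml_text)
    counts = Counter(atom.get("elementType") for atom in root.iter(CML + "atom"))
    # Hill system: C then H first only when carbon is present
    order = [e for e in ("C", "H") if "C" in counts and e in counts]
    order += sorted(e for e in counts if e not in order)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)
```

`hill_formula(ETHANOL)` gives `"C2H6O"`; a water molecule described the same way would give `"H2O"`, because without carbon all elements, hydrogen included, are simply sorted alphabetically.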

· CellML

Cell Markup Language (CellML) is an XML-based, machine-readable language developed mainly for exchanging computer-based mathematical models. CellML was originally developed for biological applications but later proved applicable to other disciplines as well. It reuses existing languages, including MathML for the mathematics and RDF for the metadata. CellML can hold information about data models, mathematical formulas and equations, as well as metadata.
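Because CellML writes its equations in MathML content markup, an expression is a tree of `apply` elements whose first child names the operator and whose remaining children are operands, with `ci` for identifiers and `cn` for numbers. The sketch below evaluates a small subset of that markup, here the right-hand side of a Michaelis-Menten rate law. It is not a CellML processor: real CellML models also attach units to numbers and state equations with `eq`, and the evaluator's name, supported operators and parameter values are illustrative assumptions.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

MATHML = "{http://www.w3.org/1998/Math/MathML}"
OPERATORS = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
    "divide": operator.truediv,
    "power": operator.pow,
}

# Right-hand side of a Michaelis-Menten rate law, v = Vmax * S / (Km + S)
RATE_LAW = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><divide/>
    <apply><times/><ci>Vmax</ci><ci>S</ci></apply>
    <apply><plus/><ci>Km</ci><ci>S</ci></apply>
  </apply>
</math>"""


def evaluate(node, env):
    """Recursively evaluate a small subset of MathML content markup."""
    tag = node.tag.replace(MATHML, "")
    if tag == "math":
        return evaluate(node[0], env)
    if tag == "cn":
        return float(node.text)
    if tag == "ci":
        return env[node.text.strip()]
    if tag == "apply":
        operator_element, *operands = list(node)
        name = operator_element.tag.replace(MATHML, "")
        values = [evaluate(child, env) for child in operands]
        if name == "minus" and len(values) == 1:
            return -values[0]  # unary minus
        return reduce(OPERATORS[name], values)
    raise ValueError(f"unsupported MathML element: {tag}")


rate = evaluate(ET.fromstring(RATE_LAW), {"Vmax": 10.0, "S": 2.0, "Km": 2.0})
```

With these example values `rate` is `10 * 2 / (2 + 2) = 5.0`. Reusing MathML means a tool only needs one expression evaluator of this kind for every model, rather than a custom syntax per modeling language.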

· RDF

The Resource Description Framework (RDF) is a language for representing information about resources on the World Wide Web. As the Web moves towards Semantic Web structures, RDF was designed as a machine-readable language, commonly written in XML, that describes networks of resources. RDF tags employ a controlled RDF vocabulary. The idea behind RDF is to identify resources by Uniform Resource Identifiers (URIs) and to describe them in terms of simple properties and property values.
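Each RDF statement is a triple of subject (a resource URI), predicate (a property URI) and object (another URI or a literal value), and a set of triples forms a network. The sketch below writes a few triples in the line-based N-Triples syntax. `rdf:type` is the standard RDF property; everything under `http://example.org/` is a hypothetical vocabulary, and the URI-versus-literal test and the limited escaping are simplifications.

```python
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for illustration


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # URI or plain-text property value


def term(value):
    """Write a URI as <...> and anything else as a quoted literal.

    Simplification: strings starting with http(s):// are treated as URIs, and
    only backslashes and double quotes are escaped in literals.
    """
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    return "\n".join(f"<{t.subject}> <{t.predicate}> {term(t.obj)} ." for t in triples)


graph = [
    Triple(EX + "protein/TP53", RDF_TYPE, EX + "Protein"),
    Triple(EX + "protein/TP53", EX + "interactsWith", EX + "protein/MDM2"),
    Triple(EX + "protein/TP53", EX + "name", "Cellular tumor antigen p53"),
]
print(to_ntriples(graph))
```

The second triple links two resources and therefore adds an edge to the network, whereas the third attaches a simple property value; this distinction between links and literal values is what lets RDF describe both the structure and the annotation of biological networks.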

 
