Bio4jExplorer: familiarize yourself with Bio4j nodes and relationships

Hi!

I just uploaded a new tool intended to serve both as a reference manual and as a first point of contact with the Bio4j domain model: Bio4jExplorer. Bio4jExplorer allows you to:

  • Navigate through all nodes and relationships
  • Access the javadocs of any node or relationship
  • Graphically explore the neighborhood of a node/relationship
  • Look up the different indexes that may serve as an entry point for a node
  • Check incoming/outgoing relationships of a specific node
  • Check start/end nodes of a specific relationship

Both nodes and relationships in the graph visualization are clickable and lead to their respective records. You can also choose between two layout algorithms, Level layout and Circular layout, and nodes are draggable so that you can arrange the layout as you wish.

For those interested in how this was done: on the server side I created an AWS SimpleDB database holding all the information about the Bio4j model, i.e. everything regarding nodes, relationships, indexes… (here you can check the program used for creating this database using the AWS SDK for Java). On the client side I used the Flare prefuse AS3 library for the graph visualization. As always with everything we do at Oh no sequences!, every part of this tool is open source. You can check the different code repositories at the following addresses:

All kinds of feedback/suggestions are welcome ;)

Bio4j incorporates git-flow model

Hi all! Summer has almost come to an end and I now have time again to continue developing the Bio4j project. This autumn comes with plenty of new features that will be released over the next few months; the first of them is the git-flow model. Bio4j is moving forward fast and it was high time to organize its development and support it with an adequate model. This is where git-flow comes in, providing a simple yet powerful development model in which releases, features, hot-fixes… can be managed without going crazy putting them all together. Here you have a general schema of the model:

Since @nvie wrote a really good post explaining the details of his model, I’ll just provide this link to it instead of giving a much poorer explanation than the one in the article.

Bio4j current release now available as an AWS snapshot

For those using AWS (or willing to…), I just created a public snapshot containing the latest version of the Bio4j DB. The snapshot details are the following:

  • Snapshot id: snap-25192d4c
  • Snapshot region: EU West (Ireland)
  • Snapshot size: 90 GB

The whole DB is under the folder bio4jdb. In order to use it, just create a Bio4jManager instance and start navigating the graph!

As always, any feedback/comment/question is more than welcome (post a comment here or a question in the user group).

Improvements in Bio4j Go Tools (Graph visualization)

Hi everyone!

A new version of the Bio4j Go Tools viewer is available; it includes improvements in the graph visualization of GO annotation results. These are the new features:

  • Load GO annotation results from URL: There’s no longer any need to upload the XML file with the results every time you want to see the graph visualization. Just enter the publicly accessible URL of the file and the server will fetch it for you.
  • Restrict the visualization to only one GO sub-ontology at a time: Terms belonging to different sub-ontologies (cellular component, biological process, molecular function) are not mixed up anymore.
  • Choice of layout algorithms: You can choose between two different layout algorithms for the visualization (Yifan Hu and Fruchterman Reingold).
  • Customizable layout algorithm time: Range of 1-10 minutes.

I also made a short tutorial showing most of the features available in the following real-world use case: GO annotation results for Era7 E. coli TY-2482 annotation with BG7 system of BGI V2 assembly

The corresponding GO Annotation results XML file is publicly available here. Just click the button ‘load file from url’ and paste the address of the file.

For those new to Bio4j Go Tools, two external open-source projects are used apart from Bio4j itself:

That’s all for now, keep an eye on the blog/twitter for updates ;)

Bio4j includes RefSeq data now!

Hi all,

After some weeks of hard work I finally finished the importer for RefSeq data. First of all, I should clarify some points about its licensing:

  • Data has been retrieved from the public FTP site for the RefSeq complete release. There is no extra/different data coming from any other source.
  • Quoting NCBI site: “NCBI places no restrictions on the use or distribution of the GenBank data. However, some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted.”

With that said, I will go into more detail about how it was done.

Genome elements’ sequences

Sequences are not stored in the Bio4j DB but uploaded as separate files to S3 (Amazon Simple Storage Service) instead. Why do it this way? For several reasons:

  • Having all sequences stored in the DB would take up a considerable amount of space
  • Most queries to the DB wouldn’t be done in terms of the sequence content
  • The RefSeq data most relevant for queries is the information about genes, RNAs, genome elements, the positions of all these elements, etc. (rather than the sequence itself).
  • Sequences are stored in txt files whose filename is the unique version string for the specific genome element (e.g. NC_012932.txt). That way they can easily be retrieved whenever needed. Plus, S3 provides a way of extracting a range of bytes from a file without downloading the whole content, so there’s no need to download the complete sequence when you already know the range you are interested in.
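
The byte-range idea from the last point can be sketched in a few lines of Python. This is only an illustration, not the code used by Bio4j: it assumes that each file holds the raw sequence with exactly one byte per base and no header or line breaks, and the URL and helper names are hypothetical. Note that HTTP byte ranges are zero-based and inclusive, while sequence coordinates are usually one-based.

```python
import urllib.request


def range_header(start, end):
    """Build an HTTP Range header value for bases start..end (1-based, inclusive).

    Assumes one byte per base and no header or line breaks in the file.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # HTTP byte ranges are 0-based and inclusive on both ends.
    return "bytes={}-{}".format(start - 1, end - 1)


def build_range_request(url, start, end):
    """Prepare (but do not send) a GET request for part of a sequence file."""
    request = urllib.request.Request(url)
    request.add_header("Range", range_header(start, end))
    return request


# Hypothetical address; the real bucket name and file layout may differ.
EXAMPLE_URL = "https://example-bucket.s3.amazonaws.com/NC_012932.txt"
request = build_range_request(EXAMPLE_URL, 101, 200)
print(request.get_header("Range"))
```

Passing the prepared request to urllib.request.urlopen would then transfer only the requested 100 bytes, provided the file is publicly readable.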

In some cases, no sequence can be uploaded for a genome element. These are the cases where, instead of a final sequence, a list of terms such as join(x…x), complement(x…x) or contig(join(…)) is provided (I never thought I’d find hundreds of lines with these terms where a sequence was supposed to be…).

Genome elements’ data

Regarding elements, the following are included (these are stored in Bio4j, not S3):

  • mRNA
  • Misc RNA
  • ncRNA
  • rRNA
  • tRNA
  • tmRNA
  • CDS
  • gene

Data stored for all these elements includes their positions and their note attribute (whenever it’s found). We decided not to extract more information from the gbff files since it can easily be accessed by navigating through Bio4j via the connection Uniprot entry <–> RefSeq genome element. Plus, the information included in Uniprot releases is much more reliable than that found in RefSeq files.

GO Annotation graph visualizations with Bio4j Go Tools + Gephi Toolkit + SiGMa project

Hello everyone ;)

We’re back from Easter holidays bringing some cool graph visualization stuff.

Bio4j Go Tools now includes a new feature providing you with an interactive graph visualization for protein GO annotations. The URL of the app is still the same old one.

On the server side, we’re using Gephi Toolkit for applying layout algorithms, while the corresponding GEXF file is generated with the class GephiExporter from the BioinfoUtil project. The service is included in the project Bio4jTestServer, specifically in the servlet GetGoAnnotationGexfServlet.
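
For readers who have not seen the format before, here is a rough sketch of what a minimal GEXF 1.2 graph looks like when built by hand in Python. It is not the output of GephiExporter, whose exact structure is not shown here; the function name is hypothetical, and the two GO terms are used only as sample data.

```python
import xml.etree.ElementTree as ET

GEXF_NAMESPACE = "http://www.gexf.net/1.2draft"


def build_gexf(nodes, edges):
    """Return a minimal GEXF 1.2 document as a string.

    nodes: list of (id, label) pairs
    edges: list of (source_id, target_id) pairs
    """
    gexf = ET.Element("gexf", {"xmlns": GEXF_NAMESPACE, "version": "1.2"})
    graph = ET.SubElement(
        gexf, "graph", {"mode": "static", "defaultedgetype": "directed"}
    )
    nodes_element = ET.SubElement(graph, "nodes")
    for node_id, label in nodes:
        ET.SubElement(nodes_element, "node", {"id": node_id, "label": label})
    edges_element = ET.SubElement(graph, "edges")
    for index, (source, target) in enumerate(edges):
        ET.SubElement(
            edges_element,
            "edge",
            {"id": str(index), "source": source, "target": target},
        )
    return ET.tostring(gexf, encoding="unicode")


# Sample data: one GO term and its parent in the biological process ontology.
sample_nodes = [
    ("GO:0008150", "biological_process"),
    ("GO:0009987", "cellular process"),
]
sample_edges = [("GO:0009987", "GO:0008150")]
document = build_gexf(sample_nodes, sample_edges)
print(document)
```

A file like this carries no positions; a layout step such as the one applied with Gephi Toolkit would add them before the graph is sent to the client.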

On the client side, we’re using the open-source project SiGMa for graph visualization.

Here you have a screenshot of a small sample of GO Annotation results:

Bio4j Go Tools goes web

Hi!

From now on Bio4j Go Tools is available at the following address.

This new version includes a chart for visualizing the frequency of GO annotation terms. Some of the new chart features are:

  • GO term search by name
  • Independent chart visualization for each ontology
  • Links to the terms’ Gene Ontology pages
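
The kind of counting behind such a chart can be sketched as follows. This is a simplified Python illustration rather than the app’s actual code: the annotation tuples are made-up sample data with hypothetical protein names, and the function name is an assumption.

```python
from collections import Counter

# Made-up sample: (protein, GO term id, sub-ontology).
annotations = [
    ("protein_1", "GO:0005737", "cellular_component"),
    ("protein_2", "GO:0005737", "cellular_component"),
    ("protein_1", "GO:0005524", "molecular_function"),
    ("protein_3", "GO:0005524", "molecular_function"),
    ("protein_3", "GO:0006412", "biological_process"),
]


def term_frequencies(annotations, sub_ontology):
    """Count how often each GO term appears within a single sub-ontology."""
    return Counter(
        term for _, term, ontology in annotations if ontology == sub_ontology
    )


# One independent frequency table per sub-ontology, most frequent terms first.
for sub_ontology in ("cellular_component", "molecular_function", "biological_process"):
    print(sub_ontology, term_frequencies(annotations, sub_ontology).most_common())
```

Keeping a separate counter per sub-ontology is what allows each ontology to be charted independently.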

In order to visualize your results, just click on the “load file” button and select an XML file you previously generated using the GO Annotation service available in the first tab of the app.

Bio4j Go Tools (first version available)

Hi everyone!

As you may have seen, Bio4j has already started making its way through the bioinformatics world; however, there’s not yet as much information about the project as there should be. That’s where Bio4j Go Tools comes in, as the first real-world example using Bio4j as a back-end.

Bio4j Go Tools is a group of Gene Ontology-related services and apps. (You can find more information about this in the wiki.)

The services provided so far are:

  • Retrieval of GO annotations for Uniprot proteins
  • GoSlim requests with custom Slim term sets

Both the service results and the client-server communication are XML-based and follow a really simple and intuitive structure.

A user-friendly AIR application has been developed that lets the user work with these services directly, abstracting away the logic of the different requests.

Enjoy it  ;)

comments

  • alper yilmaz When I put a protein id, the tools is asking for location of GoSlim.xml file, where can I retrieve this file? I checked the download page of Gene Ontology (http://www.geneontology.org/GO.downloads.ontology.shtml) but couldn’t find it.

    thanks.

    • Pablo Pareja Hi Alper, You’re right it can be a bit confusing the way the app asks for a file location to save the results. I just made some small changes to the app so that things are more straightforward. This version (v 1.01) is already available at Bio4jGoTools github repository
  • alper yilmaz Nevermind, it was asking location and filename to save the results. So, there’s no problem.. Thanks..

New Neo4j Indexing API update

So the moment is approaching for Bio4j!

I’m updating the platform’s indexing system right now, bringing it up to date with the new index API for Neo4j. Once this change is made, there won’t be any other obstacles in the way of launching Bio4j…

Just wait a couple more days…