Bio4jExplorer: familiarize yourself with Bio4j nodes and relationships

Oct 10th, 2011

Hi!

I just uploaded a new tool meant to serve both as a reference manual and as a first point of contact with the Bio4j domain model: Bio4jExplorer. Bio4jExplorer allows you to:

Navigate through all nodes and relationships
Access the javadocs of any node or relationship
Graphically explore the neighborhood of a node/relationship
Look up the different indexes that may serve as an entry point for a node
Check incoming/outgoing relationships of a specific node
Check start/end nodes of a specific relationship
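The checks listed above boil down to simple queries over the model metadata. Here is a minimal sketch. It assumes a hypothetical in-memory representation: the `DomainModel` and `RelationshipType` classes and the sample node, relationship and index names are illustrative only. They are not Bio4j's actual records or the SimpleDB schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RelationshipType:
    name: str
    start_node: str  # node type the relationship leaves from
    end_node: str    # node type the relationship points to

@dataclass
class DomainModel:
    relationships: list = field(default_factory=list)
    indexes: dict = field(default_factory=dict)  # node type -> index names (entry points)

    def outgoing(self, node_type):
        """Relationship types whose start node is node_type."""
        return sorted(r.name for r in self.relationships if r.start_node == node_type)

    def incoming(self, node_type):
        """Relationship types whose end node is node_type."""
        return sorted(r.name for r in self.relationships if r.end_node == node_type)

    def endpoints(self, rel_name):
        """(start node type, end node type) of a relationship type."""
        for r in self.relationships:
            if r.name == rel_name:
                return (r.start_node, r.end_node)
        raise KeyError(rel_name)

    def neighbourhood(self, node_type):
        """Node types directly connected to node_type in either direction."""
        result = set()
        for r in self.relationships:
            if r.start_node == node_type:
                result.add(r.end_node)
            if r.end_node == node_type:
                result.add(r.start_node)
        return result

    def entry_points(self, node_type):
        """Indexes that can be used to look up nodes of this type."""
        return self.indexes.get(node_type, [])

# Hypothetical sample model, for illustration only
model = DomainModel(
    relationships=[
        RelationshipType("ProteinGoAnnotation", "Protein", "GoTerm"),
        RelationshipType("GoParent", "GoTerm", "GoTerm"),
        RelationshipType("ProteinOrganism", "Protein", "Organism"),
    ],
    indexes={"Protein": ["protein_accession_index"]},
)
```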

Both nodes and relationships in the graph visualization are clickable and lead to their respective records. You can also choose between two layout algorithms, Level layout and Circular layout. Nodes are draggable too, so you can arrange the layout as you wish.
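For reference, a circular layout in its simplest form places the nodes evenly around a circle. This is a generic sketch: the `circular_layout` function and its parameters are hypothetical, and it is not Flare's own implementation.

```python
import math

def circular_layout(node_ids, radius=100.0, center=(0.0, 0.0)):
    """Place nodes evenly on a circle, in the given order, starting at angle 0."""
    n = len(node_ids)
    cx, cy = center
    positions = {}
    for i, node in enumerate(node_ids):
        angle = 2 * math.pi * i / n  # equal angular spacing between consecutive nodes
        positions[node] = (cx + radius * math.cos(angle), cy + radius * math.sin(angle))
    return positions
```

Dragging a node, as the viewer allows, would simply overwrite its entry in such a position map.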

For those interested in how this was done: on the server side, I created an AWS SimpleDB database holding all the information about the Bio4j model, i.e. everything regarding nodes, relationships, indexes… (here you can check the program used to create this database with the AWS SDK for Java). On the client side, I used the prefuse Flare AS3 library for the graph visualization. As always with everything we do at Oh no sequences!, every part of this tool is open source. You can check the different code repositories at the following addresses:

Bio4jExplorer: GitHub repository for the AS3 client.
Bio4jExplorerServer: GitHub repository for the Java web server.
Bio4jTools: GitHub repository including the program for creating the SimpleDB database.

All kinds of feedback/suggestions are welcome ;)

Bio4j incorporates the git-flow model

Sep 19th, 2011

Hi all! Summer has almost come to an end, and I now have time again to continue developing the Bio4j project. This autumn brings plenty of new features that will be released over the next few months; the first of them is the git-flow model. Bio4j is moving forward fast, and it was about time to organize its development with an adequate model. This is where git-flow comes in. It provides a simple yet powerful development model in which releases, features, hotfixes… can be managed without going crazy trying to put them all together. Here is a general schema of the model:

Since @nvie wrote a really good post explaining the details of his model, I’ll just provide this link to it instead of giving a much poorer explanation than the one in the article.

Bio4j current release now available as an AWS snapshot

Jun 22nd, 2011

For those using AWS (or planning to…), I just created a public snapshot containing the latest version of the Bio4j DB. The snapshot details are the following:

Snapshot id: snap-25192d4c
Snapshot region: EU West (Ireland)
Snapshot size: 90 GB

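The whole DB is under the folder bio4jdb. To use it, create an EBS volume from the snapshot (in the EU West region) and attach and mount it on one of your EC2 instances. Then just create a Bio4jManager instance and start navigating the graph!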

As always, any feedback/comment/question is more than welcome (post a comment here or a question in the user group).

Improvements in Bio4j Go Tools (Graph visualization)

Jun 10th, 2011

Hi everyone!

A new version of the Bio4j Go Tools viewer is available, with improvements in the graph visualization of GO annotation results. These are the new features:

Load GO annotation results from URL: There’s no need anymore to upload the XML file with the results every time you want to see the graph visualization. Just enter the publicly accessible URL of the file and the server will fetch it for you.
Restrict the visualization to only one GO sub-ontology at a time: Terms belonging to different sub-ontologies (cellular component, biological process, molecular function) are no longer mixed up.
Choice of layout algorithms: You can choose between two layout algorithms for the visualization (Yifan Hu and Fruchterman-Reingold).
Customizable layout algorithm running time: From 1 to 10 minutes.

I also made a short tutorial showing most of the available features in the following real-world use case: GO annotation results for Era7’s annotation of the BGI V2 assembly of E. coli TY-2482, made with the BG7 system.

The corresponding GO Annotation results XML file is publicly available here. Just click the ‘load file from url’ button and paste the address of the file.

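For those new to Bio4j Go Tools, two external open-source projects are used apart from Bio4j itself:

Gephi Toolkit, for applying the layout algorithms on the server side
SiGMa, for the graph visualization on the client side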

That’s all for now, keep an eye on the blog/twitter for updates ;)

Bio4j includes RefSeq data now!

May 30th, 2011

Hi all,

After some weeks of hard work I finally finished the importer for RefSeq data. First of all, I should clarify some points about its licensing:

Data has been retrieved from the public FTP site for the RefSeq complete release. There is no extra/different data coming from any other source.
Quoting NCBI site: “NCBI places no restrictions on the use or distribution of the GenBank data. However, some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted.”

With that said, I’ll go into more detail about how it’s been done.

Genome elements’ sequences

Sequences are not stored in the Bio4j DB. Instead, they are uploaded as separate files to S3 (Amazon Simple Storage Service). Why do it this way? For several reasons:

Having all sequences stored in the DB would take up far more than a reasonable amount of space
Most queries to the DB wouldn’t involve the sequence content
The RefSeq data relevant for performing queries is the information about genes, RNAs, genome elements, the positions of all these elements, etc. (rather than the sequence itself)
Sequences are stored in txt files named after the unique version string of each genome element (e.g. NC_012932.txt), so they can easily be retrieved whenever needed. Plus, S3 can return a range of bytes from a file without downloading the whole content. There’s no need to download the complete sequence when you already know the range you are interested in.
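Here is a minimal sketch of the key naming and ranged retrieval described above. It assumes that each file contains only the raw sequence, with one byte per residue and no header or line breaks. The bucket URL and the helper names (`sequence_key`, `range_header`, `build_request`) are hypothetical. S3 honors the standard HTTP `Range` header, whose byte offsets are 0-based and inclusive.

```python
import urllib.request

def sequence_key(version):
    """S3 object key: the genome element's version string plus '.txt'."""
    return f"{version}.txt"

def range_header(start, end):
    """Map 1-based inclusive sequence positions to an HTTP byte range.

    Assumes one byte per residue and no header line or line breaks in the file.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"bytes={start - 1}-{end - 1}"

def build_request(bucket_url, version, start, end):
    """Build (but do not send) a GET request for part of a stored sequence."""
    url = f"{bucket_url.rstrip('/')}/{sequence_key(version)}"
    return urllib.request.Request(url, headers={"Range": range_header(start, end)})

# Hypothetical bucket URL; urllib.request.urlopen(req).read() would fetch only positions 101-200
req = build_request("https://example-bucket.s3.amazonaws.com", "NC_012932", 101, 200)
```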

In some cases, no sequence can be uploaded for a genome element. These are the cases where, instead of a final sequence, a list of terms such as join(x…x)complement(x…x(contig(joing(…)) is provided (I never thought I’d find hundreds of lines with these terms where a sequence was supposed to be…).
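One way to tell such entries apart from real sequences is to check whether the field contains only IUPAC nucleotide codes. Here is a sketch that assumes whitespace is the only non-residue content a real sequence may contain. The `is_plain_sequence` helper is hypothetical and is not the importer's actual check.

```python
import re

# IUPAC nucleotide codes (including ambiguity codes and U), matched case-insensitively
_IUPAC = re.compile(r"[ACGTURYSWKMBDHVN]+", re.IGNORECASE)

def is_plain_sequence(text):
    """True if text (ignoring whitespace) is a non-empty run of nucleotide codes."""
    compact = "".join(text.split())
    return bool(compact) and _IUPAC.fullmatch(compact) is not None
```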

Genome elements’ data

Regarding elements, the following are included (these are stored in Bio4j, not S3):

mRNA
misc RNA
ncRNA
rRNA
tRNA
tmRNA
CDS
gene

Data stored for all these elements includes their positions and their note attribute (whenever present). We decided not to extract more information from the gbff files, since it can easily be accessed by navigating Bio4j through the Uniprot entry <–> RefSeq genome element connection. Plus, the information included in Uniprot releases is much more reliable than that found in RefSeq files.
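Positions in gbff files follow the GenBank feature location syntax. The sketch below shows, for illustration only, how a position could be turned into start, end and strand. The hypothetical `parse_location` handles just simple ranges, optionally wrapped in complement(...) or marked as partial with < and >. It is not the importer's actual code.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    start: int
    end: int
    strand: str  # '+' for the given strand, '-' for complement(...)

# Simple range like 190..255; '<' and '>' flag partial ends and are ignored here
_SIMPLE = re.compile(r"<?(\d+)\.\.>?(\d+)")

def parse_location(location):
    loc = location.strip()
    strand = "+"
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = "-"
        loc = loc[len("complement("):-1]
    m = _SIMPLE.fullmatch(loc)
    if m is None:
        # join(...), order(...), single bases, etc. are out of scope for this sketch
        raise ValueError(f"unsupported location: {location!r}")
    return Position(int(m.group(1)), int(m.group(2)), strand)
```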

GO Annotation graph visualizations with Bio4j Go Tools + Gephi Toolkit + SiGMa project

Apr 25th, 2011

Hello everyone ;)

We’re back from the Easter holidays, bringing some cool graph visualization stuff.

Bio4j Go Tools now includes a new feature that provides an interactive graph visualization for protein GO annotations. The URL of the app is still the same.

On the server side, we’re using Gephi Toolkit to apply layout algorithms, while the corresponding GEXF file is generated with the GephiExporter class from the BioinfoUtil project. The service is included in the Bio4jTestServer project, specifically in the GetGoAnnotationGexfServlet servlet.
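For readers unfamiliar with GEXF, the file is plain XML listing nodes and edges. Here is a minimal sketch of producing one with Python's standard library, assuming GEXF 1.2 and a directed graph. The `to_gexf` function is hypothetical and is not the GephiExporter class. It also omits the node coordinates a layout algorithm would add.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def to_gexf(nodes, edges):
    """Serialize a directed graph as a GEXF 1.2 string.

    nodes: dict mapping node id -> label
    edges: list of (source id, target id) pairs
    """
    # Declaring the namespace as a plain xmlns attribute keeps child tags unprefixed
    root = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(root, "graph", mode="static", defaultedgetype="directed")
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes.items():
        ET.SubElement(nodes_el, "node", id=str(node_id), label=label)
    edges_el = ET.SubElement(graph, "edges")
    for i, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge", id=str(i), source=str(source), target=str(target))
    return ET.tostring(root, encoding="unicode")

# Two GO terms linked by an is_a relationship (child -> parent)
gexf = to_gexf(
    {"GO:0008150": "biological_process", "GO:0009987": "cellular process"},
    [("GO:0009987", "GO:0008150")],
)
```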

On the client side, we’re using the open-source project SiGMa for graph visualization.

Here you have a screenshot of a small sample of GO Annotation results:

Bio4j Go Tools goes web

Apr 11th, 2011

Hi!

From now on Bio4j Go Tools is available at the following address.

This new version includes a chart for visualizing GO annotation term frequencies. Some of the new chart features are:

GO term search by name
Independent chart visualization for each ontology
Links to each term’s page on the Gene Ontology site
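Behind such a chart there is just a per-term count of annotated proteins, restricted to one ontology and optionally filtered by name. Here is a sketch. It assumes annotations arrive as hypothetical (protein, GO id, term name, aspect) tuples rather than the app's actual XML. `term_frequencies` is a hypothetical helper.

```python
from collections import Counter

def term_frequencies(annotations, aspect, name_query=""):
    """Count distinct proteins per GO term for one aspect ('P', 'F' or 'C').

    annotations: iterable of (protein_id, go_id, term_name, aspect) tuples
    name_query: case-insensitive substring filter on the term name
    """
    seen = set()
    counts = Counter()
    for protein, go_id, name, asp in annotations:
        if asp != aspect or name_query.lower() not in name.lower():
            continue
        if (protein, go_id) in seen:  # count each protein once per term
            continue
        seen.add((protein, go_id))
        counts[(go_id, name)] += 1
    return counts.most_common()  # highest frequency first, ready for a bar chart

ann = [
    ("P1", "GO:0005737", "cytoplasm", "C"),
    ("P2", "GO:0005737", "cytoplasm", "C"),
    ("P2", "GO:0005737", "cytoplasm", "C"),  # duplicate annotation, ignored
    ("P1", "GO:0016020", "membrane", "C"),
    ("P1", "GO:0008152", "metabolic process", "P"),
]
```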

To visualize your results, just click the “load file” button and select an XML file you previously generated with the GO Annotation service available in the first tab of the app.

Bio4j Go Tools (first version available)

Mar 22nd, 2011

Hi everyone!

As you may have seen, Bio4j has already started making its way through the bioinformatics world. However, there’s not yet as much information about the project as there should be. That’s where Bio4j Go Tools comes in, as the first real-world example using Bio4j as a back end.

Bio4j Go Tools is a group of Gene Ontology related services and apps (you can find more information about this in the wiki).

The services provided so far are:

Retrieval of GO annotations for Uniprot proteins
GoSlim requests with custom Slim term sets.
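A GoSlim request essentially maps each annotated term up the ontology to the slim terms it descends from. Here is a minimal sketch. It assumes a hypothetical `parents` map from child to parent terms and counts each protein at most once per slim term. The term names are placeholders, and the service's actual algorithm and XML format may differ.

```python
from collections import Counter

def ancestors(term, parents):
    """All terms reachable from term via parent links, including term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def go_slim(annotations, parents, slim_terms):
    """annotations: protein -> set of terms. Returns slim term -> number of proteins."""
    slim = set(slim_terms)
    counts = Counter()
    for terms in annotations.values():
        hits = set()
        for t in terms:
            hits |= ancestors(t, parents) & slim  # slim terms covering this annotation
        counts.update(hits)  # each protein counts once per slim term
    return counts

# Placeholder ontology: C is_a B, B is_a A, D is_a A
parents = {"C": ["B"], "B": ["A"], "D": ["A"]}
annotations = {"P1": {"C"}, "P2": {"D"}, "P3": {"C", "D"}}
```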

Both the services’ results and the client-server communication are XML-based, following a simple and intuitive structure.

A user-friendly AIR application has been developed that lets users call these services directly, abstracting away the logic of the different requests.

Enjoy it ;)

comments

alper yilmaz When I enter a protein ID, the tool asks for the location of a GoSlim.xml file. Where can I retrieve this file? I checked the download page of Gene Ontology (http://www.geneontology.org/GO.downloads.ontology.shtml) but couldn’t find it.

thanks.
- Pablo Pareja Hi Alper, you’re right, the way the app asks for a file location to save the results can be a bit confusing. I just made some small changes to the app so that things are more straightforward. This version (v1.01) is already available at the Bio4jGoTools GitHub repository.
alper yilmaz Never mind, it was asking for a location and filename to save the results. So, there’s no problem. Thanks!

New Neo4j Indexing API update

Feb 22nd, 2011

So the moment is approaching for Bio4j!

I’m updating the platform’s indexing system right now, bringing it up to date with the new Neo4j index API. Once this change is made, nothing else will stand in the way of Bio4j’s launch…

Just wait a couple of days more…
