
BRM (Bioinformatics Resource Manager)

During my internship with PNNL, my mentor Susan presented the following abstract of our work at the annual Superfund Meeting.

The Bioinformatics Resource Manager (BRM) is a software environment for data management, mining, integration and functional annotation of high-throughput (HT) biological data. We’ve recently added functionality for processing microRNA (miRNA) data, identifying conserved miRNA gene targets from multiple public databases, and integrating mRNA and miRNA datasets. miRNAs are noncoding RNAs that direct post-transcriptional regulation of protein coding genes. These added software capabilities allow for identification of differentially regulated genes from HT microarray or RNAseq platforms that result from misexpression of miRNAs. Here we show that developmental exposure of zebrafish to 30 µM nicotine from 6-48 hours post fertilization (hpf) results in alteration of both miRNAs and their putative mRNA targets in whole embryos at timepoints consistent with secondary motor neuron axon migration. Functional analysis of differentially regulated (p<0.05) gene targets indicates that nicotine exposure disrupts genes involved in neurogenesis through regulation by miRNAs. The current miRNA workflow in BRM allows for efficient processing of multiple miRNA and mRNA datasets within a single software environment with the added capability to interact with public data sources and visual analytic tools for analysis of HT biological data at a systems level. BRM is developed using Java™ and other open-source technologies for free distribution (http://www.sysbio.org/dataresources/brm.stm).

Tilton, Susan C (BATTELLE (PACIFIC NW LAB)); Tal, Tamara (Oregon State University); Scroggins, Sheena M (BATTELLE (PACIFIC NW LAB)); Gibson, Tara D (BATTELLE (PACIFIC NW LAB)); Franzosa, Jill (Oregon State University); Peterson, Elena S (BATTELLE (PACIFIC NW LAB)); Tanguay, Robert (Oregon State University); Waters, Katrina M (BATTELLE (PACIFIC NW LAB))

The End (kind of)

GSoC is wrapping up now, with the final ‘pencils down’ date of 8/22. While the project website hasn’t been updated very much, the project itself has progressed nicely. After a hackathon with mentor Rob Buels, the final code was cleaned up and the distributions are all set to be released.

The following distributions have been extracted:

Bio-Root

Bio-Das

Bio-Event

Bio-Range (now includes Bio::Location)

Bio-Coordinate

Bio-Factory

All can be found on GitHub in the bioperl-live repo.

GSoC gave us the chance to start this large project, as well as set up a workflow to use for the rest of the project. Mentor Chris Fields published an intro to GitHub as well as scripts for extracting modules from the main BioPerl project; both can be found on the BioPerl wiki. There’s still more to be done with the project, and further developments will be posted here as they happen.

GSoC Mid-point Review

The GSoC project is now at the midpoint. Reviews are due this week, for both the students and the mentors. It’s been quite the learning experience so far. The project plan changed at the very beginning, as I was expecting to have to do a lot more by hand. However, my mentors introduced me to Dist::Zilla and all the amazing plugins it offers. This means the README, Build/Make files and some test files are written all in one step. I’m very glad to have learned this way of doing things; otherwise I don’t think I’d be as far along on the project as I am. Comparing my progress to the original timeline, I’m right on track.

The following distributions are outlined to be extracted:

Bio-Root:
Extracted
bioperl-live is tested and working with this distribution extracted

Bio-Das:
Extracted
bioperl-live is tested and working with this distribution extracted

Bio-Event:
Extracted
bioperl-live is tested and working with this distribution extracted

Bio-Range:
Extracted
bioperl-live still needs to be tested

As far as the original project outline goes, only Bio-Coordinate and Bio-Factory are left to be extracted. Hopefully I can get more done than this with the time left in the project. Either way, the reorganization is off to a great start and will be easy for others to pick up and continue.

GSoC Week 4/5

I’ve spent the last two weeks trying to get the first extracted distribution ready to go, as well as ensuring that bioperl-live still passes its tests. As of 6/26, three distributions have been successfully extracted: Bio-Root, Bio-Das and Bio-Event.

Next up on the chopping block are: Bio-Range, Bio-Factory and Bio-Coordinate.

This project has been a constant struggle: learning GitHub, using the command line for everything, reading lots of code. But I think it’s paying off. Each week I overcome some hurdle, and I have to ask fewer ‘user error’ questions and more practical ‘how do things work’ questions. I really understand why Google throws a party for the mentors at the end of the summer. They definitely deserve it!

GSoC Week 3

Since I finally got the BioPerl plugin bundle together last week, I moved on to actually using it this week. The first goal of the actual reorganization is to break out Bio::Root into its own distribution. This will include the following modules (as of now; this may change): Bio::Root, Bio::Root::Exception, Bio::Root::RootI, Bio::Root::IO, Bio::Root::Storable and Bio::Root::Utilities. Bio::Root::HTTPget has been removed and will be packaged into its own distribution. There is still some tweaking to be done, but progress is being made.

GSoC Week 2

I spent the week working on the BioPerl plugin bundle. After running into some beginner user errors (gotta watch my pasting!), my mentors pointed me in the right direction. Part of the coding required that I learn about Moose. Moose essentially makes object-oriented programming in Perl easier. Rob pointed me to an excellent overview found here. Still working on the bundle itself.

GSoC Week 1

Coding officially started on May 24th. I spent the week working with my mentors to finalize plans for the reorganization. We decided to create Dist::Zilla::PluginBundle::Bioperl to help in the process of creating new distributions. This bundle is based on the FLORA bundle (with permission), with two added plugins:

Git::Tag: This plugin creates a tag in git once the release is done.

NextRelease: This plugin updates the next release number in the changelog.
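
For readers who have not seen one, a plugin bundle is just a small Moose class that lists the plugins to load. The following is only a rough sketch of the idea, not the actual Bioperl bundle: the package name `ExampleBundle` is made up, the plugins inherited from the FLORA bundle are left out, and it assumes Dist::Zilla and the Dist::Zilla::Plugin::Git distribution are installed from CPAN.

```perl
# Hypothetical bundle name, for illustration only.
package Dist::Zilla::PluginBundle::ExampleBundle;

use Moose;

# The Easy role provides add_plugins() and calls configure() for us.
with 'Dist::Zilla::Role::PluginBundle::Easy';

sub configure {
    my ($self) = @_;

    $self->add_plugins(
        'NextRelease',    # updates the changelog for the release
        'Git::Tag',       # tags the release in git
    );
}

__PACKAGE__->meta->make_immutable;
no Moose;

1;
```

With a bundle like this installed, a distribution would pull in both plugins with a single `[@ExampleBundle]` line in its dist.ini.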

I spent most of the week learning how to use Dist::Zilla. I found two really useful tutorials to help.
The first one is very detailed and sometimes difficult to follow, but has lots of information: Complete Tutorial
Another site I found is more of a walkthrough and somewhat easier to follow (in my opinion): Casual Walkthrough

GSoC 2011 - BioPerl Project

BioPerl’s current structure of roughly 2000 modules has become a hindrance to its use, maintenance and development. Because there are so many modules, all with very intricate dependencies, it has become very difficult for the average user to download and use, and it has complicated maintenance and further development by knowledgeable users.

For the Google Summer of Code Project, I propose to contribute to the reorganization of this great tool. By breaking the project down from one giant library of 2000 modules into smaller distributions, the BioPerl tools will be easier to use, maintain and develop. Under the guidance of my mentors, Rob Buels and Chris Fields, I will work to break this project down into smaller pieces. This includes improving the characterization of the dependencies as well as improving (or building) testing systems for the new distributions.

The ultimate goal of the BioPerl reorganization project is to break down the entire giant distribution into smaller, tool-specific distributions. While I cannot accomplish this alone in the short internship time, part of the goal is to make it easier for others to contribute to the reorganization. Starting from the bottom up and breaking out the larger distributions, with more dependencies, will give others a starting point for breaking out the smaller distributions. In the end, users will be able to download only the BioPerl tools they need for their project, rather than having to download the entire distribution.

BioPerl Timeline

This timeline provides a rough guideline of how the project will be done.

Before April 25:

  • Familiarize myself completely with cpanm, working to install CPAN modules in a local::lib
  • Study the BioPerl repository on GitHub, making sure to understand the many interdependencies of the modules
  • Continue working to understand testing and module authoring required for the reorganization
  • Continue project for Sol Genomics to create module to upload to CPAN

Follow up: On track so far. The Sol Genomics module can be found at: github Bio-AGP

April 25 – May 23 (Before the official coding time):

  • Actively communicate with my mentor (via IRC, email and mailing lists) to plan the details of the project and ultimate goals that can be accomplished in the limited time frame
  • Continue to study the interdependencies of the BioPerl modules to continue planning of the reorganization
  • Continue to study dependency management and discovery
  • Completely understand CPAN distribution creation and uploading

Follow up: The handling of the project has changed somewhat. We have decided to make a Dist::Zilla plugin bundle for the BioPerl project: github Dist::Zilla BioPerl PluginBundle

May 24 – June 13 (Official coding period starts):

  • Begin reorganization from the bottom up, by breaking out the Bio::Root distribution
  • Write new build file for Bio::Root distribution
  • Edit the bioperl-live build file to show updated dependency requirements
  • Ensure test file for each feature is up to date as well to ensure configuration, building, testing, bundling and installing the distribution will be successful
  • Write README file
  • Publish on CPAN
  • Communicate with mentor about progress, continue planning of reorganization

June 13 – July 4:

  • Break out Bio::Das and Bio::Event from the BioPerl distribution, as their only dependency within BioPerl is Bio::Root
  • Write new build file for Bio::Das and Bio::Event distributions
  • Edit the bioperl-live build file to show updated dependency requirements
  • Ensure test file for each feature is up to date as well to ensure configuration, building, testing, bundling and installing the distribution will be successful
  • Write README files
  • Publish on CPAN
  • Communicate with mentor about progress, continue planning of reorganization

July 4 – July 25:

  • Extract Bio::Location from the BioPerl distribution
  • Write new build file for Bio::Location distribution
  • Edit the bioperl-live build file to show updated dependency requirements
  • Ensure test file for each feature is up to date as well to ensure configuration, building, testing, bundling and installing the distribution will be successful
  • Write README file
  • Publish on CPAN
  • Communicate with mentor about progress, continue planning of reorganization

JULY 11-15th MID TERM EVALUATION

July 11 – July 15:

  • Communicate with mentor about progress of project so far, re-evaluate the project goals for the summer based on time left

July 25 – August 16:

  • Extract Bio::Factory and Bio::Coordinate from the BioPerl distribution, as their only dependencies within BioPerl are Bio::Root and Bio::Location
  • Write new build file for Bio::Factory and Bio::Coordinate distributions
  • Edit the bioperl-live build file to show updated dependency requirements
  • Ensure test file for each feature is up to date as well to ensure configuration, building, testing, bundling and installing the distribution will be successful
  • Write README files
  • Publish on CPAN
  • Communicate with mentor about progress, continue planning of reorganization

August 16 – August 22:

  • Finalize all documentation, README files and make roadmap of continued reorganization
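
The bottom-up ordering in this timeline comes down to extracting each distribution only after the ones it depends on. As a rough illustration, here is a small core-Perl sketch that derives one such order from the dependencies listed above. The `extraction_order` helper is made up for this post, and the Bio-Location entry is an assumption, since the timeline does not spell out what Bio::Location depends on.

```perl
use strict;
use warnings;

# Dependencies within BioPerl, taken from the timeline above.
# Assumption: Bio-Location depends only on Bio-Root (not stated above).
my %deps = (
    'Bio-Root'       => [],
    'Bio-Das'        => ['Bio-Root'],
    'Bio-Event'      => ['Bio-Root'],
    'Bio-Location'   => ['Bio-Root'],
    'Bio-Factory'    => ['Bio-Root', 'Bio-Location'],
    'Bio-Coordinate' => ['Bio-Root', 'Bio-Location'],
);

# Return the distributions ordered so that each one comes after
# everything it depends on (a depth-first topological sort).
sub extraction_order {
    my ($deps) = @_;
    my (%state, @order);

    my $visit;
    $visit = sub {
        my ($dist) = @_;
        return if ($state{$dist} // '') eq 'done';
        die "Circular dependency involving $dist\n"
            if ($state{$dist} // '') eq 'visiting';

        $state{$dist} = 'visiting';
        $visit->($_) for sort @{ $deps->{$dist} // [] };
        $state{$dist} = 'done';
        push @order, $dist;
    };

    $visit->($_) for sort keys %$deps;
    return @order;
}

print join(' -> ', extraction_order(\%deps)), "\n";
```

Any order this produces is only one of several valid ones. The timeline above is another, since Bio::Das and Bio::Event can be extracted at any point once Bio::Root is out.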