Bioinformatics Project-Examples

Please find on this page an overview of selected example bioinformatics projects that we have conducted in the past.


1. EST-Sequencing: Post-Sequencing Analysis & EST Database Design

Together with the research groups of Elly Tanaka (MPI-CBG Dresden) and Tony Hyman (MPI-CBG Dresden), we are working on EST sequencing projects for two amphibians: the axolotl Ambystoma mexicanum and the frog Xenopus laevis.

To this end, we are performing post-sequencing analysis of the ESTs, which involves several steps:

  • statistical analysis of the sequenced ESTs, with the goal of identifying redundancy among the sequenced clones
  • quality control of the sequenced ESTs (with the program 'pregap4' from the Staden software package)
  • assembly of the sequenced ESTs into sequence contigs (using the program TIGR-Assembler 2.0; the resulting contigs are aligned with the program ClustalW)
  • automated domain analysis of the sequence contigs, using the program blastx (NCBI-BLAST) to search the FASTA-formatted conserved domain database (CDD)
  • precomputed BLAST searches of the sequence contigs, using the program blastx (NCBI-BLAST) to search the non-redundant and EST databases available at the NCBI
  • automated annotation of the contigs based on their homology to other proteins and on the GeneOntology database (GO database)
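The first step in the list, estimating clone redundancy, can be illustrated with a minimal Python sketch. It is not the pipeline described here: the function names, the input format (a mapping from EST identifier to contig or singleton identifier) and the definition of redundancy as 1 - clusters / ESTs are assumptions chosen for illustration.

```python
from collections import Counter


def est_redundancy(est_to_cluster):
    """Return (n_ests, n_clusters, redundancy) for an EST-to-cluster mapping.

    Assumption: redundancy is the fraction of ESTs that do not contribute
    a new cluster, i.e. 1 - n_clusters / n_ests.
    """
    n_ests = len(est_to_cluster)
    if n_ests == 0:
        return 0, 0, 0.0
    n_clusters = len(set(est_to_cluster.values()))
    return n_ests, n_clusters, 1.0 - n_clusters / n_ests


def cluster_size_histogram(est_to_cluster):
    """Map cluster size -> number of clusters of that size (size 1 = singleton)."""
    cluster_sizes = Counter(est_to_cluster.values())
    return dict(Counter(cluster_sizes.values()))


if __name__ == "__main__":
    # Hypothetical assignments, e.g. parsed from an assembler's output.
    assignments = {"est1": "c1", "est2": "c1", "est3": "c2", "est4": "c1"}
    print(est_redundancy(assignments))
    print(cluster_size_histogram(assignments))
```

A rising redundancy value over successive sequencing batches would suggest that further sequencing of the same library yields few new transcripts.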

Having solved the data acquisition and analysis problems, we are also developing an EST database with a user-friendly web interface to make the data easily accessible to researchers and to the database curator.


2. MS BLAST Evaluation on the Basis of Phylogenetic Distance

MS BLAST is a search protocol tailored to identify very short peptide sequences, usually produced by mass spectrometry techniques, by sequence similarity searching (Shevchenko et al., 2001). It is derived from the WU-BLAST2 search program (W. Gish, 1996-2002, http://blast.wustl.edu).

While MS BLAST has proven to work well for identical peptides, it has not been clear whether it can reliably identify peptides on the basis of related proteins. To answer this question, we performed a high-throughput analysis of MS BLAST sequence similarity searches, relating the success rate of identification to the phylogenetic distance between the queried protein and the closest fully sequenced genome. The full results of this analysis will be published soon.

The scoring system of MS BLAST maximizes the score and therefore does not rely on the Expect value, which, for regular BLAST searches, usually gives a good measure of the statistical relevance of a hit. While this is an improvement for the identification of MS queries, it makes the interpretation of MS BLAST results less straightforward. To assist researchers in interpreting their results, Scionics developed a user-friendly interface for MS BLAST search results.

This interface is installed at the EMBL MS BLAST server and can be reached by scientists all over the world.

Interested? ... Then look at http://dove.embl-heidelberg.de/Blast2/msblast.html

Our collaborators in this work are: Andrej Shevchenko (MPI-CBG, Dresden), Shamil Sunyaev (Harvard Medical School, Boston), and Peer Bork (EMBL, Heidelberg).


3. Detecting weak sequence similarity

Detection of weak sequence similarity requires expertise that goes beyond the standard use of bioinformatics tools. The user has to be able to detect similarities irrespective of the statistical significance of the observed hits. While relatively little effort is put into developing tools for the detection of weak sequence similarities, relationships beyond the so-called 'twilight zone' of sequence similarity are among the most essential pieces of information driving biological research.

We have a record of successful projects carried out with researchers along these lines:

BAR family of proteins:

  • Miaczynska M, Christoforidis S, Giner A, Shevchenko A, Uttenweiler-Joseph S, Habermann B, Wilm M, Parton RG, Zerial M. in Cell 116, 445-56, 2004: APPL proteins link Rab5 to nuclear signal transduction via an endosomal compartment (PubMed)
  • Habermann B. in EMBO Rep. 5, 250-5, 2004: The BAR-domain family of proteins: a case of bending and binding? (PubMed)
  • Spitzenberger F, Pietropaolo S, Verkade P, Habermann B, Lacas-Gervais S, Mziaut H, Pietropaolo M, Solimena M. in J Biol Chem. 278(28), 26166-73, 2003: Islet cell autoantigen of 69 kDa is an arfaptin-related protein associated with the Golgi complex of insulinoma INS-1 cells (PubMed)

Detection of orthologues of S. cerevisiae and C. elegans in higher eukaryotes

  • Dammermann A, Muller-Reichert T, Pelletier L, Habermann B, Desai A, Oegema K. in Dev Cell 7, 815-29, 2004: Centriole assembly requires both centriolar and pericentriolar material proteins (PubMed)
  • Pelletier L, Ozlu N, Hannak E, Cowan C, Habermann B, Ruer M, Muller-Reichert T, Hyman AA. in Curr Biol. 14, 863-73, 2004: The Caenorhabditis elegans centrosomal protein SPD-2 is required for both pericentriolar material recruitment and centriole duplication (PubMed)
  • Schwickart M, Havlis J, Habermann B, Bogdanova A, Camasses A, Oelschlaegel T, Shevchenko A, Zachariae W. in Mol Cell Biol. 24, 3562-76, 2004: Swm1/Apc13 is an evolutionarily conserved subunit of the anaphase-promoting complex stabilizing the association of Cdc16 and Cdc27 (PubMed)
  • Hannak E, Oegema K, Kirkham M, Gonczy P, Habermann B, Hyman AA in J Cell Biol 157(4), 591-602, 2002: The kinetically dominant assembly pathway for centrosomal asters in Caenorhabditis elegans is gamma-tubulin dependent (PubMed)


4. DEQOR: a web-based tool for Design and Quality Control of siRNAs

Deqor is a web-based tool that helps researchers working with RNA interference to design their siRNAs. Deqor is especially designed for esiRNA-based RNA interference, where long double-stranded RNA (dsRNA) is digested in vitro for subsequent transfection into mammalian cell culture.

An efficient siRNA, or a longer piece of dsRNA, must 1) have good silencing quality and 2) not contain any cross-silencers. Several reports state that, in addition to a relatively low GC content (between 20% and 50%) and the absence of stretches of identical nucleotides along the sequence, an siRNA should be asymmetric, with a lower melting temperature at the 5' end of the antisense strand. In this way, the antisense strand of the siRNA is more likely to remain associated with the RISC complex, and the siRNA therefore silences the target gene more efficiently.

Deqor uses these quality criteria to score each potential siRNA in an RNA sequence for its silencing efficiency. In addition, a BLAST search against the transcriptome of the selected model organism is carried out to identify stretches that could potentially cross-silence another gene.
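The quality criteria named above can be sketched in a few lines of Python. This is an illustrative sketch only, not Deqor's actual scoring scheme: the 21-nucleotide window, the maximum run length of 3, the use of GC content over the first four bases as a crude proxy for melting temperature, and the one-point-per-violation penalty are all assumptions. The BLAST-based cross-silencing check is left out.

```python
# Complement table for RNA letters.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the antisense strand (5'->3') of an RNA sense strand."""
    return rna.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq):
    """Length of the longest stretch of identical nucleotides."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def penalty(sense, max_run=3, end_len=4):
    """Count violated criteria (0 = best) for one candidate siRNA.

    All thresholds except the 20%-50% GC range are assumptions.
    """
    sense = sense.upper().replace("T", "U")
    antisense = reverse_complement(sense)
    points = 0
    if not 0.20 <= gc_fraction(sense) <= 0.50:
        points += 1  # GC content outside the preferred range
    if longest_run(sense) > max_run:
        points += 1  # stretch of identical nucleotides
    # Asymmetry proxy: the 5' end of the antisense strand should be
    # less GC-rich (lower melting temperature) than that of the sense strand.
    if gc_fraction(antisense[:end_len]) >= gc_fraction(sense[:end_len]):
        points += 1
    return points


def score_windows(transcript, length=21):
    """Score every window of the given length along a transcript."""
    transcript = transcript.upper().replace("T", "U")
    return [
        (start, transcript[start:start + length],
         penalty(transcript[start:start + length]))
        for start in range(len(transcript) - length + 1)
    ]
```

Calling `score_windows` on a transcript yields one `(position, sequence, penalty)` tuple per window; windows with a low penalty would be the candidates to pass on to a cross-silencing search.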

For a more detailed description of Deqor, please see our publication: Henschel A, Buchholz F, Habermann B. in Nucleic Acids Res. 32 (web server issue), W113-20, 2004: DEQOR: a web-based tool for the design and quality control of siRNAs.

To get access to the Deqor-server, go to http://cluster-1.mpi-cbg.de/Deqor/deqor.html.

Deqor runs on the internal cluster of the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden.

Deqor is also available for local installation, both as a web tool (coming soon in a mosix-free version) and as a command-line version that can be used for high-throughput analysis.


If you are interested in the above-mentioned projects and bioinformatics tools, please contact us at bioinformatics@scionics.de.

Copyright Scionics Computer Innovation GmbH 2009. All Rights Reserved.