OK INBRE Bioinformatics Bulletins

September 2006    October 2006    November 2006    December 2006    January 2007    February 2007    March 2007    April/May 2007   

2005/2006 archive

 

Automated Annotation September 2006
         A major goal of scientists is to understand in detail the relationship between the genome and the metabolic processes of an organism. Couple this goal with the increasing efficiency of high-throughput technology, and the result is a flood of biological data that must be managed, organized, characterized, and interpreted before meaningful information can be extracted. To meet this need, public biological databases have become vital resources for the scientific community. Because of their growing importance to science, it is worth knowing which databases are available. This bulletin introduces the different types of databases that exist and some of the most popular examples... (download)

 

Introduction to PSI-BLAST October 2006
         Computationally detecting homology between two sequences can be challenging. Comparing the tertiary structures of members of a protein family is often the only way to establish homology between two distantly related protein sequences. Because many protein sequences lack a solved crystal structure, we must often rely solely on the amino acid (or nucleic acid) sequence to detect evolutionary relationships. At the sequence level, statistical models of conserved amino acid patterns, variously called motifs, profiles, and hidden Markov models, are the best available resources for this task. While these models are useful for identifying weak similarities between members of a protein family, using them for similarity searching has traditionally required multiple tools. BLAST quickly finds sequences related to a query sequence, but it often misses more distantly related ones.
         One method that combines the speed of BLAST with the sensitivity of statistical models for identifying distantly related sequences is Position-Specific Iterated BLAST, or PSI-BLAST... (download)
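The heart of PSI-BLAST is the position-specific scoring matrix (PSSM): after an ordinary BLAST search, hits that pass an E-value inclusion threshold are aligned to the query and summarized column by column, and that matrix replaces the general substitution matrix in the next search round, repeating until no new sequences are found. The sketch below shows only the PSSM idea, assuming an ungapped input alignment, a flat pseudocount, and a uniform background; the real tool uses sequence weighting, observed amino acid frequencies, and more elaborate pseudocounts. The names `build_pssm` and `best_window` are illustrative, not part of BLAST.

```python
import math
from typing import Dict, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: List[str], pseudocount: float = 1.0,
               background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Build a log-odds (bits) PSSM from equal-length, ungapped aligned sequences."""
    if background is None:
        # Assumption: uniform background; PSI-BLAST uses real amino acid frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for col in range(len(aligned[0])):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for seq in aligned:
            if seq[col] in counts:
                counts[seq[col]] += 1
        # Pseudocounts keep unseen residues from getting a score of minus infinity.
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(pssm: List[Dict[str, float]], sequence: str) -> Tuple[Optional[int], float]:
    """Slide the PSSM along a sequence; return (0-based start, score) of the best window."""
    width = len(pssm)
    best: Tuple[Optional[int], float] = (None, float("-inf"))
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best[1]:
            best = (start, score)
    return best
```

Each PSI-BLAST iteration would add newly found sequences to the alignment and rebuild the matrix, so conserved positions gain weight while variable ones are down-weighted.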

 

Introduction to PHI-BLAST November 2006
         Protein sequence analysis often focuses on sequence elements that have been conserved throughout evolution in a group of related sequences. These elements are conserved patterns of amino acids, known as motifs or signatures, that often correspond to an important functional or structural domain. Signatures help to characterize the essential amino acids of a protein family or of a functional or structural domain. Investigators studying related proteins are often concerned both with the significance of these patterns within a particular sequence of interest and with using these patterns to recognize divergent homologous family members.
         PHI-BLAST (Pattern-Hit Initiated BLAST) is a modified protein-protein BLAST that searches for sequences that both contain a given signature and are significantly similar to the query sequence. This tool is accessible at NCBI... (download)
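PHI-BLAST signatures are written in a PROSITE-style syntax, for example `C-x(2,4)-C`, where `x` means any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, and `(n)` or `(n,m)` gives a repeat count. As a minimal sketch, assuming only this common subset of the syntax (edge cases such as `>` inside brackets are not handled), the hypothetical helpers below translate a pattern into a Python regular expression and report where it occurs in a sequence. This covers only the pattern-matching step; PHI-BLAST itself then builds and scores alignments around the pattern hits.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as 'C-x(2,4)-C' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional (n) or (n,m) count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        spec, low, high = m.groups()
        if spec == "x":
            core = "."                                   # any residue
        elif spec.startswith("{") and spec.endswith("}"):
            core = "[^" + spec[1:-1] + "]"               # forbidden residues
        else:
            core = spec                                  # single residue or [..] set
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_pattern_hits(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]
```

For instance, a zinc-finger-style pattern such as `C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.` (used here purely as an illustration) becomes `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H`.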

 

Programming for Bioinformatics December 2006
         In previous issues of this newsletter, we have focused on some of the common bioinformatics tools available to the budding bioinformatician. These and many other applications could be thought of as the "core" of bioinformatics, and a great deal can certainly be accomplished with them alone. But what happens when you need to address a question for which no application exists? It’s neither a surprising nor a rare occurrence; original research projects often require original analyses. Just as importantly, many situations in the modern biology lab require manipulating very large datasets, both to prepare data for core applications and to organize and interpret the results they produce. In these circumstances it is useful for a biologist to have at least basic programming skills. Beyond these (often lab-specific) tasks, programming also makes it possible to build broader applications for the general biological community... (download)
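As an example of the kind of small, lab-specific task described above, the sketch below reads a FASTA file one record at a time and reports each sequence's length and GC content. Reading record by record keeps memory use modest even for very large files. The function names are hypothetical; established libraries such as Biopython offer more complete parsers.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, one record at a time."""
    header, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (case-insensitive); 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    # In practice the handle would come from open("your_file.fasta"); a small
    # in-memory example is used here so the sketch runs on its own.
    demo = io.StringIO(">seq1 example\nACGT\nGGCC\n>seq2\nATAT\n")
    for name, seq in read_fasta(demo):
        print(f"{name}\t{len(seq)}\t{gc_content(seq):.2f}")
```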

 

Bioinformatics and Whole-Genome Shotgun Sequencing January 2007
         Bioinformatics is an important part of Whole-Genome Shotgun Sequencing (WGS), one of the two main strategies used to sequence genomes. WGS is typically used to sequence smaller genomes (bacterial, viral, archaeal, and eukaryotic genomes that do not contain many repeats). In this method, small random fragments of DNA are isolated and sequenced, and the resulting reads are assembled by their overlaps into longer contiguous sequences, referred to as contigs, ideally reconstructing the complete genome. This bulletin briefly introduces WGS and the many bioinformatics programs applied in a successful project... (download)
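The assembly step can be illustrated with a toy greedy assembler: repeatedly merge the two reads that share the longest suffix-prefix overlap until no overlaps remain. This is only a sketch under strong assumptions (error-free reads, all from one strand, no long repeats); real assemblers must handle sequencing errors, repeats, and reverse complements, typically with overlap-layout-consensus or de Bruijn graph methods. The function names and the `min_len` parameter are illustrative.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate positions for the overlap
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge reads by maximal overlap; returns one or more contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads contained inside another read add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: several contigs remain
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

When coverage has gaps, the loop stops with several contigs rather than one, which mirrors why real WGS projects usually end with many contigs that must be ordered and joined.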

 

Introduction to the Protein Data Bank February 2007
         Knowing a protein's three-dimensional structure can help reveal how its sequence relates to its function. The structure can identify the amino acids responsible for the protein's structural integrity and the residues that make up its active sites or binding sites. Analyzing protein structures is possible because of the availability of protein structure resources. Protein structure databases, which predate sequence databases, were created to store solved three-dimensional structures of proteins and their annotations. In this bulletin, the reader will be briefly introduced to methods for determining protein structures and to the Protein Data Bank (PDB)... (download)
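Structures in the classic PDB file format are stored as fixed-width text, with each atom on an `ATOM` or `HETATM` line and its x, y, z coordinates in columns 31-38, 39-46, and 47-54. As a minimal sketch, the hypothetical helpers below read those columns and list residues whose alpha carbons lie within a distance cutoff of a chosen residue, a rough way to look for neighbors of an active-site residue. The 8 Å default cutoff is an arbitrary assumption, and the parser ignores alternate locations and insertion codes; for real work a library such as Biopython's `Bio.PDB` module is more robust.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Parse ATOM/HETATM records using PDB fixed column positions (0-based slices)."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the file's units (angstroms)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def ca_neighbors(atoms: List[Atom], chain: str, res_seq: int,
                 cutoff: float = 8.0) -> List[Tuple[str, int, str]]:
    """Residues whose CA atom lies within cutoff of the given residue's CA atom."""
    cas = [a for a in atoms if a.name == "CA"]
    center = next(a for a in cas if a.chain == chain and a.res_seq == res_seq)
    return [(a.chain, a.res_seq, a.res_name) for a in cas
            if a is not center and distance(a, center) <= cutoff]
```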

 

Protein Homology Modeling March 2007
         A major goal of protein modeling is to predict the three-dimensional structure of a protein from its amino acid sequence alone, with accuracy comparable to experimentally determined structures. Such predictions would free scientists from studying only experimentally determined structures and would greatly increase the number of structures available for study. Accurately predicted structures could be used in studies of protein structure-function relationships, protein-protein interactions, and rational protein design.
         While there are multiple techniques for protein structure prediction, protein homology modeling is the simplest and, when a related protein of known structure is available as a template, the most powerful and accurate... (download)
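A central step in homology (comparative) modeling is to align the target sequence with a template of known structure and copy the template's coordinates onto the aligned target residues; unaligned stretches are left for loop modeling and the whole model is then refined. The sketch below shows only that coordinate-transfer step, assuming a pairwise alignment is already available as two gapped strings and using one alpha-carbon coordinate per template residue. The function names are hypothetical; full modeling is done with dedicated tools such as MODELLER or SWISS-MODEL.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]


def aligned_pairs(target_aln: str, template_aln: str) -> Dict[int, int]:
    """Map 0-based target residue indices to template residue indices.

    Both strings are rows of one pairwise alignment, with '-' marking gaps;
    only columns where neither sequence has a gap are mapped.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    mapping: Dict[int, int] = {}
    ti = pi = 0  # running residue counters for target and template
    for t, p in zip(target_aln, template_aln):
        if t != "-" and p != "-":
            mapping[ti] = pi
        if t != "-":
            ti += 1
        if p != "-":
            pi += 1
    return mapping


def percent_identity(target_aln: str, template_aln: str) -> float:
    """Identical residues as a percentage of gap-free aligned columns."""
    cols = [(t, p) for t, p in zip(target_aln, template_aln) if t != "-" and p != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(t == p for t, p in cols) / len(cols)


def copy_backbone(target_aln: str, template_aln: str,
                  template_ca: List[Coord]) -> List[Optional[Coord]]:
    """Give each target residue its aligned template CA coordinate, or None if unaligned."""
    target_len = len(target_aln.replace("-", ""))
    mapping = aligned_pairs(target_aln, template_aln)
    return [template_ca[mapping[i]] if i in mapping else None for i in range(target_len)]
```

The `None` entries mark target residues (here, insertions relative to the template) whose positions a modeling program would still have to build.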

 

Expressed Sequence Tag Analysis April/May 2007
         A major goal for many biologists is to understand the complex biological organization and processes that involve DNA, RNA, and proteins. To achieve this goal, scientists must identify and annotate all of the genes and gene products in an organism. With advances in biotechnology and bioinformatics, we now have a wealth of sequence data (DNA, RNA, and amino acid sequences) to analyze, along with the computational tools to analyze it... (download)