Computational Problems in Proteomics: Statistics, Optimization, and Combinatorics

Alex Pothen
Department of Computer Science, Old Dominion University
Norfolk VA 23529
pothen@cs.odu.edu
www.cs.odu.edu/~pothen


Abstract

Now that the genomes of many organisms have been sequenced, large-scale projects are under way to characterize the protein products of the genes (the proteome) and the multi-protein complexes that are responsible for the functions of the cell. High-throughput, rapid, and automatable techniques are currently being developed to identify the tens of thousands of proteins (or more) involved. Among these are protein chips, novel mass spectrometric techniques, and nuclear magnetic resonance (NMR) methods.

Two computational problems in proteomics will be considered in this talk.

The first problem is to use the protein profiles of tissue specimens, obtained via novel mass spectrometric techniques (SELDI and MALDI), to classify the specimens as diseased or healthy. Statistical classification and support vector machines are used to discover protein markers that characterize disease.
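
As a rough illustration of how candidate markers might be screened from such profiles, the sketch below ranks spectral peaks by the absolute Welch t-statistic between the diseased and healthy groups. This is a simple univariate filter substituted for illustration, not the statistical classification or support vector machine procedure of the talk. The function names, the intensity matrix, and the labels are hypothetical toy values, not measured data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for two lists of intensities."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_markers(spectra, labels):
    """Return peak indices sorted from most to least discriminating.

    spectra: list of equal-length intensity lists, one per specimen (assumed layout).
    labels:  1 for diseased, 0 for healthy (assumed encoding).
    """
    diseased = [s for s, y in zip(spectra, labels) if y == 1]
    healthy = [s for s, y in zip(spectra, labels) if y == 0]
    scores = []
    for j in range(len(spectra[0])):
        t = welch_t([s[j] for s in diseased], [s[j] for s in healthy])
        scores.append((abs(t), j))
    # Largest absolute t-statistic first.
    return [j for _, j in sorted(scores, reverse=True)]


# Toy data: four specimens, three peaks; only peak 2 separates the groups.
spectra = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 2.1],
    [0.9, 4.8, 6.0],
    [1.1, 5.2, 6.3],
]
labels = [0, 0, 1, 1]
ranking = rank_markers(spectra, labels)
```

A filter of this kind only scores each peak in isolation. A classifier such as a support vector machine would instead be trained on the peaks jointly, and the sketch assumes every group has at least two specimens and nonzero variance at each peak.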

The second problem is to represent multi-protein complexes and protein interaction networks computationally, using graphs and hypergraphs, so that algorithms can answer biological questions about them. We describe a new algorithm for identifying a ``k-core'' of a hypergraph (a sub-hypergraph in which every vertex belongs to at least $k$ hyperedges of the sub-hypergraph), and use it to characterize core proteomes of yeast. The biological significance of core proteomes is that they are expected to have similar functions in related organisms.
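
The k-core definition above can be made concrete with a naive peeling sketch: repeatedly delete every vertex that lies in fewer than $k$ hyperedges, shrink the hyperedges accordingly, and discard hyperedges that have become empty, duplicated, or contained in another hyperedge. That last reduction rule is an assumption made here so that deletions can cascade; the abstract does not specify how shrunken hyperedges are treated, and this is not presented as the new algorithm of the talk. The function names and the protein identifiers `p1` to `p5` are hypothetical.

```python
def reduce_hyperedges(edges):
    """Drop empty, duplicate, and non-maximal hyperedges (assumed reduction rule)."""
    unique = {frozenset(e) for e in edges if e}
    # Keep a hyperedge only if it is not a proper subset of another one.
    return {e for e in unique if not any(e < other for other in unique)}


def hypergraph_k_core(hyperedges, k):
    """Return the hyperedges of the k-core as a set of frozensets (may be empty)."""
    edges = reduce_hyperedges(hyperedges)
    while True:
        # Degree of a vertex = number of remaining hyperedges containing it.
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        low = {v for v, d in degree.items() if d < k}
        if not low:
            return edges
        # Remove low-degree vertices from every hyperedge, then re-reduce.
        edges = reduce_hyperedges(e - low for e in edges)


# Toy hypergraph: vertices are proteins, hyperedges are complexes.
complexes = [
    {"p1", "p2", "p3"},
    {"p2", "p3", "p4"},
    {"p1", "p3", "p4"},
    {"p4", "p5"},
]
core2 = hypergraph_k_core(complexes, 2)
core3 = hypergraph_k_core(complexes, 3)
```

In this toy input, `p5` belongs to a single complex, so it is removed for $k = 2$; its complex then shrinks to a subset of another and is discarded, leaving the three complexes on `p1` to `p4`. For $k = 3$ the peeling cascades until nothing remains. The reduction step is quadratic in the number of hyperedges, which is acceptable for a sketch but not for large interaction data.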