(295e) A Generic Motif Discovery Algorithm for Diverse Biomolecular Data

Conference

AIChE Annual Meeting

Year

2005

Proceeding

- Genomic Approaches to Systems Biology

Time

Wednesday, November 2, 2005 - 9:30am to 9:50am

Authors

Jensen, K. L. - Presenter, MIT

Styczynski, M., MIT

Stephanopoulos, G., Massachusetts Institute of Technology

Here we present a novel algorithm, GeMoDA (Generic Motif Discovery Algorithm), for discovering repeated patterns in real-valued, sequential data without requiring alignment or pattern enumeration. GeMoDA is closely related to the Teiresias algorithm, but borrows new concepts from frequent itemset mining and graph theory to extend its capabilities. GeMoDA is generic in the sense that it is applicable to almost any type of sequential data, e.g. DNA/protein sequences, protein structures, and time series data. Given a user-defined notion of similarity and identity, GeMoDA deterministically enumerates all repeated patterns of a minimum length and support. In the context of DNA sequences, for example, similarity may be the Hamming distance between two subsequences. These two subsequences may be considered identical if they have at least 5 bases in common over a window of 10. Given such a problem statement, GeMoDA will find all patterns that occur at least k times and are of at least size l---where k and l are user inputs---such that all the patterns are maximal in both support and length. These patterns can be modeled using any number of motif representations including position-weight matrices, hidden Markov models, and regular expressions.

We show applications of GeMoDA in three domains: protein and DNA sequence motif discovery, and the discovery of repeated substructure discovery in protein structures. Unlike other algorithms for these two applications---viz. Gibbs sampler, MEME, Dali, and others---GeMoDA's output is guaranteed to be both optimal and exhaustive.

Other Sites & Tools

Technical Groups

Technical

Professional/Personal Growth

Societal Needs

Leadership

Foundations of Molecular Modeling and Simulation (FOMMS 2024)

2024 International Mammalian Synthetic Biology Workshop (mSBW)

Upcoming Conferences & Events

Foundations of Molecular Modeling and Simulation (FOMMS 2024)

2024 Brazil Student Regional Conference

2024 Dow Sponsored CCPS Process Safety Faculty Workshop

2024 International Mammalian Synthetic Biology Workshop (mSBW)

2024 Chemical Ventures Conference

2024 China Chem-E-Car Competition

2024 India Student Regional Conference

CCPS India Regional Meeting

CCPS Process Safety Knowledge Webinar (Brazil)

CEP: July 2024

CEP: June 2024

Explore Areas of Advancement:

Learning Center:

Want to be an Entrepreneur? Personal Stories From Three Successful Entrepreneurs Who Have Traveled This Path.

(295e) A Generic Motif Discovery Algorithm for Diverse Biomolecular Data

AIChE Annual Meeting

2005

2005 Annual Meeting

Systems Biology

- Genomic Approaches to Systems Biology

Wednesday, November 2, 2005 - 9:30am to 9:50am

Authors

More Conference Links

Cancelation Policy

Code of Conduct

Beware of Hotel and Attendee-list Scams