Building a Supercomputing Pipeline from Genome Sequence to Protein Structure and Drug Design

Yutaka Akiyama

Director, Computational Biology Research Center (CBRC), AIST


Thanks to rapid progress in genome sequencing and related biotechnologies, the DNA sequences of several hundred organisms have already been determined. We can now quickly determine the genome sequences of pathogenic viruses we encounter, or of industrially important micro-organisms newly isolated from soil. However, as is gradually becoming clear, genome sequence information by itself provides little more than a "key" or a "map" for entering the complicated mystery of the target organism. (An exception is its direct contribution to understanding phylogenetic relationships in evolutionary history.) To extract practically valuable information from a sequenced genome, we must translate it to the level of protein expression and structural information, because protein and RNA molecules (both encoded in the genome) are the real players in a living cell. Because experimental protein structure determination is costly and time-consuming, the structural information obtained so far is more than a thousand times scarcer than DNA-level information. Computational protein structure prediction is therefore expected to become a powerful new tool for bridging this huge gap between DNA information and protein structure information.

The Computational Biology Research Center (CBRC) is engaged in a variety of bioinformatics research activities covering such themes as automatic gene finding, expression analysis, gene regulatory network estimation, protein structure prediction, protein-protein docking, and virtual screening against chemical compound databases.

We conduct in-depth research using large-scale computing resources such as the Magi cluster system (1040 processors, Pentium III 933 MHz), the AIST super cluster system (2048 processors, Opteron 2 GHz, plus 1000 other processors), and the Blue Protein system (8192 processors, Blue Gene p440-based 700 MHz, 22 TFLOPS peak).

We have applied our large-scale parallel computing techniques to genome research, including a systematic search for human GPCR genes (Suwa and genome-level annotation of micro-organisms such as Aspergillus oryzae (Asai These tasks required extensive computation for parsing genome sequences with large hidden Markov models.

One of our major research highlights is the development of FORTE (Tomii, a system for predicting three-dimensional protein structures from amino acid sequences by applying profile-profile comparisons (Tomii and Akiyama: Bioinformatics, Vol.20, No.4, 2004). With this system, we participated in the Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP6) in 2004. The CBRC-3D team placed third among more than 200 prediction teams in the "FR/H (fold recognition)" category. We have also been participating in the CAPRI competition for protein-protein docking prediction. CBRC obtained top-level results among 37 teams in CAPRI7. A model selection scheme, similar to the one in CASP, was devised for docking (Hirokawa,
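As a rough illustration of what a profile-profile comparison involves, the following Python sketch aligns two sequence profiles by dynamic programming, scoring each pair of profile columns by their Pearson correlation. It is a minimal sketch under stated assumptions and not a description of how FORTE is implemented: the function names, the toy three-letter alphabet, the linear gap penalty of -0.5, and the choice of global alignment are all hypothetical and chosen only for this example.

```python
import math


def column_score(u, v):
    """Pearson correlation between two profile columns (illustrative scoring choice)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a flat column carries no signal
    return cov / (norm_u * norm_v)


def align_profiles(p, q, gap=-0.5):
    """Global alignment score of two profiles (lists of columns).

    The gap penalty is an assumed value, not taken from the source.
    """
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_score(p[i - 1], q[j - 1]),  # align columns
                dp[i - 1][j] + gap,  # gap in q
                dp[i][j - 1] + gap,  # gap in p
            )
    return dp[n][m]


# Toy profiles over a hypothetical three-letter alphabet.
query = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
same_fold = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
other_fold = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1]]

print(align_profiles(query, same_fold))
print(align_profiles(query, other_fold))
```

In a fold recognition setting, a score of this kind would be computed between the query profile and the profile of every template in a structure library, and the templates ranked by score; those independent comparisons are the part that parallelizes naturally across many processors.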

We have also developed the CoLBA (Comparative Ligand Binding Analysis) system for efficient virtual screening (Hirokawa, Using this scheme, even a theoretical (predicted) protein structure can serve as a practical docking target. We have obtained new drug lead compounds for important medical targets based on this technique.

We believe that bioinformatics techniques are maturing with the strong support of supercomputing, and that we will soon be able to build an automatic computational pipeline from genome sequence to protein structure and on to protein-protein or protein-compound docking.

<Biographical Notes>


Director, Computational Biology Research Center (CBRC), AIST


1984 B. Eng., Electrical Engineering, Keio University

1986 M. Eng., Electrical Engineering, Keio University

1990 Dr. Eng., Electrical Engineering, Keio University

Academic Career

1990-1992 Researcher, Electrotechnical Laboratory (ETL), AIST

- Neural networks and large-scale optimization problems

1992-1996 Associate Professor, Institute for Chemical Research, Kyoto University

- Nation-wide GenomeNet service for bioinformatics

- Parallel and vector processing for bioinformatics

1996-2000 Application Section Head, Real World Computing Partnership (RWCP)

- Bioinformatics on large-scale PC clusters and GRID

2000-2001 Senior Researcher, Electrotechnical Laboratory (ETL), AIST

- CBRC preparation team leader

2001- Director, Computational Biology Research Center (CBRC), AIST

Other work experiences include:

1999- Senior Vice President, NPO Initiative for Parallel Bioinformatics (IPAB)

2003- Visiting Professor, Keio University

2003- Visiting Professor, Tokyo Medical and Dental University

2005- Board Member, The Japanese Society for Artificial Intelligence (JSAI)

2006- Vice President, Japanese Society for Bioinformatics (JSBi)

Academic Awards

1988 Best Paper Award for Young Researcher, IPSJ National Convention

Research on parallel and distributed computing techniques for large-scale bioinformatics. Development of parallel application systems for protein tertiary structure prediction, protein molecular dynamics simulation, mass spectrometry analysis, single-cell analysis support, etc. This research and development is conducted at CBRC using large-scale computing resources such as the Magi cluster system (1040 processors, Pentium III), the AIST super cluster system (P32 subsystem; 2048 processors, Opteron), and the Blue Protein system (8192 processors, Blue Gene p440-based).