An introduction to docking of macromolecules.

Finding the optimal orientation of one macromolecule relative to another is technically challenging, and you are encouraged to read the recent literature to get a sense of how meaningful the results are. Some suggestions are listed at the end of this document. In later versions I will include some theory; for now I am going to jump straight into the operational details.

We are going to use a program called HEX.


Most docking programs use Protein Data Bank (PDB) files as input. The PDB is a macromolecular structure database containing X-ray and NMR data on thousands of macromolecules. "PDB" is also used to describe the file format in which these data are stored. If you plan to do docking, you need to become familiar with the format and contents of PDB files; you can view them with any simple text editor. Many programs read and write PDB format files, but it is important to remember that a file written by a molecular modeling program may not include all of the information contained in a native PDB file. These programs often write barebones files with no information on secondary structure, chain ID or residue name, consisting solely of element types and x, y, z coordinates. Docking programs may require much of the original information in the native PDB files, so be careful when using a PDB file from another source.
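PDB files use fixed column positions rather than delimiters, so reading one programmatically means slicing each line by column. The sketch below parses a single ATOM/HETATM record. The column positions follow the wwPDB format description. The `Atom` class and `parse_atom_line` function are hypothetical names used only for illustration. This is a minimal sketch and not a complete PDB reader: it ignores alternate locations, insertion codes, occupancy and charges.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str    # "ATOM" or "HETATM"
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "ILE" or "HOH"
    chain: str     # chain identifier, e.g. "E"; blank in many barebones files
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float
    element: str   # element symbol; blank if the file omits it


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed PDB column layout.

    The slices are 0-based versions of the 1-based columns in the PDB format
    description (record name 1-6, atom name 13-16, residue name 18-20,
    chain 22, residue number 23-26, x/y/z 31-54, element 77-78).
    """
    line = line.rstrip("\n").ljust(80)  # pad short lines so every slice exists
    return Atom(
        record=line[0:6].strip(),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),  # raises ValueError if the column is blank
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),
    )
```

A barebones file written by a modeling program will typically come back with empty `chain` or `res_name` fields, or will raise a ValueError if the residue-number column is blank. Either result is a quick way to spot the missing information described above.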

Even if you have a native PDB file, every docking exercise starts with editing the PDB files so that the input satisfies the requirements of the program. The most common adjustments are:

1. adding hydrogens,
2. removing extraneous waters or other molecules, commonly labelled HETATM in the PDB file,
3. adjusting pH-sensitive protonation states,
4. editing out disordered elements, and
5. selecting the monomer or biological unit relevant to the calculation.
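Before editing, it helps to check which of these adjustments a given file actually needs. The following minimal sketch reports the chain IDs, the number of hydrogens, the number of water residues and the HETATM content of a file. The function name `summarize_pdb` is hypothetical. The sketch assumes the element symbol is filled in at columns 77-78, as it is in files downloaded from the PDB. It does not detect disordered regions or protonation problems.

```python
from collections import Counter


def summarize_pdb(lines):
    """Summarize the items that docking preparation usually has to deal with.

    Assumes the element symbol is present in columns 77-78; without it,
    hydrogens are not counted.
    """
    chains = set()           # chain IDs that carry ATOM records
    n_hydrogens = 0          # atoms whose element column is "H"
    het_atoms = Counter()    # HETATM atom count per residue name
    waters = set()           # distinct water residues (chain, residue number)
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        padded = line.rstrip("\n").ljust(80)
        if padded[76:78].strip().upper() == "H":
            n_hydrogens += 1
        res_name = padded[17:20].strip()
        if record == "ATOM":
            chains.add(padded[21])
        else:
            het_atoms[res_name] += 1
            if res_name == "HOH":
                waters.add((padded[21], padded[22:27]))
    return {
        "chains": sorted(chains),
        "hydrogens": n_hydrogens,
        "waters": len(waters),
        "het_atoms": dict(het_atoms),
    }
```

For example, passing the lines of 2PTC.pdb (an open file object works, since iterating it yields lines) answers the chain and hydrogen questions in the exercise.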

The docking interactions are scored on shape and electrostatics. If you don't have a reasonable molecular input, the results will be meaningless.

Hex cannot make these PDB adjustments for you. I recommend Maestro for the protein preparation, or you can use UCSF Chimera to add hydrogens. To use Chimera, type chimera, then use File/Open to read the original PDB file, Tools/Utilities/addH to add hydrogens, and File/SavePDB to save a new PDB file. Chimera leaves in the non-protein moieties, so open your new PDB file in a simple text editor such as gedit, go to the bottom of the file and cut out the section of waters and other non-biomolecule entities. These are usually at the end of the PDB file and are labelled HETATM.
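If you prefer a script to hand-editing in gedit, removing the HETATM records is a simple line filter. This sketch assumes you want to discard every HETATM record. If a bound ion or cofactor matters for your system, you would need to keep it explicitly. The sketch also drops CONECT records, because in a native PDB file these largely describe bonds involving HETATM atoms. The function names `strip_hetatm` and `strip_hetatm_file` are hypothetical.

```python
def strip_hetatm(lines):
    """Drop HETATM records (waters, ions, other non-polymer groups).

    CONECT records are dropped too, since they largely list bonds involving
    HETATM atoms that no longer exist after the edit.
    """
    removed = ("HETATM", "CONECT")
    return [line for line in lines if line[0:6].strip() not in removed]


def strip_hetatm_file(src, dst):
    """Read `src`, remove HETATM/CONECT records and write the result to `dst`."""
    with open(src) as fh:
        kept = strip_hetatm(fh.readlines())
    with open(dst, "w") as fh:
        fh.writelines(kept)
```

For example, `strip_hetatm_file("2PTC.pdb", "nohet.pdb")` would produce a nohet.pdb file like the one used in the exercise.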

Exercise: Download 2PTC.pdb from the PDB and open it with gedit. How many chains are there? Are there any hydrogens? Remove the HETATM section and save the file as nohet.pdb (or a similar name). Start Chimera and open nohet.pdb. Add hydrogens and then save the structure as complex.pdb. Copy complex.pdb into two new files named e.pdb and i.pdb like this:

cp complex.pdb e.pdb

cp complex.pdb i.pdb

Use gedit on each of these two new files. Delete the portions in e.pdb that refer to chain I and delete the portions in i.pdb that refer to chain E.
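The copy-and-delete steps above can also be scripted by filtering on the chain ID in column 22. This is a minimal sketch with hypothetical function names. The default file names and the chain IDs E and I come from the 2PTC exercise. Header records that mention chains (SEQRES, HELIX, SSBOND and so on) are copied to both files unchanged. That is an assumption you may want to revisit if your docking program reads those records.

```python
def select_chains(lines, chains):
    """Keep coordinate records whose chain ID is in `chains`.

    ATOM, HETATM, ANISOU and TER records carry the chain ID in column 22;
    records with another chain ID are dropped. Records with a blank chain
    column, and all other records (HEADER, SEQRES, END, ...), are passed
    through unchanged.
    """
    chain_records = ("ATOM", "HETATM", "ANISOU", "TER")
    out = []
    for line in lines:
        if line[0:6].strip() in chain_records:
            chain = line[21] if len(line) > 21 else " "
            if chain.strip() and chain not in chains:
                continue
        out.append(line)
    return out


def split_complex(src, receptor_out="e.pdb", ligand_out="i.pdb",
                  receptor_chain="E", ligand_chain="I"):
    """Write the receptor and ligand chains of `src` to two separate files."""
    with open(src) as fh:
        lines = fh.readlines()
    for chain, dst in ((receptor_chain, receptor_out), (ligand_chain, ligand_out)):
        with open(dst, "w") as fh:
            fh.writelines(select_chains(lines, {chain}))
```

With these defaults, `split_complex("complex.pdb")` would write e.pdb and i.pdb to the current directory.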

Start HEX (type “hex”). File/Open/Receptor and choose e.pdb. File/Open/Ligand and choose i.pdb. File/Open/Complex and choose complex.pdb.

Two handy icons are on the right-hand side. The 2nd from the bottom hides side chains in the display (they are still included in the calculation). The 3rd from the bottom displays the secondary structure as ribbons.

Now select Controls/DockingControls. Most of the settings we use are the defaults, except that the Correlation Type should be Shape+Electrostatics and the Final Search slider should be 30. Before you press “Activate”, be aware that the calculation will take about 15 minutes (my last test took 13 minutes). It is very CPU- and memory-intensive, so expect the system to respond very slowly to other commands during the run. When you are ready, press “Activate”. Now is a good time to get up and walk away for a few minutes.

After the run is finished, select File/Save/Range, change the last orientation to 500 and click OK. This saves each hit in the range as a separate PDB file in the current directory. Select File/Save/Summary and specify a file name (output.sum), then select File/Save/Docking and specify a file name (output.hex). These two files contain data about the hits. Use a terminal window to look at the contents of the output.hex file:

more output.hex

or

gedit output.hex

Look for a hit where the ReferenceRMS is small. In my test run, this was hit number 9.

grep ReferenceRMS output.hex

Now go back to the Hex interface. Use the Soln slider near the bottom of the Docking Controls window to step through the solutions. You can see that the hit with the small ReferenceRMS overlaps the original complex.
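ReferenceRMS indicates how far a docked ligand lies from its position in the known complex. The sketch below computes a comparable number from two PDB files. It matches atoms by chain ID, residue number and atom name, then takes the root-mean-square distance without any superposition, because the docked ligand and the reference are already in the receptor's frame. This is an illustration only and not Hex's own definition of ReferenceRMS, which may use a different atom selection. It also assumes the saved hit files keep the original chain IDs, residue numbers and atom names. The function names are hypothetical.

```python
import math


def read_coords(lines, ca_only=False):
    """Map (chain, residue number, atom name) -> (x, y, z) for ATOM records."""
    coords = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if ca_only and name != "CA":
            continue
        key = (line[21], line[22:27].strip(), name)
        coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def ligand_rmsd(pose_lines, reference_lines, ca_only=False):
    """RMSD over atoms present in both files, computed in place (no fitting).

    Atoms that appear in only one file (e.g. receptor atoms in a hit file
    compared against i.pdb) are ignored.
    """
    pose = read_coords(pose_lines, ca_only)
    ref = read_coords(reference_lines, ca_only)
    common = pose.keys() & ref.keys()
    if not common:
        raise ValueError("no matching atoms between pose and reference")
    total = sum(math.dist(pose[k], ref[k]) ** 2 for k in common)
    return math.sqrt(total / len(common))
```

Comparing one of the saved hit files with i.pdb should give a small value for the hit that overlaps the original complex. Using `ca_only=True` sidesteps differences in hydrogen naming between programs.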

There does not seem to be a way to read the results back into Hex to view them with the Docking Controls Soln slider. However, any molecular modeling program should be able to read the PDB output files for further evaluation. The top 20 hits are the best candidates for consideration.
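As one example of further evaluation, you could list the residues at the interface of a hit and compare them with the interface of the known complex. Here an interface residue is one that has an atom within a cutoff distance of the other chain. The sketch below is a brute-force version that checks every atom pair, which is simple but slow for large complexes. It assumes each hit file contains both chains, labelled E and I as in the input files. The 4.0 Å cutoff and the function names are illustrative choices, not Hex settings.

```python
import math
from itertools import product


def atoms_by_chain(lines):
    """Group ATOM coordinates by chain ID, keeping each atom's residue."""
    groups = {}
    for line in lines:
        if line.startswith("ATOM"):
            residue = (line[21], line[17:20].strip(), line[22:27].strip())
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            groups.setdefault(line[21], []).append((residue, xyz))
    return groups


def interface_residues(lines, chain_a="E", chain_b="I", cutoff=4.0):
    """Return sorted (chain, residue name, residue number) tuples for residues
    with any atom within `cutoff` angstroms of an atom in the other chain."""
    groups = atoms_by_chain(lines)
    hits = set()
    pairs = product(groups.get(chain_a, []), groups.get(chain_b, []))
    for (res_a, xyz_a), (res_b, xyz_b) in pairs:
        if math.dist(xyz_a, xyz_b) <= cutoff:
            hits.add(res_a)
            hits.add(res_b)
    return sorted(hits)
```

Running it on complex.pdb and on a candidate hit file shows how many interface residues the two share, which is a rough complement to the RMS value.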

You might also consider other runs with different settings. The author of Hex has suggested that the electrostatic portion of the algorithm may overwhelm the results. As an alternative choice of options, he suggests leaving out electrostatics and adding a molecular mechanics minimization.

When I did a second run with these settings, the known structure was the number 1 hit.

References: Some useful papers and web sites:

PDB home page

Hex home page

Chimera home page

High Order Analytic Translation Matrix Elements For Real Space Six-Dimensional Polar Fourier Correlations, D.W. Ritchie (2005) J. Appl. Cryst. 38, 808-818.

Docking Essential Dynamics Eigenstructures, D. Mustard and D.W. Ritchie (2005) PROTEINS: Struct. Funct. Bioinf. 60(2), 269-274.

Evaluation of Protein Docking Predictions Using Hex 3.1 in CAPRI Rounds 1 and 2, D.W. Ritchie (2003) PROTEINS: Struct. Funct. Genet. 52(1), 98-106.

Protein-protein docking: is the glass half full or half empty?, S. Vajda and C.J. Camacho (2004) Trends in Biotechnology 22, 110-116.

ClusPro: a fully automated algorithm for protein-protein docking, S.R. Comeau, D.W. Gatchell, S. Vajda and C.J. Camacho (2004) Nucleic Acids Research 32, 96-99.

Principles of docking: an overview of search algorithms and a guide to scoring functions, I. Halperin, B. Ma, H. Wolfson and R. Nussinov (2002) Proteins 47, 409-443.