Protein Data Bank (PDB)


What is the Protein Data Bank (PDB)?
  • The Protein Data Bank (PDB) is a repository for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids.
  • The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, are freely accessible on the Internet via the websites of its member organisations (PDBe, PDBj, and RCSB). The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.
  • Structures are available for many of the proteins and nucleic acids involved in the central processes of life, so you can go to the PDB archive to find structures for ribosomes, oncogenes, drug targets, and even whole viruses.
The PDB archive can be browsed at the link below:

http://www.rcsb.org/pdb/home/home.do


Below are examples of proteins and their descriptions:

Name of protein    Molecule                               Polymer  Chains      Length
Amylase            ALPHA-1,4-GLUCAN-4-GLUCANOHYDROLASE    1        A           425
Trypsin            BETA-TRYPSIN                           1        E           223
Pepsin             PEPSIN                                 1        A           326
HTRA               Putative serine protease               1        A           134
Carboxypeptidase   CARBOXYPEPTIDASE A                     1        A, B, D, E  307
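
The Chains and Length columns correspond to information that a PDB-format file carries in its SEQRES records. The following is a minimal sketch, assuming the classic fixed-column PDB text format (chain identifier in column 12, residue count in columns 14 to 17). The helper names chain_lengths and seqres_line are hypothetical, and the sample records are invented for illustration; they are not taken from any of the entries listed here.

```python
def chain_lengths(pdb_text):
    """Map each chain identifier to the residue count stated in its SEQRES records."""
    lengths = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]                   # column 12
            lengths[chain_id] = int(line[13:17])  # columns 14-17
    return lengths


def seqres_line(serial, chain_id, num_res, residues):
    """Build one SEQRES record with the fixed column layout assumed above."""
    return f"SEQRES {serial:>3} {chain_id} {num_res:>4}  {' '.join(residues)}"


# Invented two-chain example, for illustration only.
sample = "\n".join([
    seqres_line(1, "A", 5, ["GLY", "ILE", "VAL", "GLU", "GLN"]),
    seqres_line(1, "B", 3, ["PHE", "VAL", "ASN"]),
])

print(chain_lengths(sample))  # {'A': 5, 'B': 3}
```

Reading the count from the record header avoids having to tally residue names across continuation lines.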

S.M.I.L.E.S. For Dummies

INTRODUCTION

SMILES stands for Simplified Molecular Input Line Entry System. It is a specification in the form of a line notation that describes the structure of chemical molecules using short ASCII strings.

SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.


GRAPH-BASED DEFINITION

In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
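
The procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only, assuming a connected graph whose hydrogens have already been removed, atoms from the organic subset, and no aromaticity or stereochemistry. The function name write_smiles and the input layout (a list of element symbols plus a dictionary of bond orders) are assumptions made for this example.

```python
from itertools import count

BOND_SYMBOL = {1: "", 2: "=", 3: "#", 4: "$"}


def write_smiles(atoms, bonds, start=0):
    """Write a SMILES string by depth-first traversal of a hydrogen-free graph."""
    neighbours = {i: [] for i in range(len(atoms))}
    order = {}
    for (a, b), bond_order in bonds.items():
        neighbours[a].append(b)
        neighbours[b].append(a)
        order[frozenset((a, b))] = bond_order

    children = {i: [] for i in neighbours}   # spanning-tree edges
    closures = {i: [] for i in neighbours}   # ring-closure labels per atom
    visited, used_edges = set(), set()
    labels = count(1)

    def explore(a):
        visited.add(a)
        for b in neighbours[a]:
            edge = frozenset((a, b))
            if edge in used_edges:
                continue
            used_edges.add(edge)
            if b not in visited:
                children[a].append(b)        # tree edge
                explore(b)
            else:
                # The edge closes a cycle: break it and label both ends.
                n = next(labels)
                label = str(n) if n < 10 else f"%{n}"
                closures[b].append(BOND_SYMBOL[order[edge]] + label)
                closures[a].append(label)

    def emit(a):
        text = atoms[a] + "".join(closures[a])
        for i, b in enumerate(children[a]):
            piece = BOND_SYMBOL[order[frozenset((a, b))]] + emit(b)
            last = i == len(children[a]) - 1
            text += piece if last else f"({piece})"  # branches go in parentheses
        return text

    explore(start)
    return emit(start)


ring = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (5, 0): 1}
print(write_smiles(["C"] * 6, ring))  # C1CCCCC1
acid = {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1}
print(write_smiles(["C", "C", "C", "O", "O"], acid))  # CCC(=O)O
```

The first pass decides which bonds belong to the spanning tree and which become ring closures; the second pass prints the tree, wrapping every child except the last in parentheses.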


EXAMPLES

Atoms: Atoms are represented by the standard abbreviation of the chemical element in square brackets, for example [Au] for gold. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I; all other elements must be enclosed in brackets. When the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance, the SMILES for water is simply O.

An atom holding one or more electrical charges is written in brackets, with the charge shown by '+' for a positive charge or '-' for a negative charge. Thus, the hydroxide anion is represented by [OH-], the oxonium cation is [OH3+] and the cobalt(III) cation (Co3+) is either [Co+3] or [Co+++].
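
The bracket and charge rules can be sketched as a small formatting helper. This is an illustrative sketch rather than a full writer: the function name atom_token and its parameters are assumptions for this example, and it always uses the numeric form for multiple charges ([Co+3] rather than [Co+++]).

```python
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}


def atom_token(symbol, hcount=None, charge=0):
    """Return the SMILES token for one atom.

    hcount=None means "leave hydrogens implicit", which is only possible
    for an uncharged atom of the organic subset. Inside brackets the
    hydrogen count has to be written out, so None is treated as zero there.
    """
    if symbol in ORGANIC_SUBSET and hcount is None and charge == 0:
        return symbol
    if not hcount:
        hydrogens = ""
    elif hcount == 1:
        hydrogens = "H"
    else:
        hydrogens = f"H{hcount}"
    if charge == 0:
        charge_text = ""
    else:
        sign = "+" if charge > 0 else "-"
        charge_text = sign if abs(charge) == 1 else f"{sign}{abs(charge)}"
    return f"[{symbol}{hydrogens}{charge_text}]"


print(atom_token("O"))                       # O
print(atom_token("Au"))                      # [Au]
print(atom_token("O", hcount=1, charge=-1))  # [OH-]
print(atom_token("O", hcount=3, charge=1))   # [OH3+]
print(atom_token("Co", charge=3))            # [Co+3]
```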

Bonds: Bonds between aliphatic atoms are assumed to be single unless specified otherwise and are implied by adjacency in the SMILES string. For example the SMILES for ethanol can be written as CCO. Ring closure labels are used to indicate connectivity between non-adjacent atoms in the SMILES string, which for cyclohexane and dioxane can be written as C1CCCCC1 and O1CCOCC1 respectively.
For a second ring, the label is 2, and so on; naphthalene, for example, is c1cccc2c1cccc2 (note the lower case used for aromatic atoms).

After reaching 9, the label must be preceded by a '%' in order to differentiate it from two different labels attached to the same atom (C12 means that the carbon atom carries the ring closure labels 1 and 2, whereas C%12 indicates a single label, 12). Double, triple, and quadruple bonds are represented by the symbols '=', '#', and '$' respectively, as illustrated by the SMILES O=C=O (carbon dioxide), C#N (hydrogen cyanide) and [Ga-]$[As+] (gallium arsenide).
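
The difference between C12 and C%12 is easiest to see when the label text that follows an atom is split into individual labels. This is a minimal sketch, assuming that a '%' is always followed by exactly two digits; the function name ring_closure_labels is hypothetical.

```python
def ring_closure_labels(text):
    """Split the ring-closure part that follows an atom into integer labels."""
    labels = []
    i = 0
    while i < len(text):
        if text[i] == "%":
            two_digits = text[i + 1:i + 3]
            if len(two_digits) != 2 or not two_digits.isdigit():
                raise ValueError(f"'%' must be followed by two digits: {text!r}")
            labels.append(int(two_digits))
            i += 3
        elif text[i].isdigit():
            labels.append(int(text[i]))  # a bare digit is always one label
            i += 1
        else:
            raise ValueError(f"unexpected character {text[i]!r} in {text!r}")
    return labels


print(ring_closure_labels("12"))    # [1, 2]
print(ring_closure_labels("%12"))   # [12]
print(ring_closure_labels("1%12"))  # [1, 12]
```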

Aromatic: Aromatic C, O, S and N atoms are written in lower case as 'c', 'o', 's' and 'n' respectively. Benzene, pyridine and furan can be represented respectively by the SMILES c1ccccc1, n1ccccc1 and o1cccc1. Bonds between aromatic atoms are, by default, aromatic, although these can be specified explicitly using the ':' symbol. Aromatic atoms can be singly bonded to each other, and biphenyl can be represented by c1ccccc1-c2ccccc2. Aromatic nitrogen bonded to hydrogen, as found in pyrrole, must be represented as [nH], and imidazole is written in SMILES notation as n1c[nH]cc1.

Figure: Visualization of 3-cyanoanisole as COc(c1)cccc1C#N

Branching: Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform. Substituted rings can be written with the branching point in the ring, as illustrated by the SMILES COc(c1)cccc1C#N and COc(cc1)ccc1C#N, which encode the 3- and 4-cyanoanisole isomers respectively.
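
To see why moving the ring-closure digit inside the branch changes the isomer, the two strings can be turned back into graphs and the ring distance between the two substituted carbons measured. This is a deliberately small sketch: it assumes only organic-subset atoms, the bond symbols '-', '=' and '#', parentheses, and single-digit ring closures. The names parse and ring_distance are hypothetical.

```python
import re
from collections import deque

TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[()]|[-=#]|\d")


def parse(smiles):
    """Return (atoms, adjacency lists) for a small subset of SMILES."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES feature in {smiles!r}")
    atoms, adjacency = [], []
    branch_points = []  # atom to return to at each closing parenthesis
    open_rings = {}     # ring label -> atom that opened it
    previous = None
    for token in tokens:
        if token == "(":
            branch_points.append(previous)
        elif token == ")":
            previous = branch_points.pop()
        elif token in "-=#":
            continue    # bond order is not needed for connectivity
        elif token.isdigit():
            if token in open_rings:
                other = open_rings.pop(token)
                adjacency[previous].append(other)
                adjacency[other].append(previous)
            else:
                open_rings[token] = previous
        else:
            atoms.append(token)
            adjacency.append([])
            index = len(atoms) - 1
            if previous is not None:
                adjacency[previous].append(index)
                adjacency[index].append(previous)
            previous = index
    return atoms, adjacency


def ring_distance(smiles):
    """Bonds between the first and last substituted aromatic atoms, along the ring."""
    atoms, adjacency = parse(smiles)
    substituted = [i for i, a in enumerate(atoms)
                   if a.islower() and any(atoms[j].isupper() for j in adjacency[i])]
    start, goal = substituted[0], substituted[-1]
    distance = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search restricted to aromatic atoms
        a = queue.popleft()
        for b in adjacency[a]:
            if atoms[b].islower() and b not in distance:
                distance[b] = distance[a] + 1
                queue.append(b)
    return distance[goal]


print(ring_distance("COc(c1)cccc1C#N"))   # 2
print(ring_distance("COc(cc1)ccc1C#N"))   # 3
```

Adding 1 to each distance gives the ring position of the nitrile group relative to the methoxy group, which matches the 3- and 4- prefixes in the names.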

Stereochemistry: Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F is one representation of trans-difluoroethene, in which the fluorine atoms are on opposite sides of the double bond, whereas F/C=C\F is one possible representation of cis-difluoroethene, in which the fluorine atoms are on the same side of the double bond.
Configuration at tetrahedral carbon is specified by @ or @@. L-Alanine, the more common enantiomer of the amino acid alanine, can be written as N[C@@H](C)C(=O)O. The @@ specifier indicates that, when viewed from nitrogen along the bond to the chiral center, the sequence of substituents hydrogen (H), methyl (C) and carboxylate (C(=O)O) appears clockwise. D-Alanine can be written as N[C@H](C)C(=O)O. The order of the substituents in the SMILES string matters: D-alanine can also be encoded as N[C@@H](C(=O)O)C.
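
The reason both D-alanine strings describe the same molecule is that swapping two substituents of a tetrahedral center inverts the clockwise or anticlockwise sense, so @ and @@ trade places. This is a minimal sketch of that parity rule; the substituent labels ("N", "H", "CH3", "COOH") and the function names are hypothetical and stand in for the neighbors in the order they appear in each string.

```python
def permutation_is_odd(first, second):
    """True if turning the order `first` into `second` needs an odd number of swaps."""
    positions = [first.index(item) for item in second]
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    return inversions % 2 == 1


def same_configuration(order_a, tag_a, order_b, tag_b):
    """Compare two tetrahedral descriptions given as (neighbor order, '@' or '@@')."""
    flipped = permutation_is_odd(order_a, order_b)
    # Even permutation: the tags must match. Odd permutation: they must differ.
    return (tag_a == tag_b) != flipped


d_ala_1 = (["N", "H", "CH3", "COOH"], "@")    # N[C@H](C)C(=O)O
d_ala_2 = (["N", "H", "COOH", "CH3"], "@@")   # N[C@@H](C(=O)O)C
l_ala = (["N", "H", "CH3", "COOH"], "@@")     # N[C@@H](C)C(=O)O

print(same_configuration(*d_ala_1, *d_ala_2))  # True
print(same_configuration(*d_ala_1, *l_ala))    # False
```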

Isotopes: Isotopes are specified with a number equal to the integer isotopic mass preceding the atomic symbol. Benzene in which one atom is carbon-14 is written as [14c]1ccccc1 and deuterochloroform is [2H]C(Cl)(Cl)Cl.
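
Isotope, hydrogen count and charge all live inside the same bracket, so one pattern can pull them apart. This is a sketch under stated assumptions: the regular expression covers only the fields discussed in this text (isotope, symbol, @/@@, H count, charge), and the function name parse_bracket_atom is hypothetical.

```python
import re

BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|se|as|[bcnops])"
    r"(?P<chiral>@{1,2})?"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>\++\d*|-+\d*)?\]"
)


def parse_bracket_atom(token):
    """Split a bracket atom such as [14c] or [OH3+] into its fields."""
    match = BRACKET_ATOM.fullmatch(token)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {token!r}")
    hcount_text = match.group("hcount") or ""
    charge_text = match.group("charge") or ""
    if not hcount_text:
        hcount = 0
    else:
        hcount = int(hcount_text[1:] or 1)   # "H" alone means one hydrogen
    if not charge_text:
        charge = 0
    else:
        sign = 1 if charge_text[0] == "+" else -1
        digits = charge_text.lstrip("+-")
        # "+3" carries a number; "+++" is counted sign by sign.
        charge = sign * (int(digits) if digits else len(charge_text))
    return {
        "isotope": int(match.group("isotope")) if match.group("isotope") else None,
        "symbol": match.group("symbol"),
        "chiral": match.group("chiral") or "",
        "hcount": hcount,
        "charge": charge,
    }


print(parse_bracket_atom("[14c]")["isotope"])   # 14
print(parse_bracket_atom("[2H]")["symbol"])     # H
print(parse_bracket_atom("[Co+++]")["charge"])  # 3
```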



APPLICATION TO SOME MOLECULES

Molecule                            Structure    SMILES Formula
Dinitrogen                          N≡N          N#N
Methyl isocyanate (MIC)             CH3–N=C=O    CN=C=O
Copper(II) sulfate                  Cu2+ SO42-   [Cu+2].[O-]S(=O)(=O)[O-]
Oenanthotoxin (C17H22O2)            (image)      CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO
Pyrethrin II (C22H28O5)             (image)      COC(=O)C(\C)=C\C1C(C)(C)[C@H]1C(=O)O[C@@H]2C(C)=C(C(=O)C2)CC=CC=C
Aflatoxin B1 (C17H12O6)             (image)      O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
Glucose (glucopyranose) (C6H12O6)   (image)      OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1
Thiamin (vitamin B1) (C12H17N4OS+)  (image)      OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2
Vanillin                            (image)      O=Cc1ccc(O)c(OC)c1
Melatonin (C13H16N2O2)              (image)      CC(=O)NCCC1=CNc2c1cc(OC)cc2
Nicotine (C10H14N2)                 (image)      CN1CCC[C@H]1c2cccnc2
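
Because hydrogens are usually implicit, a quick consistency check between a SMILES string and a molecular formula is to count only the non-hydrogen atoms. This is a rough sketch, assuming that a regular expression is enough to pick out atom symbols (it ignores bonds, digits and stereo marks and does not validate the string). The function name heavy_atom_counts is hypothetical.

```python
import re
from collections import Counter

# Bracket atoms first, then two-letter halogens, then one-letter symbols.
ATOM = re.compile(
    r"\[\d*([A-Z][a-z]?|se|as|[bcnops])[^\]]*\]"
    r"|(Cl|Br|[BCNOPSFI]|[bcnops])"
)


def heavy_atom_counts(smiles):
    """Count the non-hydrogen atoms written in a SMILES string, per element."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        symbol = (match.group(1) or match.group(2)).capitalize()
        if symbol != "H":   # skip explicit hydrogen atoms such as [2H]
            counts[symbol] += 1
    return dict(counts)


print(heavy_atom_counts("CN1CCC[C@H]1c2cccnc2"))          # {'C': 10, 'N': 2}
print(heavy_atom_counts("CC(=O)NCCC1=CNc2c1cc(OC)cc2"))   # {'C': 13, 'O': 2, 'N': 2}
print(heavy_atom_counts("[Cu+2].[O-]S(=O)(=O)[O-]"))      # {'Cu': 1, 'O': 4, 'S': 1}
```

The counts for nicotine and melatonin agree with the carbon, nitrogen and oxygen numbers in the formulas C10H14N2 and C13H16N2O2 given in the table.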