The Following scripts generate three different 3D networks of the RXR protein with Deoxythymidine triphosphate (dTTP) and Deoxyadenosine triphosphate (dATP) bound at allosteric sites. Each network untilizes the protein contact map and the PDB file for RXR. Note that each node in all networks represents a residue. In the first two networks the edges connecting nodes represent the probability of a "contact" or proximity between residues defined by a Euclidean distance.
The first network utilizes the Carbon alpha coordinates from the PDB file to postion the nodes. For this reason, the 3D structure of the first network precisely represents the molecular structure visualization in Chimera, or other such software.
The second network utilizes the layout function (layt()) in Plotly as opposed to the PDB coordinates. This however comes with a tradeoff that I have been unable to adjust for. On the one hand, the layt() in Plotly allows for a larger window for visually presenting the graph, i.e. you can see it better. On the other hand, the network only closely approximates the 3D molecular structure. It is close, but it is by no means precise.
The third network utilizes the layout function as well so the same tradeoffs apply. However, the edges in this network follow the folding pattern, or conformation, observed in the static molecular structure visualized from the PDB file.
Two specific objectives are intended:
Make explicit the work I have been doing and why it is being done.
The main utility of these graphs is to observe how network properties (such as betweenness centrality) are distributed in the 3D structure of the protein.
'dAT0_CA' is a Python list object that stores the Carbon alpha xyz-coordinates from the PDB file.
First, the text file 'dAT0_coords' is created using the AWK statement below. The subsequent file has 745 lines and three fields. Each line corresponding to residue coordinates.
The following Python script converts 'dAT0.coords' to the Python list object 'dAT0_CA' with 745 elements. Each element contains an xyz-coordinate for every residue in the order in which they appear in the PDB file.
-5.935, 44.528, 18.896-2.407, 43.973, 17.563 -0.512, 41.212, 15.724
'dAT0_group' is a Python list object that stores a map of node attributes.
First, the text file 'dAT0_group' is created using the series of AWK statements below. The subsequent file has 745 lines and two columns. Each line includes a residue ID and a numeric value to indicate whether the residue is involved in ligand binding (1) or not (0). This map of node attribute values will be used to color nodes in the subsequent networks.
The following Python script converts 'dAT0_group' to the Python list object 'group' containing 745 elements. Each element contains either a 1 or 0 for every residue (in the order in which they appear in the PDB file) depending on whether the residue is also in the list of residues determined to be involved in ligand binding at the allosteric sites. The list of residues generated in Chimera were based on a 5 Angstrom distance from the dTTP and dATP ligands.
'dAT0_labels' is a Python list object that stores a list of node IDs.
The following Python script converts the 'dAT0_labels' text file into a Python list object that stores the list of node IDs. Each ID contains the three letter residue ID followed by the residue position in the protein sequence
['MET_1', 'HIE_2', 'VAL_3',...,'TTP_744', 'DAT_745']
'dAT0_numlist' is a Python list object that stores the residue edge list as a set of numeric values.
The mean.con is easily converted into a network of nodes and edges. Each integer value represents a residue that appears as a node in the network. The mean value represents the weight of the edge (probability of 'contact'). The mean.con is essentially a symmetric matrix. Therefore, it is only necessary to use either the upper or lower matrix triangle. Additionally, the mean.con may be further trimmed by eliminating values where there is always a contact (weight = 1) or where there is no contact (weight = 0).
First, we trim the contact map as described above using the following AWK statement and save the result to 'dAT0_numlist'.
Second, the following Python script converts the 'dAT0_numlist' text file into a Python list object that stores the list of network edges derived from the mean.con file.
[1, 3], [1, 11], [1, 12], [1, 15], [1, 18]
Networkx reads the 'dAT0_numlist' into the interpreter and converts the edgelist into a networkx Graph object 'g'.
'Edges' is a Python list object that contains the 3D coordinates for each of the 4720 edges in the network.
Each element in this python list has eight components. There is a source node ID, a target node ID, and two xyz-coordinates. One for the source and for the target. These coordinates define the ends of the edges in the network.
This file is created in the following manner:
First, an index file is created by using the paste cmd in BASH to concatenate 'dAT0_labels' and 'dAT0_coords' to make a third file 'dAT0_coord_ids'.
MET_1 -5.935 44.528 18.896HIE_2 -2.407 43.973 17.563 VAL_3 -0.512 41.212 15.724
Next, a second index file is created using AWK cmd to parse the fourth and fifth columns of the dAT0 PDB file. This file is a temporary file; however, it is used together with 'dAT0_numlist' to create an edge or adjacency list composed of the three-letter identifiers for residues along with their sequence positions.
MET_1 VAL_3MET_1 GLU_11 MET_1 ARG_12
Second, Pandas data frames are used to match the node IDs to a set of coordinates so that the following Python list object is created:
'MET_1', 'VAL_3', -5.935, -0.512, 44.528, 41.211999999999996, 18.896, 15.724'MET_1', 'GLU_11', -5.935, 0.903, 44.528, 46.342, 18.896, 14.157 'MET_1', 'ARG_12', -5.935, -2.8489999999999998, 44.528, 46.971000000000004, 18.896, 14.697000000000001
Here Met_1 is the source node and Val_3 is the target. Met_1 has xyz-coordinates (-5.935, -0.512, 44.528) and Val_3 has xyz-coordinates (41.211999999999996, 18.896, 15.724).
The following scripts creates six Python lists. The lists Xn, Yn, and Zn hold the respective xyz-coordinate for each node, or residue. The lists Xe, Ye, and Ze contain the source and target coordinates for the respective xyz-coordinates.
The following code has been adapted from the https://plot.ly/python/3d-network-graph/ and is used to initialize and plot the layout of the network.
Examining the above network it becomes clear that the dAT0 RXR network reflects a strong approximation to the structure yielded by the Chimera (https://www.cgl.ucsf.edu/chimera/) molecular simulation. In fact, the above network can be rotated and using the image below one can visually identify the labelled residues by sight: GLN 10, ASN 109, GLY 267, ARG 379, and ASP 674.
Additionally, using Chimera functions the following image highlights all residues within five Angstroms of each of the ligands. A list of these residues was downloaded from Chimera and used to create the list of node attributes 'dAT0_group'. These are the residues highlighted in 'light blue'. The ligands are colored 'yellow' and 'red'. Again, on inspection one observes that the residue organization in the network very closely approximates that of the molecular simulation.
Originally, the following scripts were used to generate a network using the plotly layout function layt(). As mentioned previously, this does not yield the same precision as using the PDB xyz-coordinates. I show it here merely for comparison.
With a simple adjustment to the code above the edges that connect the nodes follow pattern of the protein conformation. In other words, the edges in the previous two networks reflect the 'mean 'contacts' and in this network the edges follow the protein folding pattern.