Caffeine Proteomics Public Data


This notebook contains data processing for 4 conditions in the proteomics data from Schmidt et al. 2016.

Benjamín J. Sánchez, 2020-01-22

1. Loading Data


2. Unit Conversion


First of all, note that the variation values come as coefficients of variation (%), so let's transform them to the same units as the mean values (molecules/cell):
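As a minimal sketch of this step, a coefficient of variation in percent becomes a standard deviation by multiplying it by the mean and dividing by 100. The table, column names and values below are hypothetical placeholders, not the actual layout of the Schmidt et al. dataset.

```python
import pandas as pd

# Hypothetical toy table: protein IDs, column names and values are illustrative only.
data = pd.DataFrame(
    {
        "mean_glucose": [1000.0, 250.0],  # molecules/cell
        "cv_glucose": [10.0, 20.0],       # coefficient of variation, in %
    },
    index=["protein_A", "protein_B"],
)


def cv_to_std(mean, cv_percent):
    """Convert a coefficient of variation (%) into a standard deviation
    expressed in the same units as the mean."""
    return mean * cv_percent / 100.0


# Standard deviation in molecules/cell, same units as the mean column.
data["std_glucose"] = cv_to_std(data["mean_glucose"], data["cv_glucose"])
print(data)
```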

Now everything is in molecules/cell. Since we need the abundances in mmol/gDW, we convert them in two steps:

1. Abundance [mmol/cell] = Abundance [molecules/cell] / Na [molecules/mol] * 1000 [mmol/mol]
2. Abundance [mmol/gDW] = Abundance [mmol/cell] / ( cell volume [fL/cell] * cell density [g/fL] * dry content [gDW/g] )

Where Na is Avogadro's number = 6.022e+23. Cell volumes for all conditions are available in Volkmer et al. 2011.
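The two steps above can be sketched as a single function. The cell density and dry content defaults below are assumed placeholder values chosen for illustration (they are not taken from this notebook), and the function name is hypothetical.

```python
AVOGADRO = 6.022e23  # molecules/mol, as stated in the text

# Assumed placeholder values, for illustration only:
CELL_DENSITY = 1.1e-12  # g/fL (equivalent to 1.1 g/mL)
DRY_CONTENT = 0.3       # gDW/g


def molecules_per_cell_to_mmol_per_gdw(
    abundance, cell_volume_fl, density=CELL_DENSITY, dry_content=DRY_CONTENT
):
    """Convert an abundance from molecules/cell to mmol/gDW."""
    # Step 1: molecules/cell -> mmol/cell
    mmol_per_cell = abundance / AVOGADRO * 1000
    # Step 2: divide by the dry weight of one cell (gDW/cell)
    gdw_per_cell = cell_volume_fl * density * dry_content
    return mmol_per_cell / gdw_per_cell


# Example with a hypothetical protein at 1000 molecules/cell in a 3 fL cell.
print(molecules_per_cell_to_mmol_per_gdw(1000.0, cell_volume_fl=3.0))
```

Because the conversion is a linear scaling, the same function can be applied to the standard deviations, as long as the cell volume itself is treated as exact.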

TODO: Cell volume measurements in that reference are quite variable (Table 1), so we could account for that variability in the uncertainty.

Additional assumptions:

TODO: How much do these assumptions affect the final simulation results?

3. Data Validation


Before this study, the assumption had been that the average E. coli cell weighs 1 pgDW. Let's see how close we are to that by using the new formalism:

We see the new values are slightly smaller, but in the same order of magnitude.
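A comparison of this kind can be sketched as follows: the dry weight of one cell is its volume times the cell density times the dry content. The cell volumes, density and dry content below are hypothetical placeholders, so the printed numbers only illustrate the calculation and are not the notebook's results.

```python
# Assumed placeholder values, for illustration only:
CELL_DENSITY = 1.1e-12  # g/fL (equivalent to 1.1 g/mL)
DRY_CONTENT = 0.3       # gDW/g

# Hypothetical cell volumes in fL/cell; the real ones come from Volkmer et al. 2011.
cell_volumes = {"condition_1": 2.0, "condition_2": 3.0}


def cell_dry_weight_pg(volume_fl, density=CELL_DENSITY, dry_content=DRY_CONTENT):
    """Return the dry weight of a single cell in pgDW."""
    grams_dw = volume_fl * density * dry_content
    return grams_dw * 1e12  # g -> pg


for condition, volume in cell_volumes.items():
    print(f"{condition}: {cell_dry_weight_pg(volume):.2f} pgDW (reference assumption: 1 pgDW)")
```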

Finally, let's check how much protein in total we are adding to the models. For that we need the molecular weight (g/mmol) of each protein:

We see that:

  • All means add up to reasonable fractions in g(prot)/gDW (roughly half of the protein content in E. coli).
  • The average uncertainty for all 4 conditions is below 10%.
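The total-protein check described above can be sketched as follows: multiplying each abundance (mmol/gDW) by its molecular weight (g/mmol) and summing gives g(prot)/gDW. The table below is a hypothetical two-protein example, and averaging the per-protein relative uncertainty is only one possible way to summarize it, assumed here for illustration.

```python
import pandas as pd

# Hypothetical toy data for one condition; values are illustrative only.
proteins = pd.DataFrame(
    {
        "mean": [2.0e-6, 5.0e-6],  # mmol/gDW
        "std": [2.0e-7, 2.5e-7],   # mmol/gDW
        "mw": [50.0, 20.0],        # g/mmol (numerically equal to kDa)
    },
    index=["protein_A", "protein_B"],
)

# Total protein mass fraction covered by the measurements, in g(prot)/gDW.
total_mass_fraction = (proteins["mean"] * proteins["mw"]).sum()

# Average relative uncertainty across proteins, in % (one possible summary).
mean_rel_uncertainty = (proteins["std"] / proteins["mean"]).mean() * 100

print(f"Total protein: {total_mass_fraction:.2e} g/gDW")
print(f"Average uncertainty: {mean_rel_uncertainty:.1f} %")
```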

TODO: Consider instead rescaling the data so that it adds up to E. coli's protein content, corrected by an estimated measured fraction.

4. Data Export