Epilepsy Comorbidity Analysis using SCAIView


This notebook quantifies the gene overlap between epilepsy and other disorders using text mining.


Authors: Daniel Domingo-Fernández and Charles Tapley Hoyt


The following sets of queries were used in this analysis:

Reference queries:

Queries used for calculating pleiotropy rates:

The queries were run using SCAIView version 1.7.3, corresponding to the indexing of MEDLINE on 2016-07-14T13:50:07.797575Z.

*Note that the reference queries might take some time, since thousands of articles need to be analyzed.

Notebook results

| Disease | Reference query | Number of documents | Genes associated with the disease | Gene set size | Normalized pleiotropy rate (%) |
|---|---|---|---|---|---|
| Alzheimer's disease | [MeSH Disease:"Alzheimer Disease"] | 109495 | 4968 | 396 | 13.65 |
| Migraine | [MeSH Disease:"Migraine Disorders"] | 30928 | 1230 | 306 | 10.54 |
| Parkinson's disease | [MeSH Disease:"Parkinson Disease"] | 79103 | 3646 | 258 | 8.89 |
| Hypertension | [MeSH Disease:"Hypertension"] | 391190 | 5574 | 252 | 8.68 |
| Dementia | [MeSH Disease:"Dementia"] | 183802 | 5833 | 220 | 7.58 |
| Diabetes | [MeSH Disease:"Diabetes Mellitus"] | 394411 | 6661 | 184 | 6.34 |
| Anxiety | [MeSH Disease:"Anxiety Disorders"] | 84138 | 1782 | 124 | 4.27 |
| Arthritis | [MeSH Disease:"Arthritis"] | 259327 | 5367 | 122 | 4.20 |
| Cataracts | [MeSH Disease:"Cataract"] | 52150 | 2238 | 119 | 4.10 |
| Colon cancer | [MeSH Disease:"Colonic Neoplasms"] | 107274 | 3646 | 30 | 1.03 |
| Urinary incontinence | [MeSH Disease:"Urinary Incontinence"] | 34170 | 720 | 24 | 0.82 |
| Peptic ulcers | [MeSH Disease:"Peptic Ulcer"] | 68234 | 1445 | 21 | 0.72 |
| COPD | [MeSH Disease:"Pulmonary Disease Chronic Obstructive"] | 35627 | 2244 | 15 | 0.51 |

Table 1. Results of the Epilepsy Comorbidity Analysis using SCAIView.


Description of each column:

Column 1. Disease.

Column 2. Reference query for the disease.

Column 3. Number of documents retrieved using the disease reference query.

Column 4. Total number of genes found in the corpus retrieved with the reference query for the disease.

Column 5. Number of genes with a relative entropy greater than 0 retrieved from a query combining the disease of interest with epilepsy. For diabetes, for example, the query is MeSH Disease:"Epilepsy" AND MeSH Disease:"Diabetes Mellitus", and the corpus contains the articles that mention both epilepsy and diabetes. The relative entropy is calculated by comparing the occurrence of genes/proteins in the corpus retrieved by this query with their occurrence in MEDLINE.

Column 6. Normalized pleiotropy rate. Overlap of genes with the epilepsy gene set, as a percentage. The epilepsy gene set (2901 genes in total) contains the genes with a relative entropy greater than 0 for the epilepsy reference query MeSH Disease:"Epilepsy" (192245 documents).
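As a rough illustration of column 6, the sketch below computes the share of the epilepsy gene set that also appears in a disease gene set. The function name and the placeholder gene symbols are hypothetical, and this is only one plausible reading of the description above, not necessarily the notebook's original code. The rates in Table 1 are consistent with dividing column 5 by 2901 (for example, 396 / 2901 gives 13.65%).

```python
def normalized_pleiotropy_rate(disease_genes, epilepsy_genes):
    """Percentage of the epilepsy gene set that is also in the disease gene set.

    Hypothetical helper: both arguments are iterables of gene symbols.
    """
    epilepsy_genes = set(epilepsy_genes)
    if not epilepsy_genes:
        raise ValueError("epilepsy gene set must not be empty")
    shared = set(disease_genes) & epilepsy_genes
    return 100.0 * len(shared) / len(epilepsy_genes)


# Toy example with placeholder gene symbols (not real data).
epilepsy = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
comorbid = {"GENE_B", "GENE_D", "GENE_X"}
rate = normalized_pleiotropy_rate(comorbid, epilepsy)
print(f"{rate:.2f}")  # 50.00

# Arithmetic check against the first row of Table 1.
print(round(100 * 396 / 2901, 2))  # 13.65
```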

Load resources


Parsing result files from SCAIView


First column structure: Common Name;Internal Identifier;Relative Entropy;Reference Entity Count;Entity Count;Query Entity Count;

Only entries with HGNC names and a relative entropy greater than 0 are extracted.

There seems to be a problem with the structure of the exported CSV file, because pandas is not able to import it.
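One way to read such an export without pandas is the standard-library csv module, sketched below. The field order follows the column structure listed above. Everything else is an assumption for illustration: the function name, the test for an HGNC entry (here, that the internal identifier contains "HGNC"), and the sample rows, which are invented and do not come from SCAIView.

```python
import csv
import io


def parse_scaiview_export(lines, is_hgnc=lambda identifier: "HGNC" in identifier):
    """Return {common name: relative entropy} for rows that pass both filters.

    `is_hgnc` is a hypothetical predicate on the internal identifier; adjust it
    to however HGNC entries are actually marked in the export.
    """
    genes = {}
    for row in csv.reader(lines, delimiter=";"):
        if len(row) < 3:
            continue  # blank or truncated line
        name, identifier, entropy_text = (field.strip() for field in row[:3])
        try:
            entropy = float(entropy_text)
        except ValueError:
            continue  # header row or malformed value
        if entropy > 0 and is_hgnc(identifier):
            genes[name] = entropy
    return genes


# Invented sample in the documented column order (trailing ';' included).
sample = io.StringIO(
    "Common Name;Internal Identifier;Relative Entropy;"
    "Reference Entity Count;Entity Count;Query Entity Count;\n"
    "GENE_A;HGNC:0001;0.0125;1000;40;12;\n"
    "GENE_B;HGNC:0002;-0.0030;5000;10;3;\n"
    "GENE_C;OTHER:0003;0.0200;800;25;9;\n"
)
parsed = parse_scaiview_export(sample)
print(parsed)  # {'GENE_A': 0.0125}
```

Reading row by row tolerates the trailing semicolon and skips lines that do not parse, which may be enough to work around the import problem, depending on what exactly is wrong with the file.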

Overlap between genesets with epilepsy


Supplementary analysis


Overlap between Alzheimer's, migraine and Parkinson's queries

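The overlap between three gene sets can be summarized by the sizes of the seven regions of a three-set Venn diagram. The sketch below is a minimal version using plain Python sets. The function name and the placeholder symbols are hypothetical and are not taken from the original notebook.

```python
def venn3_region_sizes(a, b, c):
    """Sizes of the seven disjoint regions formed by three sets."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Placeholder gene symbols standing in for the three query results.
alzheimer = {"G1", "G2", "G3", "G4"}
migraine = {"G2", "G3", "G5"}
parkinson = {"G3", "G4", "G6"}
regions = venn3_region_sizes(alzheimer, migraine, parkinson)
print(regions["all_three"])  # 1
```

Because the regions are disjoint, their sizes sum to the size of the union of the three sets, which is a convenient sanity check.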

Distributions of the geneset relative entropies


An explanation of how the relative entropies are calculated can be found at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3541249/

Relative entropy (p1, p2) = p1 * log (p1/p2)

Where p1 is the number of abstracts containing the entity in the query selected corpus and p2 denotes the total number of documents in which the entity occurs within an unspecific reference corpus (i.e. the entire Medline). The Kullback–Leibler divergence ranks those entities high, which have especially high frequency in the selected corpus in comparison to the unspecific reference corpus. This means that frequently occurring entities do not receive high ranks. For example, using the query “ ‘Alzheimer’s Disease’ AND ‘Evidence marker’ AND ‘Human Genes/Proteins’ ”, we retrieved 331 abstracts containing IL1B with a frequency ranking of 10. Conversely, according to the relative entropy formula, IL1B has an entropy rank of 34 despite its high occurrence in Medline (i.e. 40685 abstracts).
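The formula above can be tried out with a few lines of Python. This sketch makes two assumptions that the text does not state: the document counts are first turned into fractions of their respective corpora, and the logarithm is the natural one. The function names and all counts are hypothetical and serve only to show the ranking effect.

```python
import math


def relative_entropy(p1, p2):
    """p1 * log(p1 / p2), with the usual convention that the term is 0 when p1 is 0."""
    if p1 == 0:
        return 0.0
    return p1 * math.log(p1 / p2)


def relative_entropy_from_counts(query_count, query_size, reference_count, reference_size):
    """Assumption: counts are normalized by the size of each corpus before comparison."""
    return relative_entropy(query_count / query_size, reference_count / reference_size)


# Hypothetical counts: a query corpus of 1,000 abstracts and a reference corpus of 1,000,000.
specific = relative_entropy_from_counts(50, 1_000, 100, 1_000_000)      # rare overall, enriched in the query
common = relative_entropy_from_counts(300, 1_000, 400_000, 1_000_000)   # frequent everywhere

print(specific > 0)       # True
print(common < 0)         # True
print(specific > common)  # True
```

Under these assumptions, the entity that is frequent everywhere scores below the one enriched in the query corpus even though it occurs in more of the retrieved abstracts, which matches the ranking behaviour described above.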

Check the distribution of relative entropies in each query
