Note: This analysis is almost identical to this one: https://github.com/adelavega/neurosynth-lfc/blob/master/Functional%20preference%20profiles.ipynb
Here, I'll take advantage of Neurosynth's semantic data to assign functions to each sub-component of the default network.
For each region in the clustering analysis, we're going to determine how well we can classify studies that activated the region, versus those that did not, on the basis of latent topics describing the psychological states in each study.
First, we applied a topic model to the Neurosynth terms in order to reduce them to 60 topics. These topics are more robust and interpretable than individual terms found in studies.
Next, for each ROI, I've selected a set of studies that activate the region, and a set of studies that do not.
Then, I've used a naive Bayes classifier to discriminate these two sets of studies on the basis of the Neurosynth topics associated with each study.
From this analysis we can determine: a) how well studies that activate each ROI can be differentiated from studies that do not, and b) which topics are most predictive of activity in each region (measured using the log odds ratio).
Finally, we can apply statistical tests to determine whether the loadings of topics onto each ROI differ significantly.
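To make the log odds ratio concrete, here is a minimal sketch that computes it from a 2x2 table of study counts. This is a simplification and not the notebook's own computation: it assumes each topic has been binarized into "present" or "absent" per study, the function name and the counts are hypothetical, and the 0.5 added to each cell is a standard continuity correction that I've chosen for illustration.

```python
import math

def log_odds_ratio(active_with, active_without, inactive_with, inactive_without):
    """Log odds ratio of a topic being present in active vs. inactive studies.

    Adds 0.5 to every cell (a continuity correction, assumed here) so that
    empty cells cannot cause a division by zero or log(0).
    """
    a = active_with + 0.5
    b = active_without + 0.5
    c = inactive_with + 0.5
    d = inactive_without + 0.5
    return math.log((a * d) / (b * c))

# Hypothetical counts: the topic is present in 40 of 100 studies that
# activate the ROI, and in 20 of 100 studies that do not.
lor = log_odds_ratio(40, 60, 20, 80)
print(f"log odds ratio: {lor:.3f}")
```

A positive value means the topic is more common among studies that activate the region, a negative value means it is less common, and zero means no association.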
For both ROIs, we achieved moderate classification performance (comparable to previous studies)
A supplemental test would be to use permutation testing to determine which topics (for each ROI) are significantly different from zero. Usually this would be higher up, but given the similarity of these two regions, it may be more useful to focus on their differences using the above test.
With permute_log_odds_ratio, we perform a permutation test for each region-topic log odds ratio, resulting in the z-score and p-value of the observed log odds ratio in the permuted null distribution. Small p-values indicate that it is improbable that we would observe the given log odds ratio under the null distribution.
Note that this function takes a fitted RegionalClassifier model (that we generated above) and the number of times to resample as required arguments.
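The logic of such a permutation test can be sketched in a few lines. This is not the implementation of permute_log_odds_ratio, which works on a fitted RegionalClassifier. It is a standalone illustration under my own assumptions: topics are binarized per study, the function names and the data are hypothetical, and the p-value is two-sided.

```python
import math
import random
import statistics

def log_odds_ratio(labels, topic_present):
    """Log odds ratio from paired binary labels, with a 0.5 continuity correction."""
    a = b = c = d = 0.5
    for active, present in zip(labels, topic_present):
        if active and present:
            a += 1
        elif active:
            b += 1
        elif present:
            c += 1
        else:
            d += 1
    return math.log((a * d) / (b * c))

def permutation_test(labels, topic_present, n_permutations=1000, seed=0):
    """Return (observed LOR, z-score, two-sided p-value) against a shuffled null."""
    rng = random.Random(seed)
    observed = log_odds_ratio(labels, topic_present)
    shuffled = list(labels)
    null = []
    for _ in range(n_permutations):
        # Shuffling the activation labels breaks any real link to the topic.
        rng.shuffle(shuffled)
        null.append(log_odds_ratio(shuffled, topic_present))
    z = (observed - statistics.mean(null)) / statistics.stdev(null)
    extreme = sum(abs(value) >= abs(observed) for value in null)
    # The +1 terms keep the estimated p-value strictly above zero.
    p = (extreme + 1) / (n_permutations + 1)
    return observed, z, p

# Hypothetical data: 50 studies that activate the ROI and 50 that do not.
labels = [1] * 50 + [0] * 50
topic_present = [1] * 35 + [0] * 15 + [1] * 15 + [0] * 35
observed, z, p = permutation_test(labels, topic_present)
print(f"LOR={observed:.2f}, z={z:.2f}, p={p:.4f}")
```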
Next, we must adjust the p-values for multiple comparisons. To do so, we will use the False Discovery Rate, and focus only on a subset of tests. As such, I'm only going to include the topics that we focused on for the above plots.
Finally, we use multipletests from the statsmodels package to correct our p-values given an alpha of 0.01. We then consider the null hypothesis rejected if the adjusted p-value is less than 0.05 and the sign is positive (excluding less easily interpreted negative associations).
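To show what a False Discovery Rate adjustment does to the p-values, here is a hand-written sketch of the Benjamini-Hochberg step-up procedure. It is an assumption on my part that this is the FDR method in play, and the function is a stand-in for illustration rather than the multipletests call itself. The p-values and log odds ratios are made up, and the rejection rule follows the 0.05 threshold and positive-sign condition stated above.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical permutation p-values and log odds ratios for five topics.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
log_odds = [1.2, -0.9, 0.7, 0.4, 0.1]

adjusted = benjamini_hochberg(p_values)
# Reject only when the adjusted p-value is small AND the association is positive.
significant = [adj < 0.05 and lor > 0 for adj, lor in zip(adjusted, log_odds)]
print(adjusted)
print(significant)
```

In this made-up example the second topic survives the correction but is excluded because its log odds ratio is negative, and the third and fourth topics no longer pass once adjusted.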
To determine if topic loadings significantly differ between regions, one option is to calculate 95% bootstrapped confidence intervals for the log odds ratio for each topic for each region and see if the CIs overlap between the two FPCN subnetworks.
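The bootstrap comparison described above can be sketched as follows. This is an illustration under my own assumptions rather than the notebook's code: topics are binarized per study, the intervals are simple percentile intervals, and all function names and data are hypothetical.

```python
import math
import random

def log_odds_ratio(studies):
    """LOR from (active, topic_present) pairs, with a 0.5 continuity correction."""
    a = b = c = d = 0.5
    for active, present in studies:
        if active and present:
            a += 1
        elif active:
            b += 1
        elif present:
            c += 1
        else:
            d += 1
    return math.log((a * d) / (b * c))

def bootstrap_ci(studies, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the log odds ratio."""
    rng = random.Random(seed)
    n = len(studies)
    # Resample studies with replacement and recompute the LOR each time.
    estimates = sorted(
        log_odds_ratio([studies[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = 1 - level
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical data for one topic in two regions (200 studies each).
region_a = [(1, 1)] * 80 + [(1, 0)] * 20 + [(0, 1)] * 20 + [(0, 0)] * 80
region_b = [(1, 1)] * 50 + [(1, 0)] * 50 + [(0, 1)] * 50 + [(0, 0)] * 50

ci_a = bootstrap_ci(region_a, seed=1)
ci_b = bootstrap_ci(region_b, seed=2)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

Non-overlapping intervals are a conservative indication that the topic loads differently on the two regions. Overlapping intervals do not by themselves show that the loadings are equal.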
Below, I plot the LOR of each topic to each region (using the same colors as above) for the 10 mo
For reference, here are the top words for each topic. The first column is the "nickname" that I assigned to this topic (and matches the plots above), and the next 9 columns are the top 9 words (in descending order) that loaded onto this topic.