- Title
-
Granger causality analysis for calcium transients in neuronal networks, challenges and improvements
- Authors
- Chen, X., Ginoux, F., Carbo-Tano, M., Mora, T., Walczak, A.M., Wyart, C.
- Source
- Full text @ Elife
Granger causality (GC) analysis on synthetic data generated from N = 10 interconnected neurons of specific connectivity. |
|
|
The consistency of GC values as a function of trial duration T and connection strength c for VAR dynamics simulated on random networks of N = 10 neurons. We measure the consistency of GC values using the Pearson’s correlation coefficient between GC values computed from two realizations of synthetic data generated using the same underlying dynamics. The consistency always improves with longer trial duration and stronger connection strength and is in general better for time series generated using VAR dynamics compared to the ones generated using the non-linear non-gaussian GLM-Calcium dynamics. Error bars are standard deviation across 10 random realizations of the network. |
Cross-correlation analysis for synthetic data generated with VAR dynamics reveals the correct interaction structure at the exact delay time difference. Here, N = 10 neurons are simulated for T = 5000 time points, the same simulated trajectory as the one used to plot Figure 1CD. (A) Cross-correlation matrix at Δt = 0, 2, 4 The true underlying dynamics as neuron-neuron interaction for Δt = 1, 2 (B) Significant cross-correlations are marked by black squares, while the true underlying network is indicated by red dots. (C) Links with top absolute value of cross-correlations, with the number of total links matching that of the true underlying connectivity. |
Same as 1—figure supplement 4, but for synthetic data generated using GLM dynamics with N = 10 neurons for T = 5000 time points, the exact same simulated dynamics as used to plot Figure 1EF. Even at the true time difference, Δt = 2 , the significant cross-correlation over-identifies true connectivity. |
Same as Figure 1—figure supplement 4, but for synthetic data generated using GLM-calcium dynamics with N = 10 neurons for T = 5000 time points, the exact same simulated dynamics as used to plot Figure 1GH. Cross-correlation cannot identify the underlying structure of connectivity. |
Presence of redundant signals harms the performance of MVGC. GC analysis performed on synthetic data generated using VAR dynamics on the network structure in the main figure, for N = 10 neurons for T = 5000 data points. GC reveals all links for networks with redundant structures, where an ‘artificial’ neuron 11 is created with identical input and output strength as neuron 1 (redundant structure, middle column). However, MVGC fails to identify causal links when the signal from neuron 1 is copied to create another ‘artificial’ neuron 11 (redundant signal, right most column). BVGC is able to identify the underlying true connectivity. |
Comparison of Granger causality analysis results at different maximum lags. Correlation between the GC values obtained for each pair of neurons using max lag L and max lag L + 1. All fish (n=9) of the motoneuron data set are used and represented by color (same as in Figure 7). The similarity in Granger causality values, measured by the Pearson correlation coefficient r, increases as L increases and then stabilizes after maximum lag L = 3 (r = 0.97 both for BVGC and MVGC between L = 3 and L = 4, as well as between L = 4 and L = 5), justifying our decision to use L = 3. Using a larger maximum lag does not bring more information but brings more complexity, and increases the computation time (especially for MVGC). |
Validation of trial length sufficiency by high correlation of Granger causality values using the first and second halves of the motoneuron data. Correlation between the GC values obtained for each pair of neurons using either the first or the second half of the calcium traces. Each color represents a different fish (n=9). The BVGC (left) and MVGC (right) results are similar (Pearson correlation coefficient) suggesting that the length of the whole time series (1000 time steps) is sufficient and appropriate to apply GC analysis (on smoothed DF/F , motion artifacts removed). The correlation is particularly for the number of time points and neurons high due to the cyclic pattern of neuronal activity leading to two very similar halves. |
|
|
The Pearson correlation coefficient shows that the residual noise carries signals of information flow, after the original calcium signals are smoothen, using the dataset f3t2 from De Vico Fallani et al., 2014 as an example (N = 11 neurons). Top-left panel is identical as Figure 4B. |
|
Correlated noise increases the error of Granger causality analysis, exemplified by synthetic data generated by VAR dynamics and corrupted by a system-wide shared noise for N = 10 neurons and T = 4000 time points, with connection strength c = 0.1265. (A) Example traces for VAR dynamics added with shared noise with increasing strength. (B) The error of GC analysis increases as the amplitude of the shared noise increases. Error bars are standard deviation across 10 random simulated trajectories of the same network, each corrupted by adding a realization of randomly generated noise. |
Same as Figure 2, but for data generated with GLM-calcium dynamics for N = 10 neurons, a connection strength of c = 1, and for duration T = 4000. Error bars are standard deviation across 10 realizations of simulation. |
Spectral Granger-causality (computed using the Barnett and Seth, 2014). Results for motoneurons (dataset f6t1, N = 12 neurons) reveals more structre when applied to smoothened fluorescence data compared to the raw fluorescence signal. The top row shows the ipsilatarel links in solid curves, and contralateral links in dashes. The top row shows the rostral-to-caudal links in solid curves, and caudal-to-rostral links in dashes. The last row shows the weighted ipsilateral ratio WIC and the weighted rostral-to caudal ratio WRC. |
(A) Example traces of the synthetic data for two chains of five neurons (N = 10 neurons for the whole system). The common stimuli, IL and IR, are sampled using the empirically observed on- and off-durations for the neuron bursts (τon and τoff). The time delay of the onset of the right stimulus IR from the offset of the left stimulus IL is also sampled using empirical distribution. Each synthetic data is generated for the duration of 1000s. (B) Success of information flow retrieval, measured by the weight of the ipsilateral GC links, WIC, and the weight of rostral-to-caudal links, WRC, as a function of the information propagation time scale τinfo, evaluated at the empirical time scales τca = 2.5s, τsampling = 0.25s, and biologically responsible base spike rate λ0=32s−1. Results from bivariate GC are plotted in purple, and results from multivariate GC are in green. (C) Success of information flow retrieval for τinfo = 0.025s and τinfo = 0.25s for synthetic data with different calcium decay time constants. Greater calcium decay constants lead to worse information retrivals. Errorbars represents standard deviations across 10 realizations of synthetic data. The empirical calcium time scale τca is indicated by the red vertical line. Error bars in panel (B) and (C) are standard deviation across 10 random realizations of the synthetic data. |
(A) Schematics for generating null models by randomly shuffling the data. The blue dots and curves correspond to the original data, the effect neuron fXand the cause neuron fY; the green ones are generated by random cyclic shuffling of the original fY(t), with gray rectangles indicating matching time points before and after the cyclic shuffle. (B) The probability density of the F-statistics of the bivariate Granger causality for an example experiment (dataset f3t2 from De Vico Fallani et al., 2014 with N = 10 neurons). Blue circles are computed using the smoothed Calcium signals, and the green crosses are computed using the null model generated with cyclic shuffles. The black curve is the F-distribution used in the naive discrimination of GC statistics, and the gray curve is the best fitted F-distribution, fitting an effective number of samples. The gray and the black dashed vertical lines indicate the significance threshold for Granger causality, using the original threshold and the fitted threshold, respectively. (C) Same as (B), for multivariate Granger causality analysis. (D) Comparison between the significant GC network for the experiment f3t2 using the naive vs. fitted F-statistics threshold. |
(A) Description of the whole pipeline. Before running the GC analysis on the raw calcium transients (DF/F), the lag parameter must be chosen. Then, before running the GC analysis again, motion artifacts are corrected, the fluorescence is smoothed and a new threshold is calculated. (B) Comparison of WIC and WRC before and after applying the pipeline to the GC analyses. Points in the upper right triangle gray-shaded area represent the ratios that have increased in the final GC results. Each fish (n=9) is represented by a color, BVGC by circles and MVGC by triangles. Means are shown in black. We find that using our pipeline, WIC strongly increases and we can clear out the spurious contra-lateral links present in the original GC. WRC is not significantly improved: the recording frequency and GCaMP decay are likely too low and slow compared to the speed of rostro-caudal propagation of the information flow. (C) Network results before (top row) and after (bottom row) applying our pipeline for computing bivariate Granger causality (BVGC), for all fish. For consistency in the network representation and better ability to compare, we removed the uncharacteristically behaving neurons of the GC analysis before applying the pipeline. Note that more links are found on the side that was better in focus in the recording. (D) Same as (C) but for the multivariate Granger causality (MVGC). |
|
Correlation of the drive and receiving values with the cell position and signal-to-noise ratio, and comparison across neuron groups. (A) The neuronal drive, calculated as the sum of the strength of all outgoing Granger causality (GC) links for each neuron, is not correlated to the neuron position. In data recorded by a two-photon laser scanning microscope, different pixels are scanned across the sample with a very small delay. Panel A shows that the scanning direction and delay do not influence the GC results. (B) The neuronal drive is correlated to the signal-to-noise ratio (SNR) of the calcium traces. It is especially high for medial neurons (in red, r = 0.69 compared to r = 0.45 across all 139 neurons). After computing the new threshold for significance with a corrected null hypothesis and the normalized GC values, the correlation between the drive and SNR decreased. (C) The neuronal drive is significantly higher for motor-correlated neurons than for other neurons (Mann-Whitney U test; ns for not significant when p >0.05, ***** when p <0.0001). This observation suggests that motor-correlated neurons are important drivers of the network activity. (D) The receiving value of a given neuron is defined as the sum of all its incoming GC links. As for the drive, it is not correlated to the cell position. (E) The receiving value is correlated to the SNR. This correlation is reduced when computing the receiving value using the normalized GC values. (F) The receiving value is significantly higher for motor-correlated than for other neurons. This shows strong information flow between the motor-correlated neurons (Mann-Whitney U test; ns when p > 0.05, ***** when p <0.0001). |
Distribution of the F-statistics of the shuffled data before and after re-scaling. (A) Original distribution of the BVGC F-statistics of shuffled data, ℱshuffled i→j. It has a shifted support that is larger than that of the F-distribution (red curve), meaning that if we use the original F-test based on the F-distribution as a null model, even if there are no Granger-causal links between two neurons, we will classify the link as significant. (B) For each neuron pair, (i,j) the distribution ℱshuffled i→j is well-described by a constant-rescaled F-distribution. Applying an adaptive threshold on F i→j can be simplified to applying the original threshold on the normalized, F~ i→j defined by dividing the naive F-statistics with the expectation value of the F-statistics generated by the shuffled data (F ~ i→j = F i→j /⟨Fshuffled i→j⟩). (C) Original distribution of the MVGC ℱshuffled i→j. (D) The normalized ℱshuffled i→j is almost perfectly described by a constant-rescaled F-distribution. |
Test for overfitting and information flow from regularized GC on zebrafish hindbrain (N = 23 neurons). (A, B) Residues of the VAR model with different max lags for training (the first 2/3 of the data) and test (last 1/3 of the data) datasets, computed for and averaged across each possible neuron pair (i, j) for neuron j driving neuron i. The overlap between the training and test curves for max lag less than or equal to 3 shows the choice of max lag l = 3 does not overfit. Error bars are standard error of the mean across all possible pairs of neurons. (C) The training and testing residue using ridge regularization for the full model shows a preferred ridge regularization parameter ~ 2 x 102 (D) The percentage of connection after GC analysis shows a decrease at the preferred ridge regularization parameter, if we use the naive threshold. Using adaptive threshold increases the connection. (E) Shows examples of information flow at different ridge regularization parameters, using the adaptive threshold. (F–H) Same as (C–E), using LASSO regularization. |
Distribution of GC link directions. Polar histogram of the proportion of angles of significant Granger causality links over all possible GC links between neurons. Using the naive bivariate GC (top row), we obtain a network that is almost fully connected, thus the proportion is close to 100% for all links and the polar histogram is close to a circle. The results of the adapted bivariate GC (2nd row), the naive multivariate GC (3rd row), and the adapted multivariate GC (bottom row), show a majority of links from right to left, suggesting an information flow from the right to the left hindbrain. This result is obtained in the medial swim-correlated neurons of the reference plane (N = 20 neurons, red, left column) and is conserved when GC is applied to all swim-correlated neurons of this plane (N = 41 neurons, blue, 2nd column), as well on swim-correlated neurons of all planes the reference fish (11 planes, 176 neurons in total, purple, third column), and across all swim-correlated neurons of all planes of all fish (n = 10 fish, 975 neurons in total, gray, right column). |
Correlation of Granger causality values using the first and second halves of the hindbrain data (N = 41 neurons). The correlation of r = 0.76 for the two halves assesses that BVGC (left) can be applied to the whole time series (1744 time steps). The correlation between the MVGC analyses of the halves of the hindbrain dataset (right) is low, and in this case, 872 time points are not sufficient to infer reliable MVGC results. |
The distributions of the tail angles for turns (tail angle α with 45∘ < |α| < 90 ∘, gray, top left of A-C) and the distributions of the new MVGC link angles (blue, bottom left of A-C) indicate a bias when the visual grading stimulus was presented on the right towards the left across 3 hindbrain planes spaced by 10µm (A: plane 5, dorsal to B: plane 7, analyzed in Figure 8; C: plane 9, ventral to plane 7). Across these three planes, a group of strong driver neurons (middle and right panels of A-C) was found located on the right, in the locus of the MLR. This is to our knowledge the first evidence of recruitment of cells in the MLR locus during locomotion. As expected from anatomy (Carbo-Tano et al., 2022), right MLR neurons drive contralateral neurons in the left MLR and in the medulla where the reticular formation is located, both on the right and left sides. |