Mathematical modeling of the vestibular system was pioneered by Steinhausen (1) [reviewed by Straka et al. (2)], who developed a differential equation model of the mechanics of transduction in the semicircular canals during head movement. This equation, which became known as the torsion pendulum model, provided a foundation for later dynamical models of the sense organ, afferent neurons, central neural circuitry, and head and eye stabilizing reflexes during head rotation (3). Researchers showed that torsion pendulum-like dynamical models can predict firing rates of semicircular canal afferent neurons during head rotation in species drawn from all vertebrate classes, from fish to mammals. However, they also discovered unexpectedly high variability in response properties and correlations between parameters of models fitted to different neurons (3). These discoveries led to population rate-coding models, in which the idiosyncratic behavior of individual neurons is quantified by differential equations or equivalent transfer function models that predict firing rate (4–9). Stochasticity was modeled by a descriptive statistic, CV*, which measures the relative variability of spontaneous interspike intervals (10).
Analysis of model parameters revealed that dynamical response properties co-vary systematically with each other and with CV* across the afferent population. The same pattern is found in all species. Units with more regular spontaneous firing activity (low CV*) tend to have higher spontaneous and average firing rates, to be less sensitive to rotation, and to fire in phase with angular velocity (11). Units with more irregular spontaneous firing activity (high CV*) tend to have lower spontaneous and average firing rates, to be more sensitive to rotation, and to be phase-advanced relative to angular velocity. Parameters vary continuously between these two extremes. Goldberg and Fernandez (10) introduced the terms “regular” and “irregular” as a “rhetorical convenience” associated with simple bivariate statistical analysis on artificially defined subsets of the data. This has led to a persistent misconception that there are two naturally occurring functional subclasses of neurons in the afferent nerve (3, 12).
These models could describe the data but did not explain why the neurons behave as they do or why the patterns occur. We suggest approaching such questions by considering models of behaviors that depend on information provided by vestibular sensory neurons. Borah et al. (13) developed a model of head-eye coordination in the framework of stochastic optimal control theory. A stochastic optimal controller decomposes mathematically into two components in series. The first component, called a state estimator or observer, estimates the relevant state variables from noisy data. The second component uses the state estimate to compute an optimal control signal, taking into account the observer's uncertainty about the true state [reviewed by (14)]. Optimal stochastic control models were able to predict dynamical and psychophysical features not predicted by earlier deterministic models developed in the framework of classical control theory. This suggests not only that the brain takes account of uncertainty in sense data when it computes control signals but also that head-eye coordination may involve two stages of neural information processing. In the first stage the brain uses sense data to construct a belief about (i.e., perceive) relevant world and body states, and in a second stage the brain decides how to respond given what it perceives, i.e., how to act given what it believes. The prediction that the vestibular-cerebellar hindbrain contains (or is) a neural analog of a dynamical state estimator is supported by a range of anatomical, behavioral and psychophysical evidence (15).
Despite its success in modeling the dynamics of head-eye coordination, stochastic optimal control theory has had little impact on models of underlying neural mechanisms, no doubt partly because the theory is restricted to systems with linear dynamics and Gaussian errors, and employs an algebraic formalism that cannot be mapped onto neural mechanisms in a realistic way (16, 17). But linear-Gaussian stochastic optimal control theory is now recognized as a special case of Bayesian decision and control theory (18, 19), which has become a standard modeling framework in cognitive science (20, 21). A Bayesian observer computes the conditional probability distribution of states, a.k.a. the posterior distribution, given some measurements. In the special case of linear dynamical systems with Gaussian measurement errors the mean and covariance of this distribution, i.e., the entire distribution in that case, can be calculated algebraically by an algorithm called the Kalman filter (22).
The success of Kalman filter models at the psychophysical and behavioral level (23) suggests that a way forward at the neural level is to model semicircular canal afferent neuron spike trains as observations transmitted to a Bayesian observer in the brain, without assuming linear dynamics or Gaussian uncertainties. In the Bayesian framework, observations are random samples from probability distributions parameterized by variables of interest to the observer. It follows that the first task in constructing or analyzing a Bayesian observer is to determine the probability distribution of the data parameterized by the relevant variables.
Existing models show robustly that the relevant state variables for modeling information transmission by semicircular canal afferent neurons are angular velocity and angular acceleration around the canal axis (3). Therefore, we propose to model semicircular canal afferent neuron spike trains as sequences of intervals, each of which is a random sample from a probability distribution whose shape depends on the kinematic state of the head. This turns the conventional approach on its head, making quantification of stochasticity the primary goal of modeling, rather than merely an accounting procedure that quantifies the difference between models and data and explains it by calling it noise.
In this paper we identify probabilistic models of spontaneous firing using spike train data recorded from semicircular canal afferent neurons in chinchillas. At first sight, models of spontaneous activity, which describe how the neurons behave when the head is not moving, would seem to have limited value for understanding neural mechanisms of sensory-motor control when the head is moving. However, holding the head steady is an ecologically important behavior for many animals, for example while attending to auditory and visual cues that may betray a nearby predator or prey. Optimal control of this behavior requires inferring the kinematic state of the head from measurements when the true state is near the goal state, i.e., when the head is nearly stationary. Intuitively, a model that describes the behavior of a stochastic dynamical system in a particular state should be continuously modifiable to describe the system's behavior in nearby states. In that case a model of spontaneous activity may be extended to provide a model of driven activity in a small region of head kinematic state space relevant to explaining ecologically important behavior.
The potential for a model of spontaneous activity to generalize to model driven activity is presaged by the ability of CV*, a simple summary statistic of variability in spontaneous activity, to predict the dynamical behavior of semicircular canal afferent neurons during head movement. After presenting our probabilistic model of spontaneous firing we will expand on this point and discuss how our model may be used to analyze neural mechanisms underlying statistically optimal control and coordination of head and eye movements. This provides a simple accessible model system for empirically grounded, realistic analysis of how brains use sense data to perceive relevant world and body states, and how brains make decisions and control actions based on data-driven internal beliefs about relevant states and parameters.
2 Materials and methods

2.1 Spike train data acquisition

All procedures involving animals were approved by the UCLA Chancellor's Animal Research Committee and conformed to guidelines mandated in the NIH Guide for the Care and Use of Laboratory Animals (National Institutes of Health Publication, revised 2011), and the Guidelines for the Use of Animals in Neuroscience Research (Society for Neuroscience).
2.1.1 Animal preparation

Adult male chinchillas (n = 27; body mass 450–650 g) were used in these experiments. They were first anesthetized with isoflurane, after which an intravenous cannula was secured within a jugular vein through which maintenance doses of sodium pentobarbital (0.05 cc, 50 mg/cc) were administered. A tracheotomy was performed into which a catheter delivering 100% O2 was loosely placed. Heart and respiratory rates, as well as O2 saturation levels, were monitored throughout the surgical preparation and recording session. Core body temperature was maintained between 38 and 38.5°C with a custom servo-controlled heater and rectal thermocouple probe. Animals remained physiologically stable throughout the long electrophysiologic recording sessions, which at times lasted longer than 12 h.
Upon achieving a surgical plane of anesthesia, animals were fitted into a custom head holder fixed to a turntable. Surgical procedures were similar to those utilized in previous investigations of vestibular afferent electrophysiology (24). The right middle ear was exposed by removing the bony cap of the tympanic bulla. The bony ampullae of the superior and horizontal semicircular canals were identified, which provided landmarks to the internal vestibular meatus channeling the superior vestibular nerve between the labyrinth and brainstem. The superior vestibular nerve was exposed at this site, approximately 1–2 mm from the landmark ampullae, using fine diamond dental drill bits. Final exposure of the nerve was achieved by gently teasing the epineurium from the nerve with electrolytically sharpened pins.
2.1.2 Single afferent electrophysiology

Spontaneous discharge epochs from 330 semicircular canal afferents within the superior vestibular nerve were recorded with high-impedance microelectrodes (40–60 MΩ) driven by a piezoelectric microdrive. Spontaneous discharge was detected as the electrode approached an afferent, and generally improved with subtle adjustments in electrode position achieved by small manipulations of the microdrive (e.g., small forward and reverse displacements, in addition to gentle tapping of the drive). Upon achieving stable recording, manual turntable displacements were used to identify the epithelium from which the afferent projected. Afferents innervating the horizontal and superior cristae increased their discharge in response to rotations producing utriculopetal and utriculofugal endolymph flow, respectively, and decreased their discharge in response to turntable rotations in the opposite direction. Afferents projecting to the utricle were generally unresponsive to rotations, or increased their discharge during rotations in both directions (centripetal displacements of the otolithic membrane concomitant with rotation in either direction). These afferents were excluded from the present dataset.
2.2 Data analysis

2.2.1 Data acquisition, summary statistics and exploratory analysis

Single-unit spike times were acquired in 20-second records with 300 μs resolution, and imported into MATLAB as arrays of interspike interval (ISI) lengths in seconds. Plots of spike time data and ISIs were visually inspected to identify trends, discontinuities and outliers indicating possible mis-triggering during data acquisition. We tested for serial correlation in interval length using a Wald-Wolfowitz runs test (MATLAB function runstest). Records with detectable artifacts or non-stationarity were removed, leaving 306 of an initial 330 records for further analysis and modeling.
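For transparency about this screening step, the following minimal MATLAB sketch (variable names illustrative; runstest requires the Statistics and Machine Learning Toolbox) applies the runs test to a single record:

```matlab
% Screen one record for serial dependence in interval length.
% isi: vector of interspike intervals (seconds) from one 20 s record.
[h, p] = runstest(isi);   % Wald-Wolfowitz runs test about the mean interval
if h == 1
    fprintf('Serial correlation detected (p = %.3f); exclude record.\n', p);
end
```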
Mean (x̄), standard deviation (s), coefficient of variation (CV = s/x̄) and Pearson's moment coefficient of skewness (γ = E[(x − μ)³]/σ³) were computed for the intervals in each spike train, using MATLAB functions mean, std, and skewness. Standard deviations of interval length for the most regular units in our sample are comparable to the resolution of spike time data acquisition (300 μs). Because of this, estimates of CV and skewness for very regular units may be less reliable than estimates for irregular units. CV is a scale-invariant measure of variability. It is near zero for highly regular spike trains, near 1 for completely random or Poisson-like activity, and becomes larger than 1 for clumped or bursting activity. By convention, neurons whose CV falls in the lowest 1/3 of a sample of vestibular afferents are deemed "regular," neurons whose CV falls in the largest 1/3 are deemed "irregular," while neurons with intermediate CV are deemed "intermediate" (4, 10). As Goldberg (12) reiterated, this convention is a rhetorical convenience with no empirical basis.
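As a minimal illustration, these summary statistics can be computed for one record as follows (isi as above):

```matlab
% Summary statistics for one spike train (isi in seconds).
xbar = mean(isi);        % mean interval length
s    = std(isi);         % standard deviation of interval length
CV   = s / xbar;         % coefficient of variation
g    = skewness(isi);    % Pearson's moment coefficient of skewness
```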
2.2.2 Candidate models

The selected records are observations from a stationary renewal process that can be modeled as samples from a constant-parameter probability distribution of interval lengths. This is a complete model because the event times themselves, up to an arbitrary start time, can be recovered from the sequence of intervals between them. Since interval lengths must be positive and may be arbitrarily long, candidate models must be probability density functions f(t; α) defined on t > 0, with parameters α.
Previous studies have shown a consistent pattern of ISI distributions in vestibular afferent spike trains. ISI distributions of the most regular afferents are narrow, nearly symmetrical and approximately Gaussian, with standard deviations much smaller than mean interval length (σ ≪ μ). A Gaussian with 0 < σ ≪ μ has essentially no probability mass below zero and can be treated as a density on t > 0. ISI distributions of more irregular neurons tend to be more right-skewed, with larger CVs, while interval distributions of the most irregular neurons resemble Exponential distributions, with standard deviation similar to mean interval length (CV ≈ 1). Suitable candidate models are therefore positive-valued, continuously-parameterized probability densities whose shape transforms continuously between limiting cases resembling Gaussian and Exponential distributions.
Candidate models are presented in three groups, to make transparent how we selected the candidates and how analysis of the data led us, in three successive stages, to a model that accurately reproduces the shape of the data distribution and quantifies the information it contains. The first group (1.1–1.5) is the initial set of candidates, chosen because they represent models of simple physical processes that are at least somewhat analogous to the canonical "noisy integrate-and-fire" model of a stochastic neuron and/or have been applied previously to model spiking statistics of neurons, including vestibular semicircular canal afferent neurons. The second group (2.1–2.3) contains modified forms of candidates from the first group which appeared promising after the first round of fitting. These models have an additional parameter which, as explained below, we expected would provide a better fit. To construct the third group of candidates (3.1–3.3) we modified the same group of "promising" candidates by incorporating the additional parameter in a different way, for reasons that will be explained.
Candidate 1.1: Weibull
$$f_{WB}(t;\lambda,\kappa)=\begin{cases}\dfrac{\kappa}{\lambda}\left(\dfrac{t}{\lambda}\right)^{\kappa-1}e^{-(t/\lambda)^{\kappa}} & t\ge 0\\[1ex] 0 & t<0\end{cases}$$

is the distribution of intervals between events when the event rate is proportional to a power of the waiting time since the last event. This is a birth-death model with "aging." When κ = 1 (constant event rate) the Weibull reduces to an Exponential distribution.
Candidate 1.2: Log-normal
$$f_{LN}(t;\mu,\sigma)=\frac{1}{t\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln t-\mu)^{2}}{2\sigma^{2}}}$$

is the distribution of outcomes of a growth process involving multiplicative interactions among many small random effects. Multiplicative interactions are additive on a log scale, so the log of the outcome has a Gaussian or normal distribution because of the Central Limit Theorem.
Candidate 1.3: Erlang
$$f_{ERL}(t;\kappa,\mu)=\frac{t^{\kappa-1}e^{-t/\mu}}{\mu^{\kappa}(\kappa-1)!}$$

where the shape parameter, κ, is a positive integer and the scale parameter, μ, is a positive real number, is the distribution of waiting times for κ events in a Poisson process in which the mean waiting time between events is μ (so that the mean of the Erlang distribution is κμ). When κ = 1 the Erlang reduces to an Exponential distribution, the waiting time distribution for events in a Poisson process. This has been a popular model of neuronal firing variability, including for vestibular afferent neurons, because its flexible shape resembles empirical interval distributions, and because it has a simple mechanistic interpretation as the waiting time for the accumulation of quantal events occurring at random times to reach a threshold (25, 26).
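To illustrate this mechanistic interpretation, the sketch below (our addition; parameter values are illustrative) simulates waiting times for κ quantal events with Exponentially distributed inter-event times, and compares the result with the Erlang density given above:

```matlab
% Waiting time for kappa quantal events at Poisson times with mean
% inter-event interval mu: the sum of kappa Exponential(mu) intervals.
kappa = 4; mu = 0.005; n = 1e5;                  % illustrative values
T = sum(exprnd(mu, kappa, n), 1);                % n simulated waiting times
histogram(T, 100, 'Normalization', 'pdf'); hold on
t = linspace(1e-6, max(T), 500);
fERL = t.^(kappa-1) .* exp(-t/mu) / (mu^kappa * factorial(kappa-1));
plot(t, fERL, 'LineWidth', 1.5)                  % Erlang density (above)
```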
Candidate 1.4: Birnbaum-Saunders or Cumulative Damage
$$f_{BBS}(t;\beta,\gamma)=\frac{\sqrt{t/\beta}+\sqrt{\beta/t}}{2\gamma t\sqrt{2\pi}}\;e^{-\frac{\left(\sqrt{t/\beta}-\sqrt{\beta/t}\right)^{2}}{2\gamma^{2}}}$$

is the distribution of waiting times for the accumulation of events with a Gaussian distribution of amplitudes, occurring at random times, to reach a threshold. It is also known as the Cumulative Damage distribution because of its application to modeling time-to-failure of a system subjected to impacts with random magnitudes occurring at random times. It is a physically plausible model of time to threshold for a neuron receiving EPSPs with Gaussian amplitudes, which fits spike train data from real neurons and biophysically realistic computational neural models (27).
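The density as reconstructed above can be checked numerically; the sketch below (our addition, illustrative parameters) implements it and verifies that it integrates to 1:

```matlab
% Birnbaum-Saunders density as given above (beta: scale, gamma: shape).
fBBS = @(t, beta, gamma) (sqrt(t/beta) + sqrt(beta./t)) ./ ...
       (2*gamma*t*sqrt(2*pi)) .* ...
       exp(-(sqrt(t/beta) - sqrt(beta./t)).^2 / (2*gamma^2));
q = integral(@(t) fBBS(t, 0.01, 0.5), 0, Inf)    % normalization check: ~1
```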
Candidate 1.5: Inverse Gaussian or Wald
$$f_{WLD}(t;\mu,\lambda)=\sqrt{\frac{\lambda}{2\pi t^{3}}}\;e^{-\frac{\lambda(t-\mu)^{2}}{2\mu^{2}t}}$$

is the distribution of waiting times for Gaussian noise with mean 1/μ and variance 1/λ to integrate to a threshold at 1. It models the first passage time (time to reach a barrier or integrate to a threshold) of a drift-diffusion process, i.e., Brownian motion with constant drift (28, 29).
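As an illustration of the first-passage interpretation, the sketch below (our addition; Euler-Maruyama discretization with illustrative parameters) simulates the drift-diffusion process directly:

```matlab
% First-passage time of Brownian motion with drift 1/mu and diffusion
% variance 1/lambda to a barrier at 1; compare histogram with f_WLD above.
mu = 0.01; lambda = 0.02; dt = 1e-5; nTrials = 2000;
T = zeros(1, nTrials);
for i = 1:nTrials
    x = 0; n = 0;
    while x < 1
        x = x + dt/mu + sqrt(dt/lambda)*randn;   % drift step + noise step
        n = n + 1;
    end
    T(i) = n*dt;                                 % first-passage time
end
histogram(T, 50, 'Normalization', 'pdf')
```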
The second candidate group was constructed by modifying candidates selected from Group 1 by adding a time offset or delay term to each model. The reason for adding this term is explained in the results section, following analysis of the fitted Group 1 models.
Candidate 2.1: Offset Erlang
$$f_{OEL}(t;\kappa,\mu,\tau)=\frac{(t-\tau)^{\kappa-1}e^{-(t-\tau)/\mu}}{\mu^{\kappa}(\kappa-1)!},\qquad t\ge\tau$$

Candidate 2.2: Offset Wald

$$f_{OWL}(t;\mu,\lambda,\tau)=\sqrt{\frac{\lambda}{2\pi(t-\tau)^{3}}}\;e^{-\frac{\lambda(t-\tau-\mu)^{2}}{2\mu^{2}(t-\tau)}},\qquad t\ge\tau$$

Candidate 2.3: Offset Birnbaum-Saunders

$$f_{OBS}(t;\beta,\gamma,\tau)=\frac{\sqrt{(t-\tau)/\beta}+\sqrt{\beta/(t-\tau)}}{2\gamma(t-\tau)\sqrt{2\pi}}\;e^{-\frac{\left(\sqrt{(t-\tau)/\beta}-\sqrt{\beta/(t-\tau)}\right)^{2}}{2\gamma^{2}}},\qquad t\ge\tau$$

In each case the offset density is the corresponding Group 1 density shifted by the delay, i.e., evaluated at t − τ on t ≥ τ. The third candidate group was constructed by replacing the constant offset parameter τ in the Group 2 models with an Exponentially distributed random time offset having mean τ. In each case this creates a new random variable as the sum of two random variables, whose distribution is the convolution of the distributions of the components. We changed the time offset from a fixed value to a random value for reasons explained in the results section following analysis of the fitted Group 2 models.
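Because convolution underlies all three Group 3 candidates, a numerical check is straightforward; the sketch below (our addition, illustrative parameters) convolves an Exponential density with an Erlang density on a fine grid, which should match the closed-form Exerlang given next:

```matlab
% Density of a sum of independent variables = convolution of densities.
tau = 0.004; kappa = 3; mu = 0.002;              % illustrative values
dt = 1e-5; t = dt:dt:0.1;                        % fine time grid (s)
fExp = exp(-t/tau)/tau;                          % Exponential, mean tau
fErl = t.^(kappa-1).*exp(-t/mu)/(mu^kappa*factorial(kappa-1));
fSum = conv(fExp, fErl)*dt;                      % numerical convolution
fSum = fSum(1:numel(t));                         % restrict to grid support
```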
Candidate 3.1: Exerlang
$$f_{EXE}(x;\kappa,\mu,\tau)=\frac{e^{-x/\tau}}{\tau\left(1-\mu/\tau\right)^{\kappa}}\;\mathrm{gammainc}\!\left(x\left(\frac{1}{\mu}-\frac{1}{\tau}\right),\,\kappa\right)$$

This expression for the convolution of an Exponential distribution and an Erlang distribution was obtained using Mathematica (Wolfram Research, Illinois, USA). Here gammainc is MATLAB's regularized lower incomplete gamma function, a built-in special function. The incomplete gamma function is defined slightly differently in MATLAB and Mathematica, so the result derived by Mathematica requires adjustment to obtain the formula given above.
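A direct MATLAB implementation of this formula (our sketch; valid for τ > μ, so that the gammainc argument is non-negative) is:

```matlab
% Exerlang density via MATLAB's regularized lower incomplete gamma.
fEXE = @(x, kappa, mu, tau) exp(-x/tau) ./ (tau*(1 - mu/tau)^kappa) ...
       .* gammainc(x*(1/mu - 1/tau), kappa);
q = integral(@(x) fEXE(x, 3, 0.002, 0.004), 0, Inf)   % normalization: ~1
```

On the grid used in the convolution sketch above, fEXE(t, kappa, mu, tau) should agree with fSum to within discretization error.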
Candidate 3.2: Exwald
$$f_{EXW}(t;\mu,\lambda,\tau)=\begin{cases}\dfrac{e^{\lambda/\mu-t/\tau}\left(\mathrm{erfc}(b-c)/d+d\,\mathrm{erfc}(b+c)\right)}{2\tau} & \text{if }a\ge 0\\[2ex]\dfrac{e^{\lambda/\mu-t/\tau}\,e^{-(b^{2}+at)}\,\operatorname{Re}\!\left(w\!\left(-\sqrt{-at}+ib\right)\right)}{\tau} & \text{otherwise,}\end{cases}$$

where $a=\frac{\lambda}{2\mu^{2}}-\frac{1}{\tau}$, $b=\sqrt{\frac{\lambda}{2t}}$, $c=\sqrt{at}$ and $d=e^{2bc}=e^{\sqrt{2a\lambda}}$. erfc is the complementary error function, w is the Faddeeva scaled complex complementary error function (30), i = √−1 and Re(z) is the real part of the complex number z. This expression was modified from formulae given by Schwarz (31), by setting the barrier distance/threshold level parameter in the Wald component of Schwarz's derivation to 1 and scaling the other parameters accordingly. We found that this expression can be numerically unstable when λ ≫ μ (diffusion negligible compared to drift) or τ ≪ μ (Exponential component negligible compared to Wald component). In the former case we reduced the Wald drift-diffusion component to a pure drift, approximating the Exwald using an Exponential distribution with fixed time offset, μ. In the latter case we removed the Exponential component, approximating the Exwald using only the Wald component. None of our data were fitted by models with parameters in regions of parameter space where these approximations were applied, but it was necessary to include the approximations to prevent instability in the numerical search procedure used to fit the models, which explores a wider parameter space before converging.
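A sketch of the a ≥ 0 branch is given below (our function name; base MATLAB lacks a complex erfc/Faddeeva routine, so the a < 0 branch is omitted here):

```matlab
% Exwald density, a >= 0 branch of the formula above.
function f = exwald_pos(t, mu, lambda, tau)
    a = lambda/(2*mu^2) - 1/tau;        % assumes a >= 0
    b = sqrt(lambda ./ (2*t));
    c = sqrt(a*t);
    d = exp(2*b.*c);                    % 2bc = sqrt(2*a*lambda), constant in t
    f = exp(lambda/mu - t/tau) ...
        .* (erfc(b - c)./d + d.*erfc(b + c)) / (2*tau);
end
```

When λ is large relative to μ, d overflows while erfc(b + c) underflows, which is one source of the instability noted above.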
Candidate 3.3: Exgaussian
$$f_{EXG}(x;\mu,\sigma,\tau)=\frac{1}{2\tau}\,e^{\frac{2(\mu-x)\tau+\sigma^{2}}{2\tau^{2}}}\,\mathrm{erfc}\!\left(\frac{\mu-x+\sigma^{2}/\tau}{\sqrt{2}\,\sigma}\right)$$

This expression for the convolution of a Gaussian distribution with mean μ and variance σ² and an Exponential distribution with mean interval parameter τ was derived analytically using Mathematica (Wolfram Research, Illinois, USA). In this expression, $\mathrm{erfc}(x)=\frac{2}{\sqrt{\pi}}\int_{x}^{\infty}e^{-t^{2}}\,dt$ is the complementary error function, a MATLAB built-in special function.
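A direct implementation (our sketch, with illustrative parameters; the integration range covers essentially all of the probability mass for these values):

```matlab
% Exgaussian density per the formula above.
fEXG = @(x, mu, sigma, tau) exp((2*(mu - x)*tau + sigma^2) ./ (2*tau^2)) ...
       .* erfc((mu - x + sigma^2/tau) ./ (sqrt(2)*sigma)) / (2*tau);
q = integral(@(x) fEXG(x, 0.02, 0.001, 0.005), 0, 0.2)   % ~1
```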
2.2.3 Fitting and model selection criteria

Given an observed probability distribution p(t) and a model q(t), the Kullback-Leibler divergence from q(t) to p(t), also known as the entropy of p(t) relative to q(t), is
$$D_{KL}(p\|q)=\int p(t)\log_{2}\frac{p(t)}{q(t)}\,dt.\qquad(1)$$

$D_{KL}$ measures the information lost, in bits, when q(t) is used to approximate the empirical distribution p(t). Given a set of candidate models, minimum $D_{KL}$ identifies the candidate that minimizes the expected information in future observations, given what has been observed (3, 32, 33). This criterion, which avoids the problem of over-fitting (i.e., models with too many free parameters that fit better but make worse predictions), provided the theoretical foundation used by Akaike (34) to derive his famous information criterion (AIC) for model selection (n.b. the cited paper is a reprint of Akaike's original 1973 conference paper). AIC is asymptotically equivalent to $D_{KL}$, i.e., it is expected to give the same answer as $D_{KL}$ on average in the long run. AIC is usually used instead of $D_{KL}$ because computing $D_{KL}$ requires the true data distribution to be known. We apply $D_{KL}$ under the bootstrap assumption that our data sets are large enough to represent the true shapes of the distributions that they are drawn from, which is justified by inspection of ISI histograms (35, 36).
Given N observations t₁, t₂, ⋯, t_N, the empirical distribution can be represented as a normalized frequency histogram, with probability p_k = n_k/N in the k-th bin, where n_k is the number of observations in the k-th bin. Assuming that q(t) ≈ q_k is constant in the k-th bin, the expression for $D_{KL}$ reduces to a sum,
$$D_{KL}(p\|q)=\int p(t)\log_{2}\frac{p(t)}{q(t)}\,dt=\sum_{k}p_{k}\log_{2}\frac{p_{k}}{q_{k}}\qquad(2)$$

If each bin is very narrow and contains at most one observation then q(t) = q_k and the normalized histogram reduces to a particle model, with probability p(t_k) = 1/N at the observed points t_k and zero elsewhere. In that case the expression for $D_{KL}$ reduces to
$$D_{KL}(p\|q)=\int\sum_{k}\frac{\delta(t-t_{k})}{N}\log_{2}\!\left(\frac{\delta(t-t_{k})}{N\,q(t)}\right)dt$$
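As an illustration of how Equation (2) is evaluated in practice, the sketch below (our addition; isi as above, and q a fitted model density handle such as one of the candidates given earlier) computes $D_{KL}$ between a normalized ISI histogram and the model:

```matlab
% D_KL (Equation 2) between a normalized ISI histogram and a model q.
edges   = linspace(0, max(isi), 60);          % illustrative binning
nk      = histcounts(isi, edges);             % counts n_k
w       = diff(edges);                        % bin widths
pk      = nk / sum(nk);                       % empirical p_k = n_k / N
centers = edges(1:end-1) + w/2;
qk      = q(centers) .* w;                    % model probability per bin
nz      = pk > 0;                             % empty bins contribute zero
DKL     = sum(pk(nz) .* log2(pk(nz) ./ qk(nz)))   % bits
```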