Jed Singer: Neural Coding
Date Posted:
June 2, 2014
Date Recorded:
June 2, 2014
CBMM Speaker(s):
Jed Singer, Brains, Minds and Machines Summer Course 2014
Description:
Topics: Characterizing neural firing rates, tuning curves, identifying effective stimuli, modeling spike trains, integrating information across time and across neurons, estimating response using reverse correlation, decoding fundamentals, two-way decoding, optimal decoding (Bayesian inference, ML, MAP), evaluating performance of an estimator, and kernel-based decoding from spike trains
JED SINGER: As Gabriel said, I will be speaking to you about neural coding, and in particular, about two complementary questions. Number one, if we know what a stimulus is, as presented to some nervous system, how well can we predict what the response of a neuron that's a member of that nervous system is going to be? And sort of the flip side of that question-- if we know what a neuron or a population of neurons have done in response to a stimulus, how well can we infer what that stimulus was? In other words, can we decode the neural code?
And the more overarching sort of meta-question that I'd like you to keep in mind is, if we can do these two things-- if we can answer these two questions-- how well can we say that we understand neural function, that we understand what the brain is doing? So here's an overview of the subjects that I'll be covering. I'm not going to read through it, but it'll be coming back frequently.
The first subject that I'm going to touch on is very basic, just techniques for characterizing neural responses. The most basic description of a neuron's output is its spike train, which we can write down using delta functions. A delta function is a function that is 0 almost everywhere, except at 0, where it integrates to 1. We can describe the spike train, rho, as a sum of these delta functions.
So rho as a function of t is the sum of delta functions shifted by t sub i for i spikes. And that might look something like this. The overall firing rate of a spike train, rho, is then simply the number of spikes fired divided by the time window under consideration.
And generally, in neuroscience experiments, we are presenting a stimulus multiple times, because individual neurons are not deterministic, or at least not at the level we can access. And so multiple repetitions are typically needed to properly understand how a neuron behaves. So we can average across multiple stimulus presentations.
And throughout, these angled brackets denote the average, or expected value. So the mean firing rate response to a stimulus is just the mean number of spikes fired divided by the length of time. Now brains do interesting things, and the neurons that make them up do interesting things, such as varying their firing rates. So we'll oftentimes be interested in firing rate as a function of time.
One way of accessing that is to take a window, w, and slide that along, convolving the window with the spike train to give an approximation of the firing rate at a given time. At perhaps the most basic level, such a window function could be just a rectangle. If we're interested in firing rate at time t, denoted by this dashed line, we count up all the spikes in this window, divide by the width of the window, and that's our estimate at time t.
A slightly more sophisticated window might be a Gaussian curve, in which we weight action potentials closer to time t more heavily. Now, both of these techniques are what's called acausal, which means they incorporate spikes after the time in question. So one thing that you can do is limit your consideration at time t to spikes that happened before time t. So one example is this rectified exponential.
So here's an example of an idealized spike train, as we discussed previously-- a sum of delta functions. And this is actually computed from the real spike train in an actual MT neuron while the monkey watched a video. So if we want to know what the firing rate profile of this neuron during the three seconds of this trial looked like, perhaps the most basic thing we can do is to bin the spike train, and that's what's done here-- divide it up into 100 millisecond bins.
And this first 100 millisecond bin-- sorry, I had a little too much coffee today-- this first 100 millisecond bin, there are five spikes. And so five spikes in 100 milliseconds implies a 50 Hertz firing rate. The next 100 millisecond bin had nine spikes, so we had a 90 Hertz firing rate, and so on for all 30 bins.
Now if we actually rather than just using a rectangular bin, use a sliding rectangular window, the profile looks considerably different. We start to see a lot more structure. And this estimate is more sensitive to rapid changes in firing rate.
Here we see the results when we use a Gaussian window, as described on the previous slide. And here we see the results with a causal rectified exponential filter. These two, you may note, are quite similar, except the estimates here-- the peaks and troughs and other features of this estimate-- are all shifted in time slightly later than they are here. And again, that's because this estimate is only considering spikes that came before time t in the calculation of the firing rate at time t.
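A minimal sketch of these four firing rate estimates (assuming NumPy; the spike times and window widths below are invented for illustration, not taken from the recording on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
spike_times = np.sort(rng.uniform(0.0, 3.0, 60))   # hypothetical spikes, roughly 20 Hz over 3 s

dt = 0.001                                          # 1 ms time grid
t = np.arange(0.0, 3.0, dt)
counts = np.histogram(spike_times, bins=np.append(t, 3.0))[0]   # spikes per 1 ms bin

# 1) Fixed 100 ms bins: spike count in each bin divided by the bin width (5 spikes -> 50 Hz)
binned_rate = np.histogram(spike_times, bins=np.arange(0.0, 3.1, 0.1))[0] / 0.1

# 2) Sliding rectangular window, width 100 ms (acausal, centered on time t)
w_rect = np.full(100, 1.0 / 0.1)                    # window integrates to 1
rate_rect = np.convolve(counts, w_rect, mode='same')

# 3) Gaussian window, sigma = 100 ms (weights spikes near t more heavily, still acausal)
tau = np.arange(-0.4, 0.4, dt)
w_gauss = np.exp(-tau**2 / (2 * 0.1**2)) / (0.1 * np.sqrt(2 * np.pi))
rate_gauss = np.convolve(counts, w_gauss, mode='same')

# 4) Causal rectified exponential, time constant 100 ms (uses only spikes before t)
tau = np.arange(0.0, 0.5, dt)
w_exp = np.exp(-tau / 0.1) / 0.1
rate_exp = np.convolve(counts, w_exp, mode='full')[:len(counts)]
```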
So those are sort of the basic ways that we can talk about what is a neuron doing? How often is it firing? What does a spike train look like?
Typically we're interested in what a neuron does in response to different stimuli. And the description of a firing rate, the firing rate as a function of stimulus, is called the tuning curve. So one basic and classic example comes from monkey V1, primary visual cortex.
So if you present an oriented bar moving with the direction through the receptive field of a cell in V1, you might not get any spikes. You might get a few spikes. You might get a lot of spikes, all depending on the orientation of the bar relative to that cell's preferred orientation.
And it turns out if you plot firing rate as a function of orientation-- that's what's shown with these dark circles here-- you get something that looks an awful lot like a Gaussian. And so one can fit a Gaussian to these points, and thereby describe how this particular neuron responds as a function of the stimulus, so firing rate as a function of the stimulus. Yeah?
AUDIENCE: I have a quick question on the previous slide. So other one-- yeah, this one. So what are the pros and cons of using some of these? Because a Gaussian seems kind of intuitive, but then rectifier shifted in time, which may or may not be how the brain does it? So what--
JED SINGER: Right, so a non-causal filter, in other words, one that incorporates future spikes in the estimate of the firing rate at a given time, is a little bit strange, because the brain doesn't know what future spikes are going to be. So if you're trying to describe things as the brain sees them, you might want to use a causal filter, such as this bottom one. On the other hand, spike rates are typically auto-correlated. So the spike rate at a given moment is going to be quite similar to a spike rate at the next moment.
So you can actually typically learn useful information by incorporating a larger window around the time. So it largely depends on the application. And one thing we can see here versus here-- so this Gaussian had a standard deviation of 100 milliseconds, and this rectangular window had a width of 100 milliseconds.
And of course, with a standard deviation of 100 milliseconds, the overall kernel is broader than that. And you can see some of the details, like these two spikes here, are distinct. Whereas here, it's just sort of very low [INAUDIBLE].
So you do lose temporal precision, depending on the width of your Gaussian. So that parameter is another thing that's going to depend on what exactly you want to use your firing rate estimate for. So I'm not sure if that answered your question.
AUDIENCE: Yeah, it seems like a lot of choices.
JED SINGER: There are a lot of choices. And it's often a question of trial and error, figuring out what works best for whatever you're trying to accomplish. And we'll get a little bit into what things one might try to accomplish with firing rate a little bit later.
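A minimal sketch of fitting a Gaussian tuning curve like the V1 orientation example above (assuming NumPy and SciPy; the orientations and rates below are invented for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical mean firing rates (Hz) measured at different bar orientations (degrees)
orientation = np.array([-60, -40, -20, 0, 20, 40, 60], dtype=float)
rate = np.array([4.0, 10.0, 32.0, 51.0, 35.0, 12.0, 5.0])

def gaussian_tuning(s, r_max, s_pref, sigma, baseline):
    """Firing rate as a Gaussian function of the stimulus (here, orientation)."""
    return baseline + r_max * np.exp(-(s - s_pref)**2 / (2 * sigma**2))

params, _ = curve_fit(gaussian_tuning, orientation, rate, p0=[50, 0, 20, 5])
r_max, s_pref, sigma, baseline = params
print(f"preferred orientation ~{s_pref:.1f} deg, tuning width ~{sigma:.1f} deg")
```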
So another example of Gaussian tuning curves comes from a little higher up in the visual hierarchy. So this comes from an experiment by Pasupathy and Connor recording cells in monkey V4. And they used a stimulus set that was parameterized along a few different dimensions. Two of them shown here are the orientation of these little arc shapes within the receptive field, and the curvature of the end of the arc. So here, the curvature is very high, and as you get down here, the curvature is very low.
And here, they've plotted firing rates, or normalized firing rate, as a function both of curvature and angular position. So this is actually a two-dimensional Gaussian. And you can throw in more dimensions if your stimuli have more dimensions. So the limit is basically your ability to handle the mathematics and the computational problem that you have at hand, as well as the density with which you cover the stimulus space.
So one thing suggested by this angular position-- and actually, in fact, suggested by this angle here-- is that some stimulus dimensions are cyclic rather than linear. So in some cases, it may make more sense to fit a sinusoidal tuning curve rather than a Gaussian. So here's an example from monkey primary motor cortex, where the monkey was moving his arms in eight different directions.
And you can see that for some directions of movement, there's a lot of firing. And in other directions, there was little firing or even suppression. And so the firing rate was plotted as a function of movement direction. And you can see 0 and 360 degrees are the same direction. And the sinusoid fits quite nicely.
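A similarly minimal sketch of fitting a cyclic, cosine-shaped tuning curve to direction data (assuming SciPy's curve_fit; the eight firing rates below are invented for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical firing rates (Hz) for arm movements in eight directions (degrees)
direction = np.arange(0, 360, 45, dtype=float)
rate = np.array([28.0, 44.0, 55.0, 47.0, 30.0, 14.0, 6.0, 13.0])

def cosine_tuning(theta_deg, r0, r_mod, theta_pref_deg):
    """One cycle of rate modulation per cycle of the (circular) stimulus."""
    return r0 + r_mod * np.cos(np.deg2rad(theta_deg - theta_pref_deg))

params, _ = curve_fit(cosine_tuning, direction, rate, p0=[30, 20, 90])
print("preferred direction ~%.0f degrees" % (params[2] % 360))
```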
Here, this is modeled as just one cycle per cycle of the stimulus. That need not always be the case. So this comes from an experiment in which recordings were made in monkey IT in response to these sort of abstract paper clip symbols. And this particular cell responded with a peak at negative 120 degrees rotation, and then also another peak at 60 degrees rotation, sort of a quasi mirror image, which is something we see pretty often. And the curve fitted here is not exactly a sinusoid, but you can see how a sinusoid might do a reasonable job. Yeah?
AUDIENCE: Have they [INAUDIBLE] experiments? Like how fixed is the spike rate for a given stimulus instance? Is it actually like at 64 Hertz? Like if you put a speaker on it, you would hear like a buzz 64 Hertz?
JED SINGER: It would sound more like [RASPBERRY]. But yeah, neurons in monkey IT can fire up to a couple 100 Hertz.
AUDIENCE: I guess is there any difference in variation? If you were to plot the spike rate over time, it would be like a plot line?
JED SINGER: So again, there's tremendous variability in the firing rate profile. The eraser doesn't work so well. So one very common thing that you see is an initial transient followed by some other stuff happening. And that other stuff may be a sustained period elevated beyond baseline, may be a dip in firing rate. Other times, there's more of a ramping up in response that is more sustained. Sometimes there's just a burst of firing and then nothing. The variability is tremendous.
AUDIENCE: [INAUDIBLE] some the conditions are-- you might have the spike rate after some time after presenting the stimulus maybe, for this plot, for example.
JED SINGER: Sorry, I missed the first few words you said.
AUDIENCE: You measure that this is the spike rate after some time.
JED SINGER: Right, yes. So all of these examples are looking at spike rate just as the count of spikes divided by window length for some window after stimulus onset. The particular window varies by experiment and by brain region. Later on, maybe in 45 minutes or so, we'll look into how to use individual spikes to better understand the characteristics of response. But much of the coding and decoding work that we're talking about today will involve just spike counts. Yeah.
AUDIENCE: So neurons have a base firing rate. So where would that be? So going below a certain level means something. It's not just when it spikes up. It's spiking--
JED SINGER: Right, so many neurons have a baseline firing rate. And they can be inhibited, as you see in this example. So over here, things are clicking along at the baseline rate. And then an unpreferred stimulus comes along and actually suppresses firing rate. So yes, that can be important. There are also neurons whose baseline rate is, for all practical purposes, zero, and only respond to presented stimulus. Yeah, so depending on how you're characterizing the neural response, that may or may not be useful or obvious. But it's definitely something to be aware of.
So this slide may look familiar. This is a nice study. This is just showing an example-- in the context of this lecture-- this is just showing an example of another type of tuning curve. And as Gabriel mentioned, this is actually a behavioral tuning curve, rather than a parametric tuning curve. But it's a sigmoidal tuning curve.
And these are useful in cases where behavior, or neural firing rate, starts off at some baseline level and then increases through a dynamic period or dynamic range, and then plateaus at some peak level. So I won't get into the experiment itself, because you've already heard about it. But basically, the behavioral data are well fit by these sigmoid functions.
So that was characterization of firing rates as [? scalars ?] as a function of stimuli. So that leads to a question. Well, what are good stimuli? How do we elicit high firing rates? I guess I'm going to present it as two, but it's really one nice technique for figuring that out, which is using the spike-triggered average, which turns out to be essentially the same as reverse correlation.
So the intuition for the spike triggered average denoted here as C, is the following. Say here's a spike train, a very simple train. There are three spikes shown-- one, two, three. And this is the stimulus.
In particular, this is a one-dimensional stimulus. What it is isn't so important. It could be contrast. It could be velocity, could be size-- something that varies along one dimension. And at these three time points, this rapidly changing stimulus elicited three spikes.
So what you do is you take some window, and you look at the stimulus in that window leading up to each spike. And you simply take the average of all those windows. So this window plus this window plus this window. And if you do that for these three spikes plus many other spikes that aren't shown, you might arrive at a spike-triggered average that looks something like this.
And what this tells us is that as the time preceding a spike gets closer and closer to that spike-- immediately prior to a spike-- there tends to be a strong dip in the stimulus. And presumably, that might have something to do with the elicitation of that spike. So this is just the formulation of what I described heuristically here. And C here, again, is the spike-triggered average. And tau is the time before the spike.
AUDIENCE: How do you interpret the stimulus before a spike?
JED SINGER: So the interpretation here for the spike-triggered average is that this is the kind of stimulus that best elicits a spike. Now one caveat to that is, you can only evaluate that for stimuli that are tested. So doing this properly involves covering your entire range of stimuli evenly and comprehensively. But if you've done that, then this average should approach, under a few assumptions such as linearity, the optimal stimulus to elicit a spike.
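A minimal sketch of computing a spike-triggered average (assuming NumPy; both the white-noise stimulus and the simulated cell below are hypothetical stand-ins for real data):

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.001
stimulus = rng.standard_normal(30_000)     # hypothetical 1-D white-noise stimulus, 30 s at 1 kHz

# Hypothetical cell: more likely to spike when the recent stimulus has dipped (illustration only)
drive = -np.convolve(stimulus, np.exp(-np.arange(50) / 20.0), mode='full')[:len(stimulus)]
p_spike = np.clip(0.02 * drive, 0.0, 1.0)
spike_bins = np.nonzero(rng.random(len(stimulus)) < p_spike)[0]

# Spike-triggered average: the mean stimulus in a window leading up to each spike
window = 300                               # look back 300 ms before each spike
valid = spike_bins[spike_bins >= window]
sta = np.mean([stimulus[i - window:i] for i in valid], axis=0)
# sta[-1] is the average stimulus just before a spike (here, a dip); sta[0] is 300 ms earlier
```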
So another way of looking at the spike-triggered average ends up being called reverse correlation. So first of all, let's just define the correlation between stimulus and the response, also known as the rate stimulus correlation. So this is just the correlation between the firing rate at time t and stimulus at time t shifted by some offset tau.
And it turns out the spike-triggered average is functionally equivalent to reverse correlation-- so-called reverse correlation because of the negative sign here. So here's an example in which an electrical current was applied to a neuron whose responses are shown here in the lateral line system of the weakly electric fish, which picks up electrical fields. And you can see that there's some action potentials fired.
And again, by calculating the reverse correlation, they were able to infer what the optimum stimulus was to elicit a spike from this cell. And you can see that there's some positive voltage, followed immediately before a spike by a sharp negative voltage. And as I sort of alluded to a moment ago, reverse correlation, spike-triggered average is most useful when your set of stimuli actually covers all possible stimuli. Yeah.
AUDIENCE: Do you have a sense of how gracefully this fails when-- it's probably [INAUDIBLE] non-linear, too. That's not such a very useful thing to do. So do you have a sense of how gracefully this fails when things are non-linear? And can you get a sense of what [INAUDIBLE]?
JED SINGER: So this is much more sensitive to the parameters of your stimulus set than to the details of neural response. But it can be sensitive to that as well. And soon, I'll show you at least one basic technique for overcoming that. Yes?
AUDIENCE: So it seems that you've shown examples of where this works on one-dimensional stimuli. Is there any way to generalize this, say to images, to two dimensions?
JED SINGER: Yes, and I'll get to that soon. It's very straightforward. But another thing that's useful for reverse correlation is for the stimuli to be uncorrelated in time. So this stimulus, for example, is not uncorrelated in time. It's not a white noise stimulus.
But if it is a white noise stimulus, then calculations get easier. And I'll come to what I mean by that later. So you can also use reverse correlation to probe temporal dependencies in spiking.
So here's an example in which reverse correlation was used to find the mean stimulus leading up to one spike, and the mean stimulus leading up to two spikes separated by approximately 10 milliseconds. And you can see that these average stimuli are quite similar. But if you start to look at the average stimulus leading up to pairs of spikes separated by approximately five milliseconds, then the average stimulus leading up that looks rather different.
And so this tells you that while seeing a couple of spikes separated by a fair amount of time may not mean much, if you see two spikes very close together, that's probably an indication that there was a much higher value of the stimulus. And you can get as fancy as you want with your triggers for reverse correlation. So yes, reverse correlation works beyond one-dimensional stimuli. And it also works beyond spikes.
So this is an experiment in which the triggers for reverse correlation were not neural spikes, but were actual behavioral responses of people looking at these two-dimensional arrays of visual white noise. And subjects were asked to press a button when they saw an S in that white noise. And they did this many, many times.
And here, this little sub-panel A shows what's called the classification image, the average stimulus that elicited a report of an S. And you can maybe see very faintly an actual S here. And if you blur the stimulus and then threshold it, you can see, maybe a little more clearly, the outline of an S. And it matches rather well an actual S from a database of images of S's. So it's a very powerful and very widely applicable technique. Yeah.
AUDIENCE: So let's say you want to record a preferred stimulus on a V1 neuron. [INAUDIBLE] neurons that like [INAUDIBLE] four or five specific phase relations [INAUDIBLE]. Then there are these complex neurons that like a certain orientation or frequency, but don't care about the specific phase. So with reverse correlation in [INAUDIBLE], would one be able to recover a Gaussian that is [? familiar ?], like positive here and negative here and the other way around?
JED SINGER: It depends on how you construct your set of test stimuli. So if you used two-dimensional spatial white noise such as this, then you should be able to recover a Gabor filter. It's not clear to me that you would be able to recover anything from a complex cell.
However, if you instead used Gabor patches of varying orientations and frequencies and phases, then your complex cell-- actually phases might confuse things. But if you used varying frequencies and orientations, then reverse correlation based on a complex cell's firing rate should give you that cell's preferred orientation and preferred spatial frequency.
AUDIENCE: So depends a lot on how you parameterize the input space.
JED SINGER: Yes, absolutely. Absolutely. So yeah, it's much more likely to work if you know something about the sort of stimuli that cell might prefer, rather than just throwing everything at it. Because everything is a very large space.
OK, so up until now, we've talked a little bit about how to describe what cells do and what they like. Now in many cases, you may want to be able to generate your own spike trains, either for testing some software, for building a model, or for perhaps even running some theoretical analyses on what networks of neurons may do.
So a little bit of background first-- a point process is a stochastic process that generates a sequence of events. And by a stochastic process, I mean a set of random variables indexed by time. A renewal process is a particular kind of point process, so called because the probability of an event is uninfluenced by anything that happened prior to the previous event. So if you're trying to figure out whether to have an event, you'll [INAUDIBLE] only the most recent event. And that guides you.
And a Poisson process is a point process-- in fact, a renewal process-- that has no dependence on any previous events. So in the context of this course, these point processes are going to be spike trains. And whether or not you fire an action potential is, in the case of a Poisson process, going to have no dependence on what happened previously, or, in the case of a renewal process, may depend on the time of the previous action potential.
So this is what a Poisson distribution looks like. The probability of seeing n spikes within a time window of length T is given by this, where r is the one parameter of the Poisson process-- the rate parameter, or mean firing rate. And here are just a couple of examples of what a Poisson process might give you.
So if this is the rate times the time window-- in other words, the expected number of events, of spikes-- plotted here is the probability of seeing n spikes. For n equals 0, the probability of seeing zero spikes when your expected number of spikes is 0 is 1. And as your expected number of spikes increases, the probability of seeing zero spikes drops off quite quickly. The probability of seeing one spike when your expected number of spikes is 0 is 0. But the probability of seeing one spike climbs quite quickly, peaking at around one expected spike before dropping off. And as the number of spikes increases, this distribution flattens out and drifts to the right.
A slightly different way of looking at this distribution, if you plot the number of spikes against the probability of seeing that number of spikes, given that the rate parameter is 10, the dots show the Poisson values for many different numbers of spikes actually observed. And the lines show the Gaussian fit, which is quite a good fit.
AUDIENCE: So I don't understand-- how can the probability be one when n is 0?
JED SINGER: You're talking about here? So if your rate parameter times time is 0, that means that either your rate is 0, so you'll never find any spikes, or your time window is 0, so it's an infinitely small window which has room for no spikes, so in this case you'll always see zero spikes.
And the solid curve traces out the probability for seeing zero spikes. And probability for seeing zero spikes when you expect to see zero spikes is 1. Does that make sense?
So this solid curve is at 1, where rT is 0. And all of the other curves, corresponding to one, two, or five spikes, are at 0 when rT is at 0. So when rT is 0, you'll always see zero spikes.
So probability of n equal to 0 is 1. The probability for n greater than 0, for n equals 1, for n equals 2, n equals 5, for n equals anything, is 0. Does that make sense? It's a little bit of a complex figure.
OK, so just a few properties of the homogeneous Poisson process-- and by homogeneous Poisson process, I mean the rate parameter is constant. There are also inhomogeneous Poisson processes in which the rate parameter can vary with time. And those will be useful for modeling firing rates that change.
But just for starters, for a homogeneous Poisson process, the mean interspike interval, tau, the expected value of tau, is 1 over the rate. This makes sense. Spikes are temporally uncorrelated with each other. If you expect to have 10 spikes in some unit of time, you expect to have that unit of time divided by 10 between spikes, on average.
The variance of the interspike interval is 1 over r squared. And again, this falls out quite naturally from the definition. And the coefficient of variation, which is the standard deviation divided by the mean of interspike intervals, is 1.
Now, you can also look at the Fano factor, which is the variance divided by the mean. And if you do that, not for interspike intervals, but for spike counts within a window for a homogeneous Poisson process, that's also going to be 1. In other words, for a Poisson process, the variance in the spike counts is equal to the mean of the spike counts.
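A minimal sketch of generating a homogeneous Poisson spike train and checking these two properties numerically (assuming NumPy; the rate, duration, and counting window are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
r = 20.0                  # rate parameter, spikes per second
T = 200.0                 # total duration, seconds

# Draw interspike intervals from an exponential distribution (homogeneous Poisson process)
isis = rng.exponential(1.0 / r, size=int(2 * r * T))
spike_times = np.cumsum(isis)
spike_times = spike_times[spike_times < T]

isi = np.diff(spike_times)
cv = isi.std() / isi.mean()                      # coefficient of variation, should be close to 1

counts = np.histogram(spike_times, np.arange(0, T + 0.25, 0.25))[0]   # counts in 250 ms windows
fano = counts.var() / counts.mean()              # Fano factor, should also be close to 1
print(f"CV of ISIs ~ {cv:.2f}, Fano factor ~ {fano:.2f}")
```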
So how well does a Poisson process actually describe neural data? Usually pretty well. So this is an example from a large number of cells recorded in area MT of a monkey. And here we have the mean firing rate and the variance in firing rate plotted against each other. It's actually spike counts over about a quarter of a second.
And the diagonal line shows where variance and mean are equal. And you can see that the actual physiological data fall roughly along this diagonal line. So it suggests that a Poisson process is a pretty good description. You can also fit a couple of parameters.
So you can look at variance of a multiplier of the mean raised to some exponent. If you fit these two parameters to this data set, you see that both the multiplier and the exponent stay quite close to 1 regardless of the time window in which you count spikes. So whether you count spikes within a few milliseconds or many hundreds of milliseconds, it looks fairly close to Poisson.
One point of deviation between a Poisson process and actual neurons is that neurons are refractory. So what this means is that if a spike is fired, it's extremely unlikely or impossible for that neuron to fire another spike within some short period of time, a few milliseconds, typically. So here we have actual neural data, again from monkey MT, showing the interspike interval plotted in a histogram.
And you can see that the majority of interspike intervals are relatively small. But at the smallest possible interspike intervals, there are actually no examples. In contrast, a pure Poisson process, this would look like an exponential [INAUDIBLE].
So one thing that you can do is just explicitly-- if you're generating a spike train using a Poisson process based on [INAUDIBLE] parameter, you can explicitly add a refractory period. So if a spike occurs too soon after an earlier spike, you can either shift it later or remove it altogether. If you shift it, you preserve the firing rate at the expense of a little bit of the pure Poisson nature. If you remove it altogether, you're going to lower your firing rate a bit. There are more sophisticated ways of keeping both of those, but I won't get into that.
So this is a spike train generated from a Poisson model with stochastic refractory period added. And you can see that it quite nicely recapitulates the [INAUDIBLE]. Another thing that you can do is, instead of drawing your interspike intervals from a Poisson distribution, you can draw them from a gamma distribution. So that adds some factor k here.
And when k equals 0, this is exactly a Poisson distribution. When k is greater than 0, you see this refractory period arising. So this is sort of a common way to model a spike train if the underlying cellular level biophysics aren't that important to you-- you just want something to reflect spikes with some firing rate, possibly a time-variant firing rate.
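A minimal sketch of both options just described-- adding a stochastic refractory period to a Poisson train, and drawing interspike intervals from a gamma distribution instead (assuming NumPy; the rates and refractory times are arbitrary, illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
r = 40.0          # target mean rate, Hz
T = 60.0          # duration, seconds

# Option 1: Poisson process with an explicit stochastic refractory period --
# drop any spike that follows its predecessor by less than a random dead time.
t = np.cumsum(rng.exponential(1.0 / r, size=int(3 * r * T)))
t = t[t < T]
kept, last = [], -np.inf
for ti in t:
    if ti - last > rng.uniform(0.001, 0.004):     # 1-4 ms refractory period (illustrative)
        kept.append(ti)
        last = ti
poisson_refractory = np.array(kept)

# Option 2: draw interspike intervals from a gamma distribution. Shape k + 1 with k = 0
# recovers the exponential (Poisson) case; larger k suppresses very short intervals.
k = 1
isis = rng.gamma(shape=k + 1, scale=1.0 / ((k + 1) * r), size=int(2 * r * T))   # mean ISI stays 1/r
gamma_train = np.cumsum(isis)
gamma_train = gamma_train[gamma_train < T]
```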
So next, I'm going to take a sort of a hard left turn, and talk a little bit about how we think about information being measured, both across time and across multiple neurons. Since brains operate in real time and contain many neurons, sometimes it's useful to move beyond single neurons and stationary firing rates. One very basic way to get at questions of timing is to look at autocorrelation of a spike train or firing rate time course.
So what's typically called an autocorrelation would perhaps more properly be called an autocovariance, because you use mean-subtracted spike trains. But you are just taking the expected value of the product of the spike train with itself, shifted by varying time lags. And here you see the autocorrelation functions of two different neurons in cat V1.
And at time lag tau 0, there's a strong peak. That makes sense. You're comparing a spike train against that same spike train shifted by 0 milliseconds. It's itself. So there's always going to be a peak at 0.
But what you also see, particularly in this bottom autocorrelogram, is this strong oscillation. What this means is that the particular spike train in question has strong correlations with itself 25 milliseconds later and before, and 50 milliseconds later and before, and 75 milliseconds later and before. Another way of thinking about this is the firing rate is oscillating at 40 Hertz. It's a 25 millisecond period.
And again, that's happening to a lesser degree in this other cell. You can also compare the firing rate trajectories, the spiking profiles, of two different cells in what's called a cross-correlation or cross-correlogram, by replacing rho and rho with rho 1 and rho 2-- two different spike trains. So if you compare these two spike trains whose auto-correlograms are shown here with each other, the cross-correlogram looks like this.
And so you can see that not only are these two cells exhibiting an oscillation in the firing rate, both at the same frequency, but they're actually in the same phase. Because the peak here is at a time lag of 0. So this gives a very basic way of comparing time courses of two different cells.
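A minimal sketch of computing auto- and cross-correlograms from binned, mean-subtracted spike trains (assuming NumPy; the two 40 Hz modulated trains are simulated stand-ins, not the cat V1 data on the slide):

```python
import numpy as np

def correlogram(x, y, max_lag_bins):
    """Auto/cross-correlation of two binned, mean-subtracted spike trains.

    x, y : spike counts on the same time grid (pass x = y for an autocorrelogram).
    Returns the lags and the average product <x(t) * y(t + lag)> at each lag.
    """
    x = x - x.mean()
    y = y - y.mean()
    lags = np.arange(-max_lag_bins, max_lag_bins + 1)
    out = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            out[i] = np.mean(x[:len(x) - lag] * y[lag:])
        else:
            out[i] = np.mean(x[-lag:] * y[:len(y) + lag])
    return lags, out

# Example: two hypothetical trains sharing a 40 Hz (25 ms period) rate modulation, 1 ms bins
rng = np.random.default_rng(4)
t = np.arange(0, 10.0, 0.001)
rate = 20 * (1 + np.sin(2 * np.pi * 40 * t))
train1 = (rng.random(len(t)) < rate * 0.001).astype(float)
train2 = (rng.random(len(t)) < rate * 0.001).astype(float)

lags, auto1 = correlogram(train1, train1, 100)    # peaks roughly every 25 ms
lags, cross = correlogram(train1, train2, 100)    # peak at zero lag: same frequency, same phase
```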
Now whether these kinds of things are important or not has been discussed a whole lot. And you can coarsely divide studies of neural coding into two paradigms. And I'm going to do this very crudely at first, and then try to get a little bit into the nuances.
But if you're looking at a single neuron, and you're looking at its activity over time, an independent spike code holds that the only thing that matters in terms of the information being carried by this neuron is the firing rate of this neuron as it changes over time. The specific moment at which any given spike happens is not important. That's just an epiphenomenon, these fluctuations in the firing rate.
Alternatively, it may be that correlation between spike times carry additional information. So a burst of spikes may carry special importance. Or oscillatory activity may carry special importance.
Now ultimately, any given pattern of firing can be represented to any arbitrary degree of specificity by a time-varying rate code or independent spike code whose rate varies quickly enough. So this sort of distinction breaks down often. And this becomes a question of well, what sort of time course are we talking about in terms of how quickly does the underlying rate vary?
And then also, people who would argue for an independent spike code wouldn't say oscillations don't happen, or bursts don't happen, but rather that we can get the majority of the useful information merely from looking at changes in firing rates, rather than looking at the individual spikes. There's a similar dichotomy if you look at multiple neurons and look across them. So the independent neuron code holds that for action potentials between two neurons, or multiple neurons in a [? circulatory ?] population, the specific relationships of those action potentials across neurons don't carry any additional information. Again, it's just the firing rate trajectories of individual neurons.
Alternatively, there may be some extra information present in neural synchrony if two neurons fire spikes at the same time, or if they fire spikes reliably one after the other, with some fixed latency. Or if they're fluctuating at a similar or the same frequency. So these sort of issues are very fundamental, verge on the philosophical. Yeah.
AUDIENCE: Well, you know they're not independent. I mean, one fires, one [INAUDIBLE] fire also, so you know--
JED SINGER: Right, so I should be careful there. If you have, just to take a very good example, if you have one cell that has an axon with one synapse onto another cell, and that synapse is 100% effective-- so whenever this cell fires, this cell fires-- those spikes are correlated very tightly. But there's no additional information present in that correlation. So the fact that this cell fired and three milliseconds later this cell fired tells you nothing more than just the fact that this cell fired. Does that make sense?
OK, so with that brief detour over, we're going to return to reverse correlation. And use a reverse correlation to estimate or predict how a neuron might respond to an arbitrary stimulus. So the basic problem is, we want to estimate the firing rate as a function of time. So this is a varying firing rate.
So we're going to do that by assuming that there's some baseline firing rate. And then there's some kernel that's being convolved with the stimulus. Now, what is that kernel? Well, it's something that we have to estimate.
In order to estimate it, we're going to minimize the error. In other words, we're going to minimize the difference between the estimate and the actual firing rate. And in order to minimize this, you can set the derivative of this to 0. And when your stimulus is white-- in other words, when there's no temporal autocorrelation in your stimulus-- solving that becomes fairly straightforward. And it turns out that your kernel, D of tau, is the spike-triggered average times the mean rate divided by the variance.
When s is not white, things get a little hairier. At the very end of this, I'll briefly touch on a technique for dealing with that case. But since the stimulus is under your control, if this is what you want to do, it's definitely easier to use a temporally uncorrelated stimulus.
Now earlier, somebody asked about well, what if the firing rate is a nonlinear function of the stimulus? So you can still estimate the kernel as before. And then this convolution of kernel with the stimulus gives you what we'll call L of t, the linear part of the response. And you can then simply add some nonlinearity by taking some function F of that linear part, and adding that to the baseline firing rate.
So there are a couple of different ways you might decide what F should be. One is if you're building a model, and you can say, well, I want neurons that don't fire until some threshold is crossed, and then they increase their firing rate linearly, or something similar. The other is to actually plot observed firing rate against L of t, and you might see something that looks like this, or looks like this, or looks like this.
And then you can say, oh, well, that looks like this sort of nonlinearity. Let's fit a rectified linear function, or a sigmoid, or a rectified hyperbolic tangent, to the data and use that as our nonlinearity. So these are just a few examples of things that are commonly seen physiologically.
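A minimal sketch of that whole linear-nonlinear recipe-- estimating the kernel from white-noise stimulation and then reading off an empirical nonlinearity (assuming NumPy; the simulated cell and all of its parameters are hypothetical, chosen only so the code runs end to end):

```python
import numpy as np

rng = np.random.default_rng(5)
dt = 0.001
s = rng.standard_normal(200_000)                  # white-noise stimulus, 200 s at 1 kHz

# --- simulate a hypothetical cell: rectified linear response to a filtered stimulus ---
true_kernel = np.exp(-np.arange(100) / 30.0) * np.sin(np.arange(100) / 10.0)
drive = np.convolve(s, true_kernel, mode='full')[:len(s)]
rate = np.clip(5.0 + 4.0 * drive, 0, None)        # spikes/s, rectified at zero
spikes = rng.random(len(s)) < rate * dt

# --- estimate the kernel: D(tau) = <r> * C(tau) / sigma_s^2 in the white-noise case ---
idx = np.nonzero(spikes)[0]
idx = idx[idx >= 100]
sta = np.mean([s[i - 100:i][::-1] for i in idx], axis=0)   # C(tau), tau = time before a spike
mean_rate = spikes.sum() / (len(s) * dt)
D = mean_rate * sta / s.var()

# --- linear part L(t), then read off the static nonlinearity F empirically ---
L = np.convolve(s, D, mode='full')[:len(s)] * dt
edges = np.quantile(L, np.linspace(0, 1, 21))
bin_id = np.digitize(L, edges[1:-1])
observed_rate = np.array([spikes[bin_id == b].mean() / dt for b in range(20)])
# observed_rate plotted against the bin centers of L is the empirical nonlinearity;
# for this simulated cell it should look roughly like a rectified line
```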
So up until now, we've been talking about how to measure and describe firing rates as a function of stimulus. And the second part of this talk is sort of the reverse question of say we have some firing rates, and we want to know what stimulus is likely to have elicited these firing rates-- in other words, decoding. And at some level, this is what the brain does. There are neurons firing there, and somehow, based on the activity of those neurons, we're somehow understanding what's out there.
So first a few basics. So we've got a vector, r. This is just a vector of firing rates for most of the rest of the talk. And in particular, these are stationary firing rates, so this is just a vector of scalars.
This is the response that we're working-- we're trying to go from this to some estimate of what the stimulus was that elicited this. So we've got P of s. s is the stimulus. P of s is the probability of stimulus s being presented. This is also known as the stimulus prior.
P of r is the probability of response r being observed, the response prior. P of r and s is the probability that s is presented and response r is observed. This is called the joint probability between r and s. Here we have P of r given s, the conditional probability that response r was observed given that stimulus s was presented. And P of s given r, the conditional probability that stimulus s was presented given the response r that was observed.
A little bit of probability arithmetic-- the prior probability of a response is equal to the sum over all stimuli of the probability of each stimulus times the conditional probability of the response given that stimulus. So if there's a particular stimulus that's very common-- now, in laboratory experiments, you generally want your priors on stimuli to be uniform, but in the real world they're not-- so P of s, where s is a face, might be relatively high.
P of s where s is a unicorn might be much lower. And if you want to know what's the probability of a given response, well, the probability of that response given a unicorn is going to be weighted much less than the probability of that response given a face. Similarly, P of s is equal to the sum over all possible responses of P of s given r times P of r, and so on for the corresponding [INAUDIBLE].
Up until now, we've been considering how to estimate P of r given s. We know what the stimulus is. We want to know what's the response likely to be.
Now we want to talk about P of s given r. And given this, this, and this, we can use Bayes' theorem. And we find the probability of s given r is equal to P of r given s times P of s divided by P of r. And we'll get into a little bit more about how to use this in a moment.
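A minimal numerical sketch of that Bayes' theorem arithmetic on a made-up discrete example (assuming NumPy; the stimuli, priors, and conditional probabilities are all invented for illustration):

```python
import numpy as np

# Hypothetical discrete setup: 3 stimuli, 4 possible (coarsely binned) responses
stimuli = ["face", "laptop", "unicorn"]
p_s = np.array([0.50, 0.49, 0.01])                 # stimulus prior P(s)

# P(r | s): each row is the response distribution for one stimulus (rows sum to 1)
p_r_given_s = np.array([
    [0.05, 0.15, 0.30, 0.50],                      # face
    [0.40, 0.35, 0.15, 0.10],                      # laptop
    [0.05, 0.10, 0.35, 0.50],                      # unicorn (responses resemble those to faces)
])

p_r = p_s @ p_r_given_s                            # P(r) = sum over s of P(s) P(r|s)

observed_r = 3                                     # suppose we observed the largest response
p_s_given_r = p_r_given_s[:, observed_r] * p_s / p_r[observed_r]   # Bayes' theorem
for name, p in zip(stimuli, p_s_given_r):
    print(f"P({name} | r) = {p:.3f}")
# The prior keeps the unicorn improbable even though P(r | unicorn) is high.
```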
In particular, we'll do that in the context of two-way decoding. So perhaps the most basic kind of decoding task is a two-way discrimination task. So here's one classic example.
Your goal, or the monkey's goal in this case, is to indicate which of two possible directions an array of moving dots is predominantly moving in. In some cases, every single dot is moving with the same velocity. The task is then very easy.
In other cases, the dots have 0% coherence. They're all moving with random velocities. The task is actually impossible.
In between, you can have any level of coherence. Here at 50% coherence, half of the dots are moving at the same velocity-- one of two possibilities in the experiment-- and the other half are moving at random velocities. So this experiment was run upwards of 30 years ago. And monkeys performed it as a behavioral task, indicating which of two directions the dots were moving, while neurons in monkey MT were being recorded.
So in a given experiment with a given neuron, one of the two directions of motion corresponded to that particular neuron's preferred direction-- in other words, the direction that elicited the maximum firing rate, presumably established using some of the techniques that we just talked about. And the other direction was the opposite direction, also called the anti-preferred direction.
So how did the monkeys do? Here we have plotted on the x-axis the percent coherence. On the y-axis, we have the fraction correct. These black dots show the monkey's behavior.
And there's actually a typo here. This should be 0.4. This is 0.5.
So at very low coherence, below 1% coherence, the monkeys' performance was at chance. And by about 10% coherence, the monkeys' behavior was at ceiling-- 100% correct. And in between, you see this nice, sigmoid, psychometric curve.
Based on the neural data, we see a very similar curve. The neuron was very poor at predicting which direction the dots were moving when they were moving with very low coherence. And it was very good at predicting which direction the dots were moving when they were moving with relatively high coherence.
So how is this neurometric tuning curve created? So bear in mind that the dots were tuned to align either with the neuron's preferred direction of motion or its anti-preferred direction of motion. Here we have histograms as a function of firing rate, showing how frequently the neuron responded with a given firing rate when the dots were moving in the anti-preferred direction versus when they were moving in the preferred direction.
And you can see, with 12.8% coherence, these two distributions overlap almost not at all. When coherence drops to 3.2%, the two distributions firing rate in response to anti-preferred direction versus firing rate in response to preferred direction overlap a fair amount. And at a 0.8% coherence, they overlap almost entirely.
So one thing that you can do is just draw a line between these two distributions. And say, all right, if the firing rate is less than this line, less than this threshold, we'll say that things are moving in the anti-preferred direction. If the firing rate is to the right of the threshold, we'll say that things are moving in the preferred direction.
Your ability to discriminate between these two distributions in such a manner is quantified by D prime. And D prime is very easy to calculate as long as the two distributions are Gaussian and have the same variance. When that's true, D prime is just the mean rate in the positive condition minus the mean rate in the negative condition, divided by the standard deviation.
So in other words, how far apart are these two peaks in standard deviations?
AUDIENCE: [INAUDIBLE]?
JED SINGER: Just the distribution of firing rate for, in this case, the anti-preferred direction and the preferred direction. Does that make sense?
AUDIENCE: [INAUDIBLE] distribution [INAUDIBLE]?
JED SINGER: Yes, so this is the histogram counting the number of trials. So each black bar here shows the number of trials in which the corresponding firing rate on the x-axis was seen. So in the anti-preferred direction, you saw firing rates of, let's say, five Hertz in about eight trials. And you saw a firing rate of 11 Hertz in about 17 trials, and so on. So, yeah, it's just that kind of distribution. Does that make sense? So we're using those firing rates to predict which of the two stimuli was shown for one. And this extends to multiple runs in a relatively straightforward fashion, but this is one.
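A minimal sketch of D prime and threshold-based two-way decoding on simulated firing rates (assuming NumPy; the two Gaussian rate distributions are invented for illustration, not the MT data on the slide):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical per-trial firing rates (Hz) for the two motion directions
rate_anti = rng.normal(12.0, 4.0, 200)      # anti-preferred direction
rate_pref = rng.normal(22.0, 4.0, 200)      # preferred direction

# D prime assumes the two distributions are Gaussian with equal variance
pooled_sd = np.sqrt(0.5 * (rate_pref.var() + rate_anti.var()))
d_prime = (rate_pref.mean() - rate_anti.mean()) / pooled_sd

# Decode each trial with a single threshold halfway between the two means
threshold = 0.5 * (rate_pref.mean() + rate_anti.mean())
correct = np.concatenate([rate_pref > threshold, rate_anti <= threshold])
print(f"d' ~ {d_prime:.2f}, fraction correct ~ {correct.mean():.2f}")
```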
So D prime is a statistic that comes to us from signal detection theory, which I believe originally came from people trying to detect radio signals in the midst of noise, and then say whether they're signal or not. Another thing that comes to us from signal detection theory is this notion of the receiver operating characteristic curve. And to make an ROC curve, you plot beta against alpha, where beta is the hit rate of the test-- in other words, the fraction of trials in which the condition is positive, and you say that the condition is positive.
Alpha is the false alarm rate of the test, in other words, the fraction of trials in which you say that the condition is positive, but the condition is actually negative. So in the context of this experiment, the hit rate is the fraction of trials in which the stimulus was shown in the preferred direction, and you say, correctly, "preferred." And the false alarm rate is the fraction of trials in which the stimulus is actually in the anti-preferred direction and you say, incorrectly, "preferred."
And you can plot this out as a function of that threshold-- in other words, as a function of where you draw this line: you draw it here, you draw it here, you draw it here. And this is done for many different coherence levels. At 0.8% coherence, this ROC curve follows the diagonal. In other words, your probability of getting a hit is equal to your probability of getting a false alarm. And this might be clear, but when you're in that regime, you don't really know anything about the stimulus. You can get more hits, but the cost is getting exactly as many more false alarms.
Alternatively, at 12.8% coherence, remember, you can do very well. So you can get beta, the hit rate, up to about 90% before you start getting any false alarms. And you can get your hit rate up to 100% at the cost of only about 10% false alarms.
Now in between these two extremes, there's a range. And you may choose, OK, I want to draw my threshold right here, and get, say, 97% hit rate and 2% false alarm rate. But still, even though these ROC curves give you a little more information than D prime, it still leaves some things up in the air as to where exactly the best place to draw your threshold is. And there are some heuristics that you can use that make sense.
So you can, for example, choose a threshold that maximizes overall performance. Or you can choose a threshold at the point where the two distributions intersect. Or you can choose to put the threshold halfway between the means of the two distributions.
But there is a way of getting rid of the question of threshold entirely, which we'll come to in a moment. So this is another example from the Kreiman lab showing the use of signal detection theory in a multi-class decoding scenario by basically converting that multi-class problem into a two-class problem. So this experiment's data were recorded from intracranial electrodes [INAUDIBLE], as she described, while those patients looked at images belonging to five categories.
Now for each electrode, one preferred category was chosen. And that category was considered in opposition to the other four categories. So in this example, the electrode preferred fruit and exhibited larger responses to fruit than to images in these other categories.
So here we see the distribution of intracranial field potential responses to fruits in black, and to non-fruits in gray. You can see that these two distributions are different, with the different means indicated by the dashed lines. But particularly, the non-fruit category is not Gaussian and is a different shape from the fruit category distribution. So calculating D prime here will be a little bit more complicated. But one can still calculate an ROC curve and see that we're well above the diagonal and thereby able to do quite a bit of decoding.
So if we want to get rid of that question of threshold, we can use what's called a two-alternative forced-choice task. And in this task, rather than presenting your subject with one stimulus and asking the subject to make a binary decision based on that stimulus, you present, in a given trial, two stimuli, one after the other, corresponding to the two different classes in your experiment. And the subject's task is to indicate which of the two stimuli corresponds to the positive class.
So going back to the direction of motion discrimination, you would present a stimulus moving in the preferred direction and a stimulus moving in the anti-preferred direction. And the monkey would have to say, well, the first stimulus was the one moving in the preferred direction. So in this case, the probability of being correct is just the integral across all possible thresholds of the probability that the firing rate is equal to the threshold times your hit rate at that threshold-- in other words, the area under the ROC curve.
Essentially what you're doing in a two-alternative forced-choice task is using one of the conditions as the threshold for the other condition. Now how well does this sort of thresholding do? It turns out there's a result that's mathematically provable, given some basic assumptions, that it's impossible to do better than using what's called the likelihood ratio, which is the probability of seeing a given response assuming the positive condition, divided by the probability of seeing a given response assuming the negative condition.
So if you calculate this likelihood ratio and use that, or some monotonically increasing function of the likelihood ratio, as your test statistic, it's possible to do provably as well as possible in this kind of two-class discrimination. And just as an aside, this likelihood ratio is equal to the slope of the ROC curve.
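A minimal sketch of sweeping the threshold to trace out an ROC curve, plus the area under it, which equals the two-alternative forced-choice probability correct (assuming NumPy; the rate distributions are the same kind of invented Gaussians as in the sketch above):

```python
import numpy as np

rng = np.random.default_rng(7)
rate_anti = rng.normal(12.0, 4.0, 500)    # hypothetical rates, anti-preferred direction
rate_pref = rng.normal(22.0, 4.0, 500)    # hypothetical rates, preferred direction

# Sweep the threshold z: beta(z) = hit rate, alpha(z) = false-alarm rate.
# For two equal-variance Gaussians, thresholding the likelihood ratio is equivalent to
# thresholding the firing rate itself, so this simple sweep is already optimal.
thresholds = np.linspace(0, 40, 200)
beta = np.array([(rate_pref > z).mean() for z in thresholds])
alpha = np.array([(rate_anti > z).mean() for z in thresholds])

# Area under the ROC curve ~ P(a random "preferred" trial out-fires a random "anti" trial),
# i.e. the probability correct in a two-alternative forced choice
auc = np.trapz(beta[::-1], alpha[::-1])
auc_direct = (rate_pref[:, None] > rate_anti[None, :]).mean()
print(f"AUC ~ {auc:.3f}, direct 2AFC estimate ~ {auc_direct:.3f}")
```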
OK, so we've talked about two-way decoding. And I mentioned briefly, yes?
AUDIENCE: I have a question. [INAUDIBLE] Poisson distribution the only distribution or is there [INAUDIBLE] one?
JED SINGER: I'm not sure, actually.
AUDIENCE: I'm not sure-- do you fit the data you get at one and you conclude it's a Poisson process, or it's the other way around?
JED SINGER: You conclude it's consistent with the Poisson process. So--
AUDIENCE: If it's not the only one, [INAUDIBLE] second one, it might not be [INAUDIBLE].
JED SINGER: I know there are other point processes whose coefficient of variation [INAUDIBLE] interval is one that are not Poisson.
AUDIENCE: [INAUDIBLE].
JED SINGER: So you can have other point processes with a Fano factor of 1 that are not Poisson processes, but they're also not [INAUDIBLE] processes. So let's talk a little bit more about optimality of decoding. So what is the best we can do? And what does that mean?
So we need some notion of best. And one way of establishing that is to come up with a loss function. So you say, all right, well, obviously I'd like to infer the stimulus perfectly correctly all the time. But say I'm not going to be able to do that. What's the penalty for being wrong?
So one common loss function is the squared error, which is the square of the difference between the actual stimulus and the inferred stimulus. If you accept that as your loss function, and getting back to that basic framework that I mentioned earlier, the expected loss is minimized by using as your estimate the mean of the posterior distribution. So this is the best you can do, assuming this kind of L2 error.
If your loss function is an absolute error, just the absolute value of the difference between the predicted stimulus value and the actual value, then the expected loss is instead minimized by the median. And for other loss functions, it's possible to calculate what the optimal way to infer the stimulus is.
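A minimal sketch checking those two facts numerically on a made-up posterior (assuming NumPy; the posterior shape is arbitrary):

```python
import numpy as np

# Hypothetical (skewed) posterior over a scalar stimulus, P(s | r), on a discrete grid
s = np.linspace(0, 10, 1001)
posterior = np.exp(-0.5 * ((s - 3.0) / 1.0) ** 2) + 0.4 * np.exp(-0.5 * ((s - 7.0) / 1.5) ** 2)
posterior /= posterior.sum()

def expected_loss(estimate, loss):
    """Expected loss of a point estimate under this posterior."""
    return np.sum(posterior * loss(s - estimate))

sq_losses = [expected_loss(c, lambda d: d ** 2) for c in s]
abs_losses = [expected_loss(c, np.abs) for c in s]

mean_s = np.sum(posterior * s)
median_s = s[np.searchsorted(np.cumsum(posterior), 0.5)]
print("squared error minimized at", s[np.argmin(sq_losses)], "vs posterior mean", round(mean_s, 2))
print("absolute error minimized at", s[np.argmin(abs_losses)], "vs posterior median", round(median_s, 2))
```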
Another way of thinking about best decoding is to ask, OK, well, what was the stimulus that was most likely to have generated the observed response? If we can figure that out, the most likely stimulus, isn't that the best? And yeah, maybe it is.
So that takes us to maximum likelihood estimation, which answers exactly this question. So to calculate the maximum likelihood estimate of a stimulus, you solve for s ML in this equation, where F sub a is the firing rate as a function of your stimulus for the a-th neuron in a set of neurons that you're considering. And R sub a is the overall firing rate of the a-th neuron. So set this equal to 0, and solve for s ML.
Now there are certain classes of functions for which solving is quite easy-- for example, if F is a Gaussian, which is a relatively common tuning curve, as we discussed. In other cases, you may have to solve this computationally rather than analytically. So one small drawback with the maximum likelihood estimate is that it can be very sensitive to conditions where small changes in the response yield large changes in estimates.
For example, if you happen to get a spike when you're way out in the tail of your firing rate estimate, that's going to give you an estimated stimulus that's rather different than what there actually was. Another problem is that it doesn't reflect stimulus priors. So it may be that on a particular trial-- say you're recording from IT of a monkey who's walking around looking at stuff in this classroom-- you observe a particular firing rate pattern from the assembly of neurons from which you're recording, and the maximum likelihood estimate of the stimulus that elicited that pattern is a unicorn.
There are no unicorns here. There are no unicorns in your life. But maybe the firing rate pattern for a unicorn looks pretty similar to the firing rate pattern for a MacBook Pro. And then you might say, oh, well, MacBook Pros are very common in this room. Unicorns are very uncommon.
So we'll estimate that the monkey actually saw a MacBook Pro. So the maximum a posteriori estimate is an extension of maximum likelihood estimation that includes weights for stimulus priors. So we don't get nailed by that problem.
Now again, under experimental conditions, it's often advisable to keep your stimulus priors uniform, in which case this term is 0, and MAP is exactly equal to ML. If you're using less constrained stimuli, that's not always the case.
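A minimal sketch of ML and MAP decoding from a hypothetical population of Poisson neurons with Gaussian tuning curves (assuming NumPy; every parameter here, including the prior, is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical population: Gaussian tuning curves tiling the stimulus axis, Poisson spiking
s_grid = np.linspace(-10, 10, 401)
preferred = np.linspace(-8, 8, 17)                    # preferred stimuli of 17 neurons
sigma, r_max, T = 2.0, 30.0, 0.5                      # tuning width, peak rate (Hz), window (s)
tuning = r_max * np.exp(-(s_grid[:, None] - preferred[None, :])**2 / (2 * sigma**2))

s_true = 2.5
rates_true = r_max * np.exp(-(s_true - preferred)**2 / (2 * sigma**2))
counts = rng.poisson(rates_true * T)                  # observed spike counts on one trial

# Poisson log likelihood of each candidate stimulus (dropping terms that don't depend on s)
log_like = counts @ np.log(tuning * T + 1e-12).T - (tuning * T).sum(axis=1)
s_ml = s_grid[np.argmax(log_like)]                    # maximum likelihood estimate

# MAP: add the log prior; with a uniform prior, MAP and ML coincide
log_prior = -0.5 * (s_grid / 3.0) ** 2                # hypothetical Gaussian prior on s
s_map = s_grid[np.argmax(log_like + log_prior)]
print(f"true s = {s_true}, ML estimate = {s_ml:.2f}, MAP estimate = {s_map:.2f}")
```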
So say you've come up with an estimator that you like. You want to know how well it's doing. Let's talk a little bit about how we can evaluate an estimator.
There are sort of three fundamental quantities in evaluating an estimator. One is bias. And bias is just a measure of, over repeated trials for a given stimulus, how different is the expected value of your estimator from the actual stimulus value? And we generally like for our estimators to be unbiased, which means if you collect enough data, you will, on average, get the right value for the stimulus.
Another quantity is variance, and that is the expected variation of the estimator from its mean, trial to trial. It's generally not possible to get this to 0, and we'll describe that explicitly in a moment. But this is just a measure of how much your estimator spreads from run to run, from trial to trial.
And you can actually combine these two terms to arrive at the expected estimation error, which is just the expected squared error between the estimator and the true value. That's given by the variance plus the bias squared. And again, in the case of a zero-bias estimator, that's just the variance.
So I said that I'd give you a bound on variance, a lower bound. And I will keep my promise. So the Cramer-Rao bound, which is a function of the derivative of the bias and what's called the Fisher information, sets such a bound. It's impossible for the variance to get below this quantity.
The Fisher information tells us how much an observed random variable-- for example, firing rate-- can tell us about a linked parameter, for example, the stimulus. And it can be calculated several different ways. One way is the expected value of the negative curvature of the conditional probability surface of your firing rates with respect to your stimuli. So it's generally impossible to get zero variance, unless you can get perfect Fisher information, which in the real world never happens.
So the maximum likelihood estimate, actually-- when I introduced it, I said, well, this makes sense for asking what's the most likely stimulus. It also has some nice properties in the context of what we've just been talking about. One is that, as the number of neurons that you're considering in your maximum likelihood estimate increases, your estimate can actually get arbitrarily good, so the expected difference between the estimate and the true value can be made arbitrarily small.
It's also an unbiased estimator. So this is 0. So we just have 1 over the Fisher information. And it actually attains the Cramer-Rao bound, so the variance is actually equal to 1 over the Fisher information. So it's very efficient-- zero bias and the minimal possible variance. So it's, in that sense, optimal.
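A minimal sketch comparing the variance of a simulated maximum likelihood decoder to the Cramer-Rao bound, 1 over the Fisher information, for the same kind of hypothetical Poisson population as above (assuming NumPy; all parameters are invented):

```python
import numpy as np

rng = np.random.default_rng(9)
preferred = np.linspace(-8, 8, 33)
sigma, r_max, T = 2.0, 30.0, 0.5
s_true = 1.0

def f(s):                      # Gaussian tuning curves evaluated at s, one value per neuron
    return r_max * np.exp(-(s - preferred)**2 / (2 * sigma**2))

def f_prime(s):                # derivative of each tuning curve with respect to s
    return f(s) * (preferred - s) / sigma**2

# Fisher information for independent Poisson neurons counting spikes for T seconds
I_F = np.sum(T * f_prime(s_true)**2 / f(s_true))
cramer_rao = 1.0 / I_F         # lower bound on the variance of an unbiased estimator

# Simulate many trials and decode each one by maximum likelihood on a grid
s_grid = np.linspace(-10, 10, 1001)
tuning = r_max * np.exp(-(s_grid[:, None] - preferred[None, :])**2 / (2 * sigma**2))
log_terms = np.log(tuning * T)
estimates = []
for _ in range(1000):
    counts = rng.poisson(f(s_true) * T)
    log_like = counts @ log_terms.T - (tuning * T).sum(axis=1)
    estimates.append(s_grid[np.argmax(log_like)])
estimates = np.array(estimates)
# The bias should be near zero and the variance close to the Cramer-Rao bound
print(f"bias ~ {estimates.mean() - s_true:.4f}")
print(f"variance ~ {estimates.var():.5f}, Cramer-Rao bound 1/I_F = {cramer_rao:.5f}")
```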
All right, finally, I'm going to take the last few minutes to talk about how we can decode not from a vector of scalar firing rates, but from an actual spike train. And this is going to recall the earlier stuff we talked about with reverse correlation, spike-triggered averages, and kernel convolutions.
So the basic cartoon of the procedure I'm going to sketch out first. So say you have some stimulus, some one-dimensional stimulus, in this case, varying with time. And the stimulus is eliciting a spike train. And you want to be able to predict what the stimulus is at time t based on spikes leading up to time t.
In one sense, this is a little bit of a funny thing to do, because the spikes are actually being fired in response to the stimulus. And yet you're using spikes up to time t in order to predict the stimulus. So this only has any hope whatsoever of working if there's some temporal correlation in the stimulus.
And yet, this is kind of what the brain does. It is predicting what's out there, what the stimulus is, based on spikes that have happened. Now the brain allows itself a little bit of latency. So we'll do ourselves that same favor.
And we'll actually, at time t, be predicting the stimulus at time t minus t0, where t0 is a prediction delay. So you can think of this as maybe the amount of time that it takes for information to get from the retina to IT to pre-frontal areas. And so the basic idea is, we're going to take all of the spikes up to time t. And everywhere that there is a spike, we're going to insert some kernel.
And then at time t, we're going to add up the value of that kernel. And that's going to be our estimate at time t minus t0. So the three example spikes here, spike number four, number seven, number eight.
Spike number eight happens after time t, so its contribution is going to be 0. Spike number seven happens after t minus t0, so this is actually a spike that we think might have been caused by the stimulus at t minus t0. But it happens before time t, so it's still a [INAUDIBLE] decoder.
And just as an aside-- I should have said this earlier-- one of the reasons we might want to build a decoder like this is to perform real-time decoding based on a neural spike train or set of neural spike trains. And finally, spike four happens well before t minus t0. So it may tell us a little bit about what's going to happen. But to the extent that it does, that's only because of the temporal correlation in the stimulus.
Now here's the kernel that we're using. Here it is again, larger. And here's the spike-triggered average for this stored data set.
You may notice some similarity between these two. In fact, the kernel is exactly the spike-triggered average reversed in time-- so notice the tau axis, the time axis here, is running in the reverse of the usual direction-- shifted by tau 0, so negative 160 milliseconds moves it over, and set to 0 for tau less than 0. And this works, in some cases.
Given the time, I think I'm not going to run through this math. But the basic setup is we have some kernel that we're convolving with our spike train. And we're using that convolution to get an estimate of the stimulus at time t minus t0. We want to optimize this kernel.
Optimizing the kernel means minimizing the squared error between the estimated stimulus and the actual stimulus. And if the spike train is uncorrelated with itself, which is realistic when firing rates are low and the time windows under consideration are small, then the optimal kernel is just the spike-triggered average at tau 0 minus tau-- exactly as I described [INAUDIBLE]-- the spike-triggered average, temporally reversed, and shifted by tau 0.
Things get more complicated when the spike train is correlated with itself. In that case, you can use spectral techniques involving the Fourier transform of the rate stimulus correlation, and the spike train autocorrelation. And the derivations of all that are in this book. I believe they're on page 110. It's a pretty slick trick.
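A minimal sketch of the kernel-based decoder under that uncorrelated-spike-train assumption, built from the spike-triggered average as just described (assuming NumPy; the stimulus, the simulated cell, and the prediction delay are all hypothetical, and a single fitted scale and offset stand in for the exact normalization involving the mean rate and stimulus variance):

```python
import numpy as np

rng = np.random.default_rng(10)
dt = 0.001
n = 60_000                                        # 60 s at 1 kHz

# Hypothetical slowly varying stimulus and a cell that fires more when the stimulus is high
stimulus = np.convolve(rng.standard_normal(n), np.ones(200) / np.sqrt(200), mode='same')
rate = np.clip(30.0 + 30.0 * stimulus, 0.0, None)
spikes = (rng.random(n) < rate * dt).astype(float)

# Spike-triggered average over the 300 ms leading up to each spike
window = 300
idx = np.nonzero(spikes)[0]
idx = idx[idx >= window]
pre = np.mean([stimulus[i - window:i] for i in idx], axis=0)
C = pre[::-1]                                     # C[m] = average stimulus m ms before a spike

# Decoding kernel: the STA reversed in time, shifted by the prediction delay t0,
# and cut off below zero lag so the decoder only ever uses spikes that have already happened
t0 = 160                                          # prediction delay, in 1 ms bins
K = C[:t0 + 1][::-1]                              # K[tau] = C[t0 - tau] for tau = 0 .. t0

# Each spike paints the kernel backward onto the estimate of the stimulus t0 earlier
raw = np.convolve(spikes, K, mode='full')[t0:t0 + n]   # aligned so raw[u] targets stimulus[u]

# Fit one overall scale and offset on the data (stand-in for the exact normalization)
A = np.vstack([raw, np.ones(n)]).T
scale, offset = np.linalg.lstsq(A, stimulus, rcond=None)[0]
reconstruction = scale * raw + offset
```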
So using this sort of decoder, we can do very well. To choose one example from the visual system of one of nature's most robust seers, the fly-- these are data recorded from neuron H1 in the fly's visual system, which is sensitive to horizontal motion.
And here we have plotted the velocity of the visual stimulus being presented to the fly, while the neuron was being recorded from. And here we have the optimal kernel, as calculated using this technique. The dashed line is the reconstruction of the stimulus using this kernel convolution technique. And you can see this does quite well, based on a spike train, at reconstructing what the stimulus was that elicited that spike train.
So we know that stimuli elicit spike trains in the brain. We've seen how we can move from spike trains to spike counts-- relatively straightforward. We've also seen how we can represent spike trains as firing rate time courses.
We can use spike counts or firing rate time courses to generate spiking models, which can give us data to test or to debug or model things. We've also seen how we can go from spike counts back to estimates of the stimuli. And most recently, we've seen that we can also go from spike trains back to stimuli, decoding directly from spike trains.
So we can move around this diagram with some fluency now. I wouldn't say that we therefore understand everything about how the brain works, but we've at least got some tools that are useful to approach it. Yeah.
AUDIENCE: Can you go back one slide? I'm so confused about what exactly we're trying to do here. So we can put a voltage meter in the neuron and measure the spikes.
JED SINGER: Right, so we've got spikes here.
AUDIENCE: And we know what we're putting in front of the fly. That's the dotted line on the bottom there.
JED SINGER: I believe actually the stimulus is the solid line. But I might have that backwards.
AUDIENCE: What you're doing is some kind of deconvolution and you're getting some [INAUDIBLE] responses there at the top. I don't know what to do with that.
JED SINGER: So this is the kernel that we've estimated. So that's this kernel, which, if the spike train has no autocorrelation, we can estimate as the reverse correlation, the spike-triggered average.
AUDIENCE: But even if you do it perfectly, what does it mean physically? What can I do with that?
JED SINGER: So what you can do with that, then, once you've generated this kernel, you can then continue recording from the same neuron. So you're no longer controlling the input to this cell. And you can, based on the spike train, you can infer what the visual stimulus that the cell is receiving is. So you can do real-time decoding of the stimulus based on the observed spike train.
Does that make sense? And there are other ways of doing real-time decoding from spike trains. People have used Kalman filters for that-- particularly in the realm of neuroprosthetics, there's been a lot of work. But this is one useful technique.