The Introduction described two requirements that, we argue, are important for FSP strategies to convey reliably the pitch of complex sounds. Our results show that, for the simplified stimuli used here, neither requirement is met: listeners do not typically report a pitch corresponding to the F0, and pitch estimates can be substantially affected by between-channel timing differences. The following discussion relates these findings to the existing literature, considers possible reasons for between-listener differences, describes possible limitations of our simplified approach, and proposes a more robust method for using timing cues to convey pitch.
Comparison to Previous Findings
Figure 4 shows that pitch ranks for our same-rate stimuli increased markedly and consistently with increases in pulse rate from 100 to 200 pps per channel, but less so with further increases to 300 and (especially) 400 pps. This general pattern of results is broadly consistent with a large literature on the upper limit of pitch for SPP pulse trains presented to a single electrode [11,12,13] and to multiple electrodes [38,39,40,41]. Recently, we measured pitch ranking over a wide range of rates (80–981 pps) for a stimulus in which the four apical electrodes of the MED-EL device were stimulated simultaneously, rather than with the 100-µs offset between adjacent channels applied in the present study [25]. The upper limit in the article by de Groote et al. [25] was estimated by fitting a broken-stick function to the pitch-rank data. Six participants (M004, M014, M030, M031, M032, and M035) also took part in the present study. They showed upper limits in the previous study of between 335 and 686 pps, which are generally higher than the rates above which pitch ranks appear to asymptote here. This might suggest that rate discrimination is better with simultaneous than with interleaved stimulation. However, it is not straightforward to compare pitch ranks obtained with different ranges of stimuli and in a different context, and so we would refrain from drawing this conclusion in the absence of a direct comparison. In addition, an inspection of the data in de Groote et al. [25] reveals some flattening of the pitch-rank function around 335 pps in all six listeners who also took part in the present study, even when the upper limit derived from the broken-stick fit was substantially higher than 335 pps.
The data presented in Fig. 4 show that between-channel timing differences can have substantial effects on pitch ranks. The size of this effect varied across listeners, but it was highly significant at the group level for the multi-rate comparison ([1234]SD vs [1234]LD, d = 2.0). Although the comparison for the 100-pps same-rate stimuli ([1111]SD vs [1111]LD) did not reach significance at the Bonferroni-corrected level, the pitch rank for the [1111]LD stimulus for participants M014, M031, and M035 was close to that for the [2222]SD stimulus; that is, manipulating the between-channel delay had an effect similar to doubling the pulse rate. The effect of across-channel timing on pitch has previously been studied using pairs of pulse trains presented to the same or different electrodes, with a paradigm based on a method introduced by McKay and McDermott [33] in a study with five participants implanted with the Cochlear CI. Macherey and Carlyon [42] asked six users of the Advanced Bionics and Cochlear CI to pitch-rank 2-channel stimuli, each of which consisted of a train of pulses that had the same rate in each channel and with a between-channel delay that was either very short (< 400 µs) or equal to half the period. These two delays are broadly comparable to those in our [1111]SD and [1111]LD conditions. The pulse rate per channel ranged from 92 to 516 pps in half-octave steps, making it possible to estimate the rate of an SD stimulus having the same pitch as each LD stimulus. They found that, for most listeners, the pitch of an LD stimulus of a given rate was very close to that of an SD stimulus of the same rate, even when the two pulse trains were presented to adjacent electrodes; that is, the across-channel delay had only a small effect on pitch. A similar finding was obtained by Griessner et al. [43] in an experiment that presented pairs of pulse trains to two adjacent apical electrodes in ten participants implanted with the MED-EL CI.
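To make the idea of a broken-stick upper limit concrete, the following minimal sketch fits a function that rises linearly with pulse rate up to a breakpoint and is flat above it. The parametrisation (rank = a + b * min(rate, c)), the linear rate axis, the grid search, and the synthetic data are all assumptions made for illustration; they are not the fitting procedure or the data of de Groote et al. [25].

```python
def fit_broken_stick(rates, ranks, candidate_breaks):
    """Least-squares fit of rank = a + b * min(rate, c).

    For each candidate breakpoint c the slope and intercept are found in
    closed form; the c with the smallest squared error is returned.
    """
    best = None
    for c in candidate_breaks:
        z = [min(r, c) for r in rates]          # rate clipped at the breakpoint
        n = len(z)
        mean_z = sum(z) / n
        mean_y = sum(ranks) / n
        szz = sum((zi - mean_z) ** 2 for zi in z)
        if szz == 0:                            # all rates above c: no slope to fit
            continue
        b = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, ranks)) / szz
        a = mean_y - b * mean_z
        sse = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, ranks))
        if best is None or sse < best[0]:
            best = (sse, c, a, b)
    _, c, a, b = best
    return c, a, b


# Hypothetical, noise-free pitch ranks that saturate at 400 pps.
rates = [80, 120, 180, 270, 400, 600, 900]
ranks = [1 + 0.01 * min(r, 400) for r in rates]
upper_limit, intercept, slope = fit_broken_stick(rates, ranks, range(100, 901, 50))
```

With sparse rates and noisy ranks, the error surface around the best breakpoint can be shallow, which is one reason a fitted upper limit may lie well above the rate at which the ranks visibly begin to flatten.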
Hence, with the caveat that, as with almost all CI experiments, effects can differ between groups of listeners, our results—at least in the multi-rate condition—show a larger effect of across-channel timing than has been reported previously with same-rate stimuli presented to two electrodes. One possible reason for this is that neurons may be activated by more than two electrodes with our stimuli and that this increases the temporal complexity of the neural responses compared to the case where only two electrodes are stimulated. Alternatively, greater across-channel interactions between pulses, due either to charge summation or refractory effects, might change the shape of the neural excitation pattern.
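The reasoning about neurons responding to pulses from more than one electrode can be illustrated with a small sketch that merges the pulse times of two same-rate channels and lists the resulting inter-pulse intervals. It assumes, purely for illustration, a hypothetical neuron that responds equally to both channels; the function name and the choice of two channels are not taken from the study.

```python
def composite_intervals_us(period_us, delay_us, n_periods=10):
    """Distinct inter-pulse intervals (in microseconds) seen by a hypothetical
    neuron that responds to both of two same-rate channels."""
    ch_a = [k * period_us for k in range(n_periods)]
    ch_b = [t + delay_us for t in ch_a]         # second channel, delayed copy
    merged = sorted(ch_a + ch_b)
    return sorted({later - earlier for earlier, later in zip(merged, merged[1:])})


PERIOD_US = 10_000                              # 100 pps per channel

# Short delay (100 us): intervals alternate between 100 and 9900 us,
# so the dominant long interval stays close to the 10-ms period.
short_delay = composite_intervals_us(PERIOD_US, 100)

# Delay of half the period: every interval is 5000 us,
# i.e. the composite pattern is a regular 200-pps train.
half_period = composite_intervals_us(PERIOD_US, PERIOD_US // 2)
```

Under this assumption a half-period delay doubles the composite rate, which matches the observation that, for some listeners, [1111]LD was ranked close to [2222]SD.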
Pitch-Ranking Variability Between and Within Listeners
As noted above, the size of some of the effects on temporal pitch ranking, observed at the group level, differed across listeners. In particular, participants MED05 and MED06 showed no effect of across-channel timing on the pitch ranks of the [1111] SD vs LD stimuli and also showed smaller effects than other participants for the [1234] SD vs LD comparison. Participant MED05 was unusual in that her pitch rankings depended, for the multi-rate stimuli, on the assignment of rates to electrodes (conditions 5–8; orange symbols); her rank for [4321]SD was similar to that for [1111]SD and lower than for [1234]SD, [3214]SD, and [4123]SD. This pattern of results is consistent with the pitch ranks for MED05 being strongly influenced by the pulse rate applied to electrode 4 when that rate was 100 pps. As noted above, when e4 was stimulated at 400 pps as in [1234]SD, the pitch rank was lower than for [4444]SD, suggesting that under those circumstances, the lower-rate stimulation on e1-3 influenced her pitch judgements. In contrast to the results for MED05, the pitch ranks for participant MED06 were similar for conditions 5–8, and so there was no evidence that his judgements were dominated by the rate applied to any one electrode. His pitch judgements may have been based on a combination of the separate rates applied to more than one electrode, unlike those of the majority of participants whose judgements were affected by between-channel delays. It is worth noting that both MED05 and MED06 showed excellent place-pitch ranking of the individual electrodes (Fig. 2), consistent with the idea that the effect of between-channel timing on pitch judgements depends on the extent of between-channel interactions and hence on spatial selectivity, which in turn affects pitch ranking.
We counsel some caution, however, given the modest number of listeners tested and the fact that listener M030, who also showed excellent place-pitch ranking, showed between-channel timing effects similar to that in the group-averaged data.
The temporal pitch ranks shown in Fig. 3 differed not only between conditions but also between the individual runs for a given listener and condition. Inspection of the error bars in the figure indicates that these were generally larger for the multi-rate conditions (triangles) than for the same-rate conditions (circles). For example, the between-run variance, averaged across listeners, corresponded to standard deviations of 1.4, 0.8, 0.8, and 0.8 for the four same-rate SD conditions (1–4) on the left of each plot but 1.7, 2.0, 1.8, and 1.4 for the next four mixed-rate conditions, respectively. This suggests that the mixed-rate stimuli, whereby the temporal response is likely to differ between electrodes, had a less well-defined pitch than that conveyed by presenting the same temporal stimulus to each channel.
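As a sketch of how a between-run variance averaged across listeners can be expressed as a standard deviation, the variances are averaged first and the square root is taken afterwards. The use of the sample variance and the example values below are assumptions for illustration, not the study's data or necessarily its exact computation.

```python
import math
import statistics


def pooled_between_run_sd(runs_by_listener):
    """Square root of the across-listener mean of the between-run variances
    for one condition (sample variance assumed)."""
    variances = [statistics.variance(runs) for runs in runs_by_listener.values()]
    return math.sqrt(statistics.mean(variances))


# Hypothetical pitch ranks from three runs of one condition in two listeners.
example_runs = {
    "listener_A": [3.0, 4.0, 5.0],   # between-run variance 1.0
    "listener_B": [2.0, 2.0, 2.0],   # between-run variance 0.0
}
pooled_sd = pooled_between_run_sd(example_runs)
```

Averaging variances rather than standard deviations gives more weight to listeners with highly variable runs; in this example the pooled value is about 0.71, whereas the mean of the two standard deviations would be 0.5.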
Limitations
The stimuli employed here were not intended to reproduce exactly the pattern of stimulation that would be provided by a MED-EL FSP processing strategy in everyday life and differed from that pattern in several ways. Rather, they provide a simplified stimulus set that allows us to evaluate the processes by which CI listeners can (and, importantly, cannot) combine information from multiple electrodes to perceive the pitch of a complex sound.
One simplification is that we restricted stimulation to have either the same or harmonically related rates on each of the four electrodes stimulated. This was done in order to give the CI participants the best chance to hear a complex pitch. We have argued that acoustic pulse trains filtered to contain only unresolved harmonics provide a useful NH analogue of the perception of pulse trains by CI listeners, and experiments using those stimuli show that mixtures of inharmonically related pulse rates produce an unpleasant “crackle” percept without a clear pitch [44]. We therefore expect that, had we included inharmonically related rates, pitch perception would have been even worse. It is also possible that the outputs of some channels of an FSP strategy would be amplitude modulated at F0 due to beating between adjacent harmonics that fall within the passband of an analysis filter, and that this would have improved the perception of F0. However, this is arguably an envelope cue that would also be present in the output of a traditional CIS strategy, rather than reflecting a feature of the stimulus fine structure. We additionally note that the temporal fine structure at the output of a channel that passes (say) two harmonics with equal amplitude can be more complex than that occurring when it is dominated by a single harmonic. Another simplification was to present a single pulse per period instead of the short burst of pulses produced by MED-EL’s FSP strategies. We do not think that this would have degraded the pitch perception of the multi-rate stimuli studied here and cannot think of a reason why it would have done so. Finally, we note that our stimuli combined equally loud pulse trains from each electrode, roughly corresponding to a situation in which the harmonics of a complex sound had equal amplitudes and corresponded to the centre frequency of each channel. 
In a real situation, the amplitude of the pulses on each channel will vary with the spectral shape of the input, which for a speech sound such as a vowel will depend on the formant frequencies and hence on vowel identity. It is of course very important that perceived pitch does not, for a fixed F0, depend strongly on vowel identity, and it is a crucial feature of models of NH pitch perception that pitch is independent of spectral shape [e.g. “the case of the missing fundamental”; 45]. Our data do not tell us anything about the dependence of pitch conveyed by TFS strategies on spectral shape, but we do not think that variations in e.g. vowel identity would somehow allow listeners to combine the TFS applied to each electrode so as to provide a more robust estimate of F0, as in our first requirement.
A second potential limitation arises from the MPC pitch-ranking method used. Although this method is well-established in CI research, it does, as with all pitch-ranking and scaling methods, implicitly assume that listeners are responding along a single perceptual dimension (i.e. pitch). The procedure involves a series of forced-choice comparisons without feedback, and so responses were presumably based on a percept that listeners spontaneously interpret as pitch [cf. 46], but we did not include alternative tasks, such as melody identification, that may be more relevant for the perception of music. However, these considerations are arguably more pertinent to interpreting cases where participants reliably and consistently assign different ranks to two or more stimuli, and to the issue of whether doing so genuinely involves a difference in pitch, than to cases where a group of stimuli is generally assigned the same pitch rank. One of our two main questions concerns the pitch ranks of mixed-rate SD stimuli, which generally fall into the latter category, and for which there is no evidence of a percept equal to that produced when the F0 rate is applied to all electrodes ([1111]SD). Our second main question concerns the effect of increasing between-channel delays, which reliably increase pitch ranks, as has been observed in previous studies where the delays are applied to pulses interleaved on the same channel, and as would be expected in conditions where neurons respond to the composite pitch rate from two or more electrodes [33, 42, 43]. However, as with all pitch-comparison procedures, we cannot completely rule out the possibility that these delays affected some percept other than pitch but that nevertheless affected the pitch judgements that participants were instructed to make.
Finally, we note that, especially when the pitch of a stimulus is weak, pitch comparisons can be affected by the range of stimuli included in the comparison set. For example, Carlyon et al. [34] required listeners with single-sided deafness to make place-pitch comparisons between an electric pulse train presented to a CI electrode in one ear and to acoustic pulse trains bandpass-filtered into a range of different frequency regions in the contralateral NH ear. They reported that, for some combinations of listener and electrode, the acoustic pulse train judged equal in pitch to the CI pulse train fell in the middle of the range of acoustic stimuli included in a block of trials, such that changing the range of acoustic stimuli could substantially shift the “pitch match”. They suggested that in these conditions, the comparisons between the NH and CI pulse trains depended only on the pitch of the NH pulse train; when that pitch was higher than the middle of the range of acoustic pitches, then the CI pulse train was judged to have a pitch lower than the NH pulse train; the opposite bias would occur when the NH pitch was lower than the middle of the range of acoustic pitches heard. It is in principle possible that a similar phenomenon could occur here if only the short-delay same-rate stimuli had clear pitches; judgements involving one multi-rate stimulus and one same-rate stimulus might then have depended only on the pitch of the same-rate stimulus. To explain our results, the middle of the range of pitches heard in the same-rate stimuli would have to fall between that of [1111]SD and [2222]SD, and if correct, it would mean that the multi-rate stimuli had unclear pitches that did not necessarily lie between that of [1111]SD and [2222]SD. 
However, when we repeated the pitch-ranking experiment with listeners MED05 and MED06, including two additional same-rate conditions with rates of 50 and 71 pps on every channel, we found that the pitch rank for stimulus [1234]SD still generally fell between the ranks for [1111]SD and [2222]SD (Fig. 5), with no evidence that the mixed-rate stimuli were matched to a lower pitch than in the main experiment. In addition, we replicated the finding of a lower pitch rank for the [4321]SD stimulus compared to the other mixed-rate conditions (orange symbols) for participant MED05, consistent with her pitch judgements being dominated by the pulse rate applied to electrode 4. Pitch ranks for the mixed-rate stimuli for participant MED05 were also broadly similar to those in the main experiment, albeit with large error bars for stimulus [4123]SD.
Fig. 5. Mean rate-pitch ranks and standard deviations (SD) obtained from five runs of the midpoint comparison procedure for MED05 and from eight runs for MED06. The same-rate stimuli are represented by circles, of which the short-delay (SD) conditions are joined by lines. The mixed-rate conditions are represented by triangles. Filled markers indicate the SD to long delay (LD) comparisons. The reversed-delay (RD) condition is marked by a diamond.
Practical Implications and Suggestions
In the Introduction, we proposed two requirements for the effective transmission of pitch by FSP strategies. The results presented here, using simplified versions of the outputs of four FSP channels, showed that, broadly speaking, neither requirement was met. First, there was no evidence that listeners extracted the fundamental frequency from multiple channels; stimuli in which multiples of a 100-pps F0 were applied to the different electrodes produced variable pitch matches that did not correspond to that produced by an unambiguous stimulus ([1111]SD) in which all channels nearly synchronously conveyed a 100-pps rate. Second, for most listeners, pitch ranks were affected by the temporal offsets between channels. This suggests that, in a real-life strategy, pitch will be affected by factors that may influence between-channel timing relationships, such as reverberation and the group delays introduced by the analysis filters. (The possible effect of the analysis filters could arise because, for a bandpass filter, the group delay varies as a function of the relationship between the input frequency and the filter’s centre frequency, and because this relationship will vary between the harmonics of a complex sound.) As a result, we believe that although an FSP strategy could do a good job of conveying the pitch of a sinusoid, the pitch of a complex harmonic sound is likely to be weak, to be vulnerable to features of the environment and (possibly) of the analysis filters, and not to correspond to the F0.
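The dependence of group delay on the relation between input frequency and centre frequency can be sketched with a textbook second-order analogue bandpass filter, H(s) = (w0/Q)s / (s^2 + (w0/Q)s + w0^2), whose group delay has a closed form. This filter, its centre frequency, and its Q are hypothetical choices made for illustration; they are not the analysis filters of any MED-EL strategy.

```python
import math


def bandpass_group_delay_s(freq_hz, centre_hz, q):
    """Group delay (seconds) of a second-order analogue bandpass filter,
    obtained as minus the derivative of its phase with respect to
    angular frequency."""
    w = 2 * math.pi * freq_hz
    w0 = 2 * math.pi * centre_hz
    bw = w0 / q                                  # bandwidth in rad/s
    return bw * (w0**2 + w**2) / ((w0**2 - w**2) ** 2 + (w * bw) ** 2)


# Hypothetical channel centred on the 3rd harmonic of a 100-Hz F0.
CENTRE_HZ, Q = 300.0, 4.0
delays_ms = {
    f: 1000 * bandpass_group_delay_s(f, CENTRE_HZ, Q)
    for f in (200.0, 300.0, 400.0)               # harmonics 2, 3 and 4
}
```

For this assumed filter the delay is largest at the centre frequency (2Q/w0, about 4.2 ms here) and several times smaller for the neighbouring harmonics, so harmonics that sit at different positions within their respective passbands would be delayed by different amounts.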
A caveat to the above conclusion is that, despite the absence of a strong, robust pitch, listeners might be able to identify the direction of pitch changes between successive notes (or vowels) or of dynamic F0 changes within a single sound. In terms of our nomenclature, this would correspond to a change from stimulus [1234]SD to e.g. [1.2, 2.4, 3.6, 4.8]SD causing a (weak) pitch to increase. Experiments with NH listeners have shown that listeners can indeed make fairly accurate sequential comparisons between two weak pitches. For example, although (as noted earlier) mixtures of inharmonically related pulse rates produce an unpleasant “crackle” percept without a clear pitch [44], listeners could identify the direction of changes in one of the rates in a mixture with an accuracy that was only moderately lower than when a single pulse train was presented. In addition, McPherson and McDermott [47] found that identification of the direction of a small F0 change between two temporally adjacent complex tones was equally good when those complex tones were inharmonic vs. harmonic; the discrimination of melodic contours also did not depend on harmonicity. However, a clear pitch was important for more-musical tasks (e.g. interval identification) and for the discrimination of F0 differences between notes separated by longer intervals [48].
We therefore believe that it is worth exploring alternative methods that, as far as possible, allow CI listeners to extract a robust pitch from harmonic complex sounds. One straightforward approach is inspired by the observation that the pitch ranks for the same-rate conditions were more reliable (smaller confidence intervals) than for the multi-rate conditions and increased monotonically with pulse rate up to about 300 pps [cf. 39, 40, 41, 49]. Combined with the effects of cross-channel timing differences, this suggests that presenting the same TFS to one or more electrodes might provide a more robust and stronger pitch percept than when the temporal pattern of stimulation differs across electrodes. This pattern could be derived, for example, from the output of a real-time F0 estimation algorithm, with the amplitude for each electrode determined by the envelope amplitude for that channel. Possible drawbacks could arise in noisy backgrounds and/or when more than one F0 was present, and because presenting pulses at the F0 rate may under-sample the envelope in the more-basal channels stimulated (which have wider analysis channels and hence contain faster modulations). However, even a robust perception of the pitch of single sounds in isolation may be considered an advance, and there may be a sweet spot in the trade-off between conveying F0 on more channels so as to maximise pitch salience and restricting that code to more apical low-frequency channels so as to minimise under-sampling of the envelope.