So, we've got the vocal folds acting like a fixed string with multiple resonant frequencies called harmonics. Before we go further, though, I realized there's some terminology I should go over. Remember how the actual sound wave produced by vocal fold vibration is far more complicated than that of a single string? That's mainly due to the motion of the lamina propria, but even a single string can produce a complex sound wave: a wave pattern created by the interaction of multiple frequencies, known as harmonics. If you add together several (or even thousands) of simple sine waves that are harmonically related, meaning they are whole-number multiples of one fundamental frequency, you've got a complex periodic sound wave. Periodic means it still has a repeating, predictable pattern, as opposed to complex aperiodic waves like white noise.

The speech signal contains both periodic and aperiodic waves. Voiced sounds like vowels are periodic; voiceless consonants like /s/ are aperiodic; and many voiced consonants, like /z/, combine the two. We're just going to focus on complex periodic waves for the purpose of understanding resonance more fully. Complex periodic waves can be broken down into their simple sine wave components using something called a Fourier transform. We certainly won't go through the math for all of that, but just know that every complex periodic wave is made up of simple sine waves. And even though the vocal folds create a more complicated sound than a string does, we're going to stick with the string comparison because I think it makes everything so much easier to visualize.
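To make the Fourier idea a bit more concrete, here's a minimal Python sketch (my own illustration, not from the textbook). It adds three harmonically related sine waves at 170, 340, and 510 Hz into one complex periodic wave, then runs a naive discrete Fourier transform over the samples to pull the sine components back out. The sample rate, amplitudes, and function names are all assumptions picked for the demo, and real audio software would use a fast Fourier transform (FFT) rather than this slow, loop-by-loop version.

```python
import cmath
import math

def synthesize(components, sample_rate, n_samples):
    """Add simple sine waves, given as (frequency_hz, amplitude) pairs,
    into one complex periodic wave (a list of samples)."""
    return [
        sum(amp * math.sin(2 * math.pi * freq * n / sample_rate)
            for freq, amp in components)
        for n in range(n_samples)
    ]

def dft_amplitudes(samples, sample_rate):
    """Naive discrete Fourier transform.

    Returns {frequency_hz: amplitude} for every bin from 0 Hz up to the
    Nyquist frequency, scaled so a sine wave's peak amplitude is recovered.
    (Strictly, the 0 Hz and Nyquist bins shouldn't be doubled, but they are
    empty in this demo.)
    """
    n_samples = len(samples)
    spectrum = {}
    for k in range(n_samples // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, x in enumerate(samples))
        spectrum[k * sample_rate / n_samples] = 2 * abs(total) / n_samples
    return spectrum

# Hypothetical demo: 0.1 s at 8000 samples/s gives 10 Hz frequency bins,
# so every component below lands exactly on a bin.
wave = synthesize([(170, 1.0), (340, 0.5), (510, 0.25)], sample_rate=8000, n_samples=800)
spectrum = dft_amplitudes(wave, sample_rate=8000)
components = {f: round(a, 3) for f, a in spectrum.items() if a > 0.01}
print(components)  # {170.0: 1.0, 340.0: 0.5, 510.0: 0.25}
```

The three sine waves go in, get mashed into one wave, and come back out with their original frequencies and amplitudes, which is the whole point of "every complex periodic wave is made up of simple sine waves."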
The thing about vibrating strings fixed at both ends is that their fundamental resonance has a wavelength of 2 times the length of the string. This just means that one-half of a wave can "fit" on the string at any given pass along it. It also means that the string will vibrate at both the even and odd harmonics of the fundamental. This can be represented mathematically using our knowledge that frequency = velocity divided by wavelength. In this basic case, we're going to treat velocity as the speed of the wave being produced, and in the human voice, that speed is determined by the tension and mass of the vocal folds. So when the folds elongate, tension increases and mass decreases, resulting in a higher frequency of vibration (cool, huh?). And the wavelength, thanks to the half-wave resonator we're working with here, will be two times the length of the string (or folds). Using this equation and, for simplicity, setting our speed at 340 meters per second (the approximate speed of sound at sea level), we can figure out the harmonics of a 1 meter string. The fundamental frequency, also called the first harmonic, would be 170 Hz; the second harmonic would be 340 Hz, the third 510 Hz, the fourth 680 Hz, the fifth 850 Hz, and so on. This sound wave (up to the fifth harmonic, 850 Hz) would sound like this:
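The half-wave math above fits in a few lines of Python. This is just a sketch: `string_harmonics` is a made-up helper name, and, like the post, it plugs in 340 m/s as the wave speed purely for simplicity (a real string's wave speed depends on its tension and mass, not on the speed of sound in air).

```python
def string_harmonics(length_m, speed_m_s, count):
    """Resonant frequencies of a string fixed at both ends (a half-wave resonator).

    The fundamental's wavelength is 2 * length, so f1 = speed / (2 * length).
    Every whole-number multiple of f1, even and odd, is also a resonance.
    """
    fundamental = speed_m_s / (2 * length_m)
    return [n * fundamental for n in range(1, count + 1)]

print(string_harmonics(length_m=1.0, speed_m_s=340.0, count=5))
# [170.0, 340.0, 510.0, 680.0, 850.0]
```

Doubling the speed (which is what more tension would do) doubles every harmonic, since frequency is speed divided by wavelength and the wavelengths are fixed by the string's length.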
Now that we've compared the vocal folds to strings, what do we compare the vocal tract to? An open-closed tube! ...which is not that exciting at all. But what the vocal tract does is pretty darn exciting. Of course, the vocal tract itself can change its shape for communication and such, but the open-closed tube is a good simplification of the vocal tract's basic function. An open-closed tube is a quarter-wave resonator, as opposed to the half-wave resonator that the string up there is. So what does that mean? A quarter-wave resonator means that only a quarter of a wave can "fit" during one pass through the tube, so the wavelength of its fundamental is four times the tube's length. As a result, this resonator only vibrates at odd harmonics of its fundamental frequency. So, if we take that 170 Hz frequency produced by the meter-long string up there, an open-closed tube resonating at a fundamental of 170 Hz would have a length of 0.5 meters, and its next resonances would be at 510 Hz (3 × 170) and 850 Hz (5 × 170), and so on. Notice something there? The tube only resonates at the odd multiples of the string's fundamental and skips the even ones, 340 Hz and 680 Hz. So what happens to the sound wave produced by that string as it passes through this tube? Well, it'll sound something like this:
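Here's a similarly hedged sketch of the quarter-wave tube acting on the string's harmonics. The helper names are hypothetical, and `pass_through_tube` is an idealized, all-or-nothing filter: it keeps a harmonic only if it lands on one of the tube's resonances. Real tubes weaken off-resonance frequencies gradually rather than deleting them outright.

```python
import math

def tube_resonances(length_m, speed_m_s, count):
    """Resonant frequencies of a tube closed at one end and open at the other
    (a quarter-wave resonator).

    The fundamental's wavelength is 4 * length, so f1 = speed / (4 * length).
    Only odd multiples of f1 (1x, 3x, 5x, ...) resonate.
    """
    fundamental = speed_m_s / (4 * length_m)
    return [(2 * n - 1) * fundamental for n in range(1, count + 1)]

def pass_through_tube(source_freqs, resonances):
    """Idealized filter: keep only the source frequencies that match a tube resonance."""
    return [f for f in source_freqs
            if any(math.isclose(f, r) for r in resonances)]

string = [n * 170.0 for n in range(1, 6)]  # 170, 340, 510, 680, 850 Hz
tube = tube_resonances(length_m=0.5, speed_m_s=340.0, count=3)
print(tube)                             # [170.0, 510.0, 850.0]
print(pass_through_tube(string, tube))  # [170.0, 510.0, 850.0]
```

`math.isclose` is used instead of `==` so that tiny floating-point rounding differences don't make a matching harmonic look like a miss.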
File made with Audacity
Where did those other harmonics go? The tube ate them. No, really! Well... it kinda-sorta did. See, the tube acts as a filter for that sound wave. The missing frequencies, the ones the tube won't resonate at, are filtered out through destructive interference, while the frequencies the tube does resonate at constructively interfere and exit the tube for us to hear. Yup, that's right. Without resonance, we wouldn't hear our own speech, much less a singer singing over an orchestra. Your voice is always resonating; it's just that opera singing requires a different resonance than your speaking voice... obviously. We don't really sound like we're talking when we're singing, do we?
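Constructive and destructive interference can be shown with nothing more than adding two sine waves. This toy Python sketch is not a model of the tube's actual acoustics, and the function name and sampling values are my own assumptions: it adds a wave to a copy of itself shifted by some phase. In step, the peaks double; half a cycle out of step, they cancel.

```python
import math

def peak_of_sum(freq_hz, phase_shift_rad, sample_rate=8000, n_samples=800):
    """Add two equal-amplitude sine waves of the same frequency, the second one
    shifted by phase_shift_rad, and return the largest absolute sample of the sum."""
    return max(
        abs(math.sin(2 * math.pi * freq_hz * n / sample_rate)
            + math.sin(2 * math.pi * freq_hz * n / sample_rate + phase_shift_rad))
        for n in range(n_samples)
    )

print(peak_of_sum(510, 0.0))      # close to 2: constructive interference
print(peak_of_sum(340, math.pi))  # close to 0: destructive interference
```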
Now, I don't know about you, but I personally find the second audio file a little more pleasing than the first. The first one is objectively "richer," in the sense that it has more harmonics, but the second one subjectively sounds "richer" to me. I'm not really sure why, but I suspect it has something to do with the fact that I am physiologically wired to find the sound of the human voice important, as are you, and so perhaps I also find sounds from an open-closed tube more pleasing? (And if you didn't find this to be true, you're really messed up! Just kidding.) Where in the brain this "pleasing" association would occur, I'm not sure. But I know my brain is associating the second file with a richer sound that I happen to find more pleasing, even though the first sound definitely has more harmonics in it... I would know; I entered the frequencies myself! If you played those two tones for me without my knowing their harmonic structure, I would assume the second one has more harmonics. The brain sure is one crazy organ, amirite? Of course, I digress, but this is an example of the kind of thing people are trying to figure out about how we listen to, pick out, and turn the speech signal into meaning all day long. It's some cool stuff, for sure. Perhaps I'll learn an answer to that soon and will update you guys.
Now, in a stationary tube, the resonant frequencies are pretty fixed, but lucky for us, our vocal tract can change its shape, length, and configuration to produce a lot of different sounds. By changing its shape, the vocal tract filters the same sound source differently, producing all of the different sounds we make in our languages and then some. Conveniently for us, it seems to do this pretty much on autopilot most of the time, like when we're speaking, or when the vocal tract lengthens as our voice drops in pitch (a longer tract resonates at lower frequencies, and a shorter one at higher frequencies). The shape the tract takes determines which frequencies are amplified and which ones are damped. And this sets us up quite nicely to talk about formants next time, doesn't it?
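To see "longer tube, lower resonances" in numbers, here's one more sketch using the same quarter-wave formula. The two lengths are assumptions for illustration only: 17 cm is roughly the figure often quoted for an adult male vocal tract, and 14 cm stands in for a shorter one. The tract is still treated as a uniform, straight open-closed tube, which a real vocal tract is not.

```python
def tube_resonances(length_m, speed_m_s=340.0, count=3):
    """First few resonances of a uniform open-closed (quarter-wave) tube:
    odd multiples of speed / (4 * length)."""
    fundamental = speed_m_s / (4 * length_m)
    return [(2 * n - 1) * fundamental for n in range(1, count + 1)]

longer = tube_resonances(0.17)   # hypothetical 17 cm tract
shorter = tube_resonances(0.14)  # hypothetical 14 cm tract
print([round(f) for f in longer])   # [500, 1500, 2500]
print([round(f) for f in shorter])  # [607, 1821, 3036]
```

Every resonance of the shorter tube sits higher than the matching one for the longer tube, because the length is in the denominator of the formula.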
Resources:
Raphael, L. J., Borden, G. J., & Harris, K. S. (2007). Speech science primer: Physiology, acoustics, and perception of speech (5th ed.). Philadelphia, PA: Lippincott Williams & Wilkins.