So which pronunciation is standard for the [ʊ] sound? Rounded or unrounded?
Certainly there is some rounding, but because roundedness is not phonemic in this position, there is also considerable variation in how much of it actually occurs in any given word and speaker.
For example, you will find that it is generally somewhat more rounded in pull and full than it is in put and foot respectively. That’s because having an r or an l right next to it rounds it off a bit, which is why it is a bit more rounded in root and rook than it is in foot or cook. Same with rookie versus cookie, where the first is a bit more rounded than the second. And of course, a w helps: compare how wool is even more rounded than full, and also more so than wood.
I believe English has no words with [ʊw], as that seems redundant. However, it can occur in phrases, especially in some dialects, where something like I knew it full-well may approach that.
However, it is still perceived as the very same phoneme in all those words and cases I’ve just listed above.
Correction — or not
I said that I thought English had no words with [ʊw] in them. And at the end of the day, I still believe that. However, I have discovered that grepping the OED yields the apparent existence-proof counterexample of Rauwiloid, which means:
A proprietary name for a hypotensive preparation containing a number of alkaloids extracted from Rauvolfia serpentina.
If you also count compound words whose first element ends in [aʊ] (rather than [aw], as it is sometimes written) and whose second element begins with [w], so that they in effect have a “double w” in them, you expand the list to include such things as:
bow-wow, powwow, skeow-ways, wow-wow
Finally, if you consider the sound in words like no and micro to be an [oʊ] diphthong rather than [ow], then you get all these, most of which were originally compounds of some sort:
froward, frowardly, frowardness, glow-worm, Holloway,
hollowwort, Howeitat, Khowar, meadow-wink, microwave, microweld,
Moldo-Wallachian, nowise, Oldowan, Parowax, powan,
shalloway, slow-worm, swallowwort, werowance,
yellow-wood, yeowoman.
For example, yeowoman theoretically yields /ˈjoʊwʊmən/, at least in North America. Still, there is a reasonably convincing argument to be made that that one is better written as simply /ˈjowʊmən/.
Slightly less uncommon is nowise, which is a compound of one word ending in a diphthong connected to another starting with a triphthong, so /ˈnoʊˌwaɪz/.
But I am still highly dubious of the existence of [ʊw], because I think it fuses into the semi-consonantal glide, [w]. After all, nowise and no eyes are homophonic, so I think this idea of [ʊw] is very hard to justify, and so I stand by my initial statement.
Even towel is usually pronounced with just one syllable, /taʊl/, thereby rhyming with cowl /kaʊl/. Even with folks who work very hard to put two syllables into that, with /ˈtaʊ.wəl/, I submit that you could write that /ˈtawːəl/ and avoid the whole controversy of whether a semi-vowel/semi-consonant/off-glide is really /ʊ/ or really /w/. However you write it, it seems like the same sound to me, such that bisyllabic towel just has a geminate [w]: /ˈtaw.wəl/.
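For anyone who wants to try the kind of word-list search mentioned earlier, here is a minimal sketch in Python. It is purely spelling-based: it looks for a w followed by another w, optionally across a hyphen, so it only catches the “double w” compounds and says nothing about pronunciation. The function name and the sample list are my own illustration, not anything the OED provides.

```python
import re

# Spelling heuristic (an assumption of this sketch): a "w" immediately
# followed by another "w", optionally separated by a hyphen.
DOUBLE_W = re.compile(r"w-?w", re.IGNORECASE)

def find_double_w(words):
    """Return the words whose spelling contains a doubled w."""
    return [word for word in words if DOUBLE_W.search(word)]

# Hypothetical sample drawn from the words discussed above.
sample = ["bow-wow", "powwow", "glow-worm", "microwave",
          "nowise", "towel", "yellow-wood"]

print(find_double_w(sample))
# → ['bow-wow', 'powwow', 'glow-worm', 'yellow-wood']
```

Words like microwave and nowise are not matched, because the glide there is not spelled with two w's; finding those would need a pronunciation dictionary rather than a spelling pattern.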
Many phoneticians and phonologists use the terms lenis and fortis to describe different types of consonant. Those phonemes which we typically think of as being voiced are termed lenis, and those which we characteristically think of as voiceless are termed fortis.
There are two reasons for using these terms. The first is that lenis, or so-called "voiced", consonants in English share other properties apart from the fact that we characteristically consider them voiced.
The second reason is that in actual speech, most voiced consonants undergo devoicing when they are next to voiceless sounds. "Voiceless sounds" here include silence. This means that, for example, the /b/ in bed, which we normally consider to be voiced, will be partially and sometimes fully devoiced at the beginning of a sentence when preceded by silence. Similarly, the /d/ in bed, again normally considered voiced, will be partially or fully devoiced when occurring at the end of a sentence next to silence. The same will be true of the /b/ in hotbed, where it occurs next to voiceless /t/.
The same is true to a lesser extent for fortis consonants. In certain environments consonants which we normally consider voiceless may become voiced if they occur between two vowels. So for example in the word better in expressions such as we'd better go, the /t/ will often be realised by a voiced alveolar tap. Similarly the /h/ in ahead is often fully voiced instead of being voiceless.
There is obviously a problem with talking about devoiced voiced sounds and so on. But there is a deeper problem, which is that we still need terms to talk about consonants that are usually voiced, even if they happen to occur in a position in which the actual phonetic sound will be entirely voiceless. This is because the most important properties of these sounds remain the same. It won't have escaped the quick-witted reader that I have just said above that the /b/ and /d/ in bed may both be voiceless if the word is said in isolation. This is absolutely true. If you record a speaker speaking naturally and cut out just the /b/ or /d/ segments, they will sound exactly like a /p/ and a /t/ most of the time.
I can already hear screams of gibberish, claptrap and the like from the audience. Quite understandable. It doesn't take one second to realise that any native speaker is able to freely distinguish between the words bed and pet when produced by another native speaker. However, the reason this is true is that fortis and lenis consonants keep their fortis and lenis characteristics whether they are voiced or not.
Most importantly, fortis and lenis consonants affect the other sounds around them. It's been proven that English listeners are practically deaf when it comes to distinguishing the voicing of word initial sounds by listening to the sounds themselves. The reason that we know that the sound at the beginning of bet is a /b/ and not a /p/ as in pet is that the vowel after /b/ is immediately voiced, whereas there is a delay after /p/ before the voicing for the vowel kicks in. During this voiceless period of the vowel there is an audible hissing as the air from the consonant is released from the vocal tract. This is what is known as aspiration. Fortis consonants in English are aspirated when in word initial position, and it is the aspiration after /p/ and not the sound of the consonant itself which helps us to distinguish /p/ from /b/.
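To make the word-initial cue concrete, here is a minimal sketch in Python that labels a stop from the delay between its release and the start of voicing. The function name and the 30 ms cut-off are assumptions chosen purely for illustration; they are not measured values, and real listeners use more than a single threshold.

```python
# Illustrative cut-off in milliseconds. This is an assumption made for
# the sketch, not a measured perceptual boundary.
VOICING_DELAY_BOUNDARY_MS = 30.0

def classify_initial_stop(voicing_delay_ms: float) -> str:
    """Label a word-initial stop from the delay before voicing begins.

    A long delay (audible aspiration) suggests a fortis stop such as /p/;
    a short delay suggests a lenis stop such as /b/.
    """
    if voicing_delay_ms >= VOICING_DELAY_BOUNDARY_MS:
        return "fortis"
    return "lenis"

# Hypothetical delays for the first consonants of "pet" and "bet".
print(classify_initial_stop(60.0))  # long delay, as after /p/
print(classify_initial_stop(10.0))  # voicing starts almost at once, as after /b/
```

Note that the decision depends only on what happens after the consonant is released, never on whether the consonant itself contains any voicing.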
Fortis and lenis consonants have a completely different effect when occurring at the end of a word. Fortis consonants in English have the effect of shortening the preceding vowel when they occur at the end of a syllable. This is known as pre-fortis clipping. If you say the words bead and beat, and listen carefully, you might be able to notice that the FLEECE vowel, /i:/, is considerably shorter in the word beat. This is because the fortis consonant, /t/, causes the vowel to be shortened. Even though the /d/ in bead will usually be devoiced, and therefore is phonetically voiceless, it remains lenis and causes no shortening of the vowel. It is the length of the preceding vowel or other voiced segments which tells a listener whether a syllable-final consonant is fortis or lenis. It has nothing to do with whether the realisation of the consonant actually contains any voicing, that is, any vibration of the vocal folds.
Below is a speech pressure waveform and spectrogram for the words beat and bead respectively. The wavy readout at the top is the waveform; the diagram underneath is the spectrogram. You will see that the vowel on the left, represented by the first big burst of activity in the waveform readout, is considerably shorter than the one on the right. You can also see the vowel in the spectrogram underneath, where it is the first vertical grey band. It is about half as wide in the first word as in the second, a result of pre-fortis clipping.
[Figure: waveform and spectrogram of / bi:t / (left) and / bi:d / (right)]
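The comparison can also be put in numbers. Here is a minimal sketch in Python that works out how much shorter the vowel is before the fortis consonant. The durations are hypothetical values I have picked to match the "about half" visible in the figure; they are not measurements taken from the recording.

```python
def clipping_ratio(pre_fortis_ms: float, pre_lenis_ms: float) -> float:
    """Return the vowel duration before a fortis consonant as a
    fraction of the duration before the matching lenis consonant."""
    if pre_lenis_ms <= 0:
        raise ValueError("pre_lenis_ms must be positive")
    return pre_fortis_ms / pre_lenis_ms

# Hypothetical vowel durations in milliseconds (assumed, not measured).
beat_vowel_ms = 110.0
bead_vowel_ms = 220.0

ratio = clipping_ratio(beat_vowel_ms, bead_vowel_ms)
print(f"vowel in beat is {ratio:.0%} as long as in bead")
```

A ratio well below 1 is the signature of clipping; a ratio near 1 would suggest that both final consonants are being heard as the same type.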
The Original Poster's question
If the Original Poster has a very sharp ear, they may indeed have heard that the /z/ at the end of words is actually voiceless when said by native speakers. In other words, the phonetic quality of the sound itself will be very close to our canonical idea of /s/. This is because it will be devoiced here at the end of the word. However, native speakers' ears and brains will tell them that this is the /z/ phoneme, and not /s/. The reason is that the /d/ segment is clearly /d/ and not /t/, because the vowel in /wɜ:ds/ is fully long here. The following plural marker will be understood as /z/ simply because it follows the lenis consonant /d/.
However, we shouldn't think that this means that we can try to say /s/ at the ends of English words instead of /z/. This would be a terrible mistake. Using /s/ where we need /z/ at the end of a word will cause the vowel to be shortened. If you try to say the word peas with an /s/ instead of a /z/, you will clearly just be saying the word peace, because the /s/ will clip the vowel. If you want some peas, it is probably not a good idea to say:
- Please give me some peace!
Hope this is helpful!
Best Answer
One thing you need to understand is that the things you may have been taught “are” diphthongs are not necessarily such under all possible circumstances. And while some of this is the normal reduction in unstressed syllables, some of it is not — because they were never diphthongs to begin with for those speakers.
In the case of vocabulary, since that’s in an unstressed syllable, it is subject to reduction to [ɵ] or [ə]. The [ɵ] sound might be the one that you are hearing which is halfway between [o] and [ə]: a neutral vowel like a schwa, but rounded like an o.
For so and no, those are usually diphthongs when at the end of an utterance or stressed, but they may be monophthongs in other places, whether [o], [ɵ], or [ə].
In the case of don’t, you have the nt-reduction happening too, so that’s often just [dõʔ] with a slightly nasalized [õ] replacing the n by coloring the earlier [o] through regressive assimilation, and a brief glottal stop where the t used to be.
In North America (and also Scotland, amongst other places), both /eɪ/ and /oʊ/ are known to be realized as monophthongs, without their off-glides, by many speakers and in many environments: just [e] and [o] alone suffice. That’s what makes it a “long e” or “long o” to them, not whether it has a glide.
This is not a matter of “not pronouncing half the sound”, because for native speakers, these are not diphthongs in their minds. If and when they become such, it is nothing but a phonetic allophone, and so you won’t be able to find a minimal pair between the glide and non-glide version.
Indeed, this is what schoolchildren are taught; they are not taught about a diphthong, although some teachers may admit to it if prodded about words where the vowel comes at the end without a final consonant, as in bay or snow, rather than words like bare and spoke, where there is one. You will quite often find the latter pair without a glide in them at all in North America. This is not lazy pronunciation; it is normal.
These phonetic effects where the “long” vowel may or may not become a falling diphthong with an off-glide may be too subtle to be useful for non-native speakers trying to learn English. Certainly our schoolchildren are not taught them.
The “long o” of phonemic /o/ contrasts with the “short o” of phonemic /ɔ/, not with the diphthong [oʊ], which is only an allophone. Similarly, the “long e” of phonemic /e/ contrasts with the “short e” of phonemic /ɛ/, not with the diphthong [eɪ], which again is only an allophone.
It may be more useful for such learners to instead concentrate on phonemics, with the understanding that diphthongization is a phonologic effect that occurs universally in particular environments, and that this particular diphthongization is almost never phonemic in English.