These are all quite old names with many syllables—an ideal place for ellipsis, elision, and contraction.
The -cester bit is from the Old English word ceaster, which is itself borrowed from Latin castrum. It is thus cognate with castle. (The palatalisation of the initial /k/ to /tʃ/ is common in Old English and was more widespread in some dialects than in others, so at some point, there were probably two variants around: caster /kastər/ and ceaster /tʃastər/.)
In Old English, ceaster referred to a Roman town or settlement (in England). As tchrist notes in his comment, some other places that have this element in their names include Lancaster, Doncaster, and Manchester.
The first part of a name that has -cester tends to be an inherited place name of either Anglo-Saxon or (even earlier) British Celtic stock, often the name of a river or some other local toponym.
Going from Old English down through the centuries, the cities whose first element ended in a consonant (like Lan-, Don-, Man-) usually kept a fuller version of the word ceaster, retaining the initial consonant. The ones whose first element ended in a vowel, on the other hand, ended up having the initial /tʃ/ (or /k/, as in Lancaster and Doncaster) in an intervocalic position, where it was quite likely to be weakened, something that happens often in place names, especially longer ones.
Thus, Leicester and Gloucester were once pronounced as they’re written, with a /tʃ/ (or perhaps /k/); but that consonant was weakened over time and eventually disappeared altogether, leaving the vowels free to contract into single monophthongs, too.
There might be exceptions to the statement in the title of your question, but I'm not going to quibble. The simplest reason for the lack or scarcity of word-initial /ð/ in English (for words of all grammatical classes, whether nouns, adjectives or verbs, outside of a small closed set of "function words") is that there is no regular historical source for it. In Old English, the voiceless sounds [s f θ] are believed to have been in complementary distribution with their voiced counterparts [z v ð]; the voiced consonants only occurred when they were both preceded and followed by other voiced sounds (such as between vowels, or after a voiced consonant and before a vowel) and the voiceless consonants occurred elsewhere (before and after voiceless consonants, at the end of words, and also at the start of words). The distribution in Old English may have been different depending on the dialect, and some dialects of Middle English are known to have had voiced initial [v] and [z] at least instead of [f] and [s]. However, while a handful of words with initial /v/ in Modern English come from this source (vat and vixen), I can't find any comparable words with /z/ or /ð/.
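To make the complementary distribution concrete, here is a minimal Python sketch of the rule as described above: a fricative surfaces as voiced only when the sounds on both sides of it are voiced. The function name `surface_form`, the tiny set of voiceless segments, and the one-character-per-sound transcriptions are all my own simplifications for illustration; the real conditioning in Old English is believed to have been more complicated (stress and morpheme boundaries mattered too), so treat this as a toy model rather than a reconstruction.

```python
# Toy model of the Old English fricative voicing rule described above.
# Assumption: each character in the input stands for exactly one sound.

# Voiceless fricatives and their voiced allophones.
VOICED_FOR = {"s": "z", "f": "v", "θ": "ð"}

# Simplified, hypothetical set of voiceless segments; everything else
# (vowels, nasals, liquids, voiced stops) is treated as voiced.
VOICELESS = set("ptkfsθxh")


def is_voiced(segment):
    """Return True if the underlying segment counts as voiced."""
    return segment not in VOICELESS


def surface_form(segments):
    """Voice /s f θ/ only when flanked on both sides by voiced segments."""
    out = []
    last = len(segments) - 1
    for i, seg in enumerate(segments):
        flanked = (
            0 < i < last
            and is_voiced(segments[i - 1])
            and is_voiced(segments[i + 1])
        )
        out.append(VOICED_FOR[seg] if seg in VOICED_FOR and flanked else seg)
    return "".join(out)


if __name__ == "__main__":
    for word in ["wulf", "wulfas", "θorn", "baθian", "oft"]:
        print(word, "->", surface_form(word))
```

In this model the /f/ of wulf stays voiceless at the end of the word but is voiced in the plural wulfas, while word-initial /θ/ (as in þorn) never meets the condition, which is the point of the paragraph above: the rule simply never produced initial [ð].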
In the development up to Modern English, the Old English phonetic values of these fricatives were mostly preserved, but there was irregular voicing of some fricatives at the start or end of some commonly used words and suffixes. That's why we have /ð/ in this, that and them (as well as /v/ in of, and /z/ in the plural suffix -(e)s). In fact, this change is still in progress to some extent; one word where it is incomplete is with (which is sometimes pronounced with voiceless /θ/, and sometimes with voiced /ð/).
In Modern English, the classes of nouns that start with /v/ and /z/ have been expanded due to loanwords from sources like French and Greek. But none of the main source languages for English loanwords has /ð/. (Modern Greek does, and Spanish has [ð] as a non-phonemic allophone of /d/, though not utterance-initially, but I don't know of any loanwords into English where this is reflected.)
There are a few other sources of words that start with /v/ and /z/: newly coined words that aren't from earlier roots, like brand names or exclamatory, imitative or "expressive" words. In the case of brand names and the like, the lack of any clear way to write word-initial /ð/ ("th" will generally be taken as /θ/) and the pre-existing rarity of other words that start with this sound probably prevent it from being used.
In the case of expressions and imitations, I think it reflects an overall marginal status of /ð/ as a phoneme in English. It's often noted that even in other contexts than the start of a word, /θ/ and /ð/ are almost never contrastive in English. The pair /s/ and /z/ are more often contrastive (in pairs like lice, lies or mace, maze). The sounds /θ/ and /ð/ are also rarer than /s/ and /z/ overall. This may be why word-initial /z/ is found fairly often in imitative words and exclamations like zap, zip, zowie, but word-initial /ð/ is not.
Here is a quote about it from the paper "Dental fricatives and stops in Germanic: deriving diachronic processes from synchronic variation" by Bridget Smith (2007), who has also written a paper covering acoustic analysis of dental fricatives in Modern English, available from her "selected publications" web page:
Historically, the dental fricative was one voiceless phoneme in Old
English, with a voiced allophone between voiced sounds (much as the
voicing can be generalized today). It could be represented
orthographically with thorn <þ> or edh <ð>, which could be used
interchangeably to represent either the voiced or voiceless variant.
At this time, the alveolar and labio-dental fricatives were also
subject to voicing assimilation, but were written with only the
voiceless graphemes /s/ and /f/ (Mitchell & Robinson 2001:15). After
the Norman Conquest brought unprecedented numbers of loanwords into
English, /s/ and /f/ became contrastive with /z/ and /v/,
respectively, due to loanwords that contained these sounds in
contrastive positions. Borrowing between dialects that had different
distributions, such as the initial voiced fricatives in dialects in
the Southwest of England, may also have contributed to the
phonologization, creating opposing forms such as fox and vixen. The
sounds in assimilatory voicing patterns, and in the new borrowed
lexemes became phonemic by around 1250. French did not have a
word-initial voiced dental fricative, however, so it is more difficult
to ascertain when the phonologization of /θ/ and /ð/ occurred.
If we take a broader look at languages, /ð/ is fairly rare as a distinct phoneme, and it is easily changed into other sounds, but there are definitely languages where it occurs at the start of more words than it does in English (for example, the aforementioned Modern Greek).
Wikipedia has what seems to me to be a very complete write-up on the pronunciation of words like "tutor," "news," and "brew," which many speakers pronounce without a "y" sound even though the spelling and history would suggest one. This phenomenon is named "yod-dropping," as "yod" is one name for the "y" sound. In general, it is possible to predict which words are affected, although the rules are fairly complex and there are some areas where there is variation even among speakers of the same regional variety. For this reason, I won't try to list them all here and risk giving a simplified, wrong picture; you can find them over at Wikipedia, or in phonologists' work describing "yod-dropping."
So to answer your first question: the pronunciation with "y" is the original one, and has the same origins as it does in words like "feud" or "fume" where both British and American English speakers standardly have a "yoo" sound. The pronunciation without "y" is newer, but I've had a hard time finding actual dates for this change. It appears that yod-dropping occurred earlier or later depending on the particular phonetic environment, and the change is still in progress.
Regarding the second part of your question: the consonant "y" is pronounced towards the front of the mouth. For many American English speakers, there is a tendency for the ordinary "oo" sound to be pronounced more towards the front of the mouth after coronal consonants, a class that includes /t, d, n, s, z, l/ among others (Source: The Atlas of North American English, Sound Changes in Progress, the fronting of /uw/ after coronals). It seems possible to me that this tendency first led to confusion between "yoo" and "oo" after these sounds (for example, in the pair of words "do" and "dew"), and then led to what phonologists call neutralization: a complete lack of contrast.