I think part of the premise of this question is incorrect: build did not gain a "u" between Middle English and Modern English. Modern English spelling is standardized, so we can talk about "the" spelling of most specific words. But in Middle English, there are usually multiple attested spellings for the same word; this is partly due to overall lack of standardization (some writers might even spell the same word several different ways on a single page), and partly due to the existence of conflicting standards (Middle English is a cover term that encompasses many different regional varieties over a pretty wide range of time, traditionally from the Norman Conquest of 1066 up until the Great Vowel Shift around the 1500s).
Dictionaries, such as Wiktionary, need to choose one written form to use as the headword (in this case, "bilden"), but this doesn't mean that no other forms existed during this time period. According to the Oxford English Dictionary (OED), forms of this word spelled with "ui" or "uy" actually date back to Middle English, but their origin is unclear:
Etymology: Middle English bulden(ü), bylden, bilden < Old English
*byldan to build (recorded only in past participle gebyld), < bold a dwelling. Hence the two fundamental senses are ‘to construct a
dwelling’ and ‘to take up one's abode, dwell’. The normal modern
spelling of the word would be bild (as it is actually pronounced); the
origin of the spelling bui- (buy- in Caxton), and its retention to
modern times, are difficult of explanation.
My guess is that it has something to do with phonetic rounding, as the letter "u" is generally associated with rounded vowels. This could be due to the preceding labial consonant /b/, the historical rounding of the root vowel (Old English y was the rounded counterpart of i), or some combination of both. I'm not at all sure of this, because as the OED says, the regular development of Old English y is complete loss of rounding and merger with i, as in words like kiss v. (from Old English cyssan v.) or lice n. (from Old English lȳs n.).
However, modern standard English vocabulary does also show some influence, in both pronunciation and spelling, from dialects where y developed differently. (I'll describe this in more detail later by quoting the etymologies of specific words.) According to the link that Ricky found (Notes on Etymology by Walter W. Skeat 1901), the digraph "ui/uy" was used in Southern Middle English to represent the sound that developed from older /yː/. Skeat says this is the reason for the spelling of build (which apparently had a long vowel at this point due to homorganic lengthening before the consonant cluster -ld). To support this, he lists the words bruise, from Old English brȳsan, and buy, from a form of Old English bycgan.
I was not able to find any other examples of "ui/uy" being used this way in modern spelling. However, the OED does list muys, muyse as Middle-English spellings of the word mice (from Old English mȳs). For comparison, ice (from Old English īs, without historical rounding) does not have any spellings listed with "ui" or "uy."
Buy seems to be particularly relevant because, like build, it is pronounced with an unrounded vowel in modern English. The OED gives the following information about the etymology:
Old English bycg(e)an, bohte, geboht, corresponding to Old Saxon buggjan, *bohta, giboht, Gothic bugjan, bauhta, bauhts; of unknown origin, not found outside Germanic, and not to be connected, so far as can be seen, with the stem bug- bow n.1 The inflection was imperative byge, bycgað; indicative present bycge, bygest, bygeþ, plural bycgað; subjunctive present bycge, bycgen; whence Middle English s.w. buye, buggeþ; bugge, buyest, buyeþ, buggeþ; bugge, -en; levelled before 1500 to buy- all through, whence the modern spelling. The forms in begge, bey- were Kentish; bigge, bie, by, midland and northern; in the latter the levelling to bie, by, took place as early as 1300.
The forms with the long consonant "cg" seem to have been eliminated through leveling. Another verb that seems to have developed in a similar way is licgan "to lie (down)": both of these words have the diphthong /aɪ/ in modern pronunciation, which generally develops from Middle English /iː/. But the different spellings suggest that buy may have been pronounced with another vowel at some point, or in some dialects. It's not clear to me from this entry if southwest buy- is the ancestor of the modern pronunciation as well as the spelling (I'm not sure how that development would work in terms of sound changes) or if the modern pronunciation simply comes from midland and northern bie, by.
Searching online suggests that aside from build, buy, and related words, there are no other words where "bui/buy" is pronounced like "bi." (This spelling pattern is discussed in A Survey of English Spelling, by Edward Carney, and Dictionary of the British English Spelling System, by Greg Brooks; both of these sources use the synchronic analysis that "bu" acts as a symbol for /b/.) There are some other words that are superficially similar (buoy, guild, guy, biscuit, conduit, Kuiper), but in general they have different explanations for their spelling, so they are not very useful for explaining build.
However, it might be useful to compare it to busy, for which the OED says:
The original stem vowel ĭ is shown by Old English bisig; the form
bysig (when not simply an inverted spelling with y for i in areas
where Old English y had been unrounded) probably shows late West Saxon
rounding of the stem vowel as a result of the influence of the
preceding labial consonant (compare A. Campbell Old Eng. Gram. (1959)
§318); Middle English (and modern standard English) busy continues
this form (although in the case of the modern standard form with the
pronunciation of the unrounded variant). Pronunciation with /ɪ/ is
regularly indicated for forms spelt with -u- by orthoepists from the
mid 16th cent. onwards (see E. J. Dobson Eng. Pronunc. 1500–1700 (ed.
2, 1968) II. §82).
Forms such as Middle English and early modern English besy (see γ. forms) reflect Open Syllable Lengthening of short ĭ to long close ē.
Another relevant word is bury, from Old English byrgan. According to the Online Etymology Dictionary:
Under normal circumstances [Old English -y-] transformed into Modern
English -i- (as in bridge, kiss, listen, sister), but in bury and a
few other words (as in merry, knell) it retained a Kentish change to
"e" that took place in the late Old English period. In the West
Midlands, meanwhile, the Old English -y- sound persisted, slightly
modified over time, giving the standard modern pronunciation of blush,
much, church.
It looks like the modern pronunciation of bury comes from dialects like Kentish, while the spelling comes from dialects like those in the West Midlands.
Build, buy, busy, and bury all have a "b" before the vowel: this is the "labial consonant" mentioned by the OED. The idea that it might have influenced the development of the following vowel in words like these is plausible, since there is strong evidence that it caused the following vowel to retain rounding in words like bull. However, it seems that in some dialects, there were words that retained rounding from Old English y even when it was not directly preceded by a labial consonant, such as bruise, blush, and church.
Best Answer
If the word "kitten" were spelled "citten", that would suggest the pronunciation /ˈsɪtən/. Although English spelling and pronunciation are not perfectly correlated, the association between the pronunciation /kɪ/ and the spelling "ki" (instead of "ci") is actually quite strong; this seems a plausible explanation for why the word pronounced /ˈkɪtən/ is spelled with "ki". The same spelling is used in Modern English in words like king, kiss, kin, which are all pronounced with /kɪ/ and come from words that had spellings with "c" in Old English—the Oxford English Dictionary (OED) entries for these words list "cinyng", "cyssan", "cyn" as spellings that can be found in Old English texts.
Etymology
I hadn't thought before now about the etymological relationship between cat and kitten. The etymology of kitten seems somewhat uncertain to me, but I'm fairly sure that it's irrelevant in any case.
It seems likely that kitten comes from (a dialect of) French
According to an OED entry that was first published in 1901, the word kitten is thought to come from "Anglo-Norman *kitoun, *ketun = Old French chitoun, cheton, obsolete variant of French chaton kitten." The following two quotations from the OED entry show that the spellings kitoun and ketoun used to be used in English:
The asterisk before the Anglo-Norman forms *kitoun and *ketun indicates that these are reconstructions: we don't actually have any direct evidence for the existence of these forms in Anglo-Norman. But based on other words, we know that French "ch" often corresponded to Anglo-Norman /k/.
The OED mentions
German? Germanic? Probably not
I don't know of any evidence that suggests that the word kitten comes from or was influenced by German. Relatively few English words come from German.
A large number of English words are Germanic in etymology, but what this means is that they come from the common ancestor of languages like English, German, Icelandic and Dutch—this common ancestor is older than the spelling systems of any of the descendant languages spoken today. The modern German form "Katze" did not exist in that common ancestor. However, that common ancestor may have had a word pronounced something like *kattuz (this is a reconstructed form, so the use of "k" here is a purely arbitrary convention of modern linguists, not anything that could affect the spelling of the cognate words in descendant languages). This word is obviously related to Latin cattus (the source of French chat), but according to Wiktionary, the nature of the relationship is unclear.
It's true that the modern English form kitten looks like it could have come in some way from this Germanic base. There are other English words ending in -en that look somewhat like diminutives, such as chicken and maiden. In fact, there is an OED entry for a Germanic ending -en that is supposed to be a diminutive suffix (-en, suffix1):
It's obviously tempting to interpret kitten as containing this Germanic suffix. It's so tempting that whoever wrote this OED entry included the word kitten as an example, even though this seems to contradict the etymology given in the OED entry for the word kitten!
But I would assume that the OED entry for kitten is correct, and the OED entry for -en, suffix1 is wrong, because I don't think there is an easy way to explain the early spellings with -oun if we assume that the word originally comes from Germanic. If you look at the historical spellings recorded in the OED for the words chicken and maiden, there is a lot of variation over time in the spelling of the vowel before the "n", but none of the variant spellings uses the digraph "ou".
It seems possible, though, that the words in English that end in a Germanic suffix -en contributed to the development of the modern spelling of the end of the word kitten. The use of the spelling -en could instead be explained as just a respelling of a reduced vowel in an unstressed syllable, but there are words like cotton, button, and iron that have a reduced vowel in the last syllable and are spelled with -on, so it's clear that vowel reduction hasn't caused all words that end in /ən/ to come to be spelled with "en" in present-day English.