English is over a thousand years old, and has been through so many changes in that time that even very competent speakers struggle with English as it was written a few hundred years ago, while the English of a few hundred years before that is so different as to be essentially a different language.
This has left us with a great many inconsistencies, and the fact that English borrows from different languages, at different times, with different degrees of Anglicisation, leaves us with many more (though not in this case).
Some of the reasons for particular cases are hard or impossible to track, and some are open to reasonable conjecture, while others we can make more reliable statements about.
In Middle English, the word put is found as putten, puten and poten, alongside a separate word pytan. It is believed to have come from a late Old English word putung.
The word but comes from Middle English buten, boute, bouten, from Old English butan.
In Middle English, these words, along with many others with a u in them, would have had a /u/ sound (like a French ou as in vous). So they would have rhymed as their spelling suggests, though neither would have sounded quite as it does in most modern accents.
As you have probably noticed, accents differ greatly from each other in how they pronounce many vowels. Accents differ not just with space (which we can easily realise by listening to how different people pronounce the same word differently), but also through time, which leaves some record today (listen to recordings of people from several decades ago, especially working class people, and you may find their accent doesn't match how people of the same area talk), and also helps explain how the regional differences arose.
In the early Modern English period, the /u/ sound changed to a /ʊ/ sound (like the oo in foot).
In some, but not all, of the words it then changed first to a /ɤ/ sound and then further to a /ʌ/ sound (the sound but has today). Whether it changed generally depended on the surrounding consonants, but this was inconsistent, so even the one-time homophones put and putt now have different vowels.
So while but, cut, put, putt, fun, full, sugar once all had the same vowel, a change in vowel happening for some, but not all, of them split them apart. This also affected some oo words that had previously shifted sound to the same /ʊ/ sound (hence blood rhyming with dud rather than with good).
This happened in different areas at different times, and there are still accents where but and cut rhyme with put. This is also one of the reasons we have clues to what happened, since people in the mid 17th century were noticing how the words rhymed in some accents, but not in others. (The split here is called "the foot and strut split" because accents without the split rhyme the words foot and strut, while accents with it do not.)
Now, while you'll often hear that spelling was inconsistent in English until relatively recently, this is only true up to a point; it was a lot less firmly set than today, but there were certainly conventions followed (even if they differed by region), so it wasn't a phonetic free-for-all either.
Between this, and the lack of any clear way to differentiate the two sounds (all the more so earlier in the change), we still have the same u letter used to spell them, even though they now have different sounds.
Here's a spreadsheet with English words, IPA and syllable data
https://docs.google.com/spreadsheets/d/1EfFhhC7kcTzB8c2UhAC53txRiTLKl3R9C2AM7ee0AVM/edit#gid=104606017
So after coming across this question and wanting to know the answer myself, I managed to pull together a few different sources of information into one Excel spreadsheet, and get data for just over 31,000 words with both their IPA pronunciations and the number of syllables. I also found some frequency data, which can be used to naively split the words into deciles according to roughly how often they are used. This means you can sort the words by both syllable count and frequency of use (which is a very rough measure of complexity).
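If you would rather do the decile split outside a spreadsheet, here is a minimal sketch of one naive way to do it. It is not the formula used in the spreadsheet; the function name and the sample counts are made up for illustration, and it simply ranks words by count and cuts the ranking into ten equal slices.

```python
def assign_deciles(freqs):
    """Map each word to a decile from 1 (most frequent) to 10 (least frequent)."""
    # Rank from most to least frequent; ties are broken alphabetically
    # so the result is deterministic.
    ranked = sorted(freqs, key=lambda w: (-freqs[w], w))
    n = len(ranked)
    # Position i in the ranking falls into slice (i * 10) // n, numbered from 1.
    return {word: (i * 10) // n + 1 for i, word in enumerate(ranked)}


# Invented counts, purely for illustration (not real frequency data).
sample = {
    "the": 1000, "of": 900, "and": 800, "cat": 50, "dog": 45,
    "put": 40, "putt": 5, "strut": 4, "pantheon": 2, "champion": 1,
}
deciles = assign_deciles(sample)
print(deciles["the"], deciles["champion"])
```

With only ten words each one lands in its own decile; with a real list each decile would hold roughly a tenth of the words.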
Caveat: the pronunciation data I've pulled is from UK English. I pulled it from a GitHub repo containing IPA information for many languages, which also contains a file containing US English words and their pronunciations, linked here.
I haven't integrated it myself because I only need the UK data*, but you can pull the data into Excel fairly easily - the fields are split by whitespace, so Text-to-columns should separate the IPA from words. If you're comfortable with Excel then it should be fairly simple to combine this with the other data to get a list of all US English pronunciations and their syllable counts.
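If Excel isn't your thing, the same whitespace split can be sketched in a few lines of Python. This assumes each line holds a word, then whitespace, then the transcription; the function name and the sample lines are invented for illustration, so check them against the actual file before relying on it.

```python
def parse_pronunciations(lines):
    """Build a word -> IPA mapping from whitespace-separated lines."""
    entries = {}
    for line in lines:
        # Split on the first run of whitespace only, so anything after
        # the word stays together as the transcription field.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, ipa = parts
        entries[word] = ipa.strip()
    return entries


# Invented sample lines standing in for the real file contents.
sample_lines = ["million\t/ˈmɪljən/\n", "put /pʊt/\n", "\n"]
pronunciations = parse_pronunciations(sample_lines)
print(pronunciations)
```

To read a real file, you could pass an open file object instead of the list, for example `parse_pronunciations(open(path, encoding="utf-8"))`, where `path` is wherever you saved the download.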
The rest of the sources for the data are linked in the spreadsheet itself.
* Also, I did try to add in the US Word data to the Google Sheets, but Sheets complained that this would exceed the cell limit. I put together this project based on UK data before I realised that you're probably from the US, and built all the formulas around it, so it would take a bit of unpicking for me to switch it over fully. I might come back to it another day. Hope this is still of some use to you.
For Cambridge Dictionaries Online, at least, part of the answer may be to do with syllabification. First note that the transcriptions are phonological, as indicated by the slashes //, not phonetic, which would be indicated by square brackets []. That means that the phonetic realization might be identical even if the phonological representation is different (for any given speaker).
The generalization seems to be that the sound is represented as a /j/ if it is in the onset of a syllable, but as /i/ elsewhere. For instance:
vs.
and
In the first set of words, the sound is not in the onset of the syllable, but in its nucleus. In English syllabification, the nucleus must be vocalic. In the second set, the sound is in the onset. Since in English syllable onsets must be consonantal, it has to be represented as /j/. In the third set, the /i/ is in a syllable on its own, and hence is the nucleus of the syllable.
Words with only one consonant before the /i/ or /j/ can be syllabified either way, with the sound as an onset /j/ or as a syllable of its own with /i/ as nucleus (as /ɪˈtæl.jən/ vs. /ɪˈtæl.i.ən/ shows). Words with two consonants before the sound can only take the second form, with /i/ as nucleus, since English syllabification prefers to balance consonants across syllables in certain ways. So /ˈpænθ.jən/ is not a well-formed syllabification.
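To make the onset-versus-nucleus distinction concrete, here is a rough sketch that splits a dotted transcription into syllables and checks whether any syllable begins with /j/. It assumes that dots mark syllable boundaries and that a stress mark also sits at the start of a syllable; the function names are made up for illustration, and it says nothing about which syllabifications are well formed.

```python
STRESS_MARKS = ("\u02c8", "\u02cc")  # primary and secondary stress marks


def split_syllables(transcription):
    """Split a dotted phonological transcription into its syllables."""
    body = transcription.strip("/[]")
    # Assumption: a stress mark always stands at a syllable boundary,
    # so it can be treated like a dot.
    for mark in STRESS_MARKS:
        body = body.replace(mark, ".")
    return [syllable for syllable in body.split(".") if syllable]


def glide_in_onset(transcription):
    """True if some syllable starts with /j/, i.e. the glide is an onset."""
    return any(s.startswith("j") for s in split_syllables(transcription))


for t in ("/ˈmɪl.jən/", "/ˈmɪl.i.ən/"):
    print(t, len(split_syllables(t)), glide_in_onset(t))
```

Under these assumptions /ˈmɪl.jən/ comes out as two syllables with /j/ in an onset, and /ˈmɪl.i.ən/ as three syllables with no onset /j/.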
As for whether there is a genuine contrast between champion and million, I think there may be in some instances. I can pronounce the latter either as /ˈmɪl.i.ən/, with three syllables, or as /ˈmɪl.jən/, with two, but /ˈtʃæmp.jən/ just sounds wrong to me. YMMV, though.