You raise a valid concern. On the one hand, we often talk of periphrastic tenses (and other constructions); on the other, some insist that a tense should be confined to a single word. Others, again, hold that tense is a property of a sentence or clause, not of a word or phrase. Can this problem be solved at all?
The short answer is: there are different models; some models are incompatible with certain other models; and we are free to choose whichever model we prefer. The term periphrastic tense is useful in a model that allows for tenses that consist of more than one word, but not in a model that doesn't. The definition of "tense" is not an objective fact that exists independent of human analysis: it is ultimately a label of convenience created by the observer. Both kinds of models have merit.
Most language users happen to think of will do as the future tense. Some linguists use other models. There is no consensus, not even among linguists, about what constitutes a tense.
Even word boundaries are not objective facts
Perhaps the most fundamental issue you raise is that of word boundaries. What were once considered two separate words may fuse into a single, new word, as in cantare habeo => chanterai. At some point in its development, the status of this phrase-or-word must have been uncertain. This shows how relative the whole terminology is.
But in most cases, a reasonable case can be made for either one or the other, so that the fundamental issue temporarily recedes to the background; it should be noted, however, that what we consider a "word" is to some extent intrinsically subjective and a matter of convention. It is just a convenient demarcation. But let's move on.
Is tense determined by form or by function?
Let me illustrate the problem by means of Latin, where terminology has been fixed for a long time. Tense comes from Latin tempus, "time"; part of the oldest concept of tenses had to do with notions of time. However, there was never a one-to-one correspondence between tenses and temporal references. The pluperfect, for example, is normally used to refer to a time before a narrated time in the past, just as in English; and yet after postquam, "after", the perfect was used, not the pluperfect. Similarly, the imperfect and pluperfect could be used to refer to a hypothetical situation in the present, as in English if I was rich... (although subjunctives were far more common). And so on.
Si domi eram, pater me puniebat. = If at_home I_was, father me punished.
"If I were at home, father would punish me."
Postquam Galliam vidi, vici. = After Gaul I_saw, I_conquered_it.
"After I had seen Gaul, I conquered it."
And yet we still call the verbs in these examples imperfect and perfect, respectively, even though they do not have their usual temporal references. The reason is that a form is named after its most common function, even though it can have other functions as well. Latin and English both do this, and they are by no means the only languages that do.
Do we then look only at the form of the verb, not at its function, when defining tenses in Latin? No. What we call the passive perfect is periphrastic/analytic/compound, just as in English:
Canis sum. = Dog I_am.
"I am a dog."
Visus sum. = Seen I_am.
"I am/was seen."
You could say this is not a special tense, but two words, one being a past participle, the other a present verb; and yet this is called the passive perfect. The reason is that it functions just as the perfect does, except that it is passive. Here function determines what we call it. The same happens in English when we say that "I will do it" is in the future tense.
Humans like symmetrical systems
So then what constitutes a tense, if we can rely neither on form nor on function? The answer is probably symmetry. If there is a present active (video, "I see"), a present passive (videor, "I am (being) seen"), and a perfect active (vidi, "I saw"), we would like there to be a perfect passive. Because there was no such verbal form, a phrase was made to serve as its equivalent (visus sum, "I was/am seen in the past"). We humans like our systems neat and symmetrical if possible:
            Active    Passive
Present     video     videor
Imperfect   videbam   videbar
Perfect     vidi      [visus sum]
Future      videbo    videbor
Now is this label "passive perfect" merely a convention? It may have been once, but, as people start believing in it, they start using it in ways that neatly fit the system, even if the meaning of visus sum was once somewhat different. It is in some ways a self-fulfilling prophecy. Whenever a sentence in the active perfect had to be made passive, instead of saying "oh, I can't do that", people started thinking, "this is the passive perfect; I will use it". The same applies to "I will do it" in English.
All three approaches have up-sides and down-sides
Is this a perfect system of terminology? No. There are serious disadvantages. But it has been in use for a long while, and most people think of "I will do it" as fitting within a neat system of past, present, and future, because that is the most convenient and obvious partition of our verb tenses, or so we feel.
Various branches of linguistics have proposed different systems and different terminologies in the past. This is a productive and beneficial approach. Some chose to focus on form and consider the English periphrastic future not a tense at all; they will only count affixes and endings as capable of forming tenses. This system certainly has merit.
Others have emphasised function; they have gone so far as to declare that, since many forms can be used for more than one function, as with si eram... / "if I was...", only forgoing form altogether leads to a consistent approach. Hence they treat tense as a property of a clause or sentence, not of a word or phrase. On that view, was acquires a past tense only in combination with a word like yesterday; in "if I was at work today, you wouldn't see me here", it is a present tense, because it refers to a situation in the present, albeit a hypothetical one. This approach, too, has merit.
One could use several systems at once
As an alternative, we could invent new words for these two new approaches, such as *single-word tenses for the English simple present and simple past, and time-reference or temporality for the time-reference of a clause or sentence. Many different models are possible. Insisting on one model without considering the benefits of other models seems unwise. And saying "x is A" when you mean "I find the model in which x is called A most useful" is a simplification.
Suppletion as an illustration of a convenient choice
Some systems are uncontested, even though at some point in the past a fairly arbitrary choice must have been made.
I go.
I went.
Do these two forms belong to the same verb? Yes, you will say, because that is what you were taught, and because they "feel" like the same verb, just with odd forms. But, in the past, there were two verbs, both meaning something like going (although there were no doubt some differences between them). At some point the present form of a verb resembling go was taken, its past forms were discarded (or not, if such never existed), and the past form of a verb resembling went was adopted in their place.
We could say, "there are two defective verbs in modern English, one lacking a past form, the other a present form"; but we choose not to do so. That is to some degree arbitrary, but in this case it is just very convenient. If certain linguists would prefer to treat them as two different verbs, then let them do so, if this is somehow more convenient in a certain linguistic analysis. Or they could just say "this verb consists of two different roots", as they no doubt do.
To determine whether auxiliaries are verbs, we should examine two kinds of properties: one relates to word forms, the other to word use.
Most verbs have properties such as tense, aspect and mode.
The verb to be is a normal, complete verb. Was and were are past tense forms of to be, and is, are and am are present tense forms. Being is a continuous aspect form, and been is a perfect aspect form. The were of "if I were a rich man" and the be of "be he live or be he dead, I'll grind his bones into my bread" are subjunctive forms.
If can is a verb, then it is a defective verb. It doesn't have forms that show aspect. The forms canning and canned don't exist for the auxiliary. Neither does the infinitive to can.* However, it does have the properties of tense and mode. Could is both a past tense form and a subjunctive mode form, just as were is both a past tense form and a subjunctive form of to be.
Word form isn't the only way to show a verb's mode. The interrogative mode is usually shown by the verb's position.
The statement "he is a student" employs the indicative mode. The interrogative mode places the first word of the verb** before the subject: "is he a student?" The statement "he can be a student" is subject to the same transformation: "can he be a student?"
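The inversion described above can be sketched as a toy procedure. This is only an illustration under simplifying assumptions, not a real grammar: the function name `to_question`, the word list `FRONTABLE` and the table `DO_SUPPORT` are all hypothetical, and they cover just the handful of words used in this answer. For a one-word verb such as studies, the sketch first adds a form of do (as explained in the second footnote) so that there is a word to move.

```python
# Hypothetical toy lexicon: verb words that can be moved before the subject as they are.
FRONTABLE = {"is", "are", "am", "was", "were", "can", "could"}

# Hypothetical table: one-word verb -> (form of "do", bare verb).
DO_SUPPORT = {"studies": ("does", "study"), "studied": ("did", "study")}


def to_question(subject, verb_words, rest=""):
    """Form a yes/no question by moving the first word of the verb before the subject."""
    verb_words = list(verb_words)
    if verb_words[0] not in FRONTABLE:
        # One-word verb: add a form of "do" so that there is a word to move.
        do_form, bare = DO_SUPPORT[verb_words[0]]
        verb_words = [do_form, bare] + verb_words[1:]
    fronted, remaining = verb_words[0], verb_words[1:]
    words = [fronted, subject] + remaining + ([rest] if rest else [])
    return " ".join(words) + "?"


print(to_question("he", ["is"], "a student"))
print(to_question("he", ["can", "be"], "a student"))
print(to_question("he", ["studies"]))
```

Note that is and can take the same path through the sketch, while studies needs the extra step; that parallel is the point the argument relies on.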
A clause pairs a subject with a predicate. A predicate requires a verb. We consider a statement like "he studies" to be a complete clause. In answer to the question "is he a student?", the answer "he is" also counts as a complete clause. If, in answer to "can he be a student?", the statement "he can" is a complete clause, then the can must be a verb.
Although we can see that the auxiliary can doesn't exhibit every property that most verbs have, it does exhibit properties that only verbs have. It has a form that marks tense or mode. Its position can indicate a mode that word forms cannot. It can act on its own as the predicate of a clause.
All these reasons support the idea that defective verbs are verbs. We don't have a good reason to place such words in a different grammatical category.
_______________
* Yes, there are homonyms that do have continuous, perfect and infinitive forms. We won't consider those to be the same verb.
** Most one-word verbs require that a word be added to the verb phrase so that the added word can be moved. The statement "he studies" becomes "he does study" on its way to becoming "does he study?" The notable exception to this is when the one-word verb is a form of to be.
Best Answer
In this case, I think the Wikipedia article is pretty good.
Auxiliary Verb
So, no, "let" is not an auxiliary verb by most definitions. It's a much more interesting thing--a ditransitive verb with a verbal object.