Learn English – Dissecting an English sentence using a pattern

british-english sentence-patterns technical

I am trying to make a script that can dissect an English sentence.

The problem is that I have no idea how to dissect an English sentence when the words are not familiar. I know which words are nouns, verbs, etc., because I recognize the words. I could give the script a large set of words so that it recognizes them as well, but surely there are "patterns" one can check to see which part of a sentence is the noun or the verb, even when that noun or verb is not known in advance.

Are there such patterns for the English language? Can one dissect a sentence by only "recognizing" a few words?
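To make the idea of "recognizing only a few words" concrete, here is a minimal sketch. It assumes a short list of function words (determiners, pronouns, prepositions, auxiliaries) plus a handful of suffix rules is enough for a first guess at each word's class. The names `CLOSED_CLASS`, `SUFFIX_RULES`, `guess_tag` and `tag_sentence` are made up for this illustration, and the word lists are far from complete.

```python
# Function words form a small, finite set, so listing them is feasible.
# This lexicon is illustrative only, not a complete inventory.
CLOSED_CLASS = {
    "DET": {"the", "a", "an", "this", "that", "these", "those"},
    "PRON": {"i", "you", "he", "she", "it", "we", "they"},
    "PREP": {"in", "on", "at", "to", "of", "with", "by", "for", "from"},
    "AUX": {"is", "are", "was", "were", "am", "be", "been",
            "has", "have", "had", "will", "would", "can", "could"},
    "CONJ": {"and", "or", "but"},
}

# Suffix heuristics for unknown words; the first matching rule wins.
# These are guesses and will misfire (e.g. "thing" looks like a verb).
SUFFIX_RULES = [
    ("ly", "ADV"),
    ("ing", "VERB"),
    ("ed", "VERB"),
    ("tion", "NOUN"),
    ("ness", "NOUN"),
    ("ment", "NOUN"),
    ("ous", "ADJ"),
    ("ful", "ADJ"),
]


def guess_tag(word, prev_tag=None):
    """Guess a coarse word class without knowing the word itself."""
    w = word.lower()
    # 1. Known function words.
    for tag, words in CLOSED_CLASS.items():
        if w in words:
            return tag
    # 2. Suffix of an unknown word (require a stem of at least 2 letters).
    for suffix, tag in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            return tag
    # 3. Context: an unknown word right after a determiner is assumed
    #    to be a noun (it could also be an adjective).
    if prev_tag == "DET":
        return "NOUN"
    return "UNK"


def tag_sentence(sentence):
    """Return a list of (word, guessed_tag) pairs for a simple sentence."""
    tagged = []
    prev = None
    for word in sentence.rstrip(".!?").split():
        tag = guess_tag(word, prev)
        tagged.append((word, tag))
        prev = tag
    return tagged
```

With nonsense words, `tag_sentence("The blorf quickly zibbed a wug.")` tags "blorf" and "wug" as nouns (they follow a determiner), "quickly" as an adverb and "zibbed" as a verb, even though none of those words is in the lexicon.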

Understanding the whole language is not necessary, as nobody really understands everything about a natural language.

Having simple patterns and "rules" should be enough. I understand that, because English is a natural language, every rule or pattern would have some kind of exception and mapping each one of those would probably take forever.
But ignoring those exceptions (apart from the most common), I could make the script understand quite a lot.

If necessary, I could theoretically then add exceptions until done (read: forever).

Here are a few examples of patterns. (I'm not sure if these are exactly what I need, but as an example they should be fine.)

  • Subject + Verb (S-V)
  • Verb + Subject (V-S)
  • Subject + Verb + Direct Object (S-V-DO)
  • Subject + Verb + Complement (S-V-SC)
  • Subject + Verb + Indirect Object + Direct Object (S-V-IO-DO)
  • Subject + Verb + Direct Object + Object Complement (S-V-DO-OC)

Recognizing which pattern a sentence falls into is probably the first thing I should aim for.
Of course, this is much easier said than done. Also, for every pattern I find, the question form seems to be missing, but that is another issue I can deal with later.

From what I can tell, the best course of action is to find the verb in the sentence, then check whether the words before it match a subject, then check the words after it, and so on.
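A rough sketch of that "find the verb first" step, assuming the words have already been given coarse tags such as `AUX`, `VERB`, `DET` and `NOUN` by some earlier step. The function names `split_on_verb` and `rough_pattern` and the labels they return are invented for this example. It only separates "something, verb, something" and does not yet tell a direct object from a complement.

```python
VERB_TAGS = {"AUX", "VERB"}


def split_on_verb(tagged):
    """Split (word, tag) pairs into (before, verb_group, after).

    The verb group is the first contiguous run of AUX/VERB tags,
    so "has zibbed" stays together. Returns None if no verb is found.
    """
    start = next(
        (i for i, (_, tag) in enumerate(tagged) if tag in VERB_TAGS), None
    )
    if start is None:
        return None
    end = start
    while end < len(tagged) and tagged[end][1] in VERB_TAGS:
        end += 1
    words = [word for word, _ in tagged]
    return words[:start], words[start:end], words[end:]


def rough_pattern(tagged):
    """Name a very coarse pattern based on where the verb group sits."""
    parts = split_on_verb(tagged)
    if parts is None:
        return "no verb found"
    before, _verb, after = parts
    if not before:
        # Could be V-S, a question, or an imperative; not decided here.
        return "V-first"
    # "X" stands for whatever follows the verb (object or complement).
    return "S-V-X" if after else "S-V"
```

For example, the pairs for "The blorf has zibbed a wug" split into `["The", "blorf"]`, `["has", "zibbed"]` and `["a", "wug"]`, which `rough_pattern` labels `S-V-X`. Deciding whether the trailing part is a direct object, an indirect object or a complement would need further rules.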

While finding the verb should not be that big an issue, matching a subject seems quite hard: it can't come from a database because, thanks to names, there are infinitely many possibilities.
Still, shouldn't a subject follow a pattern just as well?
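One way to picture a subject "pattern" is as a sequence of word classes rather than of specific words. The sketch below assumes a simple subject is either a lone pronoun or an optional determiner, any number of adjectives, and one or more nouns. That is a deliberate simplification (no prepositional phrases, relative clauses or coordination), and `TAG_CODES`, `NP_PATTERN` and `is_noun_phrase` are hypothetical names.

```python
import re

# Map each coarse tag to one letter so a tag sequence becomes a string
# that an ordinary regular expression can match.
TAG_CODES = {"DET": "D", "ADJ": "J", "NOUN": "N", "PRON": "P"}

# Simplified noun phrase: optional determiner, adjectives, one or more
# nouns; or a single pronoun.
NP_PATTERN = re.compile(r"D?J*N+|P")


def is_noun_phrase(tags):
    """Return True if the tag sequence fits the simplified pattern."""
    # Tags outside TAG_CODES become "?", which the pattern never matches.
    code = "".join(TAG_CODES.get(tag, "?") for tag in tags)
    return NP_PATTERN.fullmatch(code) is not None
```

Because the check looks only at tags, an unknown name works as long as it was tagged as a noun: `["NOUN", "NOUN"]` (for something like a first name plus surname) and `["DET", "ADJ", "NOUN"]` both match, while `["DET"]` alone does not.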

Also, if I were to go this route, wouldn't it end up being a maze of patterns (not that that's necessarily a bad thing)?

Best Answer

There is an entire field that studies this question, called computational linguistics. A look at any online translation tool shows that the field still has a way to go, but a lot of work has been done on parsing English.

Stanford provides a robust English parser (the Stanford Parser), which has its own homepage.

I recommend using an established library for your script rather than writing your own. There is no reason to reinvent the wheel, especially a wheel that contains the complications of the English language.