How to calculate the number of syllables in a word using only its IPA (International Phonetic Alphabet) spelling

ipa pronunciation syllables

I want to write an algorithm to calculate the number of syllables in a word. The process is automated and will run over an entire dictionary, so manually counting breaths, chin movements, etc., as suggested in other questions, won't scale. I also don't want to look words up on a website like howmanysyllables, because I don't want to depend on a system that isn't free and open source.

To get around tricky words like "Wednesday", I thought it would be easier to use the IPA transcription of each word instead. I have the IPA transcription for every word in my downloaded dictionary, but to my dismay I discovered that there seems to be no surefire method of counting syllables from it.

Consider these two transcriptions to IPA:

pronunciation: pɹəˌnʌn.siˈeɪ.ʃən
conscientious: ˌkɒnʃiˈɛnʃəs

In the IPA transcription of "pronunciation", the lower apostrophe (the secondary stress mark ˌ), the upper apostrophe (the primary stress mark ˈ) and the period together show where every syllable break occurs. The transcription of "conscientious" only seems to strictly indicate a single break. You could say the "ʃ" indicates a syllable break, but I worry this isn't the case for all words.
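To make the gap concrete, here is a minimal Python sketch that splits a transcription at those three marks and counts the pieces. The function name `marked_chunks` is made up for illustration, and the sketch assumes the stress marks are the characters ˈ and ˌ rather than ASCII apostrophes or commas. It only counts explicitly marked boundaries, so it is not a syllable counter.

```python
import re

# The two transcriptions from the question.
PRONUNCIATION = "pɹəˌnʌn.siˈeɪ.ʃən"
CONSCIENTIOUS = "ˌkɒnʃiˈɛnʃəs"

# Primary stress (ˈ), secondary stress (ˌ) and the syllable-break period.
BOUNDARY_PATTERN = re.compile(r"[ˈˌ.]")

def marked_chunks(ipa):
    """Split an IPA string at explicit boundary marks, dropping empty pieces."""
    return [chunk for chunk in BOUNDARY_PATTERN.split(ipa) if chunk]

print(len(marked_chunks(PRONUNCIATION)))  # 5 chunks, matching the word's 5 syllables
print(len(marked_chunks(CONSCIENTIOUS)))  # 2 chunks, although the word has 4 syllables
```

The second result shows why marks alone are not enough: when a transcription omits the optional periods, the count is only a lower bound.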

Is there a list of rules that define syllable breaks in US English for IPA transcriptions?

Best Answer

Here's a spreadsheet with English words, their IPA transcriptions and syllable data: https://docs.google.com/spreadsheets/d/1EfFhhC7kcTzB8c2UhAC53txRiTLKl3R9C2AM7ee0AVM/edit#gid=104606017


After coming across this question and wanting to know the answer myself, I pulled a few different sources of information together into one Excel spreadsheet, which gave me just over 31,000 words with both their IPA pronunciations and their syllable counts. I also found some frequency data, which can be used to naively split the words into deciles according to roughly how often they are used. That means you can sort the words by both syllable count and frequency of use (which is a very rough measure of complexity).
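For anyone who would rather do the decile split outside a spreadsheet, here is one naive way to sketch it in Python. This is not necessarily how the spreadsheet's formulas do it: the function name `frequency_deciles`, the choice of decile 1 for the most frequent words, and the sample counts are all assumptions made for illustration.

```python
def frequency_deciles(frequencies):
    """Map each word to a decile from 1 (most frequent) to 10 (least frequent).

    `frequencies` is a dict of word -> usage count. Ties keep their
    original dictionary order because Python's sort is stable.
    """
    ranked = sorted(frequencies, key=frequencies.get, reverse=True)
    total = len(ranked)
    # Rank 0 is the most frequent word; integer division spreads ranks over 10 bins.
    return {word: rank * 10 // total + 1 for rank, word in enumerate(ranked)}

# Hypothetical counts, only to show the shape of the input and output.
sample = {"the": 5000, "word": 300, "syllable": 40, "conscientious": 2}
print(frequency_deciles(sample))
```

With fewer than ten words some deciles stay empty, which is expected; on a list of tens of thousands of words each bin holds roughly a tenth of them.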

Caveat: the pronunciation data I've pulled is UK English. It comes from a GitHub repo containing IPA information for many languages, which also includes a file of US English words and their pronunciations, linked here.

I haven't integrated it myself because I only need the UK data*, but you can pull the data into Excel fairly easily: the fields are split by whitespace, so Text-to-columns should separate the IPA from the words. If you're comfortable with Excel, it should be fairly simple to combine this with the other data to get a list of all US English pronunciations and their syllable counts.
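If Excel is not an option, the same split-and-combine step can be sketched in Python. This is a rough outline under assumptions: each line is taken to hold a word, then whitespace, then the IPA field, and the function names and sample rows are made up for illustration (the rows reuse the two transcriptions from the question, with their syllable counts typed in by hand).

```python
def parse_pronunciations(lines):
    """Map each word to its IPA field, assuming 'word<whitespace>IPA' per line."""
    pronunciations = {}
    for line in lines:
        parts = line.strip().split(None, 1)  # split at the first run of whitespace only
        if len(parts) == 2:                  # skip blank or malformed lines
            word, ipa = parts
            pronunciations[word.lower()] = ipa.strip()
    return pronunciations

def attach_syllable_counts(pronunciations, syllable_counts):
    """Keep words found in both sources: word -> (IPA, syllable count)."""
    return {
        word: (ipa, syllable_counts[word])
        for word, ipa in pronunciations.items()
        if word in syllable_counts
    }

# Illustrative rows; real data would be read from the downloaded files.
sample_lines = ["pronunciation\tpɹəˌnʌn.siˈeɪ.ʃən", "conscientious  ˌkɒnʃiˈɛnʃəs", ""]
sample_counts = {"pronunciation": 5, "conscientious": 4}
combined = attach_syllable_counts(parse_pronunciations(sample_lines), sample_counts)
print(combined["conscientious"][1])  # 4
```

Splitting only at the first run of whitespace keeps the rest of the line intact, which matters if a pronunciation field itself contains spaces (for example, several variants).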

The rest of the sources for the data are linked in the spreadsheet itself.


* I did try to add the US word data to the Google Sheet, but Sheets complained that this would exceed the cell limit. I put this project together from UK data and built all the formulas around it before I realised that you're probably from the US, so it would take a bit of unpicking for me to switch it over fully. I might come back to it another day. I hope this is still of some use to you.