Words don't come easy to me
Tue, 17 Sep 2019 08:52:31 +0100

Syllabits & word discovery

I've been working on an English version of my word puzzle game Silabitas. I'm going to call it Syllabits, and it's looking like this at the moment:

The basic rule is that you can only place a piece on the board if you make a word with it, so you have to connect it with something already on the board every time. So above "se" you could place "po", to make the word "pose". Or you could put "re", to make the word "resell", since "se" is also connected to "ll".

Every stage has multiple solutions, so you can try placing pieces at random at times. And when you do that, more often than not, you discover new words you didn't know about. I think this aspect of word discovery is quite fun.

Dictionaries

So it is very important that you have access to a dictionary to check those words you discovered. You can check the list of words you've made in a stage, and click to get its definition from an online dictionary. In Silabitas, that is the Diccionario de la lengua española de la Real Academia Española, and in Sil·labetes it is the Diccionari de la llengua catalana de l'Institut d'Estudis Catalans. For Syllabits, I decided to go for the Oxford Dictionary of English at Lexico.

In order to check if a word exists I'm using the word list from the system spellchecker, a 2 MB file stored at /usr/share/dict/web2, augmented with even more words that I found here, for a total of 370,103 words. This includes verb conjugations and other word transforms of the English language, like adding -er to adjectives.
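A minimal sketch of how two word lists could be merged into one deduplicated list. The function names are made up for illustration, and lowercasing every entry is an assumption (it folds proper nouns into common words, which may or may not be what you want).

```python
def normalize(lines):
    """Strip whitespace, lowercase, and drop empty lines.

    Lowercasing is an assumption made for this sketch.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def merge_word_lists(*sources):
    """Merge any number of line iterables into one sorted, deduplicated list.

    Each source can be an open text file or a plain list of strings.
    """
    merged = set()
    for lines in sources:
        merged |= normalize(lines)
    return sorted(merged)
```

Since open text files iterate line by line, you could pass the opened /usr/share/dict/web2 and the extra word file directly as the two sources.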

However, I soon found out that this spell checker has way too many words. While playing, I would make words that later I couldn't find in the Oxford dictionary. I find this really annoying and I don't think it's a good user experience. I guess spell checkers keep accumulating words, whatever the source, and however old they are. But I needed something more up to date.

Filtering words

I decided to check the words from the spell checker against Lexico. If a word is not found, Lexico sends you to an error page. But more interestingly, if you search for something like "bigger", it will redirect you to the root word, "big" in this case. The same goes for verbs.

So just by checking the headers, if I see the page is redirecting me to a definition, I don't need to look any further, since I know the word is in the dictionary. The only problem is that I would have to make at least 370K web header requests to Lexico. A request was taking 300ms on average, so it would take about 31 hours.
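Here is a rough sketch of that header check in Python. The "/definition/" path marker is a hypothetical stand-in, since the real URL layout isn't described here, and the sketch assumes every lookup is answered with a redirect (either to a definition or to an error page). It uses http.client because that module does not follow redirects on its own, so the Location header stays visible.

```python
import http.client
from urllib.parse import urlsplit

# Hypothetical marker: assumes definition pages live under this path.
DEFINITION_MARKER = "/definition/"

REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_in_dictionary(status, location):
    """Decide from the status code and Location header alone.

    True only when the response redirects to a definition page.
    """
    if status in REDIRECT_CODES and location:
        return DEFINITION_MARKER in urlsplit(location).path
    return False


def head(host, path):
    """Send one HEAD request and return (status, Location header or None)."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def estimated_hours(n_words, seconds_per_request):
    """Total time for sequential requests, in hours."""
    return n_words * seconds_per_request / 3600
```

With the numbers above, estimated_hours(370103, 0.3) comes out at a bit under 31 hours.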

I wasn't too worried about the time, but I was afraid they might think I was attacking their site or something with so many requests, although I was only sending one request at a time. It turned out the requests were coming back at an even slower pace, but I kept filtering the spell checker list...

... until I went on vacation for a few days. When I came back, they must have noticed that I was making too many requests, because they had started returning 429 errors. Those hadn't appeared the week before. So I guess in a sense I helped them improve their site? Should I be proud? 🙈

The 429 errors came with a "retry-after" field, set to 5 minutes. So I changed my script to retry after the amount of time requested. For the last 20% of the words, it took the script more than 5 days to check them. The final filtered list contains 165,589 words.
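The retry logic could look something like this. It's only a sketch: `fetch` is a hypothetical callable that returns a status code and a dict of headers with lowercase keys, and only the plain number-of-seconds form of "retry-after" is handled (anything else falls back to a default of 5 minutes).

```python
import time


def parse_retry_after(value, default):
    """Read retry-after as a number of seconds, or fall back to default."""
    try:
        return max(0, int(value))
    except (TypeError, ValueError):
        return default


def fetch_with_retry(fetch, word, sleep=time.sleep, default_wait=300, max_attempts=5):
    """Call fetch(word) until it stops answering 429.

    fetch is assumed to return (status, headers), where headers is a dict
    with lowercase keys. Waits for the requested time between attempts.
    """
    for _ in range(max_attempts):
        status, headers = fetch(word)
        if status != 429:
            return status, headers
        sleep(parse_retry_after(headers.get("retry-after"), default_wait))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {word}")
```

Passing `sleep` as a parameter makes it easy to try the function out without actually waiting 5 minutes.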

It was worth the effort. Now you can play the game knowing that you will be able to find the definition of every word you discover online.

Words that didn't make it

Exploring the list of more than 200K words that got discarded from the spell checker list is quite interesting. This whole thing started because I noticed the word "garrafa" in the spell checker list. That's a Spanish word. I know Merriam-Webster includes many Spanish words in its dictionary, from American influence, I suppose. But "garrafa" isn't there either. Perhaps it was there in the past and has since been removed, while the spell checker hasn't been updated.

Another funny word I found is "naricorn". It's not in the Oxford dictionary, but I found it in Merriam-Webster (naricorn). Interestingly, when you visit that definition, you get this message:

Love words?
You must — there are over 200,000 words in our free online dictionary, but you are looking for one that’s only in the Merriam-Webster Unabridged Dictionary.
Start your free trial today and get unlimited access to America's largest dictionary,

So it must be a rare word indeed! 😃

Some other interesting ones not in the dictionary are words like "peacefuller". I guess if you are already "full", you can be "fuller" than that 😃 Merriam-Webster does redirect you to the definition of "peaceful", but Oxford/Lexico doesn't.

There are also a lot of words beginning with "anti-" or "ultra-" that didn't make it either. Like "antiagglutinating", "antiauthoritarianism", "antibenzaldoxime", "antimeningococcic", "ultradolichocranial", "ultrafashionable", "ultrafederalist", or "ultraphotomicrograph". Again, some of these appear in Merriam-Webster. I'll let you guess which ones.
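If you want to explore the discarded words yourself, a set difference is enough. This is just a sketch with made-up function names, and it assumes both lists are already loaded as plain Python strings.

```python
def discarded_words(all_words, kept_words):
    """Words in the original list that did not survive the filtering."""
    return sorted(set(all_words) - set(kept_words))


def with_prefix(words, prefix):
    """Pick out the words starting with a given prefix, such as "anti"."""
    return [w for w in words if w.startswith(prefix)]
```

With the totals mentioned earlier, 370,103 minus 165,589 leaves 204,514 discarded words, and `with_prefix(discarded, "ultra")` would list the ones beginning with "ultra".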

More fun coming!

Stay tuned for the release of the game! I want to release it with plenty of stages. I'm targeting 60 for the release. And I'm also planning to add support for iOS 13 dark mode. In the meantime, try the Spanish version, Silabitas.

 