data:image/s3,"s3://crabby-images/90d4a/90d4a63bce72b0f31081018ca309addb6f653dfb" alt="Screenshots of Syllabits, Silabitas, and Sopilabitas"
Bugs in my word puzzle games
I have several word puzzle games in the App Store, and 3 of them started acting funny in recent versions of iOS: Silabitas, Sil·labetes, and Sopilabitas. The English one, Syllabits, still works fine.
What happened is that some words could not be made, as if they were not in the dictionary. For instance, there’s a stage in Silabitas where you have to construct the word “chaqueta” (jacket), but the game doesn’t recognize it, so you can’t clear the stage.
data:image/s3,"s3://crabby-images/3971d/3971d3cd3743f926387f288a723b653e860f1a40" alt="Screenshot of Silabitas chaqueta stage"
The bug appeared in recent versions of iOS because the way strings are sorted has changed. Read on for the details.
Alphabetical order across languages
In my word games I compare strings with the `compare` method that Foundation provides on Swift's `String` type, to check the order of letters and words. Strings are made of Unicode characters.
In Unicode, each character is assigned a unique code point. For example:
- “ñ” (LATIN SMALL LETTER N WITH TILDE) has a Unicode scalar value of U+00F1.
- “o” (LATIN SMALL LETTER O) has a Unicode scalar value of U+006F.
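In Swift, these values can be inspected through a string's `unicodeScalars` view. A small sketch (the helper name `scalarValues` is only for illustration):

```swift
// Returns the Unicode scalar value(s) of a string in the U+XXXX notation used above.
func scalarValues(_ s: String) -> [String] {
    s.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Pad to at least four hex digits, e.g. "F1" -> "00F1".
        return "U+" + String(repeating: "0", count: max(0, 4 - hex.count)) + hex
    }
}

for letter in ["n", "ñ", "o", "z"] {
    print(letter, scalarValues(letter))
}
// n ["U+006E"]
// ñ ["U+00F1"]
// o ["U+006F"]
// z ["U+007A"]
```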
When comparing strings or characters in Swift, the result depends on the collation algorithm being used. Collation determines how characters are ordered, taking into account locale-specific rules.
It seems that string comparison used to take the locale into account by default, but that has changed. When no locale is explicitly provided, the comparison now appears to use binary Unicode ordering instead of the rules of a specific language. That means that “ñ” sorts at the end of the alphabet, after “z”, instead of after “n” and before “o”, because its numerical value (U+00F1) is greater than that of “z” (U+007A).
To fix that, I simply needed to specify the locale when calling `compare`. See the unit test below:
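```swift
import Foundation
import XCTest

// The test class name is illustrative; the test body is the original one.
final class StringOrderTests: XCTestCase {
    func testSpecialSymbolsOrder() {
        // "z" has a Unicode scalar value of U+007A.
        // "ñ" has a Unicode scalar value of U+00F1.
        // Swift's `<` compares scalar values, so "ñ" sorts after "z".
        XCTAssertFalse("ñ" < "o")
        XCTAssertFalse("ñ" < "z")

        // To sort properly, pass the locale explicitly:
        let loc = Locale(identifier: "es")
        XCTAssertEqual(ComparisonResult.orderedAscending, "ñ".compare("o", locale: loc))
        XCTAssertEqual(ComparisonResult.orderedDescending, "ñ".compare("n", locale: loc))
    }
}
```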
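The same fix applies when sorting whole words. A minimal sketch that sorts a few Spanish words twice: once with Swift's `<` operator, which orders by Unicode scalar values, and once with `compare(_:options:range:locale:)` and the `es` locale:

```swift
import Foundation

let words = ["mula", "mozo", "mote", "moño", "mono"]

// Swift's `<` compares Unicode scalar values, so "ñ" (U+00F1) sorts after "z" (U+007A).
let byScalarValue = words.sorted()

// Foundation's locale-aware comparison follows Spanish collation: n < ñ < o.
let spanish = Locale(identifier: "es")
let bySpanishCollation = words.sorted {
    $0.compare($1, locale: spanish) == .orderedAscending
}

print(byScalarValue)      // ["mono", "mote", "mozo", "moño", "mula"]
print(bySpanishCollation) // ["mono", "moño", "mote", "mozo", "mula"]
```

Only the locale-aware order puts “moño” right after “mono”.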
Issues with binary search
The reason this change in iOS broke the game is that I sorted the dictionary in advance using the correct collation for each language, and then used binary search to check whether a word existed in it. Binary search only works if the comparison used during the search agrees with the order the list was sorted in. If the sorting rules change, the search can pick the wrong half at some step and fail to find words that are in the dictionary.
Imagine that you have the words “mono, moño, mote, mozo, mula”, sorted with Spanish collation. If you look for “moño” with binary search, you start in the middle, with “mote”. With the wrong sorting, the search believes that “moño” should appear after “mote” (because the Unicode value of “ñ” is greater than that of “t”), so it only looks at “mozo” and “mula”, and we fail to find the word in the dictionary.
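To make the failure concrete, here is a plain binary search that takes the ordering as a parameter (the function name `sortedContains` is hypothetical, not the game's code). The list is sorted with Spanish collation; searching it with Swift's scalar-value `<` goes right at “mote” and misses the word, as described above.

```swift
import Foundation

/// Classic binary search over a sorted array. `isOrderedBefore` must be the same
/// ordering the array was sorted with; otherwise the search can go the wrong way.
func sortedContains(_ sorted: [String], _ target: String,
                    isOrderedBefore: (String, String) -> Bool) -> Bool {
    var low = 0
    var high = sorted.count - 1
    while low <= high {
        let mid = low + (high - low) / 2
        if isOrderedBefore(target, sorted[mid]) {
            high = mid - 1   // target belongs in the left half
        } else if isOrderedBefore(sorted[mid], target) {
            low = mid + 1    // target belongs in the right half
        } else {
            return true      // neither is ordered before the other: found it
        }
    }
    return false
}

// Dictionary sorted in advance with Spanish collation.
let spanish = Locale(identifier: "es")
let dictionary = ["mono", "moño", "mote", "mozo", "mula"]

let spanishOrder: (String, String) -> Bool = {
    $0.compare($1, locale: spanish) == .orderedAscending
}

print(sortedContains(dictionary, "moño", isOrderedBefore: spanishOrder)) // true
print(sortedContains(dictionary, "moño", isOrderedBefore: <))            // false
```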
Diacritics and other symbols
The solution wasn’t as easy in Catalan. Older versions of iOS treated “ç” as a separate letter placed after “c” and before “d”. That works for single letters, but within words the Catalan collation rules treat the cedilla as a diacritic: “ç” is equivalent to “c”, and it only sorts after “c” to break the tie between two words that are otherwise identical. That means that the order should be “caca, caça, cacatua, cada”, and not “caca, cacatua, caça, cada”, as the older version of iOS had given me.
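A quick way to check this ordering is to sort the four words with the `ca` locale. This is a sketch; the expected output assumes the Catalan collation data shipped with the system (CLDR/ICU), which treats the cedilla as an accent-level difference rather than a separate letter.

```swift
import Foundation

let catalan = Locale(identifier: "ca")
let catalanWords = ["cada", "cacatua", "caça", "caca"]

// With the Catalan locale, "ç" only breaks ties with "c": "caça" sorts right
// after "caca" and before "cacatua", instead of in a slot between "c" and "d".
let sortedCatalan = catalanWords.sorted {
    $0.compare($1, locale: catalan) == .orderedAscending
}
print(sortedCatalan) // ["caca", "caça", "cacatua", "cada"]
```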
Another such diacritic symbol in Catalan is the interpunct between two “l”s (to make a longer “l” sound): “l·l”. With the correct collation, words sort like this: “filla, fil·lastomàtid, fillastra”.
However, the collation algorithm does not ignore other symbols, such as the hyphen or the apostrophe, which appear in some words of the dictionary, like “pèl-blanc” or “d'amagatons”. To ignore these, I remove them from the string before doing any comparisons.
Here’s some reference code in Swift to do the search with the correct sorting. The binary search function is from Stack Overflow.
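```swift
import Foundation

extension String {
    /// Removes the symbols that the collation does not ignore.
    func noSymbols() -> String {
        return self
            // e.g. d'amagat
            .replacingOccurrences(of: "'", with: "")
            // e.g. pèl-blanc
            .replacingOccurrences(of: "-", with: "")
    }
}

class WordDictionary {
    let words: [String]   // sorted in advance with Catalan collation
    let locale = Locale(identifier: "ca")

    init(words: [String]) {
        self.words = words
    }

    func getWordIndex(_ word: String) -> Int {
        // the diacritic insensitive search treats ñ as n, and ç with c,
        // but we want to avoid that, so use the locale and only remove symbols
        let ref = word.noSymbols()
        // `binarySearch` is the Stack Overflow helper (not shown); it is assumed to
        // return the first index whose element is not ordered before `ref`.
        let i = words.binarySearch({
            let w = $0.noSymbols()
            let order = w.compare(ref, options: [.caseInsensitive], locale: self.locale)
            return order == .orderedAscending
        })
        // The caller still has to check that words[i] actually matches `word`.
        return i
    }
}
```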
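The Stack Overflow helper is not reproduced here. Judging from how it is called, it takes a predicate that is true for the elements ordered before the target and returns the first index where the predicate is false, that is, a lower bound. A minimal stand-in under that assumption could look like this; because a lower bound is only an insertion point, the caller still has to confirm the match.

```swift
import Foundation

extension RandomAccessCollection {
    /// Hypothetical stand-in for the Stack Overflow `binarySearch` helper.
    /// Assumes the collection is partitioned: `predicate` is true for a prefix of
    /// elements (those ordered before the target) and false for the rest.
    /// Returns the first index where `predicate` is false (a lower bound).
    func binarySearch(_ predicate: (Element) -> Bool) -> Index {
        var low = startIndex
        var high = endIndex
        while low != high {
            let mid = index(low, offsetBy: distance(from: low, to: high) / 2)
            if predicate(self[mid]) {
                low = index(after: mid)   // mid is before the target: search right
            } else {
                high = mid                // mid is at or after the target: search left
            }
        }
        return low
    }
}

// Usage with a list already sorted by Catalan collation.
let catalan = Locale(identifier: "ca")
let sortedWords = ["caca", "caça", "cacatua", "cada"]
let target = "caça"

let i = sortedWords.binarySearch {
    $0.compare(target, locale: catalan) == .orderedAscending
}
// The lower bound is only an insertion point, so confirm the match.
let found = i < sortedWords.endIndex &&
    sortedWords[i].compare(target, locale: catalan) == .orderedSame
print(i, found) // 1 true
```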
Allow bad spelling
The `compare` function also has an option to ignore diacritics (`.diacriticInsensitive`). When comparing words, I can’t use that option because there are certain differences I don’t want to ignore. In particular, “ç” is a letter the player can type in the game, so “caca” (poo) is different from “caça” (hunt). But the player can’t type accents or interpuncts, so in those cases I do want to ignore the differences.
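The sketch below shows why `.diacriticInsensitive` is too coarse for Catalan: it makes accents optional, but it also folds “ç” into “c”.

```swift
import Foundation

let catalan = Locale(identifier: "ca")

// Ignoring diacritics makes accents optional, which is what the game wants...
let accentIgnored = "cafe".compare("cafè", options: [.diacriticInsensitive], locale: catalan)
// ...but it also folds "ç" into "c", so "caça" (hunt) would match "caca" (poo).
let cedillaIgnored = "caca".compare("caça", options: [.diacriticInsensitive], locale: catalan)
// Without the option, the two words stay distinct.
let cedillaKept = "caca".compare("caça", locale: catalan)

print(accentIgnored == .orderedSame)  // true
print(cedillaIgnored == .orderedSame) // true
print(cedillaKept == .orderedSame)    // false
```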
So instead of relying on the `compare` function, I convert every string to a “potentially badly written” one, from which I remove every accent and symbol but keep the “ç” (in the case of Catalan):
```swift
import Foundation

extension String {
    /// Lowercases the string and removes the accents the player cannot type.
    /// "ç" is deliberately left alone.
    func unaccented() -> String {
        return self.lowercased()
            .replacingOccurrences(of: "à", with: "a")
            .replacingOccurrences(of: "ä", with: "a")
            .replacingOccurrences(of: "è", with: "e")
            .replacingOccurrences(of: "é", with: "e")
            .replacingOccurrences(of: "í", with: "i")
            .replacingOccurrences(of: "ï", with: "i")
            .replacingOccurrences(of: "î", with: "i") // only maître...
            .replacingOccurrences(of: "ò", with: "o")
            .replacingOccurrences(of: "ó", with: "o")
            .replacingOccurrences(of: "ö", with: "o")
            .replacingOccurrences(of: "ú", with: "u")
            .replacingOccurrences(of: "ü", with: "u")
    }

    /// The form the player can actually type: no accents, no symbols, no interpunct.
    /// `noSymbols()` is the extension defined in the previous snippet.
    func badWritten() -> String {
        return self.unaccented().noSymbols()
            .replacingOccurrences(of: "·", with: "")
    }
}
```
If I find a match in the dictionary, I return the entry from the dictionary so the player sees the correct spelling.
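Listing every accented letter works, but an alternative (not what the game does) is to decompose each string into base letters plus combining marks, drop every mark except the combining cedilla U+0327, and recompose. The sketch below combines that with a lookup that returns the dictionary's own spelling; for brevity it uses a hash map keyed by the normalized form instead of the sorted array and binary search described earlier. The names `spellingKey` and `SpellingTolerantLookup` are hypothetical.

```swift
import Foundation

/// Hypothetical alternative to `badWritten()`: decompose each character, drop every
/// combining mark except the cedilla (U+0327), strip the symbols the player cannot
/// type, then recompose so "c" + cedilla becomes "ç" again.
func spellingKey(_ word: String) -> String {
    let cedilla: Unicode.Scalar = "\u{0327}"
    let symbols: Set<Unicode.Scalar> = ["'", "-", "\u{00B7}"] // apostrophe, hyphen, interpunct
    var kept = String.UnicodeScalarView()
    for scalar in word.lowercased().decomposedStringWithCanonicalMapping.unicodeScalars {
        let isMark = scalar.properties.generalCategory == .nonspacingMark
        if (isMark && scalar != cedilla) || symbols.contains(scalar) { continue }
        kept.append(scalar)
    }
    return String(kept).precomposedStringWithCanonicalMapping
}

/// Maps each spelling key to the dictionary's own spelling, so a match can be
/// shown to the player with accents and symbols restored.
struct SpellingTolerantLookup {
    private var entries: [String: String] = [:]

    init(words: [String]) {
        for word in words { entries[spellingKey(word)] = word }
    }

    func correctSpelling(of input: String) -> String? {
        entries[spellingKey(input)]
    }
}

let lookup = SpellingTolerantLookup(words: ["caça", "caca", "fil·lastomàtid", "pèl-blanc", "d'amagatons"])
print(lookup.correctSpelling(of: "pelblanc") ?? "not found")      // pèl-blanc
print(lookup.correctSpelling(of: "fillastomatid") ?? "not found") // fil·lastomàtid
print(lookup.correctSpelling(of: "caça") ?? "not found")          // caça
```

Because the key drops all combining marks except the cedilla, letters such as the “î” in “maître” are handled without listing them one by one.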
Conclusion
Nothing that looks simple is really that simple. The devil is always in the details. It turns out that alphabetical order is not universal, and system functions change over time. Make sure you write unit tests for the system functions that your app's core functionality depends on. In this instance, the unit tests I already had in place saved me a lot of time.
And strings will always be the last boss in computer science! 😂