foone , English
@foone@digipres.club

The way sentences containing the German character ß get longer when uppercased was specially designed to create memory problems in C programs doing string handling
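A minimal sketch of the length change, written in Python for illustration only (the post is about C). It assumes Python's `str.upper()`, which applies Unicode's full case mapping and turns ß into SS; the helper name `upper_growth` is made up for this example.

```python
def upper_growth(text: str) -> int:
    """Return how many code points the text gains when uppercased."""
    return len(text.upper()) - len(text)


word = "straße"
print(word.upper())        # STRASSE
print(upper_growth(word))  # 1
```

A fixed-size buffer sized for the lowercase string would therefore be one element too small for the uppercase result.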

henrikjernevad ,
@henrikjernevad@mastodon.social

@foone That was actually the cause of a bug I spent way too long finding. 😂 I even wrote a blog post about it.

https://henko.net/blog/i-can-be-wrong/

_nd_ ,
@_nd_@fnordon.de

@foone I was part of an upgrade project where parts of a program written in C were moved to Java. The DB layer of the program used two VARCHAR(n) columns for text: one as-is, and one in upper case for indices, both with the same n. The client truncated the string.
The upgrade was a long project and was tested extensively, but on the day of the go-live the DB connection suddenly hung.
Reason: Java did The Right Thing and converted ß → SS, and the DB interface didn't deal well with strings that were too long.
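A rough sketch of how such a failure can arise, in Python rather than Java. The column width `N`, the function names, and the exact order of operations are assumptions for illustration; the post does not describe the real code.

```python
N = 6  # hypothetical VARCHAR(n) width shared by both columns


def truncate_then_upper(text: str, n: int):
    """Truncate first, then uppercase: the second value can exceed n."""
    as_is = text[:n]
    return as_is, as_is.upper()


def upper_then_truncate(text: str, n: int):
    """Uppercase first, then truncate: both values fit in n."""
    return text[:n], text.upper()[:n]


as_is, upper = truncate_then_upper("Straße", N)
print(len(as_is), len(upper))  # 6 7

as_is, upper = upper_then_truncate("Straße", N)
print(len(as_is), len(upper))  # 6 6
```

The second variant keeps both values within the column width, at the cost of an index value ("STRASS") that no longer corresponds exactly to the stored text.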

FantasmitaAsex ,
@FantasmitaAsex@todon.eu

@foone *doing string handling badly and/or assuming that everything is ASCII or Latin-1

technocidal ,
@technocidal@mastodon.social

@foone And now that we’ve added an uppercase ẞ all the primitive search-and-replace tactics no longer work 😀
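One way to sidestep hand-written ß/SS/ẞ replacement rules is Unicode case folding. The sketch below assumes Python's `str.casefold()`, which folds both ß and ẞ to "ss"; the helper name `same_ignoring_case` is made up for this example.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings using Unicode case folding instead of upper()/lower()."""
    return a.casefold() == b.casefold()


print(same_ignoring_case("Straße", "STRASSE"))  # True
print(same_ignoring_case("Straße", "STRAẞE"))   # True
print("STRAẞE".lower() == "straße")             # True
print("straße".upper() == "STRAẞE")             # False, upper() still yields STRASSE
```

The last line shows the asymmetry: ẞ lowercases to ß, but ß does not uppercase to ẞ under the default mapping.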

kawa ,
@kawa@mas.to

@foone In UTF-8 they'd remain the same length :3

krono ,
@krono@toot.berlin

@foone In some official capacities where things have to be uppercased (a holdover from typewriter days), the "ß" has to be transformed into "(SS)", so as to differentiate "Assman" -> "ASSMAN" from "Aßman" -> "A(SS)MAN" (and yes, it is hilarious for English readers).
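A small sketch of that convention, assuming it amounts to replacing every ß with "(SS)" before uppercasing. The function name `official_upper` is hypothetical, and real-world rules may have more cases than this.

```python
def official_upper(name: str) -> str:
    """Uppercase a name, marking each original ß as "(SS)" so it stays distinguishable."""
    return name.replace("ß", "(SS)").upper()


print(official_upper("Assman"))  # ASSMAN
print(official_upper("Aßman"))   # A(SS)MAN
```

Each ß now adds three characters instead of one, so the length problem from the original post gets worse, not better.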
It is its own pumping lemma of sorts.

larsmb ,
@larsmb@mastodon.online

@foone Thankfully UTF-8 provides this service for many languages now, so us Germans are no longer special

JustJimWillDo ,
@JustJimWillDo@mastodon.online

@foone

I doubt this, but I love that it is at least a possibility.

acb ,
@acb@mastodon.social

@foone They’ve finally added a capital eszett (ẞ) to Unicode, though typographers are still debating what it should look like: http://cinga.ch/eszett/

slyecho ,
@slyecho@mdon.ee

@foone Both are 2 bytes in UTF-8: c39f or 7373.
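The byte counts can be checked with a short Python sketch. Note that 7373 is lowercase "ss"; uppercase "SS" is 5353, also two bytes. The Latin-1 column is added here for comparison, since in a single-byte encoding ß takes one byte and the uppercase form does grow. The helper name `hex_bytes` is made up for this example.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as a hex string."""
    return text.encode(encoding).hex()


for text in ("ß", "ss", "SS"):
    print(text, hex_bytes(text, "utf-8"), hex_bytes(text, "latin-1"))
# ß c39f df
# ss 7373 7373
# SS 5353 5353

# Whole words: same byte length in UTF-8, one byte longer in Latin-1.
print(len("straße".encode("utf-8")), len("STRASSE".encode("utf-8")))      # 7 7
print(len("straße".encode("latin-1")), len("STRASSE".encode("latin-1")))  # 6 7
```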

tehabe ,
@tehabe@norden.social

@foone at least there is a ẞ now

Cryptomon ,
@Cryptomon@bunt.social

@foone there is an uppercase ß! (But most Germans don't know and it's on no keyboard...)
I had to copy it from Wikipedia: ẞ

fuzuki ,
@fuzuki@mas.to

@foone Well why wouldn't "Ss" be longer than "ss" huh? Makes complete sense doesn't it?

cato ,
@cato@chaos.social

@foone I was gonna say "just use ẞ" but depending on encoding, that might also add another byte or two I guess? Then again, is this really the only case where the uppercase variant of a character would require more bytes than the lowercase variant?
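Both questions can be explored with a brute-force scan in Python. This is a sketch that assumes UTF-8 and Python's built-in `str.upper()`; the function names are made up, and the exact set of results depends on the Unicode version shipped with the interpreter.

```python
import sys


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies in UTF-8."""
    return len(text.encode("utf-8"))


def chars_growing_when_uppercased() -> list:
    """Collect characters whose uppercase form needs more UTF-8 bytes."""
    growers = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates cannot be encoded as UTF-8
            continue
        ch = chr(cp)
        if utf8_len(ch.upper()) > utf8_len(ch):
            growers.append(ch)
    return growers


growers = chars_growing_when_uppercased()
print(utf8_len("ß"), utf8_len("ẞ"))    # 2 3
print("ŉ" in growers, "ß" in growers)  # True False
```

So ẞ costs one byte more than ß in UTF-8, and ß is far from the only character whose uppercase form gets longer. In UTF-8 specifically, ß itself does not even qualify by byte count, while characters such as ŉ (which uppercases to ʼN) do.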

stfn ,
@stfn@fosstodon.org

@foone Probably not that closely related, but I remember that in the early days of mobile phones, when every text message was expensive, there was outrage that Polish diacritics (ąęźćńż) were counted as more than one character within the 140-character limit.

humanhorseshoes ,
@humanhorseshoes@mastodon.world

@foone just use ss
