The way sentences containing the German character ß get longer when uppercased... - Random

foone, 2 months ago, English

The way sentences containing the German character ß get longer when uppercased was specially designed to create memory problems in C programs doing string handling

henrikjernevad, 2 months ago

@foone That was actually the cause of a bug I spent way too long finding. 😂 I even wrote a blog post about it.

https://henko.net/blog/i-can-be-wrong/

_nd_, 2 months ago

@foone I was part of an upgrade project where program parts written in C were moved to Java. The DB layer of the program used two VARCHAR(n) columns for text: one as-is, and one in upper case for indices, both with the same n. The client truncated the string.
The upgrade was a long project and tested extensively, but on the day of the go-live the DB connection suddenly hung.
Reason: Java did The Right Thing and converted ß → SS, and the DB interface didn't deal well with too-long strings.
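
The failure mode in this story can be reproduced in a few lines. What follows is a minimal sketch in Python, which, like Java, applies the full Unicode case mapping when uppercasing. The column width `N` and the helper `fits_column` are hypothetical stand-ins for the VARCHAR(n) limit and are not taken from the original system.

```python
# Hypothetical column width, standing in for the VARCHAR(n) limit.
N = 6


def fits_column(text: str, width: int = N) -> bool:
    """Return True if text fits into a column of `width` characters."""
    return len(text) <= width


original = "Straße"             # exactly fills the hypothetical column
uppercased = original.upper()   # full case mapping turns ß into SS

print(original, len(original), fits_column(original))
print(uppercased, len(uppercased), fits_column(uppercased))
```

A value that was truncated to fit the as-is column can therefore still overflow the uppercase column, which is why truncating only on the client side was not enough.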

FantasmitaAsex, 2 months ago

@foone *doing string handling badly and/or assuming that everything is ASCII or Latin1

technocidal, 2 months ago

@foone And now that we’ve added an uppercase ẞ all the primitive search-and-replace tactics no longer work 😀

kawa, 2 months ago

@foone In UTF-8 they'd remain the same length :3

krono, 2 months ago

@foone In some official capacities, where things have to be uppercased (from typewriter days), the "ß" has to be transformed into "(SS)" (so as to differentiate "Assman" -> "ASSMAN" and "Aßman" -> "A(SS)MAN", and yes, it is hilarious for English readers).
It is its own pumping lemma of sorts.

larsmb, 2 months ago

@foone Thankfully UTF-8 provides this service for many languages now, us Germans are no longer special

JustJimWillDo, 2 months ago

@foone

I doubt this, but I love that it is at least a possibility.

acb, 2 months ago

@foone They’ve finally added an uppercase eszett to Unicode, though typographers are still debating what it should look like: http://cinga.ch/eszett/

slyecho, 2 months ago

@foone Both are 2 bytes in UTF-8: c39f or 7373.
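
The byte counts are easy to check. Note that 7373 is the lowercase "ss"; the uppercase "SS" encodes as 5353, which is still two bytes, so the UTF-8 byte length stays the same even though the character count grows. A small sketch in Python, where `utf8_hex` is a hypothetical helper written for this illustration:

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as a hex string."""
    return text.encode("utf-8").hex()


# Compare character count and UTF-8 byte count for each variant.
for text in ("ß", "ss", "SS", "ẞ"):
    encoded = text.encode("utf-8")
    print(f"{text!r}: {len(text)} character(s), "
          f"{len(encoded)} byte(s), hex {utf8_hex(text)}")
```

The capital ẞ (U+1E9E) is the odd one out here: it is a single character but needs three bytes in UTF-8.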

tehabe, 2 months ago

@foone at least there is a ẞ now

Cryptomon, 2 months ago

@foone there is an uppercase ß! (But most Germans don't know and it's on no keyboard...)
I had to copy it from Wikipedia: ẞ

fuzuki, 2 months ago

@foone Well why wouldn't "Ss" be longer than "ss" huh? Makes complete sense doesn't it?

cato, 2 months ago

@foone I was gonna say "just use ẞ" but depending on encoding, that might also add another byte or two I guess? Then again, is this really the only case where the uppercase variant of a character would require more bytes than the lowercase variant?
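
Both questions can be explored empirically. The sketch below, in Python, scans every code point and collects those whose uppercase form takes more UTF-8 bytes than the original. It relies on Python's built-in `str.upper()`, so the result depends on the Unicode version shipped with the interpreter; `grows_when_uppercased` is a hypothetical helper name.

```python
import sys


def grows_when_uppercased(ch: str) -> bool:
    """True if upper-casing ch yields more UTF-8 bytes than ch itself."""
    return len(ch.upper().encode("utf-8")) > len(ch.encode("utf-8"))


growing = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    # Surrogate code points cannot be encoded as UTF-8, so skip them.
    if not 0xD800 <= cp <= 0xDFFF
    and grows_when_uppercased(chr(cp))
]

print(len(growing), "code points need more UTF-8 bytes when uppercased")
print(growing[:10])
```

So ß is not the only case. Examples that grow in UTF-8 include ŉ (which uppercases to ʼN) and ɐ (which uppercases to Ɐ), whereas ß → SS keeps the same byte count there. Using ẞ instead would indeed cost one more byte than ß in UTF-8.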

stfn, 2 months ago

@foone Probably not related that much, but I remember that in the early days of mobile phones, when every text message was expensive, there was an outrage that Polish diacritics (ąęźćńż) were counted as more than one character within the 140-character limit.

humanhorseshoes, 2 months ago

@foone just use ss
