ewx: (geek)
[personal profile] ewx

http://www.greenend.org.uk/rjk/junk/cc.html in Linux Firefox:

Windows Firefox is not really any better:

IE7 gets it right:

Konqueror doesn't even try:

I don't have Safari to hand right now.

I guess for best results send precomposed characters to the browser, and put up with bad rendering where none are available l-(

(These are the Firefox and Konqueror from Debian etch.)

(no subject)

Date: 2007-11-28 02:01 pm (UTC)
From: [identity profile] sbp.livejournal.com
On Tiger too.

(no subject)

Date: 2007-11-28 01:44 pm (UTC)
From: [identity profile] cartesiandaemon.livejournal.com
Is there a Unicode "make string uppercase and leave accent marks off upper case letters" function for when that *is* what you want?

(no subject)

Date: 2007-11-28 01:58 pm (UTC)
ext_8103: (geek)
From: [identity profile] ewx.livejournal.com
There are standard rules for upper-casing and you can dispose of accents by decomposing and then removing combining characters (DisOrder does this as part of its search term normalization). Obviously this will change the meaning of some words.
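The decompose-then-strip approach described above can be sketched in a few lines. This is only an illustration using Python's standard unicodedata module, not DisOrder's actual code; the function name strip_accents is made up for the example.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFD, then drop every combining character."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero combining class for marks
    # such as U+0301 COMBINING ACUTE ACCENT, and 0 for base characters.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


# The accented French word collapses to a different English word,
# which is the "change of meaning" caveat in practice.
print(strip_accents("r\u00e9sum\u00e9"))  # prints "resume"
```

Upper-casing the result with str.upper() would then give accent-free capitals, with the language caveats raised elsewhere in the thread.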

(no subject)

Date: 2007-11-28 02:19 pm (UTC)
From: [identity profile] bellinghman.livejournal.com
And indeed, whether this should be done at all is a language issue - not all languages uppercase the same letters in the same way.

(Not to mention the nightmare that is the German Eszett (http://en.wikipedia.org/wiki/Esszet) ß)
Edited Date: 2007-11-28 02:23 pm (UTC)

(no subject)

Date: 2007-11-28 02:34 pm (UTC)
sparrowsion: photo of male house sparrow (string-handling kitten)
From: [personal profile] sparrowsion
E.g., most languages using a Latin alphabet uppercase "i" to "I". Turkish uppercases it to "İ".
Edited Date: 2007-11-28 02:35 pm (UTC)

(no subject)

Date: 2007-11-28 02:45 pm (UTC)
ext_8103: (geek)
From: [identity profile] ewx.livejournal.com

DisOrder's users are largely native English speakers so will almost certainly prefer accents to be disregarded when comparing the search terms they typed with the track names, even if the words themselves aren't English. For one thing many of them will be unsure how to type accented characters in the first place! The worst effect is a few false positives in the search results (since there's currently no way of saying "and not this word").

Germans should be fine in any case; the standard rules case-fold ß to ss. Turks who haven't got used to the rest of the world's treatment of I will have to regenerate DisOrder's mapping tables with the T mappings included; there's even a comment in the Perl script at the place they need to change.
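To make the difference concrete, here is a rough sketch of default case folding next to a Turkic variant. It relies on Python's built-in str.casefold(), which applies the default Unicode folding without the T mappings; fold_turkic is a hypothetical helper written for this example and has nothing to do with DisOrder's generated tables.

```python
def fold_default(text: str) -> str:
    """Default (language-neutral) full case folding."""
    return text.casefold()


def fold_turkic(text: str) -> str:
    """Hypothetical Turkic variant: apply the two T mappings first."""
    # T mappings: U+0130 (capital I with dot above) -> "i",
    # and "I" -> U+0131 (dotless small i).
    return text.replace("\u0130", "i").replace("I", "\u0131").casefold()


print(fold_default("Stra\u00dfe") == "strasse")  # True: sharp s folds to "ss"
print(fold_default("DIYARBAKIR"))  # dotted i throughout
print(fold_turkic("DIYARBAKIR"))   # dotless i where the capital had no dot
```

Under the default rules the two Turkish letters I and İ are not kept distinct the way a Turkish reader would expect, which is why the T mappings have to be opted into.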

(no subject)

Date: 2007-11-28 03:47 pm (UTC)
From: [identity profile] aardvark179.livejournal.com
I hate the typography related plotting bugs we get at work, especially from customers who work in several countries which may have differing rules.

It makes the mixed Japanese/English type bugs seem nice and easy in comparison.

(no subject)

Date: 2007-11-28 04:15 pm (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
Further to [livejournal.com profile] ewx's point above: another important thing about DisOrder in particular is that it's searching a music collection, which is quite likely to contain names such as Motörhead and Queensrÿche and Mötley Crüe and even (crosses fingers) Spin̈al Tap, in which the diacritics are purely ornamental and nobody is going to want to have to remember where they appear in order to search for the band...

(no subject)

Date: 2007-11-28 05:01 pm (UTC)
From: [identity profile] aardvark179.livejournal.com
Searching and sorting are different but related problems: you should find the results even if the user doesn't type the accents, and accents should normally be ignored for sorting purposes as well.

(no subject)

Date: 2007-11-28 05:06 pm (UTC)
ext_8103: (geek)
From: [identity profile] ewx.livejournal.com

DisOrder has two contexts in which it orders Unicode strings. The first is display, where, as you say, the same kind of normalization as is done for search is probably the most appropriate. But it also uses track names as database keys, with an ordering that accounts for their filename structure; in that case they are normalized to NFC, and accents and letter case are strictly preserved.

As it happens the use of NFC there, and also the conversion of all user commands to NFC, propagate through to the display of bits of track names, meaning that user interfaces will indeed get precomposed characters where available, minimizing the effect of the browser bug I started with l-)
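As a small illustration of what "precomposed where available" means, the sketch below normalizes two decomposed sequences to NFC with Python's unicodedata module. The helper name to_nfc is an assumption for the example; DisOrder's own normalization code is not shown here.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Normalize to NFC: compose base + combining mark where Unicode allows."""
    return unicodedata.normalize("NFC", text)


# "O" + U+0308 COMBINING DIAERESIS has a precomposed form, U+00D6.
composed = to_nfc("O\u0308")
print(len(composed), composed == "\u00d6")  # 1 True

# "n" + U+0308 has no precomposed form, so NFC leaves two code points
# and the browser still has to position the combining mark itself.
still_combining = to_nfc("n\u0308")
print(len(still_combining))  # 2
```

So NFC sidesteps the rendering problem for the common accented letters, but not for sequences that have no precomposed equivalent.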

(no subject)

Date: 2007-11-28 05:36 pm (UTC)
pm215: (Default)
From: [personal profile] pm215
How does it cope with Japanese track names, by the way?

(no subject)

Date: 2007-11-28 05:53 pm (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
I've not tried any, I'm afraid...

(no subject)

Date: 2007-11-28 06:02 pm (UTC)
pm215: (Default)
From: [personal profile] pm215
I can provide you with some examples if you like :-)

(no subject)

Date: 2007-11-28 08:46 pm (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
All I really need is artist/album/title, and some hints as to user expectation of processing of these names, rather than actual content. Then I can stuff them into my test scripts...

(no subject)

Date: 2007-11-29 11:16 am (UTC)
From: [identity profile] senji.livejournal.com
FF 2.0.0.10 on Windows actually appears to get Spin̈al Tap right there, even down to being able to cut-and-paste it. (Hmm, and doesn't the caret seem strange when moving over a combining character, and not moving?)

(no subject)

Date: 2007-11-30 04:47 pm (UTC)
From: [identity profile] cartesiandaemon.livejournal.com
I think that's what I was thinking of -- if you input the strings "[capital o][combining diaeresis]" or "[capital o with diaeresis]" the browser ought to display them, but if you enter a track name in lower- or non-cased, and specify it should be displayed in upper case, you need a language-appropriate algorithm.

Not that throwing away diaeresised-Os is sensible, it just happens to correspond with something languages do do (for much the same reason).

(no subject)

Date: 2007-11-28 01:56 pm (UTC)
From: [identity profile] mobbsy.livejournal.com
Opera on Linux loses too, in a fairly similar way to your Firefox on Linux example.

(no subject)

Date: 2007-11-28 01:59 pm (UTC)
ext_22879: (Default)
From: [identity profile] nja.livejournal.com
Firefox in Windows Vista does it properly, Firefox in XP is dodgy:

Vista:
Image IE7
Image Firefox

XP:
Image IE7
Image Firefox

(no subject)

Date: 2007-11-28 04:20 pm (UTC)
fanf: (weather)
From: [personal profile] fanf
Why are the Windows diaereses offset to the right?

(no subject)

Date: 2007-11-28 02:00 pm (UTC)
From: [identity profile] sbp.livejournal.com
Camino is way off - the accents are a character space to the right.

(no subject)

Date: 2007-11-28 02:03 pm (UTC)
From: [identity profile] burkesworks.livejournal.com
FF (2.0.0.10) on Tiger (10.4.11) produces the following:

Image

(no subject)

Date: 2007-11-28 02:20 pm (UTC)
From: [identity profile] bellinghman.livejournal.com
Total failure there - just rendering the combining character as a separate (and not in typeface) character.

(no subject)

Date: 2007-11-28 03:15 pm (UTC)
From: [identity profile] timeplease.livejournal.com
Firefox on Ubuntu 7.10 gets it right.

(no subject)

Date: 2007-11-28 03:30 pm (UTC)
From: [identity profile] keirf.livejournal.com
Look at this, personally I start out okay if somewhat shaky, but then I get bored and render the last one as a small green alien instead.

Image

(no subject)

Date: 2007-11-28 03:55 pm (UTC)
From: [identity profile] nmg.livejournal.com
*snork*

(no subject)

Date: 2007-11-28 03:57 pm (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
I think you win.

(no subject)

Date: 2007-11-28 08:51 pm (UTC)
From: [identity profile] lionsphil.livejournal.com
Nice one. :)

(no subject)

Date: 2007-11-28 05:00 pm (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
An up-to-date Firefox (2.0.0.10) on my creaky old RH9 box at work manages this:

i.e. the combining characters exist in the fonts but the renderer doesn't appear to have noticed that they're supposed to be combining.

Mind you, this system has a severely funted font setup: the same Firefox also renders U+03C0 GREEK SMALL LETTER PI as if it were the symbol for the universal set.

(no subject)

Date: 2007-11-28 05:15 pm (UTC)
sparrowsion: (cat5)
From: [personal profile] sparrowsion
OK, this is really strange. Firefox/Iceweasel on my Debian etch manages much better:

(no subject)

Date: 2007-11-28 05:33 pm (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
Looks like a different font (and perhaps thus different rendering code?)

(no subject)

Date: 2007-11-29 10:59 am (UTC)
sparrowsion: (cat5)
From: [personal profile] sparrowsion
"serif", whatever the hell that really is.

(no subject)

Date: 2007-11-28 05:36 pm (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
That's particularly curious because the capital O-diaeresis looks better when not precombined!

(no subject)

Date: 2007-11-28 08:57 pm (UTC)
From: [identity profile] lionsphil.livejournal.com
Opera 9.24 on OS X 10.4 is fine (although the umlaut on the `O' is right against the table row border) and indistinguishable between the precombined/combining forms. Likewise Opera 9.5 Alpha, and Safari 2.0.4.

Firefox 2.0.0.8 puts the umlauts to one side (the "doesn't know that they should be combined" case). Amaya 9.54 puts the combining lowercase `o' umlaut a pixel to the right of the precombined one, and the capital `O' umlaut at the same height as the lowercase `o' one, so it's vertically just below the inner top apex.

(I'd take screenshots, but my brain is desperately trying to shut down right now.)
