ewx: (geek)
[personal profile] ewx

http://www.greenend.org.uk/rjk/junk/cc.html in Linux Firefox:

Windows Firefox is not really any better:

IE7 gets it right:

Konqueror doesn't even try:

I don't have Safari to hand right now.

I guess for best results send precomposed characters to the browser, and put up with bad rendering where none are available l-(
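For anyone wanting to check which strings the bug will still bite, here is a minimal Python sketch using the standard `unicodedata` module (the `to_nfc` name is just an illustrative helper, not anything from DisOrder). NFC composition only helps where Unicode actually defines a precomposed character.

```python
import unicodedata

def to_nfc(s: str) -> str:
    # NFC composes base + combining-mark sequences into a single precomposed
    # code point wherever Unicode defines one, and leaves them alone otherwise.
    return unicodedata.normalize("NFC", s)

o_umlaut = to_nfc("O\u0308")   # capital O + U+0308 COMBINING DIAERESIS
n_umlaut = to_nfc("n\u0308")   # n + U+0308; Unicode has no precomposed n-diaeresis

print(len(o_umlaut), hex(ord(o_umlaut)))  # 1 0xd6
print(len(n_umlaut))                       # 2
```

So "O" plus a combining diaeresis can be sent as a single U+00D6, but an "n" with a diaeresis always reaches the browser as two code points.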

(These are the Firefox and Konqueror from Debian etch.)

(no subject)

Date: 2007-11-28 01:44 pm (UTC)
From: [identity profile] cartesiandaemon.livejournal.com
Is there a Unicode "make string uppercase and leave accent marks off upper case letters" function for when that *is* what you want?

(no subject)

Date: 2007-11-28 01:58 pm (UTC)
From: [identity profile] ewx.livejournal.com
There are standard rules for upper-casing and you can dispose of accents by decomposing and then removing combining characters (DisOrder does this as part of its search term normalization). Obviously this will change the meaning of some words.
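A minimal sketch of the decompose-then-strip approach in Python, assuming the standard `unicodedata` module. This is not DisOrder's actual code (which is C with its own generated tables); `strip_accents` and `upper_without_accents` are hypothetical names.

```python
import unicodedata

def strip_accents(s: str) -> str:
    # Decompose to NFD so accented letters become base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop every code point with a nonzero canonical combining class,
    # which covers accents such as U+0308 COMBINING DIAERESIS.
    return "".join(c for c in decomposed if unicodedata.combining(c) == 0)

def upper_without_accents(s: str) -> str:
    # str.upper() applies the default, language-independent Unicode case mappings.
    return strip_accents(s).upper()

print(upper_without_accents("Motörhead"))  # MOTORHEAD
```

Note that the combining-class test also removes marks that are not really "accents" in some scripts (for example the Devanagari virama), which is part of why this changes the meaning of some words.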

(no subject)

Date: 2007-11-28 02:19 pm (UTC)
From: [identity profile] bellinghman.livejournal.com
And indeed, whether this should be done at all is a language issue: not all languages uppercase the same letters in the same way.

(Not to mention the nightmare that is the German Eszett (http://en.wikipedia.org/wiki/Esszet), ß.)
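A quick Python illustration of why ß is awkward: under the default Unicode case mappings, which Python's `str.upper()` and `str.lower()` follow, uppercasing changes the length of the string and cannot be undone.

```python
word = "Straße"
upper = word.upper()

print(upper)                  # STRASSE
print(len(word), len(upper))  # 6 7

# Lowercasing again does not bring the ß back, so the round trip is lossy.
print(upper.lower())          # strasse
```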
Edited Date: 2007-11-28 02:23 pm (UTC)

(no subject)

Date: 2007-11-28 02:34 pm (UTC)
From: [personal profile] sparrowsion
E.g., most languages using a Latin alphabet uppercase "i" to "I", but Turkish uppercases it to "İ".
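Python's `str.upper()` is locale-independent, so it always applies the default rule. A minimal sketch of a Turkish-aware variant; `turkish_upper` is a hypothetical helper, and it only handles the i/İ case, not every Turkic tailoring.

```python
def turkish_upper(s: str) -> str:
    # Map dotted lowercase i to U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
    # by hand, then let the default rules handle everything else.
    # Dotless ı (U+0131) already uppercases to plain I under the default rules.
    return s.replace("i", "\u0130").upper()

print("istanbul".upper())         # ISTANBUL  (default rules)
print(turkish_upper("istanbul"))  # İSTANBUL  (Turkish rule for i)
```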
Edited Date: 2007-11-28 02:35 pm (UTC)

(no subject)

Date: 2007-11-28 02:45 pm (UTC)
From: [identity profile] ewx.livejournal.com

DisOrder's users are largely native English speakers, so they will almost certainly prefer accents to be disregarded when comparing the search terms they type with the track names, even if the words themselves aren't English. For one thing, many of them will be unsure how to type accented characters in the first place! The worst effect is a few false positives in the search results (since there's currently no way of saying "and not this word").

Germans should be fine in any case; the standard rules case-fold ß to ss. Turks who haven't got used to the rest of the world's treatment of I will have to regenerate DisOrder's mapping tables with the T (Turkic) mappings included; there's even a comment in the Perl script at the place they need to change.
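The same idea as a Python sketch, assuming `str.casefold()` (full Unicode case folding using the default mappings only, so no Turkic T entries) followed by accent stripping. The `search_key` and `matches` names and the word-splitting rule are hypothetical; DisOrder's real normalization is implemented in C.

```python
import unicodedata

def search_key(s: str) -> str:
    # Full case folding first, so "ß" becomes "ss" and "I" becomes "i".
    folded = s.casefold()
    # Decompose and drop combining marks so accents are disregarded.
    decomposed = unicodedata.normalize("NFD", folded)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def matches(term: str, name: str) -> bool:
    # A term matches if its normalized form equals one of the normalized words.
    return search_key(term) in search_key(name).split()

print(matches("motorhead", "Motörhead"))   # True
print(matches("STRASSE", "Die Straße"))    # True
```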

(no subject)

Date: 2007-11-28 03:47 pm (UTC)
From: [identity profile] aardvark179.livejournal.com
I hate the typography-related plotting bugs we get at work, especially from customers who work in several countries that may have differing rules.

It makes the mixed Japanese/English type bugs seem nice and easy in comparison.

(no subject)

Date: 2007-11-28 04:15 pm (UTC)
From: [personal profile] simont
Further to [livejournal.com profile] ewx's point above: another important thing about DisOrder in particular is that it's searching a music collection, which is quite likely to contain names such as Motörhead and Queensrÿche and Mötley Crüe and even (crosses fingers) Spin̈al Tap, in which the diacritics are purely ornamental and nobody is going to want to have to remember where they appear in order to search for the band...

(no subject)

Date: 2007-11-28 05:01 pm (UTC)
From: [identity profile] aardvark179.livejournal.com
Searching and sorting are a different but related problem: you should find the results even if the user doesn't type the accents, and accents should normally be ignored for sorting purposes as well.
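A minimal Python sketch of accent- and case-insensitive display sorting. It uses a simple fold-then-strip key with the original string as a tie-breaker. This is a deliberately simpler stand-in for the full Unicode Collation Algorithm (as implemented by ICU, for example), and `fold`/`display_sort` are hypothetical names.

```python
import unicodedata

def fold(s: str) -> str:
    # Case-fold, decompose, and drop combining marks (accents).
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def display_sort(names: list[str]) -> list[str]:
    # Primary key ignores case and accents; the original string breaks ties
    # so the ordering stays deterministic.
    return sorted(names, key=lambda n: (fold(n), n))

bands = ["Mötley Crüe", "Motörhead", "Madness", "muse"]
print(display_sort(bands))  # ['Madness', 'Mötley Crüe', 'Motörhead', 'muse']
```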

(no subject)

Date: 2007-11-28 05:06 pm (UTC)
From: [identity profile] ewx.livejournal.com

DisOrder has two contexts in which it orders Unicode strings. The first is display, where, as you say, the same kind of normalization as is done for search is probably the most appropriate. The second is its use of track names as database keys, with an ordering that accounts for their filename structure; there they are normalized to NFC, and accents and letter case are strictly preserved.
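One way an ordering could "account for filename structure" is to compare path components one at a time rather than comparing whole strings. The Python sketch below assumes exactly that; it is a hypothetical illustration, not DisOrder's actual key comparison. Names are NFC-normalized, with case and accents kept intact.

```python
import unicodedata

def track_key(path: str) -> tuple[str, ...]:
    # NFC-normalize (so composed and decomposed spellings compare equal),
    # keep case and accents, then compare path components one at a time.
    return tuple(unicodedata.normalize("NFC", path).split("/"))

tracks = ["Queen/A Night at the Opera/01.ogg", "Queen B/x.ogg", "Queen/Innuendo/01.ogg"]
print(sorted(tracks, key=track_key))
# ['Queen/A Night at the Opera/01.ogg', 'Queen/Innuendo/01.ogg', 'Queen B/x.ogg']
```

A plain string sort would put `Queen B/x.ogg` first, because a space (U+0020) sorts before `/` (U+002F); comparing components keeps everything under the `Queen` directory together.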

As it happens the use of NFC there, and also the conversion of all user commands to NFC, propagate through to the display of bits of track names, meaning that user interfaces will indeed get precomposed characters where available, minimizing the effect of the browser bug I started with l-)

(no subject)

Date: 2007-11-28 05:36 pm (UTC)
From: [personal profile] pm215
How does it cope with Japanese track names, by the way?

(no subject)

Date: 2007-11-28 05:53 pm (UTC)
From: [identity profile] ewx.livejournal.com
I've not tried any, I'm afraid...

(no subject)

Date: 2007-11-28 06:02 pm (UTC)
From: [personal profile] pm215
I can provide you with some examples if you like :-)

(no subject)

Date: 2007-11-28 08:46 pm (UTC)
From: [identity profile] ewx.livejournal.com
All I really need is artist/album/title, plus some hints as to what users expect to happen when these names are processed, rather than actual content. Then I can stuff them into my test scripts...

(no subject)

Date: 2007-11-29 11:16 am (UTC)
From: [identity profile] senji.livejournal.com
FF 2.0.0.10 on Windows actually appears to get Spin̈al Tap right there, even down to being able to cut and paste it. (Hmm, and doesn't the caret seem strange when moving over a combining character, where it appears not to move at all?)
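One plausible reading of the caret oddity is that the cursor steps over code points rather than user-perceived characters. The Python sketch below shows the difference. `approx_graphemes` is a hypothetical helper that only attaches combining marks to the preceding character; real extended grapheme clusters (Unicode UAX #29) also cover Hangul, emoji sequences, CR LF and more.

```python
import unicodedata

def approx_graphemes(s: str) -> list[str]:
    # Simplified clustering: each combining mark joins the preceding cluster.
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the preceding character
        else:
            clusters.append(ch)
    return clusters

name = "Spin\u0308al Tap"
print(len(name))                    # 11 code points
print(len(approx_graphemes(name)))  # 10 user-perceived characters
```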

(no subject)

Date: 2007-11-30 04:47 pm (UTC)
From: [identity profile] cartesiandaemon.livejournal.com
I think that's what I was thinking of: if you input the strings "[capital o][combining diaeresis]" or "[capital o with diaeresis]", the browser ought to display them; but if you enter a track name in lower case or without case, and specify that it should be displayed in upper case, you need a language-appropriate algorithm.

Not that throwing away diaeresised Os is sensible; it just happens to correspond to something languages do do (for much the same reason).
