Saturday, September 02, 2006

The paradox of Unicode adoption

Unicode works in casual memos but not in books


Everyone can use Unicode these days, as long as you work with a reasonably new computer system and use software like a common word processor (that’s the PC phrase for MS Word) or even Notepad. Everyone and his dog can also create web pages in Unicode without trying hard. You can compose e-mail in Unicode, though the odds are that many recipients will see it distorted by the software they use; webmail systems are particularly primitive when it comes to anything beyond ASCII characters. Among Internet discussion forums, there are already many that let you write and read Unicode easily, as long as you’ve learned how to type the characters.

But if you try to write a book or an article for a printed publication, you will typically be in deep trouble if you use anything beyond the Latin 1 repertoire. Everything works fine in your text processor, but as soon as the text reaches the publisher’s system, characters will get munged in imaginative ways. Widely used publishing software like FrameMaker or InDesign just doesn’t grok Unicode yet. Trouble also lies ahead if you try to enter characters beyond Latin 1 into a database in the naïve expectation that databases are generally Unicode-enabled.
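To make the "munging" concrete, here is a minimal Python sketch of two common failure modes. It is only an illustration, not a description of what any particular publishing system does internally: the function names are made up for this example, and "Latin 1" is assumed to mean ISO 8859-1.

```python
def munge_as_latin1(text: str) -> str:
    """Simulate a system that reads UTF-8 bytes as if they were Latin 1."""
    return text.encode("utf-8").decode("latin-1")


def squeeze_into_latin1(text: str) -> str:
    """Simulate a lossy conversion: anything outside Latin 1 becomes '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


sample = "naïve Ω"

# Each non-ASCII character turns into two unrelated Latin 1 characters.
print(munge_as_latin1(sample))      # naÃ¯ve Î©

# "ï" survives because it is in Latin 1; "Ω" is silently lost.
print(squeeze_into_latin1(sample))  # naïve ?
```

The second case is arguably the nastier one, since the result looks tidy and the loss is easy to miss in proofreading.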

In practice, you should probably accept the fact that in a printed publication, anything beyond Latin 1 needs to be expressed using images. This is fairly stupid, especially if you write about extended character repertoires, as I often do. You cannot show examples of special characters in running text.
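If you are stuck with the image workaround, it helps to know in advance which characters in a manuscript are affected. The following is a rough Python sketch under the assumption that "Latin 1" means ISO 8859-1, that is, code points up to U+00FF; the function name is hypothetical.

```python
import unicodedata


def beyond_latin1(text: str) -> list[tuple[str, str, str]]:
    """List each distinct character above U+00FF, in order of first appearance.

    Every entry is (character, code point as U+XXXX, Unicode name).
    """
    seen: dict[str, tuple[str, str, str]] = {}
    for ch in text:
        if ord(ch) > 0xFF and ch not in seen:
            name = unicodedata.name(ch, "<unnamed>")
            seen[ch] = (ch, f"U+{ord(ch):04X}", name)
    return list(seen.values())


for char, code, name in beyond_latin1("naïve Ω and š"):
    print(char, code, name)
```

Note that under this assumption even “curly” quotation marks count as beyond Latin 1, since they are in Windows-1252 but not in ISO 8859-1.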

I guess there is a possible solution in many cases, but it’s not acceptable to many publishing houses and typographers: the author prepares the entire material in MS Word and converts it to PDF format. That way he can check the result easily and fix it as needed. You may need to create the PDF file using font embedding techniques, so that the file contains the fonts it needs. There are probably pitfalls, and many authors wouldn’t know how to handle the process, but I think the main objection is really that such approaches are “primitive.” The real primitiveness, however, is in the limitations of current publishing software. Software that cannot handle characters beyond an 8-bit set in any reasonable way is comparable to a system that cannot handle the letters “x” and “y,” since to many languages and cultures, some “special” or “extra” characters are just as essential as “x” and “y” are in English.

4 Comments:

Blogger Joe Clark said...

I have had no difficulty typesetting "Unicode" characters in InDesign CS; cf. the type-sample PDF I created.

7:38 AM  
Blogger jaime said...

You are right! In the era of multi-core chips, 32 nm lithography and petaflop supercomputers, there are still some silly problems.

For example, I don't know why you can turn a computer off while files are still open.

Just tell a normal person that the data on his 300 GB hard disk doesn't exist anymore after a computer glitch. He will never understand how all his files were there one second and none were left the next.

It's very easy, technologically, to solve this problem. Just add a 16 GB USB drive to the HDD and record the data in both devices. Add the logic to the USB drive so the data is recorded there as a circular list.

You will always have the last 16 GB recorded twice. If you like, you can add more 16 GB USB drives, very cheaply.

And I say "16 GB" now, but in a short time we will have 32, 64, 128 GB USB drives, for the same price or less.

Another silly problem: I'm working with my two computers, a laptop and a desktop machine, plugged into the same outlet. The dog runs under the table and unplugs both computers. I can keep working on my laptop, but my desktop computer is dead. Why? Why don't they put a battery inside every desktop computer, just as in laptops?

There's a lot of space inside a desktop computer. How much does a battery cost? How much does your data cost?

With 1,000 million computers in the world, how many people will lose their data in the next 24 hours? They surely would have happily paid an extra 10 or 20 or 30 dollars just to keep their data safe.

Engineers should take a course in "human computing", that is, computers used by normal people with common problems like a dog or a child unplugging your computer... or you, stretching your legs under your desk and tripping the on/off switch of your voltage regulator.

9:32 PM  
Blogger Patricia Camacho said...

What is Unicode exactly about? Never heard of it.

1:55 PM  
Blogger keith brown said...

A wonderful blog. Detail information which are amazingly helpful. The instruction, one by one, is very easy to solve and also to remember when it is needed. Thanks a lot. If anybody needs help to get rid of laptop or computer related problems then visit Laptop Repair.

8:45 PM  
