Java Programming [Archive] - UTF-8: compiler complains (Eclipse not)
This topic has 9 replies on 1 page.

Posts:3,369
Registered: 24.10.97
UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 3:15 AM



 
The origin of my problem is the following method:

    /**
     * Convert special character in a text to the HTML escape sequence.
     * (e.g. & = &amp;)
     *
     * @param text Text to be converted.
     * @return Input text with converted special characters.
     * @see <a href="http://www.w3.org/MarkUp/Guide/Advanced.html">special characters</a>
     * @see <a href="http://de.selfhtml.org/html/referenz/zeichen.htm">HTML-Zeichenreferenz</a>
     */
    public static String escapeSpecialCharacters(String text) {
        String result = text;

        if (text != null) {
            StringBuffer sb = new StringBuffer(text.length());
            StringCharacterIterator sci = new StringCharacterIterator(text);
            char c = sci.first();
            while (c != CharacterIterator.DONE) {
                switch (c) {
                    // [a long run of further cases for non-ASCII characters, each of the form
                    //  case '?' : sb.append("?"); break;
                    //  whose character literals and entities are unreadable here]
                    case '&' : sb.append("&amp;"); break; //Ampersand
                    case '"' : sb.append("&quot;"); break; //quotation mark
                    case '<' : sb.append("&lt;"); break; //Less than
                    case '>' : sb.append("&gt;"); break; //Greater than
                    case '\\' : sb.append("&#92;"); break; //backslash
                    case '©' : sb.append("&copy;"); break; //Copyright sign
                    case '™' : sb.append("&trade;"); break; //Trademark
                    case '®' : sb.append("&reg;"); break; //Registered trademark
                    case '§' : sb.append("&sect;"); break; //paragraph
                    case '€' : sb.append("&euro;"); break; //currency: EURO
                    case '¢' : sb.append("&cent;"); break; //currency: cent
                    case '£' : sb.append("&pound;"); break; //currency: Pound
                    case '$' : sb.append("&#36;"); break; //currency: Dollar
                    default: sb.append(c); break;
                }
                c = sci.next();
            }//next character
            result = sb.toString();
        }

        return result;
    }//escapeSpecialCharacters()

I added the last 9 cases and compiled it within Eclipse. After compilation I had an error saying that one case was a duplicate. I saw that the EURO symbol (€) had been replaced by a question mark (?). I replaced it again, compiled ... same miraculous replacement! Then I switched the standard encoding in Eclipse from ISO-8859-1 to UTF-8 and voilà, it worked. Then I wanted to create a JAR file and compiled the same code with my (Eclipse-internal) ANT file, and then the compiler reported 100 errors; here is one as an example:

[javac] D:\data\iComps\icf\prg\java\src\de\icomps\html\HTML.java:157: unclosed character literal
[javac] case '??' : sb.append("�"); break;
[javac] ^

I changed the encoding in my ANT build file to UTF-8:

<?xml version="1.0" encoding="UTF-8"?>

and tried again: same errors ...

So, can I use UTF-8 to compile Java sources? If yes: how?

 

Posts:11,200
Registered: 7/22/99
Re: UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 3:24 AM (reply 1 of 9)



 
By default javac assumes the platform's native character encoding, which on your platform is not UTF-8. You can change the encoding with the -encoding parameter, like:

javac -encoding UTF-8 MyClass.java

By the way, for the sake of portability you should not use non-ASCII characters in source code, exactly because of the problem you've discovered. Instead you should use Unicode escapes, like \u20ac for €.
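
To illustrate the Unicode-escape advice, here is a minimal sketch of a switch written with escapes only, so the source file stays pure ASCII and compiles the same way under any -encoding setting. The class name EuroEscapeDemo and the method htmlFor are made up for this example and are not from the original post.

```java
// Hypothetical demo class: every non-ASCII character is written as a Unicode escape.
final class EuroEscapeDemo {
    // U+20AC EURO SIGN, written as an escape instead of a literal character.
    static final char EURO = '\u20ac';

    // Maps two sample characters to their HTML entities; everything else passes through.
    static String htmlFor(char c) {
        switch (c) {
            case '\u20ac': return "&euro;";
            case '\u00a3': return "&pound;";
            default:       return String.valueOf(c);
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(EURO)); // prints 20ac
        System.out.println(htmlFor(EURO));             // prints &euro;
    }
}
```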
 

Posts:4,000
Registered: 24.02.01
Re: UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 3:26 AM (reply 2 of 9)



 
That switch screams for a map, by the way.
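
One way the map suggestion might look is sketched below. It assumes Java 5 or later (generics and autoboxing); the class name HtmlEntityMap and the handful of entries are illustrative only and do not reproduce the full list from the original method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup-table variant of the switch-based escaper.
final class HtmlEntityMap {
    private static final Map<Character, String> ENTITIES = new HashMap<Character, String>();
    static {
        ENTITIES.put('&', "&amp;");
        ENTITIES.put('"', "&quot;");
        ENTITIES.put('<', "&lt;");
        ENTITIES.put('>', "&gt;");
        ENTITIES.put('\u00a9', "&copy;");  // copyright sign
        ENTITIES.put('\u00ae', "&reg;");   // registered trademark
        ENTITIES.put('\u20ac', "&euro;");  // euro sign
        ENTITIES.put('\u2122', "&trade;"); // trademark
    }

    // Replaces every character that has a table entry; all others are copied unchanged.
    static String escape(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String entity = ENTITIES.get(c);
            if (entity != null) {
                sb.append(entity);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Adding a character then means adding one table entry rather than another case label.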
 

Posts:3,369
Registered: 24.10.97
Re: UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 3:49 AM (reply 3 of 9)



 
Cool, using Unicode escapes in my code (\u20ac for the euro) does the trick! Is there an online table of all (or most) Unicode characters, or at least those of the ISO-8859-1 charset?

Or is there a Java method where I can get the Unicode string from a character that I type on my keyboard, e.g.

String unicode = x.getUnicodeString("ä"); //should return "\u00e4"

 

Posts:11,200
Registered: 7/22/99
Re: UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 4:01 AM (reply 4 of 9)



 
You'll find the code charts for all Unicode characters at http://www.unicode.org/charts/
ISO-8859-1 consists of "Basic Latin" and "Latin-1 Supplement". I noticed you have the trade mark symbol: it's in the Letterlike Symbols section.

Or is there a Java method where i can get the unicode string from a
character that i type on my keyboard, e.g.

Integer.toHexString(character) works, but there's a better way. The Sun SDK ships with a tool just for converting files, or just some text, from one encoding to Java Unicode escapes or the reverse: native2ascii.
http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/native2ascii.html

for example, "native2ascii -encoding utf-8 sourcefile destinationfile"
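
For the Integer.toHexString route, a small helper could look roughly like the sketch below. The class name UnicodeEscapes and the method toUnicodeEscape are invented for this example; they are not part of the JDK.

```java
// Hypothetical helper that builds the Java source escape for a single char.
final class UnicodeEscapes {
    // Returns a backslash, the letter u, and the code unit as four lowercase hex digits.
    static String toUnicodeEscape(char c) {
        String hex = Integer.toHexString(c);
        StringBuilder sb = new StringBuilder("\\u");
        // Integer.toHexString drops leading zeros, so pad back to four digits.
        for (int i = hex.length(); i < 4; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    public static void main(String[] args) {
        // Prints the escape for the euro sign (code unit 20ac).
        System.out.println(toUnicodeEscape((char) 0x20AC));
    }
}
```

Under these assumptions, passing the character ä (U+00E4) yields the text \u00e4, which is what the question asked for.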
 

Posts:3,369
Registered: 24.10.97
Re: UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 4:22 AM (reply 5 of 9)



 
thanks, but the backslash doesn't work:

case '\u005c' : sb.append("&#92;"); break; //backslash (\)


[javac] D:\data\iComps\icf\prg\java\src\de\icomps\html\HTML.java:175: unclosed character literal
[javac] case '\u005c' : sb.append("&#92;"); break; //backslash (\)
[javac] ^

Do I need to escape it via:

case '\u005c\u005c' : sb.append("&#92;"); break; //backslash (\)

?
 

Posts:11,200
Registered: 7/22/99
Re: UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 4:26 AM (reply 6 of 9)



 
thanks, but the backslash doesn't work:

This is because the \uXXXX escapes are processed before the program text is given to the compiler, so what the compiler will see is '\', which is illegal. '\u005c\u005c' is fine, but why spell out ASCII characters? '\\' is much clearer :)
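
The translation order described above can be checked with a short sketch like the one below. The class and constant names are made up for the illustration.

```java
// Hypothetical demo: both literals denote the same single backslash character.
final class BackslashEscapeDemo {
    // Each Unicode escape here is translated to a backslash before the literal
    // is tokenized, so the compiler sees the ordinary two-character escape.
    static final char VIA_UNICODE_ESCAPES = '\u005c\u005c';

    // The conventional spelling of the same character literal.
    static final char VIA_PLAIN_ESCAPE = '\\';

    public static void main(String[] args) {
        System.out.println(VIA_UNICODE_ESCAPES == VIA_PLAIN_ESCAPE); // prints true
        System.out.println((int) VIA_PLAIN_ESCAPE);                  // prints 92
    }
}
```

The same early translation applies inside comments and string literals, which is why a single escaped backslash in a character literal leaves the literal unclosed.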
 

Posts:3,369
Registered: 24.10.97
Re: UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 5:01 AM (reply 7 of 9)



 
Okay, I guess I've got it now:

    /**
     * Convert special characters in a text to the HTML escape sequence. (e.g. & = &amp;)
     *
     * @param text Text to be converted.
     * @return Input text with converted special characters.
     * @see <a href="http://de.selfhtml.org/html/referenz/zeichen.htm">HTML-Zeichenreferenz</a>
     * @see <a href="http://www.unicode.org/charts/">The Unicode Standard 4.0</a>
     */
    public static String encodeSpecialCharacters(String text) {
        String result = text;

        if (text != null) {
            StringBuffer sb = new StringBuffer(text.length());
            StringCharacterIterator sci = new StringCharacterIterator(text);
            char c = sci.first();
            while (c != CharacterIterator.DONE) {
                switch (c) {
                    case '&' : sb.append("&amp;"); break; //Ampersand (&)
                    case '"' : sb.append("&quot;"); break; //quotation mark (")
                    case '<' : sb.append("&lt;"); break; //Less than (<)
                    case '>' : sb.append("&gt;"); break; //Greater than (>)
                    case '\\' : sb.append("&#92;"); break; //backslash (\)
                    case '\u0024' : sb.append("&#36;"); break; //currency: Dollar ($)
                    case '\u00a2' : sb.append("&cent;"); break; //currency: cent (¢)
                    case '\u00a3' : sb.append("&pound;"); break; //currency: pound (£)
                    case '\u00a5' : sb.append("&yen;"); break; //currency: yen (¥)
                    case '\u00a7' : sb.append("&sect;"); break; //paragraph (§)
                    case '\u00a9' : sb.append("&copy;"); break; //Copyright sign (©)
                    case '\u00ae' : sb.append("&reg;"); break; //Registered trademark (®)
                    case '\u00c4' : sb.append("&Auml;"); break; //Ä
                    case '\u00d6' : sb.append("&Ouml;"); break; //Ö
                    case '\u00dc' : sb.append("&Uuml;"); break; //Ü
                    case '\u00df' : sb.append("&szlig;"); break; //ß
                    case '\u00e4' : sb.append("&auml;"); break; //ä
                    case '\u00f6' : sb.append("&ouml;"); break; //ö
                    case '\u00fc' : sb.append("&uuml;"); break; //ü
                    case '\u20a4' : sb.append("&pound;"); break; //currency: lira (₤)
                    case '\u20ac' : sb.append("&euro;"); break; //currency: EURO
                    case '\u2122' : sb.append("&trade;"); break; //Trademark
                    default: sb.append(c); break;
                }
                c = sci.next();
            }//next character
            result = sb.toString();
        }

        return result;
    }//encodeSpecialCharacters()


:-)
 

Posts:11,200
Registered: 7/22/99
Re: UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 7:55 AM (reply 8 of 9)



 
Cool :) But you do realise there's a straightforward programmatic conversion from the Unicode value of the character to the HTML character entity code, right?

I don't know your requirements, but to me it seems silly to convert only German characters plus certain symbols, and ignore the characters used in other languages...
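
The programmatic conversion mentioned above presumably refers to numeric character references of the form &#NNNN;, where NNNN is the decimal Unicode value. A minimal sketch under that assumption follows (Java 5 or later). The class name NumericEntityEncoder and the choice of which ASCII characters to escape are illustrative, not taken from the thread.

```java
// Hypothetical encoder that needs no per-character table.
final class NumericEntityEncoder {
    // Escapes HTML markup characters and everything outside printable ASCII
    // as decimal numeric character references.
    static String encode(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read whole code points so surrogate pairs become one reference.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&' || cp == '<' || cp == '>' || cp == '"' || cp > 0x7e) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }
}
```

With this approach the euro sign, for instance, comes out as &#8364; and characters from any other language are covered without extra cases.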
 

Posts:3,258
Registered: 00-08-28
Re: UTF-8: compiler complains (Eclipse not)  
Jun 17, 2004 7:58 AM (reply 9 of 9)



 
Cool :) But you do realise there's a straightforward
programmatic conversion from the unicode value of the
character to the HTML character entity code, right?

I don't know your requirements but to me it seems
silly convert only German characters plus certain
symbols, and ignore the characters used in other
languages...

That's exactly what I was wondering: why use custom code when you do not need it?
 