Java Programming [Archive] - how to parse encoding retrieved from the Web
This topic has 3 replies on 1 page.

Posts:6
Registered: 8/9/04
how to parse encoding retrieved from the Web  
Aug 9, 2004 5:10 AM



 
Hello

I have written a program that sends a request to a Web site, and now I need to parse the HTML in the body of the HTTP response.
I am guessing that the page is encoded in UTF-8, because I can see the following tag in the response:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
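A minimal sketch of choosing the charset before turning the response into text. It assumes the raw body has already been read into a `byte[]` (the post does not show how the request is made) and looks for a `charset=` parameter in either the HTTP `Content-Type` header value or a `<meta>` tag like the one above. The class and method names are hypothetical, and the ISO-8859-1 fallback is an assumption based on the old HTTP/1.1 default for `text/*`, not something the site guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the charset named in a Content-Type value or
// <meta> tag and uses it to decode the raw response bytes into a String.
final class CharsetSniffer {
    // Matches charset=UTF-8, charset="UTF-8" or charset='utf-8', ignoring case.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9._:+-]*)",
            Pattern.CASE_INSENSITIVE);

    static Charset charsetOf(String contentTypeOrMeta) {
        if (contentTypeOrMeta != null) {
            Matcher m = CHARSET.matcher(contentTypeOrMeta);
            if (m.find() && Charset.isSupported(m.group(1))) {
                return Charset.forName(m.group(1));
            }
        }
        // Assumed fallback: the HTTP/1.1 (RFC 2616) default for text/* types.
        return StandardCharsets.ISO_8859_1;
    }

    static String decodeBody(byte[] body, String contentTypeOrMeta) {
        // Decoding with the declared charset is what turns bytes into characters.
        return new String(body, charsetOf(contentTypeOrMeta));
    }

    public static void main(String[] args) {
        String meta = "<meta http-equiv=\"content-type\" content=\"text/html; charset=UTF-8\">";
        byte[] body = "G\u00e9nero".getBytes(StandardCharsets.UTF_8); // "Género" as UTF-8 bytes
        System.out.println(charsetOf(meta)); // UTF-8
        System.out.println(decodeBody(body, meta).equals("G\u00e9nero")); // true
    }
}
```

Decoding with the wrong charset (for example, treating UTF-8 bytes as ISO-8859-1) is the usual source of garbled words like the Spanish example below.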

Now, the page contains several text elements that I would like to extract in my Java program. The text is in several languages, like Greek, Russian, Spanish and some in French.
For example, the text Género, which means Gender in Spanish, or the text Γένος, which means Gender in Greek.
I have obtained those words and hold a reference to them in objects of type String.
I would like to decode them into ASCII and write them into a text file.
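If the goal is simply to get these words into a text file intact, one common approach is to write the file in an encoding that can represent them, such as UTF-8, and to name that encoding explicitly instead of relying on the platform default. A minimal sketch; the class name and the file name `words.txt` are placeholders.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical example: writes the extracted words to a UTF-8 text file.
final class WordWriter {
    static void writeWords(Path file, List<String> words) throws IOException {
        // Explicit charset: any well-formed String (no unpaired surrogates)
        // can be encoded in UTF-8, so no letter is replaced or lost.
        Files.write(file, words, StandardCharsets.UTF_8);
    }

    static List<String> readWords(Path file) throws IOException {
        // Read back with the same charset that was used for writing.
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "Género" and "Γένος", written as escapes so this source file stays ASCII.
        List<String> words = Arrays.asList("G\u00e9nero", "\u0393\u03ad\u03bd\u03bf\u03c2");
        Path file = Paths.get("words.txt");
        writeWords(file, words);
        System.out.println(readWords(file).equals(words)); // true
    }
}
```

Any editor or program reading the file must also be told it is UTF-8, or it will show the same kind of garbling.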

My questions are:
1) How can I know in what encoding the words are stored in my String objects?
2) Once I know their encoding, how can I decode them into ASCII?

You can assume that my problems lie with all letters whose value is greater than 127.
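To see exactly which characters fall above 127, a small diagnostic can list each one as a Unicode code point. A sketch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: reports every character above 127 as U+XXXX.
final class NonAsciiReport {
    static List<String> nonAscii(String s) {
        List<String> out = new ArrayList<>();
        // Walk by code point rather than by char, so a surrogate pair
        // (a character above U+FFFF) is reported once, not as two halves.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 127) {
                out.add(String.format("U+%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(nonAscii("G\u00e9nero"));                    // [U+00E9]
        System.out.println(nonAscii("\u0393\u03ad\u03bd\u03bf\u03c2")); // [U+0393, U+03AD, U+03BD, U+03BF, U+03C2]
    }
}
```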

Jason
 

Posts:11,200
Registered: 7/22/99
Re: how to parse encoding retrieved from the Web  
Aug 9, 2004 5:25 AM (reply 1 of 3)



 
1) How can I know in what encoding the words are stored in my String objects?

It's UTF-16, but this doesn't matter unless your input has characters with Unicode code points over 0xFFFF.
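A short illustration of why the UTF-16 detail only matters above 0xFFFF. Such a character (here U+1D11E, chosen arbitrarily) takes two `char`s, so `length()` and `codePointCount()` disagree, while Greek and accented Latin letters take one `char` each. The class and helper names are hypothetical.

```java
// Compares UTF-16 code units (String.length) with Unicode code points.
final class Utf16Demo {
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String greek = "\u0393\u03ad\u03bd\u03bf\u03c2"; // Γένος, all below 0xFFFF
        String clef = "\uD834\uDD1E";                    // U+1D11E as a surrogate pair

        System.out.println(greek.length());     // 5
        System.out.println(codePoints(greek));  // 5
        System.out.println(clef.length());      // 2
        System.out.println(codePoints(clef));   // 1
        System.out.println(Integer.toHexString(clef.codePointAt(0))); // 1d11e
    }
}
```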

2) Once I know their encoding, how can I decode them into ASCII?
You can't; ASCII does not support any characters with Unicode code points over 0x7F (such as accented or Greek letters).
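This can be checked in code: a US-ASCII `CharsetEncoder` reports whether a String is representable, and `String.getBytes` shows what happens if the conversion is forced anyway. Each unmappable character is silently replaced with `?`. A minimal sketch; the class and method names are hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers showing that ASCII cannot hold these letters.
final class AsciiCheck {
    static boolean isAscii(String s) {
        // A fresh encoder per call, because CharsetEncoder is not thread-safe.
        CharsetEncoder enc = StandardCharsets.US_ASCII.newEncoder();
        return enc.canEncode(s);
    }

    static String forceAscii(String s) {
        // getBytes(Charset) substitutes the charset's replacement ('?')
        // for every character it cannot map, so information is lost.
        return new String(s.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String spanish = "G\u00e9nero";                  // Género
        String greek = "\u0393\u03ad\u03bd\u03bf\u03c2"; // Γένος
        System.out.println(isAscii("Gender"));   // true
        System.out.println(isAscii(spanish));    // false
        System.out.println(forceAscii(spanish)); // G?nero
        System.out.println(forceAscii(greek));   // ?????
    }
}
```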
 

Posts:8,813
Registered: 10/4/00
Re: how to parse encoding retrieved from the Web  
Aug 9, 2004 5:28 AM (reply 2 of 3)



 
I don't think ASCII has codes for Γένος.

There's a reason they call it ASCII: American Standard Code for Information Interchange.
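Stripping accents is sometimes used as a lossy workaround, and it makes the point concrete. Decomposing with `java.text.Normalizer` (NFD) and deleting combining marks turns Spanish `Género` into plain ASCII `Genero`, but Greek `Γένος` only loses its accent and becomes `Γενος`, which is still not ASCII. This technique is not from the thread; it is a sketch of one common approach, with a hypothetical class name.

```java
import java.text.Normalizer;

// Hypothetical lossy folding: removes accents but cannot change the script.
final class AccentFolder {
    static String stripAccents(String s) {
        // NFD splits a precomposed letter such as "é" into "e" + U+0301 (combining acute)...
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // ...and \p{M} matches every combining mark, which is then removed.
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("G\u00e9nero")); // Genero
        String greek = stripAccents("\u0393\u03ad\u03bd\u03bf\u03c2");
        // Γενος: the tonos is gone, but every letter is still Greek.
        System.out.println(greek.equals("\u0393\u03b5\u03bd\u03bf\u03c2")); // true
    }
}
```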
 

Posts:3,258
Registered: 00-08-28
Re: how to parse encoding retrieved from the Web  
Aug 9, 2004 9:11 AM (reply 3 of 3)



 
I don't think ASCII has codes for Γένος.

There's a reason they call it ASCII: American Standard Code for Information Interchange.

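Yep, you are right there. The closest the OP could get to covering several European languages within 7 bits is the GSM 7-bit default alphabet (GSM 03.38). It adds a handful of accented Latin letters (such as é) and the uppercase Greek letters that have no Latin look-alike, but it has no lowercase Greek and no Cyrillic, so Γένος and the Russian text still would not fit. It is widely used with SMPP, but I am sure he could use it for his own purposes as well.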
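What distinguishes GSM 7-bit in practice is how it packs 7-bit codes (septets) into bytes, so that 8 characters fit in 7 octets, as in SMS and SMPP payloads. The sketch below shows only that packing step. It assumes the input uses characters whose GSM 03.38 code equals their ASCII code (A-Z, a-z, 0-9 and space); a real encoder needs the full 03.38 table, including the accented and Greek letters and the escape extension, which is omitted here. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of GSM 7-bit septet packing with a simplified table.
final class GsmPacker {
    // Assumption: only characters whose GSM 03.38 code equals their ASCII code.
    private static boolean isSafe(char c) {
        return c == ' ' || (c >= '0' && c <= '9')
                || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static byte[] pack(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0; // pending bits, least significant bit first
        int bits = 0;   // how many bits are pending (always < 8 between chars)
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isSafe(c)) {
                throw new IllegalArgumentException("not in this simplified table: " + c);
            }
            buffer |= c << bits; // place the 7-bit code above the pending bits
            bits += 7;
            while (bits >= 8) {  // emit every complete octet
                out.write(buffer & 0xFF);
                buffer >>>= 8;
                bits -= 8;
            }
        }
        if (bits > 0) {
            out.write(buffer & 0xFF); // leftover bits, zero-padded
        }
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 10 characters (70 bits) become 9 octets.
        System.out.println(hex(pack("hellohello"))); // E8329BFD4697D9EC37
    }
}
```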
 