Java Programming [Archive] - CharBuffer corrupting my text file
This topic has 7 replies on 1 page.

Posts:40
Registered: 5/27/04
CharBuffer corrupting my text file  
Jun 18, 2004 6:12 AM



 
Hi all,

The code below appears to be fine and compiles with no errors, but occasionally it will corrupt the text file it is reading from. Any ideas why?

thanks,

David.

private void showTitleTag(String file) {
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(file);
        FileChannel fc = fis.getChannel();
        ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        CharBuffer cb = Charset.forName("8859_1").newDecoder().decode(bb);
        Matcher m = Pattern.compile("<title\\s*>(.*?)</title\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(cb);
        if (m.find()) {
            String title1 = (m.group(1)).trim().replace('\n', ' ');
            String title = title1.replace('\r', ' ');
            System.out.println(title1);
            titlesList.addItem(title1);
            titlesList.setSelectedIndex(0);
        }
        else
            System.out.println("No title found.");
    }
    catch (Exception e) {
        e.printStackTrace();
    }
    finally {
        try { fis.close(); }
        catch (Exception e) {}
    }
} // end showTitleTag
 

Posts:13,769
Registered: 00-11-29
Re: CharBuffer corrupting my text file  
Jun 18, 2004 6:20 AM (reply 1 of 7)



 
In what way? Are you using the correct encoding?
 

Posts:40
Registered: 5/27/04
Re: CharBuffer corrupting my text file  
Jun 18, 2004 6:30 AM (reply 2 of 7)



 
To be honest, I'm not sure what encoding is being used. The code was suggested by someone on the forums, and it was new to me at the time.

When I run the program a few times over, part of the code has disappeared and a few lines of black blocks appear at the end of the remaining HTML code.

I'm using the code below just now, to allow me to work on the rest of the application. It's way too basic of course.

David

public String getTitleTag(String fname) {
    String line;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(fname));
    } catch (Exception e) {
        System.out.println("Failed to open " + fname);
    }
    try {
        while ((line = in.readLine()) != null) {
            if (line.indexOf("<title>") >= 0) {
                try {
                    in.close();
                }
                catch (Exception e) {
                }
                // trim whitespace and remove carriage return and newline
                line = line.trim().replace('\n',' ').replace('\r',' ');
                return line;
            }
        }
    }
    catch (Exception e) {
        System.out.println("Trouble reading file " + fname);
    }
    try {
        in.close();
    }
    catch (Exception e) {
    }
    return "NA";
}
 

Posts:13,769
Registered: 00-11-29
Re: CharBuffer corrupting my text file  
Jun 18, 2004 7:13 AM (reply 3 of 7)



 
What kind of characters are in this HTML? You are trying to read it in with ISO-8859-1, Latin-1 encoding. Does that make sense to you?
 

Posts:40
Registered: 5/27/04
Re: CharBuffer corrupting my text file  
Jun 18, 2004 7:28 AM (reply 4 of 7)



 
The HTML file is in standard English, and the format is run of the mill.

As for encoding, the only line which I think refers to that is:
CharBuffer cb = Charset.forName("8859_1").newDecoder().decode(bb);


There is no other reference to encoding in my application. I have updated the previous code to the following; I think it will do the job for the majority of HTML files.

public String getTitleTag(String fname) {
    String line;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(fname));
    } catch (Exception e) {
        System.out.println("Failed to open " + fname);
    }
    try {
        while ((line = in.readLine()) != null) {
            if ((line.indexOf("<title>") >= 0)
                || (line.indexOf("<TITLE>") >= 0)) {
                if ((line.indexOf("</title>") >= 0)
                    || (line.indexOf("</TITLE>") >= 0)) {
                    try {
                        in.close();
                    } catch (Exception e) {
                    }
                    // trim whitespace and remove carriage return and newline
                    line = line.trim().replace('\n', ' ').replace('\r', ' ');
                    line = line.substring(line.indexOf("<title>"), line.indexOf("</title>"));
                    line = line
                        .replaceAll("<title>", "")
                        .replaceAll("</title>", "")
                        .replaceAll("<TITLE>", "")
                        .replaceAll("</TITLE>", "");
                    return line;
                } // end if
            }
        }
    } catch (Exception e) {
        System.out.println("Trouble reading file " + fname);
    }
    try {
        in.close();
    } catch (Exception e) {
    }
    return "No page available";
}
 

Posts:13,769
Registered: 00-11-29
Re: CharBuffer corrupting my text file  
Jun 18, 2004 8:20 AM (reply 5 of 7)



 
> The html file is standard english the format is run of
> the mill.

Then it's unlikely you'll need that encoding. But I could be wrong. Perhaps HTML has special characters?

> As for encoding, the only line which I think refers to
> that is:
> CharBuffer cb = Charset.forName("8859_1").newDecoder().decode(bb);
>
> There is no other reference to encoding in my
> application. I have updated the previous code to the
> following, I think it will do the job for the majority
> of html files.

Are you suggesting that because it's only used once, it can't be the problem? Coding isn't a fuzzy thing. It's not like the number of times you do something determines whether it happens. If you write a line of code to do something, it does it when that line is executed.

You are currently decoding the input using the Latin-1 encoding. If there is something in that file that doesn't fit that encoding, it will produce garbage characters.
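
To make the point about mismatched encodings concrete, here is a minimal sketch. It assumes Java 7 or later for StandardCharsets, and the class name EncodingMismatchDemo and the sample string are made up for illustration. It takes text that was saved as UTF-8 and decodes it as Latin-1, the way the code in this thread does.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Decodes raw bytes as ISO-8859-1 (Latin-1), like the "8859_1" decoder in the thread.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical sample: "café", where é (U+00E9) takes two bytes in UTF-8.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String wrong = decodeAsLatin1(utf8Bytes);
        String right = new String(utf8Bytes, StandardCharsets.UTF_8);

        // Latin-1 maps every byte to exactly one char, so the two-byte é
        // turns into two unrelated characters instead of one.
        System.out.println("Latin-1 decode: " + wrong.length() + " chars");
        System.out.println("UTF-8 decode:   " + right.length() + " chars");
    }
}
```

Plain ASCII text decodes identically under both charsets, so a file that really is "standard English" would not show the problem. The bytes themselves are untouched here; only the decoded string differs.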
 

Posts:50
Registered: 12/9/97
parse out special chars  
Jun 18, 2004 8:23 AM (reply 6 of 7)



 
2 things:

1) Make sure to use the correct encoding.
2) Parse out any potential special characters for that encoding. Special characters will be corrupted when not properly accounted for.
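
As a rough illustration of point 1, the sketch below reads the file with an explicitly chosen charset and then applies the same case-insensitive title pattern used earlier in the thread. It assumes Java 7 or later (java.nio.file.Files). The names TitleReader, extractTitle and readTitle are made up for this example, and UTF-8 is only a guess at the file's real encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleReader {

    // Same pattern as the original showTitleTag: matches <title>, <TITLE>, etc.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\s*>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the title text with line breaks replaced by spaces, or null if none is found.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return null;
        }
        return m.group(1).replace('\r', ' ').replace('\n', ' ').trim();
    }

    // Reads the whole file and decodes it with the charset the caller names explicitly.
    static String readTitle(Path file, Charset charset) throws IOException {
        String html = new String(Files.readAllBytes(file), charset);
        return extractTitle(html);
    }

    public static void main(String[] args) throws IOException {
        // Write a small hypothetical page to a temp file so the example is self-contained.
        Path tmp = Files.createTempFile("title-demo", ".html");
        try {
            String page = "<html><head><TITLE>Caf\u00e9 menu</TITLE></head></html>";
            Files.write(tmp, page.getBytes(StandardCharsets.UTF_8));
            System.out.println(readTitle(tmp, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Passing the charset as a parameter keeps the choice visible at the call site, instead of relying on the platform default the way FileReader does.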

--alexandra
 

Posts:40
Registered: 5/27/04
Re: parse out special chars  
Jun 21, 2004 5:30 AM (reply 7 of 7)



 
Thanks for the feedback,

You've helped me understand the encoding. I'll do some research into the options available to me.

Thanks again,

David.
 