Java Programming [Archive] - split text file
This topic has 10 replies on 1 page.

Posts:6
Registered: 6/22/04
split text file  
Jun 22, 2004 12:52 PM



 
Hi, I have this very big .txt file with about 22,000 lines. I have searched the forum, and the only thing I could find was a file splitter that splits files according to how many lines the user wants in each output file. What I am trying to figure out is how to split this large text file so that its contents get distributed across a certain number of files. I really don't care how many lines are in each file as long as the contents are evenly distributed among them, i.e. if I wanted my 1 large txt file split into 5 smaller txt files, it would distribute the contents evenly to each of the 5 files.

Would I have to readLine the large txt file, store everything in separate arrays, and write those arrays to each txt file? This doesn't seem like an efficient way. Is there an easier way to do this? All suggestions welcome. Thanks in advance.
 

Posts:121
Registered: 12/16/03
Re: split text file  
Jun 22, 2004 1:15 PM (reply 1 of 10)



 
This is how I would do it.

I would load all the data into memory.
Count how many lines of data there are.

Then divide the number of lines counted by 5, but you must round the result up to the next whole number, so 12.1 would become 13; otherwise you would lose data.

Then write the data using loops. So the first loop will write to one file, then the second loop will write to the next, and so on.

Sorry that I couldn't give you any code.
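
For illustration, the steps described above (load everything, count the lines, divide by the number of files rounding up, then write each block in turn) could be sketched roughly as follows. This is only a sketch under assumptions: the class name ChunkSplit, the method split, and the output names part0.txt, part1.txt, ... are made up for the example and do not come from this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: splits a text file into consecutive blocks of lines.
class ChunkSplit {
    static void split(Path source, Path outDir, int numFiles) throws IOException {
        // Load all the data into memory and count it.
        List<String> lines = Files.readAllLines(source);
        int total = lines.size();

        // Integer ceiling division, so 12.1 becomes 13 and no line is lost.
        int chunk = (total + numFiles - 1) / numFiles;

        // One pass per output file; each pass writes the next block of lines.
        for (int i = 0; i < numFiles; i++) {
            int from = Math.min(i * chunk, total);
            int to = Math.min(from + chunk, total);
            // "part<i>.txt" is an assumed naming scheme for the example.
            Files.write(outDir.resolve("part" + i + ".txt"), lines.subList(from, to));
        }
    }
}
```

Because the block size is rounded up, the last files can end up shorter than the others (or even empty when there are few lines), and the whole input sits in memory while it is being written.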
 

Posts:50
Registered: 12/9/97
Re: split text file - low overhead solution  
Jun 22, 2004 1:16 PM (reply 2 of 10)



 
Here's a solution in which NO arrays are kept in memory - the only thing in memory is the current line.

Let's not think about the total number of lines - it is not relevant.

For however many files you want - say 5 - open those 5 file handles before you begin looping through your source (input) file.

Put each of those 5 file handles in an Array. Then as you loop through the source file, keep a counter. Each time you read a line from the main source file, write it out to the file at the counter's index in the array, and round-robin through the filehandles. That way your lines will be evenly distributed regardless of how many lines you get - the split is determined entirely by a variable you can pass into the program/method that says how many output files to split the input into.

===========
Pseudocode:
===========

// This can be passed in - however many output files to which the source is split
numOutputFiles = 5;

// Create the same number of filehandles as numOutputFiles specifies
open file1, file2, file3, file4, file5;

// Create an Array to hold the references to those filehandles
Array[5] fileHandles = {file1, file2, file3, file4, file5};

// Initialize a loop counter (array indexes run from 0 to numOutputFiles - 1)
int loopCounter = 0;

// a temporary holder to "point" the output stream to the correct output file
File currentOutputFile = null;

// loop through your input file, reading each line exactly once

while ((line = sourceFile.nextLine) != null)
{
    currentOutputFile = fileHandles[loopCounter]; // gets the output file at that index of the array

    currentOutputFile.write(line);

    loopCounter++;

    if (loopCounter == numOutputFiles) // you've used the last file, loop back
    {
        loopCounter = 0;
    }
}

// don't forget to close all file handles
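
In real Java, the pseudocode above might look something like the sketch below. The class name RoundRobinSplit, the method split, and the part0.txt, part1.txt, ... file names are assumptions made for the example, not anything from the original question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: deals the lines of one file out to several files in turn.
class RoundRobinSplit {
    static void split(Path source, Path outDir, int numOutputFiles) throws IOException {
        // The only array kept in memory: one writer per output file.
        PrintWriter[] fileHandles = new PrintWriter[numOutputFiles];
        try (BufferedReader in = Files.newBufferedReader(source)) {
            for (int i = 0; i < numOutputFiles; i++) {
                // "part<i>.txt" is an assumed naming scheme for the example.
                fileHandles[i] = new PrintWriter(
                        Files.newBufferedWriter(outDir.resolve("part" + i + ".txt")));
            }
            String line;
            int lineIndex = 0;
            // Each line is read once and written once; nothing else is stored.
            while ((line = in.readLine()) != null) {
                fileHandles[lineIndex % numOutputFiles].println(line);
                lineIndex++;
            }
        } finally {
            // Close every writer that was opened, even if reading failed.
            for (PrintWriter handle : fileHandles) {
                if (handle != null) {
                    handle.close();
                }
            }
        }
    }
}
```

The modulo on the line index does the same job as resetting the counter by hand. Note that this interleaves the lines (file 0 gets lines 0, 5, 10, ...), so it suits cases where the order of lines within each output file doesn't need to follow contiguous blocks of the source.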

Hope this helps!

 

Posts:50
Registered: 12/9/97
Re: split text file - low overhead solution  
Jun 22, 2004 1:17 PM (reply 3 of 10)



 
OK, 1 very low-footprint array is kept in memory. ;-) 5 filehandles is much lower footprint than all of the file data!

Cheers,
Alexandra
 

Posts:121
Registered: 12/16/03
Re: split text file - low overhead solution  
Jun 22, 2004 1:19 PM (reply 4 of 10)



 
alexxandra's solution is far better than mine!
 

Posts:6
Registered: 6/22/04
Re: split text file - low overhead solution  
Jun 23, 2004 7:40 AM (reply 5 of 10)



 
Thanks alexxandra. I gave you 2 Duke Dollars.

This is what I've done. I thought I had it but I get 1 error. I marked with an arrow where the problem is. I've been messing with it for about 30 minutes and I can't seem to get it to work.

try {
    BufferedReader in = new BufferedReader(new FileReader("c:/test/mac.txt"));

    final int numOutputFiles = 5;
    int loopCounter = 0;
    Object currentOutputFile = null;
    String s = in.readLine();

    PrintWriter out0 = new PrintWriter(new FileOutputStream("c:/test/Mac0.txt"));
    PrintWriter out1 = new PrintWriter(new FileOutputStream("c:/test/Mac1.txt"));
    PrintWriter out2 = new PrintWriter(new FileOutputStream("c:/test/Mac2.txt"));
    PrintWriter out3 = new PrintWriter(new FileOutputStream("c:/test/Mac3.txt"));
    PrintWriter out4 = new PrintWriter(new FileOutputStream("c:/test/Mac4.txt"));

    Object[] fileHandles = {out0, out1, out2, out3, out4};

    while (s != null)
    {
        loopCounter++;

        // wrap around only after the last file has had its turn
        if (loopCounter > numOutputFiles)
            loopCounter = 1;

        currentOutputFile = fileHandles[loopCounter-1];
->      currentOutputFile.write(s); <-

        // read the next line, otherwise the loop never ends
        s = in.readLine();
    }

    in.close();
    out0.close();
    out1.close();
    out2.close();
    out3.close();
    out4.close();

}
catch (Exception ex) { ex.printStackTrace(); }

I get an error saying that the "write" method isn't found in the java.lang.Object class. I don't know what else to use or how to go about fixing this.

 

Posts:945
Registered: 3/13/02
Re: split text file - low overhead solution  
Jun 23, 2004 7:51 AM (reply 6 of 10)



 
Object[] fileHandles = {out0, out1, out2, out3, out4}; // no
// make it
PrintWriter[] fileHandles = {out0, out1, out2, out3, out4}; // yes
...
while (s != null) {
    ...
    fileHandles[loopCounter-1].printLine(s);
}
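
To spell out why the declared type of the array matters: the compiler only looks at the declared element type, so anything pulled out of an Object[] offers only Object's methods unless it is cast. Here is a small in-memory sketch of that point; the class name TypedHandles and the use of StringWriter in place of real files are assumptions for the example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo of how the declared array type affects which methods compile.
class TypedHandles {
    static String demo() {
        StringWriter first = new StringWriter();
        StringWriter second = new StringWriter();
        PrintWriter out0 = new PrintWriter(first);
        PrintWriter out1 = new PrintWriter(second);

        // Declared as Object[]: untyped[0].println(...) would not compile,
        // because Object has no such method. A cast is needed.
        Object[] untyped = {out0, out1};
        ((PrintWriter) untyped[0]).println("via cast");

        // Declared as PrintWriter[]: the writer's methods are available directly.
        PrintWriter[] typed = {out0, out1};
        typed[1].println("via typed array");

        out0.flush();
        out1.flush();
        return first.toString() + second.toString();
    }
}
```

Declaring the array with the specific type is usually the tidier of the two, since it avoids a cast on every use.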
 

Posts:6
Registered: 6/22/04
Re: split text file - low overhead solution  
Jun 23, 2004 8:58 AM (reply 7 of 10)



 
^ I changed everything as you told me to but I still get an error.

cannot resolve symbol method printLine (java.lang.String)

That's the only error that comes up. Any suggestions on how to fix this?
 

Posts:945
Registered: 3/13/02
Re: split text file - low overhead solution  
Jun 23, 2004 10:15 AM (reply 8 of 10)



 
I'm sorry, it's
println(s);
not
printLine(s)
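
A side note on why println rather than write is the one to use here: BufferedReader.readLine() returns each line without its line terminator, and PrintWriter.write(String) does not add one back, so the output lines would run together. A small in-memory sketch of the difference follows; the class name WriteVsPrintln and its two methods are made up for the example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo comparing PrintWriter.write(String) with println(String).
class WriteVsPrintln {
    // Writes each line with write(): no line terminator is added.
    static String viaWrite(String... lines) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        for (String s : lines) {
            out.write(s);
        }
        out.flush();
        return buffer.toString();
    }

    // Writes each line with println(): the platform line separator follows each one.
    static String viaPrintln(String... lines) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        for (String s : lines) {
            out.println(s);
        }
        out.flush();
        return buffer.toString();
    }
}
```

With the inputs "a" and "b", viaWrite gives back "ab", while viaPrintln gives back each string followed by the platform's line separator.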
 

Posts:945
Registered: 3/13/02
Re: split text file - low overhead solution  
Jun 23, 2004 10:31 AM (reply 9 of 10)



 
You'd be well served to look at the APIs for direction. They are quite helpful.
 

Posts:6
Registered: 6/22/04
Re: split text file - low overhead solution  
Jun 23, 2004 10:37 AM (reply 10 of 10)



 
Thanks, everyone. It works great. You have all saved me at least a week of headache.
 