What is the best way to validate whether a .txt file is:
In fact a .txt file and not another type of file with only the extension changed.
Since it sounds like you're looking for a general way to check the file's formatting, I'd recommend regular expressions. You can do all sorts of different kinds of matching with regex. I've written a simple example below (regex experts, have mercy on me if I didn't use the perfect expression ;) ). You could put the REGEX and MAX_LINES_TO_READ constants into a properties file and modify that to make it even more general.
You would basically test the first few lines of your ".txt" file (however many lines are needed to establish that the formatting is good), and if all of those lines match, the file would be flagged as "valid". You could also use a separate regular expression for a header line, or several different regular expressions as needed to test the formatting.
This is just an example for you to run with. For one thing, you should implement proper exception handling rather than just catching "Exception".
For testing your regular expressions in Java, http://www.regexplanet.com/simple/index.html works very nicely.
Here's the "ValidateTxtFile" source...
import java.io.BufferedReader;
import java.io.FileReader;

public class ValidateTxtFile {

    // Number of lines to check before deciding the formatting is good
    private static final int MAX_LINES_TO_READ = 5;

    // 15-char field, 5 spaces, 15-char field, 5 spaces,
    // amount like -50.99, 9 spaces, date like 12/28/2011
    private static final String REGEX = ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}";

    public void testFile(String fileName) {
        int lineCounter = 1;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line = br.readLine();
            while ((line != null) && (lineCounter <= MAX_LINES_TO_READ)) {
                // Validate the line is formatted correctly based on regular expressions
                if (line.matches(REGEX)) {
                    System.out.println("Line " + lineCounter + " formatted correctly");
                } else {
                    System.out.println("Invalid format on line " + lineCounter + " (" + line + ")");
                }
                line = br.readLine();
                lineCounter++;
            }
        } catch (Exception ex) {
            // Kept broad for brevity; catch specific exceptions in real code
            System.out.println("Exception occurred: " + ex.toString());
        }
    }

    public static void main(String[] args) {
        ValidateTxtFile vtf = new ValidateTxtFile();
        vtf.testFile("transactions.txt");
    }
}
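If you want to try the properties-file idea mentioned earlier, here is a rough sketch of how the two settings could be loaded with java.util.Properties. The ValidatorConfig class and the key names "regex" and "maxLinesToRead" are made up for illustration, and the sample uses a shortened pattern (just the amount field) to keep the escaping readable.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the two settings; not part of the original class.
class ValidatorConfig {
    final String regex;
    final int maxLinesToRead;

    ValidatorConfig(String regex, int maxLinesToRead) {
        this.regex = regex;
        this.maxLinesToRead = maxLinesToRead;
    }

    // The keys "regex" and "maxLinesToRead" are invented for this sketch.
    static ValidatorConfig load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        String regex = props.getProperty("regex");
        if (regex == null) {
            throw new IllegalArgumentException("missing 'regex' property");
        }
        // Fall back to 5 lines, the value used in the example class.
        int maxLines = Integer.parseInt(props.getProperty("maxLinesToRead", "5"));
        return new ValidatorConfig(regex, maxLines);
    }

    public static void main(String[] args) {
        // Stand-in for a real file; normally you would pass a FileReader.
        // A properties file treats '\' as an escape, so the file itself must
        // contain doubled backslashes (quadrupled here inside a Java literal).
        String sample = "maxLinesToRead=3\n"
                + "regex=[-]\\\\d{2}\\\\.\\\\d{2}\n";
        ValidatorConfig cfg = load(new StringReader(sample));
        System.out.println(cfg.maxLinesToRead);           // 3
        System.out.println(cfg.regex);                    // [-]\d{2}\.\d{2}
        System.out.println("-50.99".matches(cfg.regex));  // true
    }
}
```

The thing to watch is the backslash escaping: written into a properties file, the full expression looks exactly like it does in the Java source, doubled backslashes included.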
Here's what's in "transactions.txt" (each field is padded with spaces to the fixed widths the regex expects)...
Electric Electric Co. -50.99 12/28/2011
Food Food Store -80.31 12/28/2011
Clothes Clothing Store -99.36 12/28/2011
Entertainment Bowling -30.4393 12/28/2011
Restaurant Mcdonalds -10.35 12/28/11
The output when I ran the app was...
Line 1 formatted correctly
Line 2 formatted correctly
Line 3 formatted correctly
Invalid format on line 4 (Entertainment Bowling -30.4393 12/28/2011)
Invalid format on line 5 (Restaurant Mcdonalds -10.35 12/28/11)
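The class above only prints a result per line. If you want a single "valid"/"invalid" flag for the whole file, something along these lines might work. This is only a sketch: TxtFormatCheck, isValid and the row helper are names I made up, and treating an empty file as invalid is my own assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original example.
class TxtFormatCheck {

    // True only if every one of the first maxLines lines matches linePattern.
    static boolean isValid(Reader source, Pattern linePattern, int maxLines) {
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            int checked = 0;
            while (checked < maxLines && (line = br.readLine()) != null) {
                if (!linePattern.matcher(line).matches()) {
                    return false; // stop at the first badly formatted line
                }
                checked++;
            }
            return checked > 0; // assumption: an empty file counts as invalid
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Builds one fixed-width row: two 15-char fields, 5-space gaps,
    // the amount, a 9-space gap, then the date.
    static String row(String category, String payee, String amount, String date) {
        return String.format("%-15s     %-15s     %s         %s",
                category, payee, amount, date);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(
                ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");
        String good = row("Electric", "Electric Co.", "-50.99", "12/28/2011");
        String bad = row("Entertainment", "Bowling", "-30.4393", "12/28/2011");

        // Passing a FileReader instead of a StringReader checks a real file.
        System.out.println(isValid(new StringReader(good + "\n" + good), p, 5)); // true
        System.out.println(isValid(new StringReader(good + "\n" + bad), p, 5));  // false
    }
}
```

Compiling the Pattern once and reusing it also avoids recompiling the expression for every line, which line.matches(REGEX) does.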
EDIT 12/29/2011 about 10:00am
I'm not sure whether performance is a concern here, but as an FYI: I duplicated the entries in "transactions.txt" several times to build a text file of about 1.3 million rows, and I was able to get through the whole file in about 7 seconds on my PC. I changed the System.out calls to show just a grand total at the end of the invalid (524,288) and valid (786,432) formatted entries. "transactions.txt" was about 85 MB in size.
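For reference, a whole-file count like that could be sketched roughly as follows. This is not the exact code I timed; FormatTally and tally are hypothetical names, and the demo uses a shortened pattern (just the amount field) on three made-up lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for counting matches over an entire file.
class FormatTally {

    // Returns {validCount, invalidCount} over every line in the source.
    static long[] tally(Reader source, Pattern linePattern) {
        long valid = 0;
        long invalid = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            // One Matcher is reused for all lines instead of creating one per line.
            Matcher m = linePattern.matcher("");
            String line;
            while ((line = br.readLine()) != null) {
                if (m.reset(line).matches()) {
                    valid++;
                } else {
                    invalid++;
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return new long[] {valid, invalid};
    }

    public static void main(String[] args) {
        // Shortened pattern that checks only the amount format.
        Pattern amount = Pattern.compile("[-]\\d{2}\\.\\d{2}");
        long[] counts = tally(new StringReader("-50.99\n-30.4393\n-10.35\n"), amount);
        System.out.println("valid=" + counts[0] + " invalid=" + counts[1]); // valid=2 invalid=1
    }
}
```

Reading line by line keeps memory use flat no matter how big the file is, since only one line is held at a time.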