.txt file format validation in Java

说谎 2021-01-06 09:45

What is the best way to validate whether a .txt file is:

  • In fact a .txt file and not another type of file with only the extension changed.

  • T

1 answer
  • 2021-01-06 10:37

    Since it sounds like you're looking for a general way to check formatting, I'd recommend regular expressions. Regex can handle all sorts of matching. I've written a simple example below [regex experts out there, have mercy on me if I didn't use the perfect expression ;) ]. You could put the REGEX and MAX_LINES_TO_READ constants into a properties file and adjust them there to make it even more general.

    Basically, you test the first few lines of your ".txt" file, up to a maximum (however many lines it takes to establish that the formatting is good), and if all of those lines match, the file is flagged as "valid". You could also use a separate regular expression for a header line, or several different expressions as needed to test the formatting.

    This is just an example for you to run with. For one thing, you should implement proper exception handling rather than just catching "Exception".

    For testing your regular expressions in Java, http://www.regexplanet.com/simple/index.html works very nicely.

    Here's the "ValidateTxtFile" source...

    import java.io.BufferedReader;
    import java.io.FileReader;

    public class ValidateTxtFile {

        // Number of leading lines to check before judging the format
        private static final int MAX_LINES_TO_READ = 5;

        // Two 15-character fields, a negative amount (-NN.NN) and a date (NN/NN/NNNN),
        // separated by fixed runs of spaces
        private static final String REGEX = ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}";

        public void testFile(String fileName) {

            int lineCounter = 1;

            // try-with-resources closes the reader even if reading fails
            try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {

                String line = br.readLine();

                while ((line != null) && (lineCounter <= MAX_LINES_TO_READ)) {

                    // Validate that the line is formatted correctly based on the regular expression
                    if (line.matches(REGEX)) {
                        System.out.println("Line " + lineCounter + " formatted correctly");
                    }
                    else {
                        System.out.println("Invalid format on line " + lineCounter + " (" + line + ")");
                    }

                    line = br.readLine();
                    lineCounter++;
                }

            } catch (Exception ex) {
                // Kept broad for brevity; catch specific exceptions in real code
                System.out.println("Exception occurred: " + ex.toString());
            }
        }

        public static void main(String[] args) {

            ValidateTxtFile vtf = new ValidateTxtFile();

            vtf.testFile("transactions.txt");
        }
    }
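
    The idea of moving REGEX and MAX_LINES_TO_READ into a properties file could look roughly like the sketch below. The class name ValidatorConfig, the file name "validator.properties" and the keys "regex" and "maxLinesToRead" are all made up for illustration; they are not part of the example above.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical holder for the two settings that the example hard-codes.
final class ValidatorConfig {

    final Pattern linePattern;
    final int maxLinesToRead;

    ValidatorConfig(Pattern linePattern, int maxLinesToRead) {
        this.linePattern = linePattern;
        this.maxLinesToRead = maxLinesToRead;
    }

    // Reads the assumed keys "regex" and "maxLinesToRead" from a properties file.
    static ValidatorConfig load(Path propertiesFile) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(propertiesFile, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        String regex = props.getProperty("regex");
        if (regex == null) {
            throw new IllegalArgumentException("Missing 'regex' in " + propertiesFile);
        }
        // Falls back to 5, the value used in the example, when the key is absent.
        int maxLines = Integer.parseInt(props.getProperty("maxLinesToRead", "5").trim());
        return new ValidatorConfig(Pattern.compile(regex), maxLines);
    }

    public static void main(String[] args) throws IOException {
        // Assumes a file named validator.properties exists in the working directory.
        ValidatorConfig config = load(Paths.get("validator.properties"));
        System.out.println(config.maxLinesToRead + " lines, pattern " + config.linePattern);
    }
}
```

    One thing to watch: Properties.load treats a backslash as an escape character, so inside the properties file the backslashes need doubling just as they do in a Java string literal (for example "\\d{2}" rather than "\d{2}").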
    

    Here's what's in "transactions.txt"...

    Electric            Electric Co.        -50.99         12/28/2011
    Food                Food Store          -80.31         12/28/2011
    Clothes             Clothing Store      -99.36         12/28/2011
    Entertainment       Bowling             -30.4393       12/28/2011
    Restaurant          Mcdonalds           -10.35         12/28/11
    

    The output when I ran the app was...

    Line 1 formatted correctly
    Line 2 formatted correctly
    Line 3 formatted correctly
    Invalid format on line 4 (Entertainment       Bowling             -30.4393       12/28/2011)
    Invalid format on line 5 (Restaurant          Mcdonalds           -10.35         12/28/11)
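
    The listing above prints a verdict per line, whereas the description talks about flagging the whole file as "valid" only when every sampled line matches. A minimal sketch of that variant follows. The names FormatCheck and isValid are hypothetical, and treating an empty file as invalid is an assumption of this sketch, not something the example specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

final class FormatCheck {

    // Same expression as the REGEX constant in the example, compiled once.
    static final Pattern LINE = Pattern.compile(
            ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");

    // Returns true only if each of the first maxLines lines matches the pattern.
    // Stops reading at the first line that does not match.
    static boolean isValid(BufferedReader reader, Pattern pattern, int maxLines) throws IOException {
        String line;
        int checked = 0;
        while (checked < maxLines && (line = reader.readLine()) != null) {
            if (!pattern.matcher(line).matches()) {
                return false;
            }
            checked++;
        }
        // Assumption: a file with no lines at all is treated as invalid.
        return checked > 0;
    }

    public static void main(String[] args) throws IOException {
        // String.format pads the columns to the widths the pattern expects (20, 20, 15).
        String good = String.format("%-20s%-20s%-15s%s", "Electric", "Electric Co.", "-50.99", "12/28/2011");
        String bad = String.format("%-20s%-20s%-15s%s", "Restaurant", "Mcdonalds", "-10.35", "12/28/11");

        System.out.println(isValid(new BufferedReader(new StringReader(good + "\n")), LINE, 5));
        System.out.println(isValid(new BufferedReader(new StringReader(good + "\n" + bad + "\n")), LINE, 5));
    }
}
```

    Run as-is, this should print true for the first sample and false for the second, because the two-digit year fails the four-digit date part of the pattern.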
    


    EDIT 12/29/2011 about 10:00am
    Not sure whether performance is a concern here, but as an FYI: I duplicated the entries in "transactions.txt" several times to build a text file of about 1.3 million rows, and I was able to get through the whole file in about 7 seconds on my PC. I changed the System.out calls to print only grand totals at the end: invalid (524,288) and valid (786,432) formatted entries. "transactions.txt" was about 85 MB in size.
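
    The counting variant described in this EDIT is not shown, so here is a rough sketch of what such a loop might look like. The class name FormatCounter and the count method are hypothetical, and reading the file as UTF-8 is an assumption. The pattern is compiled once up front, since String.matches recompiles the expression on every call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

final class FormatCounter {

    // Same expression as the REGEX constant in the example, compiled once.
    static final Pattern LINE = Pattern.compile(
            ".{15}[ ]{5}.{15}[ ]{5}[-]\\d{2}\\.\\d{2}[ ]{9}\\d{2}/\\d{2}/\\d{4}");

    // Reads every line and returns a two-element array: {valid count, invalid count}.
    static long[] count(BufferedReader reader, Pattern pattern) throws IOException {
        long valid = 0;
        long invalid = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (pattern.matcher(line).matches()) {
                valid++;
            } else {
                invalid++;
            }
        }
        return new long[] {valid, invalid};
    }

    public static void main(String[] args) throws IOException {
        // Uses the first argument as the file name, or "transactions.txt" if none is given.
        Path file = Paths.get(args.length > 0 ? args[0] : "transactions.txt");
        // Assumption: the file is UTF-8 encoded text.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long[] totals = count(reader, LINE);
            System.out.println("Valid: " + totals[0] + ", invalid: " + totals[1]);
        }
    }
}
```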
