Would a regex like this work for these lines of text?

Submitted by 霸气de小男生 on 2019-12-09 06:54:25
twolfe18
/Allo Allo! \(1982\) \{A Barrel Full of Airmen \(\#7\.7\)\}/

Can you use split instead and just have it split on the tabs? Or get the opencsv library and use it.

Perhaps something like

....

String[] temp;
String the_line;
BufferedReader in = new BufferedReader(new FileReader("file.txt")); 

while ((the_line = in.readLine()) != null)
{
    temp = the_line.split("\t");
    ....
}

....

Remember the #1 rule of programming: keep it simple! Why do you really need a regex for the whole thing?

Seems to me that you have a nicely defined tabular format... is it tab-separated (TSV)?

If not, you could read line by line, split based on the spaces for the first 3 columns, then only your last column would need a regexp to parse.
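
A minimal sketch of that idea, assuming the columns are separated by runs of spaces and/or tabs. The class name, method name, and the sample line (including its numbers) are made up for illustration. Passing a limit of 4 to split keeps the whole title, spaces included, in the last element, so any regexp work can then be confined to that one field.

```java
import java.util.Arrays;

class ColumnSplitSketch {
    // Splits one line into at most four fields: three whitespace-separated
    // columns, then the rest of the line (the title) left untouched.
    static String[] splitColumns(String line) {
        // trim() drops leading padding so the first field is not empty.
        return line.trim().split("\\s+", 4);
    }

    public static void main(String[] args) {
        // Hypothetical sample line; the numeric values are invented.
        String sample = "  0000001222      15   7.9  "
                + "\"'Allo 'Allo!\" (1982) {Lines of Communication (#7.5)}";
        System.out.println(Arrays.toString(splitColumns(sample)));
    }
}
```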

Try this

        BufferedReader reader = new BufferedReader(new FileReader("yourFile"));

        Pattern p = Pattern.compile("([0-9\\.]+)[\\s]+([0-9]+)[\\s]+([0-9]\\.[0-9])[\\s]+([^\\s].*$)");

        String line;
        while( (line = reader.readLine()) != null ) {
            Matcher m = p.matcher(line);
            if ( m.matches() ) {
                 System.out.println(m.group(1));
                 System.out.println(m.group(2));
                 System.out.println(m.group(3));
                 System.out.println(m.group(4));
            }

        }

This assumes the third group is always exactly one digit, a period, and then one more digit.

This is a much simpler regex to do what you want to do

([\d\.]*)\s*([\d\.]*)\s*([\d\.]*)\s*(.*)

If you need to cater for whitespace at the end of the line as well, then add \s*

([\d\.]*)\s*([\d\.]*)\s*([\d\.]*)\s*(.*)\s*

I just corrected a small mistake of using \S instead of [\d.]
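
One thing to be aware of when using this in Java: the backslashes have to be doubled in the string literal, and because every quantifier here is *, the pattern accepts any line at all, with empty groups where no digits were found. A small sketch to show that behaviour; the class name, method name, and both sample lines are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LenientRegexSketch {
    // Raw regex: ([\d\.]*)\s*([\d\.]*)\s*([\d\.]*)\s*(.*)
    static final Pattern LENIENT = Pattern.compile(
            "([\\d\\.]*)\\s*([\\d\\.]*)\\s*([\\d\\.]*)\\s*(.*)");

    // Returns the four captured groups, or null if the line does not match.
    static String[] groups(String line) {
        Matcher m = LENIENT.matcher(line);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // Hypothetical lines, invented for this example.
        String[] data = groups("0000001222 15 7.9 Some Show (1982)");
        String[] junk = groups("not a data line");
        System.out.println(String.join(" | ", data));
        // The non-data line still matches: the first three groups are empty.
        System.out.println(String.join(" | ", junk));
    }
}
```

If lines that are not data rows should be rejected, changing the * quantifiers on the first three groups and separators to + is one way to do it.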

Maybe: [a-zA-Z ]+\! \(\d{4}\) \{[a-zA-Z0-9 \(\)\#\.]+\}

Not sure what you're trying to accomplish so this is a kinda guess...

For better help you need to give more details: some more example lines, what kind of data this is, and whether you just want a match or want specific capture groups.

No it would not.

  1. [ \t] would have to become [ \t]+ or \s+; your numbers are right-aligned using spaces (in addition to tabs, if any) in the sample input
  2. backslashes must be doubled (escaped once more) inside Java string literals

Given that you desire the title result for "'Allo 'Allo" to be Title = Allo Allo! (1982) {Lines of Communication (#7.5)} try:

pattern = "([0-9\\.]+)[ \\t]+([0-9]+)[ \\t]+([0-9\\.]+)[ \\t]+(.*?[ \\t]+\\([0-9]{4}\\).*)";

or (simplified like Fadrian suggested):

pattern = "([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)";

Read more in the "Backslashes, escapes, and quoting" section of the Pattern javadoc page.
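
To make the escaping point concrete, here is a small sketch that compiles the simplified pattern and pulls out the four groups. The class name, method name, and the sample line (including its numbers) are assumptions for illustration only. The raw regex is shown in a comment next to the Java literal so the doubled backslashes are easy to compare.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RatingLineRegexSketch {
    // Raw regex: ([\d\.]+)\s+(\d+)\s+([\d\.]+)\s+(.*?\s+\(\d{4}\).*)
    // In the Java string literal below, every backslash is written twice.
    static final Pattern LINE = Pattern.compile(
            "([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)");

    // Returns the four captured columns, or null when the line does not match.
    static String[] parse(String line) {
        // trim() because matches() must consume the whole input,
        // and the pattern does not allow leading whitespace.
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // Hypothetical sample line; the numeric values are invented.
        String sample = "  0000001222      15   7.9  "
                + "\"'Allo 'Allo!\" (1982) {Lines of Communication (#7.5)}";
        String[] cols = parse(sample);
        if (cols != null) {
            for (String col : cols) System.out.println(col);
        }
    }
}
```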

Don't use regex to parse text. Regex is intended to match patterns in text, not to parse text into parts/components.

If the text file example in your question is an actual and unchanged example, then the following basic kickoff example of a "parser" should just work (as a bonus, it also immediately executes the needed JDBC code). I've copy-pasted your data unchanged into c:\test.txt.

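import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public static void main(String... args) throws Exception {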
    final String SQL = "INSERT INTO movie (distribution, votes, rank, title) VALUES (?, ?, ?, ?)";
    Connection connection = null;
    PreparedStatement statement = null;
    BufferedReader reader = null;        

    try {
        connection = database.getConnection(); // "database" stands for your own connection helper or DataSource.
        statement = connection.prepareStatement(SQL);
        reader = new BufferedReader(new InputStreamReader(new FileInputStream("/test.txt")));

        // Loop through file.
        for (String line; (line = reader.readLine()) != null;) {
            if (line.isEmpty()) continue; // I am not sure whether those odd empty lines belong in your file; if not, this check can be removed.

            // Gather data from lines.
            String distribution = line.substring(0, 10);
            int votes = Integer.parseInt(line.substring(12, 18).trim());
            double rank = Double.parseDouble(line.substring(20, 24).trim());
            String title = line.substring(26).trim().replace("\"", ""); // You also want to get rid of those double quotes, huh? I am however not sure why, maybe you initially had problems with it in your non-prepared SQL string...

            // Just to show what you've gathered.
            System.out.printf("%s, %5d, %.1f, %s%n", distribution, votes, rank, title);

            // Now add batch to statement.
            statement.setString(1, distribution);
            statement.setInt(2, votes);
            statement.setDouble(3, rank);
            statement.setString(4, title);
            statement.addBatch();
        }

        // Execute batch insert!
        statement.executeBatch();
    } finally {
        // Gently close expensive resources, you don't want to leak them!
        if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
        if (statement != null) try { statement.close(); } catch (SQLException logOrIgnore) {}
        if (connection != null) try { connection.close(); } catch (SQLException logOrIgnore) {}
    }
}

See, it just works. No need for overcomplicated regex.
