Would a regex like this work for these lines of text?

Regex:

String regexp = "([0-9.]{1,15})[ \t]*([0-9]{1,15})[ \t]*([0-9.]{1,15})[ \t]*(\"(.*?)\"\\s+\\((\\d{4})\\)\\s+\\{(.*?)\\})";

Text:

1000000103      50   4.5  #1 Single (2006)
2...1.2.12       8   2.7  $1,000,000 Chance of a Lifetime (1986)
11..2.2..2       8   5.0  $100 Taxi Ride (2001)
....13.311       9   7.1  $100,000 Name That Tune (1984)
3..21...22      10   4.6  $2 Bill (2002)
30010....3      18   2.7  $25 Million Dollar Hoax (2004)
2000010002     111   5.6  $40 a Day (2002)
2000000..4      26   1.6  $5 Cover (2009)
.0..2.0122      15   7.8  $9.99 (2003)
..2...1113       8   7.5  $weepstake$ (1979)
0000000125    3238   8.7   Allo  Allo! (1982)
1....22.12       8   6.5   Allo  Allo! (1982) {A Barrel Full of Airmen (#7.7)

I'm trying to use Java and MySQL together. I'm learning it for a project that I'm planning. I want the desired output to be like this:

distribution = first column
votes = second column
rank = third column
title = fourth column

The first three work fine. I have trouble with the fourth one.

No, there are supposed to be curly brackets. What I pasted above is just the first few entries. I'll paste a few more, which may make it easier to see what I'm trying to show you. Here they are:

0...001122      16   7.8  "'Allo 'Allo!" (1982) {Gruber Does Some Mincing (#3.2)}
100..01103      21   7.4  "'Allo 'Allo!" (1982) {Hans Goes Over the Top (#4.1)}
....022100      11   6.9  "'Allo 'Allo!" (1982) {Hello Hans (#7.4)}
0....03022      21   8.4  "'Allo 'Allo!" (1982) {Herr Flick's Revenge (#2.6)}
......8..1       6   7.0  "'Allo 'Allo!" (1982) {Hitler's Last Heil (#8.3)}
.....442..       5   6.5  "'Allo 'Allo!" (1982) {Intelligence Officers (#6.5)}
....1123.2       9   6.9  "'Allo 'Allo!" (1982) {It's Raining Italians (#6.2)}
....1.33.3      10   7.8  "'Allo 'Allo!" (1982) {Leclerc Against the Wall (#5.18)}
....22211.       8   6.4  "'Allo 'Allo!" (1982) {Lines of Communication (#7.5)}

The code I'm using:

  stmt.executeUpdate("CREATE TABLE mytable(distribution char(20)," +
      "votes integer," + "rank float," + "title char(250));");
  String regexp ="([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)";
  Pattern pattern = Pattern.compile(regexp);
  String line;
  String data= "";
  while ((line = bf.readLine()) != null) {
    data = line.replaceAll("'", " ");
    String data2 = data.replaceAll("\"", "");
    //System.out.println(data2);
    Matcher matcher = pattern.matcher(data2);
    if (matcher.find()) {
        String distribution = matcher.group(1);
        String votes = matcher.group(2);
        String rank = matcher.group(3);
        String title = matcher.group(4);
        //System.out.println(distribution + " " + votes + " " + rank + " " + title);
        String todo = ("INSERT into mytable " +
            "(Distribution, Votes, Rank, Title) "+
            "values ('"+distribution+"', '"+votes+"', '"+rank+"', '"+title+"')");
        stmt = con.createStatement();
        int r = stmt.executeUpdate(todo);
    }
  }

/Allo Allo! \(1982\) \{A Barrel Full of Airmen \(\#7\.7\)\}/

Can you use split instead and just have it split on the tabs? Or get the opencsv library and use it.

Perhaps something like

....

String[] temp;
String the_line;
BufferedReader in = new BufferedReader(new FileReader("file.txt")); 

while ((the_line = in.readLine()) != null)
{
    temp = the_line.split("\t");
    ....
}

....

Remember the #1 rule of programming: keep it simple! Why do you really need a regex for the whole thing?

Seems to me that you have a nicely defined tabular format... is it in tsv?

If not, you could read line by line, split based on the spaces for the first 3 columns, then only your last column would need a regexp to parse.
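A minimal sketch of that idea, assuming the columns are separated by runs of spaces or tabs and that only the title may itself contain spaces. The class and method names are made up for illustration. Passing a limit of 4 to split keeps everything after the third separator together as the title:

```java
class SplitColumns {
    // Splits a line into at most four fields: distribution, votes, rank, title.
    // The limit of 4 stops splitting after the third run of whitespace,
    // so spaces inside the title are left alone.
    static String[] splitLine(String line) {
        return line.trim().split("\\s+", 4);
    }
}
```

For the first 'Allo 'Allo! line from the question, splitLine returns four fields, the last one being the complete title text with its quotes and curly brackets still in it. That last field is the only part that might still need a regex.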

Try this

        BufferedReader reader = new BufferedReader(new FileReader("yourFile"));

        Pattern p = Pattern.compile("([0-9\\.]+)[\\s]+([0-9]+)[\\s]+([0-9]\\.[0-9])[\\s]+([^\\s].*$)");

        String line;
        while( (line = reader.readLine()) != null ) {
            Matcher m = p.matcher(line);
            if ( m.matches() ) {
                 System.out.println(m.group(1));
                 System.out.println(m.group(2));
                 System.out.println(m.group(3));
                 System.out.println(m.group(4));
            }

        }

This assumes the third group is always a single digit, a dot, and then a single digit.
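If a rating with two digits before the dot can occur (an assumption, none of the sample lines shows one), the third group would need [0-9]+ in front of the dot. A small sketch comparing the two. The class name, the method names and the 10.0 line used in the note below are invented for illustration:

```java
import java.util.regex.Pattern;

class RankPattern {
    // The pattern from the answer: exactly one digit before the dot in the third group.
    static final Pattern STRICT = Pattern.compile(
            "([0-9\\.]+)[\\s]+([0-9]+)[\\s]+([0-9]\\.[0-9])[\\s]+([^\\s].*$)");

    // Variant (assumption): one or more digits before the dot in the third group.
    static final Pattern RELAXED = Pattern.compile(
            "([0-9\\.]+)[\\s]+([0-9]+)[\\s]+([0-9]+\\.[0-9])[\\s]+([^\\s].*$)");

    static boolean strict(String line) {
        return STRICT.matcher(line).matches();
    }

    static boolean relaxed(String line) {
        return RELAXED.matcher(line).matches();
    }
}
```

Both patterns accept the sample lines from the question. A made-up line such as "0000000125    3238  10.0  Example Title (1982)" is accepted only by the relaxed one.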

This is a much simpler regex to do what you want to do

([\d\.]*)\s*([\d\.]*)\s*([\d\.]*)\s*(.*)

If you also need to cater for whitespace at the end of the line, then add \s*:

([\d\.]*)\s*([\d\.]*)\s*([\d\.]*)\s*(.*)\s*

I just corrected a small mistake of using \S instead of [\d.]
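For reference, a sketch of how the first of these could be written as a Java string literal, where every backslash has to be doubled. The class and method names are only illustrative. One thing to keep in mind: because every part uses *, the pattern also matches lines that contain no numbers at all, leaving the first three groups empty.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleRegex {
    // The regex above as a Java string literal: each backslash is doubled.
    static final Pattern SIMPLE =
            Pattern.compile("([\\d\\.]*)\\s*([\\d\\.]*)\\s*([\\d\\.]*)\\s*(.*)");

    // Returns the four captured groups, or null if the line does not match.
    static String[] groups(String line) {
        Matcher m = SIMPLE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }
}
```

So it is worth checking that the first three groups are non-empty before inserting anything into the database.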

Maybe: [a-zA-Z ]+\!\(\d{4}\) \{[a-zA-Z0-9 \(\)\#\.]+\}

Not sure what you're trying to accomplish so this is a kinda guess...

For better help you have to give better details: some more example lines, what kind of data this is, and whether you just want a match or want specific capture groups.

No it would not.

[ \t] would have to become [ \t]+ or \s+: in the sample input your numbers are right-aligned using spaces (in addition to tabs, if any), so there can be more than one whitespace character between columns.
Backslashes must be escaped (doubled) inside Java string literals.
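A quick sketch of the first point, using only the first two columns to keep it short. The class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class WhitespaceDemo {
    // Exactly one space or tab between the two columns.
    static final Pattern SINGLE = Pattern.compile("([0-9.]+)[ \\t]([0-9]+)");

    // One or more spaces or tabs between the two columns.
    static final Pattern REPEATED = Pattern.compile("([0-9.]+)[ \\t]+([0-9]+)");

    static boolean matchesSingle(String s) {
        return SINGLE.matcher(s).matches();
    }

    static boolean matchesRepeated(String s) {
        return REPEATED.matcher(s).matches();
    }
}
```

With the input "1000000103      50" (six spaces of padding, as in the sample), matchesSingle returns false and matchesRepeated returns true.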

Given that you desire the title result for "'Allo 'Allo" to be Title = Allo Allo! (1982) {Lines of Communication (#7.5)}, try:

pattern = "([0-9\\.]+)[ \\t]+([0-9]+)[ \\t]+([0-9\\.]+)[ \\t]+(.*?[ \\t]+\\([0-9]{4}\\).*)";

or (simplified like Fadrian suggested):

pattern = "([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)";

Read more about Backslashes, escapes, and quoting in the section with that name of the Pattern javadoc page.
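To check what the simplified pattern captures, something along these lines could be used. This is only a sketch: the class and method names are invented, and the quote characters are deliberately left in the title so the raw fourth group is visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RatingsLine {
    // The simplified pattern from above, unchanged.
    static final Pattern LINE = Pattern.compile(
            "([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)");

    // Returns { distribution, votes, rank, title }, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }
}
```

Both kinds of line in the question match: for "1000000103      50   4.5  #1 Single (2006)" the fourth group is "#1 Single (2006)", and for the episode lines it runs through the closing curly bracket. A line without a four-digit year in parentheses gives null.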

Don't use regex to parse text. Regex is intended to match patterns in text, not to parse text into parts/components.

If the text file example in your question is an actual and unchanged example, then the following basic kickoff example of a "parser" should just work (as a bonus, it also instantly executes the needed JDBC code). I've copypasted your data unchanged into c:\test.txt.

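import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public static void main(String... args) throws Exception {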
    final String SQL = "INSERT INTO movie (distribution, votes, rank, title) VALUES (?, ?, ?, ?)";
    Connection connection = null;
    PreparedStatement statement = null;
    BufferedReader reader = null;        

    try {
        connection = database.getConnection();
        statement = connection.prepareStatement(SQL);
        reader = new BufferedReader(new InputStreamReader(new FileInputStream("/test.txt")));

        // Loop through file.
        for (String line; (line = reader.readLine()) != null;) {
            if (line.isEmpty()) continue; // I am not sure whether those odd empty lines belong in your file; if not, this check can be removed.

            // Gather data from lines.
            String distribution = line.substring(0, 10);
            int votes = Integer.parseInt(line.substring(12, 18).trim());
            double rank = Double.parseDouble(line.substring(20, 24).trim());
            String title = line.substring(26).trim().replace("\"", ""); // You also want to get rid of those double quotes, huh? I am however not sure why, maybe you initially had problems with it in your non-prepared SQL string...

            // Just to show what you've gathered.
            System.out.printf("%s, %5d, %.1f, %s%n", distribution, votes, rank, title);

            // Now add batch to statement.
            statement.setString(1, distribution);
            statement.setInt(2, votes);
            statement.setDouble(3, rank);
            statement.setString(4, title);
            statement.addBatch();
        }

        // Execute batch insert!
        statement.executeBatch();
    } finally {
        // Gently close expensive resources, you don't want to leak them!
        if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
        if (statement != null) try { statement.close(); } catch (SQLException logOrIgnore) {}
        if (connection != null) try { connection.close(); } catch (SQLException logOrIgnore) {}
    }
}

See, it just works. No need for overcomplicated regex.

Source: https://stackoverflow.com/questions/2360418/would-a-regex-like-this-work-for-these-lines-of-text
