Question
Regex:
String regexp = "([0-9.]{1,15})[ \t]*([0-9]{1,15})[ \t]*([0-9.]{1,15})[ \t]*(\"(.*?)\"\\s+\\((\\d{4})\\)\\s+\\{(.*?)\\})";
Text:
1000000103 50 4.5 #1 Single (2006)
2...1.2.12 8 2.7 $1,000,000 Chance of a Lifetime (1986)
11..2.2..2 8 5.0 $100 Taxi Ride (2001)
....13.311 9 7.1 $100,000 Name That Tune (1984)
3..21...22 10 4.6 $2 Bill (2002)
30010....3 18 2.7 $25 Million Dollar Hoax (2004)
2000010002 111 5.6 $40 a Day (2002)
2000000..4 26 1.6 $5 Cover (2009)
.0..2.0122 15 7.8 $9.99 (2003)
..2...1113 8 7.5 $weepstake$ (1979)
0000000125 3238 8.7 Allo Allo! (1982)
1....22.12 8 6.5 Allo Allo! (1982) {A Barrel Full of Airmen (#7.7)
I'm trying to use Java and MySQL together. I'm learning it for a project that I'm planning. I want the desired output to be like this:
distribution = first column
rank = second column
votes = third column
title = fourth column
The first three work fine. I have trouble with the fourth one.
No, there are supposed to be curly brackets. Those were only the first few entries; I'll paste a few more, which may make it easier to see what I'm trying to show you. Here they are:
0...001122 16 7.8 "'Allo 'Allo!" (1982) {Gruber Does Some Mincing (#3.2)}
100..01103 21 7.4 "'Allo 'Allo!" (1982) {Hans Goes Over the Top (#4.1)}
....022100 11 6.9 "'Allo 'Allo!" (1982) {Hello Hans (#7.4)}
0....03022 21 8.4 "'Allo 'Allo!" (1982) {Herr Flick's Revenge (#2.6)}
......8..1 6 7.0 "'Allo 'Allo!" (1982) {Hitler's Last Heil (#8.3)}
.....442.. 5 6.5 "'Allo 'Allo!" (1982) {Intelligence Officers (#6.5)}
....1123.2 9 6.9 "'Allo 'Allo!" (1982) {It's Raining Italians (#6.2)}
....1.33.3 10 7.8 "'Allo 'Allo!" (1982) {Leclerc Against the Wall (#5.18)}
....22211. 8 6.4 "'Allo 'Allo!" (1982) {Lines of Communication (#7.5)}
The code I'm using:
stmt.executeUpdate("CREATE TABLE mytable(distribution char(20)," +
        "votes integer," + "rank float," + "title char(250));");
String regexp = "([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)";
Pattern pattern = Pattern.compile(regexp);
String line;
String data = "";
while ((line = bf.readLine()) != null) {
    data = line.replaceAll("'", " ");
    String data2 = data.replaceAll("\"", "");
    //System.out.println(data2);
    Matcher matcher = pattern.matcher(data2);
    if (matcher.find()) {
        String distribution = matcher.group(1);
        String votes = matcher.group(2);
        String rank = matcher.group(3);
        String title = matcher.group(4);
        //System.out.println(distribution + " " + votes + " " + rank + " " + title);
        String todo = ("INSERT into mytable " +
                "(Distribution, Votes, Rank, Title) " +
                "values ('" + distribution + "', '" + votes + "', '" + rank + "', '" + title + "')");
        stmt = con.createStatement();
        int r = stmt.executeUpdate(todo);
    }
}
Answer 1:
/Allo Allo! \(1982\) \{A Barrel Full of Airmen \(\#7\.7\)\}/
Answer 2:
Can you use split instead and just have it split on the tabs? Or get the opencsv library and use it.
Perhaps something like
....
String[] temp;
String the_line;
BufferedReader in = new BufferedReader(new FileReader("file.txt"));
while ((the_line = in.readLine()) != null)
{
    temp = the_line.split("\t");
    ....
}
....
Answer 3:
Remember the #1 rule of programming: keep it simple! Why do you really need a regex for the whole thing?
It seems to me that you have a nicely defined tabular format. Is it TSV?
If not, you could read line by line, split based on the spaces for the first 3 columns, then only your last column would need a regexp to parse.
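A minimal sketch of that idea, assuming the columns are separated by runs of spaces or tabs and that everything after the third column is the title. The names `ColumnSplitter` and `splitColumns` are made up for illustration; the sample line is taken from the question.

```java
import java.util.Arrays;

final class ColumnSplitter {
    // Splits one line into at most 4 fields: distribution, votes, rank, title.
    // The limit of 4 stops splitting after the third separator, so the spaces
    // inside the title are left untouched.
    static String[] splitColumns(String line) {
        return line.trim().split("\\s+", 4);
    }

    public static void main(String[] args) {
        String line = "0...001122 16 7.8 \"'Allo 'Allo!\" (1982) {Gruber Does Some Mincing (#3.2)}";
        String[] fields = splitColumns(line);
        // fields[3] still holds the whole title; a smaller regex could be
        // applied to it alone if the year or episode is needed separately.
        System.out.println(Arrays.toString(fields));
    }
}
```

The limit argument of `String.split` is what makes this work without any regex for the title: only the first three gaps are treated as separators.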
Answer 4:
Try this:
BufferedReader reader = new BufferedReader(new FileReader("yourFile"));
Pattern p = Pattern.compile("([0-9\\.]+)[\\s]+([0-9]+)[\\s]+([0-9]\\.[0-9])[\\s]+([^\\s].*$)");
String line;
while ((line = reader.readLine()) != null) {
    Matcher m = p.matcher(line);
    if (m.matches()) {
        System.out.println(m.group(1));
        System.out.println(m.group(2));
        System.out.println(m.group(3));
        System.out.println(m.group(4));
    }
}
This assumes the third group is exactly one digit, a period, and then exactly one more digit.
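If that assumption turns out to be too strict (for example, a value such as 10.0 has two digits before the period), the third group could be loosened. A sketch only: `RankPattern` and `parse` are hypothetical names, and the line containing 10.0 is an invented test input, not data from the question.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RankPattern {
    // Same shape as the pattern above, except that the third group accepts
    // one or more digits before the period instead of exactly one.
    static final Pattern LINE = Pattern.compile(
            "([0-9.]+)\\s+([0-9]+)\\s+([0-9]+\\.[0-9])\\s+(\\S.*)");

    // Returns the four captured columns, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // Invented input with a two-digit value in the third column.
        String[] cols = parse("....1.33.3 10 10.0 Example Title (1982)");
        System.out.println(Arrays.toString(cols));
    }
}
```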
Answer 5:
This is a much simpler regex that does what you want:
([\d\.]*)\s*([\d\.]*)\s*([\d\.]*)\s*(.*)
If you also need to cater for whitespace at the end of the line, then add \s*:
([\d\.]*)\s*([\d\.]*)\s*([\d\.]*)\s*(.*)\s*
I just corrected a small mistake of using \S instead of [\d.]
Answer 6:
Maybe:
[a-zA-Z ]+\!\(\d{4}\) \{[a-zA-Z0-9 \(\)\#\.]+\}
I'm not sure what you're trying to accomplish, so this is kind of a guess.
For better help you have to give better details: some more example lines, what kind of data this is, and whether you just want a match or want specific capture groups.
Answer 7:
No, it would not:
- [ \t] would have to become [ \t]+ or \s+; your numbers are right-aligned using spaces (in addition to tabs, if any) in the sample input
- backslashes must be double-escaped inside string literals
Given that you want the title result for "'Allo 'Allo" to be Title = Allo Allo! (1982) {Lines of Communication (#7.5)}, try:
pattern = "([0-9\\.]+)[ \\t]+([0-9]+)[ \\t]+([0-9\\.]+)[ \\t]+(.*?[ \\t]+\\([0-9]{4}\\).*)";
or (simplified like Fadrian suggested):
pattern = "([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)";
Read more in the "Backslashes, escapes, and quoting" section of the Pattern javadoc page.
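To see what the simplified pattern captures, here is a small self-contained sketch run against one of the sample lines from the question. The names `TitlePatternDemo` and `extract` are invented for this illustration; only the pattern string comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TitlePatternDemo {
    // The simplified pattern from above; each backslash is doubled because
    // it sits inside a Java string literal.
    static final Pattern LINE = Pattern.compile(
            "([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)");

    // Returns {distribution, votes, rank, title}, or null if nothing matches.
    static String[] extract(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String line = "....22211. 8 6.4 \"'Allo 'Allo!\" (1982) {Lines of Communication (#7.5)}";
        String[] cols = extract(line);
        for (String col : cols) {
            System.out.println(col);
        }
    }
}
```

The fourth group keeps the quotes and apostrophes; stripping them before matching, as the question's code does with `replaceAll`, would yield the title without them.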
Answer 8:
Don't use regex to parse text. Regex is intended to match patterns in text, not to parse text into parts/components.
If the text file example in your question is an actual and unchanged example, then the following basic kickoff example of a "parser" should just work (as a bonus, it also instantly executes the needed JDBC code). I've copy-pasted your data unchanged into c:\test.txt.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public static void main(String... args) throws Exception {
    final String SQL = "INSERT INTO movie (distribution, votes, rank, title) VALUES (?, ?, ?, ?)";
    Connection connection = null;
    PreparedStatement statement = null;
    BufferedReader reader = null;
    try {
        // `database` is a connection helper that is not defined in this snippet.
        connection = database.getConnection();
        statement = connection.prepareStatement(SQL);
        reader = new BufferedReader(new InputStreamReader(new FileInputStream("/test.txt")));
        // Loop through file.
        for (String line; (line = reader.readLine()) != null;) {
            // I am not sure if those odd empty lines belong in your file; if not, this check can be removed.
            if (line.isEmpty()) continue;
            // Gather data from fixed column positions in the line.
            String distribution = line.substring(0, 10);
            int votes = Integer.parseInt(line.substring(12, 18).trim());
            double rank = Double.parseDouble(line.substring(20, 24).trim());
            // You also want to get rid of those double quotes, huh? I am however not sure why;
            // maybe you initially had problems with them in your non-prepared SQL string...
            String title = line.substring(26).trim().replace("\"", "");
            // Just to show what you've gathered.
            System.out.printf("%s, %5d, %.1f, %s%n", distribution, votes, rank, title);
            // Now add batch to statement.
            statement.setString(1, distribution);
            statement.setInt(2, votes);
            statement.setDouble(3, rank);
            statement.setString(4, title);
            statement.addBatch();
        }
        // Execute batch insert!
        statement.executeBatch();
    } finally {
        // Gently close expensive resources, you don't want to leak them!
        if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
        if (statement != null) try { statement.close(); } catch (SQLException logOrIgnore) {}
        if (connection != null) try { connection.close(); } catch (SQLException logOrIgnore) {}
    }
}
See, it just works. No need for overcomplicated regex.
Source: https://stackoverflow.com/questions/2360418/would-a-regex-like-this-work-for-these-lines-of-text