Regular Expression to find “lastname, firstname middlename” format

挽巷 2021-01-18 05:22

I am trying to find the format "abc, def g", which is the name format "lastname, firstname middlename". I think the best-suited method is regex but I do not have any idea i

5 answers
  •  渐次进展
    2021-01-18 06:05

    Your sample input is "lastname, firstname middlename". With that, you can use the following regexp to extract lastname, firstname and middlename. It allows multiple whitespace characters between the parts and both upper-case and lower-case letters, and all three parts are mandatory:

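    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String input = "Lastname,   firstname   middlename";
    String regexp = "([A-Za-z]+),\\s+([A-Za-z]+)\\s+([A-Za-z]+)";

    Pattern pattern = Pattern.compile(regexp);
    Matcher matcher = pattern.matcher(input);
    // group() throws IllegalStateException unless find() succeeded, so check its result
    if (matcher.find()) {
        System.out.println("Lastname  : " + matcher.group(1));
        System.out.println("Firstname : " + matcher.group(2));
        System.out.println("Middlename: " + matcher.group(3));
    } else {
        System.out.println("Input does not match: " + input);
    }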
    

    Short summary:

    ([A-Za-z]+)   First capture group - matches one or more letters to extract the last name
    ,\\s+         Capture group is followed by a comma and one or more spaces
    ([A-Za-z]+)   Second capture group - matches one or more letters to extract the first name
    \\s+          Capture group is followed by one or more spaces
    ([A-Za-z]+)   Third capture group - matches one or more letters to extract the middle name
    

    This only works if your names contain only ASCII Latin letters (A-Z, a-z). A name such as "Müller" would not match, so you probably want a more permissive match for the characters:

    String input = "Müller,   firstname  middlename";
    String regexp = "(.+),\\s+(.+)\\s+(.+)";
    

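    This matches any character except line terminators for lastname, firstname and middlename. Because .+ is greedy and also matches whitespace, a captured part can carry stray spaces: for the input above, group 2 is "firstname " with a trailing space, so call trim() on the results.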
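
    A middle ground between [A-Za-z] and .+ is the Unicode letter class \p{L}, which accepts letters such as "ü" but rejects digits, commas and whitespace. The following is only a sketch under that assumption, not part of the original answer: the class name UnicodeNameParser and the method parse are hypothetical, and it uses matches() rather than find() so that the whole input has to fit the format. Names containing hyphens or apostrophes would still be rejected.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class UnicodeNameParser {
    // \p{L} matches any Unicode letter; digits, commas and whitespace are excluded.
    private static final Pattern NAME =
            Pattern.compile("(\\p{L}+),\\s+(\\p{L}+)\\s+(\\p{L}+)");

    // Returns {lastname, firstname, middlename}, or empty if the whole input does not match.
    static Optional<String[]> parse(String input) {
        Matcher m = NAME.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2), m.group(3) });
    }

    public static void main(String[] args) {
        // The u-umlaut is written as a Unicode escape to avoid source-encoding issues.
        parse("M\u00fcller,   firstname  middlename")
                .ifPresent(p -> System.out.println(String.join(" | ", p)));
    }
}
```

    Since \p{L}+ cannot match whitespace, the captured parts need no trim() here.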

    If the whitespace after the comma is optional (only this first occurrence can be optional, otherwise we cannot distinguish between firstname and middlename), then use * instead of + there:

    String input = "Müller,firstname  middlename";
    String regexp = "(.+),\\s*(.+)\\s+(.+)";
    

    As @Elliott mentions, there are other possibilities, such as using String.split() or String.indexOf() with String.substring(). Regular expressions are often more flexible but harder to maintain, especially for complex expressions.
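
    For comparison, the non-regex route could look roughly like the sketch below. It is one possible reading of the split()/indexOf() idea, not code from @Elliott or from this answer; the names NameSplitter and splitName are hypothetical. It cuts at the first comma and then splits the remainder on whitespace, requiring exactly two parts.

```java
import java.util.Optional;

// Hypothetical helper for illustration only.
final class NameSplitter {
    // Returns {lastname, firstname, middlename}, or empty if the input does not have that shape.
    static Optional<String[]> splitName(String input) {
        int comma = input.indexOf(',');
        if (comma < 0) {
            return Optional.empty(); // no comma, so no lastname part
        }
        String last = input.substring(0, comma).trim();
        // Everything after the first comma, split on runs of whitespace.
        String[] rest = input.substring(comma + 1).trim().split("\\s+");
        if (last.isEmpty() || rest.length != 2) {
            return Optional.empty(); // need exactly firstname and middlename
        }
        return Optional.of(new String[] { last, rest[0], rest[1] });
    }

    public static void main(String[] args) {
        splitName("Lastname,   firstname   middlename")
                .ifPresent(p -> System.out.println(String.join(" | ", p)));
    }
}
```

    Unlike the [A-Za-z] pattern, this version does not check which characters the parts contain, so an input such as "a, b,c d" would yield "b,c" as the firstname.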

    In either case, implement unit tests with as many different inputs (including invalid ones) as possible, so that you can verify that your algorithm still works after you modify it.
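
    As a rough illustration of such tests, the sketch below runs a small table of valid and invalid inputs against the first pattern using plain Java instead of a test framework. The class name NameFormatChecks, the method isValidName and the chosen inputs are assumptions made for this example. It uses matches() so that the whole input must fit the format, and List.of requires Java 9 or later.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical table-driven check for illustration only.
final class NameFormatChecks {
    private static final Pattern NAME =
            Pattern.compile("([A-Za-z]+),\\s+([A-Za-z]+)\\s+([A-Za-z]+)");

    // True only if the entire input has the form "lastname, firstname middlename".
    static boolean isValidName(String input) {
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        List<String> valid = List.of("Lastname, firstname middlename", "abc,   def   g");
        List<String> invalid = List.of("", "abc def g", "abc, def", "abc, def g h", "abc, d3f g");

        for (String s : valid) {
            if (!isValidName(s)) throw new AssertionError("should match: " + s);
        }
        for (String s : invalid) {
            if (isValidName(s)) throw new AssertionError("should not match: " + s);
        }
        System.out.println("all checks passed");
    }
}
```

    In a real project the same table would more likely live in a JUnit test, so that a failing input is reported by the build.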
