How do I generate text matching a regular expression from a regular expression? [closed]

Submitted by 别来无恙 on 2019-12-18 10:37:09

Question


Yup, you read that right. I need something that can generate random text from a regular expression: the text should be random, but still be matched by the regular expression. It seems no such thing exists, but I could be wrong.

Just as an example: such a library would take '[ab]*c' as input and generate samples such as:

abc
abbbc
bac

etc.
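To make the requirement concrete, here is a hand-written Java generator for that one pattern only. The class name `AbStarC` and the `maxRepeats` cap are illustrative assumptions; the cap is needed because `*` is unbounded. A general library would have to derive this logic from the regex itself instead of hard-coding it.

```java
import java.util.Random;

// Illustrative only: hard-codes the single pattern [ab]*c (hypothetical class and method names).
final class AbStarC {
    // Zero to maxRepeats characters drawn from {a, b}, followed by the mandatory 'c'.
    static String sample(Random random, int maxRepeats) {
        StringBuilder sb = new StringBuilder();
        int repeats = random.nextInt(maxRepeats + 1); // length of the [ab]* part
        for (int i = 0; i < repeats; i++) {
            sb.append(random.nextBoolean() ? 'a' : 'b');
        }
        return sb.append('c').toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            String s = sample(random, 6);
            System.out.println(s + " matches: " + s.matches("[ab]*c"));
        }
    }
}
```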

Update: I created something myself: Xeger. Check out http://code.google.com/p/xeger/.


Answer 1:


I just created a library for doing this a minute ago. It's hosted here: http://code.google.com/p/xeger/. Carefully read the instructions before using it. (Especially the one referring to downloading another required library.) ;-)

This is the way you use it:

import nl.flotsam.xeger.Xeger;

String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex); // only checked when the JVM runs with -ea



Answer 2:


I am not aware of such a library. If you're interested in writing one yourself, then these are probably the steps you'll need to take:

  1. Write a parser for regular expressions (you may want to start out with a restricted class of regexes).

  2. Use the result to construct an NFA.

  3. (Optional) Convert the NFA to a DFA.

  4. Randomly traverse the resulting automaton from the start state to any accepting state, while storing the characters outputted by every transition.

The result is a word which is accepted by the original regex. For more, see e.g. Converting a Regular Expression into a Deterministic Finite Automaton.
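Steps 1, 2 and 4 can be sketched in a single Java class. This is a minimal sketch under stated assumptions, not a reference implementation: the class name `RegexNfaSampler` is hypothetical, and it understands only literals, `\` escapes (which always make the next character literal, so `\d` means the letter `d` here, unlike `java.util.regex`), non-negated classes such as `[a-z0-9]`, grouping, `|`, `*`, `+` and `?`. Everything else, including `.`, `{`, `^` and `$`, is taken as a literal character. It builds the NFA with Thompson's construction and skips the optional DFA step; the walk picks an outgoing edge uniformly, so each `*` loop continues with probability one half.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: regex subset parser (step 1) -> Thompson NFA (step 2) -> random walk (step 4).
final class RegexNfaSampler {
    static final class State { final List<Edge> edges = new ArrayList<>(); }
    // chars == null marks an epsilon edge; otherwise one character of chars is emitted
    static final class Edge { final String chars; final State to; Edge(String c, State t) { chars = c; to = t; } }
    static final class Frag { final State start, accept; Frag(State s, State a) { start = s; accept = a; } }

    private final String re;
    private int pos;
    private final State start, accept;

    RegexNfaSampler(String regex) {
        re = regex;
        Frag whole = parseAlt();
        if (pos != re.length()) throw new IllegalArgumentException("unexpected '" + re.charAt(pos) + "' at " + pos);
        start = whole.start; accept = whole.accept;
    }
    // Step 4: follow uniformly random edges until the accept state, keeping the emitted characters.
    String sample(Random rnd) {
        StringBuilder out = new StringBuilder();
        for (State s = start; s != accept; ) {
            Edge e = s.edges.get(rnd.nextInt(s.edges.size()));
            if (e.chars != null) out.append(e.chars.charAt(rnd.nextInt(e.chars.length())));
            s = e.to;
        }
        return out.toString();
    }
    private static void link(State from, String chars, State to) { from.edges.add(new Edge(chars, to)); }
    // alt := concat ('|' concat)*
    private Frag parseAlt() {
        Frag left = parseConcat();
        while (pos < re.length() && re.charAt(pos) == '|') {
            pos++;
            Frag right = parseConcat();
            State s = new State(), f = new State();
            link(s, null, left.start); link(s, null, right.start);
            link(left.accept, null, f); link(right.accept, null, f);
            left = new Frag(s, f);
        }
        return left;
    }
    // concat := repeat*  (an empty sequence matches the empty string)
    private Frag parseConcat() {
        State s = new State(), end = s;
        while (pos < re.length() && re.charAt(pos) != '|' && re.charAt(pos) != ')') {
            Frag next = parseRepeat();
            link(end, null, next.start);
            end = next.accept;
        }
        return new Frag(s, end);
    }
    // repeat := atom ('*' | '+' | '?')*
    private Frag parseRepeat() {
        Frag a = parseAtom();
        while (pos < re.length() && "*+?".indexOf(re.charAt(pos)) >= 0) {
            char op = re.charAt(pos++);
            State s = new State(), f = new State();
            link(s, null, a.start);
            if (op != '+') link(s, null, f);              // '*' and '?' may skip the atom
            if (op != '?') link(a.accept, null, a.start); // '*' and '+' may repeat it
            link(a.accept, null, f);
            a = new Frag(s, f);
        }
        return a;
    }
    // atom := '(' alt ')' | '[' class ']' | '\' char | any other char, taken literally
    private Frag parseAtom() {
        char c = re.charAt(pos++);
        String chars;
        if (c == '(') { Frag inner = parseAlt(); expect(')'); return inner; }
        else if (c == '[') chars = parseClass();
        else if (c == '\\' && pos < re.length()) chars = String.valueOf(re.charAt(pos++)); // escaped char is literal
        else if ("*+?".indexOf(c) >= 0) throw new IllegalArgumentException("nothing to repeat at " + (pos - 1));
        else chars = String.valueOf(c);
        State s = new State(), f = new State();
        link(s, chars, f);
        return new Frag(s, f);
    }
    // class := (char | char '-' char)+ ']'  (no negation, no escapes inside)
    private String parseClass() {
        StringBuilder sb = new StringBuilder();
        while (pos < re.length() && re.charAt(pos) != ']') {
            char lo = re.charAt(pos++), hi = lo;
            if (pos + 1 < re.length() && re.charAt(pos) == '-' && re.charAt(pos + 1) != ']') { hi = re.charAt(pos + 1); pos += 2; }
            for (int c = lo; c <= hi; c++) sb.append((char) c);
        }
        expect(']');
        if (sb.length() == 0) throw new IllegalArgumentException("empty character class");
        return sb.toString();
    }
    private void expect(char c) {
        if (pos >= re.length() || re.charAt(pos) != c) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        pos++;
    }
}
```

For example, `new RegexNfaSampler("[ab]*c").sample(new Random())` yields strings such as `c`, `abc` or `bbac`. Skipping step 3 keeps the sketch short; a DFA would change the probabilities of the walk but not which strings can be produced.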




Answer 3:


Here are a few implementations of such a beast, but none of them is in Java (and all but the closed-source Microsoft one are very limited in their regexp feature support).




Answer 4:


Based on Wilfred Springer's solution together with http://www.brics.dk/~amoeller/automaton/, I built another generator. It does not use recursion. It takes as input the pattern (regular expression), a minimum string length and a maximum string length, and returns an accepted string whose length lies between the minimum and the maximum. It also accepts some of the XML "shorthand character classes" (\d, \w and \s). I use it in an XML sample generator that builds valid strings for facets.

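import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;
import dk.brics.automaton.State;
import dk.brics.automaton.Transition;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public static final String generate(final String pattern, final int minLength, final int maxLength) {
    // dk.brics.automaton has no shorthand classes, so expand the XML ones first
    final String regex = pattern
            .replace("\\d", "[0-9]")        // \d = digit
            .replace("\\w", "[A-Za-z0-9_]") // \w = word character
            .replace("\\s", "[ \t\r\n]");   // \s = whitespace
    final Automaton automaton = new RegExp(regex).toAutomaton();
    final Random random = new Random(System.nanoTime());
    // accepted strings met during the walk whose length lies in [minLength, maxLength]
    final List<String> validLength = new ArrayList<>();
    final StringBuilder builder = new StringBuilder();
    State state = automaton.getInitialState();
    int len = 0;
    while (true) {
        // check acceptance before stepping, so an accepting state with no
        // outgoing transitions (e.g. the end of "abc") is not skipped
        if (state.isAccept() && len >= minLength) validLength.add(builder.toString());
        if (len >= maxLength) break;
        final List<Transition> transitions = state.getSortedTransitions(true);
        if (transitions.isEmpty()) break; // dead end: the string cannot be extended
        final Transition t = transitions.get(random.nextInt(transitions.size())); // random transition
        // emit a random character from the transition's [min, max] range
        builder.append((char) (t.getMin() + random.nextInt(t.getMax() - t.getMin() + 1)));
        len++;
        state = t.getDest();
    }
    if (validLength.isEmpty())
        throw new IllegalArgumentException(automaton + " , " + minLength + " , " + maxLength);
    return validLength.get(random.nextInt(validLength.size()));
}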



Answer 5:


Here is a Python implementation of such a module, which should be portable to Java: http://www.mail-archive.com/python-list@python.org/msg125198.html
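One alternative that avoids automata entirely (this is not a port of the linked module) is to generate recursively from the regex's syntax tree. The Java sketch below leaves out the parser and builds the tree for Answer 1's example `[ab]{4,6}c` by hand; `RegexTreeGen` and its factory methods `chars`, `lit`, `seq`, `alt` and `repeat` are hypothetical names. A tree makes counted repetition like `{4,6}` trivial, while unbounded `*` and `+` would need an artificial maximum.

```java
import java.util.Random;

// Hypothetical sketch: each syntax-tree node knows how to emit one random match of itself.
final class RegexTreeGen {
    interface Node { void emit(StringBuilder out, Random rnd); }

    // any single character from the set, e.g. [ab]
    static Node chars(String set) {
        return (out, rnd) -> out.append(set.charAt(rnd.nextInt(set.length())));
    }
    // a fixed literal string
    static Node lit(String s) { return (out, rnd) -> out.append(s); }
    // every part, in order
    static Node seq(Node... parts) {
        return (out, rnd) -> { for (Node p : parts) p.emit(out, rnd); };
    }
    // exactly one randomly chosen alternative
    static Node alt(Node... options) {
        return (out, rnd) -> options[rnd.nextInt(options.length)].emit(out, rnd);
    }
    // between min and max repetitions, inclusive (callers give * and + a finite max)
    static Node repeat(Node body, int min, int max) {
        return (out, rnd) -> {
            int n = min + rnd.nextInt(max - min + 1);
            for (int i = 0; i < n; i++) body.emit(out, rnd);
        };
    }

    static String generate(Node root, Random rnd) {
        StringBuilder out = new StringBuilder();
        root.emit(out, rnd);
        return out.toString();
    }

    public static void main(String[] args) {
        // hand-built tree for [ab]{4,6}c; a real generator would obtain it from a regex parser
        Node tree = seq(repeat(chars("ab"), 4, 6), lit("c"));
        Random rnd = new Random();
        for (int i = 0; i < 3; i++) System.out.println(generate(tree, rnd));
    }
}
```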



Source: https://stackoverflow.com/questions/1578789/how-do-i-generate-text-matching-a-regular-expression-from-a-regular-expression
