(C#) Improving speed of custom getBetweenAll

Posted by 一世执手 on 2019-12-05 13:53:53

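As in the original GetBetweenAll, we can use a regular expression. To match only the shortest "inner" occurrences of the enclosing strings, we have to put a negative lookahead for the start string in front of every content character and use a non-greedy quantifier for the content. Note that the code passes RegexOptions.IgnoreCase, so the delimiters are matched case-insensitively, and that without RegexOptions.Singleline the . does not match a newline.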

public static string[] getBetweenAll(this string main, 
    string strstart, string strend, bool preserve = false)
{
    List<string> results = new List<string>();

    string regularExpressionString = string.Format("{0}(((?!{0}).)+?){1}", 
        Regex.Escape(strstart), Regex.Escape(strend));
    Regex regularExpression = new Regex(regularExpressionString, RegexOptions.IgnoreCase);

    var matches = regularExpression.Matches(main);

    foreach (Match match in matches)
    {
        if (preserve)
        {
            results.Add(match.Value);
        }
        else
        {
            results.Add(match.Groups[1].Value);
        }
    }

    return results.ToArray();
}
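
To make the effect of the negative lookahead concrete, here is a minimal sketch that runs both pattern shapes on the same input. The class and method names (LookaheadDemo, FirstInner) are hypothetical and exist only for this illustration; the two patterns are the guarded one from the answer above and a plain non-greedy one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original answer.
public static class LookaheadDemo
{
    // Returns the first captured content, or an empty string if nothing matches.
    // guarded == true  : start(((?!start).)+?)end  (content may not contain start)
    // guarded == false : start(.+?)end             (plain non-greedy match)
    public static string FirstInner(string input, string start, string end, bool guarded)
    {
        string s = Regex.Escape(start);
        string e = Regex.Escape(end);
        string pattern = guarded
            ? s + "(((?!" + s + ").)+?)" + e
            : s + "(.+?)" + e;

        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For the input "<a<b>" with "<" and ">" as delimiters, the plain pattern captures "a<b", while the guarded pattern skips the unclosed first "<" and captures "b".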

Do it using a stack. As soon as you see an opening token, start pushing characters onto the stack. As soon as you see a closing token, pop everything from the stack; those are your characters of interest.

Once you have the base case implemented, you can extend it to handle nesting with recursion. If you see another opening token before a closing token, start collecting characters on a new stack until you see a closing token.

This gives you a complexity of O(N), since you only need to pass over the content once.

You will also need to handle the case where a closing token appears before any opening token, but it is not clear from your question what the program should do then.
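
One possible reading of the stack idea is sketched below. It is a variation, not the exact procedure described: instead of pushing characters, it pushes the position where each content span begins, which avoids copying characters one at a time. The names (StackScanSketch, BetweenAllNested) are hypothetical, and silently ignoring a closing token that has no opening token is an assumption, since the question does not say what should happen there.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a single left-to-right scan with a stack.
public static class StackScanSketch
{
    // Returns the content of every open/close pair, innermost pairs first.
    public static List<string> BetweenAllNested(string text, string open, string close)
    {
        if (string.IsNullOrEmpty(open) || string.IsNullOrEmpty(close))
            throw new ArgumentException("Tokens must be non-empty.");

        var results = new List<string>();
        var starts = new Stack<int>(); // positions where pending content begins
        int i = 0;

        while (i < text.Length)
        {
            if (string.CompareOrdinal(text, i, open, 0, open.Length) == 0)
            {
                // Opening token: remember where its content starts.
                i += open.Length;
                starts.Push(i);
            }
            else if (string.CompareOrdinal(text, i, close, 0, close.Length) == 0)
            {
                // Closing token: emit the content of the most recent opening token.
                // Assumption: a closing token with no pending opening token is ignored.
                if (starts.Count > 0)
                {
                    int from = starts.Pop();
                    results.Add(text.Substring(from, i - from));
                }
                i += close.Length;
            }
            else
            {
                i++;
            }
        }
        return results;
    }
}
```

With "<" and ">" as tokens, the input "<a<b>c>" yields "b" followed by "a<b>c", and the input ">x<1>" yields only "1" because the leading ">" is skipped. The scan visits each position once; copying the substrings adds extra work when pairs are deeply nested.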

I've found this does what I want, but in another way! A function that does PreviousIndexOf(string source, string token, int offset) would still be greatly appreciated for other stuff!
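
For the PreviousIndexOf helper asked for above, a minimal sketch can lean on the built-in string.LastIndexOf overload that takes a start index and searches backward from it. The class name and the handling of edge cases (empty strings, an offset past the end) are assumptions made for this illustration.

```csharp
using System;

// Hypothetical helper; only the signature comes from the request above.
public static class StringSearchSketch
{
    // Searches backward from offset and returns the index of the nearest
    // occurrence of token found that way, or -1 if there is none.
    public static int PreviousIndexOf(string source, string token, int offset)
    {
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(token))
            return -1;

        // Assumption: an offset past the end is clamped to the last character.
        int start = Math.Min(offset, source.Length - 1);
        if (start < 0)
            return -1;

        return source.LastIndexOf(token, start, StringComparison.Ordinal);
    }
}
```

For the string "<1><2>", PreviousIndexOf(source, "<", 5) returns 3 and PreviousIndexOf(source, "<", 2) returns 0.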

public static List<string> GetBetweenAll(this string main, string start, string finish, bool preserve = false, int index = 0)
{
    List<string> matches = new List<string>();
    Match gbMatch = new Regex(Regex.Escape(start) + "(.+?)" + Regex.Escape(finish)).Match(main, index);
    while (gbMatch.Success)
    {
        matches.Add((preserve ? start : string.Empty) + gbMatch.Groups[1].Value + (preserve ? finish : string.Empty));
        gbMatch = gbMatch.NextMatch();
    }
    return matches;
}
public static string[] getBetweenAllBackwards(this string main, string strstart, string strend, bool preserve = false)
{
    List<string> all = Reverse(main).GetBetweenAll(Reverse(strend), Reverse(strstart), preserve);
    for (int i = 0; i < all.Count; i++)
    {
        all[i] = Reverse(all[i]);
    }
    return all.ToArray();
}
public static string Reverse(string s)
{
    char[] charArray = s.ToCharArray();
    Array.Reverse(charArray);
    return new string(charArray);
}

I wrote a simple method that is four times faster than yours (though it does not honor the preserve argument yet):

public static string[] getBetweenAll2(this string main, string strstart, string strend, bool preserve = false)
{
    List<string> results = new List<string>();

    int lenStart = strstart.Length;

    int indexStart = 0;
    while (true)
    {
        // Find the next start delimiter.
        indexStart = main.IndexOf(strstart, indexStart);
        if (indexStart < 0)
            break;

        // Look for the end delimiter only after the start delimiter, so the two
        // cannot overlap (otherwise Substring throws when strstart == strend).
        int indexEnd = main.IndexOf(strend, indexStart + lenStart);
        if (indexEnd < 0)
            break;

        // Note: preserve is accepted but not implemented yet.
        results.Add(main.Substring(indexStart + lenStart, indexEnd - indexStart - lenStart));
        indexStart = indexEnd;
    }
    return results.ToArray();
}

This gives you the numbers 1, 2, 3 and 4 out of your string <1><2><3><4>.

Does this do what you want?

[Edit]

This version also copes with nested delimiters; it returns the content of each outermost pair:

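public static string[] getBetweenAll2(this string main, string strstart, string strend, bool preserve = false)
{
    List<string> results = new List<string>();

    int lenStart = strstart.Length;
    int lenEnd = strend.Length;

    int index = 0;

    // Content start positions of the start delimiters that are still open.
    Stack<int> startPositions = new Stack<int>();

    while (true)
    {
        int indexStart = main.IndexOf(strstart, index);
        int indexEnd = main.IndexOf(strend, index);

        if (indexStart != -1 && indexStart < indexEnd)
        {
            // A start delimiter comes first: remember where its content begins.
            index = indexStart + lenStart;
            startPositions.Push(index);
        }
        else if (indexEnd != -1 && (indexEnd < indexStart || indexStart == -1))
        {
            // An end delimiter comes first: it closes the most recent start delimiter.
            if (startPositions.Count == 1)
            {
                // Only the outermost pair is added to the results.
                int startOfInterest = startPositions.Pop();
                results.Add(main.Substring(startOfInterest, indexEnd - startOfInterest));
            }
            else if (startPositions.Count > 0)
            {
                // Closing an inner pair: discard its start position.
                startPositions.Pop();
            }
            index = indexEnd + lenEnd;
        }
        else
        {
            // No further end delimiter (or no delimiters at all): stop.
            break;
        }
    }
    return results.ToArray();
}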