I've written a custom extension method in C# that improves on the extension method string[] getBetweenAll(string source, string startstring, string endstring);
Originally, this extension method found all substrings between two strings, for example:
string source = "<1><2><3><4>";
source.getBetweenAll("<", ">");
//output: string[] {"1", "2", "3", "4"}
But if there was an extra occurrence of < at the beginning, it would return everything between that first < and the last >, i.e. nearly the whole string:
string source = "<<1><2><3><4>";
source.getBetweenAll("<", ">");
//output: string[] {"<1><2><3><4"}
So I rewrote it to be more exact: for each ">" it searches backwards to find the nearest preceding "<".
It works now, but it is far too slow, because the search method steps back one character at a time from the end of the whole string for every occurrence. Do you know how I could improve the speed of this function? Or is it not possible?
Here is the entire code so far: http://pastebin.com/JEZmyfSG (I've added comments where the code needs speed improvement).
public static List<int> IndexOfAll(this string main, string searchString)
{
    List<int> ret = new List<int>();
    int len = searchString.Length;
    int start = -len;
    while (true)
    {
        start = main.IndexOf(searchString, start + len);
        if (start == -1)
        {
            break;
        }
        else
        {
            ret.Add(start);
        }
    }
    return ret;
}
public static string[] getBetweenAll(this string main, string strstart, string strend, bool preserve = false)
{
    List<string> results = new List<string>();
    List<int> ends = main.IndexOfAll(strend);
    foreach (int end in ends)
    {
        int start = main.previousIndexOf(strstart, end); //This is where it has to search the whole source string every time
        results.Add(main.Substring(start, end - start) + (preserve ? strend : string.Empty));
    }
    return results.ToArray();
}
//This is the slow function (depends on main.Length)
public static int previousIndexOf(this string main, string find, int offset)
{
    int wtf = main.Length;
    int x = main.LastIndexOf(find, wtf);
    while (x > offset)
    {
        x = main.LastIndexOf(find, wtf);
        wtf -= 1;
    }
    return x;
}
I suppose another way of implementing PreviousIndexOf(string, int searchFrom) would improve the speed: something like IndexOf(), except searching backwards from a supplied start offset.
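One possible shape for such a backwards search, as a minimal sketch: string.LastIndexOf already accepts a start index and searches towards the beginning of the string from there, so the manual loop may not be needed at all. The names PreviousIndexOf and GetBetweenInner below are hypothetical helpers written for this illustration, and the choice of ordinal comparison is an assumption (the code in the question uses the culture-sensitive default).

```csharp
using System;
using System.Collections.Generic;

public static class PreviousIndexOfSketch
{
    // Hypothetical helper: index of the last occurrence of `find` that ends
    // before position `offset`, or -1 if there is none.
    public static int PreviousIndexOf(this string main, string find, int offset)
    {
        if (main.Length == 0 || find.Length == 0 || offset <= 0)
        {
            return -1;
        }
        // LastIndexOf searches backwards starting at startIndex, and the whole
        // match has to lie at or before startIndex.
        int startIndex = Math.Min(offset, main.Length) - 1;
        return main.LastIndexOf(find, startIndex, StringComparison.Ordinal);
    }

    // Hypothetical helper: for every end token, take the nearest start token
    // before it, ignoring start tokens that belong to an earlier match.
    public static string[] GetBetweenInner(this string main, string strstart, string strend)
    {
        if (strstart.Length == 0 || strend.Length == 0)
        {
            throw new ArgumentException("Tokens must not be empty.");
        }
        var results = new List<string>();
        int floor = 0; // first index after the previous end token
        int end = main.IndexOf(strend, StringComparison.Ordinal);
        while (end != -1)
        {
            int start = main.PreviousIndexOf(strstart, end);
            if (start >= floor) // also false when start == -1
            {
                int contentStart = start + strstart.Length;
                results.Add(main.Substring(contentStart, end - contentStart));
            }
            floor = end + strend.Length;
            end = main.IndexOf(strend, floor, StringComparison.Ordinal);
        }
        return results.ToArray();
    }
}
```

With the input from the question, "<<1><2><3><4>".GetBetweenInner("<", ">") should give 1, 2, 3 and 4, because each backwards search starts directly in front of the current ">" instead of at the end of the string.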
As in the original GetBetweenAll, we can use a regular expression. To match only the shortest "inner" occurrences of the enclosing strings, we have to use a negative lookahead on the start string (so the content can never contain another start string) and a non-greedy quantifier for the content.
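// Requires: using System.Text.RegularExpressions;
public static string[] getBetweenAll(this string main,
    string strstart, string strend, bool preserve = false)
{
    List<string> results = new List<string>();
    // {0} = escaped start string, {1} = escaped end string.
    // ((?!{0}).)+? consumes one character at a time, as few as possible,
    // and only where the start string does not begin at that position.
    string regularExpressionString = string.Format("{0}(((?!{0}).)+?){1}",
        Regex.Escape(strstart), Regex.Escape(strend));
    // Note: IgnoreCase makes the tokens match case-insensitively;
    // use RegexOptions.None for exact matching.
    Regex regularExpression = new Regex(regularExpressionString, RegexOptions.IgnoreCase);
    var matches = regularExpression.Matches(main);
    foreach (Match match in matches)
    {
        if (preserve)
        {
            // Whole match, including the start and end strings.
            results.Add(match.Value);
        }
        else
        {
            // Group 1 is the content between the two strings.
            results.Add(match.Groups[1].Value);
        }
    }
    return results.ToArray();
}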
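To see what the lookahead does on the input from the question, here is a small self-contained sketch. The class and method names (LookaheadDemo, InnerMatches) are made up for this illustration, and it matches case-sensitively, which is an assumption rather than what the method above does.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: returns the content of every innermost
    // open...close pair, using the same pattern shape as above.
    public static List<string> InnerMatches(string input, string open, string close)
    {
        string o = Regex.Escape(open);
        string c = Regex.Escape(close);
        // For "<" and ">" this builds: <(((?!<).)+?)>
        string pattern = o + "(((?!" + o + ").)+?)" + c;
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }

    public static void Main()
    {
        // Prints 1, 2, 3 and 4 on separate lines.
        foreach (string s in InnerMatches("<<1><2><3><4>", "<", ">"))
        {
            Console.WriteLine(s);
        }
    }
}
```

The first "<" cannot start a match, because the very next character is another "<" and the lookahead rejects it; the match therefore begins at the second "<".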
Do it using a stack. As soon as you see an opening token, start adding characters to the stack. As soon as you see a closing token, pop everything from the stack; those are your characters of interest.
Once you have the base case implemented, you can extend it to handle nesting using recursion. If you see another opening token before a closing token, start collecting characters on a new stack until you see a closing token.
This gives you a complexity of O(N), since you only need to pass over the content once.
You will also need to handle the case where you see a closing token before any opening token, but it is not clear from your question what the program should do then.
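A minimal sketch of that idea, with one deliberate change: instead of pushing the characters themselves, it pushes the position where the content starts, which keeps the single left-to-right pass but avoids copying characters. GetBetweenNested is a hypothetical name, and skipping a closing token that has no matching opening token is an assumption, since the question does not say what should happen in that case.

```csharp
using System;
using System.Collections.Generic;

public static class StackBetweenSketch
{
    // Hypothetical helper: returns the content of every matched
    // open...close pair, inner pairs before the pairs that enclose them.
    public static List<string> GetBetweenNested(string main, string open, string close)
    {
        if (open.Length == 0 || close.Length == 0)
        {
            throw new ArgumentException("Tokens must not be empty.");
        }
        var results = new List<string>();
        var starts = new Stack<int>(); // content start of each unclosed opening token
        int i = 0;
        while (i < main.Length)
        {
            if (string.CompareOrdinal(main, i, open, 0, open.Length) == 0)
            {
                // Opening token: remember where its content begins.
                i += open.Length;
                starts.Push(i);
            }
            else if (string.CompareOrdinal(main, i, close, 0, close.Length) == 0)
            {
                // Closing token: pair it with the most recent opening token.
                if (starts.Count > 0)
                {
                    int start = starts.Pop();
                    results.Add(main.Substring(start, i - start));
                }
                // Assumption: a closing token without an opening token is skipped.
                i += close.Length;
            }
            else
            {
                i++;
            }
        }
        return results;
    }
}
```

For "<<1><2><3><4>" with "<" and ">" this should return 1, 2, 3 and 4; the very first "<" is never closed and simply stays on the stack. For a properly nested input such as "<a<b>c>" it returns b first and then a<b>c.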
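I've found that this does what I want, although in a different way: it reverses the string and the tokens, runs the forward search, and reverses the results again. A function that does PreviousIndexOf(string source, string token, int offset) would still be greatly appreciated for other uses!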
public static List<string> GetBetweenAll(this string main, string start, string finish, bool preserve = false, int index = 0)
{
    List<string> matches = new List<string>();
    Match gbMatch = new Regex(Regex.Escape(start) + "(.+?)" + Regex.Escape(finish)).Match(main, index);
    while (gbMatch.Success)
    {
        matches.Add((preserve ? start : string.Empty) + gbMatch.Groups[1].Value + (preserve ? finish : string.Empty));
        gbMatch = gbMatch.NextMatch();
    }
    return matches;
}
public static string[] getBetweenAllBackwards(this string main, string strstart, string strend, bool preserve = false)
{
    List<string> all = Reverse(main).GetBetweenAll(Reverse(strend), Reverse(strstart), preserve);
    for (int i = 0; i < all.Count; i++)
    {
        all[i] = Reverse(all[i]);
    }
    return all.ToArray();
}
public static string Reverse(string s)
{
    char[] charArray = s.ToCharArray();
    Array.Reverse(charArray);
    return new string(charArray);
}
I wrote a simple method which is four times faster than yours (but it does not yet use the preserve argument):
public static string[] getBetweenAll2(this string main, string strstart, string strend, bool preserve = false)
{
    List<string> results = new List<string>();
    int lenStart = strstart.Length;
    int indexStart = 0;
    while (true)
    {
        indexStart = main.IndexOf(strstart, indexStart);
        if (indexStart < 0)
            break;
        int indexEnd = main.IndexOf(strend, indexStart);
        if (indexEnd < 0)
            break;
        results.Add(main.Substring(indexStart + lenStart, indexEnd - indexStart - lenStart));
        indexStart = indexEnd;
    }
    return results.ToArray();
}
This gives you the numbers 1, 2, 3 and 4 from your string <1><2><3><4>.
Does this do what you want?
[Edit]
This version handles nested tokens. It records a result only when a closing token brings the nesting depth back to zero:
public static string[] getBetweenAll2(this string main, string strstart, string strend, bool preserve = false)
{
    List<string> results = new List<string>();
    int lenStart = strstart.Length;
    int lenEnd = strend.Length;
    int index = 0;
    // Content start positions of the opening tokens that are still unclosed.
    Stack<int> startPos = new Stack<int>();
    while (true)
    {
        int indexStart = main.IndexOf(strstart, index);
        int indexEnd = main.IndexOf(strend, index);
        if (indexStart != -1 && indexStart < indexEnd)
        {
            // The next token is an opening token.
            index = indexStart + lenStart;
            startPos.Push(index);
        }
        else if (indexEnd != -1 && (indexEnd < indexStart || indexStart == -1))
        {
            // The next token is a closing token.
            if (startPos.Count == 1)
            {
                int startOfInterest = startPos.Pop();
                results.Add(main.Substring(startOfInterest, indexEnd - startOfInterest));
            }
            else if (startPos.Count > 0)
            {
                startPos.Pop();
            }
            index = indexEnd + lenEnd;
        }
        else
        {
            break;
        }
    }
    return results.ToArray();
}
Source: https://stackoverflow.com/questions/35104592/c-improving-speed-of-custom-getbetweenall