How does the String.Split method determine separator precedence when passed multiple multi-character separators?

随声附和 submitted on 2019-12-23 08:03:40

Question


If you have this code:

"......".Split(new String[]{"...", ".."}, StringSplitOptions.None);

The resulting array elements are:

 1. ""
 2. ""
 3. ""

Now if you reverse the order of the separators,

"......".Split(new String[]{"..", "..."}, StringSplitOptions.None);

The resulting array elements are:

 1. ""
 2. ""
 3. ""
 4. ""

From these 2 examples I feel inclined to conclude that the Split method recursively tokenizes as it goes through each element of the array from left to right.

However, once separators that contain alphanumeric characters are thrown into the equation, it is clear that the above theory is wrong.

"5.x.7".Split(new String[]{".x", "x."}, StringSplitOptions.None);

results in:

 1. "5"
 2. ".7"

"5.x.7".Split(new String[]{"x.", ".x"}, StringSplitOptions.None);

results in:

 1. "5"
 2. ".7"

This time we obtain the same output, which means that the rule theorized from the first set of examples no longer applies (i.e., if separator precedence were always determined by the position of the separator within the array, then in the last example we would have obtained "5." and "7" instead of "5" and ".7").

As to why I am spending time trying to guess how the .NET standard APIs work: I want to implement similar functionality for my Java apps, but neither StringTokenizer nor org.apache.commons.lang.StringUtils provides the ability to split a String using multiple multi-character separators (and even if I were to find an API that does provide this ability, it would be hard to know whether it always tokenizes using the same algorithm as the String.Split method).


Answer 1:


From MSDN:

To avoid ambiguous results when strings in separator have characters in common, the Split operation proceeds from the beginning to the end of the value of the instance, and matches the first element in separator that is equal to a delimiter in the instance. The order in which substrings are encountered in the instance takes precedence over the order of elements in separator.

So, in the first case, ".." and "..." are both found at the same position, and their order in separator determines which one is used. In the second case, ".x" is found before "x.", so the order of the elements in separator does not come into play.
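Since the question asks for a Java equivalent, here is a minimal sketch of the rule quoted above: scan the string from left to right, and at each position take the first separator (in array order) that matches there. The class and method names are hypothetical, and the handling of null or empty separators (they are skipped) is an assumption of this sketch. It does not reproduce other String.Split behaviour such as StringSplitOptions or the whitespace fallback for an empty separator array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class MultiSeparatorSplit {

    // Splits input at every separator occurrence, scanning left to right.
    // At each position the separators are tried in array order and the first
    // one that matches wins; the scan then resumes right after the match.
    static List<String> split(String input, String... separators) {
        List<String> parts = new ArrayList<>();
        int tokenStart = 0;
        int i = 0;
        while (i < input.length()) {
            int matchedLength = 0;
            for (String sep : separators) {
                // Assumption: null or empty separators are ignored.
                if (sep != null && !sep.isEmpty() && input.startsWith(sep, i)) {
                    matchedLength = sep.length();
                    break;
                }
            }
            if (matchedLength > 0) {
                parts.add(input.substring(tokenStart, i));
                i += matchedLength;
                tokenStart = i;
            } else {
                i++;
            }
        }
        parts.add(input.substring(tokenStart)); // trailing token, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("......", "...", "..").size()); // 3
        System.out.println(split("......", "..", "...").size()); // 4
        System.out.println(split("5.x.7", "x.", ".x"));          // [5, .7]
    }
}
```

With the inputs from the question, this sketch yields the same element counts and values that the question reports for String.Split.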




Answer 2:


I had a quick look at this, and it appears that the private method MakeSeparatorList in the String class builds an array of match indexes, and that at each point it takes the first match it finds.

So, because .x comes before x. in both of your examples, that index is stored.

This is the code I used to test:

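// Requires: using System.Diagnostics; using System.Linq; using System.Reflection;
var s = "5.x.7";

string[] separators = new string[] { "x.", ".x" };
int[] sepList = new int[1024];    // filled with the positions in s where a separator starts
int[] lengthList = new int[1024]; // filled with the length of the separator matched at each position

// MakeSeparatorList is a private implementation detail of String, so it may be
// missing or have a different signature in other runtime versions.
MethodInfo dynMethod = s.GetType().GetMethods(BindingFlags.NonPublic | BindingFlags.Instance).Last(x => x.Name == "MakeSeparatorList");
dynMethod.Invoke(s, new object[] { separators, sepList, lengthList });

Debugger.Break(); // inspect sepList and lengthList here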

Inspecting sepList in the debugger shows that the stored index is 1 (the position in the string where .x starts), even though .x is the second entry in the array.
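To make that observation concrete without reflection, the following Java sketch records an {index, length} pair for each separator occurrence, loosely mirroring the sepList and lengthList arrays in the test code. It is an illustration only: the class and method names are hypothetical, and it is not the .NET implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the .NET implementation.
final class SeparatorListSketch {

    // Returns one {index, length} pair per separator occurrence, in scan order.
    static List<int[]> makeSeparatorList(String input, String[] separators) {
        List<int[]> found = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int matched = 0;
            for (String sep : separators) {
                // Assumption: null or empty separators are ignored.
                if (sep != null && !sep.isEmpty() && input.startsWith(sep, i)) {
                    matched = sep.length();
                    break;
                }
            }
            if (matched > 0) {
                found.add(new int[] { i, matched });
                i += matched; // resume the scan right after the match
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (int[] hit : makeSeparatorList("5.x.7", new String[] { "x.", ".x" })) {
            System.out.println("index=" + hit[0] + " length=" + hit[1]); // index=1 length=2
        }
    }
}
```

For "5.x.7" the only recorded occurrence starts at index 1, whichever order the two separators are given in, because "x." cannot match before index 2.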




Answer 3:


String.Split splits at the first position where any of the separators matches. As a simple example, say you pass the separators "a" and "b" and the string is "appaleisbigapll". The algorithm starts at the first character and checks whether either a or b matches there. If one does, it splits at that point and continues with the next character.

In your example, 5.x.7 with ".x" and "x.", the separators are combined as if by an "or" operator: .x is found first, and the remaining .7 contains no further match, so it is left as it is. The result is 5 and .7.

The same happens in your second example: .x is found first, and because the rule is ".x or x.", the scan simply continues with .7, so the order of the separators does not come into play. In your first set of examples the same scan repeats after each match on the rest of the string, which is why the result looks recursive.
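The "or" reading suggests an alternative for Java that swaps in a different technique: a regular expression alternation. Java's regex engine finds the leftmost match and tries the alternatives in the order written, which gives the same precedence for the examples in the question. The sketch below is a hedged illustration: the class name is hypothetical, and it assumes at least one non-null, non-empty separator is supplied.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
final class RegexSplitSketch {

    static String[] split(String input, String... separators) {
        // Quote each separator so characters such as '.' are taken literally,
        // then join them into an ordered alternation: sep1|sep2|...
        String alternation = Arrays.stream(separators)
                .filter(s -> s != null && !s.isEmpty()) // assumption: skip null/empty
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // A negative limit keeps trailing empty strings in the result.
        return Pattern.compile(alternation).split(input, -1);
    }

    public static void main(String[] args) {
        System.out.println(split("......", "...", "..").length);         // 3
        System.out.println(split("......", "..", "...").length);         // 4
        System.out.println(Arrays.toString(split("5.x.7", "x.", ".x"))); // [5, .7]
    }
}
```

Whether this agrees with String.Split for every input is not something the thread establishes; it matches the cases shown in the question.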



Source: https://stackoverflow.com/questions/14762440/how-does-the-string-split-method-determine-separator-precedence-when-passed-mult
