Parsing formatted string

渐次进展 2020-12-03 14:12

I am trying to create a generic formatter/parser combination.

Example scenario:

  • I have a string for string.Format(), e.g. var format = "{0}-{1}"
6 Answers
  • 2020-12-03 14:19

    While the comments about lost information are valid, sometimes you just want to get the string values out of a string with a known format.

    One approach is described in a blog post written by a friend of mine. He implemented an extension method called string[] ParseExact(), akin to DateTime.ParseExact(). The values are returned as an array of strings, but if you can live with that, it is very handy.

    using System;
    using System.Text.RegularExpressions;
    
    public static class StringExtensions
    {
        public static string[] ParseExact(
            this string data, 
            string format)
        {
            return ParseExact(data, format, false);
        }
    
        public static string[] ParseExact(
            this string data, 
            string format, 
            bool ignoreCase)
        {
            string[] values;
    
            if (TryParseExact(data, format, out values, ignoreCase))
                return values;
            else
                throw new ArgumentException("Format not compatible with value.");
        }
    
        public static bool TryExtract(
            this string data, 
            string format, 
            out string[] values)
        {
            return TryParseExact(data, format, out values, false);
        }
    
        public static bool TryParseExact(
            this string data, 
            string format, 
            out string[] values, 
            bool ignoreCase)
        {
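            int tokenCount = 0;
            // Regex.Escape turns "{" into "\{" but leaves "}" alone, so undoing
            // the "\{" escape restores the placeholders "{0}", "{1}", ...
            format = Regex.Escape(format).Replace("\\{", "{");
    
            // Replace each placeholder with a named capture group, stopping at
            // the first index that does not occur in the format.
            for (tokenCount = 0; ; tokenCount++)
            {
                string token = string.Format("{{{0}}}", tokenCount);
                if (!format.Contains(token)) break;
                format = format.Replace(token,
                    string.Format("(?'group{0}'.*)", tokenCount));
            }
    
            RegexOptions options = 
                ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
    
            // Anchor the pattern so the whole input must match, not just a substring.
            Match match = new Regex("^" + format + "$", options).Match(data);
    
            // Fail if the input does not match or the number of groups is off.
            if (!match.Success || tokenCount != (match.Groups.Count - 1))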
            {
                values = new string[] { };
                return false;
            }
            else
            {
                values = new string[tokenCount];
                for (int index = 0; index < tokenCount; index++)
                    values[index] = 
                        match.Groups[string.Format("group{0}", index)].Value;
                return true;
            }
        }
    }
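    As a quick illustration of what TryParseExact builds, the sketch below repeats its escape-and-replace loop for "{0}-{1}" and then extracts the groups. The class and its methods (ParseExactPatternDemo, BuildPattern, Extract) are hypothetical, and the ^/$ anchors are this sketch's own choice. Because every placeholder becomes a greedy (.*) group, an ambiguous input gives as much text as possible to the earlier groups.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    // Hypothetical demo of the pattern that TryParseExact builds,
    // with ^ and $ anchors added so the whole input must match.
    public static class ParseExactPatternDemo
    {
        public static string BuildPattern(string format)
        {
            // Regex.Escape turns "{" into "\{" but leaves "}" alone,
            // so undoing "\{" restores placeholders such as "{0}".
            string pattern = Regex.Escape(format).Replace("\\{", "{");

            // Swap {0}, {1}, ... for greedy named groups until an index is missing.
            for (int i = 0; ; i++)
            {
                string token = "{" + i + "}";
                if (!pattern.Contains(token)) break;
                pattern = pattern.Replace(token, "(?'group" + i + "'.*)");
            }
            return "^" + pattern + "$";
        }

        public static string[] Extract(string data, string format)
        {
            Match match = Regex.Match(data, BuildPattern(format));
            if (!match.Success) return Array.Empty<string>();

            // Read group0, group1, ... until a group name does not exist.
            var values = new List<string>();
            for (int i = 0; match.Groups["group" + i].Success; i++)
                values.Add(match.Groups["group" + i].Value);
            return values.ToArray();
        }
    }
    ```

    For "{0}-{1}" the pattern is ^(?'group0'.*)-(?'group1'.*)$, so "hello-world-stack-overflow" splits at the last '-' into "hello-world-stack" and "overflow".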
    
  • 2020-12-03 14:20

    A simple solution might be to

    • replace all format tokens with (.*)
    • escape all other special characters in format
    • make the regex match non-greedy

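    This would resolve ambiguities in favor of the shortest possible match for each placeholder except the last. The pattern also needs to be anchored with ^ and $; otherwise the trailing lazy (.*?) would simply match an empty string.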

    (I'm not good at RegEx, so please correct me, folks :))
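    The steps above can be sketched in C# as follows. LazyUnformat and TryUnformat are hypothetical names; the sketch anchors the pattern at both ends (\A and \z) and uses RegexOptions.Singleline so values may contain line breaks. It does not handle {{ }} escapes or format specifiers such as {0:N2}.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    // Hypothetical helper: escape the format, turn each {n} into a lazy
    // named group, then match the whole input against the result.
    public static class LazyUnformat
    {
        public static bool TryUnformat(string format, string formatted, out string[] values)
        {
            // Escape regex metacharacters; "{" becomes "\{", "}" is left as-is.
            string pattern = Regex.Escape(format);

            // Replace each escaped placeholder "\{n}" with a lazy group named "pn",
            // remembering the highest placeholder index seen.
            int count = 0;
            pattern = Regex.Replace(pattern, @"\\\{(\d+)\}", m =>
            {
                int index = int.Parse(m.Groups[1].Value);
                count = Math.Max(count, index + 1);
                return "(?<p" + index + ">.*?)";
            });

            // \A and \z anchor the start and end of the input; without the end
            // anchor a trailing lazy group would match "". Singleline lets "."
            // match newlines inside values.
            Match match = Regex.Match(formatted, @"\A" + pattern + @"\z", RegexOptions.Singleline);
            if (!match.Success)
            {
                values = Array.Empty<string>();
                return false;
            }

            values = new string[count];
            for (int i = 0; i < count; i++)
                values[i] = match.Groups["p" + i].Value;
            return true;
        }
    }
    ```

    For "{0}-{1}" and "hello-world-stack-overflow" this yields "hello" and "world-stack-overflow": the first placeholder takes the shortest text that still allows a match, and the last one absorbs the rest.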

  • 2020-12-03 14:22

    Assuming "-" is not in the original strings, can you not just use Split?

    var arr2 = formattedString.Split('-');
    

    Note that this only applies to the presented example with an assumption. Any reverse algorithm is dependent on the kind of formatting employed; an inverse operation may not even be possible, as noted by the other answers.
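    A minimal sketch of the Split idea, with hypothetical helper names, shows where the assumption breaks. The two-part variant, which uses the String.Split(char[], int) overload, is an addition here: it lets the last value contain '-' as long as the first one does not.

    ```csharp
    using System;

    // Hypothetical helpers contrasting two ways of splitting "{0}-{1}" output.
    public static class SplitUnformat
    {
        // Splits at every '-': correct only if no original value contains '-'.
        public static string[] SplitAll(string formatted) => formatted.Split('-');

        // Splits at the first '-' only, so the last value may contain '-',
        // but the first value still must not.
        public static string[] SplitFirst(string formatted) => formatted.Split(new[] { '-' }, 2);
    }
    ```

    SplitAll("hello-world-stack-overflow") returns four parts instead of two, while SplitFirst returns "hello" and "world-stack-overflow".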

  • 2020-12-03 14:36

    It's simply not possible in the general case. The Format method "loses" some information, namely the boundaries between the strings. Assume:

    String.Format("{0}-{1}", "hello-world", "stack-overflow");
    

    How would you "Unformat" it?
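    To see the lost information concretely, the hypothetical helper below lists every pair of arguments that string.Format("{0}-{1}", ...) maps to a given string. For "hello-world-stack-overflow" it finds three equally valid candidates, including the "hello-world" / "stack-overflow" pair from the example, so nothing in the output tells you which one was used.

    ```csharp
    using System.Collections.Generic;

    // Hypothetical demo: list every (arg0, arg1) pair that
    // string.Format("{0}-{1}", arg0, arg1) turns into the given string.
    public static class LostBoundaryDemo
    {
        public static List<string[]> CandidateArguments(string formatted)
        {
            var candidates = new List<string[]>();
            // Try splitting at each '-' in turn.
            for (int i = formatted.IndexOf('-'); i >= 0; i = formatted.IndexOf('-', i + 1))
            {
                string first = formatted.Substring(0, i);
                string second = formatted.Substring(i + 1);
                // Confirm that formatting the candidate reproduces the input.
                if (string.Format("{0}-{1}", first, second) == formatted)
                    candidates.Add(new[] { first, second });
            }
            return candidates;
        }
    }
    ```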

  • 2020-12-03 14:38

    You can't unformat because information is lost. String.Format is a "destructive" algorithm, which means you can't (always) go back.

    Create a new class with members that keep track of the "{0}-{1}" format and the { "asdf", "qwer" } arguments, override ToString(), and adapt your code a little. Note that System.String is sealed, so the class cannot inherit from string; it has to wrap the values instead, and code that expects a string may need to call ToString() explicitly.

    IMO, that's the best way to do this.
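    A minimal sketch of such a class, assuming a hypothetical name FormattedString: since string cannot be inherited from, it stores the format and the arguments and produces the text on demand in ToString().

    ```csharp
    using System;

    // Hypothetical wrapper: System.String is sealed, so instead of deriving
    // from string this class keeps the format and its arguments.
    public sealed class FormattedString
    {
        public string Format { get; }
        public object[] Args { get; }

        public FormattedString(string format, params object[] args)
        {
            Format = format ?? throw new ArgumentNullException(nameof(format));
            Args = args ?? Array.Empty<object>();
        }

        // The text is rebuilt on demand, so the original parts are never lost.
        public override string ToString() => string.Format(Format, Args);
    }
    ```

    new FormattedString("{0}-{1}", "asdf", "qwer").ToString() returns "asdf-qwer", and the original arguments remain available through Args, so nothing needs to be parsed back.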

  • 2020-12-03 14:38

    After formatting, you can put the resulting string and the array of objects into a dictionary with the string as key:

    Dictionary<string, string[]> unFormatLookup = new Dictionary<string, string[]>();
    ...
    var arr = new string [] {"asdf", "qwer" };
    var res = string.Format(format, arr);
    unFormatLookup.Add(res,arr);
    

    Then, in the Unformat method, look up the string you are given and return the array that produced it:

    string[] Unformat(string res)
    {
      string[] arr;
      // arr is null if res was never formatted; you can also check the return value of TryGetValue and throw an exception instead.
      unFormatLookup.TryGetValue(res, out arr);
      return arr;
    }
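    A self-contained version of this lookup might look like the sketch below; UnformatCache and its methods are hypothetical names. One design choice differs from the snippet above: results are stored with the indexer instead of Add, because Add throws an ArgumentException when two different argument lists format to the same string (for example "hello-world" + "stack-overflow" versus "hello" + "world-stack-overflow").

    ```csharp
    using System.Collections.Generic;

    // Hypothetical class: remembers each formatted result together with
    // the arguments that produced it, so unformatting becomes a lookup.
    public sealed class UnformatCache
    {
        private readonly Dictionary<string, string[]> _lookup =
            new Dictionary<string, string[]>();

        public string Format(string format, params string[] args)
        {
            // The cast passes the strings as string.Format's object[] argument list.
            string result = string.Format(format, (object[])args);
            // Indexer instead of Add: a colliding result overwrites the older
            // entry rather than throwing ArgumentException.
            _lookup[result] = args;
            return result;
        }

        // Returns false (and args == null) if the string was never produced by Format.
        public bool TryUnformat(string formatted, out string[] args) =>
            _lookup.TryGetValue(formatted, out args);
    }
    ```

    With this choice the most recent arguments win for a colliding string; keeping a list of candidates, or rejecting duplicates, are reasonable alternatives. The lookup also only works for strings that went through Format, and it grows with every call.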
    