An algorithm to “spacify” CamelCased strings

Question

This is pretty basic, but I'm curious how others would implement this algorithm and whether there are any clever tricks to optimize it. I just had to implement it for a project I'm working on.

Given a string in CamelCase, how would you go about "spacifying" it?

e.g. given FooBarGork I want Foo Bar Gork back.

Here is my algorithm in C#:


static void Main(string[] args)
{
    Console.WriteLine(UnCamelCase("FooBarGork"));
}
public static string UnCamelCase(string str)
{
    StringBuilder sb = new StringBuilder();
    for (int i =  0; i < str.Length; i++)
    {
        if (char.IsUpper(str, i) && i > 0) sb.Append(" ");
        sb.Append(str[i]);
    }
    return sb.ToString();
}

Since you have to visit every character once, I believe the best case is O(n). How would you implement this?
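For comparison, the same single-pass idea can be sketched in Python. This is only an illustration of the loop above, not a tuned implementation; the name `un_camel_case` is made up for this example, and `str.isupper()` stands in for `char.IsUpper`.

```python
def un_camel_case(text: str) -> str:
    """Insert a space before every uppercase letter except the first character."""
    parts = []
    for i, ch in enumerate(text):
        # Mirror the `i > 0` check: never put a space in front of the first character.
        if i > 0 and ch.isupper():
            parts.append(" ")
        parts.append(ch)
    # Joining the list once keeps the whole pass linear in len(text).
    return "".join(parts)


print(un_camel_case("FooBarGork"))  # Foo Bar Gork
```

Each character is visited exactly once, so the O(n) reasoning carries over unchanged.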

Answer 1:

I can already sense the flames, but I like regex for this kind of stuff.

public static string UnCamelCase(string str)
{
    return Regex.Replace(str, "([a-z])([A-Z])", "$1 $2");
}

(This may not be faster than your implementation, but to me it is clearer.)

And this should be faster at runtime, since the regex is constructed and compiled only once:

private static Regex _unCamelRegex = new Regex("([a-z])([A-Z])", RegexOptions.Compiled);

public static string UnCamelCase(string str)
{
    return _unCamelRegex.Replace(str, "$1 $2");
}

This handles the issue raised by Pete Kirkham below (camel-cased strings that contain acronyms, such as HTTPRequest):

private static Regex _unCamelRegex1 = new Regex("([a-z])([A-Z])", RegexOptions.Compiled);
private static Regex _unCamelRegex2 = new Regex("([A-Z]+)([A-Z])([a-z])", RegexOptions.Compiled);

public static string UnCamelCase(string str)
{
    return _unCamelRegex2.Replace(_unCamelRegex1.Replace(str, "$1 $2"), "$1 $2$3");
}

This one takes HTTPRequestFOOBarGork and returns HTTP Request FOO Bar Gork
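The same two-pass idea can be sketched in Python with the `re` module. The patterns are copied from the C# version above; the function and constant names are hypothetical, chosen only for this example.

```python
import re

# Pass 1: split where a lowercase letter is followed by an uppercase one.
_LOWER_UPPER = re.compile(r"([a-z])([A-Z])")
# Pass 2: split a run of capitals (an acronym) from the capitalized word after it.
_ACRONYM_WORD = re.compile(r"([A-Z]+)([A-Z])([a-z])")


def un_camel_case_acronyms(text: str) -> str:
    spaced = _LOWER_UPPER.sub(r"\1 \2", text)
    return _ACRONYM_WORD.sub(r"\1 \2\3", spaced)


print(un_camel_case_acronyms("HTTPRequestFOOBarGork"))  # HTTP Request FOO Bar Gork
```

The second pattern relies on backtracking: `[A-Z]+` first grabs the whole run of capitals, then gives back one letter so that the last capital can start the next word.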

So I tested the iterative method against the regular expression method, using the OP's implementation (with the 'start at 1 and skip the > 0 check' change) and my second version above (the one with the static compiled Regex object). Note that the results do not include the compilation time of the Regex. For 2 million calls (using the same FooBarGork input):

Iterative: 00:00:00.80
Regex: 00:00:06.71

So the iterative approach is clearly much more efficient here (roughly 8 times faster in this test). I've included a fixed version of the OP's implementation (as suggested by Jason Punyon, any credit should go to him) that also takes into account a null or empty argument:

public static string UnCamelCaseIterative(string str)
{
    if (String.IsNullOrEmpty(str))
        return str;

    /* Note that the .ToString() is required, otherwise the char is implicitly
     * converted to an integer and the wrong overloaded ctor is used */
    StringBuilder sb = new StringBuilder(str[0].ToString());
    for (int i = 1; i < str.Length; i++)
    {
        if (char.IsUpper(str, i))
            sb.Append(" ");
        sb.Append(str[i]);
    }
    return sb.ToString();
}
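If you want to repeat this kind of comparison yourself, here is a rough Python sketch using `timeit`. It is not the harness used for the numbers above; the function names and the call count are assumptions for illustration, and the relative timings in Python may well differ from the C# results.

```python
import re
import timeit

_LOWER_UPPER = re.compile(r"([a-z])([A-Z])")  # compiled once, outside the timed code


def iterative(text: str) -> str:
    if not text:
        return text
    parts = [text[0]]  # start at index 1 so no first-character check is needed
    for ch in text[1:]:
        if ch.isupper():
            parts.append(" ")
        parts.append(ch)
    return "".join(parts)


def with_regex(text: str) -> str:
    return _LOWER_UPPER.sub(r"\1 \2", text)


if __name__ == "__main__":
    calls = 200_000  # hypothetical call count, smaller than the 2 million used above
    for fn in (iterative, with_regex):
        seconds = timeit.timeit(lambda: fn("FooBarGork"), number=calls)
        print(f"{fn.__name__}: {seconds:.2f}s for {calls} calls")
```

As in the test above, the pattern is compiled before timing starts, so only the per-call cost is measured.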

Answer 2:

Why not start i at 1?

You'll get to eliminate the && i>0 check...

Answer 3:

Usually my decamelisation methods are a bit more complex, as "HTTPRequest" should become "HTTP Request" rather than "H T T P Request", and different applications handle digits differently too.
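As a rough illustration of what "a bit more complex" can look like, here is a Python sketch that keeps acronyms together and makes the digit behaviour a switch. The function name, the `split_digits` flag, and the exact digit rules are assumptions for this example, not a description of any particular application.

```python
import re

_WORD_BOUNDARY = re.compile(
    r"(?<=[a-z0-9])(?=[A-Z])"     # fooBar, utf8Decoder: lower/digit then upper
    r"|(?<=[A-Z])(?=[A-Z][a-z])"  # HTTPRequest: last capital of a run starts a word
)
_DIGIT_BOUNDARY = re.compile(r"(?<=[A-Za-z])(?=[0-9])")  # Utf8: letter then digit


def decamelize(text: str, split_digits: bool = False) -> str:
    spaced = _WORD_BOUNDARY.sub(" ", text)
    if split_digits:
        # Optionally treat a digit run as its own word.
        spaced = _DIGIT_BOUNDARY.sub(" ", spaced)
    return spaced


print(decamelize("HTTPRequest"))                     # HTTP Request
print(decamelize("Utf8Decoder"))                     # Utf8 Decoder
print(decamelize("Utf8Decoder", split_digits=True))  # Utf 8 Decoder
```

The patterns only use zero-width lookarounds, so nothing is consumed and each boundary is simply replaced by a space.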

Answer 4:

And here's a PHP example

function spacify($str) {
  // Single quotes matter: in a double-quoted PHP string, "\1" is an octal escape.
  return preg_replace('/([a-z])([A-Z])/', '$1 $2', $str);
}

Answer 5:

Looking at your code, it seems that it's somehow been mangled (when you copied it over perhaps). Apart from fixing the for loop, I assume you're just missing an if statement with a char.IsUpper call around the sb.Append(" ") bit. Otherwise it's all fine of course. You're not going to get any better than O(n) for a generic string.

Now there is obviously a one-line RegEx replace call to accomplish this, but really there's no reason to do such things for such a simple task. Always best to avoid RegEx when you can for the purposes of readability.

Answer 6:

I'd probably do it in a similar way, just maybe instead of a stringbuilder go with:

str=str.replace(str[i], " "+str[i]);

I'm pretty sure your way ends up being more efficient though.
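To make the trade-off concrete, here is a small Python sketch of a replace-based loop. It is one possible reading of the idea above (the function name is made up), and it shows a pitfall beyond efficiency: `replace` rewrites every occurrence of the character, so a capital that appears twice gets spaced twice.

```python
def spacify_with_replace(text: str) -> str:
    result = text
    for ch in text[1:]:  # skip the first character so it gets no leading space
        if ch.isupper():
            # replace() rewrites *every* occurrence of ch, not just the current one,
            # and each call copies the whole string.
            result = result.replace(ch, " " + ch)
    return result


print(repr(spacify_with_replace("FooBarGork")))  # 'Foo Bar Gork'
print(repr(spacify_with_replace("FooBarBaz")))   # 'Foo  Bar  Baz'
```

With distinct capitals the output is fine, but "FooBarBaz" ends up with double spaces, which is one more reason to prefer the builder-style loop.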

Answer 7:

I'd go with...

public static string UnCamelCase(string str) {
    Regex reg = new Regex("([A-Z])");

    return reg.Replace(str, " $1").Trim();
}

Answer 8:

Some regex flavors know the "\u" (upper-case) and "\U" (lower-case) character classes. They can replace this:

(?<=\U)(?=\u)

with a space. For flavors that do not know these classes, this will do:

(?<=[a-z])(?=[A-Z])   // replace with a single space again

Explanation: The regex matches the spot between a lower-case and an upper-case character. CamelCasedWords are the only constructs where this usually happens.

CamelCasedWord
    ^^   ^^           // match occurs between the ^

Answer 9:

Something like this (Python)?

>>> import re
>>> s = 'FooBarGork'
>>> s[0] + re.sub(r'([A-Z])', r' \1', s[1:])
'Foo Bar Gork'

Answer 10:

Not very exciting but:

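    public static string UnCamelCase(string str)
    {
        StringBuilder sb = new StringBuilder();

        foreach (char c in str)
        {
            // Only 'A' (65) through 'Z' (90) start a new word; a bare "<= 90"
            // check would also match digits, spaces, and punctuation.
            if (c >= 'A' && c <= 'Z') sb.Append(" ");
            sb.Append(c);
        }
        // Trim() removes the space added in front of a leading capital.
        return sb.ToString().Trim();
    }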


        //Console.WriteLine(System.Convert.ToInt32('a')); // 97
        //Console.WriteLine(System.Convert.ToInt32('z')); // 122
        //Console.WriteLine(System.Convert.ToInt32('A')); // 65
        //Console.WriteLine(System.Convert.ToInt32('Z')); // 90

Answer 11:

Here's how the MooTools JavaScript library does it (although they 'hyphenate', it's pretty easy to swap the hyphen for a space):

/*
Property: hyphenate
    Converts a camelCased string to a hyphen-ated string.

Example:
    >"ILikeCookies".hyphenate(); //"I-like-cookies"
*/

hyphenate: function(){
    return this.replace(/\w[A-Z]/g, function(match){
        return (match.charAt(0) + '-' + match.charAt(1).toLowerCase());
    });
},
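To show the "swap the hyphen for a space" step, here is a rough Python port of the same idea with the separator as a parameter. The function name and the `separator` argument are inventions for this sketch, not part of MooTools; `re.ASCII` is used so that `\w` behaves like the ASCII-only JavaScript class.

```python
import re


def separate(text: str, separator: str = "-") -> str:
    def _join(match):
        pair = match.group(0)  # one word character followed by one capital
        return pair[0] + separator + pair[1].lower()

    return re.sub(r"\w[A-Z]", _join, text, flags=re.ASCII)


print(separate("ILikeCookies"))       # I-like-cookies
print(separate("ILikeCookies", " "))  # I like cookies
```

Like the original, this lowercases every capital it splits on, so it changes the casing as well as the spacing.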

Answer 12:

echo "FooBarGork" | sed -r 's/([A-Z])/ \1/g;s/^ //'

Answer 13:

To get the index of the first uppercase character:

Short syntax:

Regex.Match("hello,World!", @"(\p{Lu})").Index

Result: 6

Long example:

using System.Text.RegularExpressions;

namespace MyApp.Helpers // placeholder name; "namespace" is a reserved word and cannot be used here
{
    public static class Helper
    {
        public static int IndexOfUppercase(this string str, int startIndex = 0)
        {
            return str.IndexOfRegex(@"(\p{Lu})", startIndex);
        }

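        public static int IndexOfRegex(this string str, string regex, int startIndex )
        {
            // Search the tail, then shift the hit back into the original string's indices.
            int index = str.Substring(startIndex).IndexOfRegex(regex);
            return index < 0 ? -1 : index + startIndex;
        }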

        public static int IndexOfRegex(this string str, string regex)
        {
            var match = Regex.Match(str, regex);
            if (match.Success)
            {
                return match.Index;
            }
            return -1;
        }
    }
}
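For readers following along in another language, the same lookup can be sketched in Python without a regex, since the standard `re` module does not support `\p{Lu}`. The function name is hypothetical, `str.isupper()` is used as the uppercase test, and `start` is assumed to be non-negative.

```python
def index_of_uppercase(text: str, start: int = 0) -> int:
    """Return the index of the first uppercase character at or after start, or -1."""
    for i in range(start, len(text)):
        if text[i].isupper():
            return i  # index is relative to the whole string, not to the tail
    return -1


print(index_of_uppercase("hello,World!"))     # 6
print(index_of_uppercase("hello,World!", 7))  # -1
```

Scanning from `start` instead of slicing keeps the returned index in the coordinates of the original string.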

Source: https://stackoverflow.com/questions/484085/an-algorithm-to-spacify-camelcased-strings

Tags

algorithm

language-agnostic

string