C# Sanitize File Name

谎友^ 2020-12-04 06:06

I have recently been moving a bunch of MP3s from various locations into a repository. I had been constructing the new file names using the ID3 tags (thanks, TagLib-Sharp!),

12 Answers
  • 2020-12-04 06:29

    Based on Andre's excellent answer but taking into account Spud's comment on reserved words, I made this version:

    /// <summary>
    /// Strip illegal chars and reserved words from a candidate filename (should not include the directory path)
    /// </summary>
    /// <remarks>
    /// http://stackoverflow.com/questions/309485/c-sharp-sanitize-file-name
    /// </remarks>
    public static string CoerceValidFileName(string filename)
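    {
        // Requires: using System.IO; using System.Text.RegularExpressions;
        // Character class matching one or more consecutive invalid filename characters.
        var invalidChars = Regex.Escape(new string(Path.GetInvalidFileNameChars()));
        var invalidReStr = string.Format(@"[{0}]+", invalidChars);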
    
        var reservedWords = new []
        {
            "CON", "PRN", "AUX", "CLOCK$", "NUL", "COM0", "COM1", "COM2", "COM3", "COM4",
            "COM5", "COM6", "COM7", "COM8", "COM9", "LPT0", "LPT1", "LPT2", "LPT3", "LPT4",
            "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };
    
        var sanitisedNamePart = Regex.Replace(filename, invalidReStr, "_");
        foreach (var reservedWord in reservedWords)
        {
            // Escape the word first: the '$' in "CLOCK$" would otherwise act as a regex anchor.
            var reservedWordPattern = string.Format("^{0}\\.", Regex.Escape(reservedWord));
            sanitisedNamePart = Regex.Replace(sanitisedNamePart, reservedWordPattern, "_reservedWord_.", RegexOptions.IgnoreCase);
        }
    
        return sanitisedNamePart;
    }
    

    And these are my unit tests:

    [Test]
    public void CoerceValidFileName_SimpleValid()
    {
        var filename = @"thisIsValid.txt";
        var result = PathHelper.CoerceValidFileName(filename);
        Assert.AreEqual(filename, result);
    }
    
    [Test]
    public void CoerceValidFileName_SimpleInvalid()
    {
        var filename = @"thisIsNotValid\3\\_3.txt";
        var result = PathHelper.CoerceValidFileName(filename);
        Assert.AreEqual("thisIsNotValid_3__3.txt", result);
    }
    
    [Test]
    public void CoerceValidFileName_InvalidExtension()
    {
        var filename = @"thisIsNotValid.t\xt";
        var result = PathHelper.CoerceValidFileName(filename);
        Assert.AreEqual("thisIsNotValid.t_xt", result);
    }
    
    [Test]
    public void CoerceValidFileName_KeywordInvalid()
    {
        var filename = "aUx.txt";
        var result = PathHelper.CoerceValidFileName(filename);
        Assert.AreEqual("_reservedWord_.txt", result);
    }
    
    [Test]
    public void CoerceValidFileName_KeywordValid()
    {
        var filename = "auxillary.txt";
        var result = PathHelper.CoerceValidFileName(filename);
        Assert.AreEqual("auxillary.txt", result);
    }
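
    One gap worth noting: the pattern above only matches a reserved word that is followed by a dot, yet Windows also treats a bare `CON` or `NUL` (no extension) as a device name, and `NUL.tar.gz` counts as well. A minimal sketch of a check that covers these cases follows. The class `ReservedNames` and its method `IsReserved` are hypothetical names introduced here for illustration, not part of the answer; the sketch compares only the part of the name before the first dot.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper for illustration; not part of the original answer.
    public static class ReservedNames
    {
        // Same word list as above. OrdinalIgnoreCase makes "aUx" match "AUX".
        private static readonly HashSet<string> Reserved =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase)
            {
                "CON", "PRN", "AUX", "CLOCK$", "NUL",
                "COM0", "COM1", "COM2", "COM3", "COM4",
                "COM5", "COM6", "COM7", "COM8", "COM9",
                "LPT0", "LPT1", "LPT2", "LPT3", "LPT4",
                "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
            };

        public static bool IsReserved(string filename)
        {
            // Look only at the text before the first dot, so that
            // "CON", "con.txt" and "NUL.tar.gz" are all caught.
            int dot = filename.IndexOf('.');
            string stem = dot < 0 ? filename : filename.Substring(0, dot);
            return Reserved.Contains(stem);
        }
    }
    ```

    With this, `ReservedNames.IsReserved("CON")` and `ReservedNames.IsReserved("aUx.txt")` are both true, while `ReservedNames.IsReserved("auxillary.txt")` is false. What to do with a reserved name (prefix it, rename it, reject it) is left to the caller.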
    
  • 2020-12-04 06:33

    I have had success with this in the past.

    Nice, short and static :-)

        public static string returnSafeString(string s)
        {
            foreach (char character in Path.GetInvalidFileNameChars())
            {
                s = s.Replace(character.ToString(),string.Empty);
            }
    
            foreach (char character in Path.GetInvalidPathChars())
            {
                s = s.Replace(character.ToString(), string.Empty);
            }
    
            return (s);
        }
    
  • 2020-12-04 06:39

    I think the problem is that you first call Path.GetDirectoryName on the bad string. If it contains characters that are not valid in a file name, .NET can't tell which parts of the string are directories and throws. You have to locate the file name with plain string operations instead.

    Assuming it's only the filename that is bad, not the entire path, try this:

    public static string SanitizePath(string path, char replaceChar)
    {
        int filenamePos = path.LastIndexOf(Path.DirectorySeparatorChar) + 1;
        var sb = new System.Text.StringBuilder();
        sb.Append(path.Substring(0, filenamePos));
        for (int i = filenamePos; i < path.Length; i++)
        {
            char filenameChar = path[i];
            foreach (char c in Path.GetInvalidFileNameChars())
                if (filenameChar.Equals(c))
                {
                    filenameChar = replaceChar;
                    break;
                }
    
            sb.Append(filenameChar);
        }
    
        return sb.ToString();
    }
  • 2020-12-04 06:42
    string clean = String.Concat(dirty.Split(Path.GetInvalidFileNameChars()));
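
    The one-liner works because `Split` cuts the string at every invalid character and `String.Concat` joins the remaining pieces, so the invalid characters simply disappear (they are removed, not replaced). A small sketch of it wrapped in a method; the names `FileNameCleaner` and `RemoveInvalidChars` and the sample input are made up for illustration:

    ```csharp
    using System;
    using System.IO;

    // Hypothetical wrapper around the one-liner above.
    public static class FileNameCleaner
    {
        public static string RemoveInvalidChars(string dirty)
        {
            // Split on every invalid character, then glue the pieces back together.
            return String.Concat(dirty.Split(Path.GetInvalidFileNameChars()));
        }
    }
    ```

    For example, `FileNameCleaner.RemoveInvalidChars("AC/DC - Back in Black.mp3")` gives `ACDC - Back in Black.mp3`. The `/` is used here because it is an invalid file name character on both Windows and Unix-like systems; the rest of the invalid set differs by platform.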
    
  • 2020-12-04 06:48

    I wanted to retain the characters in some way rather than simply replacing each one with an underscore.

    One approach I came up with was to replace the invalid characters with similar-looking characters that are (in my situation) unlikely to be used as regular characters. So I took the list of invalid characters and found look-alikes.

    The following functions encode and decode using the look-alikes.

    This code does not cover every character returned by System.IO.Path.GetInvalidFileNameChars(), so it is up to you to extend the mapping or fall back to the underscore replacement for any remaining characters.

    private static Dictionary<string, string> EncodeMapping()
    {
        //-- The following characters are invalid in Windows file and folder names.
        //-- \/:*?"<>|
        Dictionary<string, string> dic = new Dictionary<string, string>();
        dic.Add(@"\", "Ì"); // U+00CC
        dic.Add("/", "Í"); // U+00CD
        dic.Add(":", "¦"); // U+00A6
        dic.Add("*", "¤"); // U+00A4
        dic.Add("?", "¿"); // U+00BF
        dic.Add(@"""", "ˮ"); // U+02EE
        dic.Add("<", "«"); // U+00AB
        dic.Add(">", "»"); // U+00BB
        dic.Add("|", "│"); // U+2502
        return dic;
    }
    
    public static string Escape(string name)
    {
        foreach (KeyValuePair<string, string> replace in EncodeMapping())
        {
            name = name.Replace(replace.Key, replace.Value);
        }
    
        //-- handle dot at the end
        if (name.EndsWith(".")) name = name.Substring(0, name.Length - 1) + "°"; // U+00B0
    
        return name;
    }
    
    public static string UnEscape(string name)
    {
        foreach (KeyValuePair<string, string> replace in EncodeMapping())
        {
            name = name.Replace(replace.Value, replace.Key);
        }
    
        //-- handle dot at the end
        if (name.EndsWith("°")) name = name.Substring(0, name.Length - 1) + ".";
    
        return name;
    }
    

    You can select your own look-alikes. I used the Character Map app in Windows (%windir%\system32\charmap.exe) to select mine.

    As I make adjustments through discovery, I will update this code.
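
    Since the mapping covers only nine characters, control characters and anything else reported by `Path.GetInvalidFileNameChars()` still pass through `Escape` untouched. One possible way to close that gap is sketched below, assuming a lossy underscore is acceptable for the leftovers. The names `FileNameFallback` and `ReplaceRemaining` are hypothetical, and anything replaced this way cannot be restored by `UnEscape`.

    ```csharp
    using System;
    using System.IO;
    using System.Linq;

    // Hypothetical fallback for characters the look-alike mapping does not cover.
    public static class FileNameFallback
    {
        // Intended to run on the result of Escape(name):
        // any character that is still invalid becomes '_'.
        public static string ReplaceRemaining(string escapedName)
        {
            char[] invalid = Path.GetInvalidFileNameChars();
            return new string(
                escapedName
                    .Select(c => Array.IndexOf(invalid, c) >= 0 ? '_' : c)
                    .ToArray());
        }
    }
    ```

    It would be called as `FileNameFallback.ReplaceRemaining(Escape(name))`, so the look-alikes are applied first and the underscore is used only for what remains.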

  • 2020-12-04 06:49

    There are a lot of working solutions here. Just for the sake of completeness, here's an approach that doesn't use regex but uses LINQ:

    var invalids = Path.GetInvalidFileNameChars();
    filename = invalids.Aggregate(filename, (current, c) => current.Replace(c, '_'));
    

    Also, it's a very short solution ;)
