Creating a hash for a folder

梦谈多话 2020-12-13 05:05

I need to create a hash for a folder that contains some files. I have already done this for each individual file, but I am looking for a way to create a single hash for all the files in the folder.

7 Answers
  • 2020-12-13 05:22

    Concatenate the file names and file contents into one big string and hash that, or feed the data to the hash in chunks for performance.

    There are a few things to take into account:

    • Sort the files by name, so you don't get two different hashes when the file order changes.
    • This method only takes the file names and contents into account. If the file names don't matter, sort by content first and then hash. If other attributes (ctime/mtime/hidden/archived, ...) matter, include them in the string to be hashed.
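
    The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the names `FolderHashSketch` and `HashFolder` are made up for this example, SHA-256 is my choice of algorithm (any other `HashAlgorithmName` could be swapped in), and `Path.GetRelativePath` requires .NET Core 2.0 or later. Relative names are hashed so that moving the folder does not change the result, and file contents are fed to the hash in chunks.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of any library.
public static class FolderHashSketch
{
    // Hashes relative file names plus file contents, in sorted name order.
    public static string HashFolder(string root)
    {
        // Pair each file with its root-relative name, using '/' on every OS,
        // then sort ordinally so the order does not depend on culture settings.
        var entries = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Select(p => new
            {
                FullPath = p,
                Name = Path.GetRelativePath(root, p).Replace(Path.DirectorySeparatorChar, '/')
            })
            .OrderBy(e => e.Name, StringComparer.Ordinal)
            .ToList();

        // SHA-256 is an assumption of this sketch, not something the answer prescribes.
        using (var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
        {
            var buffer = new byte[81920];
            foreach (var entry in entries)
            {
                // File name first, then the content.
                hasher.AppendData(Encoding.UTF8.GetBytes(entry.Name));

                // Feed the content in chunks instead of loading the whole file.
                using (var stream = File.OpenRead(entry.FullPath))
                {
                    int read;
                    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                        hasher.AppendData(buffer, 0, read);
                }
            }

            // Lowercase hex string of the final digest.
            return BitConverter.ToString(hasher.GetHashAndReset())
                .Replace("-", "").ToLowerInvariant();
        }
    }
}
```

    Calling `FolderHashSketch.HashFolder(@"C:\some\folder")` should give the same string for two folders with identical relative names and contents, regardless of where the folders live or the order in which the file system lists the files.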
  • 2020-12-13 05:29

    Create a tarball of the files, then hash the tarball.

    > tar cf hashes *.abc
    > md5sum hashes

    Or hash the individual files and pipe the output into the hash command.

    > md5sum *.abc | md5sum

    Edit: neither approach above sorts the files explicitly, so the hash depends on the order in which the shell expands the asterisk. That order can differ between shells and locale settings, so the result may not be the same in every environment.

  • 2020-12-13 05:29

    If you already have hashes for all the files, just sort the hashes alphabetically, concatenate them and hash them again to create an uber hash.
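
    A minimal sketch of that idea, assuming the per-file hashes are already available as hex strings. The names `HashCombinerSketch` and `CombineHashes` are hypothetical, and SHA-256 for the final step is my choice rather than anything the answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of any library.
public static class HashCombinerSketch
{
    // Combines existing per-file hashes (hex strings) into one hash.
    public static string CombineHashes(IEnumerable<string> fileHashes)
    {
        // Normalize case, then sort ordinally so the input order is irrelevant.
        var sorted = fileHashes
            .Select(h => h.ToLowerInvariant())
            .OrderBy(h => h, StringComparer.Ordinal);

        // Concatenate the sorted hex strings and hash the result once more.
        var joined = string.Concat(sorted);
        using (var sha = SHA256.Create())
        {
            var digest = sha.ComputeHash(Encoding.ASCII.GetBytes(joined));
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

    Because only the content hashes go in, renaming a file does not change the combined hash. If names should count, they would have to be included in the data that is hashed.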

  • 2020-12-13 05:42

    Here's a solution that uses streaming to avoid memory and latency issues.

    By default the file paths are included in the hash, so it reflects not only the data in the files but also the file system entries themselves. This helps avoid collisions between folders that hold the same data under different names. This post is tagged security, so this ought to be important.

    Finally, this solution puts you in control of the hashing algorithm, of which files get hashed, and of the order in which they are hashed.

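    using System.Collections.Generic;
    using System.IO;
    using System.Security.Cryptography;
    using System.Text;
    using System.Threading.Tasks;

    public static class HashAlgorithmExtensions
    {
        // Streams every file through the hash algorithm, in the order given.
        public static async Task<byte[]> ComputeHashAsync(this HashAlgorithm alg, IEnumerable<FileInfo> files, bool includePaths = true)
        {
            // Stream.Null discards the output; the CryptoStream only feeds the hash.
            using (var cs = new CryptoStream(Stream.Null, alg, CryptoStreamMode.Write))
            {
                foreach (var file in files)
                {
                    if (includePaths)
                    {
                        // FullName is the absolute path, so the hash changes if the folder is moved.
                        var pathBytes = Encoding.UTF8.GetBytes(file.FullName);
                        cs.Write(pathBytes, 0, pathBytes.Length);
                    }

                    // Copy the file content into the hash without loading it all into memory.
                    using (var fs = file.OpenRead())
                        await fs.CopyToAsync(cs);
                }

                // Finalize the hash so that alg.Hash becomes available.
                cs.FlushFinalBlock();
            }

            return alg.Hash;
        }
    }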
    

    An example that hashes all the files in a folder:

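    async Task<byte[]> HashFolder(DirectoryInfo folder, string searchPattern = "*", SearchOption searchOption = SearchOption.TopDirectoryOnly)
    {
        // The order of EnumerateFiles is not guaranteed, so sort by path
        // (needs using System and using System.Linq).
        var files = folder.EnumerateFiles(searchPattern, searchOption)
            .OrderBy(f => f.FullName, StringComparer.Ordinal);

        using (var alg = MD5.Create())
            return await alg.ComputeHashAsync(files);
    }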
    
  • 2020-12-13 05:43

    Dunc's answer works well; however, it does not handle an empty directory. The code below returns the MD5 'd41d8cd98f00b204e9800998ecf8427e' (the MD5 of zero-length input) for an empty directory.

    public static string CreateDirectoryMd5(string srcPath)
    {
        var filePaths = Directory.GetFiles(srcPath, "*", SearchOption.AllDirectories).OrderBy(p => p).ToArray();
    
        using (var md5 = MD5.Create())
        {
            foreach (var filePath in filePaths)
            {
                // hash path
                byte[] pathBytes = Encoding.UTF8.GetBytes(filePath);
                md5.TransformBlock(pathBytes, 0, pathBytes.Length, pathBytes, 0);
    
                // hash contents
                byte[] contentBytes = File.ReadAllBytes(filePath);
    
                md5.TransformBlock(contentBytes, 0, contentBytes.Length, contentBytes, 0);
            }
    
            //Handles empty filePaths case
            md5.TransformFinalBlock(new byte[0], 0, 0);
    
            return BitConverter.ToString(md5.Hash).Replace("-", "").ToLower();
        }
    }
    
  • 2020-12-13 05:46

    A quick-and-dirty folder hash that does not descend into subfolders or read file contents. It is based only on the names of the files and subfolders.

    Public Function GetFolderHash(ByVal sFolder As String) As String
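        ' Sort by name so the hash does not depend on the enumeration order.
        Dim oFiles As List(Of String) = IO.Directory.GetFiles(sFolder).OrderBy(Function(x) x, StringComparer.Ordinal).ToList()
        Dim oFolders As List(Of String) = IO.Directory.GetDirectories(sFolder).OrderBy(Function(x) x, StringComparer.Ordinal).ToList()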
        oFiles.AddRange(oFolders)
    
        If oFiles.Count = 0 Then
            Return ""
        End If
    
        Dim oDM5 As System.Security.Cryptography.MD5 = System.Security.Cryptography.MD5.Create()
        For i As Integer = 0 To oFiles.Count - 1
            Dim sFile As String = oFiles(i)
            Dim sRelativePath As String = sFile.Substring(sFolder.Length + 1)
            Dim oPathBytes As Byte() = System.Text.Encoding.UTF8.GetBytes(sRelativePath.ToLower())
    
            If i = oFiles.Count - 1 Then
                oDM5.TransformFinalBlock(oPathBytes, 0, oPathBytes.Length)
            Else
                oDM5.TransformBlock(oPathBytes, 0, oPathBytes.Length, oPathBytes, 0)
            End If
        Next
    
        Return BitConverter.ToString(oDM5.Hash).Replace("-", "").ToLower()
    End Function
    