I am writing a C# application that needs to read about 130,000 (String, Int32) pairs at startup to a Dictionary. The pairs are stored in a .txt file, and are thus easily mod
If you want the data stored relatively safely, you can encrypt the contents. If you encrypt the whole file as a string and decrypt it just before your current parsing logic runs, the rest of your code stays unchanged, and the performance impact should be small.
See Encrypt and decrypt a string for more information.
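A minimal sketch of that idea, assuming AES from System.Security.Cryptography and a key you already have. The names EncryptedText, Encrypt and Decrypt are made up for illustration, and where the key is stored is deliberately left out:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class EncryptedText
{
    // Encrypts the text with AES (CBC mode, PKCS7 padding, the defaults).
    // A fresh random IV is written in front of the ciphertext.
    public static byte[] Encrypt(string plainText, byte[] key)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key; // must be 16, 24 or 32 bytes long
            aes.GenerateIV();
            using (var output = new MemoryStream())
            {
                output.Write(aes.IV, 0, aes.IV.Length);
                using (var encryptor = aes.CreateEncryptor())
                using (var crypto = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
                {
                    byte[] bytes = Encoding.UTF8.GetBytes(plainText);
                    crypto.Write(bytes, 0, bytes.Length);
                    crypto.FlushFinalBlock();
                }
                // ToArray still works after the CryptoStream has closed the MemoryStream.
                return output.ToArray();
            }
        }
    }

    // Reads the IV from the front of the data, then decrypts the remainder.
    public static string Decrypt(byte[] data, byte[] key)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            byte[] iv = new byte[aes.BlockSize / 8];
            Array.Copy(data, 0, iv, 0, iv.Length);
            aes.IV = iv;
            using (var input = new MemoryStream(data, iv.Length, data.Length - iv.Length))
            using (var decryptor = aes.CreateDecryptor())
            using (var crypto = new CryptoStream(input, decryptor, CryptoStreamMode.Read))
            using (var reader = new StreamReader(crypto, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

You would call Decrypt(File.ReadAllBytes(path), key) and hand the resulting string to your existing parser. Note that AES-CBC on its own hides the contents but does not reliably detect modification; if tampering is the real worry, you would also need a MAC or an authenticated mode.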
Encryption comes at the cost of key management. And, of course, even the fastest encryption/decryption algorithms are slower than no encryption at all. Same with compression, which will only help if you are I/O-bound.
If performance is your main concern, start by finding out where the bottleneck actually is. If the culprit really is the Convert.ToInt32() call, I imagine you can store the Int32 bits directly and read them back without any parsing, which should be faster than converting a string value. To obfuscate the strings, you can XOR each byte with some fixed value, which is fast but is nothing more than a speed bump for a determined attacker.
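Roughly what I have in mind, as a sketch only. ObfuscatedStore, Save, Load and the mask value 0x5A are all invented for this example; the ints are written as their four raw bytes and the key bytes are XORed with the fixed mask:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ObfuscatedStore
{
    // Arbitrary fixed value (an assumption for this sketch). This is obfuscation, not encryption.
    private const byte Mask = 0x5A;

    public static void Save(Stream stream, IDictionary<string, int> data)
    {
        var writer = new BinaryWriter(stream);
        writer.Write(data.Count);
        foreach (var pair in data)
        {
            byte[] keyBytes = Encoding.UTF8.GetBytes(pair.Key);
            for (int i = 0; i < keyBytes.Length; i++) keyBytes[i] ^= Mask;
            writer.Write(keyBytes.Length); // length prefix for the masked key
            writer.Write(keyBytes);
            writer.Write(pair.Value);      // four raw bytes, no text parsing on load
        }
        writer.Flush(); // the caller owns the stream, so it is not closed here
    }

    public static Dictionary<string, int> Load(Stream stream)
    {
        var reader = new BinaryReader(stream);
        int count = reader.ReadInt32();
        var data = new Dictionary<string, int>(count);
        for (int n = 0; n < count; n++)
        {
            int length = reader.ReadInt32();
            byte[] keyBytes = reader.ReadBytes(length);
            for (int i = 0; i < keyBytes.Length; i++) keyBytes[i] ^= Mask; // XOR again to undo
            string key = Encoding.UTF8.GetString(keyBytes);
            data.Add(key, reader.ReadInt32());
        }
        return data;
    }
}
```

Anyone who opens the file in a hex editor and tries a single-byte XOR will recover the strings, so treat this as a way to discourage casual edits only.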
Is it safe enough to use BinaryFormatter instead of storing the contents directly in the text file? Obviously not. Others can still easily corrupt the file by opening it in Notepad and adding something, even though they will see only strange characters. It's better to store the data in a database. But if you insist on your solution, you may be able to improve performance considerably by using Parallel Programming in C# 4.0 (plenty of useful examples are easy to find online). Something like this:
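// Just an example: split the dictionary into three groups by the first
// character of each key, then write each group to its own file in parallel.
// Needs System.Linq and System.Threading.Tasks; keys are assumed non-empty.
Dictionary<string, int> source = GetTheDict();
var grouped = source.GroupBy(x =>
{
    char first = x.Key.First();
    if (first >= 'a' && first <= 'z') return "File1";
    if (first >= 'A' && first <= 'Z') return "File2";
    return "File3";
});
Parallel.ForEach(grouped, g =>
{
    // Placeholder: writes one group to its own file.
    ThreeStreamsToWriteToThreeFilesParallelly(g);
});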
Another alternative to Parallel is to create several threads yourself; reading from and writing to different files concurrently can be faster.
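A sketch of the plain-thread variant for the read side, assuming each part file holds a count followed by (string, int) pairs written with BinaryWriter and that no key appears in more than one file. ThreadedLoader, ReadPart and Load are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

public static class ThreadedLoader
{
    // Reads one part file: an Int32 count, then that many (string, int) pairs.
    private static Dictionary<string, int> ReadPart(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            int count = reader.ReadInt32();
            var part = new Dictionary<string, int>(count);
            for (int n = 0; n < count; n++)
            {
                string key = reader.ReadString();
                part.Add(key, reader.ReadInt32());
            }
            return part;
        }
    }

    // Starts one thread per file, waits for all of them, then merges the results.
    public static Dictionary<string, int> Load(string[] paths)
    {
        var parts = new Dictionary<string, int>[paths.Length];
        var threads = new Thread[paths.Length];
        for (int i = 0; i < paths.Length; i++)
        {
            int index = i; // copy, so each lambda captures its own value
            threads[i] = new Thread(() => parts[index] = ReadPart(paths[index]));
            threads[i].Start();
        }
        foreach (var thread in threads) thread.Join();

        // Merging happens on the calling thread, so the final dictionary needs no locking.
        var merged = new Dictionary<string, int>();
        foreach (var part in parts)
            foreach (var pair in part)
                merged.Add(pair.Key, pair.Value);
        return merged;
    }
}
```

Whether this is actually faster depends on the disk and on how much of the time goes into parsing rather than I/O, so measure before committing to it. An exception inside one of the threads is not handled here and would take the process down.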
Perhaps something like:
// Needs: using System.Collections.Generic; using System.IO;
static void Serialize(string path, IDictionary<string, int> data)
{
    using (var file = File.Create(path))
    using (var writer = new BinaryWriter(file))
    {
        // Write the count first so the reader can pre-size the dictionary.
        writer.Write(data.Count);
        foreach (var pair in data)
        {
            writer.Write(pair.Key);
            writer.Write(pair.Value);
        }
    }
}
static IDictionary<string, int> Deserialize(string path)
{
    using (var file = File.OpenRead(path))
    using (var reader = new BinaryReader(file))
    {
        int count = reader.ReadInt32();
        var data = new Dictionary<string, int>(count);
        while (count-- > 0)
        {
            data.Add(reader.ReadString(), reader.ReadInt32());
        }
        return data;
    }
}
Note that this does nothing about encryption; that is a separate concern. You might also find that adding deflate into the mix reduces file I/O and improves performance:
// Also needs: using System.IO.Compression;
static void Serialize(string path, IDictionary<string, int> data)
{
    using (var file = File.Create(path))
    using (var deflate = new DeflateStream(file, CompressionMode.Compress))
    using (var writer = new BinaryWriter(deflate))
    {
        writer.Write(data.Count);
        foreach (var pair in data)
        {
            writer.Write(pair.Key);
            writer.Write(pair.Value);
        }
    }
}
static IDictionary<string, int> Deserialize(string path)
{
    using (var file = File.OpenRead(path))
    using (var deflate = new DeflateStream(file, CompressionMode.Decompress))
    using (var reader = new BinaryReader(deflate))
    {
        int count = reader.ReadInt32();
        var data = new Dictionary<string, int>(count);
        while (count-- > 0)
        {
            data.Add(reader.ReadString(), reader.ReadInt32());
        }
        return data;
    }
}
Well, using a BinaryFormatter isn't really a safe way to store the pairs, as anyone can write a very simple program to deserialize the file (after, say, running Reflector on your code to get the type).
How about encrypting the text file, with something like this, for example? (For maximum performance, try it without compression.)
Interesting question. I did some quick tests and you are right: BinaryFormatter is surprisingly slow:
When I coded it with a StreamReader/StreamWriter with comma separated values I got:
But then I tried just using a BinaryWriter/BinaryReader:
The code for that looks like this:
public void Serialize(Dictionary<string, int> dictionary, Stream stream)
{
    BinaryWriter writer = new BinaryWriter(stream);
    writer.Write(dictionary.Count);
    foreach (var kvp in dictionary)
    {
        writer.Write(kvp.Key);
        writer.Write(kvp.Value);
    }
    writer.Flush();
}
public Dictionary<string, int> Deserialize(Stream stream)
{
    BinaryReader reader = new BinaryReader(stream);
    int count = reader.ReadInt32();
    var dictionary = new Dictionary<string, int>(count);
    for (int n = 0; n < count; n++)
    {
        var key = reader.ReadString();
        var value = reader.ReadInt32();
        dictionary.Add(key, value);
    }
    return dictionary;
}
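If you want to repeat this kind of comparison on your own data, a small Stopwatch harness along these lines is enough. It is only a sketch; QuickTimer and Measure are names I made up, and it reports the best of several runs after one untimed warm-up call:

```csharp
using System;
using System.Diagnostics;

public static class QuickTimer
{
    // Calls the action once untimed (to absorb JIT compilation and cache warm-up),
    // then times `runs` further calls and returns the shortest elapsed time.
    public static TimeSpan Measure(Action action, int runs)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException("runs");

        action(); // warm-up, not timed

        TimeSpan best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            if (watch.Elapsed < best) best = watch.Elapsed;
        }
        return best;
    }
}
```

For example, with the serialized bytes already in memory, something like QuickTimer.Measure(() => Deserialize(new MemoryStream(bytes)), 5) times the parsing alone, without disk I/O.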
As others have said, though, if you are concerned about users tampering with the file, encryption, rather than binary formatting, is the way forward.