Best practices for serializing objects to a custom string format for use in an output file

Submitted by 魔方 西西 on 2019-11-27 00:09:57

Here is a generic way to create CSV from a list of objects, using reflection:

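    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Reflection;
    using System.Text;

    public static class CsvExport // container class; the name is arbitrary
    {
        public static string ToCsv<T>(string separator, IEnumerable<T> objectlist)
        {
            Type t = typeof(T);
            // Only public fields are used; properties are ignored.
            FieldInfo[] fields = t.GetFields();

            // Header row: the field names joined by the separator.
            string header = String.Join(separator, fields.Select(f => f.Name).ToArray());

            StringBuilder csvdata = new StringBuilder();
            csvdata.AppendLine(header);

            // One data row per object.
            foreach (var o in objectlist)
                csvdata.AppendLine(ToCsvFields(separator, fields, o));

            return csvdata.ToString();
        }

        public static string ToCsvFields(string separator, FieldInfo[] fields, object o)
        {
            StringBuilder line = new StringBuilder();
            bool first = true;

            foreach (var f in fields)
            {
                // Separator before every field except the first. Testing line.Length
                // instead would skip separators after null/empty leading values.
                if (!first)
                    line.Append(separator);
                first = false;

                var x = f.GetValue(o);

                // Null values become empty cells.
                if (x != null)
                    line.Append(x.ToString());
            }

            return line.ToString();
        }
    }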

Many variations can be made, such as writing out directly to a file in ToCsv(), or replacing the StringBuilder with an IEnumerable and yield statements.
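As a sketch of the first variation, assuming you want to stream rows straight to a file instead of building one large string, the method can take a TextWriter. The CsvStreaming.WriteCsv name and signature are illustrative, not from the original answer:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public static class CsvStreaming
{
    // Illustrative variant: same layout as ToCsv above, but each line is written
    // to the TextWriter as it is produced instead of being collected in a StringBuilder.
    public static void WriteCsv<T>(TextWriter writer, string separator, IEnumerable<T> objectlist)
    {
        FieldInfo[] fields = typeof(T).GetFields();

        // Header row: public field names.
        for (int i = 0; i < fields.Length; i++)
        {
            if (i > 0) writer.Write(separator);
            writer.Write(fields[i].Name);
        }
        writer.WriteLine();

        // One row per object; null values become empty cells.
        foreach (var o in objectlist)
        {
            for (int i = 0; i < fields.Length; i++)
            {
                if (i > 0) writer.Write(separator);
                object value = fields[i].GetValue(o);
                if (value != null) writer.Write(value.ToString());
            }
            writer.WriteLine();
        }
    }
}
```

Usage would be along the lines of using (TextWriter tw = File.CreateText(@"C:\testoutput.csv")) CsvStreaming.WriteCsv(tw, ",", objects); and memory use then stays flat regardless of list size. As in the original, GetFields() returns public fields only, and reflection does not guarantee their order.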

Here is a simplified version of Per Hejndorf's CSV idea (without the memory overhead, as it yields each line in turn). Due to popular demand, it also supports both public fields and simple public properties, combined with Concat.

Update 18 May 2017

This example was never intended to be a complete solution, just a step beyond the original idea posted by Per Hejndorf. To generate valid CSV you need to double any text-delimiter characters (quotes) within a value, e.g. with a simple .Replace("\"", "\"\""), and also enclose in quotes any value that contains the separator, a quote or a line break.
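A minimal sketch of that escaping, assuming the double quote is the text delimiter and following the common RFC 4180 convention; the CsvText.CsvEscape name is hypothetical:

```csharp
using System;

public static class CsvText
{
    // Hypothetical helper: quotes a value only when it contains the separator,
    // a double quote, or a line break, doubling any embedded quotes (RFC 4180 style).
    public static string CsvEscape(string value, string separator = ",")
    {
        if (value == null) return string.Empty;

        bool needsQuotes = value.Contains(separator)
            || value.Contains("\"")
            || value.Contains("\n")
            || value.Contains("\r");

        if (!needsQuotes) return value;

        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

Each value could then be passed through it before joining, e.g. fields.Select(f => CsvText.CsvEscape(Convert.ToString(f.GetValue(o)), separator)).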

Update 12 Feb 2016

After using my own code again in a project today, I realised I should not have taken anything for granted when I started from the example of @Per Hejndorf. It makes more sense to assume a default delimiter of "," (comma) and make the delimiter the second, optional, parameter. My own library version also provides a third parameter, header, that controls whether a header row is returned, as sometimes you only want the data.

For example:

    public static IEnumerable<string> ToCsv<T>(IEnumerable<T> objectlist, string separator = ",", bool header = true)
    {
        FieldInfo[] fields = typeof(T).GetFields();
        PropertyInfo[] properties = typeof(T).GetProperties();
        if (header)
        {
            yield return String.Join(separator, fields.Select(f => f.Name).Concat(properties.Select(p => p.Name)).ToArray());
        }
        foreach (var o in objectlist)
        {
            yield return string.Join(separator, fields.Select(f => (f.GetValue(o) ?? "").ToString())
                .Concat(properties.Select(p => (p.GetValue(o, null) ?? "").ToString())).ToArray());
        }
    }

You then use it like this for comma-delimited output:

    foreach (var line in ToCsv(objects))
    {
        Console.WriteLine(line);
    }

or like this for another delimiter (e.g. TAB):

    foreach (var line in ToCsv(objects, "\t"))
    {
        Console.WriteLine(line);
    }

Practical examples

Write the list to a comma-delimited CSV file:

    // Verbatim string (@"...") so that "\t" in the path is not read as a tab escape.
    using (TextWriter tw = File.CreateText(@"C:\testoutput.csv"))
    {
        foreach (var line in ToCsv(objects))
        {
            tw.WriteLine(line);
        }
    }

Or write it tab-delimited:

    using (TextWriter tw = File.CreateText(@"C:\testoutput.txt"))
    {
        foreach (var line in ToCsv(objects, "\t"))
        {
            tw.WriteLine(line);
        }
    }

If you have complex fields/properties you will need to filter them out of the select clauses.
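One way to do that filtering, as a sketch: keep only members whose type is a primitive, an enum, or one of a few hand-picked types such as string, decimal and DateTime, and skip indexers, which GetValue(o, null) cannot read. What counts as "simple" is an assumption you would adjust, and CsvMembers is a hypothetical name:

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class CsvMembers
{
    // Assumption: these are the member types that are safe to write with ToString().
    static bool IsSimple(Type type)
    {
        // Treat int? like int, DateTime? like DateTime, and so on.
        Type t = Nullable.GetUnderlyingType(type) ?? type;
        return t.IsPrimitive || t.IsEnum
            || t == typeof(string) || t == typeof(decimal)
            || t == typeof(DateTime) || t == typeof(Guid);
    }

    // Public fields whose type is "simple".
    public static FieldInfo[] SimpleFields<T>() =>
        typeof(T).GetFields().Where(f => IsSimple(f.FieldType)).ToArray();

    // Readable, non-indexer public properties whose type is "simple".
    public static PropertyInfo[] SimpleProperties<T>() =>
        typeof(T).GetProperties()
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0 && IsSimple(p.PropertyType))
            .ToArray();
}
```

In ToCsv you would then use CsvMembers.SimpleFields<T>() and CsvMembers.SimpleProperties<T>() in place of typeof(T).GetFields() and typeof(T).GetProperties(), so the header and the rows stay in step.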


Previous versions and details below:

Here is a simplified version of Per Hejndorf's CSV idea. It avoids the memory overhead by yielding each line in turn, and has only 4 lines of code :)

    public static IEnumerable<string> ToCsv<T>(string separator, IEnumerable<T> objectlist)
    {
        FieldInfo[] fields = typeof(T).GetFields();
        yield return String.Join(separator, fields.Select(f => f.Name).ToArray());
        foreach (var o in objectlist)
        {
            yield return string.Join(separator, fields.Select(f => (f.GetValue(o) ?? "").ToString()).ToArray());
        }
    }

You can iterate it like this:

    foreach (var line in ToCsv(",", objects))
    {
        Console.WriteLine(line);
    }

where objects is a strongly typed list of objects.

This variation includes both public fields and simple public properties:

    public static IEnumerable<string> ToCsv<T>(string separator, IEnumerable<T> objectlist)
    {
        FieldInfo[] fields = typeof(T).GetFields();
        PropertyInfo[] properties = typeof(T).GetProperties();
        yield return String.Join(separator, fields.Select(f => f.Name).Concat(properties.Select(p => p.Name)).ToArray());
        foreach (var o in objectlist)
        {
            yield return string.Join(separator, fields.Select(f => (f.GetValue(o) ?? "").ToString())
                .Concat(properties.Select(p => (p.GetValue(o, null) ?? "").ToString())).ToArray());
        }
    }

As a rule of thumb, I advocate overriding ToString only as a debugging aid; if the string is needed for business logic, it should be an explicit method on the class or interface.

For simple serialization like this, I'd suggest a separate class that knows about both your CSV output library and your business objects and does the serialization, rather than pushing the serialization into the business objects themselves.

This way you end up with a class per output format that produces a view of your model.
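A minimal sketch of that separation, using a hypothetical Person business object and a hypothetical PersonCsvFormatter. The business class knows nothing about CSV; a tab-delimited or other format would get its own formatter class:

```csharp
using System.Collections.Generic;

// Hypothetical business object: no knowledge of any output format.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
}

// Hypothetical formatter: one class per output format, producing a view of the model.
public class PersonCsvFormatter
{
    public IEnumerable<string> Format(IEnumerable<Person> people)
    {
        // Header row: the formatter, not Person, decides column names and order.
        yield return "LastName,FirstName,Age";
        foreach (var p in people)
            yield return string.Join(",", p.LastName, p.FirstName, p.Age.ToString());
    }
}
```

Changing the CSV layout then touches only the formatter, and Person stays free of serialization code.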

For more complex serialization, where you're trying to write out an object graph for persistence, I'd consider putting it in the business classes, but only if it makes for cleaner code.

The problem with the solutions I found so far is that they don't let you export a subset of properties, but only the entire object. Most of the time, when we need to export data in CSV, we need to "tailor" its format in a precise way, so I created this simple extension method that allows me to do that by passing an array of parameters of type Func<T, string> to specify the mapping.

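    // Needs System, System.Collections.Generic, System.Linq and System.Text;
    // as an extension method it must live in a non-nested static class.
    public static string ToCsv<T>(this IEnumerable<T> list, params Func<T, string>[] properties)
    {
        // Evaluate each mapping over the whole list: one column (list of cells) per Func.
        var columns = properties.Select(func => list.Select(func).ToList()).ToList();

        // With no mappings there are no columns, and columns.First() would throw.
        if (columns.Count == 0)
            return string.Empty;

        var stringBuilder = new StringBuilder();

        var rowsCount = columns.First().Count;

        for (var i = 0; i < rowsCount; i++)
        {
            // Take the i-th cell from every column to build row i.
            var rowCells = columns.Select(column => column[i]);

            stringBuilder.AppendLine(string.Join(",", rowCells));
        }

        return stringBuilder.ToString();
    }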

Usage:

    philosophers.ToCsv(x => x.LastName, x => x.FirstName)

Generates:

    Hayek,Friedrich
    Rothbard,Murray
    Brent,David
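The extension above writes no header row, and it evaluates every mapping over the whole list before assembling the rows. A possible row-by-row variant that also takes column headers is sketched below; ToCsvWithHeader and its parameters are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvMappingExtensions
{
    // Hypothetical variant: one header per selector, rows built one object at a time.
    public static string ToCsvWithHeader<T>(this IEnumerable<T> list,
        string[] headers, params Func<T, string>[] selectors)
    {
        if (headers.Length != selectors.Length)
            throw new ArgumentException("Need exactly one header per selector.", nameof(headers));

        var sb = new StringBuilder();
        sb.AppendLine(string.Join(",", headers));

        // Each object is visited once; every selector produces one cell.
        foreach (var item in list)
            sb.AppendLine(string.Join(",", selectors.Select(s => s(item))));

        return sb.ToString();
    }
}
```

Called as philosophers.ToCsvWithHeader(new[] { "LastName", "FirstName" }, x => x.LastName, x => x.FirstName), it would add a LastName,FirstName line above the rows shown, and an empty list simply yields the header.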

I had an issue with HiTech Magic's variation where, if two properties had the same value, only one would get populated. This seems to have fixed it:

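    public static IEnumerable<string> ToCsv<T>(string separator, IEnumerable<T> objectlist)
    {
        FieldInfo[] fields = typeof(T).GetFields();
        PropertyInfo[] properties = typeof(T).GetProperties();
        yield return String.Join(separator, fields.Select(f => f.Name).Union(properties.Select(p => p.Name)).ToArray());
        foreach (var o in objectlist)
        {
            // Values are combined with Concat, not Union: Union drops duplicate values,
            // so two members with the same value would collapse into one cell.
            // Fields are included too, so each row lines up with the header.
            yield return string.Join(separator, fields.Select(f => (f.GetValue(o) ?? "").ToString())
                .Concat(properties.Select(p => (p.GetValue(o, null) ?? "").ToString())).ToArray());
        }
    }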
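One plausible explanation for that symptom: Enumerable.Union has set semantics and drops duplicate elements, so if Union rather than Concat is used to combine the value sequences, two members with the same value end up in a single cell and the row no longer lines up with the header. Concat keeps every element. A small demonstration, with hypothetical helper names:

```csharp
using System.Linq;

public static class UnionVsConcat
{
    // Union: set semantics, so a repeated value is emitted only once.
    public static string JoinWithUnion(string[] first, string[] second) =>
        string.Join(",", first.Union(second));

    // Concat: every element is kept, one cell per member.
    public static string JoinWithConcat(string[] first, string[] second) =>
        string.Join(",", first.Concat(second));
}
```

JoinWithUnion(new[] { "42" }, new[] { "42", "x" }) returns "42,x", while JoinWithConcat returns "42,42,x"; for row values, Concat is the safe choice.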

Gone Coding's answer was very helpful. I made some changes to it to handle text gremlins (stray tabs and line breaks inside values) that would otherwise break the output.

    // Needs: using System; using System.Collections.Generic; using System.Linq;
    //        using System.Reflection; using System.Text.RegularExpressions;
    public static IEnumerable<string> ToCsv<T>(IEnumerable<T> objectlist, string separator = ",", bool header = true)
    {
        FieldInfo[] fields = typeof(T).GetFields();
        PropertyInfo[] properties = typeof(T).GetProperties();

        if (header)
        {
            // Every yielded string already ends with a newline, so write the
            // results with TextWriter.Write rather than WriteLine.
            yield return String.Join(separator, fields.Select(f => f.Name).Concat(properties.Select(p => p.Name)).ToArray())
                + Environment.NewLine;
        }
        foreach (var o in objectlist)
        {
            // The regex removes any misplaced tabs or line breaks inside a value,
            // which would really mess up a CSV conversion. Convert.ToString returns
            // an empty string for null, so no separate null check is needed.
            yield return string.Join(separator,
                fields.Select(f => Regex.Replace(Convert.ToString(f.GetValue(o)), @"\t|\n|\r", "").Trim())
                    .Concat(properties.Select(p => Regex.Replace(Convert.ToString(p.GetValue(o, null)), @"\t|\n|\r", "").Trim()))
                    .ToArray())
                + Environment.NewLine;
        }
    }
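Deleting the control characters outright can glue adjacent words together ("line1\nline2" becomes "line1line2"). If that matters, one alternative, sketched here with a hypothetical CleanCell helper, is to collapse each run of tabs and line breaks into a single space before trimming:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsvCleaning
{
    // Hypothetical helper: replaces each run of tabs/CR/LF with one space,
    // then trims, so "line1\r\nline2" becomes "line1 line2" rather than "line1line2".
    public static string CleanCell(object value) =>
        Regex.Replace(Convert.ToString(value) ?? string.Empty, @"[\t\r\n]+", " ").Trim();
}
```

It would slot in wherever the answer calls Regex.Replace(Convert.ToString(...), @"\t|\n|\r", "").Trim().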