Join 2 DataTables with many columns

Submitted by 十年热恋 on 2019-12-06 17:01:01

Question


I want to join two DataTables on a shared column. Table 1 has the columns Name and LastName plus many others; Table 2 has Name and Comment plus many others. I want to join them on the Name column so that the result contains Name, LastName, Comment, and all the other columns. I tried a LINQ left outer join, but I don't know how to write the select new part, because I don't know in advance how many other columns there are.

My Table 1:

Name   LastName ...
Niki   Row      ...
Hube   Slang    ...
Koke   Mi       ... 
...    ...      ...
...    ...      ...

Table 2:

Name   Comment  ...
Koke   "Hello"  ...
Niki   "Hi"     ...

Result should be:

Name   LastName   Comment ...
Niki    Row        "Hi"   ...
Hube    Slang             ...
Koke    Mi         "Hello"...
...     ...               ...

So I tried to concatenate the rows of the two tables, but I get an error saying that the array from Table 1 is longer than the array from Table 2. Is there another way to join them?

        foreach (DataRow tbE in Table1.Rows)
        {
            foreach (DataRow opT in Table2.Rows)
            {
                if (tbE["Name"].ToString() == opT["Name"].ToString())
                {
                    var row = Result.NewRow();
                    row.ItemArray = tbE.ItemArray
                                       .Concat(opT.ItemArray).ToArray();

                    Result.Rows.Add(row);
                }
                else
                    Result.ImportRow(tbE);

            }
        } 
        Result.Columns.Remove(Name); 

Answer 1:


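You could use the method below, which I recently wrote from scratch for another question on SO (so it is not thoroughly tested). It merges multiple tables by a common key column. If no key is specified, it just falls back to the default DataTable.Merge behavior. Because it is an extension method, it has to be declared inside a static class, and it needs the System, System.Collections.Generic, System.Data and System.Linq namespaces (AsEnumerable on a DataTable comes from System.Data.DataSetExtensions):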

public static DataTable MergeAll(this IList<DataTable> tables, String primaryKeyColumn)
{
    if (!tables.Any())
        throw new ArgumentException("Tables must not be empty", "tables");
    if(primaryKeyColumn != null)
        foreach(DataTable t in tables)
            if(!t.Columns.Contains(primaryKeyColumn))
                throw new ArgumentException("All tables must have the specified primarykey column " + primaryKeyColumn, "primaryKeyColumn");

    if(tables.Count == 1)
        return tables[0];

    DataTable table = new DataTable("TblUnion");
    table.BeginLoadData(); // Turns off notifications, index maintenance, and constraints while loading data
    foreach (DataTable t in tables)
    {
        table.Merge(t); // same as table.Merge(t, false, MissingSchemaAction.Add);
    }
    table.EndLoadData();

    if (primaryKeyColumn != null)
    {
        // since we might have no real primary keys defined, the rows now might have repeating fields
        // so now we're going to "join" these rows ...
        var pkGroups = table.AsEnumerable()
            .GroupBy(r => r[primaryKeyColumn]);
        var dupGroups = pkGroups.Where(g => g.Count() > 1);
        foreach (var grpDup in dupGroups)
        { 
            // use first row and modify it
            DataRow firstRow = grpDup.First();
            foreach (DataColumn c in table.Columns)
            {
                if (firstRow.IsNull(c))
                {
                    DataRow firstNotNullRow = grpDup.Skip(1).FirstOrDefault(r => !r.IsNull(c));
                    if (firstNotNullRow != null)
                        firstRow[c] = firstNotNullRow[c];
                }
            }
            // remove all but first row
            var rowsToRemove = grpDup.Skip(1);
            foreach(DataRow rowToRemove in rowsToRemove)
                table.Rows.Remove(rowToRemove);
        }
    }

    return table;
}
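
The grouping step matters because Merge on tables without a primary key only appends rows. A minimal sketch of that behavior with the question's sample data; the class and method names (MergeDemo, MergeWithoutKey) are made up for illustration:

```csharp
using System;
using System.Data;

public static class MergeDemo
{
    // Hypothetical helper: merges the question's sample tables the same way
    // the first half of MergeAll does, with no primary key defined anywhere.
    public static DataTable MergeWithoutKey()
    {
        var table1 = new DataTable();
        table1.Columns.Add("Name");
        table1.Columns.Add("LastName");
        table1.Rows.Add("Niki", "Row");
        table1.Rows.Add("Hube", "Slang");
        table1.Rows.Add("Koke", "Mi");

        var table2 = new DataTable();
        table2.Columns.Add("Name");
        table2.Columns.Add("Comment");
        table2.Rows.Add("Koke", "Hello");
        table2.Rows.Add("Niki", "Hi");

        var merged = new DataTable("TblUnion");
        merged.Merge(table1); // adds the Name and LastName columns and three rows
        merged.Merge(table2); // adds the Comment column and appends two more rows
        return merged;
    }
}
```

The result has five rows rather than three: Niki and Koke each appear twice, once with LastName filled and Comment null, and once the other way round. The second half of MergeAll folds each such group into its first row.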

You can call it in this way:

var tables = new[] { Table1, Table2 };
tables.MergeAll("Name");

Edit: I ran it in the debugger against your sample data, and it works :)

Sample data and test here:

var Table1 = new DataTable();
var Table2 = new DataTable();
Table1.Columns.Add("Name");
Table1.Columns.Add("LastName");

Table2.Columns.Add("Name");
Table2.Columns.Add("Comment");

Table1.Rows.Add("Niki", "Row");
Table1.Rows.Add("Hube", "Slang");
Table1.Rows.Add("Koke", "Mi");

Table2.Rows.Add("Koke", "Hello");
Table2.Rows.Add("Niki", "Hi");

var tables = new DataTable[] { Table1, Table2 };
DataTable merged = tables.MergeAll("Name");



Answer 2:


Here is my contribution. This partial code can join any two DataTables on the specified column names; you do not need to know the rest of the columns. Note that it calls a Minus extension method that is not shown here. Some of its features:

  1. The resulting DataTable does not contain duplicates of the join columns. For example, if you join on the "Name" column, the result has a single "Name" column instead of one copy from each table.
  2. If a column that is NOT used in the join exists in both tables, the column from the second table is renamed by appending "_2". To get different behavior, change that part of the code.
  3. Multiple join columns are supported. For this purpose, a JoinKey class wraps the key values so that LINQ can compare them.
  4. This code is a mixture of code I found online and my own trial and error. I am new to LINQ, so feel free to critique.
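
Regarding point 3: LINQ's join matches keys through Equals and GetHashCode, and List<object> inherits reference equality from object, so two lists holding identical values never match on their own. A small sketch of the difference (KeyEqualityDemo and its methods are illustrative names, not part of the answer's code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyEqualityDemo
{
    // Two separate lists with the same values, as the key values
    // of two matching rows would be.
    public static bool ListsEqualByDefault()
    {
        var a = new List<object> { "Niki", 1 };
        var b = new List<object> { "Niki", 1 };
        return a.Equals(b); // reference comparison: false
    }

    // Element-by-element comparison, which is what JoinKey.Equals relies on.
    public static bool ListsEqualElementwise()
    {
        var a = new List<object> { "Niki", 1 };
        var b = new List<object> { "Niki", 1 };
        return a.SequenceEqual(b); // compares each pair of elements: true
    }
}
```

A wrapper that overrides both Equals and GetHashCode, as JoinKey does below, is one way to make such composite keys usable in a join.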

    public class JoinKey
    {
        List<object> objects { get; set; }
    
        public JoinKey(List<object> objects)
        {
            this.objects = objects;
        }
    
        public override bool Equals(object obj)
        {
            if (obj == null || obj.GetType() != typeof(JoinKey))
                return false;
            return objects.SequenceEqual(((JoinKey)obj).objects);
        }
    
        public override int GetHashCode()
        {
            int hash = 0;
            foreach (var foo in objects)
            {
                hash = hash * 31 + foo.GetHashCode();
            }
            return hash;
        }
    }
    
    public enum JoinType
    {
        Inner = 0,
        Left = 1
    }
    
        //Joins two tables and returns the result as a new DataTable. The tables are joined on the onCol column names.
        //If a column name in the right table clashes with one in the left table, "_2" is appended to it in the joined table.
        public static DataTable Join(DataTable left, DataTable right, JoinType joinType, params string[] onCol)
        {
            Func<DataRow, object> getKey = (row) =>
            {
                return new JoinKey(onCol.Select(str => row[str]).ToList());
            };
            var dt = new DataTable(left.TableName);
            var colNumbersToRemove = new List<int>();
            //Populate the columns
            foreach (DataColumn col in left.Columns)
            {
                if (dt.Columns[col.ColumnName] == null)
                    dt.Columns.Add(new DataColumn(col.ColumnName, col.DataType, col.Expression, col.ColumnMapping));
            }
            for (int colIdx = 0; colIdx < right.Columns.Count; ++colIdx)
            {
                var col = right.Columns[colIdx];
                //if this is joined column, it will be removed.
                if (onCol.Contains(col.ColumnName))
                {
                    colNumbersToRemove.Add(colIdx);
                }
                else
                {
                    //if this is a duplicate column, it is renamed in the result only,
                    //so the caller's right table is not modified.
                    var colName = col.ColumnName;
                    if (dt.Columns[colName] != null)
                    {
                        colName += "_2";
                    }
                    dt.Columns.Add(new DataColumn(colName, col.DataType, col.Expression, col.ColumnMapping));
                }
            }
    
            if (joinType == JoinType.Left)
            {
                var res = from l in left.AsEnumerable()
                          join r in right.AsEnumerable()
                          on getKey(l) equals getKey(r) into temp
                          from r in temp.DefaultIfEmpty()
                          select l.ItemArray.Concat(((r == null) ? (right.NewRow().ItemArray) : r.ItemArray).Minus(colNumbersToRemove)).ToArray();
                foreach (object[] values in res)
                    dt.Rows.Add(values);
            }
            else
            {
                //Inner Join
                var res = from l in left.AsEnumerable()
                          join r in right.AsEnumerable()
                          on getKey(l) equals getKey(r) into temp
                          from r in temp
                          select l.ItemArray.Concat(((r == null) ? (right.NewRow().ItemArray) : r.ItemArray).Minus(colNumbersToRemove)).ToArray();
                foreach (object[] values in res)
                    dt.Rows.Add(values);
            }
            return dt;
        }
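
The Join method calls Minus, which is not defined in the code above. Judging from how it is used (dropping the right-hand row's values at the join-column positions), it could look something like the following; this is a guess at its behavior, not the answerer's original helper, and the class name is made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Hypothetical helper: yields the elements of source whose zero-based
    // position is not listed in indexesToSkip.
    public static IEnumerable<T> Minus<T>(this IEnumerable<T> source, IEnumerable<int> indexesToSkip)
    {
        var skip = new HashSet<int>(indexesToSkip);
        return source.Where((item, index) => !skip.Contains(index));
    }
}
```

For example, new object[] { "Koke", "Hello" }.Minus(new List<int> { 0 }) yields only "Hello". With such a helper in scope, the question's tables could be joined with Join(Table1, Table2, JoinType.Left, "Name").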
    


Source: https://stackoverflow.com/questions/13156626/join-2-datatables-with-many-columns
