How to join two tables in PowerQuery with one of many columns matching?

拈花ヽ惹草 提交于 2020-12-11 05:03:57

问题


Let's assume we have two PowerQuery queries:

  1. a query named Query1 with columns Col1_1 and Col1_2
  2. a query named Query2 with columns Col2_1 and Col2_2

I know it's possible to join these two queries with a merge query like this:

let
    Source = Table.NestedJoin(Query1,{"Col1_1", "Col1_2"},Query2,{"Col2_1", "Col2_2"},"Query2",JoinKind.LeftOuter)
in
    Source

In SQL, this could be represented as

SELECT
    *
FROM
    Query1
    LEFT JOIN Query2 ON Query1.Col1_1 = Query2.Col2_1 AND Query1.Col1_2 = Query2.Col2_2

Question: is it possible to join the two queries if at least one of the two columns match? In SQL, this could be represented as

SELECT
    *
FROM
    Query1
    LEFT JOIN Query2 ON Query1.Col1_1 = Query2.Col2_1 OR Query1.Col1_2 = Query2.Col2_2

回答1:


Instead of doing a table join, another possibility would be to add a custom column using the logic you want:

Table.AddColumn(Source, "JoinQuery2",
    (Q1) => Table.SelectRows(Query2,
                each Q1[Col1_1] = [Col2_1] or Q1[Col1_2] = [Col2_2]
            )
)

You can then expand this column of tables just like if you did a merge.


I got this idea from this blog post: Join Conditions In Power Query, Part 1




回答2:


As far as I know there are no options to alter the default method of matching in Join functions in PQ. You could however do two joins on each of the columns you want, then combine the results of those joins.

This would result in duplicate matches when both col1 and col2 match, which I'm not sure if that is the intended result. If not, you can use PQ to setup indexes to catch these duplicates and remove them.

Assuming Query2 also has had an Index added that looks like this:

let
    Source = Query1,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    #"Merged Queries" = Table.NestedJoin(#"Added Index",{"col1"},Query2,{"col1"},"col1Join",JoinKind.LeftOuter),
    #"Merged Queries1" = Table.NestedJoin(#"Merged Queries",{"col2"},Query2,{"col2"},"col2Join",JoinKind.LeftOuter),
    #"Added Custom" = Table.AddColumn(#"Merged Queries1", "MergeTables", each Table.Combine({[col1Join],[col2Join]})),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "MergeTables", {"col1", "col2", "Index"}, {"Query2.col1", "Query2.col2", "Query2.Index"}),
    #"Removed Duplicates" = Table.Distinct(#"Expanded Custom", {"Index", "Query2.Index"}),
    #"Removed Columns" = Table.RemoveColumns(#"Removed Duplicates",{"Index", "col1Join", "col2Join", "Query2.Index"})
in
    #"Removed Columns"

Not a very scale-able solution, but I think it works correctly?



来源:https://stackoverflow.com/questions/54654147/how-to-join-two-tables-in-powerquery-with-one-of-many-columns-matching

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!