In DAX (not powerquery) drop duplicates based on column

对着背影说爱祢 submitted on 2021-01-28 07:10:23

Question


In Power BI Desktop, I have a table that is calculated from other tables, with a structure like this:

Input table:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Firstname</th>
      <th>Email</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Scott</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Ted</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>EDF@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>LMN@QRS.com</td>
    </tr>
    <tr>
      <td>Bill</td>
      <td>LMN@QRS.com</td>
    </tr>
  </tbody>
</table>

Now, I want to keep only the first record for each unique email. My expected output table using DAX is:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Firstname</th>
      <th>Email</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Scott</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>EDF@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>LMN@QRS.com</td>
    </tr>
  </tbody>
</table>

I tried using RANKX and FILTER, but without success.


Answer 1:


Sadly, the answer to this question is that there is no way in DAX to refer to a row's position relative to the other rows in the table. The only option is to sort on some column value.

What we can do with the existing two-column table is get the MAX or MIN Firstname for each Email. We can write a calculated table as follows, where T is the input table and T Unique is the generated table:

T Unique = 
ADDCOLUMNS(
    ALL( T[Email] ),
    "Firstname",
        CALCULATE(
            MAX( T[Firstname] )
        )
)

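But this doesn't satisfy the requirement: MAX returns the alphabetically last Firstname for each Email (Ted for ABC@XYZ.com in the sample data), not the Firstname from the first row.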

To obtain the desired result we need to add a column to the input table, with an index or a timestamp.

For this example I added an Index column using the following M code in Power Query, which is generated automatically by referencing the original table and then clicking the Add column -> Index column button:

let
    Source = T,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type)
in
    #"Added Index"

So I obtained the T Index table.

Now we can write the following calculated table, which uses the new column to retrieve the first row for each Email:

T Index Unique = 
ADDCOLUMNS(
    ALL( 'T Index'[Email] ),
    "Firstname",
        VAR MinIndex =
            CALCULATE(
                MIN( 'T Index'[Index] )
            )
        RETURN
            CALCULATE(
                MAX( 'T Index'[Firstname] ),
                'T Index'[Index] = MinIndex
            )
)

This generates the requested table.
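
The same first-row-per-Email selection can also be sketched as a FILTER over 'T Index' that keeps whole rows instead of rebuilding each column. This is only a sketch: the table name T Index Unique Rows is hypothetical, and it assumes the Index column is unique per row, as produced by the Power Query step above.

```dax
T Index Unique Rows =
FILTER (
    'T Index',
    -- Keep a row only when its Index is the smallest Index for its Email.
    'T Index'[Index]
        = CALCULATE (
            MIN ( 'T Index'[Index] ),
            -- Context transition filters on every column of the current row;
            -- ALLEXCEPT removes those filters from all columns except Email.
            ALLEXCEPT ( 'T Index', 'T Index'[Email] )
        )
)
```

Unlike the ADDCOLUMNS version, this keeps the Index column in the result. If it is not wanted, the expression could be wrapped in SELECTCOLUMNS to return only Firstname and Email.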

In a real-world scenario, the best place to add the new column is directly in the code that generates the input table.



Source: https://stackoverflow.com/questions/65363786/in-dax-not-powerquery-drop-duplicates-based-on-column
