In DAX (not Power Query), drop duplicates based on a column

Question

In Power BI Desktop, I have a table that is calculated from other tables and has a structure like this:

Input table:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Firstname</th>
      <th>Email</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Scott</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Ted</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>EDF@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>LMN@QRS.com</td>
    </tr>
    <tr>
      <td>Bill</td>
      <td>LMN@QRS.com</td>
    </tr>
  </tbody>
</table>

I want to keep only the first record for each unique email. Using DAX, my expected output table is:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Firstname</th>
      <th>Email</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Scott</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>EDF@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>LMN@QRS.com</td>
    </tr>
  </tbody>
</table>
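The expected result can be stated precisely with a short plain-Python sketch. It illustrates the semantics only and is not DAX; the helper name `first_per_email` is hypothetical. It uses each row's position in a Python list to decide which row comes first.

```python
# Plain-Python illustration of the expected output (not DAX).
# Rows are (Firstname, Email) tuples; list position defines "first".
ROWS = [
    ("Scott", "ABC@XYZ.com"),
    ("Bob", "ABC@XYZ.com"),
    ("Ted", "ABC@XYZ.com"),
    ("Scott", "EDF@XYZ.com"),
    ("Scott", "LMN@QRS.com"),
    ("Bill", "LMN@QRS.com"),
]


def first_per_email(rows):
    """Keep the first row encountered for each Email (hypothetical helper)."""
    seen = set()
    result = []
    for firstname, email in rows:
        if email not in seen:  # first time this Email appears
            seen.add(email)
            result.append((firstname, email))
    return result


print(first_per_email(ROWS))
# [('Scott', 'ABC@XYZ.com'), ('Scott', 'EDF@XYZ.com'), ('Scott', 'LMN@QRS.com')]
```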

I tried using RANKX and FILTER, but without success.

Answer 1:

Unfortunately, DAX has no way to refer to a row's position relative to the other rows in a table, because tables in DAX have no intrinsic row order. The only option is to sort on some column value.

With the existing two-column table, what we can do is get the MAX or MIN Firstname for each Email. For example, we can write the following calculated table, where T is the input table and T Unique is the generated table:

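T Unique =
ADDCOLUMNS(
    ALL( T[Email] ),                // one row per distinct Email
    "Firstname",
        CALCULATE(                  // context transition: filter T to the current Email
            MAX( T[Firstname] )     // alphabetically last name; MIN would give the first
        )
)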

But this doesn't satisfy the requirement, because MAX and MIN choose a name by alphabetical order rather than by row order.
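To make the outcome concrete, here is a plain-Python approximation of the T Unique table (not DAX; `agg_firstname_per_email` is a hypothetical helper). Grouping by Email stands in for ALL( T[Email] ) plus the context transition performed by CALCULATE, and Python's built-in max/min on strings stand in for DAX MAX/MIN on text. Python compares strings case-sensitively, while DAX text comparison is typically case-insensitive, but for these sample names the results agree.

```python
from collections import defaultdict

# Sample input table T as (Firstname, Email) tuples.
ROWS = [
    ("Scott", "ABC@XYZ.com"),
    ("Bob", "ABC@XYZ.com"),
    ("Ted", "ABC@XYZ.com"),
    ("Scott", "EDF@XYZ.com"),
    ("Scott", "LMN@QRS.com"),
    ("Bill", "LMN@QRS.com"),
]


def agg_firstname_per_email(rows, agg):
    """Group Firstname values by Email and apply agg (e.g. max or min) per group."""
    groups = defaultdict(list)
    for firstname, email in rows:
        groups[email].append(firstname)
    # One output entry per distinct Email, like ALL( T[Email] )
    return {email: agg(names) for email, names in groups.items()}


print(agg_firstname_per_email(ROWS, max))
# {'ABC@XYZ.com': 'Ted', 'EDF@XYZ.com': 'Scott', 'LMN@QRS.com': 'Scott'}
print(agg_firstname_per_email(ROWS, min))
# {'ABC@XYZ.com': 'Bob', 'EDF@XYZ.com': 'Scott', 'LMN@QRS.com': 'Bill'}
```

Neither aggregate returns Scott for ABC@XYZ.com, since both look only at the name values.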

To obtain the desired result, we need to add a column to the input table that records the row order, such as an index or a timestamp.

For this example, I added an Index column in Power Query with the following M code. It is generated automatically by referencing the original table and then clicking Add Column > Index Column:

let
    Source = T,
    // Add an integer "Index" column: starts at 1, increments by 1, typed Int64
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type)
in
    #"Added Index"

This produced the T Index table.
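As an illustration only, the Python sketch below shows what the Table.AddIndexColumn step does to the rows, assuming Power Query reads them in the order shown in the question: each row gets an integer that starts at 1 (initialValue) and grows by 1 (increment). `add_index_column` is a hypothetical stand-in, not a Power Query function.

```python
# Sample input table T as (Firstname, Email) tuples, in load order.
ROWS = [
    ("Scott", "ABC@XYZ.com"),
    ("Bob", "ABC@XYZ.com"),
    ("Ted", "ABC@XYZ.com"),
    ("Scott", "EDF@XYZ.com"),
    ("Scott", "LMN@QRS.com"),
    ("Bill", "LMN@QRS.com"),
]


def add_index_column(rows, initial_value=1, increment=1):
    """Append an index to each row, mirroring the initialValue/increment
    arguments of Table.AddIndexColumn (hypothetical Python stand-in)."""
    return [
        row + (initial_value + position * increment,)
        for position, row in enumerate(rows)
    ]


# Equivalent in spirit to Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type)
T_INDEX = add_index_column(ROWS)
for row in T_INDEX:
    print(row)
```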

Now we can write the following calculated table, which uses the new column to retrieve the first row for each Email:

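T Index Unique =
ADDCOLUMNS(
    ALL( 'T Index'[Email] ),            // one row per distinct Email
    "Firstname",
        VAR MinIndex =
            CALCULATE(                   // context transition: rows for the current Email
                MIN( 'T Index'[Index] )
            )
        RETURN
            CALCULATE(
                MAX( 'T Index'[Firstname] ),   // Index is unique, so this is a single row
                'T Index'[Index] = MinIndex
            )
)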

This generates the requested table.
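The two steps of T Index Unique can be checked with a plain-Python mirror (not DAX; `first_firstname_per_email` is a hypothetical name). 'T Index' is modeled as (Firstname, Email, Index) tuples, assuming the Index values 1 to 6 that the Power Query step assigns in the order shown in the question.

```python
# 'T Index' as (Firstname, Email, Index) tuples (assumed Index values 1..6).
T_INDEX = [
    ("Scott", "ABC@XYZ.com", 1),
    ("Bob", "ABC@XYZ.com", 2),
    ("Ted", "ABC@XYZ.com", 3),
    ("Scott", "EDF@XYZ.com", 4),
    ("Scott", "LMN@QRS.com", 5),
    ("Bill", "LMN@QRS.com", 6),
]


def first_firstname_per_email(rows):
    """Mirror of the T Index Unique logic: MIN Index per Email, then the
    Firstname on the row with that Index."""
    result = {}
    for email in sorted({email for _, email, _ in rows}):
        # VAR MinIndex = CALCULATE( MIN( 'T Index'[Index] ) ) for this Email
        min_index = min(index for _, e, index in rows if e == email)
        # CALCULATE( MAX( 'T Index'[Firstname] ), 'T Index'[Index] = MinIndex )
        result[email] = max(
            name for name, e, index in rows if e == email and index == min_index
        )
    return result


print(first_firstname_per_email(T_INDEX))
# {'ABC@XYZ.com': 'Scott', 'EDF@XYZ.com': 'Scott', 'LMN@QRS.com': 'Scott'}
```

Because the lookup uses the Index values rather than list position, reversing or shuffling the tuples leaves the result unchanged, which is what makes "first" well defined once the column exists.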

In a real-world scenario, the best place to add the new column is directly in the code that generates the input table.

Source: https://stackoverflow.com/questions/65363786/in-dax-not-powerquery-drop-duplicates-based-on-column

Tags

powerbi

dax

powerbi-desktop