[SalesForce] DISTINCT is not working as expected in Marketing Cloud

I'm experiencing problems with adding distinct in SQL automation. With SQL shown below I still receive duplicate clientids. Primary key would not work for me since during the next run of this automation we might need to populate target DE with the same IDs.

select 
distinct f.clientId as clientid, f.theaterId, f.movieId, 'LateFilmStart' as eventType
from 
FavoritesMovies f
join
eventReference e
on f.theaterId = e.Theater where  
e.Status = 'active' and e.Event = 'LateFilmStart'

Best Answer

Distinct works as if all fields in the select statement are primary keys. So this means in your query you are looking for distinct values for clientid, theatreid,movieId and eventType. Which is why you are seeing the duplications.

so for instance:

ClientID  |  TheatreID  |  movieID  |  eventType
123       |     234     |    456    |  LateFilmStart
123       |     2334    |    456    |  LateFilmStart
123       |     2334    |    456    |  LateFilmStart

Would output 2 records as the TheatreIDs are different, meaning they are both distinct values - but the third one would be removed as it is 100% duplicate of record 2.

My recommendation would be to add in a ROW_NUMBER() column into a subquery to remove duplicates:

select 
x.clientid, x.theaterId, x.movieId, x.eventType
from (
  select
  f.clientId as clientid, f.theaterId, f.movieId, 'LateFilmStart' as eventType,
  ROW_NUMBER() OVER(PARTITION BY f.clientID ORDER BY f.clientId) as rn
  from 
  FavoritesMovies f
) x
join
eventReference e
on x.theaterId = e.Theater 
where  
e.Status = 'active' 
and e.Event = 'LateFilmStart'
and x.rn = 1

By creating a subquery containing your FavoriteMovies data, you can now utilize ROW_NUMBER() to assign a number to all those with the same clientId. You then add in a where clause to your query to only pull those with a rn of 1. Meaning each clientId will only exist once as all further rows 2+ will not be pulled into the final results.

Related Topic