I'm experiencing problems with adding distinct in SQL automation. With SQL shown below I still receive duplicate clientids. Primary key would not work for me since during the next run of this automation we might need to populate target DE with the same IDs.
select
distinct f.clientId as clientid, f.theaterId, f.movieId, 'LateFilmStart' as eventType
from
FavoritesMovies f
join
eventReference e
on f.theaterId = e.Theater where
e.Status = 'active' and e.Event = 'LateFilmStart'
Best Answer
Distinct works as if all fields in the select statement are primary keys. So this means in your query you are looking for distinct values for clientid, theatreid,movieId and eventType. Which is why you are seeing the duplications.
so for instance:
Would output 2 records as the TheatreIDs are different, meaning they are both distinct values - but the third one would be removed as it is 100% duplicate of record 2.
My recommendation would be to add in a ROW_NUMBER() column into a subquery to remove duplicates:
By creating a subquery containing your
FavoriteMovies
data, you can now utilize ROW_NUMBER() to assign a number to all those with the same clientId. You then add in a where clause to your query to only pull those with arn
of 1. Meaning each clientId will only exist once as all further rows 2+ will not be pulled into the final results.