I'm experiencing problems with adding distinct in SQL automation. With SQL shown below I still receive duplicate clientids. Primary key would not work for me since during the next run of this automation we might need to populate target DE with the same IDs.
select distinct f.clientId as clientid, f.theaterId, f.movieId, 'LateFilmStart' as eventType from FavoritesMovies f join eventReference e on f.theaterId = e.Theater where e.Status = 'active' and e.Event = 'LateFilmStart'
Distinct works as if all fields in the select statement are primary keys. So this means in your query you are looking for distinct values for clientid, theatreid,movieId and eventType. Which is why you are seeing the duplications.
so for instance:
Would output 2 records as the TheatreIDs are different, meaning they are both distinct values - but the third one would be removed as it is 100% duplicate of record 2.
My recommendation would be to add in a ROW_NUMBER() column into a subquery to remove duplicates:
By creating a subquery containing your
FavoriteMoviesdata, you can now utilize ROW_NUMBER() to assign a number to all those with the same clientId. You then add in a where clause to your query to only pull those with a
rnof 1. Meaning each clientId will only exist once as all further rows 2+ will not be pulled into the final results.