[SalesForce] DISTINCT is not working as expected in Marketing Cloud

I'm experiencing problems with adding distinct in SQL automation. With SQL shown below I still receive duplicate clientids. Primary key would not work for me since during the next run of this automation we might need to populate target DE with the same IDs.

select 
distinct f.clientId as clientid, f.theaterId, f.movieId, 'LateFilmStart' as eventType
from 
FavoritesMovies f
join
eventReference e
on f.theaterId = e.Theater where  
e.Status = 'active' and e.Event = 'LateFilmStart'

Best Answer

Distinct works as if all fields in the select statement are primary keys. So this means in your query you are looking for distinct values for clientid, theatreid,movieId and eventType. Which is why you are seeing the duplications.

so for instance:

ClientID  |  TheatreID  |  movieID  |  eventType
123       |     234     |    456    |  LateFilmStart
123       |     2334    |    456    |  LateFilmStart
123       |     2334    |    456    |  LateFilmStart

Would output 2 records as the TheatreIDs are different, meaning they are both distinct values - but the third one would be removed as it is 100% duplicate of record 2.

My recommendation would be to add in a ROW_NUMBER() column into a subquery to remove duplicates:

select 
x.clientid, x.theaterId, x.movieId, x.eventType
from (
  select
  f.clientId as clientid, f.theaterId, f.movieId, 'LateFilmStart' as eventType,
  ROW_NUMBER() OVER(PARTITION BY f.clientID ORDER BY f.clientId) as rn
  from 
  FavoritesMovies f
) x
join
eventReference e
on x.theaterId = e.Theater 
where  
e.Status = 'active' 
and e.Event = 'LateFilmStart'
and x.rn = 1

By creating a subquery containing your FavoriteMovies data, you can now utilize ROW_NUMBER() to assign a number to all those with the same clientId. You then add in a where clause to your query to only pull those with a rn of 1. Meaning each clientId will only exist once as all further rows 2+ will not be pulled into the final results.

Related Solutions

[SalesForce] SQL Query: Data view with data extension

The query specified will return multiple rows if more than one open event is found for the particular subscriber. You can add the keyword distinct to your query to insure only unique row of data are returned by your query. This will eliminate the key violation.

SELECT distinct
a.id as Id
, a.email as email
, a.FirstName as FirstName

from contact_salesforce a

left join _Open o
on o.SubscriberKey = a.id

Normally, I avoid using the distinct keyword and opt for aggregating the query results in some way. But since this query is only selecting columns from the contact_salesforce data extension, using distinct will work in this case.

Best Answer

Related Solutions

[SalesForce] SQL Query: Data view with data extension

Related Topic