delete from YOUR_TABLE where id not in (select id from (select max(id) as id from YOUR_TABLE group by UNIQUE_FIELD) as b);
[......]
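When working with Hive, we often run into this requirement:
group by the same id, then deduplicate another field within each group. Take the following data, for example: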
id pic
1 1.jpg
2 2.jpg
1 1.jpg
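Here, neither DISTINCT nor a GROUP BY on both columns will do, since both return one row per distinct (id, pic) pair rather than one row per id. Instead we can use the UDAF collect_set(col): for each group by key it deduplicates the values as a set and returns them as an array.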
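A minimal sketch of that query against the sample data above. The table name pic_table and the alias pics are hypothetical, and the inline CTE only stands in for a real table so the statement is self-contained; it assumes a Hive version that supports CTEs and SELECT without FROM (0.13 or later).

```sql
-- pic_table is a hypothetical name; the CTE reproduces the sample rows above.
WITH pic_table AS (
  SELECT 1 AS id, '1.jpg' AS pic
  UNION ALL
  SELECT 2 AS id, '2.jpg' AS pic
  UNION ALL
  SELECT 1 AS id, '1.jpg' AS pic
)
-- One output row per id; collect_set drops the duplicate '1.jpg' for id 1
-- and returns the remaining values as an array.
SELECT id, collect_set(pic) AS pics
FROM pic_table
GROUP BY id;
```

With this data each id comes back once with a single-element array: ["1.jpg"] for id 1 and ["2.jpg"] for id 2. Row order is not guaranteed without an ORDER BY.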
As another example, we can deduplicate pic and concatenate the results:
SELECT id, CONCAT_W[......]