Best unofficial Apache Server developers community
Username
Forgot password?
Sign in with Twitter account
Sign in with Facebook account

COLLECT_SET() in Hive, keep duplicates?

0

57 views

Is there a way to keep the duplicates in a collected set in Hive, or simulate the sort of aggregate collection that Hive provides using some other method? I want to aggregate all of the items in a column that have the same key into an array, with duplicates.

I.E.:

hash_id | num_of_cats
=====================
ad3jkfk            4
ad3jkfk            4
ad3jkfk            2
fkjh43f            1
fkjh43f            8
fkjh43f            8
rjkhd93            7
rjkhd93            4
rjkhd93            7

should return:

hash_agg | cats_aggregate
===========================
ad3jkfk   Array<int>(4,4,2)
fkjh43f   Array<int>(1,8,8)
rjkhd93   Array<int>(7,4,7)

asked June 22, 2011 2:23 pm CDT
posted via StackOverflow

0 Answers

Be the first to answer this question

Join with account you already have


Sign in with Twitter account
Sign in with Facebook account
Sign in with Google Friend Connect

Preview
Similar questions
Hive with Lucene
January 31, 2011
Hadoop/hive metastore
August 7, 2009