Best unofficial Apache Server developers community |
I'd like to combine this into 'collapsed' rows which look like this:
At the moment the source document is a hive 'table', [which is essentially the same as a flat text file I suppose] - and I am wondering what is the most efficient way to accomplish this?
posted via StackOverflow
|
|
 
|
If you are loading the data into hive from a mapreduce job you may be able to adjust that MR to do the transformation on the data for you, and load it into the table as you want it (array or ; delimited, ect) If you are looking to be able to update/adjust the data, HIVE probably isn't the best option for that. You may want to look at HBase and do an 'aggregation' to generate the data as you want it to load into HBase. Anytime the same Key/ColumnFamily/Column value is generated, it overwrites it if it exists so it will 'update' the value. I use this in production to generate data throughout the day that constantly updates. In either case, to manipulate the structure of large quantities of data, you will probably want to use a mapreduce job and have it do the restructuring for you. |