Best unofficial Apache Server developers community

Hive - How can I write a create statement for a variable-length, existing HDFS file?

So, I have an existing HDFS directory containing a bunch of files. These files are all tab-delimited.

I have a Hive statement:

-- Tab-separated fields ('\t', ASCII 9) and one record per line ('\n', ASCII 10)
create external table
   mytable(
      key string,
      name string,
      address string,
      ssn string)
row format delimited
   fields terminated by '\t'
   lines terminated by '\n'
stored as textfile
location '/MyHiveFiles/data';

This works pretty well, except for all of the extra fields: each record also contains between 0 and x extra data elements after the ssn field, and Hive simply drops everything past the four declared columns. The extra elements are still tab-delimited, and records are still '\n'-delimited. I could add a series of 'valueX string' columns (where X increments for each extra element), but I don't know how many there might eventually be, and that seems messy anyway.

Is there a way to tell Hive to put all the remaining fields of a row into ONE field, like 'others string'? Even if the value Hive returns is still tab-delimited, I am OK with that.

Thanks in advance.

asked May 12, 2011 7:17 am CDT
posted via StackOverflow

1 Answer

Best answer

Creating a table in Hive essentially just creates metadata telling Hive how to interpret the files; Hive doesn't 'know' anything about the rest of the data.
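Because the table is only metadata, an existing external table can be redefined without touching its files. A minimal sketch, assuming the `mytable` definition from the question: dropping an EXTERNAL table removes only Hive's metadata, so the files under '/MyHiveFiles/data' stay in HDFS and a new create statement can point at the same location.

```sql
-- mytable was created as EXTERNAL, so this removes only the metastore entry;
-- the tab-delimited files under /MyHiveFiles/data are left in place.
drop table if exists mytable;
-- Afterwards, re-run a create external table statement with the new column
-- list and the same location '/MyHiveFiles/data' to expose the files again.
```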

If you add another column as an array and specify COLLECTION ITEMS TERMINATED BY '\002' (Ctrl-B, or some other character that never appears in the data), then the tabs will not terminate the array collection, and the remaining fields should all be returned as a single element, tabs included. Haven't tested this yet. :)
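The answer's suggestion might look like the statement below. This is a sketch under assumptions and, like the answer, untested: the extra column is named `others` (after the question's 'others string'), '\002' is assumed never to appear in the data, and no other table named `mytable` is assumed to exist yet.

```sql
-- Hypothetical catch-all column "others": the answer expects the trailing
-- tab-separated fields to land in a single array element, because only
-- '\002' (not tab) separates items inside the array.
create external table
   mytable(
      key string,
      name string,
      address string,
      ssn string,
      others array<string>)
row format delimited
   fields terminated by '\t'
   collection items terminated by '\002'
   lines terminated by '\n'
stored as textfile
location '/MyHiveFiles/data';

-- Inspect a few rows to check whether others[0] really holds all the
-- remaining fields (tabs included) or only the fifth one.
select key, ssn, others[0] from mytable limit 10;
```

Checking a handful of rows first matters because the approach is unverified: if Hive's delimited parser splits the whole line on tabs before it looks at the collection delimiter, `others[0]` would contain only the first extra field rather than all of them.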

answered May 13, 2011 1:18 pm CDT

Similar questions
File blocks on HDFS
February 4, 2011