Best unofficial Apache Server developers community
Username
Forgot password?
Sign in with Twitter account
Sign in with Facebook account

Using Apache Hive as a MapReduce Input Format and/or Scraping Hive Metadata

0

72 views

Our environment is heavy into storing data in hive. I find myself currently working on something that it outside the scope though. I have a mapreduce written, but it requires a lot of direct user inputs for information that could easily be scraped from Hive. That said, when I query hive for extended table data, all of the extended information is thrown out in 1 or 2 columns as a giant blob of almost-JSON. Is there either a convenient way to parse this information, or better yet, get it directly in a more direct manor?

Alternatively, if I could get pointed to documentation on manually using the CombinedHiveInputFormat, that would simplify my code a lot more. But it seems like that InputFormat is solely used inside of Hive, using it's custom structs.

Ultimately, what I want is to know table names, columns (not including partitions), and partition locations for the split a mapper is working on. If there is yet another way to accomplish this, I am eager to know.

Thanks! John

asked April 14, 2011 2:44 pm CDT
posted via StackOverflow

0 Answers

Be the first to answer this question

Join with account you already have


Sign in with Twitter account
Sign in with Facebook account
Sign in with Google Friend Connect

Preview
Similar questions
Hive with Lucene
January 31, 2011
Hadoop/hive metastore
August 7, 2009
Scraping HTML with Regex
January 24, 2011