Apache Hive is a powerful tool for processing data stored in Apache Hadoop. It lets you access, process, and manipulate structured and unstructured data using a SQL-like query language, so anyone with reasonable SQL knowledge can write complex jobs with little to no knowledge of Hadoop, HDFS, and Hive.
Welcome to my site. Check the links at the bottom for where else you can find me.
Old Scuba Pictures Live on Again
I finally got around to doing a big batch upload of scuba pics to Flickr. The big writeup of Hurricane Wilma is available again too.
Complex Counts in Hive
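This came up on the Hive mailing list and I’m putting it here as a reminder to try it out. Here’s how to do complex count statements to simplify queries. The trick is that count() skips NULLs, so wrapping a column in CASE ... ELSE NULL END turns it into a conditional count, and several of them can sit side by side in one query.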
SELECT
    type
  , count(*)
  , count(DISTINCT u)
  , count(CASE WHEN plat=1 THEN u ELSE NULL END)
  , count(DISTINCT CASE WHEN plat=1 THEN u ELSE NULL END)
  , count(CASE WHEN (type=2 OR type=6) THEN u ELSE NULL END)
  , count(DISTINCT CASE WHEN (type=2 OR type=6) THEN u ELSE NULL END)
FROM
    t
WHERE
    dt IN ("2012-1-12-02", "2012-1-12-03")
GROUP BY
    type
ORDER BY
    type
;
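
Since I haven't run the query above against a real cluster yet, here is a small sketch for checking the counting logic locally. It is not Hive: it uses Python's built-in sqlite3 module as a stand-in, on the assumption that count() and count(DISTINCT ...) treat NULL the same way in both. The sample rows, the column aliases, and the run_counts helper are all made up for illustration, and the dt filter is left out to keep it short.

```python
import sqlite3

# Hypothetical sample rows: (type, u, plat). These are invented for the sketch.
ROWS = [
    (1, "alice", 1),
    (1, "alice", 1),
    (1, "bob", 2),
    (2, "carol", 1),
    (2, "carol", 2),
    (2, "dave", 1),
]

# Same shape as the Hive query, trimmed to the plat=1 pair of counts.
# count() ignores NULL, so the CASE expression acts as a per-row filter.
QUERY = """
SELECT
    type
  , count(*)                                                 AS n_rows
  , count(DISTINCT u)                                        AS n_users
  , count(CASE WHEN plat = 1 THEN u ELSE NULL END)           AS plat1_rows
  , count(DISTINCT CASE WHEN plat = 1 THEN u ELSE NULL END)  AS plat1_users
FROM t
GROUP BY type
ORDER BY type
"""


def run_counts(rows):
    """Load rows into an in-memory SQLite table and run the conditional counts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (type INTEGER, u TEXT, plat INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_counts(ROWS):
        print(row)
```

With the sample rows, type 1 has three rows from two users, two of those rows on plat 1, both from the same user. That difference between the plain and DISTINCT conditional counts is the whole point of writing them side by side.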