Hive Assignment

Hive Assignment 

incedo
Input:

1) Create a database “Olympic”, then within this database create a table “player” in following structure:

pid: Int, pname: String, gender: String, age: Int, height: Int, eid: Int

Note: create the partition on “gender” column.


2) Create a table “event” within this database in following structure:

eid: Int, ename: String, year: Int, season: String, city: String

Note: create the partition on “season” column and bucketing on “year” column.


3) Write a query to insert data into two table from two text files “player.txt” and “event.txt” respectively.

4) Do the query on the following scenarios:

a. List the players details who not participated in Olympic event.

b. List the number of players participated in each event

c. List the number of Male and Female Players participated in event

d. List down number of players participated in Beijing Olympic,

e. List down player details who have highest age and joined in Olympic Event.

f. Find out the player details who have long name.

g. Find out the player details whose name starts with “Ma” and participate in “Summer” season event 

Solution:

1.

create database olympic;

use olympic;

create table player (pid int,pname string,gender string,age int,height int,eid int)row format delimited fields terminated by ',';

2.

create table event (eid int,ename string,year int,season string,city string)row format delimited fields terminated by ',';

create table eventp (eid int,ename string,year int,city string) partitioned by (season string) clustered by (year) into 2 buckets row format delimited fields terminated by ',' location '/user/cloudera/eventdir';

3.

load data local inpath '/home/cloudera/event.txt' into table event;

load data local inpath '/home/cloudera/player.txt' into table player;

Set hive.exec.dynamic.partition.mode = nonstrict;

Set hive.enforce.bucketing=true;

insert into eventp partition(season) select eid,ename,year,city,season from event;

4.

a.

SET hive.auto.convert.join=false;

SELECT t1.*,t2.* FROM player t1 LEFT JOIN event t2 ON t1.eid=t2.eid where t2.eid is null;

b.

select eid,count(pid) from player group by(eid);

c.

select gender,count(pid) from player group by(gender);

d.

SELECT count(t2.city) FROM player t1 LEFT JOIN event t2 ON t1.eid=t2.eid where t2.city = "Beijing";

e.

SELECT t1.* FROM player t1 JOIN event t2 ON t1.eid = t2.eid WHERE t1.age in (SELECT MAX(age) FROM player);

f.

SELECT t1.* FROM player t1 WHERE LENGTH(t1.pname) in (SELECT MAX(LENGTH(pname)) FROM player);

g.

SELECT t1.* FROM player t1 JOIN event t2 ON t1.eid = t2.eid WHERE t2.season = 'Summer' AND t1.pname LIKE 'Ma%';


Comments