Hadoop PIG Notes
Basic
- Pig is a scripting platform for processing and analysing large data sets.
- Very useful for people who do not have Java knowledge.
- Used for high-level data flow and for processing data stored on HDFS.
- Pig is so named because, like the animal, it can consume and process any type of data; it is widely used in data cleansing.
- Internally, whatever you write in Pig is converted to MapReduce (MR) jobs.
- Pig is a client-side installation; it need not be installed on the Hadoop cluster.
- A Pig script executes a set of commands, which are converted to MapReduce (MR) jobs and submitted to Hadoop running locally or remotely.
- A Hadoop cluster does not care whether a job was submitted from Pig or from some other environment.
- MapReduce programs are executed only when the DUMP or STORE command is called (more on this later).
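
The last point, that nothing runs until DUMP or STORE is called, can be illustrated with a small toy model. The sketch below is plain Python, not Pig, and says nothing about how Pig is actually implemented; the names `Relation`, `RUN_LOG`, `filter`, `foreach`, `dump` and `store` are hypothetical and chosen only to mirror the idea that transformations are recorded as a plan and executed on demand.

```python
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical log used only to show when execution actually happens.
RUN_LOG: List[str] = []


class Relation:
    """Toy stand-in for a Pig relation: it records steps and runs them lazily."""

    def __init__(self, source: Iterable[Any], steps: List[Tuple[str, Callable]] = None):
        self._source = source
        self._steps = list(steps or [])

    def filter(self, predicate: Callable[[Any], bool]) -> "Relation":
        # Only records the step; no data is touched here.
        return Relation(self._source, self._steps + [("filter", predicate)])

    def foreach(self, fn: Callable[[Any], Any]) -> "Relation":
        # Only records the step; no data is touched here.
        return Relation(self._source, self._steps + [("foreach", fn)])

    def _run(self) -> List[Any]:
        # In this toy model, this is the point where a "job" would be submitted.
        RUN_LOG.append("job submitted")
        rows = list(self._source)
        for kind, fn in self._steps:
            if kind == "filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return rows

    def dump(self) -> List[Any]:
        # Triggers execution and prints the result, in the spirit of DUMP.
        rows = self._run()
        for row in rows:
            print(row)
        return rows

    def store(self, sink: List[Any]) -> None:
        # Triggers execution and writes the result to a sink, in the spirit of STORE.
        sink.extend(self._run())


records = [("alice", 31), ("bob", 17), ("carol", 45)]

# Building the data flow: nothing is executed yet.
adults = Relation(records).filter(lambda r: r[1] >= 18).foreach(lambda r: r[0].upper())
runs_before_dump = len(RUN_LOG)

# Only now does the recorded plan run.
result = adults.dump()
```

Here `runs_before_dump` stays at zero even though two transformations have been declared; the plan is executed only when `dump()` (or `store()`) is called.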
