大象教程
首页
Spark
Hadoop
HDFS
MapReduce
Hive
Pig 教程
Pig 教程
Pig 体系结构
Pig 安装
Pig 执行
Pig Grunt Shell
Pig Latin 基础
Pig 读取数据
Pig 存储数据
Pig Dump 运算符
Pig Describe 运算符
Pig Explain 运算符
Pig illustrate 运算符
Pig GROUP 运算符
Pig Cogroup 运算符
Pig JOIN 运算符
Pig Cross 运算符
Pig Union 运算符
Pig SPLIT 运算符
Pig FILTER 运算符
Pig DISTINCT 运算符
Pig FOREACH 运算符
Pig ORDER BY 运算符
Pig LIMIT 运算符
Pig eval(求值) 函数
Pig Load & Store 函数
Pig Bag & Tuple 函数
Pig 字符串(String) 函数
Pig 日期时间函数
Pig 数学函数
#Pig GROUP 运算符 ##GROUP 运算符 GROUP运算符用于组中的数据的一个或多个关系。它收集具有相同key的数据。 **语法** 下面给出的是组运算符的语法。 ```bash grunt> Group_data = GROUP Relation_name BY age; ``` 假设我们在HDFS目录/pig_data/中有一个名为student_details.txt的文件,如下所示。 ``` 001,Rajiv,Reddy,21,9848022337,Hyderabad 002,siddarth,Battacharya,22,9848022338,Kolkata 003,Rajesh,Khanna,22,9848022339,Delhi 004,Preethi,Agarwal,21,9848022330,Pune 005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 006,Archana,Mishra,23,9848022335,Chennai 007,Komal,Nayak,24,9848022334,trivendram 008,Bharathi,Nambiayar,24,9848022333,Chennai ``` 并且我们已将这个文件以关系名称Student_details加载到Apache Pig中,如下所示。 ```bash grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray); ``` 现在,让我们按age对关系中的记录/元组进行分组,如下所示。 ```bash grunt> group_data = GROUP student_details by age; ``` **验证** 如下所示,使用DUMP运算符验证关系group_data。 ```bash grunt> Dump group_data; ``` 然后,您将获得输出,显示名为group_data的关系的内容,如下所示。在这里您可以观察到生成的模式有两列- 一个是age,通过它我们将关系分组。 另一个是一个bag,其中包含元组,带有相应年龄的学生记录。 ```bash (21,{(4,Preethi,Agarwal,21,9848022330,Pune),(1,Rajiv,Reddy,21,9848022337,Hydera bad)}) (22,{(3,Rajesh,Khanna,22,9848022339,Delhi),(2,siddarth,Battacharya,22,984802233 8,Kolkata)}) (23,{(6,Archana,Mishra,23,9848022335,Chennai),(5,Trupthi,Mohanthy,23,9848022336 ,Bhuwaneshwar)}) (24,{(8,Bharathi,Nambiayar,24,9848022333,Chennai),(7,Komal,Nayak,24,9848022334, trivendram)}) ``` 使用describe命令对数据进行分组后,可以看到表的架构,如下所示。 ```bash grunt> Describe group_data; group_data: {group: int,student_details: {(id: int,firstname: chararray,lastname: chararray,age: int,phone: chararray,city: chararray)}} ``` 以同样的方式,您可以使用Illustra命令获得该模式的样本插图,如下所示。 ```bash $ Illustrate group_data; ``` 它将产生以下输出: ```bash ------------------------------------------------------------------------------------------------- |group_data| group:int | student_details:bag{:tuple(id:int,firstname:chararray,lastname:chararray,age:int,phone:chararray,city:chararray)}| ------------------------------------------------------------------------------------------------- | | 21 | { 4, Preethi, Agarwal, 21, 9848022330, Pune), (1, Rajiv, Reddy, 21, 9848022337, Hyderabad)}| | | 2 | {(2,siddarth,Battacharya,22,9848022338,Kolkata),(003,Rajesh,Khanna,22,9848022339,Delhi)}| ------------------------------------------------------------------------------------------------- ``` ##按多列分组 让我们按age和city分组关系,如下所示。 ```bash grunt> group_multiple = GROUP student_details by (age, city); ``` 您可以使用Dump运算符验证名为group_multiple的关系的内容,如下所示。 ```bash grunt> Dump group_multiple; ((21,Pune),{(4,Preethi,Agarwal,21,9848022330,Pune)}) ((21,Hyderabad),{(1,Rajiv,Reddy,21,9848022337,Hyderabad)}) ((22,Delhi),{(3,Rajesh,Khanna,22,9848022339,Delhi)}) ((22,Kolkata),{(2,siddarth,Battacharya,22,9848022338,Kolkata)}) ((23,Chennai),{(6,Archana,Mishra,23,9848022335,Chennai)}) ((23,Bhuwaneshwar),{(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)}) ((24,Chennai),{(8,Bharathi,Nambiayar,24,9848022333,Chennai)}) (24,trivendram),{(7,Komal,Nayak,24,9848022334,trivendram)}) ``` ##全部分组 您可以按如下所示的所有列对关系进行分组。 ```bash grunt> group_all = GROUP student_details All; ``` 现在,验证关系group_all的内容,如下所示。 ```bash grunt> Dump group_all; (all,{(8,Bharathi,Nambiayar,24,9848022333,Chennai),(7,Komal,Nayak,24,9848022334 ,trivendram), (6,Archana,Mishra,23,9848022335,Chennai),(5,Trupthi,Mohanthy,23,9848022336,Bhuw aneshwar), (4,Preethi,Agarwal,21,9848022330,Pune),(3,Rajesh,Khanna,22,9848022339,Delhi), (2,siddarth,Battacharya,22,9848022338,Kolkata),(1,Rajiv,Reddy,21,9848022337,Hyd erabad)}) ```
加我微信交流吧