LEI'S DESIGN SERVICES


Chapter 2:  MapReduce Number Count Using Java

 

 Figure 1:  Number Counting MapReduce Diagram

 

2.1  Dataset

            Source:  California Mega Millions Lottery winning numbers, downloaded from www.calottery.com on 7/6/2015.

 

            File size:  106 KB

            Number of records:  1047

 

2.2    Problem Statement

a)         Use MapReduce to find the winning-number frequency counts over various annual periods, and store the numbers along with their frequency counts in output reports.  The periods are as follows:

                   From 2005 to 2005,

                    From 2006 to 2005,

                    From 2007 to 2005,

                    From 2008 to 2005,

                    From 2009 to 2005,

                    From 2010 to 2005,

                    From 2011 to 2005,

                    From 2012 to 2005,

                    From 2013 to 2005, and

                    From 2014 to 2005.
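As an illustration of how the per-period input files above might be prepared, the sketch below filters a full record history down to one cumulative period. The record layout (an M/D/YYYY date token followed by the drawing numbers) and the class and method names are assumptions for illustration, not the actual calottery.com file format:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split the full winning-number history into the
// cumulative sampling periods listed above. The record layout (a date
// followed by the drawing numbers) is an assumption.
public class PeriodFilter {

    // Extract the 4-digit year from a leading M/D/YYYY date token.
    static int yearOf(String record) {
        String date = record.split("\\s+")[0];
        String[] parts = date.split("/");
        return Integer.parseInt(parts[2]);
    }

    // Keep only records whose drawing year lies between 2005 and endYear.
    static List<String> filterPeriod(List<String> records, int endYear) {
        List<String> out = new ArrayList<>();
        for (String r : records) {
            int y = yearOf(r);
            if (y >= 2005 && y <= endYear) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = List.of(
                "6/24/2005 5 12 25 30 41 14",
                "3/10/2007 2 13 25 33 40 7",
                "8/19/2014 1 13 25 27 39 9");
        // The "2007 to 2005" period keeps the 2005 and 2007 drawings only.
        System.out.println(filterPeriod(records, 2007).size()); // prints 2
    }
}
```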

 

b)        In each period, use MapReduce to gather the frequency data with each of the following selection methods:

           i)                    the 6 most frequent winning numbers, counted across all 6 winning numbers together,

           ii)                   the 6 most frequent winning numbers, counted across the 5 winning numbers together, excluding the mega number,

           iii)                 the 5 most frequent winning numbers from the 5 winning numbers, plus the most frequent winning number from the mega number,

           iv)                 the most frequent winning number from each individual drawing position, 6 numbers altogether.  For example: the top number from winning drawing #1, the top number from winning drawing #2, the top number from winning drawing #3, the top number from winning drawing #4, the top number from winning drawing #5, and the top number from the mega number.

c)        For each of the methods in b), run the jar file generated from the Java MapReduce files. 
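The selection step shared by the four methods above can be sketched locally, outside Hadoop: given a number-to-frequency map of the kind the reducer emits, pick the N most frequent numbers. The class name and the ordering of tied counts below are assumptions, not part of the actual jobs:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hedged sketch of the top-N selection used by methods i)-iv).
public class TopN {

    // Pick the n keys with the highest counts; tie order is unspecified here.
    static List<String> topN(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Counts taken from the example output in Section 2.4.
        Map<String, Integer> counts = Map.of(
                "25", 24, "14", 20, "20", 19, "13", 18, "24", 18, "19", 6);
        System.out.println(topN(counts, 3)); // [25, 14, 20]
    }
}
```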

 

2.3    Code

            As an example, the code for counting number strings across the 5 winning numbers, excluding the mega number (one of the methods I used), is listed below.

            The code consists of the Java main number counting class, the mapper class, and the reducer class, as shown in Figure 1.

            Below is an example of the Java main class:

   package com.lotto;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.conf.Configured;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.io.IntWritable;
   import org.apache.hadoop.io.Text;
   import org.apache.hadoop.mapreduce.Job;
   import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
   import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
   import org.apache.hadoop.util.GenericOptionsParser;
   import org.apache.hadoop.util.Tool;
   import org.apache.hadoop.util.ToolRunner;

   //Driver code for number String counting
   public class lottoStats extends Configured implements Tool {

      public int run(String[] args) throws Exception {
         Configuration conf = new Configuration();
         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

         if (otherArgs.length != 2) {
            System.err.println("Usage:  hadoop jar lottoStats.jar com.lotto.lottoStats <input path> <output path>");
            System.err.printf("Argument length: %d%n", otherArgs.length);
            System.exit(-1);
         }

         Job job = Job.getInstance(conf, "NumberCounter");

         job.setJarByClass(lottoStats.class);
         job.setMapperClass(lottoMapper.class);
         job.setReducerClass(lottoReducer.class);

         job.setMapOutputKeyClass(Text.class);
         job.setMapOutputValueClass(IntWritable.class);

         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(IntWritable.class);

         FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
         FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

         //waitForCompletion submits the job, so it must only be called once
         return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
         lottoStats driver = new lottoStats();
         int exitCode = ToolRunner.run(driver, args);
         System.exit(exitCode);
      }
   }

   

Below is an example of the mapper class:

 

package com.lotto;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.util.StringTokenizer;

public class lottoMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text num = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line, " ");
        int i = 0; //initialize the token position counter

        while (tokenizer.hasMoreTokens()) {
            num.set(tokenizer.nextToken());

            if ((i >= 5) && (i <= 10)) {
                //all 5 winning numbers, except the mega number
                context.write(num, new IntWritable(1));
            }
            if (i < 11) {
                i = i + 1;
            } else {
                i = 0; //reset counter to 0
            }
        }
    }
} //end of lottoMapper class
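A Hadoop-free walk-through of the mapper's token-window logic may help. The sample record below is hypothetical, laid out so that token positions 5 through 10 hold the drawing numbers, matching the i >= 5 && i <= 10 window in the mapper; the real input layout may differ:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Local demonstration of the mapper's token-index selection.
public class MapperWindowDemo {

    // Return the tokens the mapper would emit for one input line.
    static List<String> selectedTokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, " ");
        int i = 0;
        while (tokenizer.hasMoreTokens()) {
            String tok = tokenizer.nextToken();
            if (i >= 5 && i <= 10) {
                out.add(tok);       //same window the mapper writes out
            }
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        //Hypothetical record: 5 leading metadata tokens, then the numbers.
        String line = "1047 SAT 7/4/2015 MEGA MILLIONS 5 12 25 30 41 14";
        System.out.println(selectedTokens(line)); // [5, 12, 25, 30, 41, 14]
    }
}
```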

 

 

Finally, below is the reducer class example:

 

package com.lotto;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;

public class lottoReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;

        for (IntWritable val : values) {
            //accumulate the number string counts for each unique number string
            sum += val.get();
        }

        context.write(key, new IntWritable(sum));
    }
} //end of lottoReducer class
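Taken together, the mapper and reducer compute a per-number sum of 1s. The Hadoop-free sketch below mirrors that computation with a TreeMap, whose lexicographic key order also explains why "10" sorts before "2" in the part-r-00000 listing in Section 2.4 (Text keys sort as strings, not as integers):

```java
import java.util.Map;
import java.util.TreeMap;

// Local simulation of the mapper/reducer pipeline's end result.
public class CountSimulation {

    static Map<String, Integer> count(String[] emittedNumbers) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String num : emittedNumbers) {
            sums.merge(num, 1, Integer::sum);  //the reducer's "sum += val.get()"
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] emitted = {"25", "2", "25", "10", "2", "25"};
        //Keys come out in lexicographic (string) order: 10, 2, 25.
        System.out.println(count(emitted)); // {10=1, 2=2, 25=3}
    }
}
```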

 

 

2.4    Execution

a)  The code in 2.3 is compiled into a jar file called CaliforniaLotto.jar.  The Unix command to generate the number counts for all 5 winning numbers is:

hadoop jar CaliforniaLotto.jar com.lotto.lottoStats /input/2006to2005.txt /output/2006to2005_all6

where com.lotto.lottoStats is the fully qualified driver class in the com.lotto package, /input/2006to2005.txt is the input file containing all the past winning lotto numbers for the 2006 to 2005 sampling period, and /output/2006to2005_all6 is the output directory.

 

Section 2.2 b) method ii) uses the above command to generate a list of numbers in the output file /output/2006to2005_all6/part-r-00000; an example of its content is shown at the end of this section.

 

b)  In addition, Section 2.2 b) method i) will generate the 6 most frequent winning numbers from all 6 winning numbers together with the following command:

hadoop jar CaliforniaLottoall5Nmega.jar com.lottoall5Nmega.lottoall5NmegaStats /input/2006to2005.txt /output/2006to2005_all5Nmega

where com.lottoall5Nmega.lottoall5NmegaStats is the fully qualified driver class, /input/2006to2005.txt is the input file containing all the past winning lotto numbers, and /output/2006to2005_all5Nmega is the output directory.

 

c)  Section 2.2 b) method iii) will generate the 5 most frequent winning numbers from all 5 winning numbers and the most frequent winning number from the mega number with the following commands:

hadoop jar CaliforniaLotto.jar com.lotto.lottoStats /input/2006to2005.txt /output/2006to2005_all6

hadoop jar CaliforniaLottomega.jar com.lotto6.lotto6Stats /input/2006to2005.txt /output/2006to2005_allmega

where the top 5 numbers from the /output/2006to2005_all6 directory and the top number from the /output/2006to2005_allmega directory will be concatenated into 6 numbers after Pig post-processes the data in Chapter 3.

 

d)      Section 2.2 b) method iv) will generate the most frequent winning number from each individual winning number with the following 6 commands:

hadoop jar CaliforniaLotto1.jar com.lotto1.lotto1Stats /input/2006to2005.txt /output/2006to2005_all1

hadoop jar CaliforniaLotto2.jar com.lotto2.lotto2Stats /input/2006to2005.txt /output/2006to2005_all2

hadoop jar CaliforniaLotto3.jar com.lotto3.lotto3Stats /input/2006to2005.txt /output/2006to2005_all3

hadoop jar CaliforniaLotto4.jar com.lotto4.lotto4Stats /input/2006to2005.txt /output/2006to2005_all4

hadoop jar CaliforniaLotto5.jar com.lotto5.lotto5Stats /input/2006to2005.txt /output/2006to2005_all5

hadoop jar CaliforniaLottomega.jar com.lotto6.lotto6Stats /input/2006to2005.txt /output/2006to2005_allmega

where the top individual numbers from each output file are concatenated together into 6 individual numbers after Pig post-processes the data in Chapter 3.

Here is the content of the output file as an example:

File location:  /output/2006to2005_all6/part-r-00000

Content:

 

hdfs dfs -cat /output/2006to2005_all6/part-r-00000


1        13

10      9

11      8

12      15

13      18

14      20

15      12

16      12

17      15

18      17

19      6

2        15

20      19

21      9

22      12

23      10

24      18

25      24

26      8

27      17

28      15

29      14

3        11

30      11

 Click here to go to Chapter 3:  www.leisdesignservices.com/pig.htm

 

 Or click here to go to the Table of Contents:  www.leisdesignservices.com/hadoopproofofconcept.htm