Big Data Technologies and Tools (Part 2)
Imam Cholissodin | imam.cholissodin@gmail.com
(Big Data L1617 v5)

  

Topics

  1. Single-Node vs. Multi-Node Cluster Concepts

  2. Hadoop Configuration (continued)

     o Single-Node Cluster on Linux & Windows

     o Multi-Node Cluster on Linux & Windows ( Now )

  3. Case Study

  4. Assignment

  • An HDFS cluster consists of a namenode, which manages the cluster's metadata, and datanodes, which store the data/files.
  • Files and directories are represented on the namenode, which stores attributes such as permissions, modification and access times, and namespace and disk-space quotas.
  • The namenode actively monitors the number of replicas of each file block. When a block replica is lost due to a datanode failure, the namenode re-replicates that block to another datanode that is running properly.
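This re-replication decision can be pictured with a small model (an illustrative sketch only, not Hadoop's actual code; the class and method names here are hypothetical): the namenode compares each block's live replica count against the configured replication factor and schedules the missing copies on healthy datanodes.

```java
// Hypothetical model of the namenode's re-replication decision (not Hadoop code).
public class ReplicationMonitor {
    static final int REPLICATION_FACTOR = 3; // e.g. dfs.replication = 3, as configured later

    // How many extra replicas must be scheduled for a block that currently
    // has `liveReplicas` copies on healthy datanodes.
    static int replicasToSchedule(int liveReplicas) {
        return Math.max(0, REPLICATION_FACTOR - liveReplicas);
    }

    public static void main(String[] args) {
        // A datanode holding one replica fails: 3 -> 2 live copies remain,
        // so the namenode re-replicates 1 copy to another healthy datanode.
        System.out.println(replicasToSchedule(2)); // 1
        // All replicas present: nothing to schedule.
        System.out.println(replicasToSchedule(3)); // 0
    }
}
```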

  

  • The ResourceManager, on the master node, manages all resources used by applications in the system.
  • The NodeManager, the per-node framework agent on every slave node, is responsible for containers: it monitors the resource usage of each container (CPU, memory, disk, network) and reports it to the ResourceManager.
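The NodeManager's reporting role can be sketched as a toy model (hypothetical names only, not the YARN API): each slave node aggregates the resource usage of its running containers and sends the totals upstream to the ResourceManager.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of a NodeManager aggregating container resource usage (not the YARN API).
public class NodeManagerReport {
    static class Container {
        final String id;
        final int vcores;
        final int memoryMB;
        Container(String id, int vcores, int memoryMB) {
            this.id = id;
            this.vcores = vcores;
            this.memoryMB = memoryMB;
        }
    }

    // Total memory in use across all running containers on this node.
    static int totalMemoryMB(List<Container> containers) {
        int total = 0;
        for (Container c : containers) total += c.memoryMB;
        return total;
    }

    public static void main(String[] args) {
        List<Container> running = Arrays.asList(
            new Container("container_01", 1, 1024),
            new Container("container_02", 2, 2048));
        // The figure this node would report to the ResourceManager:
        System.out.println("memoryMB=" + totalMemoryMB(running)); // memoryMB=3072
    }
}
```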

 Set up PC Master + (PC Node1, Node2, Node3):

o Do the following (if you want to set everything up from scratch):

  nidos@master:~$ sudo apt-get update
  nidos@master:~$ sudo apt-get install default-jdk    (verify with: java -version)
  nidos@master:~$ sudo apt-get install ssh
  nidos@master:~$ ssh-keygen -t rsa -P ""
  nidos@master:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  nidos@master:~$ wget http://mirror.wanxp.id/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
  nidos@master:~$ sudo tar xvzf hadoop-2.7.3.tar.gz
  nidos@master:~$ sudo mv hadoop-2.7.3 /usr/local/hadoop
  nidos@master:~$ sudo nano ~/.bashrc
  nidos@master:~$ source ~/.bashrc

  If PC Master is already set up, or was cloned from the PC Master of the Single Node Cluster project from the previous meeting, skip the steps above. In ~/.bashrc (before running `source ~/.bashrc`), add the following:

  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  export HADOOP_INSTALL=/usr/local/hadoop
  export PATH=$PATH:$HADOOP_INSTALL/bin
  export PATH=$PATH:$HADOOP_INSTALL/sbin
  export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
  export HADOOP_COMMON_HOME=$HADOOP_INSTALL
  export HADOOP_HDFS_HOME=$HADOOP_INSTALL
  export YARN_HOME=$HADOOP_INSTALL
  export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
  export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
  export HADOOP_CLASSPATH=/usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar

  Then:

  nidos@master:~$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  (in hadoop-env.sh, change "export JAVA_HOME=...." to: export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64)
  nidos@master:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
  nidos@master:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
  nidos@master:~$ sudo chown -R nidos:nidos /usr/local/hadoop_tmp

  

Set up PC Master + (PC Node1, Node2, Node3):

o Do the following:

  nidos@master:~$ sudo gedit /etc/hostname

  

Set up PC Master + (PC Node1, Node2, Node3):

o Do the following:

  nidos@master:~$ sudo gedit /etc/hosts

 Check the default route and Primary DNS:

o Do the following: click the network icon, then click "Connection Information", select Ethernet, and press the edit button.

  Change Method to: Automatic (DHCP).

  Then click "Connection Information" again; you now have the "default route and Primary DNS".

  

Set up PC Master + (PC Node1, Node2, Node3):

o Set the IP of PC Master; do the following: click the network icon, then click "Edit Connections...", then press the edit button.

  

Set up PC Master + (PC Node1, Node2, Node3):

o Do the following (from slide 8): change Method to: Manual.

o Do the following:

  nidos@master:~$ sudo gedit /etc/hosts

  

Set up PC Master + (PC Node1, Node2, Node3):

o Do the following:

  nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/masters
  master

  nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves
  node1
  node2
  node3

  nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
  ..
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
    </property>
  </configuration>

  


Set up PC Master + (PC Node1, Node2, Node3):

o Do the following:

  nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
  ..
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:9000</value>
    </property>
  </configuration>

  

Set up PC Master + (PC Node1, Node2, Node3):

o Do the following:

  nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
  ..
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>master:54311</value>
    </property>
  </configuration>

  

Set up PC Master + (PC Node1, Node2, Node3):

o Do the following:

  nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
  ..
  <configuration>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>master:8025</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>master:8030</value>
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>master:8050</value>
    </property>
  </configuration>

 Clone PC Master to (PC Node1, Node2, Node3):

o Do the following: shut down PC Master, then right-click it, click Clone, name it node1, click Next, choose Linked clone, and click Clone.

o Do the same for node2 and node3.

 Set up PC Master:

o Do the following:

  nidos@master:~$ sudo rm -rf /usr/local/hadoop_tmp/
  nidos@master:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
  nidos@master:~$ sudo chown -R nidos:nidos /usr/local/hadoop
  nidos@master:~$ sudo chown -R nidos:nidos /usr/local/hadoop_tmp

 Set up PC Node1, Node2, and Node3:

o Do the following:

  nidos@node1:~$ sudo rm -rf /usr/local/hadoop_tmp/
  nidos@node1:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
  nidos@node1:~$ sudo chown -R nidos:nidos /usr/local/hadoop_tmp/

  

Set up PC Master + (PC Node1, Node2, Node3):

o Check the IP of PC Master (on PC Master).

o Set the IP of PC Node1 (on PC Node1); do the same for PC Node2 and PC Node3.

  

Set up PC Master + (PC Node1, Node2, Node3):

o Restart the network on every PC; do the following:

  nidos@master:~$ sudo /etc/init.d/networking restart
  nidos@master:~$ sudo reboot
  nidos@node1:~$ sudo /etc/init.d/networking restart
  nidos@node1:~$ sudo reboot
  nidos@node2:~$ sudo /etc/init.d/networking restart
  nidos@node2:~$ sudo reboot
  nidos@node3:~$ sudo /etc/init.d/networking restart
  nidos@node3:~$ sudo reboot

  

Set up PC Master + (PC Node1, Node2, Node3):

o Do the following:

  nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
  ..
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
    </property>
  </configuration>

  Do the same for node2 and node3.

 Set up PC Master + (PC Node1, Node2, Node3):

o Do the following (and the same for node2 and node3):

  nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/masters

o Do the following (and the same for node2 and node3):

  nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves

 Set up PC Master + (PC Node1, Node2, Node3):

o Do the following (and the same for node2 and node3):

  nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
  ..
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:9000</value>
    </property>
  </configuration>

  nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
  ..
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>master:54311</value>
    </property>
  </configuration>

 Set up PC Master + (PC Node1, Node2, Node3):

o Do the following (and the same for node2 and node3):

  nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
  ..
  <configuration>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>master:8025</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>master:8030</value>
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>master:8050</value>
    </property>
  </configuration>

 Set up PC Master + (PC Node1, Node2, Node3):

o Call ssh; do the following (type `ssh ` and then press the Tab key to list known hosts):

  nidos@master:~$ ssh <Tab>
  ::1             fe00::0          ff00::0         ff02::1        ff02::2
  ip6-allnodes    ip6-allrouters   ip6-localhost   ip6-localnet   ip6-loopback
  ip6-mcastprefix localhost        master          node1          node2
  node3           ubuntu

  nidos@master:~$ ssh node1
  ssh: connect to host node1 port 22: No route to host    (Error)

  Solution (check the ssh status); on the master it is OK:

  nidos@master:~$ sudo service ssh status
  [sudo] password for nidos:
  ssh start/running, process 790

  Solution (check the ssh status on node1); if the following appears:

  nidos@node1:~$ sudo service ssh status
  [sudo] password for nidos:
  ssh: unrecognized service

  then re-install ssh and check the status again:

  nidos@node1:~$ sudo apt-get remove openssh-client openssh-server
  nidos@node1:~$ sudo apt-get install openssh-client openssh-server
  nidos@node1:~$ sudo service ssh status
  ssh start/running, process 3100

  Do the same for node2 and node3.



 Solution for the error "ssh: connect to host master/node1/node2/node3 port 22: No route to host"; do the following:

  nidos@master:~$ sudo iptables -P INPUT ACCEPT    (accept all incoming traffic)
  nidos@master:~$ sudo iptables -F    (clear/flush/remove all iptables rules)

  Shut down all PCs, then change the network settings in VirtualBox: select the PC (e.g. PC Master), click Settings, then Network; on Adapter 1, choose "Internal Network", then click OK.

  Do the same for node1, node2, and node3.

 Set up PC Master + (PC Node1, Node2, Node3):

o Try calling ssh to node1 from the master again; do the following (or use `ssh 192.168.2.117`):

  nidos@master:~$ ssh node1
  The authenticity of host 'node1 (192.168.2.117)' can't be established.
  ECDSA key fingerprint is 87:d8:ac:1e:41:19:a9:1d:80:ab:b6:2c:75:f9:27:85.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'node1' (ECDSA) to the list of known hosts.
  nidos@node1's password:
  Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
   * Documentation: https://help.ubuntu.com/
  New release '16.04.1 LTS' available. Run 'do-release-upgrade' to upgrade to it.
  Last login: Sat Dec 3 13:16:28 2016 from master
  nidos@node1:~$ exit
  logout
  Connection to node1 closed.
  nidos@master:~$

  Also try: calling ssh to node2 from the master, and calling ssh to node3.

 Set up PC Master + (PC Node1, Node2, Node3):

o Try calling ssh to the master from node1; do the following (or use `ssh 192.168.2.116`):

  nidos@node1:~$ ssh master
  nidos@master's password:
  Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
   * Documentation: https://help.ubuntu.com/
  631 packages can be updated.
  331 updates are security updates.
  Last login: Sat Dec 3 13:27:54 2016 from node1
  nidos@master:~$

  Also try: calling ssh to the master from node2, and calling ssh to the master from node3.

 Format the namenode from PC Master:

  nidos@master:~$ hdfs namenode -format

  

Copy the ssh id from PC Master to all node PCs:

  nidos@master:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub nidos@node1
  nidos@master:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub nidos@node2
  nidos@master:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub nidos@node3

  or with commands like the following:

  nidos@master:~$ ssh-copy-id nidos@node1
  nidos@master:~$ ssh-copy-id nidos@node2
  nidos@master:~$ ssh-copy-id nidos@node3

  Now you can ssh without a password.

  

Run start-dfs.sh and then start-yarn.sh (or simply start-all.sh) from PC Master:

  nidos@master:~$ start-dfs.sh
  nidos@master:~$ start-yarn.sh

 Open Firefox at "http://localhost:50070":

 Open Firefox at "http://localhost:50090/status.html":

 Open Firefox at "http://localhost:8088/cluster":

 Creating directories in HDFS must be done one level at a time:

o Do the following:

  nidos@master:~$ cd /usr/local/hadoop
  nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user
  nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos
  nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos/wordcount
  nidos@master:/usr/local/hadoop$ bin/hdfs dfs -ls /user/nidos
  Found 1 items
  drwxr-xr-x - nidos supergroup 0 2016-12-05 07:40 /user/nidos/wordcount
  nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos/wordcount/input

  (Alternatively, `bin/hdfs dfs -mkdir -p /user/nidos/wordcount/input` creates the parent directories in one command.)
  

Counting word occurrences in a document file:

o Do the following; create the document file to be tested (for example):

  nidos@master:/usr/local/hadoop$ cd
  nidos@master:~$ cd /home/nidos/Desktop/
  nidos@master:~/Desktop$ mkdir data
  nidos@master:~/Desktop$ cd data
  nidos@master:~/Desktop/data$ >> a.txt
  nidos@master:~/Desktop/data$ gedit a.txt

  

Counting word occurrences in a document file:

o Do the following; create the file "WordCount.java":

  nidos@master:~/Desktop/data$ cd /usr/local/hadoop
  nidos@master:/usr/local/hadoop$ >> WordCount.java
  nidos@master:/usr/local/hadoop$ gedit WordCount.java
  nidos@master:/usr/local/hadoop$ ls
  bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  WordCount.java
  

Prepare the *.java file (e.g. WordCount.java, part 1 of 2) to be compiled into a *.jar:

  import java.io.IOException;
  import java.util.StringTokenizer;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, one);
        }
      }
    }

  

Prepare the *.java file (e.g. WordCount.java, part 2 of 2) to be compiled into a *.jar:

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private IntWritable result = new IntWritable();

      public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
          sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }
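Before running the job on the cluster, the tokenize-and-count logic of TokenizerMapper plus IntSumReducer can be checked locally with plain Java collections (an illustrative sketch; the class LocalWordCount is not part of the Hadoop job):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local, single-process equivalent of the WordCount map + reduce phases.
public class LocalWordCount {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Same tokenization as TokenizerMapper (whitespace-delimited).
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // merge() plays the role of IntSumReducer: sum the 1s per word.
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("big data big cluster data big"));
        // {big=3, cluster=1, data=2}
    }
}
```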

  


 Compile WordCount.java into a *.jar:

o Do the following:

  nidos@master:/usr/local/hadoop$ bin/hdfs com.sun.tools.javac.Main WordCount.java
  nidos@master:/usr/local/hadoop$ jar cf wc.jar WordCount*.class

 Copy the file /home/nidos/Desktop/data/a.txt to /user/nidos/wordcount/input and run the word-count job on the document:

o Do the following (with the `hdfs` command use `dfs`; with the `hadoop` command use `fs`):

  nidos@master:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /home/nidos/Desktop/data/a.txt /user/nidos/wordcount/input

  (If the output folder already exists, use another name, e.g. "output2".)

  nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/a.txt /user/nidos/wordcount/output
  nidos@master:/usr/local/hadoop$ bin/hdfs dfs -ls /user/nidos/wordcount/output
  Found 2 items
  -rw-r--r-- 3 nidos supergroup    0 2016-12-05 08:29 /user/nidos/wordcount/output/_SUCCESS
  -rw-r--r-- 3 nidos supergroup 1189 2016-12-05 08:29 /user/nidos/wordcount/output/part-r-00000
  nidos@master:/usr/local/hadoop$ bin/hdfs dfs -cat /user/nidos/wordcount/output/part*


 Prepare another file, e.g. b.txt; copy /home/nidos/Desktop/data/b.txt to /user/nidos/wordcount/input and run the word-count job:

o Do the following:

  nidos@master:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /home/nidos/Desktop/data/b.txt /user/nidos/wordcount/input

  Run the JAR to count words in a single file in the folder (e.g. b.txt):

  nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/b.txt /user/nidos/wordcount/output2

  Or run the JAR to count words in all files in the folder (a.txt and b.txt):

  nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/ /user/nidos/wordcount/output2

  To delete an HDFS folder (e.g. /user/nidos/wordcount/output):

  nidos@master:/usr/local/hadoop$ hadoop fs -rm -r -f /user/nidos/wordcount/output

Group Assignment

1. Explain the difference between a Hadoop Single Node Cluster and a Hadoop Multi Node Cluster!

2. Carry out the WordCount case study with a different document on the Hadoop Multi Node Cluster, and explain every step, with screenshots!

3. Based on slide 56, explain the difference in the results of running (a) versus (b):

  a. Running the JAR to count words in a single file in the folder (e.g. b.txt):
     nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/b.txt /user/nidos/wordcount/output2

  b. Running the JAR to count words in all files in the folder (a.txt and b.txt):
     nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/ /user/nidos/wordcount/output2

  Thank you. Imam Cholissodin | imam.cholissodin@gmail.com