Thursday, March 31, 2016

Setup Hadoop YARN on CentOS VMs

This post summarizes my experience in setting up a test environment for YARN using CentOS VMs.

After setting up hdfs by following this post (http://czcodezone.blogspot.sg/2016/03/setup-hdfs-cluster-in-centos-vms.html), we can go ahead and set up yarn to manage jobs in hadoop.

Hadoop v2 runs an application master on a datanode, which works together with the nodemanagers to manage a job, and a resource manager on the namenode, which schedules resources for jobs (a job refers to a particular application or driver from a distributed computation framework such as MapReduce or Spark).

1. Setup yarn configuration in hadoop


To set up yarn, perform the following steps on each VM (both namenode and datanodes):

1.1. Edit hadoop/etc/hadoop/mapred-site.xml


Run the following command to edit the mapred-site.xml:

```bash
cd hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
```

In the mapred-site.xml, modify as follows:

```xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
```

Setting "mapreduce.framework.name" to "yarn" tells MapReduce to submit jobs to yarn instead of running them in local mode. (Note that "fs.default.name", which points to the master as "hdfs://centos01:9000", belongs in core-site.xml from the hdfs setup. It is important that it is not specified as "hdfs://localhost:9000", otherwise "hadoop/bin/hdfs dfsadmin -report" will fail with a "Connection refused" exception.)
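All of the Hadoop *-site.xml files share the same flat configuration/property layout, so they are easy to generate or sanity-check from a script. Below is an illustrative Python sketch (the helper names and the demo file path are mine, not part of Hadoop) that writes and reads back a file in that layout using only the standard library:

```python
# Sketch: read/write the flat <configuration>/<property> layout that all
# Hadoop *-site.xml files use. Helper names and the /tmp path are mine.
import xml.etree.ElementTree as ET

def write_hadoop_site(path, props):
    """Write a dict of settings in the standard Hadoop *-site.xml layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.ElementTree(root).write(path)

def read_hadoop_site(path):
    """Read the same layout back into a dict."""
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

if __name__ == "__main__":
    write_hadoop_site("/tmp/mapred-site-demo.xml",
                      {"mapreduce.framework.name": "yarn"})
    print(read_hadoop_site("/tmp/mapred-site-demo.xml"))
    # → {'mapreduce.framework.name': 'yarn'}
```

A round-trip like this is a quick way to catch malformed XML before copying a config to every VM.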


1.2 Edit hadoop/etc/hadoop/yarn-site.xml


Run the following command to edit the yarn-site.xml:

```bash
vi hadoop/etc/hadoop/yarn-site.xml
```

In the yarn-site.xml, modify as follows:

```xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
```
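One easy mistake in this file: the aux-service name appears in two places, and they must agree — the value "mapreduce_shuffle" and the token embedded in the property key "yarn.nodemanager.aux-services.mapreduce_shuffle.class". As an illustrative check (the function name is mine), a short Python sketch that parses the snippet above and verifies the coupling:

```python
# Illustrative check: each aux-service named in yarn.nodemanager.aux-services
# needs a matching "...aux-services.<name>.class" property.
import xml.etree.ElementTree as ET

YARN_SITE = """
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
"""

def aux_services_consistent(xml_text):
    root = ET.fromstring(xml_text)
    props = {p.findtext("name"): p.findtext("value")
             for p in root.iter("property")}
    service = props["yarn.nodemanager.aux-services"]
    class_key = "yarn.nodemanager.aux-services.%s.class" % service
    # The declared aux-service must have a matching .class entry.
    return class_key in props

print(aux_services_consistent(YARN_SITE))  # → True
```

If the two names drift apart (a common copy/paste error), the nodemanager will not load the shuffle handler and MapReduce jobs will fail at the shuffle stage.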

2. Start yarn on the namenode


On the namenode centos01 (see http://czcodezone.blogspot.sg/2016/03/setup-centos-vm-in-virtualbox-for.html for the VM setup), run the following commands to start hdfs, then yarn, and then the job history server:

```bash
hadoop/sbin/start-dfs.sh
hadoop/sbin/start-yarn.sh
hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
jps
```
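`jps` lists the running JVMs as "pid Name" lines, and which daemons should appear depends on the node: the namenode should show the HDFS/YARN masters and the history server, while each datanode should show DataNode and NodeManager. As a rough sketch (the expected sets below are assumptions for the layout in this post, and the helper name is mine), a small Python check for missing daemons:

```python
# Sketch: compare jps-style output against the daemons each node role
# should be running. The EXPECTED sets are assumptions for this post's
# layout (masters + history server on the namenode, workers on datanodes).
EXPECTED = {
    "namenode": {"NameNode", "ResourceManager", "JobHistoryServer"},
    "datanode": {"DataNode", "NodeManager"},
}

def missing_daemons(role, jps_output):
    """Return the expected daemon names absent from `jps` output."""
    running = {parts[1] for line in jps_output.splitlines()
               if len(parts := line.split()) > 1}
    return EXPECTED[role] - running

sample = "2101 NameNode\n2440 ResourceManager\n2788 Jps\n"
print(missing_daemons("namenode", sample))  # → {'JobHistoryServer'}
```

If a daemon is missing, its log under hadoop/logs is the first place to look.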

3. Stop hdfs and yarn


To stop the job history server, yarn, and hdfs (in the reverse order of startup), run the following commands:

```bash
hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
hadoop/sbin/stop-yarn.sh
hadoop/sbin/stop-dfs.sh
```
