Wednesday, March 30, 2016

Set Up a CentOS VM in VirtualBox for Software Development and Distributed Computation on Spark and HDFS

This post summarizes my experience setting up a simple CentOS VM in VirtualBox as a development environment, which I used to test-run distributed jobs on a Spark and HDFS cluster.

1. Create vbox centos VM

Launch VirtualBox and create a VM from the downloaded CentOS ISO image. In the "Network" tab of the VM settings, configure the network adapter as:

Host-only adapter

If you need to share a folder on the host computer with the CentOS VM, add it in the "Shared Folders" tab of the VM settings and configure it with:

1. Full Access
2. Auto Mount

In this example, the shared folder is assumed to be named "git" and located at "C:\Users\xschen\git" on the host computer.
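
The same shared folder can also be registered from the host's command line with VBoxManage instead of the settings dialog. The snippet below is only a sketch: the VM name "centos-dev" is a made-up placeholder, share_cmd is a hypothetical helper, and the script prints the command for review rather than running it. Leaving out --readonly keeps the share writable, which corresponds to "Full Access".

```shell
#!/bin/sh
# Hypothetical VM name; replace it with the name shown in the VirtualBox manager.
VM_NAME="${VM_NAME:-centos-dev}"
SHARE_NAME="git"
HOST_PATH='C:\Users\xschen\git'

# Build the VBoxManage command line that adds an auto-mounted shared folder.
# Arguments: <vm-name> <share-name> <host-path>
share_cmd() {
    printf 'VBoxManage sharedfolder add "%s" --name "%s" --hostpath "%s" --automount\n' \
        "$1" "$2" "$3"
}

# Print the command only; nothing is changed on the host.
share_cmd "$VM_NAME" "$SHARE_NAME" "$HOST_PATH"
```

Permanent shared folders are typically added while the VM is powered off, so it is safest to run the printed command before starting the VM.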

2. Install the development tools in the centos VM

Launch the CentOS VM and follow the standard installation steps. Make sure that the three network adapters are enabled when installing CentOS.

After the installation is completed, run the following commands to install the necessary tools:

`yum update
`yum install -y java-1.8.0-openjdk-devel
`yum install -y maven
`yum install -y kernel-devel
`yum install -y gcc
`yum install -y bzip2

java-1.8.0-openjdk-devel and maven are used for Java development, while kernel-devel, gcc, and bzip2 are used for compiling C or C++ source code (which will be needed later to install vboxsf).
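
To confirm that the tools actually landed on the PATH, a small check like the one below can help. It is a minimal sketch: missing_tools is a hypothetical helper, and it assumes the packages above provide the javac, mvn, gcc, and bzip2 commands.

```shell
#!/bin/sh
# Print each command name from the arguments that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# javac comes from the JDK package and mvn from Maven.
missing=$(missing_tools javac mvn gcc bzip2)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line.
    echo "Not installed yet:" $missing
else
    echo "All development tools are available."
fi
```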

3. Access shared folder on the host computer

To access the shared folder "git" on the host computer, we must first install the VirtualBox Guest Additions so that vboxsf is available in the CentOS VM.

3.1. Mount and install VBoxGuestAdditions

Click "Devices --> Insert Guest Additions CD image" in the menu of the VM window, and VBoxGuestAdditions.iso will be attached to the VM's CD-ROM drive. To find the device that holds VBoxGuestAdditions.iso, run the following command in the CentOS VM:

`ls /dev -l | grep cd

You should see something like /dev/sr0, which is the device holding the ISO. Run the following commands to mount the ISO:

`mkdir /mnt/dvd
`mount -r -t iso9660 /dev/sr0 /mnt/dvd

Now run the following commands to install vboxsf:

`cd /mnt/dvd
`sh ./VBoxLinuxAdditions.run

If you encounter any errors, run the following command and then retry the installation:

`yum groupinstall "Development Tools"

After the above commands have executed successfully, you can confirm that vboxsf is available in the CentOS VM by running the following command:

`lsmod | grep vbox
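
If you want to script this check, for example before mounting the share automatically, the sketch below reads /proc/modules directly. has_module is a hypothetical helper, not part of VirtualBox, and it only tests whether the first column of its input matches the module name.

```shell
#!/bin/sh
# Succeed if the named kernel module appears in /proc/modules-style input on stdin.
has_module() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

if [ -r /proc/modules ] && has_module vboxsf < /proc/modules; then
    echo "vboxsf is loaded"
else
    echo "vboxsf is not loaded; retry the Guest Additions installation"
fi
```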

3.2. Mount and access the shared folder

Run the following commands to mount and access the shared folder:

`mkdir /mnt/git
`mount -t vboxsf git /mnt/git

Now, to access the shared folder, just enter the following command:

`cd /mnt/git

4. Assign a static IP address to a network adapter

In this example, we want to assign a static IP address to the host-only network adapter. It is assumed that:

1. the host computer is in the subnet "192.168.56.*"
2. the static IP address of the host-only network adapter is "" (which will be in the same subnet as the host computer)

In the CentOS VM, install and run the ifconfig tool:

`yum -y install net-tools
`ifconfig

On my computer, enp0s3 is the host-only network adapter. Its configuration file, ifcfg-enp0s3 (if it does not exist, simply create a text file with that name), can be found in the folder "/etc/sysconfig/network-scripts".

4.1. Configure static IP address in the network adapter

Run the following commands to open and edit ifcfg-enp0s3:

`cd /etc/sysconfig/network-scripts
`vi ifcfg-enp0s3

Add or modify the following settings in the ifcfg-enp0s3:


A sample ifcfg-enp0s3 is shown below:
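
As a rough sketch of such a file (not necessarily the exact settings from my VM), the script below writes a minimal static configuration to a scratch file so it can be reviewed before being copied over /etc/sysconfig/network-scripts/ifcfg-enp0s3. The address 192.168.56.101 and the 255.255.255.0 netmask are placeholder assumptions chosen to fit the "192.168.56.*" subnet, and write_ifcfg is a hypothetical helper.

```shell
#!/bin/sh
# Write a minimal static-IP ifcfg file for one interface.
# Usage: write_ifcfg <output-file> <device> <ip-address>
write_ifcfg() {
    cat > "$1" <<EOF
TYPE=Ethernet
DEVICE=$2
NAME=$2
BOOTPROTO=none
IPADDR=$3
NETMASK=255.255.255.0
ONBOOT=yes
EOF
}

# BOOTPROTO=none turns off DHCP so that IPADDR is used; ONBOOT=yes brings the
# interface up at boot. 192.168.56.101 is a placeholder, not the real address.
out=$(mktemp)
write_ifcfg "$out" enp0s3 192.168.56.101
cat "$out"
```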


4.2. Restart the network service

Run the following commands to restart the network service and re-check the network configuration:

`service network restart
`ifconfig enp0s3

You should see that the static IP address has been assigned to enp0s3.

5. Security: Disable firewall and selinux

For the project I am working on, I need to run distributed computation jobs on an Apache Spark cluster and HDFS. For the Spark cluster to work, the firewall and SELinux must be properly configured. As a quick and dirty shortcut, the firewall and (optionally) SELinux can be disabled on all VMs in the cluster (including the Spark master and slave nodes and the HDFS namenodes and datanodes).

5.1. Disable firewalld

To disable the firewall, run the following commands:

`systemctl stop firewalld.service
`systemctl disable firewalld.service

To re-enable the firewall, run the following commands:

`systemctl start firewalld.service
`systemctl enable firewalld.service

5.2. Disable selinux (Optional)

Run the command to edit /etc/selinux/config:

`vi /etc/selinux/config

In /etc/selinux/config, change the following line:
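
A minimal sketch of that edit, assuming the stock file contains a line such as SELINUX=enforcing and that the goal is SELINUX=disabled. set_selinux_mode is a hypothetical helper, and the demo runs on a scratch file rather than the real config, so it is safe to try first.

```shell
#!/bin/sh
# Rewrite the SELINUX= line of an SELinux config file in place.
# Usage: set_selinux_mode <config-file> <enforcing|permissive|disabled>
# Uses sed -i, which is available in GNU sed as shipped with CentOS.
set_selinux_mode() {
    sed -i "s/^SELINUX=.*/SELINUX=$2/" "$1"
}

# Demonstrate on a scratch file that mimics the two settings in the real config.
demo=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
set_selinux_mode "$demo" disabled
cat "$demo"
```

The pattern is anchored on "SELINUX=", so the SELINUXTYPE line is left untouched. To apply it for real, pass /etc/selinux/config as the first argument while running as root.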




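Next, to stop SELinux enforcement immediately (this switches to permissive mode; the change in /etc/selinux/config takes effect at the next reboot), run the following command: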

`setenforce 0
