
Node Setup in Apache Hadoop Using Amazon EC2

Our configuration includes a single NameNode and three DataNodes that act as the slaves for
processing. Starting with the AWS EC2 setup, this guide takes you through the full configuration of the
machines. For this demonstration, we use Apache Hadoop 2.7.3.

 

To learn the complete Hadoop course, visit ITGuru's hadoop admin online training Blog


Prerequisites for Apache Hadoop
Sign up for an AWS account, if you don't have one already. For the first year, you get certain resources
free, including an EC2 Micro Instance.


Launching AWS EC2 Instances
You will now use Amazon EC2 to launch 4 instances of Ubuntu Server 16.04 LTS.
Select AMI
● Go to your AWS Console, pick Ubuntu Server 16.04 LTS and click on Launch Instance.
Instance Type
For the instance type, we choose t2.micro because that is enough for demo purposes. If you need a
high-memory or high-CPU instance, you can choose one of those instead.
● Click Next to configure instance details.
Instance Details
Here we ask for 4 instances of the selected machine type. We also choose a subnet (us-west-1b) so
that if we need more machines later, we can launch them into the same place. (An AWS CLI equivalent
of this launch is sketched below.)
● Click Next to add storage.
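If you prefer to script your setup, the same launch can be done with the AWS CLI. This is a minimal
sketch, not part of the original console walkthrough; the AMI ID, key pair name and subnet ID are
placeholders you must substitute for your own region and account:

# Launch 4 t2.micro instances of Ubuntu Server 16.04 LTS into one subnet
# (ami-xxxxxxxx, MyKeyPair and subnet-xxxxxxxx are placeholders)
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --count 4 \
    --instance-type t2.micro \
    --key-name MyKeyPair \
    --subnet-id subnet-xxxxxxxx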


Storage
The default 8 GB of instance storage is enough for our purposes. If you need more storage, either increase
the size or press "Attach Volume" to add a disk. When you add a volume, you will need to attach, format and
mount the volume on your instance. Because this is a guide for beginners, those steps are not covered in
depth here; a brief sketch follows.
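A minimal sketch of formatting and mounting an extra volume, assuming the device shows up as
/dev/xvdf (the device name varies by instance type):

# Format the new volume and mount it (the /dev/xvdf device name is an assumption)
sudo mkfs.ext4 /dev/xvdf
sudo mkdir -p /mnt/volume
sudo mount /dev/xvdf /mnt/volume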

● Click Next to add tags to your instances.


Instance Tags
A tag lets you identify your instance with a name of your choosing.
Click Add Tag, set the key to Name and the value to Hadoop. We will use this tag to relabel the instances
later as "namenode," "datanode1" and so on. For now, let all instances be named "Hadoop."
● Click Next to configure the security group.


Security Group
For testing purposes, we create a fully open security group.
You now arrive at the launch review page.
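If you script your setup, the same fully open rule can also be created from the AWS CLI. A minimal
sketch, where sg-xxxxxxxx is a placeholder for your security group ID; an all-open group should only
ever be used for a short-lived demo:

# Allow all inbound traffic from anywhere (demo only; sg-xxxxxxxx is a placeholder)
aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx \
    --protocol -1 \
    --cidr 0.0.0.0/0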


Launching the Instances
Review the details once more, and click Launch to start the instances. You will need to create a key pair or
use an existing one. Follow the launch wizard's instructions to create a new key pair.
● Now you can go to the Instances tab and check the status of the instances.


Naming the Instances
Let us now set the instance names on the Instances tab. These are not DNS names; they are labels that
let us tell the machines apart.
● Click the pencil icon next to each name, and set the names to namenode, datanode1, datanode2 and datanode3.


Setting Up the Instances
Once the instances are up and running, it is time to set them up. This includes the following:
● Set up password-less SSH between the namenode and the datanodes.
● Install Java.
● Install Hadoop.

Copy Each Instance's Public DNS Name
You must now copy every node's Public DNS name (1 namenode and 3 datanodes). These names are
used in the configuration steps below. Because the DNS names are unique to each setup, we refer to
them with the placeholders <namenode-public-dns>, <datanode1-public-dns>, <datanode2-public-dns>
and <datanode3-public-dns>.
Common Setup for All Nodes
Some configuration is common to all nodes, NameNode and DataNodes alike. This section covers it.
All Nodes: Update the OS
Let us update the OS with the latest software patches available.
sudo apt-get update && sudo apt-get -y dist-upgrade
The system may need a restart after the updates. Execute a reboot from the EC2 Instances tab.
All Nodes: Install Java
Let's get Java now. We install the openjdk-8-jdk-headless package on all of the nodes.
sudo apt-get -y install openjdk-8-jdk-headless
All Nodes: Install Apache Hadoop
Install Apache Hadoop 2.7.3 on all instances. Get the download link from the Apache website, and run
the commands that follow. We install Hadoop under a directory named server in the home directory.
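The download commands themselves are not shown in the original; a minimal sketch, assuming the
standard Apache archive URL for release 2.7.3, is:

# Download and unpack Hadoop 2.7.3 into ~/server (archive URL is an assumption)
mkdir -p ~/server && cd ~/server
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar xvzf hadoop-2.7.3.tar.gz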
All Nodes: Create the Data Directory
HDFS requires a data directory on every node: 1 namenode and 3 datanodes. Create this directory as
shown, and assign ownership to the ubuntu user.
sudo mkdir -p /usr/local/hadoop/hdfs/data
sudo chown -R ubuntu:ubuntu /usr/local/hadoop/hdfs/data
Setting Up the NameNode
Let us now set up the NameNode, after performing the configuration common to all nodes.
NameNode: Password-less SSH
We need password-less SSH between the namenode and the datanodes, as described before. For
this, let's create a public-private key pair on the namenode.
namenode> ssh-keygen

Use the default for the key location (/home/ubuntu/.ssh/id_rsa), and hit enter for an empty
passphrase.
DataNodes: Set Up the Public Key
The public key is saved in /home/ubuntu/.ssh/id_rsa.pub. We need to copy this file from the namenode
to each datanode and append its contents to /home/ubuntu/.ssh/authorized_keys on each datanode.
datanode1> cat id_rsa.pub >> ~/.ssh/authorized_keys
datanode2> cat id_rsa.pub >> ~/.ssh/authorized_keys
datanode3> cat id_rsa.pub >> ~/.ssh/authorized_keys
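The copy step itself is not shown; a minimal sketch from the namenode, assuming the EC2 launch key
is available there as ~/mykey.pem (a placeholder) and using the DNS placeholders from earlier:

# Copy the namenode's public key to each datanode (key path and DNS names are placeholders)
scp -i ~/mykey.pem ~/.ssh/id_rsa.pub ubuntu@<datanode1-public-dns>:~
scp -i ~/mykey.pem ~/.ssh/id_rsa.pub ubuntu@<datanode2-public-dns>:~
scp -i ~/mykey.pem ~/.ssh/id_rsa.pub ubuntu@<datanode3-public-dns>:~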
NameNode: Configure SSH
SSH reads per-host parameters from the configuration file ~/.ssh/config. Set it up as shown below,
replacing each HostName value with the corresponding node's Public DNS (for example, replace
<namenode-public-dns> with the NameNode's EC2 Public DNS).
Host nnode
  HostName <namenode-public-dns>
  User ubuntu
  IdentityFile ~/.ssh/id_rsa
Host dnode1
  HostName <datanode1-public-dns>
  User ubuntu
  IdentityFile ~/.ssh/id_rsa
Host dnode2
  HostName <datanode2-public-dns>
  User ubuntu
  IdentityFile ~/.ssh/id_rsa
Host dnode3
  HostName <datanode3-public-dns>
  User ubuntu
  IdentityFile ~/.ssh/id_rsa
At this point, verify that password-less SSH works from the namenode to each node. (The first time, you
will receive a warning that the host is unknown and be asked whether you want to connect; enter yes and
hit enter.)
namenode> ssh nnode
namenode> ssh dnode1
namenode> ssh dnode2
namenode> ssh dnode3
NameNode: HDFS Properties
On the NameNode, edit the file ~/server/hadoop-2.7.3/etc/hadoop/hdfs-site.xml and set the following
properties. dfs.replication is set to 3 to match our three datanodes, and dfs.namenode.name.dir points
at the data directory created earlier.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hdfs/data</value>
  </property>
</configuration>
NameNode: MapReduce Properties
On the NameNode, copy ~/server/hadoop-2.7.3/etc/hadoop/mapred-site.xml.template to
~/server/hadoop-2.7.3/etc/hadoop/mapred-site.xml, and set the following properties (replace
<namenode-public-dns> with the NameNode's Public DNS, as above):
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value><namenode-public-dns>:54311</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
NameNode: YARN Properties
Next, on the NameNode, edit ~/server/hadoop-2.7.3/etc/hadoop/yarn-site.xml and set the following
properties (as before, substitute the NameNode's Public DNS):
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value><namenode-public-dns></value>
  </property>
</configuration>
NameNode: Masters and Slaves Setup
On the NameNode, create ~/server/hadoop-2.7.3/etc/hadoop/masters containing the NameNode's public
DNS, and replace all content in ~/server/hadoop-2.7.3/etc/hadoop/slaves with the public DNS of each
DataNode, one per line, as sketched below.
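Using the placeholders from earlier (substitute your real DNS names), the two files look like this:

masters:
<namenode-public-dns>

slaves:
<datanode1-public-dns>
<datanode2-public-dns>
<datanode3-public-dns>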
Conclusion
I hope this gives you a clear picture of node setup in Apache Hadoop. You can learn more through the hadoop admin online course