Install and Configure Hadoop & Spark on a Single Node

Hadoop and Spark Installation on Ubuntu

This guide provides step-by-step instructions for installing and configuring Hadoop and Spark on a single-node Ubuntu system. The accompanying scripts (hadoop-install-cmd.sh and spark-install-cmd.sh) automate the whole process.


Prerequisites

  • A working Ubuntu system with sudo access.
  • Basic knowledge of terminal commands.
  • An internet connection for downloading the required packages.

1. Hadoop Installation

Steps Overview:

  1. Update and Upgrade System
    Ensure the system is updated:

    sudo apt update && sudo apt upgrade -y
    
  2. Install Java 8
    Hadoop requires Java; this guide uses Java 8. Install it together with ssh and rsync, which Hadoop's startup scripts rely on:

    sudo apt install openjdk-8-jdk ssh rsync -y
    
  3. Set JAVA_HOME Environment Variable
    Edit /etc/environment to include:

    JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
    

    Reload environment variables:

    source /etc/environment
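    echo $JAVA_HOME   # verify: should print /usr/lib/jvm/java-8-openjdk-amd64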
    
  4. Download and Extract Hadoop
    Download Hadoop 3.4.1, extract it, and move it to /usr/local/hadoop:

    wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
    tar -xvzf hadoop-3.4.1.tar.gz
    sudo mv hadoop-3.4.1 /usr/local/hadoop
    
  5. Configure Hadoop Files

    • Update .bashrc to set environment variables.
    • Configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml as outlined in the script.
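    A minimal single-node sketch of these settings (the values below are common defaults rather than an exact copy of the script; adjust the paths to your setup):

    # Append to ~/.bashrc, then reload with: source ~/.bashrc
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

    <!-- $HADOOP_HOME/etc/hadoop/core-site.xml -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    In the same spirit, hdfs-site.xml typically sets dfs.replication to 1 on a single node, and mapred-site.xml sets mapreduce.framework.name to yarn.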
  6. Set up SSH for Hadoop
    Generate SSH keys and enable passwordless login to localhost, for example:
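    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # key with an empty passphrase
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys
    ssh localhost   # should log in without prompting for a password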

  7. Format and Start HDFS
    Format the NameNode and start the HDFS and YARN services:

    hdfs namenode -format
    start-dfs.sh
    start-yarn.sh
    
  8. Verify Installation
    Check services with:

    jps
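    # Expect daemons like NameNode, DataNode, SecondaryNameNode,
    # ResourceManager, and NodeManager in the output (PIDs will vary)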
    

    Access the web interfaces (Hadoop 3 default ports):

    • NameNode web UI: http://localhost:9870
    • YARN ResourceManager web UI: http://localhost:8088

2. Spark Installation

Steps Overview:

  1. Install Dependencies

    • Ensure Java 8 and Hadoop are installed.
    • Install Scala:
      sudo apt install scala -y
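      scala -version   # verify: prints the installed Scala version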
      
  2. Download and Extract Spark
    Download Spark 3.5.3 (pre-built for Hadoop 3), extract it, and move it to /opt/spark:

    wget https://dlcdn.apache.org/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3.tgz
    tar xvf spark-3.5.3-bin-hadoop3.tgz
    sudo mv spark-3.5.3-bin-hadoop3 /opt/spark
    
  3. Configure Spark

    • Update .bashrc with Spark environment variables.
    • Configure spark-env.sh and log4j2.properties as outlined in the script.
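    A minimal sketch (paths follow the steps above; adjust to your setup):

    # Append to ~/.bashrc, then reload with: source ~/.bashrc
    export SPARK_HOME=/opt/spark
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

    # $SPARK_HOME/conf/spark-env.sh (create it from spark-env.sh.template)
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop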
  4. Start Spark Services
    Start the Spark master and a worker that connects to it:

    start-master.sh
    start-worker.sh spark://localhost:7077
    

    Access the web interfaces (Spark defaults):

    • Spark master web UI: http://localhost:8080
    • Application web UI (while a job runs): http://localhost:4040
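    To confirm the cluster accepts jobs, you can submit the bundled SparkPi example (the glob below assumes the default layout of the pre-built Spark package):

    spark-submit --master spark://localhost:7077 \
      --class org.apache.spark.examples.SparkPi \
      "$SPARK_HOME"/examples/jars/spark-examples_*.jar 100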

  5. Stop Spark Services
    To stop Spark:

    stop-master.sh
    stop-worker.sh
    

Notes

  • Ensure all paths in configuration files match your system's setup.
  • Make the scripts executable before running them; see the example below.
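
    chmod +x hadoop-install-cmd.sh spark-install-cmd.sh
    ./hadoop-install-cmd.sh    # run the Hadoop script first; the Spark steps assume Hadoop is installed
    ./spark-install-cmd.sh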
