# AliAhmadi-Software/install-hadoop-spark

Install and configure Hadoop & Spark on a single node.

## Hadoop and Spark Installation on Ubuntu

This guide provides step-by-step instructions for installing and configuring Hadoop and Spark on an Ubuntu system. The accompanying scripts (`hadoop-install-cmd.sh` and `spark-install-cmd.sh`) automate the process for ease of setup.
## Prerequisites

- A working Ubuntu system with `sudo` access.
- Basic knowledge of terminal commands.
- An internet connection for downloading the required packages.
## 1. Hadoop Installation

### Steps Overview

1. **Update and Upgrade the System**

   Ensure the system is up to date:

   ```bash
   sudo apt update && sudo apt upgrade -y
   ```
2. **Install Java 8**

   Hadoop requires Java 8. Install it together with SSH and rsync:

   ```bash
   sudo apt install openjdk-8-jdk ssh rsync -y
   ```
3. **Set the JAVA_HOME Environment Variable**

   Edit `/etc/environment` to include:

   ```bash
   JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
   ```

   Then reload the environment variables:

   ```bash
   source /etc/environment
   ```
4. **Download and Extract Hadoop**

   Download Hadoop 3.4.1, extract it, and move it to `/usr/local/hadoop`:

   ```bash
   wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
   tar -xvzf hadoop-3.4.1.tar.gz
   sudo mv hadoop-3.4.1 /usr/local/hadoop
   ```
5. **Configure Hadoop Files**

   - Update `.bashrc` to set the Hadoop environment variables.
   - Configure `core-site.xml`, `hdfs-site.xml`, `mapred-site.xml`, and `yarn-site.xml` as outlined in the script.
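   The `.bashrc` update might look like the following sketch. The exact variable set is defined in `hadoop-install-cmd.sh`; the paths below are assumptions based on the layout used in this guide:

   ```bash
   # Hypothetical .bashrc additions for Hadoop; adjust paths to your system.
   export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
   export HADOOP_HOME=/usr/local/hadoop
   export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
   export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
   ```

   After editing, run `source ~/.bashrc` so the variables take effect in the current shell.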
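   For reference, a minimal single-node `core-site.xml` typically points HDFS at `localhost`. This is an illustrative sketch using Hadoop's standard property names, not the script's configuration verbatim:

   ```xml
   <!-- core-site.xml: minimal single-node setting (illustrative) -->
   <configuration>
     <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
     </property>
   </configuration>
   ```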
6. **Set Up SSH for Hadoop**

   Generate SSH keys and ensure passwordless login to `localhost`.
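   A common way to do this (a sketch; the script's exact commands may differ) is:

   ```bash
   # Create an RSA key pair with no passphrase, if one does not already exist
   mkdir -p ~/.ssh
   [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
   # Authorize the key for passwordless SSH to localhost
   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
   chmod 600 ~/.ssh/authorized_keys
   ```

   Confirm with `ssh localhost`, which should log in without prompting for a password.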
7. **Format and Start HDFS**

   Format the HDFS filesystem and start the Hadoop services:

   ```bash
   hdfs namenode -format
   start-dfs.sh
   start-yarn.sh
   ```
8. **Verify the Installation**

   Check that the daemons are running:

   ```bash
   jps
   ```

   Then access the web interfaces:

   - HDFS: http://localhost:9870
   - YARN: http://localhost:8088
## 2. Spark Installation

### Steps Overview
1. **Install Dependencies**

   Ensure Java 8 and Hadoop are installed, then install Scala:

   ```bash
   sudo apt install scala -y
   ```
2. **Download and Extract Spark**

   ```bash
   wget https://dlcdn.apache.org/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3.tgz
   tar xvf spark-3.5.3-bin-hadoop3.tgz
   sudo mv spark-3.5.3-bin-hadoop3 /opt/spark
   ```
3. **Configure Spark**

   - Update `.bashrc` with the Spark environment variables.
   - Configure `spark-env.sh` and `log4j2.properties` as outlined in the script.
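   As with Hadoop, the `.bashrc` additions might look like this sketch (the exact values come from `spark-install-cmd.sh`; the paths assume the layout above):

   ```bash
   # Hypothetical .bashrc additions for Spark; adjust paths to your system.
   export SPARK_HOME=/opt/spark
   export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
   ```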
4. **Start Spark Services**

   Start the Spark master and a worker node:

   ```bash
   start-master.sh
   start-worker.sh spark://localhost:7077
   ```

   Access the Spark web interfaces:

   - Spark Master: http://localhost:8080
   - Spark Worker: http://localhost:8081
5. **Stop Spark Services**

   To stop Spark:

   ```bash
   stop-master.sh
   stop-worker.sh
   ```
## Notes

- Ensure all paths in the configuration files match your system's setup.
- Run the scripts with appropriate permissions (for example, make them executable first with `chmod +x`).