Debian System Configuration
I set up four virtual Debian systems in VMware: one master and three solvers, with the hostnames master, solver1, solver2, and solver3. Note that all of the JDK and Hadoop installation and configuration steps below are performed as the hadoop user, not as root.
1. Configure a static network
Edit the `/etc/network/interfaces` file, comment out the DHCP entries, and add the following:
```
# The primary network interface
#allow-hotplug ens33
#iface ens33 inet dhcp

# static IP address
auto ens33
iface ens33 inet static
    address 192.168.20.101
    netmask 255.255.255.0
    gateway 192.168.20.2
    dns-nameservers 192.168.20.2
    dns-nameservers 114.114.114.114
```
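The address only takes effect after networking restarts. A minimal sketch, assuming the classic ifupdown stack that `/etc/network/interfaces` implies:

```bash
# Restart networking so the static address takes effect (a reboot also works)
sudo systemctl restart networking

# Confirm ens33 picked up 192.168.20.101
ip addr show ens33
```

Repeat this on each VM, changing the `address` line to match the host (192.168.20.102 through 104 for the solvers).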
2. Edit the `/etc/hosts` file and add the following:
```
# Hadoop
192.168.20.101 master
192.168.20.102 solver1
192.168.20.103 solver2
192.168.20.104 solver3
```
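A quick sanity check that the names resolve as intended:

```bash
# Each name should answer from the IP mapped in /etc/hosts
ping -c 1 solver1
ping -c 1 solver2
ping -c 1 solver3
```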
3. Install openssh-server and vim

```bash
sudo apt-get install openssh-server vim
```
4. Generate SSH keys

```bash
# Run ssh-keygen on each host
# master
ssh-keygen -t rsa -C "master"
# solver1
ssh-keygen -t rsa -C "solver1"
# solver2
ssh-keygen -t rsa -C "solver2"
# solver3
ssh-keygen -t rsa -C "solver3"
```
5. Passwordless login

```bash
# On each host, run:
ssh-copy-id -i ~/.ssh/id_rsa.pub master
ssh-copy-id -i ~/.ssh/id_rsa.pub solver1
ssh-copy-id -i ~/.ssh/id_rsa.pub solver2
ssh-copy-id -i ~/.ssh/id_rsa.pub solver3
```
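Afterwards, every host should be able to reach every other host without a password prompt; for example:

```bash
# Should print the remote hostname without asking for a password
ssh solver1 hostname
```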
6. Create the user and group

```bash
# On each host, run (requires root):
sudo useradd -m -s /bin/bash hadoop
```
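If the hadoop account is brand new, it also needs a login password (ssh-copy-id authenticates with it) and, because the steps here rely on sudo, membership in the sudo group; both commands below are assumptions about the intended setup:

```bash
# Set a password so the hadoop user can log in and run ssh-copy-id
sudo passwd hadoop

# Allow the hadoop user to use sudo for the install steps
sudo usermod -aG sudo hadoop
```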
JDK Installation and Configuration
1. Install the JDK manually
Extract the JDK package to `/usr/lib/jvm/`. If there is no `jvm` directory under `/usr/lib`, create it first with `sudo mkdir /usr/lib/jvm`. Then create a `jdk` symlink:

```bash
sudo ln -sf /usr/lib/jvm/jdk1.8.0_202 /usr/lib/jvm/jdk
```
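For reference, the extraction might look like the following; the archive name is an assumption based on the 1.8.0_202 version used above:

```bash
# Unpack the JDK tarball into /usr/lib/jvm (archive name assumed)
sudo tar -xzf jdk-8u202-linux-x64.tar.gz -C /usr/lib/jvm/
```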
2. Configure the JDK environment variables
- Create a new `jdk.sh` file

```bash
sudo vi /etc/profile.d/jdk.sh
```
- Add the following:
```bash
# JDK environment settings
export JAVA_HOME=/usr/lib/jvm/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
```
- Verify the Java environment (log out and back in, or `source /etc/profile`, so the new variables take effect):

```
$ java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
```
scp the JDK package and `jdk.sh` to each of the other hosts and repeat the steps above.
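A sketch of that copy step, run from master; the archive name and file locations are illustrative:

```bash
# Push the JDK tarball and the profile snippet to every solver
for host in solver1 solver2 solver3; do
    scp jdk-8u202-linux-x64.tar.gz jdk.sh "${host}:~/"
done
# Then, on each solver:
#   sudo mv ~/jdk.sh /etc/profile.d/
```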
Hadoop Installation and Configuration
Hadoop Installation
1. Extract the Hadoop package to `/opt` and change the owner of `hadoop-3.1.2`:

```bash
sudo chown -R hadoop:hadoop /opt/hadoop-3.1.2
```
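The extraction itself might look like this; the archive name is an assumption based on the 3.1.2 version:

```bash
# Unpack the Hadoop distribution into /opt (archive name assumed)
sudo tar -xzf hadoop-3.1.2.tar.gz -C /opt
```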
2. Then create a `hadoop` symlink:

```bash
sudo ln -sf /opt/hadoop-3.1.2 /opt/hadoop
```
3. Under `hadoop`, create the `logs`, `hdfs/name`, and `hdfs/data` directories:

```bash
mkdir /opt/hadoop/logs
mkdir -p /opt/hadoop/hdfs/name
mkdir -p /opt/hadoop/hdfs/data
```
4. Configure the Hadoop environment variables
- Create a new file `hadoop.sh`

```bash
sudo vi /etc/profile.d/hadoop.sh
```
- Add the following:
```bash
# Hadoop environment settings
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
- Reload the profile

```bash
# Apply the profile changes
source /etc/profile
```
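At this point the hadoop command should be on the PATH; a quick check (output abbreviated):

```
$ hadoop version
Hadoop 3.1.2
```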
Hadoop File Configuration
All of the configuration files live in the `etc/hadoop/` directory (that is, `/opt/hadoop/etc/hadoop/`).
1. hadoop-env.sh
```bash
# JDK environment variable (set it explicitly: daemons launched over ssh
# cannot resolve ${JAVA_HOME} from the login shell)
export JAVA_HOME=/usr/lib/jvm/jdk
```
2. workers
```
# Add the hostnames of all solver machines
solver1
solver2
solver3
```
3. core-site.xml
```xml
<configuration>
    <!-- HDFS namenode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <!-- Where Hadoop stores the buffer files it creates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmp</value>
    </property>
</configuration>
```
4. hdfs-site.xml
```xml
<configuration>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Where the namenode stores the HDFS namespace metadata -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/hdfs/name</value>
    </property>
    <!-- Physical storage location for data blocks on each datanode -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/hdfs/data</value>
    </property>
</configuration>
```
5. mapred-site.xml
```xml
<configuration>
    <!-- Runtime framework for MapReduce; the default is local mode -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- MapReduce web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
```
6. yarn-site.xml
```xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- Hostname of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <!-- YARN web UI address -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <!-- How reducers fetch map output (the shuffle service) -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```
Tar up `/opt/hadoop-3.1.2` and `hadoop.sh`, scp them to every machine, and repeat the Hadoop installation steps there.
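A sketch of that packaging step, run from master; the loop and file layout are illustrative:

```bash
# Bundle the configured Hadoop tree and push it to each solver
tar -czf hadoop-3.1.2.tar.gz -C /opt hadoop-3.1.2
for host in solver1 solver2 solver3; do
    scp hadoop-3.1.2.tar.gz /etc/profile.d/hadoop.sh "${host}:~/"
done
```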
Verifying Hadoop
- First, format HDFS (do this only once, on master):

```bash
hdfs namenode -format
```
- Start and stop the JobHistory server (Hadoop 3 marks this script as deprecated in favor of `mapred --daemon start historyserver`):

```bash
mr-jobhistory-daemon.sh start historyserver
mr-jobhistory-daemon.sh stop historyserver
```
- Start and stop YARN

```bash
start-yarn.sh
stop-yarn.sh
```
- Start and stop HDFS

```bash
start-dfs.sh
stop-dfs.sh
```
- Start and stop everything at once

```bash
start-all.sh
stop-all.sh
```
- Verify with jps (this is the view on master; the solver nodes should show DataNode and NodeManager instead):

```
$ jps
13074 SecondaryNameNode
14485 Jps
10441 JobHistoryServer
12876 NameNode
13341 ResourceManager
```
Accessing the Web UIs
| Daemon | Web Interface | Notes |
| --- | --- | --- |
| NameNode | http://192.168.20.101:9870 | Default HTTP port is 9870. |
| ResourceManager | http://192.168.20.101:8088 | Default HTTP port is 8088. |
| MapReduce JobHistory Server | http://192.168.20.101:19888 | Default HTTP port is 19888. |
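A quick reachability check from the command line (assumes curl is installed):

```bash
# Each UI should answer with an HTTP status line
curl -sI http://192.168.20.101:9870 | head -n 1
curl -sI http://192.168.20.101:8088 | head -n 1
curl -sI http://192.168.20.101:19888 | head -n 1
```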