Tags: Big Data
Environment:
CentOS 6.4
Hadoop 2.6.0-cdh5.7.0
Prerequisites
First, open the official archive at http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0/
Since we are starting with a pseudo-distributed setup, choose Single Node Setup under General
in the left-hand menu.
The page tells us that we need to install a JDK and SSH:
The recommended JDK is JDK 7: http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html
Download jdk-7u79-linux-x64.tar.gz (pick the package matching your architecture; the 64-bit build is used throughout this guide).
On Linux, Firefox saves the tarball to the Downloads directory; move it into the software directory with the mv command.
Extract it into the app directory with:
tar -zxvf jdk-7u79-linux-x64.tar.gz -C ~/app
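The tar flags used here are -z (filter through gzip), -x (extract), -v (verbose), -f (archive file), and -C (change into the target directory first). A self-contained demo of the same pattern with a throwaway archive (the /tmp paths are purely illustrative):

```shell
# Build a tiny gzip'd tarball, then extract it into a separate
# directory with -C, exactly as we do with the JDK above.
mkdir -p /tmp/demo-src /tmp/demo-app
echo "hello" > /tmp/demo-src/file.txt
tar -zcf /tmp/demo.tar.gz -C /tmp/demo-src file.txt   # create archive
tar -zxvf /tmp/demo.tar.gz -C /tmp/demo-app           # extract to target dir
cat /tmp/demo-app/file.txt                            # -> hello
```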
Then use pwd to get its full path, so we can add the JDK to the system environment variables.
Open the .bash_profile file in the home directory:
vim ~/.bash_profile
Then edit the file and append the environment variables:
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

export JAVA_HOME=/home/japson/app/jdk1.7.0_79
export PATH=$JAVA_HOME/bin:$PATH
Then reload the file so the new variables take effect:
source ~/.bash_profile
The environment variables are now in place; let's verify:
[japson@localhost jdk1.7.0_79]$ echo $JAVA_HOME
/home/japson/app/jdk1.7.0_79
[japson@localhost jdk1.7.0_79]$ java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
On CentOS, we use yum to install software:
sudo yum install ssh
But this produced an error:
japson is not in the sudoers file. This incident will be reported.
The cause is that the user is not listed in the sudoers configuration file. Switch to the root user with su root
and run the visudo command:
[root@localhost japson]# visudo
In the opened configuration file, find the line root ALL=(ALL) ALL and add a line below it:
japson ALL=(ALL) ALL
Type :wq to save and exit, then switch back to the regular user with su japson.
Running sudo again no longer produces the warning above.
Next, we configure SSH for passwordless login.
First, generate a key pair:
[japson@localhost ~]$ ssh-keygen -t rsa
Press Enter through all the prompts; at the end it reports:
Your public key has been saved in /home/japson/.ssh/id_rsa.pub.
Go to the corresponding directory and check:
[japson@localhost ~]$ ls .ssh
id_rsa  id_rsa.pub
Then copy id_rsa.pub to authorized_keys:
[japson@localhost ~]$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
SSH is now configured. Next we verify it by connecting to localhost with the ssh command and exiting:
[japson@localhost .ssh]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 63:3f:25:ca:15:35:17:97:cc:ea:eb:08:c5:15:1c:f1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Fri May 18 02:54:24 2018 from 192.168.1.108
[japson@localhost ~]$ exit
logout
Connection to localhost closed.
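If ssh localhost still prompts for a password after this, a likely culprit (an assumption worth checking, not a step from the official guide) is overly permissive file modes: sshd silently ignores an authorized_keys file it considers unsafe. A minimal fix:

```shell
# sshd refuses keys when ~/.ssh or authorized_keys is group/world
# accessible, so tighten the permissions to the expected modes.
mkdir -p ~/.ssh                      # no-op if it already exists
touch ~/.ssh/authorized_keys         # no-op if it already exists
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```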
Next, download Hadoop from http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
Move the tarball into the software directory, then extract it into the app directory:
tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app/
Once extraction finishes, look at the layout of the hadoop directory:
[japson@localhost hadoop-2.6.0-cdh5.7.0]$ ll
total 76
drwxr-xr-x.  2 japson japson  4096 Mar 23  2016 bin
drwxr-xr-x.  2 japson japson  4096 Mar 23  2016 bin-mapreduce1
drwxr-xr-x.  3 japson japson  4096 Mar 23  2016 cloudera
drwxr-xr-x.  6 japson japson  4096 Mar 23  2016 etc
drwxr-xr-x.  5 japson japson  4096 Mar 23  2016 examples
drwxr-xr-x.  3 japson japson  4096 Mar 23  2016 examples-mapreduce1
drwxr-xr-x.  2 japson japson  4096 Mar 23  2016 include
drwxr-xr-x.  3 japson japson  4096 Mar 23  2016 lib
drwxr-xr-x.  2 japson japson  4096 Mar 23  2016 libexec
-rw-r--r--.  1 japson japson 17087 Mar 23  2016 LICENSE.txt
-rw-r--r--.  1 japson japson   101 Mar 23  2016 NOTICE.txt
-rw-r--r--.  1 japson japson  1366 Mar 23  2016 README.txt
drwxr-xr-x.  3 japson japson  4096 Mar 23  2016 sbin
drwxr-xr-x.  4 japson japson  4096 Mar 23  2016 share
drwxr-xr-x. 17 japson japson  4096 Mar 23  2016 src
The bin directory holds the executable scripts; etc is a very important directory containing the main configuration files; sbin holds the start and stop scripts; and under share/hadoop/mapreduce there is an examples jar with sample jobs we can run directly.
As the official guide says:
Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

  # set to the root of your Java installation
  export JAVA_HOME=/usr/java/latest

  # Assuming your installation directory is /usr/local/hadoop
  export HADOOP_PREFIX=/usr/local/hadoop

Try the following command:

  $ bin/hadoop

This will display the usage documentation for the hadoop script.

Now you are ready to start your Hadoop cluster in one of the three supported modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
In the hadoop directory, open etc/hadoop/hadoop-env.sh,
find the JAVA_HOME line, and point it at our JDK path:
# The java implementation to use.
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/japson/app/jdk1.7.0_79
With that done, we can move on.
We are going to run in Pseudo-Distributed Operation mode.
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configure the address of the default HDFS file system and the storage location for temporary files in
core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/japson/app/tmp</value>
    </property>
</configuration>
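One small precaution (my own habit, not a step from the official guide): create the hadoop.tmp.dir directory up front, so a missing or unwritable path cannot trip up the NameNode format later. With the value configured above, where ~ is /home/japson in this walkthrough:

```shell
# Pre-create the directory configured as hadoop.tmp.dir and confirm
# that it exists and is owned by the current user.
mkdir -p ~/app/tmp
ls -ld ~/app/tmp
```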
Set the HDFS replication factor to 1, since we only have a single node and cannot hold three copies, in
hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
There is also a slaves file, which lists the hostnames of the DataNodes.
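For a pseudo-distributed setup, etc/hadoop/slaves typically contains a single line naming the local machine, which is also its content by default after unpacking:

```
localhost
```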
At this point it is best to add Hadoop's bin directory to the system environment variables as well:
get its path with pwd, then add it to ~/.bash_profile:
export HADOOP_HOME=/home/japson/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH
Then apply it with source and verify:
[japson@localhost bin]$ source ~/.bash_profile
[japson@localhost bin]$ echo $HADOOP_HOME
/home/japson/app/hadoop-2.6.0-cdh5.7.0
Format the file system (run this only once, before the first start; do not repeat it, since each format re-initializes the NameNode metadata): hdfs namenode -format
Start HDFS: sbin/start-dfs.sh
To verify it started successfully, open
localhost:50070
(the NameNode web UI) in a browser, or check the running processes:
[japson@localhost sbin]$ jps
4450 NameNode
4834 Jps
4565 DataNode
4719 SecondaryNameNode
Stop HDFS:
sbin/stop-dfs.sh