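MapReduce basic programming model: split one large job into many small tasks, then combine their results.
MapReduce runs in two phases: the map phase splits the work; the reduce phase aggregates the results.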
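To make the two phases concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the grouping step only imitates what the framework does between map and reduce.

```python
from collections import defaultdict


def map_phase(line):
    """Map: split one input line into (word, 1) pairs."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, imitating the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    # Keys are reduced in sorted order, so the output comes out sorted by key.
    return [reduce_phase(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    for word, count in word_count(["hello hadoop", "hello mapreduce"]):
        print(f"{word}\t{count}")
```

Each map call sees only one line and each reduce call sees only one key, which is what lets a real cluster run them in parallel on different machines.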
Hadoop environment installation
Installation:
1. Unpack:
    tar -zxvf hadoop-2.7.3.tar.gz -C /root/training/
2. Set the environment variables:
    vi ~/.bash_profile

    HADOOP_HOME=/root/training/hadoop-2.7.3
    export HADOOP_HOME
    PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    export PATH
   Apply the environment variables:
    source ~/.bash_profile
Section 1: Hadoop directory structure
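Section 2: Hadoop local mode
1. Characteristics: there is no HDFS; this mode can only be used to test MapReduce programs.
2. Edit hadoop-env.sh: run echo $JAVA_HOME to find the JDK install path, then replace export JAVA_HOME=${JAVA_HOME} with that path. The line to change is line 25 (in vi, press Esc and type :set number to show line numbers):
    export JAVA_HOME=/usr/java/jdk8u202-b08
3. Demo:
   Examples jar: $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
   Command:
    hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount ~/data/hadoop/input/test.txt ~/data/hadoop/output/wc
   Log:
    19/09/16 10:45:00 INFO mapreduce.Job: map 100% reduce 100%
   View the result:
    cd ~/data/hadoop/output/wc
    ls
   (ls lists part-r-00000 and _SUCCESS: the former holds the result set, the latter records the status of the run)
    more part-r-00000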
Note: MapReduce applies a default sort order (the output is sorted by key).
Section 3: Hadoop pseudo-distributed mode
1. Characteristics: provides all Hadoop functionality by simulating a distributed environment on a single machine.
   (1) HDFS: master node: NameNode; data node: DataNode
   (2) YARN: the container that runs MapReduce programs
       master node: ResourceManager
       worker node: NodeManager
2. Steps:
   (1) hdfs-site.xml
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Whether to check permissions -->
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
   (2) core-site.xml
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.88.11:9000</value>
    </property>
    <!-- Base directory where HDFS keeps its data (NameNode metadata and DataNode blocks are stored under it by default) -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/training/hadoop-2.7.3/tmp</value>
    </property>
   (3) mapred-site.xml
    <!-- Framework that MapReduce runs on -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
   (4) yarn-site.xml
    <!-- Address of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.88.11</value>
    </property>
    <!-- How the NodeManager runs tasks (the shuffle auxiliary service) -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
   (5) Format the NameNode:
    hdfs namenode -format
       Log:
    Storage directory /root/training/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
   (6) Start:
    start-all.sh
       (*) HDFS: stores the data
       (*) YARN: runs the computation
   (7) Access:
       (*) Command line
       (*) Java API
       (*) Web Console:
           HDFS: http://192.168.88.11:50070
           YARN: http://192.168.88.11:8088
At this point the services can be reached from outside the machine.
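One way to confirm external access without a browser is to try a plain TCP connection from another machine. The sketch below is an assumption-laden helper, not part of Hadoop: is_port_open is a hypothetical name, and the address and ports are simply the ones used in this walkthrough.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address and ports taken from this walkthrough; adjust for your own host.
    for name, port in (("HDFS", 50070), ("YARN", 8088)):
        reachable = is_port_open("192.168.88.11", port, timeout=1.0)
        state = "reachable" if reachable else "unreachable"
        print(f"{name} web console on port {port}: {state}")
```

A successful connection only shows that something is listening and that no firewall dropped the packet; it does not prove the web page itself renders.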
Troubleshooting: the web console cannot be reached at http://ip:port
Original source: https://blog.csdn.net/hanwenshan123/article/details/78717782
Problem 1: hdfs-site.xml settings
Check the Java processes with jps: the Hadoop processes are all running normally. (jps is a small tool shipped with the JDK for listing the current Java processes; the name can be read as short for Java Virtual Machine Process Status Tool.)
    [root@node4 ~]# jps
    25059 SecondaryNameNode
    25347 ResourceManager
    25556 NodeManager
    24805 DataNode
    29269 Jps
    24633 NameNode
Check the listening ports with netstat. Every entry in the Local Address column is either 127.0.0.1 or a wildcard address (0.0.0.0 or ::). A port bound to 127.0.0.1 accepts connections only from the machine itself and cannot serve other hosts; a wildcard address listens on every interface. In this listing 9000 and 8088 are bound to 127.0.0.1, while 50070 is on 0.0.0.0, so its binding alone does not block outside access.
    [root@node4 ~]# netstat -ntlp
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 127.0.0.1:43759         0.0.0.0:*               LISTEN      24805/java
    tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      24633/java
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      12782/sshd
    tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      2325/master
    tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      24805/java
    tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      24805/java
    tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      24805/java
    tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      24633/java
    tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      25059/java
    tcp6       0      0 :::22                   :::*                    LISTEN      12782/sshd
    tcp6       0      0 127.0.0.1:8088          :::*                    LISTEN      25347/java
    tcp6       0      0 ::1:25                  :::*                    LISTEN      2325/master
    tcp6       0      0 :::13562                :::*                    LISTEN      25556/java
    tcp6       0      0 :::43451                :::*                    LISTEN      25556/java
    tcp6       0      0 127.0.0.1:8030          :::*                    LISTEN      25347/java
    tcp6       0      0 127.0.0.1:8031          :::*                    LISTEN      25347/java
    tcp6       0      0 127.0.0.1:8032          :::*                    LISTEN      25347/java
    tcp6       0      0 127.0.0.1:8033          :::*                    LISTEN      25347/java
    tcp6       0      0 :::8040                 :::*                    LISTEN      25556/java
    tcp6       0      0 :::8042                 :::*                    LISTEN      25556/java
Set the NameNode web address explicitly by adding the following to $HADOOP_HOME/etc/hadoop/hdfs-site.xml:
    <property>
        <name>dfs.namenode.http-address</name>
        <value>node4:50070</value>
    </property>
or, using the IP address (the value is host:port, without an hdfs:// scheme):
    <property>
        <name>dfs.namenode.http-address</name>
        <value>192.168.88.11:50070</value>
    </property>
Check again with netstat -ntlp; port 50070 is now bound to the host's own address:
    [root@node4 ~]# netstat -ntlp
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 127.0.0.1:43759         0.0.0.0:*               LISTEN      24805/java
    tcp        0      0 10.60.8.28:50070        0.0.0.0:*               LISTEN      24633/java
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      12782/sshd
    tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      2325/master
    tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      24805/java
    tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      24805/java
    tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      24805/java
    tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      24633/java
    tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      25059/java
    tcp6       0      0 :::22                   :::*                    LISTEN      12782/sshd
    tcp6       0      0 127.0.0.1:8088          :::*                    LISTEN      25347/java
    tcp6       0      0 ::1:25                  :::*                    LISTEN      2325/master
    tcp6       0      0 :::13562                :::*                    LISTEN      25556/java
    tcp6       0      0 :::43451                :::*                    LISTEN      25556/java
    tcp6       0      0 127.0.0.1:8030          :::*                    LISTEN      25347/java
    tcp6       0      0 127.0.0.1:8031          :::*                    LISTEN      25347/java
    tcp6       0      0 127.0.0.1:8032          :::*                    LISTEN      25347/java
    tcp6       0      0 127.0.0.1:8033          :::*                    LISTEN      25347/java
    tcp6       0      0 :::8040                 :::*                    LISTEN      25556/java
    tcp6       0      0 :::8042                 :::*                    LISTEN      25556/java
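Reading the Local Address column by eye is error-prone, so here is a small sketch that classifies each listening socket. It is an illustration only: classify_listener and listening_ports are hypothetical names, and the parser assumes the column layout of netstat -ntlp shown here. Wildcard addresses (0.0.0.0 and ::) listen on every interface, while loopback addresses (127.0.0.1 and ::1) accept local connections only.

```python
import ipaddress


def classify_listener(local_address):
    """Classify a 'Local Address' value such as '127.0.0.1:9000' or ':::22'."""
    # The port follows the last colon; everything before it is the address.
    host, _, port = local_address.rpartition(":")
    ip = ipaddress.ip_address(host)
    if ip.is_unspecified:
        scope = "all interfaces"
    elif ip.is_loopback:
        scope = "loopback only"
    else:
        scope = "specific interface"
    return int(port), scope


def listening_ports(netstat_output):
    """Yield (port, scope) for each LISTEN row of `netstat -ntlp` output."""
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            yield classify_listener(fields[3])


if __name__ == "__main__":
    sample = """\
tcp        0      0 0.0.0.0:50070     0.0.0.0:*    LISTEN      24633/java
tcp        0      0 127.0.0.1:9000    0.0.0.0:*    LISTEN      24633/java
tcp6       0      0 :::8042           :::*         LISTEN      25556/java
"""
    for port, scope in listening_ports(sample):
        print(port, scope)
```

Ports reported as "loopback only" are the ones that cannot be reached from another host no matter how the firewall is configured.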
Problem 2: SELinux
In principle port 50070 should now be reachable, but it still is not. Checking SELinux shows that it is enabled.
- Check the SELinux status:
    [root@node4 ~]# /usr/sbin/sestatus -v
    SELinux status:                 enabled
    SELinuxfs mount:                /sys/fs/selinux
    SELinux root directory:         /etc/selinux
    Loaded policy name:             targeted
    Current mode:                   enforcing
    Mode from config file:          enforcing
    Policy MLS status:              enabled
    Policy deny_unknown status:     allowed
    Max kernel policy version:      28

    Process contexts:
    Current context:                unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
    Init context:                   system_u:system_r:init_t:s0
    /usr/sbin/sshd                  system_u:system_r:sshd_t:s0-s0:c0.c1023

    File contexts:
    Controlling terminal:           unconfined_u:object_r:user_devpts_t:s0
    /etc/passwd                     system_u:object_r:passwd_file_t:s0
    /etc/shadow                     system_u:object_r:shadow_t:s0
    /bin/bash                       system_u:object_r:shell_exec_t:s0
    /bin/login                      system_u:object_r:login_exec_t:s0
    /bin/sh                         system_u:object_r:bin_t:s0 -> system_u:object_r:shell_exec_t:s0
    /sbin/agetty                    system_u:object_r:getty_exec_t:s0
    /sbin/init                      system_u:object_r:bin_t:s0 -> system_u:object_r:init_exec_t:s0
    /usr/sbin/sshd                  system_u:object_r:sshd_exec_t:s0
Edit /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, reboot the server, and try again. SELinux status after the change:
    [root@node4 ~]# /usr/sbin/sestatus -v
    SELinux status:                 disabled
Problem 3: firewall (opening the port in iptables)
After disabling SELinux the page is still unreachable, so check the firewall settings next:
    [root@node4 sbin]# firewall-cmd --state
    running
    [root@node4 sbin]# firewall-cmd --get-services
    RH-Satellite-6 amanda-client amanda-k5-client bacula bacula-client bitcoin bitcoin-rpc bitcoin-testnet bitcoin-testnet-rpc ceph ceph-mon cfengine condor-collector ctdb dhcp dhcpv6 dhcpv6-client dns docker-registry dropbox-lansync elasticsearch freeipa-ldap freeipa-ldaps freeipa-replication freeipa-trust ftp ganglia-client ganglia-master high-availability http https imap imaps ipp ipp-client ipsec iscsi-target kadmin kerberos kibana klogin kpasswd kshell ldap ldaps libvirt libvirt-tls managesieve mdns mosh mountd ms-wbt mssql mysql nfs nrpe ntp openvpn ovirt-imageio ovirt-storageconsole ovirt-vmconsole pmcd pmproxy pmwebapi pmwebapis pop3 pop3s postgresql privoxy proxy-dhcp ptp pulseaudio puppetmaster quassel radius rpc-bind rsh rsyncd samba samba-client sane sip sips smtp smtp-submission smtps snmp snmptrap spideroak-lansync squid ssh synergy syslog syslog-tls telnet tftp tftp-client tinc tor-socks transmission-client vdsm vnc-server wbem-https xmpp-bosh xmpp-client xmpp-local xmpp-server
Add port 50070 to the allowed ports, then reload the firewall:
    [root@node4 sbin]# firewall-cmd --zone=public --add-port=50070/tcp --permanent
    success
    [root@node4 sbin]# firewall-cmd --reload
    success
Result
Problem 4: the YARN web console on port 8088 cannot be reached
Edit yarn-site.xml and add the following inside <configuration></configuration>:
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.88.11:8088</value>
    </property>
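After editing several *-site.xml files by hand, it can help to read the properties back and sanity-check them. The following is a rough sketch using Python's standard xml.etree module; read_properties and is_host_port are hypothetical helper names, and the host:port check is a simple heuristic for web address values such as yarn.resourcemanager.webapp.address, not Hadoop's own validation.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def is_host_port(value):
    """Heuristic: True for 'host:port' with a numeric port and no URI scheme or path."""
    host, sep, port = value.rpartition(":")
    return bool(host) and sep == ":" and port.isdigit() and "/" not in host


if __name__ == "__main__":
    # Same property as in this walkthrough, wrapped in a <configuration> root.
    sample = """
    <configuration>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.88.11:8088</value>
      </property>
    </configuration>
    """
    for name, value in read_properties(sample).items():
        verdict = "ok" if is_host_port(value) else "not host:port"
        print(f"{name} = {value} ({verdict})")
```

The same two functions can be pointed at the contents of hdfs-site.xml to confirm that dfs.namenode.http-address was saved in the expected form.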