The Hadoop API

Submitted anonymously (unverified) on 2019-12-02 23:52:01

Big Data Hadoop

HDFS

1.1 Concept

HDFS, in full the Hadoop Distributed File System, is a file system for storing files that locates them through a directory tree. It is distributed: many servers cooperate to implement its functions, and the servers in the cluster each play their own role.

1.2 Composition

(1) An HDFS cluster consists of a NameNode, DataNodes, and a Secondary NameNode.

(2) The NameNode is the master: it manages the file system namespace and the block metadata, and serves client requests.

(3) The DataNodes store the actual file blocks; each datanode keeps block data on its local disk.

(4) The Secondary NameNode is not a hot standby for the NameNode. It periodically merges the HDFS fsimage and edits log and can assist in recovering HDFS metadata.

1.3 HDFS file block size

Files in HDFS are physically stored in blocks. The block size is set by the parameter dfs.blocksize: the default is 128M in hadoop2.x and 64M in older versions.

The block size trades seek time against transfer time: if seeking to the start of a block takes about 10ms and the transfer rate is about 100MB/s, then for the seek time to be only 1% of the transfer time, the transfer should take about 1s per block.

That gives a block size of roughly 10ms × 100 × 100MB/s = 100MB, hence the default of 128MB.

2 HDFS command-line operations

(1) Basic syntax

bin/hadoop fs <specific command>

(2) Parameter list (running bin/hadoop fs with no arguments prints the full option list; only the first entry is shown here)

bin/hadoop fs

[-appendToFile <localsrc> ... <dst>]

(3) Hands-on with common commands

(1) -help: print help for a command

bin/hdfs dfs -help rm

(2) -ls: list directory contents

hadoop fs -ls /

(3) -mkdir: create a directory on hdfs

(4) -moveFromLocal: move a file from local to hdfs (local path → hdfs path)

(5) -appendToFile: append a file to the end of a file that already exists (local path → hdfs path)

(6) -cat: display a file's contents

hadoop fs -cat /hdfs

(7) -tail: display the end of a file

(8) -chmod / -chown: same usage as in linux; modify a file's permissions or owner

(9) -cp: copy from one hdfs path to another hdfs path (path 1 → path 2)

(10) -mv: move files between hdfs directories

(11) -get / copyToLocal: download a file from hdfs to local

(12) -getmerge: merge-download multiple files; for example, the hdfs directory /aaa/ holds several files (log.1, log.2, ...) that are merged into one file on Linux:

hadoop fs -getmerge /aaa/log.* ./log.sum

Merging from different directories: hadoop fs -getmerge /hdfs1 /hdfs2 /<local path>

(13) -put / copyFromLocal: upload a local file to hdfs

(14) -rm: delete a file or directory

hadoop fs -rm -r /hdfs

(15) -df: show free-space statistics of the file system

(16) -du: show the size of a directory or file

(17) -count: count the number of directories, files and bytes under a path

hadoop fs -count /aaa/

(18) -setrep: set the number of replicas of a file in hdfs, e.g. to 3:

hadoop fs -setrep 3 /hdfs

What is set here is only the replica count recorded in the namenode's metadata; whether that many replicas really exist depends on the number of datanodes. With only 3 datanodes there can be at most 3 replicas; a replica count of 10 is only reached once the cluster grows to 10 machines.

3 The HDFS client

3.1 Preparing the IDEA environment

Configure ${MAVEN_HOME}/conf/settings.xml:

<!-- Local repository location -->
<localRepository>F:\m2\repository</localRepository>

<!-- A mirror can also be configured here to speed up Jar downloads -->

<!-- Make JDK 8 the default compiler -->
<profiles>
  <profile>
    <id>jdk-1.8</id>
    <activation>
      <activeByDefault>true</activeByDefault>
      <jdk>1.8</jdk>
    </activation>
    <properties>
      <maven.compiler.source>1.8</maven.compiler.source>
      <maven.compiler.target>1.8</maven.compiler.target>
      <maven.compiler.compilerVersion>1.8</maven.compiler.compilerVersion>
    </properties>
  </profile>
</profiles>

3.1.0 Create a Maven project

3.1.1 Maven dependencies

Add the Hadoop client dependencies to the project's pom.xml; a sketch follows.
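A minimal dependency set for the examples in this article, assuming Hadoop 2.7.2 (the version used in paths throughout); adjust the versions to your own cluster:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>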

3.1.2 Preparing IDEA

(1) Configure the HADOOP_HOME environment variable.

(2) Use the compiled hadoop bin and lib folders for your platform (if the change does not take effect, restart IDEA).

(3) Create a java class:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientDemo {

    public static void main(String[] args) throws Exception {

        // 1 Get the file system
        Configuration configuration = new Configuration();

        // Point the client at the cluster
        configuration.set("fs.defaultFS", "hdfs://bigdata111:9000");

        FileSystem fileSystem = FileSystem.get(configuration);

        // Alternatively, pass the URI and the user name directly:
        // FileSystem fileSystem = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

        // 2 Upload a file (example paths)
        fileSystem.copyFromLocalFile(new Path("e:/hello.txt"), new Path("/hello.txt"));

        // 3 Close the resource
        fileSystem.close();

        System.out.println("over");
    }
}

(4) Run the program.

Note: in eclipse or IDEA you may run into a permission problem.

When the client operates on HDFS, it does so with some user identity. By default, the HDFS client API picks the user up from a JVM parameter: -DHADOOP_USER_NAME=itstar, where itstar is the user name.

3.2 Operating HDFS through the API

3.2.1 Getting a file system handle from HDFS

(1) Detailed code

@Test
public void initHDFS() throws Exception {

    // 1 Create the configuration object
    Configuration configuration = new Configuration();

    // 2 Get the file system
    FileSystem fs = FileSystem.get(configuration);

    // 3 Print the file system
    System.out.println(fs.toString());
}

3.2.2 Uploading files to HDFS

@Test
public void putFileToHDFS() throws Exception {

    // 1 Create the configuration object
    // new Configuration() loads the hdfs-default.xml packed in the client jar
    // and then any hdfs-site.xml found on the classpath
    Configuration configuration = new Configuration();

    // 2 Set the replication factor
    // Priority: (1) values set in code > (2) user-defined config files on the classpath > (3) server defaults
    configuration.set("dfs.replication", "2");

    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // 3 The local path of the file to upload (example path)
    Path src = new Path("e:/hello.txt");

    // 4 The target path on hdfs (example path)
    Path dst = new Path("/user/itstar/hello.txt");

    // 5 Upload the file
    fs.copyFromLocalFile(src, dst);

    fs.close();
}

(2) Testing parameter priority

Priority: (1) values set in code > (2) user-defined configuration files on the classpath > (3) the server's default configuration

3.2.3 Downloading files from HDFS

@Test
public void getFileFromHDFS() throws Exception {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // Simple form:
    // fs.copyToLocalFile(new Path("hdfs://bigdata111:9000/user/itstar/hello.txt"), new Path("d:/hello.txt"));

    // boolean delSrc: whether to delete the source file
    // Path src: the hdfs path to download from
    // Path dst: the local path to download to
    // boolean useRawLocalFileSystem: whether to use the raw local file system (skips the .crc checksum file)
    fs.copyToLocalFile(false, new Path("hdfs://bigdata111:9000/user/itstar/hello.txt"), new Path("d:/hellocopy.txt"), true);  // example paths

    fs.close();
}

3.2.4 Creating a directory on HDFS

@Test
public void mkdirAtHDFS() throws Exception {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // 2 Create the directory (example path)
    fs.mkdirs(new Path("hdfs://bigdata111:9000/user/itstar/output"));

    fs.close();
}

3.2.5 Deleting a folder on HDFS

@Test
public void deleteAtHDFS() throws Exception {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // 2 Delete the folder (example path); the second argument controls recursive deletion: true = recursive
    fs.delete(new Path("hdfs://bigdata111:9000/user/itstar/output"), true);

    fs.close();
}

3.2.6 Renaming a file on HDFS

@Test
public void renameAtHDFS() throws Exception {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // 2 Rename the file or folder (example paths)
    fs.rename(new Path("hdfs://bigdata111:9000/user/itstar/hello.txt"), new Path("hdfs://bigdata111:9000/user/itstar/hellorename.txt"));

    fs.close();
}

3.2.7 Viewing file details on HDFS

View the file name, permissions, length and block information.

@Test
public void readListFiles() throws Exception {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // List all files recursively
    RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);

    while (listFiles.hasNext()) {
        LocatedFileStatus fileStatus = listFiles.next();

        System.out.println(fileStatus.getPath().getName());
        System.out.println(fileStatus.getBlockSize());
        System.out.println(fileStatus.getPermission());
        System.out.println(fileStatus.getLen());

        BlockLocation[] blockLocations = fileStatus.getBlockLocations();

        for (BlockLocation bl : blockLocations) {
            System.out.println("block-offset:" + bl.getOffset());
            String[] hosts = bl.getHosts();
            for (String host : hosts) {
                System.out.println(host);
            }
        }

        System.out.println("--------------Andy--------------");
    }
}

3.2.8 Distinguishing files from folders on HDFS

@Test
public void findAtHDFS() throws Exception, IllegalArgumentException, IOException {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // 2 Get the file status listing
    FileStatus[] listStatus = fs.listStatus(new Path("/"));

    // 3 Iterate and check whether each entry is a file or a folder
    for (FileStatus status : listStatus) {
        if (status.isFile()) {
            System.out.println("f--" + status.getPath().getName());
        } else {
            System.out.println("d--" + status.getPath().getName());
        }
    }
}

3.3 Operating HDFS through IO streams

3.3.1 Uploading a file to HDFS

@Test
public void putFileToHDFS() throws Exception {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // 2 Create an input stream for the local file (example path)
    FileInputStream inStream = new FileInputStream(new File("e:/hello.txt"));

    // 3 The target path on hdfs
    String putFileName = "hdfs://bigdata111:9000/user/itstar/hello1.txt";
    Path writePath = new Path(putFileName);

    // 4 Create an output stream on hdfs
    FSDataOutputStream outStream = fs.create(writePath);

    // 5 Copy between the streams
    try {
        IOUtils.copyBytes(inStream, outStream, 4096, false);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeStream(inStream);
        IOUtils.closeStream(outStream);
    }
}

3.3.2 Downloading a file from HDFS

@Test
public void getFileFromHDFS() throws Exception {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // 2 The file to read on hdfs
    String filename = "hdfs://bigdata111:9000/user/itstar/hello1.txt";

    // 3 Create the read path
    Path readPath = new Path(filename);

    // 4 Open an input stream
    FSDataInputStream inStream = fs.open(readPath);

    // 5 Copy the stream to the console
    try {
        IOUtils.copyBytes(inStream, System.out, 4096, false);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeStream(inStream);
    }
}

3.3.3 Seek-based reads of a large file

(1) Download the first block

@Test
// Read the first 128 MB block of a large file on hdfs, e.g. an uploaded hadoop-2.7.2.tar.gz (example path)
public void readFileSeek1() throws Exception {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // 2 The path of the file on hdfs (example path)
    Path path = new Path("hdfs://bigdata111:9000/user/itstar/hadoop-2.7.2.tar.gz");

    // 3 Open the input stream
    FSDataInputStream fis = fs.open(path);

    // 4 Create the local output stream (example path)
    FileOutputStream fos = new FileOutputStream("e:/hadoop-2.7.2.tar.gz.part1");

    // 5 Copy exactly 128 MB: 1024 bytes per read, 1024 * 128 reads
    byte[] buf = new byte[1024];
    for (int i = 0; i < 1024 * 128; i++) {
        fis.read(buf);
        fos.write(buf);
    }

    // 6 Close the streams
    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
}

(2) Download the second block

@Test
// Continue reading from the start of the second block
public void readFileSeek2() throws Exception {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://bigdata111:9000"), configuration, "itstar");

    // 2 The path of the file on hdfs (example path)
    Path path = new Path("hdfs://bigdata111:9000/user/itstar/hadoop-2.7.2.tar.gz");

    // 3 Open the input stream
    FSDataInputStream fis = fs.open(path);

    // 4 Create the local output stream (example path)
    FileOutputStream fos = new FileOutputStream("e:/hadoop-2.7.2.tar.gz.part2");

    // 5 Seek past the first 128 MB
    fis.seek(1024 * 1024 * 128);

    // 6 Copy the remainder of the stream
    IOUtils.copyBytes(fis, fos, 1024);

    // 7 Close the streams
    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
}

(3) Merge the two parts

In a Windows command window, append part2 to the end of part1, e.g.: type hadoop-2.7.2.tar.gz.part2 >> hadoop-2.7.2.tar.gz.part1. The merged part1 is then the complete file.

4 The HDFS data flow

4.1 The HDFS write flow

4.1.1 Anatomy of a file write

(1) The client requests to upload a file to the namenode; the namenode checks whether the target file and its parent directory already exist.

(2) The namenode responds whether the upload can proceed.

(3) The client asks which datanodes the first block should be uploaded to.

(4) The namenode returns 3 datanodes, say dn1, dn2 and dn3.

(5) The client requests dn1 to upload data; on receiving the request, dn1 calls dn2, and dn2 then calls dn3, so the transmission pipeline is established.

(6) dn1, dn2 and dn3 acknowledge the client level by level.

(7) The client starts uploading the first block to dn1 in packets; dn1 passes each packet it receives on to dn2, and dn2 passes it to dn3; dn1 queues each sent packet while awaiting acknowledgement.

(8) When the first block finishes transmitting, the client again asks the namenode where to upload the next block (repeating steps 3-7).

4.1.2 Network topology and node distance

During a write, how is "closeness" between two nodes measured? In a local network the idea is to use the bandwidth between two nodes as the measure, computed on the network topology tree:

Node distance: the sum of the two nodes' distances to their closest common ancestor.

For example, write node n1 in rack r1 of data center d1 as /d1/r1/n1. Then:

Distance(/d1/r1/n1, /d1/r1/n1)=0 (the same node)

Distance(/d1/r1/n1, /d1/r1/n2)=2 (different nodes on the same rack)

Distance(/d1/r1/n1, /d1/r3/n2)=4 (different racks in the same data center)

Distance(/d1/r1/n1, /d2/r4/n2)=6 (different data centers)

Try computing the distance between each pair of nodes yourself; a small sketch follows.
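The following toy Java sketch (not Hadoop's real implementation; the actual logic lives in the NetworkTopology class) applies the distance rule above to nodes written as /datacenter/rack/node paths:

public class TopologyDistance {

    // distance = hops from each node up to their closest common ancestor
    static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // 0, same node
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // 2, same rack
        System.out.println(distance("/d1/r1/n1", "/d1/r3/n2")); // 4, same data center
        System.out.println(distance("/d1/r1/n1", "/d2/r4/n2")); // 6, different data centers
    }
}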

4.1.3 Rack awareness (replica node placement)

(1) Official documentation:

http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/RackAwareness.html

http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Data_Replication

(2) Replica node selection in low-version Hadoop:

The first replica is placed on the node the client is on; if the client is outside the cluster, a node is chosen at random.

The second replica is placed on a random node of a rack different from the first replica's.

The third replica is placed on the same rack as the second replica, on a different random node.

(3) Replica node selection in Hadoop 2.7.2:

The first replica is placed on the node the client is on; if the client is outside the cluster, a node is chosen at random.

The second replica is placed on the same rack as the first, on a different random node.

The third replica is placed on a different rack, on a random node.

4.2 The HDFS read flow

(1) The client requests to download a file from the namenode; the namenode looks up the metadata and returns the datanode addresses holding the file's blocks.

(2) The client picks a datanode (nearest first, otherwise at random) and requests the data.

(3) The datanode starts transmitting data to the client (reading from disk and sending the stream in packet units).

(4) The client receives the packets, first caching them locally, then writing them into the target file.

4.3 The consistency model

(1) Debug the following code:

@Test

public void writeFile() throws Exception {

    // 1 Create the configuration object
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(configuration);

    // 2 Create the file output stream
    Path path = new Path("hdfs://bigdata111:9000/user/itstar/hello.txt");
    FSDataOutputStream fos = fs.create(path);

    // 3 Write data
    fos.write("hello".getBytes());

    // 4 Flush to the cluster
    fos.hflush();

    fos.close();
}

(2) Summary

When writing data, if you want what has been written so far to be visible to other clients immediately, call FSDataOutputStream.hflush(): it flushes the client-side buffer so the data becomes visible to other clients at once.

5 The NameNode working mechanism

5.1 NameNode & Secondary NameNode working mechanism

(1) How the namenode works

(1) The first time the namenode starts after formatting, it creates the fsimage and edits files. If it is not the first start, it loads the edits log and the image file directly into memory.

(2) A client issues a request to add, modify or delete metadata.

(3) The namenode records the operation in the edits log and rolls the log.

(4) The namenode applies the add/modify/delete to the data in memory.

(2) How the Secondary NameNode works

(1) The Secondary NameNode asks the namenode whether a checkpoint is needed; the namenode replies directly whether to check.

(2) The Secondary NameNode requests execution of a checkpoint.

(3) The namenode rolls the edits log it is currently writing.

(4) The pre-roll edits log and the image file are copied to the Secondary NameNode.

(5) The Secondary NameNode loads the edits log and the image file into memory and merges them.

(6) It generates a new image file, fsimage.chkpoint.

(7) It copies fsimage.chkpoint back to the namenode.

(8) The namenode renames fsimage.chkpoint back to fsimage.

(3) Viewing the SecondaryNameNode through the web

(1) Start the cluster.

(2) Browse to: http://bigdata111:50090/status.html

(3) View the SecondaryNameNode information.

(4) Settings for the chkpoint check interval

(1) Normally, the SecondaryNameNode executes once every hour.

[hdfs-default.xml]

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>

(2) It also checks the number of operations once a minute; when the operation count reaches 1 million, the SecondaryNameNode executes.

<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
  <description>number of operation actions</description>
</property>

<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60</value>
  <description>check the operation count once a minute</description>
</property>

5.2 The image file and the edits log

(1) Concept

After the namenode is formatted, the following files are created in the /opt/module/hadoop-2.7.2/data/tmp/dfs/name/current directory:

edits_0000000000000000000

fsimage_0000000000000000000.md5

seen_txid

VERSION

(1) Fsimage: a permanent checkpoint of the HDFS metadata, containing the serialized information of all the directories and files (inodes) of HDFS.

(2) Edits: the log recording every write operation against HDFS; every write issued by a client is first recorded in the edits file.

(3) seen_txid: holds a single number, the number of the latest edits_ file.

(4) Each time the Namenode starts, it loads the fsimage into memory and replays the edits from 00001 up to the number recorded in seen_txid, so that the in-memory metadata is the merge of the fsimage and the edits.
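seen_txid can be inspected directly (the value shown here is illustrative; yours will be your cluster's latest transaction id):

[itstar@bigdata111 current]$ cat seen_txid

25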

(2) Using oiv to view the fsimage file

(1) The oiv and oev commands: running hdfs with no arguments lists them both (oiv applies the offline fsimage viewer to an fsimage; oev applies the offline edits viewer to an edits file).

[itstar@bigdata111 current]$ hdfs

(2) Basic syntax

hdfs oiv -p <file type> -i <fsimage file> -o <output path for the converted file>

(3) Hands-on

[itstar@bigdata111 current]$ pwd

/opt/module/hadoop-2.7.2/data/tmp/dfs/name/current

[itstar@bigdata111 current]$ hdfs oiv -p XML -i fsimage_0000000000000000025 -o /opt/module/hadoop-2.7.2/fsimage.xml

[itstar@bigdata111 current]$ cat /opt/module/hadoop-2.7.2/fsimage.xml

Copy the displayed xml into an xml file in IDEA and format it for easier reading.

(3) Using oev to view the edits file

(1) Basic syntax

hdfs oev -p <file type> -i <edits file> -o <output path for the converted file>

(2)案例实操

[itstar@bigdata111 current]$ hdfs oev -p XML -i edits_0000000000000000012-0000000000000000013 -o /opt/module/hadoop-2.7.2/edits.xml

[itstar@bigdata111 current]$ cat /opt/module/hadoop-2.7.2/edits.xml

Copy the displayed xml into an xml file in IDEA and format it for easier reading.

5.3 Rolling the edits log

Normally the edits log rolls whenever the HDFS file system performs update operations; it can also be rolled manually.

(1) Roll the edits log:

[itstar@bigdata111 current]$ hdfs dfsadmin -rollEdits

(2) When is the image file generated?

When the Namenode starts, it loads the image file and the edits log and merges them (new image files are produced by checkpoints).

5.4 The namenode version file

(1) After the namenode is formatted, a file named VERSION appears in the /opt/module/hadoop-2.7.2/data/tmp/dfs/name/current directory:

namespaceID=1933630176

clusterID=CID-1f2bf8d1-5ad2-4202-af1c-6713ab381175

cTime=0

storageType=NAME_NODE

blockpoolID=BP-97847618-192.168.10.102-1493726072779

layoutVersion=-63

(2) Interpretation of the namenode version file

(1) namespaceID: there can be multiple Namenodes on HDFS, and each Namenode has a different namespaceID, identifying the block pool (see blockpoolID) it manages.

(2) clusterID: the cluster id, globally unique.

(3) cTime: the creation time of the namenode storage system; for freshly formatted storage this is always 0, and after an upgrade it is updated to the new timestamp.

(4) storageType: says that this directory stores the namenode's data structures.

(5) blockpoolID: a block pool id identifies one block pool and is globally unique across clusters. When a new Namespace is formatted (format), a unique ID is generated as its BlockPoolID. The NN persists this BlockPoolID during formatting and checks it on every subsequent load.

(6) layoutVersion: a negative integer describing the version of the HDFS persistent data structures' layout.

5.5 The SecondaryNameNode directory structure

The Secondary NameNode is an auxiliary daemon that monitors the state of HDFS and takes a snapshot of the HDFS metadata at regular intervals.

In the /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/current directory you can see the SecondaryNameNode's directory structure:

edits_0000000000000000001-0000000000000000002

fsimage_0000000000000000002

fsimage_0000000000000000002.md5

VERSION

The layout of the SecondaryNameNode's namesecondary/current directory is the same as the namenode's current directory.

The benefit: should the namenode fail (assuming its data was not backed up in time), the metadata can be recovered from the SecondaryNameNode.

Method one: copy the data in the SecondaryNameNode into the namenode's storage directory.

Method two: start the namenode daemon with the -importCheckpoint option, which copies the SecondaryNameNode's data into the namenode directory.

(1) Case (one)

Simulate a namenode crash and recover with method one.

(1) Kill the namenode process.

(2) Delete the data stored by the namenode (/opt/module/hadoop-2.7.2/data/tmp/dfs/name):

rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/name/*

(3) Copy the data in the SecondaryNameNode into the namenode's storage directory:

cp -R /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/* /opt/module/hadoop-2.7.2/data/tmp/dfs/name/

(4) Restart the namenode:

sbin/hadoop-daemon.sh start namenode

(2) Case (two)

Simulate a namenode crash and recover with method two (-importCheckpoint).

(0) First modify hdfs-site.xml: shorten the checkpoint period for the test (e.g. to 120 seconds) and set the namenode storage directory explicitly:

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>120</value>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/module/hadoop-2.7.2/data/tmp/dfs/name</value>
</property>

(1) Kill the namenode process.

(2) Delete the data stored by the namenode (/opt/module/hadoop-2.7.2/data/tmp/dfs/name):

rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/name/*

(3) If the SecondaryNameNode is not on the same host as the Namenode, copy the SecondaryNameNode's storage directory to the same level as the Namenode's storage directory:

[itstar@bigdata111 dfs]$ pwd

/opt/module/hadoop-2.7.2/data/tmp/dfs

[itstar@bigdata111 dfs]$ ls

data  name  namesecondary

(4) Import the checkpoint (wait, then end it with ctrl+c):

bin/hdfs namenode -importCheckpoint

(5) Start the namenode:

sbin/hadoop-daemon.sh start namenode

(6) If it complains that in_use.lock is held, delete the lock file:

rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/in_use.lock

5.6 Safe mode operations

(1) Overview

When the Namenode starts, it first loads the image file (fsimage) into memory and applies the operations in the edits log. Once a complete image of the file-system metadata has been built in memory, it creates a new fsimage and an empty edits log. At that point the namenode begins listening for datanode requests. Throughout this period the namenode runs in safe mode: the namenode's file system is read-only for clients.

The locations of data blocks are not kept by the namenode; they live on the datanodes as lists of blocks. While the system runs normally, the namenode keeps a mapping of block locations in memory. In safe mode, each datanode sends its latest block list to the namenode, and once the namenode has learned enough block locations it can run the file system efficiently.

If the "minimum replication condition" is satisfied, the namenode exits safe mode after 30 seconds. The condition is that 99.9% of the blocks in the whole file system meet the minimum replication level (default: dfs.replication.min=1). When starting a freshly formatted HDFS cluster, the namenode does not enter safe mode, since there are no blocks yet.

(2) Basic syntax

While the cluster is in safe mode, no important (write) operations can be performed. Safe mode ends automatically once the cluster finishes starting.

(1) bin/hdfs dfsadmin -safemode get (check the safe mode status)

(2) bin/hdfs dfsadmin -safemode enter (enter safe mode)

(3) bin/hdfs dfsadmin -safemode leave (leave safe mode)

(4) bin/hdfs dfsadmin -safemode wait (wait until safe mode ends)

(3) Case

Simulate waiting for safe mode to end.

(1) First enter safe mode:

bin/hdfs dfsadmin -safemode enter

(2) Execute the following script (it blocks on -safemode wait, then uploads a file once safe mode ends):

#!/bin/bash

bin/hdfs dfsadmin -safemode wait

bin/hdfs dfs -put ~/hello.txt /root/hello.txt

(3) Then leave safe mode:

bin/hdfs dfsadmin -safemode leave

5.7

1namenode的目录可以配置且增加

2

hdfs-site.xml

<property>

<value>file:///${hadoop.tmp.dir}/dfs/name1,file:///${hadoop.tmp.dir}/dfs/name2</value>

</property>

6 The DataNode working mechanism

6.1 How the DataNode works

(1) A data block is stored on a datanode's disk as files: one file holds the data itself, and another holds the block's metadata, including the data length, checksums of the block data, and a timestamp.

(2) After a DataNode starts, it registers with the namenode; thereafter it reports its complete block information to the namenode every hour.

(3) A heartbeat is sent every 3 seconds, and the heartbeat reply carries commands from the namenode to the datanode, such as copying a block to another machine or deleting a block. If the namenode does not receive a heartbeat from a datanode for more than 10 minutes, it considers the node unavailable.

(4) Machines can safely join and leave while the cluster is running.

6.2 Data integrity

(1) When a DataNode reads a block, it computes a checksum.

(2) If the computed checksum differs from the value recorded when the block was created, the block is damaged.

(3) The client then reads the block from another DataNode.

(4) Each datanode also periodically re-verifies checksums after a file is created.
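As a toy illustration of this checksum idea (HDFS itself stores CRC checksums per chunk of dfs.bytes-per-checksum bytes, not one per block; this sketch only mirrors the concept):

import java.util.zip.CRC32;

public class ChecksumDemo {

    // Compute a CRC over a block's bytes
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] block = "some block data".getBytes();
        long stored = checksum(block);  // recorded when the block was written
        block[0] ^= 0x01;               // simulate corruption on disk
        boolean damaged = checksum(block) != stored;
        System.out.println("block damaged: " + damaged); // true -> client reads another DataNode's replica
    }
}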

6.3 Setting the dead-node timeout parameters

If a datanode goes down, or a network fault cuts communication between the datanode and the namenode, the namenode does not immediately declare the node dead; it waits for a period called the timeout. The default HDFS timeout is 10 minutes + 30 seconds, computed as:

timeout = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval

With the default dfs.namenode.heartbeat.recheck-interval of 5 minutes and the default dfs.heartbeat.interval of 3 seconds, this gives 2 × 5 min + 10 × 3 s = 10 min 30 s.

Note that in hdfs-site.xml, heartbeat.recheck.interval is in milliseconds while dfs.heartbeat.interval is in seconds:

<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>300000</value>
</property>

<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>
</property>

6.4 The DataNode directory structure

Unlike the namenode's, the datanode's storage directory is created automatically at the initial stage and needs no extra formatting.

(1) View the version file in the /opt/module/hadoop-2.7.2/data/tmp/dfs/data/current directory:

[itstar@bigdata111 current]$ cat VERSION

storageID=DS-1b998a1d-71a3-43d5-82dc-c0ff3294921b

clusterID=CID-1f2bf8d1-5ad2-4202-af1c-6713ab381175

cTime=0

datanodeUuid=970b2daf-63b8-4e17-a514-d81741392165

storageType=DATA_NODE

layoutVersion=-56

(2) Interpretation of the datanode version file

(1) storageID: the storage id.

(2) clusterID: the cluster id, globally unique.

(3) cTime: the creation time of the datanode storage system; for freshly formatted storage this is always 0, and after an upgrade it is updated to the new timestamp.

(4) datanodeUuid: the unique identifier of this datanode.

(5) storageType: says that this directory stores a datanode's data structures.

(6) layoutVersion: a negative integer describing the version of the HDFS persistent data structures' layout.

(3) View this block pool's version file in the /opt/module/hadoop-2.7.2/data/tmp/dfs/data/current/BP-97847618-192.168.10.102-1493726072779/current directory:

[itstar@bigdata111 current]$ cat VERSION

#Mon May 08 16:30:19 CST 2017

namespaceID=1933630176

cTime=0

blockpoolID=BP-97847618-192.168.10.102-1493726072779

layoutVersion=-56

(4) Interpretation of the block pool version file

(1) namespaceID: obtained from the namenode the first time the datanode contacts it (together with the storageID, which is unique per datanode but identical across a single datanode's storage directories); the namenode can use it to distinguish datanodes.

(2) cTime: the creation time of the datanode storage system; 0 when freshly formatted, updated after an upgrade.

(3) blockpoolID: a block pool id identifies one block pool and is globally unique across clusters. When a new Namespace is formatted (format), a unique ID is generated as its BlockPoolID. The NN persists this during formatting and checks it on every load.

(4) layoutVersion: a negative integer describing the version of the HDFS persistent data structures' layout.

6.5 Datanode multi-directory configuration

(1) A datanode can also be configured with multiple directories, but unlike the namenode's, each directory stores different data; that is, the data is not replicated across them.

(2) Concrete configuration:

hdfs-site.xml

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///${hadoop.tmp.dir}/dfs/data1,file:///${hadoop.tmp.dir}/dfs/data2</value>
</property>

7 Other HDFS features

7.1 Data copying between clusters

(1) scp-based copying between hosts:

scp -r hello.txt root@bigdata112:/user/itstar/hello.txt // push

scp -r root@bigdata112:/user/itstar/hello.txt hello.txt // pull

scp -r root@bigdata112:/user/itstar/hello.txt root@bigdata113:/user/itstar // relaying through the local host; useful when ssh is not configured between the two remote hosts

(2) distcp-based recursive copying between two hadoop clusters:

bin/hadoop distcp hdfs://hadoop102:9000/user/itstar/hello.txt hdfs://bigdata112:9000/user/itstar/hello.txt

7.2 Hadoop archives

(1) Why archive?

Every file is stored as metadata in the namenode's memory, so storing a huge number of small files in hadoop eats up the namenode's memory. Note, however, that what small files waste is namenode memory, not disk: a 1MB file stored in a 128MB block uses 1MB of disk, not 128MB.

A Hadoop archive file (HAR) packs files into HDFS blocks more efficiently: it reduces the namenode's memory usage while still allowing transparent access to the files. Specifically, a Hadoop archive can also be used as input to MapReduce.

(2) Case

(1) Archiving needs a MapReduce job, so start yarn first:

start-yarn.sh

(2) Archive the files: pack everything under a directory into an archive named xxx.har (the name must end in .har) and store it under a target directory, e.g.:

bin/hadoop archive -archiveName myhar.har -p /user/itstar/my /user/my

(3) View the archive:

hadoop fs -lsr /user/my/myhar.har

hadoop fs -lsr har:///myhar.har

(4) Un-archive the files:

hadoop fs -cp har:///user/my/myhar.har/* /user/itstar

7.3 Snapshot management

A snapshot is like a photo backup of a directory. It does not immediately copy all the files; it points at the same data, and new files are produced only when writes happen afterwards.

(1) Basic syntax

(1) hdfs dfsadmin -allowSnapshot <path> (enable snapshots on the specified directory)

(2) hdfs dfsadmin -disallowSnapshot <path> (disable snapshots on the specified directory; the default is disabled)

(3) hdfs dfs -createSnapshot <path> (create a snapshot of the directory)

(4) hdfs dfs -createSnapshot <path> <name> (create a snapshot with the specified name)

(5) hdfs dfs -renameSnapshot <path> <oldName> <newName> (rename a snapshot)

(6) hdfs lsSnapshottableDir (list all directories on which the current user may create snapshots)

(7) hdfs snapshotDiff <path> <snapshot1> <snapshot2> (compare the differences between two snapshots)

(8) hdfs dfs -deleteSnapshot <path> <snapshotName> (delete a snapshot)

(2) Hands-on case

(1) Enable snapshots on a directory:

hdfs dfsadmin -allowSnapshot /user/itstar/data

(2) Create a snapshot of the directory:

hdfs dfs -createSnapshot /user/itstar/data

Through the web you can then browse hdfs://bigdata111:9000/user/itstar/data/.snapshot/s….. // the snapshot and the source file use the same data blocks

hdfs dfs -lsr /user/itstar/data/.snapshot/

(3) Create a named snapshot:

hdfs dfs -createSnapshot /user/itstar/data miao170508

(4) Rename the snapshot:

hdfs dfs -renameSnapshot /user/itstar/data/ miao170508 itstar170508

(5) List the directories where the current user can create snapshots:

hdfs lsSnapshottableDir

(6) Compare two snapshots with hdfs snapshotDiff, as in the syntax above.

(7) Restore a snapshot by copying it back out:

hdfs dfs -cp /user/itstar/input/.snapshot/s20170708-134303.027 /user

7.4 The recycle bin

(1) Default settings

By default fs.trash.interval=0; 0 disables the trash. Set it to the number of minutes a deleted file survives.

By default fs.trash.checkpoint.interval=0, the interval at which the trash is checked; if it is 0, the value is taken to equal fs.trash.interval.

Requirement: fs.trash.checkpoint.interval <= fs.trash.interval.

(2) Enabling the trash

Modify core-site.xml and set the garbage lifetime to 1 minute:

<property>
  <name>fs.trash.interval</name>
  <value>1</value>
</property>

(3) Viewing the trash

The recycle bin is in HDFS under /user/itstar/.Trash/….

(4) Changing the user that accesses the trash

The default user entering the trash through the web is dr.who; change it to itstar:

[core-site.xml]

<property>
  <name>hadoop.http.staticuser.user</name>
  <value>itstar</value>
</property>

(5) Files deleted through a program only enter the trash if you call moveToTrash():

Trash trash = new Trash(conf);

trash.moveToTrash(path);
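A minimal self-contained sketch of this call, assuming the trash has been enabled as above (the file path is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class TrashDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://bigdata111:9000");
        // Move a file into the current user's trash instead of deleting it outright
        Trash trash = new Trash(conf);
        boolean moved = trash.moveToTrash(new Path("/user/itstar/old.txt")); // hypothetical path
        System.out.println("moved to trash: " + moved);
    }
}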

(6) Restoring deleted data: move the file back out of the trash, e.g. with hdfs dfs -mv from /user/itstar/.Trash/Current/… to its original path.

(7) Emptying the recycle bin:

hdfs dfs -expunge

8 HDFS HA (High Availability)

8.1 HA overview

(1) HA (high available) means uninterrupted 7*24 service.

(2) The key to implementing HA is eliminating single points of failure. Strictly speaking, HA is split per component: HDFS HA and YARN HA.

(3) Before Hadoop 2.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster.

(4) The NameNode affects the availability of an HDFS cluster mainly in two ways:

an unexpected crash of the NameNode machine leaves the cluster unusable until it is restarted;

a planned software or hardware upgrade of the NameNode machine leaves the cluster unusable for the duration of the maintenance.

The HDFS HA feature addresses this by configuring two NameNodes in Active/Standby mode. When a NameNode fails, or needs planned maintenance, HA lets the cluster quickly switch to the other NameNode.

8.2 The HDFS-HA working mechanism

(1) Eliminate the single point of failure through dual namenodes.

8.2.1 Key points of HDFS-HA operation

(1) Metadata management has to change:

each namenode keeps a copy of the metadata in memory;

only the namenode in the Active state may write to the Edits log;

both namenodes can read the edits;

the shared edits are kept in shared storage (qjournal and NFS are the two mainstream implementations).

(2) A state-management module is needed:

zkfailover, resident on each namenode's node. Each zkfailover monitors the namenode on its own node and marks its state in zk; when a state switch is required, zkfailover performs the switch, and split brain must be prevented during switching.

(3) ssh passwordless login must be guaranteed between the two NameNodes (for fencing).

(4) Fencing: only one NameNode provides service at any given moment.

8.2.2 How HDFS-HA automatic failover works

The above describes manual failover with hdfs haadmin -failover. Automatic failover adds two new components to an HDFS deployment: ZooKeeper and the ZKFailoverController (ZKFC) process. ZooKeeper maintains a small amount of coordination data, notifies clients of changes to that data, and monitors clients for failure. Automatic HA failover relies on the following ZooKeeper capabilities:

(1) Failure detection: each NameNode in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the session expires and ZooKeeper notifies the other NameNode that a failover should be triggered.

(2) Active NameNode election: ZooKeeper provides a simple mechanism for exclusively electing a node as active. If the current active NameNode crashes, another node can take a special exclusive lock in ZooKeeper stating that it should become the next active.

ZKFC is another new component: a ZooKeeper client that also monitors and manages the state of the NameNode. Every host that runs a NameNode also runs a ZKFC process, which is responsible for:

(1) Health monitoring: ZKFC periodically pings its local NameNode; as long as the NameNode replies in time, it is considered healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor marks it unhealthy.

(2) ZooKeeper session management: while the local NameNode is healthy, ZKFC keeps a session open in ZooKeeper. If the local NameNode is also active, ZKFC additionally holds a special znode lock, which uses ZooKeeper's support for ephemeral nodes; if the session expires, the lock node is deleted automatically.

(3) ZooKeeper-based election: if the local NameNode is healthy and ZKFC sees that no other node currently holds the znode lock, it tries to acquire the lock itself. If it succeeds, it has won the election and runs a failover so that its local NameNode becomes active. The failover process resembles the manual failover described earlier: the previous active NameNode is fenced if necessary, and then the local NameNode transitions to the active state.

8.4 Configuring an HDFS-HA cluster

8.4.1 Environment preparation

(1) Modify the IP addresses.

(2) Modify the host names and the host-name/IP mappings.

(3) Turn off the firewall.

(4) Configure ssh passwordless login.

(5) Install the JDK and configure the environment variables.

8.4.2 Cluster planning

bigdata111: NameNode, JournalNode, DataNode, ZK, NodeManager

bigdata112: NameNode, ResourceManager, JournalNode, DataNode, ZK, NodeManager

bigdata113: JournalNode, DataNode, ZK, NodeManager

8.4.3 Configuring the Zookeeper cluster

(0) Cluster planning

Deploy Zookeeper on the three nodes bigdata111, bigdata112 and bigdata113.

(1) Unpack and install

(1) Unpack the zookeeper installation package into /opt/module/:

[itstar@bigdata111 software]$ tar -zxvf zookeeper-3.4.10.tar.gz -C /opt/module/

(2) Create a zkData directory under /opt/module/zookeeper-3.4.10/:

mkdir -p zkData

(3) In /opt/module/zookeeper-3.4.10/conf, rename zoo_sample.cfg to zoo.cfg:

mv zoo_sample.cfg zoo.cfg

(2) Configure the zoo.cfg file

(1) Change the data directory:

dataDir=/opt/module/zookeeper-3.4.10/zkData

and add the cluster configuration:

#######################cluster##########################

server.2=bigdata111:2888:3888

server.3=bigdata112:2888:3888

server.4=bigdata113:2888:3888

(2) Interpretation of the configuration parameters

Server.A=B:C:D

A is a number telling which server this is;

B is the ip address of this server;

C is the port this server uses to exchange information with the cluster's Leader;

D is the election port: should the Leader fail, a port is needed for the re-election that picks a new Leader, and this is the port the servers use to communicate with each other during that election.

In cluster mode a file named myid is placed in the dataDir directory; it contains a single value, A. On startup, Zookeeper reads this file and compares its value against the server entries in zoo.cfg to determine which server it is.

(3) Cluster operations

(1) Create a myid file in the /opt/module/zookeeper-3.4.10/zkData directory:

touch myid

Be sure to create the myid file in linux; a file created in notepad++ may end up with garbled encoding.

(2) Edit the myid file:

vi myid

Add the number corresponding to this server in the file: 2.

(3) Copy the configured zookeeper to the other machines:

scp -r zookeeper-3.4.10/ root@bigdata112.itstar.com:/opt/app/

scp -r zookeeper-3.4.10/ root@bigdata113.itstar.com:/opt/app/

and change the content of myid to 3 and 4 respectively.

(4) Start zookeeper on each node:

[root@bigdata111 zookeeper-3.4.10]# bin/zkServer.sh start

[root@bigdata112 zookeeper-3.4.10]# bin/zkServer.sh start

[root@bigdata113 zookeeper-3.4.10]# bin/zkServer.sh start

(5) Check the status:

[root@bigdata111 zookeeper-3.4.10]# bin/zkServer.sh status

JMX enabled by default

The Mode line of the output shows whether this node is the leader or a follower.
