2021 Big Data Hadoop (Part 12): HDFS API Operations

Published: 2023/11/28


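HDFS API Operations

HDFS Java API Operations

Configuring the Hadoop Environment on Windows

Importing the Maven Dependencies

Accessing Data Through the FileSystem API

1. Main classes involved

2. Obtaining a FileSystem

3. Listing all files in HDFS

4. Creating a directory on HDFS

5. Downloading a file, method 1

6. Downloading a file, method 2

7. Uploading a file

8. Merging small files

9. HDFS access permission control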

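HDFS API Operations

HDFS Java API Operations

In production, working with HDFS is mostly a matter of client development. The core steps are to construct an HDFS client object from the API that HDFS provides, and then use that client object to operate on files in HDFS (create, delete, update, and read).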

Configuring the Hadoop Environment on Windows

To develop an HDFS client application on Windows, you need to set up a Hadoop environment, and it must be a Hadoop build compiled for Windows. Otherwise you will see the following errors:

Missing winutils.exe:

Could not locate executable null\bin\winutils.exe in the Hadoop binaries

Missing hadoop.dll:

Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Setup steps:

Step 1: Extract the Windows build of Hadoop to a path that contains no Chinese characters and no spaces.

Step 2: On Windows, set the HADOOP_HOME environment variable and add %HADOOP_HOME%\bin to Path.

Step 3: Copy hadoop.dll from the bin directory of the hadoop2.7.5 folder into C:\Windows\System32 on the system drive.

Step 4: Restart Windows.

Importing the Maven Dependencies

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.7.5</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
</dependencies>

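Accessing Data Through the FileSystem API

1. Main classes involved

Working with HDFS from Java mainly involves the following classes:

Configuration: an object of this class encapsulates the configuration of the client or the server.

FileSystem: an object of this class represents a file system, and its methods operate on files. Obtain one through the static method FileSystem.get:

FileSystem fs = FileSystem.get(conf);

The get method decides which type of file system to return from the value of the fs.defaultFS parameter in conf. If the code does not set fs.defaultFS and no corresponding configuration is present on the project classpath, the value in conf comes from core-default.xml inside the Hadoop jar, where the default is file:///. In that case you get a client for the local file system rather than a DistributedFileSystem instance.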
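The role of fs.defaultFS described above can be illustrated with a small standalone sketch. It is not Hadoop's implementation: the class DefaultFsSchemeSketch and the method resolveScheme are hypothetical names, and the sketch only shows how the URI scheme of the effective fs.defaultFS value separates a local client (file) from an HDFS client (hdfs).

```java
import java.net.URI;

// Illustration only: mimics choosing a file system type from the scheme of
// fs.defaultFS. Hypothetical helper, not part of the Hadoop API.
final class DefaultFsSchemeSketch {
    // Default value of fs.defaultFS when nothing overrides it (from core-default.xml).
    static final String BUILT_IN_DEFAULT = "file:///";

    // Returns the URI scheme that decides which kind of client would be created.
    static String resolveScheme(String configuredDefaultFs) {
        String effective = (configuredDefaultFs == null || configuredDefaultFs.isEmpty())
                ? BUILT_IN_DEFAULT
                : configuredDefaultFs;
        return URI.create(effective).getScheme();
    }

    public static void main(String[] args) {
        System.out.println(resolveScheme(null));                 // prints "file"
        System.out.println(resolveScheme("hdfs://node1:8020/")); // prints "hdfs"
    }
}
```

If a test unexpectedly reads or writes local files, a scheme of file here is the likely reason: fs.defaultFS was never set for the client.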

2. Obtaining a FileSystem

Method 1

@Test
public void getFileSystem1() throws IOException {
    Configuration configuration = new Configuration();
    // Specify the file system type to use
    configuration.set("fs.defaultFS", "hdfs://node1:8020/");
    // Obtain the specified file system
    FileSystem fileSystem = FileSystem.get(configuration);
    System.out.println(fileSystem.toString());
}

Method 2

@Test
public void getFileSystem2() throws Exception {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration());
    System.out.println("fileSystem:" + fileSystem);
}

3. Listing all files in HDFS

@Test
public void listMyFiles() throws Exception {
    // Obtain the FileSystem object
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration());
    // listFiles returns a RemoteIterator over the files under a path.
    // First argument: the path to traverse. Second argument: whether to recurse.
    RemoteIterator<LocatedFileStatus> iterator = fileSystem.listFiles(new Path("/"), true);
    while (iterator.hasNext()) {
        LocatedFileStatus next = iterator.next();
        System.out.println(next.getPath().toString());
    }
    fileSystem.close();
}

4. Creating a directory on HDFS

@Test
public void mkdirs() throws Exception {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration());
    // mkdirs also creates any missing parent directories
    boolean created = fileSystem.mkdirs(new Path("/hello/mydir/test"));
    System.out.println("created: " + created);
    fileSystem.close();
}

5. Downloading a file, method 1

@Test
public void getFileToLocal() throws Exception {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration());
    // Open the HDFS file as an input stream and a local file as an output stream
    FSDataInputStream inputStream = fileSystem.open(new Path("/timer.txt"));
    FileOutputStream outputStream = new FileOutputStream(new File("e:\\timer.txt"));
    // IOUtils here is org.apache.commons.io.IOUtils
    IOUtils.copy(inputStream, outputStream);
    IOUtils.closeQuietly(inputStream);
    IOUtils.closeQuietly(outputStream);
    fileSystem.close();
}

6. Downloading a file, method 2

@Test
public void downLoadFile() throws URISyntaxException, IOException, InterruptedException {
    // 1: Obtain the FileSystem object
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration());
    // 2: Download the file
    fileSystem.copyToLocalFile(new Path("/anaconda-ks.cfg"), new Path("E:\\test"));
    // 3: Release resources
    fileSystem.close();
}

7. Uploading a file

@Test
public void putData() throws Exception {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration());
    // First argument: local source file. Second argument: destination path on HDFS.
    fileSystem.copyFromLocalFile(new Path("file:///c:/install.log"), new Path("/hello/mydir/test"));
    fileSystem.close();
}

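8. Merging small files

Hadoop is good at storing large files, because large files carry relatively little metadata. If a cluster holds a large number of small files, each small file needs its own metadata entry, which greatly increases the memory pressure of managing metadata on the cluster. In practice, therefore, merge small files into large files for processing whenever it is needed. One option is to merge the small files into a single large file at upload time.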
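To make the memory pressure concrete, here is a rough back-of-the-envelope sketch. It relies on assumptions that are not in the original text: a commonly quoted rule of thumb of about 150 bytes of NameNode memory per namespace object (a file or a block), small files of about 1 MB each, and a 128 MB block size. The class name and all the numbers are hypothetical and only meant to show the order of magnitude.

```java
// Rough estimate of NameNode metadata memory. All figures are assumptions.
final class SmallFileMetadataEstimate {
    // Assumption: rule-of-thumb memory cost per namespace object (file or block).
    static final long BYTES_PER_OBJECT = 150L;

    // Each file contributes one file object plus one object per block.
    static long estimateBytes(long fileCount, long blocksPerFile) {
        return fileCount * (1 + blocksPerFile) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // Ten million small files, one block each.
        long small = estimateBytes(10_000_000L, 1);
        // Roughly the same data merged into 1,000 files of about 80 blocks each.
        long merged = estimateBytes(1_000L, 80);
        System.out.println(small);  // prints 3000000000
        System.out.println(merged); // prints 12150000
    }
}
```

Under these assumptions the unmerged layout needs about 3 GB of metadata memory and the merged layout about 12 MB, which is why merging is worth the extra step.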

@Test
public void mergeFile() throws Exception {
    // Obtain the distributed file system
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration(), "root");
    FSDataOutputStream outputStream = fileSystem.create(new Path("/bigfile.txt"));
    // Obtain the local file system
    LocalFileSystem local = FileSystem.getLocal(new Configuration());
    // List the local files to merge
    FileStatus[] fileStatuses = local.listStatus(new Path("file:///E:\\input"));
    for (FileStatus fileStatus : fileStatuses) {
        // Append each local file to the single output file on HDFS
        FSDataInputStream inputStream = local.open(fileStatus.getPath());
        IOUtils.copy(inputStream, outputStream);
        IOUtils.closeQuietly(inputStream);
    }
    IOUtils.closeQuietly(outputStream);
    local.close();
    fileSystem.close();
}

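9. HDFS access permission control

The HDFS permission model is similar to that of Linux. Every file and directory has an owner and a group, and carries separate permissions for the owner, for other users in the same group, and for all other users (other). For a file, reading requires the r permission, and writing or appending requires the w permission. For a directory, listing its contents requires r, creating or deleting child files or directories requires w, and accessing a child of the directory requires x. HDFS permissions take effect only once permission checking is enabled; otherwise, setting permissions in HDFS has no effect.

HDFS permission checking is configured in hdfs-site.xml. When the Hadoop cluster was set up, HDFS permission checking was turned off, so no operation on HDFS has been restricted so far.

Next, we enable HDFS permission checking and test HDFS access control.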

1. Stop the HDFS cluster by running the following command on node1:

stop-dfs.sh

2. Edit hdfs-site.xml on node1:

vim hdfs-site.xml

<property>
    <name>dfs.permissions.enabled</name>
    <value>true</value>
</property>

3. After the change, send the configuration file to the other machines:

scp hdfs-site.xml node2:$PWD
scp hdfs-site.xml node3:$PWD

4. Restart the HDFS cluster:

start-dfs.sh

5. Upload a few files to the Hadoop cluster to use for testing:

cd /export/server/hadoop-2.7.5/etc/hadoop
hadoop fs -mkdir /config
hadoop fs -put *.xml /config
hadoop fs -chmod 600 /config/core-site.xml

After these steps, the permissions on core-site.xml are rw------- (mode 600).

This means that the owning user, root, can read and write core-site.xml, while the group and all other users have no access.
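The mapping from the numeric mode passed to chmod to the rwx string can be sketched as follows. This is a standalone illustration with hypothetical names (PermissionStringSketch, toSymbolic), not an HDFS API: each octal digit covers one class (owner, group, other), and its bits 4, 2, and 1 stand for r, w, and x.

```java
// Converts a three-digit octal mode into its symbolic form. Illustration only.
final class PermissionStringSketch {
    // Example: "600" becomes "rw-------".
    static String toSymbolic(String octalMode) {
        if (octalMode == null || !octalMode.matches("[0-7]{3}")) {
            throw new IllegalArgumentException("expected three octal digits: " + octalMode);
        }
        StringBuilder symbolic = new StringBuilder(9);
        for (char digit : octalMode.toCharArray()) {
            int bits = digit - '0';
            symbolic.append((bits & 4) != 0 ? 'r' : '-');
            symbolic.append((bits & 2) != 0 ? 'w' : '-');
            symbolic.append((bits & 1) != 0 ? 'x' : '-');
        }
        return symbolic.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic("600")); // prints "rw-------"
        System.out.println(toSymbolic("777")); // prints "rwxrwxrwx"
    }
}
```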

6. Download the file from code

@Test
public void getConfig() throws Exception {
    // No user is specified here, so the client runs as the current OS user
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration());
    fileSystem.copyToLocalFile(new Path("/config/core-site.xml"), new Path("file:///c:/core-site.xml"));
    fileSystem.close();
}

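With HDFS permission checking enabled, running the code above fails with a permission-denied error.

This is because the HDFS client runs on Windows, where the user name is usually not root but some other user, so that user has no permissions on core-site.xml.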
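The denial follows from how a POSIX-style check picks one permission class for the caller. The sketch below is a simplified illustration, not HDFS code: the class PermissionCheckSketch, the group name supergroup, and the user alice are hypothetical, and it ignores special cases such as the superuser bypassing checks.

```java
import java.util.Collections;
import java.util.Set;

// Simplified owner/group/other permission check. Illustration only.
final class PermissionCheckSketch {
    enum Action {
        READ(4), WRITE(2), EXECUTE(1);

        final int bit;

        Action(int bit) {
            this.bit = bit;
        }
    }

    // Picks the class that applies to the user, then tests the requested bit.
    static boolean isAllowed(int mode, String owner, String group,
                             String user, Set<String> userGroups, Action action) {
        int shift;
        if (user.equals(owner)) {
            shift = 6; // owner bits
        } else if (userGroups.contains(group)) {
            shift = 3; // group bits
        } else {
            shift = 0; // other bits
        }
        int classBits = (mode >> shift) & 7;
        return (classBits & action.bit) != 0;
    }

    public static void main(String[] args) {
        Set<String> noGroups = Collections.emptySet();
        // Modes are written as Java octal literals (leading zero).
        System.out.println(isAllowed(0600, "root", "supergroup", "root", noGroups, Action.READ));  // true
        System.out.println(isAllowed(0600, "root", "supergroup", "alice", noGroups, Action.READ)); // false
        System.out.println(isAllowed(0777, "root", "supergroup", "alice", noGroups, Action.READ)); // true
    }
}
```

With mode 600 a user other than the owner falls into the other class, whose bits are all zero, so the read is refused. Widening the mode to 777 or acting as the owner both change the outcome.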

Solutions:

Option 1: change the permissions of core-site.xml

hadoop fs -chmod 777 /config/core-site.xml

Option 2: impersonate a user

Here we access the file as the root user:

@Test
public void getConfig() throws Exception {
    // The third argument is the user name the client acts as
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node1:8020"), new Configuration(), "root");
    fileSystem.copyToLocalFile(new Path("/config/core-site.xml"), new Path("file:///c:/core-site.xml"));
    fileSystem.close();
}

Result: the code runs successfully.
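Passing the user name to FileSystem.get is one way to choose the identity. As a hedged aside that is not in the original article: to my understanding, a client using simple (non-Kerberos) authentication also consults a HADOOP_USER_NAME environment variable, then a Java system property of the same name, before falling back to the OS login user. The sketch below only models that lookup order with hypothetical names (EffectiveUserSketch, resolve); treat the order as an assumption and verify it against your Hadoop version.

```java
// Models an assumed lookup order for the effective client user. Illustration only.
final class EffectiveUserSketch {
    // Assumed precedence: environment variable, then system property, then OS user.
    static String resolve(String envValue, String propertyValue, String osUser) {
        if (envValue != null && !envValue.isEmpty()) {
            return envValue;
        }
        if (propertyValue != null && !propertyValue.isEmpty()) {
            return propertyValue;
        }
        return osUser;
    }

    public static void main(String[] args) {
        String user = resolve(
                System.getenv("HADOOP_USER_NAME"),
                System.getProperty("HADOOP_USER_NAME"),
                System.getProperty("user.name"));
        System.out.println("effective user: " + user);
    }
}
```

On a Windows workstation with neither setting present, this resolves to the Windows login name, which matches the permission-denied case above.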

Blog home page: https://lansonli.blog.csdn.net
This article is an original work by Lansonli, first published on the CSDN blog.
