Reading HDFS and local files in Java

难免孤独 2020-12-08 10:45

I want to read file paths regardless of whether they are on HDFS or on the local file system. Currently, I pass local paths with the prefix file:// and HDFS paths with the prefix hdfs:// an

3 answers
  •  被撕碎了的回忆
    2020-12-08 11:24

    The code snippet below lists the files under an HDFS path, i.e. a path string that starts with hdfs://. If you provide a Hadoop configuration and a local path, it also lists files from the local file system, i.e. a path string that starts with file://.

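        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.LinkedList;
        import java.util.List;
        import java.util.Queue;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
    
        //helper method to get the list of files from the HDFS path
        public static List<String> listFilesFromHDFSPath(Configuration hadoopConfiguration, String hdfsPath,
                                                         boolean recursive)
        {
            //resulting list of files
            List<String> filePaths = new ArrayList<>();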
            FileSystem fs = null;
    
            //try-catch-finally all possible exceptions
            try
            {
                //get path from string and then the filesystem
                Path path = new Path(hdfsPath);  //throws IllegalArgumentException, all others will only throw IOException
                fs = path.getFileSystem(hadoopConfiguration);
    
                //resolve hdfsPath first to check whether the path exists => either a real directory or a real file
                //resolvePath() returns the fully-qualified variant of the path
                path = fs.resolvePath(path);
    
    
                //if recursive approach is requested
                if (recursive)
                {
                    //traverse with an explicit queue instead of recursive calls (avoids deep call stacks)
                    Queue<Path> fileQueue = new LinkedList<>();
    
                    //add the obtained path to the queue
                    fileQueue.add(path);
    
                    //while the fileQueue is not empty
                    while (!fileQueue.isEmpty())
                    {
                        //get the file path from queue
                        Path filePath = fileQueue.remove();
    
                        //filePath refers to a file
                        if (fs.isFile(filePath))
                        {
                            filePaths.add(filePath.toString());
                        }
                        else   //else filePath refers to a directory
                        {
                            //list paths in the directory and add to the queue
                            FileStatus[] fileStatuses = fs.listStatus(filePath);
                            for (FileStatus fileStatus : fileStatuses)
                            {
                                fileQueue.add(fileStatus.getPath());
                            } // for
                        } // else
    
                    } // while
    
                } // if
                else        //non-recursive approach => only the top level is listed
                {
                    //if the given hdfsPath is actually directory
                    if (fs.isDirectory(path))
                    {
                        FileStatus[] fileStatuses = fs.listStatus(path);
    
                        //loop all file statuses
                        for (FileStatus fileStatus : fileStatuses)
                        {
                            //if the given status is a file, then update the resulting list
                            if (fileStatus.isFile())
                                filePaths.add(fileStatus.getPath().toString());
                        } // for
                    } // if
                    else        //it is a file then
                    {
                        //return the one and only file path to the resulting list
                        filePaths.add(path.toString());
                    } // else
    
                } // else
    
            } // try
            catch(Exception ex) //catches all exceptions, including IOException and IllegalArgumentException
            {
                ex.printStackTrace();
    
                //if some problem occurs, return an empty list
                return new ArrayList<>();
            } // catch
            finally
            {
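                //close filesystem; no more operations
                //note: by default getFileSystem() returns a cached instance shared within the JVM,
                //so closing it here also closes it for any other code that uses the same instance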
                try
                {
                    if(fs != null)
                        fs.close();
                } catch (IOException e)
                {
                    e.printStackTrace();
                } // catch
    
            } // finally
    
    
            //return the resulting list; list can be empty if given path is an empty directory without files and sub-directories
            return filePaths;
        } // listFilesFromHDFSPath
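    
    The method above works for both kinds of path because Hadoop picks the file system from the scheme of the path string. If you ever need to make that decision yourself, for example to choose between this method and a java.io.File based one, a minimal sketch using only the JDK could look like the following. PathSchemes, schemeOf and isHdfs are hypothetical names, the host namenode:8020 is a placeholder, and treating a path without a scheme as local is an assumption: Hadoop itself resolves such paths against the configured fs.defaultFS, which may well be HDFS.

    ```java
    import java.net.URI;
    import java.net.URISyntaxException;
    import java.util.Locale;

    // Hypothetical helper, not part of Hadoop or of the answer above.
    final class PathSchemes {

        // Returns the lower-cased URI scheme of the path string.
        // Assumption: a string without a scheme (or one that is not a valid URI) is a local path.
        static String schemeOf(String pathString) {
            try {
                String scheme = new URI(pathString).getScheme();
                return scheme == null ? "file" : scheme.toLowerCase(Locale.ROOT);
            } catch (URISyntaxException e) {
                // e.g. a plain local path containing spaces or backslashes
                return "file";
            }
        }

        // True if the path string explicitly carries the hdfs scheme.
        static boolean isHdfs(String pathString) {
            return "hdfs".equals(schemeOf(pathString));
        }

        public static void main(String[] args) {
            System.out.println(schemeOf("hdfs://namenode:8020/data/input")); // hdfs
            System.out.println(schemeOf("file:///tmp/input"));               // file
            System.out.println(schemeOf("/tmp/input"));                      // file (by the assumption above)
        }
    }
    ```

    This only inspects the string; it does not check that the path exists or that the cluster is reachable.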
    

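    If you really want to work with the java.io.File API, the following method lists files from the local file system only. Note that java.io.File takes a plain path such as /tmp/data rather than a URI string, so a file:// prefix has to be removed (or converted, for example with the File(URI) constructor) before the path string is passed in.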

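        import java.io.File;
        import java.util.ArrayList;
        import java.util.LinkedList;
        import java.util.List;
        import java.util.Queue;
    
        //helper method to list files from the local path in the local file system
        public static List<String> listFilesFromLocalPath(String localPathString, boolean recursive)
        {
            //resulting list of files
            List<String> localFilePaths = new ArrayList<>();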
    
            //get the Java file instance from local path string
            File localPath = new File(localPathString);
    
    
            //this case occurs if the given localPathString does not exist => it is neither a file nor a directory
            if(!localPath.exists())
            {
                System.err.println("\n" + localPathString + " is neither a file nor a directory; please provide correct local path");
    
                //return with empty list
                return new ArrayList<>();
            } // if
    
    
            //at this point localPath does exist in the file system => either as a directory or a file
    
    
            //if recursive approach is requested
            if (recursive)
            {
                //recursive listing => traverse with an explicit queue instead of recursive calls
                Queue<File> fileQueue = new LinkedList<>();
    
                //add the file in obtained path to the queue
                fileQueue.add(localPath);
    
                //while the fileQueue is not empty
                while (!fileQueue.isEmpty())
                {
                    //get the file from queue
                    File file = fileQueue.remove();
    
                    //file instance refers to a file
                    if (file.isFile())
                    {
                        //update the list with file absolute path
                        localFilePaths.add(file.getAbsolutePath());
                    } // if
                    else   //else file instance refers to a directory
                    {
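                        //list files in the directory and add to the queue
                        //listFiles() returns null if the directory cannot be read => skip it
                        File[] listedFiles = file.listFiles();
                        if (listedFiles == null)
                            continue;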
                        for (File listedFile : listedFiles)
                        {
                            fileQueue.add(listedFile);
                        } // for
                    } // else
    
                } // while
            } // if
            else        //non-recursive approach
            {
                //if the given localPathString is actually a directory
                if (localPath.isDirectory())
                {
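                    //listFiles() returns null if the directory cannot be read
                    File[] listedFiles = localPath.listFiles();
                    if (listedFiles == null)
                        listedFiles = new File[0];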
    
                    //loop all listed files
                    for (File listedFile : listedFiles)
                    {
                        //if the given listedFile is actually a file, then update the resulting list
                        if (listedFile.isFile())
                            localFilePaths.add(listedFile.getAbsolutePath());
                    } // for
                } // if
                else        //it is a file then
                {
                    //return the one and only file absolute path to the resulting list
                    localFilePaths.add(localPath.getAbsolutePath());
                } // else
            } // else
    
    
            //return the resulting list; list can be empty if given path is an empty directory without files and sub-directories
            return localFilePaths;
        } // listFilesFromLocalPath
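    
    One caveat when feeding prefixed paths into java.io.File: new File("file:///tmp/data") does not interpret the prefix, it treats the whole string as a relative file name. A small conversion step, sketched here with only the JDK, turns either form into a File. LocalPaths and toLocalFile are hypothetical names, and the sketch assumes the URI form is absolute and has no host part (file:///..., not file://host/...).

    ```java
    import java.io.File;
    import java.net.URI;

    // Hypothetical helper, not part of the answer above.
    final class LocalPaths {

        // Accepts either a file: URI string (file:///tmp/data) or a plain path (/tmp/data).
        static File toLocalFile(String pathString) {
            if (pathString.startsWith("file:")) {
                // File(URI) throws IllegalArgumentException if the URI is not an absolute
                // file: URI or if it has an authority (host) component; URI.create throws
                // the same exception for strings with illegal characters such as spaces.
                return new File(URI.create(pathString));
            }
            return new File(pathString);
        }

        public static void main(String[] args) {
            // Both calls refer to the same location on a Unix-like system.
            System.out.println(toLocalFile("file:///tmp/data").getPath());
            System.out.println(toLocalFile("/tmp/data").getPath());
        }
    }
    ```

    The result of toLocalFile(...).getPath() can then be passed to listFilesFromLocalPath as its localPathString argument.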
    
