Is there a workaround for Java's poor performance on walking huge directories?

予麋鹿 2020-12-02 23:30

I am trying to process files one at a time that are stored over a network. Reading the files is fast thanks to buffering, so that is not the issue. The problem I have is just listing

10 answers
  •  挽巷 (original poster)
    2020-12-03 00:24

    If you're on Java 1.5 or 1.6, shelling out "dir" commands and parsing the standard output stream on Windows is a perfectly acceptable approach. I've used this approach in the past for processing network drives and it has generally been a lot faster than waiting for the native java.io.File listFiles() method to return.
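
The shell-out approach described above might look roughly like the sketch below. It is an illustration, not the code the answer's author used: the class name `DirListing` and its method names are hypothetical, it assumes a Windows host where `cmd` is on the PATH, and it uses the `/b` (bare names only) and `/a` (include hidden and system entries) switches of `dir`. It reads the output with the platform default charset, which may not match the console code page for non-ASCII file names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists a directory by running "dir" instead of File.listFiles().
final class DirListing {

    // Runs "cmd /c dir /b /a <directory>" and returns the entry names it printed.
    static List<String> list(String directory) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "dir", "/b", "/a", directory);
        pb.redirectErrorStream(true); // merge stderr so the child never blocks on an unread pipe
        Process process = pb.start();

        // Drain the output completely before waiting for the process to exit.
        StringBuilder output = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        try {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                output.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }

        // Assumption: any non-zero exit code is treated as a failure; you may want to relax this.
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("dir exited with code " + exitCode + ": " + output);
        }
        return parseBareListing(output.toString());
    }

    // Splits bare "dir /b" output into one name per non-empty line.
    static List<String> parseBareListing(String output) {
        List<String> names = new ArrayList<String>();
        for (String line : output.split("\r?\n")) {
            if (line.length() > 0) {
                names.add(line);
            }
        }
        return names;
    }
}
```

Calling `DirListing.list("\\\\server\\share\\folder")` (a made-up path) would return the names relative to that folder. Keeping the parsing in its own method makes it easy to check without a Windows machine.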

    Of course, a JNI call should be faster and potentially safer than shelling out "dir" commands. The following JNI code retrieves a list of files and directories using the Windows API. The function can easily be refactored into a new class so that the caller retrieves file paths incrementally (i.e. one path at a time). For example, you could call FindFirstFileW in a constructor and expose a separate method that calls FindNextFileW.

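    #include <jni.h>
    #include <windows.h>
    #include <cstring>
    #include <string>
    #include <sstream>
    using namespace std;
    
    JNIEXPORT jstring JNICALL Java_javaxt_io_File_GetFiles(JNIEnv *env, jclass, jstring directory)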
    {
        HANDLE hFind = INVALID_HANDLE_VALUE; //Initialized so the catch block never passes garbage to FindClose
        try {
    
          //Convert jstring to wstring
            const jchar *_directory = env->GetStringChars(directory, 0);
            jsize x = env->GetStringLength(directory);
            wstring path;  //L"C:\\temp\\*";
            path.assign(_directory, _directory + x);
            env->ReleaseStringChars(directory, _directory);
    
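            if (x<2){
                jclass exceptionClass = env->FindClass("java/lang/Exception");
                env->ThrowNew(exceptionClass, "Invalid path, less than 2 characters long.");
                return NULL; //ThrowNew only marks the exception as pending; native code keeps running unless we return
            }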
    
            wstringstream ss;
            BOOL bContinue = TRUE;
            WIN32_FIND_DATAW data;
            hFind = FindFirstFileW(path.c_str(), &data);
            if (INVALID_HANDLE_VALUE == hFind){
                jclass exceptionClass = env->FindClass("java/lang/Exception");
                env->ThrowNew(exceptionClass, "FindFirstFileW returned invalid handle.");
                return NULL; //Do not enter the loop with an invalid handle
            }
    
    
            //HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
            //DWORD dwBytesWritten;
    
    
            // If we have no error, loop thru the files in this dir
            while (hFind && bContinue){
    
              /*
              //Debug print statement. DO NOT DELETE! cout and wcout do not print Unicode correctly.
                WriteConsole(hStdOut, data.cFileName, (DWORD)_tcslen(data.cFileName), &dwBytesWritten, NULL);
                WriteConsole(hStdOut, L"\n", 1, &dwBytesWritten, NULL);
                */
    
              //Check if this entry is a directory
                if (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY){
                    // Make sure this dir is not . or ..
                    if (wstring(data.cFileName) != L"." &&
                        wstring(data.cFileName) != L"..")
                    {   
                        ss << wstring(data.cFileName) << L"\\" << L"\n";
                    }
                }
                else{
                    ss << wstring(data.cFileName) << L"\n";
                }
                bContinue = FindNextFileW(hFind, &data);
            }   
            FindClose(hFind); // Free the dir structure
    
    
    
            wstring cstr = ss.str();
            int len = cstr.size();
            //WriteConsole(hStdOut, cstr.c_str(), len, &dwBytesWritten, NULL);
            //WriteConsole(hStdOut, L"\n", 1, &dwBytesWritten, NULL);
            jchar* raw = new jchar[len];
            memcpy(raw, cstr.c_str(), len*sizeof(wchar_t));
            jstring result = env->NewString(raw, len);
            delete[] raw;
            return result;
        }
        catch(...){
            FindClose(hFind);
            jclass exceptionClass = env->FindClass("java/lang/Exception");
            env->ThrowNew(exceptionClass, "Exception occurred.");
        }
    
        return NULL;
    }
    

    Credit: https://sites.google.com/site/jozsefbekes/Home/windows-programming/miscellaneous-functions
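
On the Java side, the string returned by GetFiles is newline-delimited, and the native code appends a backslash to directory names. A minimal sketch of splitting that string into files and directories follows. The class name `NativeListing` and its fields are hypothetical and not part of the javaxt API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the parsed result of the native GetFiles call.
final class NativeListing {
    final List<String> files = new ArrayList<String>();
    final List<String> directories = new ArrayList<String>();

    // Parses the newline-delimited string; entries ending in '\' are directories.
    static NativeListing parse(String raw) {
        NativeListing listing = new NativeListing();
        if (raw == null) {
            return listing; // the native function returns NULL when it throws
        }
        for (String entry : raw.split("\n")) {
            if (entry.length() == 0) {
                continue; // skip the empty piece produced by a trailing newline
            }
            if (entry.endsWith("\\")) {
                // Strip the trailing backslash marker added by the native code.
                listing.directories.add(entry.substring(0, entry.length() - 1));
            } else {
                listing.files.add(entry);
            }
        }
        return listing;
    }
}
```

Because only names are returned, no java.io.File objects need to be created until a specific entry is actually processed.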

    Even with this approach, there are still efficiencies to be gained. If you convert each path to a java.io.File, there is a huge performance hit, especially if the path represents a file on a network drive. I have no idea what Sun/Oracle is doing under the hood, but if you need file attributes beyond the path (e.g. size, modification date, etc.), I have found that the following JNI function is much faster than instantiating a java.io.File object for a network path.

    JNIEXPORT jlongArray JNICALL Java_javaxt_io_File_GetFileAttributesEx(JNIEnv *env, jclass, jstring filename)
    {   
    
      //Convert jstring to wstring
        const jchar *_filename = env->GetStringChars(filename, 0);
        jsize len = env->GetStringLength(filename);
        wstring path;
        path.assign(_filename, _filename + len);
        env->ReleaseStringChars(filename, _filename);
    
    
      //Get attributes
        WIN32_FILE_ATTRIBUTE_DATA fileAttrs;
        BOOL result = GetFileAttributesExW(path.c_str(), GetFileExInfoStandard, &fileAttrs);
        if (!result) {
            jclass exceptionClass = env->FindClass("java/lang/Exception");
            env->ThrowNew(exceptionClass, "Exception Occurred");
            return NULL; //ThrowNew does not unwind; return so fileAttrs is never read uninitialized
        }
    
      //Create an array to store the WIN32_FILE_ATTRIBUTE_DATA
        jlong buffer[6];
        buffer[0] = fileAttrs.dwFileAttributes;
        buffer[1] = date2int(fileAttrs.ftCreationTime);
        buffer[2] = date2int(fileAttrs.ftLastAccessTime);
        buffer[3] = date2int(fileAttrs.ftLastWriteTime);
        buffer[4] = fileAttrs.nFileSizeHigh;
        buffer[5] = fileAttrs.nFileSizeLow;
    
        jlongArray jLongArray = env->NewLongArray(6);
        env->SetLongArrayRegion(jLongArray, 0, 6, buffer);
        return jLongArray;
    }
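
The six-element array returned above stores the file size as two 32-bit halves (index 4 is nFileSizeHigh, index 5 is nFileSizeLow) and the attribute flags at index 0. A sketch of decoding those on the Java side follows. The class name `Win32Attributes` is hypothetical. The timestamp slots are left alone because the `date2int` helper is not shown in the answer, so their units are unknown here.

```java
// Hypothetical decoder for the long[6] returned by the native GetFileAttributesEx call.
final class Win32Attributes {

    // Value of FILE_ATTRIBUTE_DIRECTORY in the Windows API.
    static final long FILE_ATTRIBUTE_DIRECTORY = 0x10L;

    // Combines nFileSizeHigh (index 4) and nFileSizeLow (index 5) into one 64-bit size.
    static long size(long[] attrs) {
        // Mask the low half defensively in case it was sign-extended on the way in.
        return (attrs[4] << 32) | (attrs[5] & 0xFFFFFFFFL);
    }

    // Tests the directory bit in dwFileAttributes (index 0).
    static boolean isDirectory(long[] attrs) {
        return (attrs[0] & FILE_ATTRIBUTE_DIRECTORY) != 0;
    }
}
```

For example, an array whose high half is 1 and low half is 2 decodes to a size of 4294967298 bytes.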
    

    You can find a full working example of this JNI-based approach in the javaxt-core library. In my tests using Java 1.6.0_38 with a Windows host hitting a Windows share, I have found this JNI approach approximately 10x faster than calling java.io.File listFiles() or shelling out "dir" commands.
