Inspect Parquet from command line

再見小時候 2020-12-07 20:26

How do I inspect the content of a Parquet file from the command line?

The only option I see now is

$ hadoop fs -get my-path local-file
$ parquet-tool         


        
9 answers
  •  长情又很酷
    2020-12-07 20:51

    On Windows 10 x64 I ended up building parquet-reader just now from source:

    Windows 10 + WSL + GCC

    Installed WSL with Ubuntu LTS 18.04. Upgraded gcc to v9.2.1 and CMake to latest. Bonus: install Windows Terminal.

    git clone https://github.com/apache/arrow
    cd arrow
    cd cpp
    mkdir buildgcc
    cd buildgcc
    cmake .. -DPARQUET_BUILD_EXECUTABLES=ON -DARROW_PARQUET=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON -DPARQUET_BUILD_EXAMPLES=ON -DARROW_CSV=ON
    make -j 20
    cd release
    ./parquet-reader
    Usage: parquet-reader [--only-metadata] [--no-memory-map] [--json] [--dump] [--print-key-value-metadata] [--columns=...] 
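
    Once the binary is built, a small wrapper can make the common cases easier to remember. The following is only a sketch: the function name inspect_parquet, the PARQUET_READER and DRY_RUN variables, and the mode names are made up for illustration, and it assumes the file path is passed as the last argument. Only flags listed in the usage line above are used.

```shell
#!/bin/sh
# Sketch of a wrapper around the parquet-reader binary built above.
# Assumption: PARQUET_READER points at the binary (defaults to ./parquet-reader).
PARQUET_READER="${PARQUET_READER:-./parquet-reader}"

# inspect_parquet FILE [MODE]
#   MODE is a hypothetical shorthand: metadata (default), json, or dump.
inspect_parquet() {
    file="$1"
    mode="${2:-metadata}"
    # Map the shorthand onto a flag taken from the usage line.
    case "$mode" in
        metadata) set -- --only-metadata ;;
        json)     set -- --json ;;
        dump)     set -- --dump ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command instead of running it.
        echo "$PARQUET_READER $* $file"
    else
        "$PARQUET_READER" "$@" "$file"
    fi
}
```

    For example, after sourcing the script, inspect_parquet local-file json would run the reader with --json, and setting DRY_RUN=1 first prints the command without executing it.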
    

    If the build fails, you may have to use vcpkg to supply the missing libraries.

    Also see another solution that offers less, but in a simpler way: https://github.com/chhantyal/parquet-cli

    Linked from: How can I write streaming/row-oriented data using parquet-cpp without buffering?

    I initially tried brew install parquet-tools, but it did not appear to work under my install of WSL.

    Windows 10 + MSVC

    Same as above. Use CMake to generate the Visual Studio 2019 project, then build.

    git clone https://github.com/apache/arrow
    cd arrow
    cd cpp
    mkdir buildmsvc
    cd buildmsvc
    cmake .. -DPARQUET_BUILD_EXECUTABLES=ON -DARROW_PARQUET=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON -DPARQUET_BUILD_EXAMPLES=ON -DARROW_CSV=ON
    # Then open the generated .sln file in MSVC and build. Everything should build perfectly.
    

    Troubleshooting:

    In case any libraries were missing, I pointed the build at my install of vcpkg. I ran vcpkg integrate install, then appended the following to the end of the CMake line:

    -DCMAKE_TOOLCHAIN_FILE=[...path...]/vcpkg/scripts/buildsystems
    

    If it had complained about any missing libraries (e.g. boost), I would have installed them using commands like vcpkg install boost:x64.
