How to write TIMESTAMP logical type (INT96) to parquet, using ParquetWriter?

Asked by 梦毁少年i, 2020-12-30 15:03

I have a tool that uses a org.apache.parquet.hadoop.ParquetWriter to convert CSV data files to parquet data files.

Currently, it only handles int32,

2 answers
  • 2020-12-30 15:07

    I figured it out, using this code from Spark SQL as a reference.

    The INT96 binary encoding is split into two parts:

    - first 8 bytes: nanoseconds since midnight
    - last 4 bytes: Julian day

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.text.SimpleDateFormat;
    import java.time.LocalDate;
    import java.time.temporal.JulianFields;
    import java.util.Calendar;
    import java.util.TimeZone;
    import java.util.concurrent.TimeUnit;
    import org.apache.parquet.io.api.Binary;

    String value = "2019-02-13 13:35:05";

    final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
    final long NANOS_PER_MINUTE = TimeUnit.MINUTES.toNanos(1);
    final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);

    // Parse the date in UTC (the parser must use the same time zone as the
    // calendar, otherwise the field values below are shifted)
    SimpleDateFormat parser = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    parser.setTimeZone(TimeZone.getTimeZone("UTC"));
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    cal.setTime(parser.parse(value));

    // Calculate the Julian day and the nanoseconds within that day
    LocalDate dt = LocalDate.of(cal.get(Calendar.YEAR), cal.get(Calendar.MONTH) + 1, cal.get(Calendar.DAY_OF_MONTH));
    int julianDays = (int) JulianFields.JULIAN_DAY.getFrom(dt);
    long nanos = (cal.get(Calendar.HOUR_OF_DAY) * NANOS_PER_HOUR)
            + (cal.get(Calendar.MINUTE) * NANOS_PER_MINUTE)
            + (cal.get(Calendar.SECOND) * NANOS_PER_SECOND);

    // Write the INT96 timestamp: 8 bytes of nanos-of-day, then 4 bytes of
    // Julian day, both little-endian
    byte[] timestampBuffer = new byte[12];
    ByteBuffer buf = ByteBuffer.wrap(timestampBuffer);
    buf.order(ByteOrder.LITTLE_ENDIAN).putLong(nanos).putInt(julianDays);

    // This is the properly encoded INT96 timestamp
    Binary tsValue = Binary.fromReusedByteArray(timestampBuffer);
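
    As a sanity check, the same 12-byte layout can be round-tripped with only the JDK and the `java.time` API (the class and method names here are illustrative, not part of the original tool or of Parquet):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.JulianFields;

public class Int96Codec {

    // Encode a LocalDateTime into the 12-byte INT96 layout:
    // little-endian nanos-of-day (8 bytes) followed by Julian day (4 bytes).
    static byte[] encode(LocalDateTime ts) {
        byte[] buf = new byte[12];
        ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(ts.toLocalTime().toNanoOfDay())
                .putInt((int) JulianFields.JULIAN_DAY.getFrom(ts.toLocalDate()));
        return buf;
    }

    // Decode the 12 bytes back into a LocalDateTime by reading the two
    // fields in the same order and byte order.
    static LocalDateTime decode(byte[] buf) {
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = bb.getLong();
        int julianDay = bb.getInt();
        LocalDate date = LocalDate.ofEpochDay(0).with(JulianFields.JULIAN_DAY, julianDay);
        return LocalDateTime.of(date, LocalTime.ofNanoOfDay(nanosOfDay));
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2019, 2, 13, 13, 35, 5);
        System.out.println(decode(encode(ts)));
    }
}
```

    A decoded value that matches the input confirms the byte order and field order are consistent with what the reader side expects.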
    
    
  • 2020-12-30 15:32
    1. INT96 timestamps use the INT96 physical type without any logical type, so don't annotate them with anything.
    2. If you are interested in the structure of an INT96 timestamp, take a look here. If you would like to see sample code that converts to and from this format, take a look at this file from Hive.
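
    Concretely, point 1 means the schema declares the column with the bare `int96` physical type. A minimal schema fragment might look like this (the message and field names are illustrative):

    ```
    message csv_record {
      required int96 ts;
    }
    ```

    Note there is no annotation such as `(TIMESTAMP_MILLIS)` after the field, which would only be valid on `int64`.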