Question
I have a daily ingestion of data into HDFS. From the data in HDFS I generate Hive external tables partitioned by date. My question is as follows: should I run MSCK REPAIR TABLE tablename after each data ingestion, in which case I have to run the command every day? Or is running it just once, at table creation, enough? Thanks a lot for your answers.
Best regards
Answer 1:
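You need to run MSCK REPAIR TABLE whenever the structure or the partitions of the external table change, for example when new partition directories are added in HDFS. The command updates the table's partition metadata in the Hive metastore, so running it once at table creation is not enough for a table that receives new partitions every day.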
A common example:
Suppose you partition the table by a field dt that represents a date.
- Yesterday, you inserted data for dt=2018-06-12, so you should run MSCK REPAIR TABLE to update the metadata and make Hive aware of the new partition dt=2018-06-12.
- Today, you insert data for dt=2018-06-13, so you should run MSCK REPAIR TABLE again to make Hive aware of the new partition dt=2018-06-13.
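The daily flow described above can be sketched in HiveQL as follows. This is only an illustration: the table name events and the HDFS path are hypothetical and not taken from the question. The ALTER TABLE ... ADD PARTITION statement is not part of the answer; it is shown as a possible alternative that registers a single known partition instead of scanning the whole table location.

```sql
-- Hypothetical table name: events (partitioned by dt).
-- Run after each daily ingestion so the metastore picks up
-- any partition directories that exist in HDFS but are not yet registered.
MSCK REPAIR TABLE events;

-- Possible alternative: register only the partition that was just ingested.
-- The LOCATION path below is an assumed example layout.
ALTER TABLE events ADD IF NOT EXISTS
  PARTITION (dt='2018-06-13')
  LOCATION '/data/events/dt=2018-06-13';

-- Check which partitions Hive now knows about.
SHOW PARTITIONS events;
```

Either statement would need to run once per ingestion, since data files written to a new partition directory are not visible to queries until that partition is registered.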
Source: https://stackoverflow.com/questions/50832059/msck-repair-hive-external-tables