The following has already been achieved:
[take 2] OK, so you can't properly "stream" data into Hive. But you can add a periodic compaction post-processing job...
Set the table up with partitions (role='activeA'), (role='activeB'), and (role='archive'). Collect incoming records into the "A" partition; when it is time to compact, switch collection over to "B", then dump every record that you have collected in the "A" partition into "archive", hoping that the Hive default config will do a decent job of limiting fragmentation.
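For reference, the table behind this could be declared with a single "role" partition key, roughly along these lines (just a sketch: the column list and the ORC storage format are assumptions, only the partition key matters here):

CREATE TABLE twitter_data (
  tweet_id    BIGINT,
  created_at  TIMESTAMP,
  payload     STRING      -- assumed columns, replace with your actual schema
)
PARTITIONED BY (role STRING)
STORED AS ORC;

The dump into "archive" is then just an INSERT followed by a TRUNCATE of the partition you have just flushed: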
-- copy everything collected so far in "activeA" into the compacted "archive" partition
INSERT INTO TABLE twitter_data PARTITION (role='archive')
SELECT ...
FROM twitter_data
WHERE role='activeA';

-- then empty "activeA" so it can be reused on the next cycle
TRUNCATE TABLE twitter_data PARTITION (role='activeA');
At some point, switch collection back to "A", compact "B" the same way, and so on.
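The "B" side of the cycle is just the mirror image of the statements above, something like:

INSERT INTO TABLE twitter_data PARTITION (role='archive')
SELECT ...
FROM twitter_data
WHERE role='activeB';

TRUNCATE TABLE twitter_data PARTITION (role='activeB');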
One last word: if Hive still creates too many files on each compaction job, then try tweaking some parameters in your session, just before the INSERT, e.g.
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=1024000000;
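Putting it all together, one compaction run inside a single Hive session could look like this (a sketch: the avgsize threshold is just the value above, and hive.merge.tezfiles is the analogous switch if your queries run on Tez instead of MapReduce):

-- ask Hive to merge small output files at the end of the compaction query
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
-- set hive.merge.tezfiles=true;   -- when running on Tez
set hive.merge.smallfiles.avgsize=1024000000;

-- flush the "A" side into the archive, then empty it
INSERT INTO TABLE twitter_data PARTITION (role='archive')
SELECT ...
FROM twitter_data
WHERE role='activeA';

TRUNCATE TABLE twitter_data PARTITION (role='activeA');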