My goal is to split a WARC file from CommonCrawl into its individual records and sort them. Example file:
WARC/1.0
WARC-Type: warcinfo
WARC-Date: 2020-08-04T01:43:40
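For reference, here is a minimal sketch of what splitting could look like, assuming the file is already decompressed and every record carries a `Content-Length` header. The function name `iter_warc_records`, the sample bytes, and the choice of `WARC-Date` as the sort key are illustrative assumptions, not something the question specifies.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # a blank line (or EOF) ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length gives the body size in bytes
        body = stream.read(int(headers.get("Content-Length", 0)))
        yield headers, body


# Made-up two-record sample, only to exercise the parser.
sample = (
    b"WARC/1.0\r\nWARC-Type: warcinfo\r\nWARC-Date: 2020-08-04T01:43:40\r\n"
    b"Content-Length: 5\r\n\r\nhello\r\n\r\n"
    b"WARC/1.0\r\nWARC-Type: response\r\nWARC-Date: 2020-08-03T00:00:00\r\n"
    b"Content-Length: 3\r\n\r\nabc\r\n\r\n"
)

# Assumed sort key: the WARC-Date header (ISO 8601 strings sort chronologically).
records = sorted(
    iter_warc_records(io.BytesIO(sample)),
    key=lambda rec: rec[0].get("WARC-Date", ""),
)
```

For a gzip-compressed `.warc.gz` file, passing the object returned by `gzip.open(path, "rb")` as `stream` should work the same way, since the parser only needs `readline()` and `read()`.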