XML Split of a Large file

心在旅途 2021-01-04 00:41

I have a 15 GB XML file that I want to split. It has approximately 300 million lines. It doesn't have any top-level nodes that are interdependent. Is there any

10 Answers
  •  遥遥无期
    2021-01-04 01:02

    I used this to split the Yahoo Q&A dataset:
    
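        # NOTE: the tag strings in the original post were lost (they show up as "").
        # The three values below are hypothetical placeholders; use your file's tags.
        RECORD_END = "</record>"  # hypothetical closing tag of one record
        ROOT_OPEN = "<root>"      # hypothetical root opening tag
        ROOT_CLOSE = "</root>"    # hypothetical root closing tag

        count = 0
        file_count = 1
        current_file = ""

        with open('filepath') as f:
            for line in f:
                current_file = current_file + line

                # Count a record each time its closing tag appears.
                if RECORD_END in line:
                    count = count + 1

                if count == 50000:
                    # Close the root element and write this chunk to disk.
                    current_file = current_file + ROOT_CLOSE
                    with open('filepath/Split/file_' + str(file_count) + '.xml', 'w') as split:
                        split.write(current_file)
                    file_count = file_count + 1
                    # Start the next chunk with a fresh root element.
                    current_file = ROOT_OPEN + "\n"
                    count = 0

        # Write whatever is left after the last full chunk. The input's own root
        # closing tag is normally already in it, so only append one if it is missing.
        if ROOT_CLOSE not in current_file:
            current_file = current_file + ROOT_CLOSE
        with open('filepath/Split/file_' + str(file_count) + '.xml', 'w') as split:
            split.write(current_file)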
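
For a 15 GB input it may help to avoid building each chunk as one large string. The following is a minimal sketch of the same line-counting idea that copies lines straight to the current output file, so memory use stays roughly constant. The function name `split_xml_by_record` and all of its parameters are hypothetical, and it assumes the root tags and each record's closing tag sit on their own lines; the tag values in the usage note are placeholders, since the original tags are not shown in the post.

```python
import os


def split_xml_by_record(src_path, out_dir, record_end, root_open, root_close,
                        records_per_file=50000):
    """Split a line-oriented XML file into numbered parts.

    Each part gets its own root element and holds at most
    records_per_file records. Returns the list of paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None   # currently open output file, if any
    count = 0    # records written to the current part

    with open(src_path, encoding="utf-8") as src:
        for line in src:
            stripped = line.strip()
            # Drop the source's XML declaration and root tags;
            # every part is wrapped in fresh root tags instead.
            if stripped.startswith("<?xml") or stripped in (root_open, root_close):
                continue
            if out is None:
                if not stripped:
                    continue  # do not open a new part for a blank line
                path = os.path.join(out_dir, f"file_{len(written) + 1}.xml")
                out = open(path, "w", encoding="utf-8")
                out.write(root_open + "\n")
                written.append(path)
            out.write(line)
            if record_end in line:
                count += 1
                if count == records_per_file:
                    out.write(root_close + "\n")
                    out.close()
                    out = None
                    count = 0

    # Finish the last, partially filled part.
    if out is not None:
        out.write(root_close + "\n")
        out.close()
    return written
```

A call could look like `split_xml_by_record('filepath', 'filepath/Split', '</record>', '<root>', '</root>')`, with the three tag strings replaced by the ones your file actually uses. If records are not laid out one tag per line, a real XML parser would be the safer route.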
    
