Write large text file data into excel

生来就可爱ヽ(ⅴ<●) 提交于 2019-12-05 09:11:18

A recent version of apache-poi has sxssf. Shameless copy from website

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

I had used it for creating spreadsheet with 1.5 million rows.

I will answer with respect to Aspose.Cells for Java, since you tried it too.

Creating or loading a very large Excel file almost always requires large memory. Even if you read single line or multiple lines at a time, still you will be writing the contents to instance of Workbook, which is loaded into memory.

Solution 1 (Not good and very limited): Increase the heap size, if the max heap size allowed works for your largest file, opt for it.

Solution 2 (Complex with some manual work): Excel 2007 and later allows around 1 million rows per sheet. I would suggest you to create one workbook with only one sheet for 1 million rows. That is, if you have 10 million lines in text file, create 10 separate Excel workbooks.

Later on, combine them in a single Excel workbook manually. Aspose.Cells will give out of memory exception when copying sheets with such a huge data.

Below is the code snippet that creates 10 separate Excel files, each having 1 million rows.

import com.aspose.cells.*;
import java.util.*;

public class ExcelLargeTextImport
{
    private static String excelFile = Common.dataDir + "largedata.xlsx";

    public static void main(String args[])
    {
        try
        {
            Common.setLicenses();
            importToExcel();
        }
        catch(Exception ex)
        {
            System.out.println(ex.getMessage());
        }
    }

    private static void importToExcel() throws Exception
    {
        // Process each workbook in a method
        for (int sheetCounter=0 ; sheetCounter<10 ; sheetCounter++)
        {
            saveWorkbook(sheetCounter);
        }
    }

    private static void saveWorkbook(int sheetCounter) throws Exception
    {
        Workbook workbook = new Workbook();
        // Get the first sheet 
        Worksheet worksheet = workbook.getWorksheets().get(0);
        Cells cells = worksheet.getCells();

        // Initialize array list with 1 million records
        ArrayList<String> lines = new ArrayList<String>();
        int rowCount = 1000000;
        for (int i=0 ; i<rowCount ; i++)
        {
            lines.add(i + ";value1;value2;value3");
        }

        long lineNo = 1;
        for (String line : lines)
        {
            // Split the line by delimeter
            String[] values = line.split(";");

            // First cell
            Cell cell = cells.get("A" + lineNo);
            cell.setValue(values[0]);

            // Second cell
            cell = cells.get("B" + lineNo);
            cell.setValue(values[1]);

            // Third cell
            cell = cells.get("C" + lineNo);
            cell.setValue(values[2]);

            // Fourth cell
            cell = cells.get("D" + lineNo);
            cell.setValue(values[2]);

            lineNo++;
        }
        System.out.print(sheetCounter + " ");

        // Saving the Excel file
        workbook.save(excelFile.replace(".xlsx", sheetCounter + ".xlsx"));

        System.out.println("\nExcel file created");
    }
}

PS. I am a Developer Evangelist at Aspose.

Juned Ahsan

Why don't you do the read and write in chunks. Here is the approach I can think about:

  • Read your txt file for few lines and put the information in map as you are doing. Suppose you read 100 lines and you have 100 entries in your map.
  • Write these hundred entries to excel file you, first time genarate the excel
  • Empty your map or re-initialize.
  • Now read next 100 lines form text. So as I understand there's no way to directly access the 101th line without reading the first 100 lines. So you may have to read file from the start but you can avoid the first 100 lines and creating the entry in map.
  • Now update the excel file. I think you can update the excel using POI as mentioned on this link: Edit existing excel files using jxl api / Apache POI

If you keep iterating this process. You will surely save your memory consumption although I don't see a major difference in cpu consumption.

Hope it helps!

Bhushankumar Lilapara

Here is your answer...

Try this simple code & If you need further future you can ad...

https://stackoverflow.com/a/16479713/1490962

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!