What architecture to use to address this SystemOutOfMemoryException while allowing me to instantiate the cells of a sheet?

Summary

This question is the follow-up of a a desire to architect a simple spreadsheet API while keeping it user-friendly to those who know Excel well.

To sum it up, this question is related to these below two:
1. How to implement column self-naming from its index?;
2. How to make this custom worksheet initialization faster?.

Objective

To provide a simplified Excel API used as a wrapper over the nevralgic components such as the Application, the Workbook, the Worksheet and the Range classes/interfaces while exposing only the most commonly used object properties for each of these.

Usage example

This usage example is inspired from the unit tests that allowed me to bring this solution up to where it stands now.

Dim file as String = "C:\Temp\WriteTest.xls"

Using mgr As ISpreadsheetManager = New SpreadsheetManager()
    Dim wb as IWorkbook = mgr.CreateWorkbook()
    wb.Sheets("Sheet1").Cells("A1").Value = 3.1415926
    wb.SaveAs(file)
End Using

And now we open it:

Dim file as String = "C:\Temp\WriteTest.xls"

Using mgr As ISpreadsheetManager = New SpreadsheetManager()
    Dim wb as IWorkbook = mgr.OpenWorkbook(file)
    // Working with workbook here...
End Using

Discussion

While instantiating an Excel Workbook:

An instance of a Worksheet is automatically initialized in the Workbook.Sheets collection;
Upon initialization, a Worksheet initializes its Cells through the Range object that can represent one or multiple cells.

These Cells are immediately accessible with all their properties as soon as the Worksheet exists.

My wish is to reproduce this behaviour so that

The Workbook class constructor initializes the Workbook.Sheets collection property with the native sheets;
The Worksheet class constructor initializes the Worksheet.Cells collection property with the native cells.

My problem comes from the Worksheet class constructor while initializing the Worksheet.Cells collection property illustrated at #2.

Thoughts

Following these above-linked questions encountered issues, I wish to figure out another architecture that would allow me to:

Access specific feature of a cell Range when required;
Deliver most commonly used properties through my ICell interface;
Having access to all of the Range cells of a worksheet from its initialization.

While keeping in mind that accessing a Range.Value property is the fastest interaction possible with the underlying Excel application instance using the Interop.

So, I thought of initializing my ReadonlyOnlyDictionary(Of String, ICell) with the name of the cells without immediately wrapping an instance of the Range interface so that I would simply generate the row and column indexes along with the cell's name to index my dictionary, then, assigning the Cell.NativeCell property only when one wants to access or format a specific cell or cell range.

That way, the data in the dictionary would be indexed with the name of the cells obtained from the column indexes generated in the Worksheet class constructor. Then, when one would do this:

Using mgr As ISpreadsheetManager = New SpreadsheetManager()
    Dim wb As IWorkbook = mgr.CreateWorkbook()
    wb.Sheet(1).Cells("A1").Value = 3.1415926 // #1:
End Using

#1: This would allow me to use the indexes from my Cell class to write the given value to the specific cell, which is faster then using its name directly against the Range.

Questions and Concerns

Besides, when working with UsedRange.get_Value() or Cells.get_Value(), this returns Object(,) arrays.

1. So should I just be happy with working with Object(,) arrays for cells, without having the possibility to format it somehow?

2. How to architect these Worksheet and Cell classes so that I have the best performance offered while working with Object(,) arrays, while keeping the possibility that a Cell instance may represent or wrap a single cell Range?

Thanks to any of you who takes the time to read my post and my sincerest thanks to those who answer.

The used architecture has gone through an object class that I named CellCollection. Here's what it does:

Based on these hypothesis:

Given that an Excel worksheet has 256 columns and 65536 lines;

Given that 16,777,216 (256 * 65536) cells needed to be instantiated at a time;

Given that the most common use of a worksheet takes less then 1,000 lines and less than 100 columns;

Given that I needed it to be able to refer to the cells with their addresses ("A1"); and

Given that it is benchmarked that accessing all the values at once and load them into a object[,] in memory as being the fastest way to work with an underlying Excel worksheet,*

I have considered not to instantiate any of the cells, letting my CellCollection property within my IWorksheet interface initialized and empty upon instantiation, except for an existing workbook. So, when opening a workbook, I verify that NativeSheet.UsedRange is empty or return null (Nothing in Visual Basic), otherwise, I have already gotten the used "native cells" in memory so that only remains to add them in my internal CellCollection dictionary while indexing them with their respective address.

Finally, Lazy Initialization Design Pattern to the rescue! =)

public class Sheet : ISheet {
    public Worksheet(Microsoft.Office.Interop.Excel.Worksheet nativeSheet) {
        NativeSheet = nativeSheet;
        Cells = new CellCollection(this);
    }

    public Microsoft.Office.Interop.Excel.Worksheet NativeSheet { get; private set; }

    public CellCollection Cells { get; private set; }
}

public sealed class CellCollection {
    private IDictionary<string, ICell> _cells;
    private ReadOnlyDictionary<string, ICell> _readonlyCells;

    public CellCollection(ISheet sheet) {
        _cells = new Dictionary<string, ICell>();
        _readonlyCells = new ReadonlyDictionary<string, ICell>(_cells);
        Sheet = sheet;
    }

    public readonly ReadOnlyDictionary<string, ICell> Cells(string addresses) {
        get {
            if (string.IsNullOrEmpty(addresses) || 0 = address.Trim().Length)
                throw new ArgumentNullException("addresses");

            if (!Regex.IsMatch(addresses, "(([A-Za-z]{1,2,3}[0-9]*)[:,]*)"))
                throw new FormatException("addresses");

            foreach(string address in addresses.Split(",") {
                Microsoft.Office.Interop.Excel.Range range = Sheet.NativeSheet.Range(address)

                foreach(Microsoft.Office.Interop.Excel.Range cell in range) {
                    ICell c = null;
                    if (!_cells.TryGetValue(cell.Address(false, false), c)) { 
                        c = new Cell(cell);
                        _cells.Add(c.Name, c);
                    }
                }
            }

            return _readonlyCells;
        }
    }

    public readonly ISheet Sheet { get; private set; }
}

Obviously, this is a first try shot, and it works just fine so far, with more than acceptable performance. Humbly though, I feel like it could use some optimizations, though I will use it this way for now, and optimize it later if needed.

After having written this collection, I was able to come to the expected behaviour. Now, I shall try to implement some of the .NET interfaces to make it useable against some IEnumerable, IEnumerable<T>, ICollection, ICollection<T>, etc. so that it may respectively be considered as a true .NET collection.

Feel free to comment and bring constructive alternatives and/or changes to this code so that it may become even greater than it currently is.

I DO hope this will serve one's purpose someday.

Thanks for reading! =)

来源：https://stackoverflow.com/questions/5183833/what-architecture-to-use-to-address-this-systemoutofmemoryexception-while-allowi

标签

.net

vb.net

performance

excel

interop