Question
I have one very large (10 MB) CSV file. I parsed it and put it into memory using a generic list.
I created a class to represent each line. The class has only a few fields (of type IPAddress and String).
I thought that since the file is only 10 megabytes, I could expect a similar size in memory.
I was quite surprised to find that the method that creates the list allocates 300 MB and doesn't free it up.
Is this normal, and what could be causing it?
Note that the CSV file has many lines (100,000+), which could be a factor.
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Net
Imports System.Web

Namespace Geo
    Public Class CountryMarker
        Public StartAddress As IPAddress
        Public EndAddress As IPAddress
        Public Country As String
        Public CountryCode As String
    End Class

    Public Class Markers
        Private Const DatabasePath As String = "~/App_Data/ip.csv" ' 10 MB file
        Public Shared List As List(Of CountryMarker) = LoadData()

        ' Reads the whole CSV into memory, one CountryMarker per line.
        Shared Function LoadData() As List(Of CountryMarker)
            Dim Markers As New List(Of CountryMarker)
            Using Stream = New FileStream(Hosting.HostingEnvironment.MapPath(DatabasePath), FileMode.Open)
                Dim Reader = New StreamReader(Stream)
                Do While Reader.Peek > -1
                    Dim Line = Reader.ReadLine()
                    Dim Values = Line.Split(","c).Select(Function(i) i.Replace("""", "")).ToArray()
                    Markers.Add(New CountryMarker With {.Country = Values(5), .CountryCode = Values(4), .StartAddress = IPAddress.Parse(Values(0)), .EndAddress = IPAddress.Parse(Values(1))})
                Loop
            End Using
            Return Markers
        End Function
    End Class
End Namespace
Answer 1:
First, if the file is ASCII text or UTF-8 with predominantly Western European characters (like English), then the in-memory size of the text will be at least double the file's size on disk. .NET stores strings as 16-bit Unicode values. So "A", for example, which takes one byte in a text file, requires two bytes in memory.
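If you want to see the doubling for yourself, here's a tiny sketch you can paste into a console Sub Main (the string "Hello" is just an arbitrary sample value):

    Dim s As String = "Hello"
    ' 5 bytes when stored as ASCII/UTF-8 text on disk...
    Console.WriteLine(System.Text.Encoding.UTF8.GetByteCount(s))    ' 5
    ' ...but 10 bytes of character data once it's a .NET String (UTF-16).
    Console.WriteLine(System.Text.Encoding.Unicode.GetByteCount(s)) ' 10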
Each class instance that you create is going to require at least 24 bytes (16 bytes of allocation, plus 8 bytes for the reference). If your file is 100,000 lines, that's 2.4 megabytes, minimum. In addition, every string that you allocate will require 24 bytes, plus whatever is required for the string. Things add up quickly.
(Note that my 24 bytes number is for a 64-bit system. It's 16 bytes per allocation in the 32-bit runtime.)
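As a very rough back-of-envelope sketch of how that adds up for a file like yours (the per-object figures below are just the 64-bit minimums from above, the IPAddress cost is treated the same way, and the ~10-character string length is a guess; paste into a console Sub Main):

    Dim rows As Long = 100000
    Dim objectCost As Long = 24              ' CountryMarker allocation + reference (minimum, approx.)
    Dim ipAddressCost As Long = 24           ' each IPAddress is its own heap object (assumed minimum)
    Dim stringCost As Long = 24 + 2 * 10     ' per string: overhead + roughly 10 two-byte chars (guess)
    Dim perRow As Long = objectCost + 2 * ipAddressCost + 2 * stringCost
    Console.WriteLine("Roughly {0:N1} MB before the List's own array is counted", rows * perRow / (1024.0 * 1024.0))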
As others have commented, it's impossible to give you any more detail unless you post some code, including your class definition.
As to not freeing up any memory: that's kind of difficult to prove. Maybe the garbage collector just hasn't gotten around to doing a collection yet. If it sees no memory pressure (i.e. there's plenty of memory available and no other process is begging for memory), the GC might decide it doesn't need to collect yet.
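If you want to check how much of that memory is actually still live, rather than just not collected yet, one sketch (assuming the Markers class from your question) is to force a full collection before and after the load and compare GC.GetTotalMemory:

    Dim before As Long = GC.GetTotalMemory(True)       ' force a full collection first
    Dim count = Geo.Markers.List.Count                 ' touching List triggers the shared LoadData() call
    Dim after As Long = GC.GetTotalMemory(True)        ' force another collection after loading
    Console.WriteLine("{0:N0} markers hold roughly {1:N0} bytes", count, after - before)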
Answer 2:
In addition to Jim's comment, if you read a lot of items into a List, it will internally reallocate memory at exponentially increasing chunk sizes. I don't know the exact heuristic, but consider that there is no realloc in .NET - if you use Reflector, you'll see that even Array.Resize will allocate a brand new array.
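You can see that for yourself with a trivial check (a sketch; the array sizes are arbitrary):

    Dim a(3) As Integer
    Dim original = a
    Array.Resize(a, 8)
    ' Prints False: Resize copied the data into a brand new array and repointed "a" at it.
    Console.WriteLine(Object.ReferenceEquals(original, a))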
Suppose you allocated 2,049 objects, and assume that List will double the buffer size when it needs more space. You will get 1, 2, 4, ..., 1024, 2048, and finally 4096, which is almost double what you required (this is the worst case).
One solution is to call List.TrimExcess(). This will get the array back down to a reasonable size. A better solution is to estimate how many items you need to store and pass that as the initial capacity to the List constructor.
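Applied to the LoadData in the question, that might look roughly like the sketch below. LoadDataPresized is a hypothetical variant of your function, and 110,000 is a made-up capacity estimate near your stated line count:

    Shared Function LoadDataPresized() As List(Of CountryMarker)
        ' Passing a capacity up front means the List never has to double its internal array.
        ' 110,000 is only an estimate; pick something near the real line count of the file.
        Dim Markers As New List(Of CountryMarker)(110000)
        Using Reader As New StreamReader(Hosting.HostingEnvironment.MapPath(DatabasePath))
            Do While Reader.Peek > -1
                Dim Values = Reader.ReadLine().Split(","c).Select(Function(i) i.Replace("""", "")).ToArray()
                Markers.Add(New CountryMarker With {.Country = Values(5), .CountryCode = Values(4), .StartAddress = IPAddress.Parse(Values(0)), .EndAddress = IPAddress.Parse(Values(1))})
            Loop
        End Using
        Markers.TrimExcess() ' drop any remaining slack once the final count is known
        Return Markers
    End Function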
Without seeing the code for your parser and class, however, it's hard to say how much this is contributing to your memory usage issue.
Source: https://stackoverflow.com/questions/5865024/putting-csv-file-into-memory