I have this scenario in which memory conservation is paramount. I am trying to read in > 1 GB of Peptide sequences into memory and group peptide instances together that shar
Use a Dictionary<string, Peptide>.
Basically you could reimplement HashSet<T> yourself, but that's about the only solution I'm aware of. The Dictionary<Peptide, Peptide> or Dictionary<string, Peptide> solution is probably not that inefficient though - if you're only wasting a single reference per entry, I would imagine that would be relatively insignificant.
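As a rough sketch of the Dictionary<string, Peptide> approach: the Peptide class below is a hypothetical stand-in with only a Sequence property, and PeptidePool and GetOrAdd are names made up for illustration, not anything from the question. The point is just that a lookup by sequence hands back the instance that is already stored, which is the operation the set does not offer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal Peptide; the real class in the question has more members.
public sealed class Peptide
{
    public string Sequence { get; }
    public Peptide(string sequence) { Sequence = sequence; }
}

// Hypothetical helper that keeps one canonical Peptide per distinct sequence.
public sealed class PeptidePool
{
    private readonly Dictionary<string, Peptide> _bySequence =
        new Dictionary<string, Peptide>(StringComparer.Ordinal);

    public int Count => _bySequence.Count;

    // Returns the instance already stored for this sequence,
    // or creates, stores and returns a new one.
    public Peptide GetOrAdd(string sequence)
    {
        if (!_bySequence.TryGetValue(sequence, out Peptide existing))
        {
            existing = new Peptide(sequence);
            _bySequence.Add(sequence, existing);
        }
        return existing;
    }
}
```

The dictionary key is the same string object that the stored Peptide references, so the extra cost per entry is the key reference rather than a second copy of the sequence.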
In fact, if you remove the hCode member from Peptide, that will save you 4 bytes per object, which is the same size as a reference on x86 anyway... there's no point in caching the hash as far as I can tell, as you'll only compute the hash of each object once, at least in the code you've shown.
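Something like the following would drop the cached hash. This is only a sketch: the shape of Peptide here is assumed (just a Sequence property), and ordinal string comparison is my guess at what equality should mean for your sequences.

```csharp
using System;

// Hypothetical shape of Peptide; only the Sequence member is assumed here.
public sealed class Peptide : IEquatable<Peptide>
{
    public string Sequence { get; }

    public Peptide(string sequence)
    {
        Sequence = sequence ?? throw new ArgumentNullException(nameof(sequence));
    }

    // Two peptides are equal when their sequences match character for character.
    public bool Equals(Peptide other) =>
        other != null && string.Equals(Sequence, other.Sequence, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as Peptide);

    // No cached hCode field: the hash is recomputed from the sequence on demand.
    public override int GetHashCode() => StringComparer.Ordinal.GetHashCode(Sequence);
}
```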
If you're really desperate for memory, I suspect you could store the sequence considerably more efficiently than as a string. If you give us more information about what the sequence contains, we may be able to make some suggestions there.
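For instance, purely as an assumption on my part that each sequence is made of one-letter ASCII codes, you could hold one byte per residue instead of the two bytes per char that a .NET string uses. CompactSequence, Encode and Decode are hypothetical names for this sketch.

```csharp
using System;
using System.Text;

// Hypothetical helper; assumes every sequence character is 7-bit ASCII.
public static class CompactSequence
{
    // Stores one byte per residue instead of one two-byte char.
    public static byte[] Encode(string sequence)
    {
        foreach (char c in sequence)
        {
            if (c > 127)
                throw new ArgumentException("Non-ASCII character in sequence.", nameof(sequence));
        }
        return Encoding.ASCII.GetBytes(sequence);
    }

    // Rebuilds the string form when it is needed for display.
    public static string Decode(byte[] encoded) => Encoding.ASCII.GetString(encoded);
}
```

Bear in mind that a byte[] doesn't compare by content by default, so using it as a dictionary key would need a custom IEqualityComparer<byte[]>.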
I don't know that there's any particularly strong reason why HashSet doesn't permit this, other than that it's a relatively rare requirement - but it's something I've seen requested in Java as well...