I'm working on a web crawler (please don't suggest an existing one, it's not an option). I have it working the way it is expected to. My only issue is that currently I'm u
I suggest using EhCache for this, even though what you're building isn't really a cache. EhCache allows you to configure the cache instance so that it overflows to disk storage while keeping the most recent items in memory. It can also be configured to be disk-persistent, i.e. data is flushed to disk on shutdown and read back into memory at startup. On top of all that, it's key-value based, so it already fits your model. It supports concurrent access, and since the disk storage is managed by a separate thread, you shouldn't need to worry about disk access concurrency.
Alternatively, you could consider a proper embedded database such as Hypersonic (or numerous others of a similar style), but that's probably going to be more work.
The JDBM2 library provides persistent maps for Java. It's fast and thread-safe.
UPDATE: JDBM2 has since evolved into the MapDB project.
Chronicle Map is an embeddable, hash-based Java data store that persists its data to disk (in a single file) and aims to be a drop-in replacement for ConcurrentHashMap (it provides the same ConcurrentMap interface). Chronicle Map is the fastest store among similar solutions and features excellent read/write concurrency, scaling almost linearly with the number of available cores in the machine.
Disclaimer: I'm the developer of Chronicle Map.
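Because the store is described as exposing the ConcurrentMap interface, crawler code can be written against that interface and the backing map swapped later. The following is only a sketch under that assumption: the VisitedUrls class and its method names are hypothetical, and a plain ConcurrentHashMap stands in for whatever persistent ConcurrentMap implementation ends up being used.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: tracks visited URLs through the ConcurrentMap interface only,
// so the in-memory map can later be replaced by a disk-backed implementation.
public class VisitedUrls {
    private final ConcurrentMap<String, Long> visited;

    public VisitedUrls(ConcurrentMap<String, Long> backingMap) {
        this.visited = backingMap;
    }

    // Atomically records the URL with a timestamp.
    // Returns true only for the first caller that claims this URL.
    public boolean markVisited(String url) {
        return visited.putIfAbsent(url, System.currentTimeMillis()) == null;
    }

    public boolean isVisited(String url) {
        return visited.containsKey(url);
    }

    public static void main(String[] args) {
        // In-memory stand-in; any other ConcurrentMap<String, Long> could be passed instead.
        VisitedUrls urls = new VisitedUrls(new ConcurrentHashMap<>());
        System.out.println(urls.markVisited("http://example.com/")); // first claim
        System.out.println(urls.markVisited("http://example.com/")); // already claimed
    }
}
```

Since putIfAbsent is atomic, several crawler threads can call markVisited concurrently without extra locking, and only one of them will be told to fetch a given URL.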
What about using JPA in your classes and persisting the data in a database (which can be a single-file one such as SQLite)? http://en.wikipedia.org/wiki/Java_Persistence_API
There is Tokyo Cabinet, which is a fast implementation of a disk-based hash table.
In your case, I think the best way to store values in such a setup would be to prefix the metadata keys with the URL:
[url]_[name] => [value]
[url]_[name2] => [value2]
Unfortunately, I'm not sure you can enumerate the metadata for a given URL, using this solution.
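Whether enumeration is possible depends largely on whether the store keeps its keys in sorted order. As a rough sketch of the prefixed-key scheme, assuming an ordered map is available: here java.util.TreeMap is an in-memory stand-in and not Tokyo Cabinet's API, the UrlMetadataStore class is hypothetical, and a space replaces the "_" separator because a raw space cannot appear in a valid URL while an underscore can.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of "[url][separator][name] => [value]" keys over an ordered map.
public class UrlMetadataStore {
    // Assumption: a space is used instead of "_" so the URL/name boundary is unambiguous.
    private static final char SEPARATOR = ' ';

    // In-memory stand-in for an ordered, disk-backed key-value store.
    private final NavigableMap<String, String> store = new TreeMap<>();

    public void put(String url, String name, String value) {
        store.put(url + SEPARATOR + name, value);
    }

    public String get(String url, String name) {
        return store.get(url + SEPARATOR + name);
    }

    // Enumerates all metadata for one URL with a range scan over the key prefix.
    public Map<String, String> metadataFor(String url) {
        String prefix = url + SEPARATOR;
        // Keys that start with the prefix sort between prefix (inclusive) and
        // prefix + '\uffff' (exclusive), assuming names never start with '\uffff'.
        NavigableMap<String, String> range =
                store.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : range.entrySet()) {
            result.put(entry.getKey().substring(prefix.length()), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        UrlMetadataStore metadata = new UrlMetadataStore();
        metadata.put("http://a.example/", "title", "Home");
        metadata.put("http://a.example/", "depth", "0");
        metadata.put("http://a.example/page", "title", "Page");
        System.out.println(metadata.metadataFor("http://a.example/"));
    }
}
```

With an unordered hash table the same lookup would need a full scan of all keys, so this approach only pays off when the chosen store supports ordered or prefix iteration.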
If you want to use a more structured data store, there are also MongoDB and SQLite, which I would recommend.