hashset | 易学教程

.NET: How to efficiently check for uniqueness in a List<string> of 50,000 items?

Question: In some library code, I have a List<string> that can contain 50,000 items or more. Callers of the library can invoke methods that result in strings being added to the list. How do I efficiently check for uniqueness of the strings being added? Currently, just before adding a string, I scan the entire list and compare each string to the to-be-added string. This starts showing scale problems above 10,000 items. I will benchmark this, but I am interested in any insight. If I replace the List<> with a Dictionary<>…
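
A common remedy is to keep a hash set alongside the list, so each uniqueness check is an average O(1) lookup instead of a scan over all 50,000 strings. The sketch below is written in Java rather than C#, and UniqueStringList is a hypothetical name; in .NET, HashSet<string>.Add likewise returns false when the item is already present, so the same pattern carries over.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper: a list that rejects duplicate strings.
// The HashSet answers "seen before?" in O(1) on average; the list keeps insertion order.
class UniqueStringList {
    private final List<String> items = new ArrayList<>();
    private final Set<String> seen = new HashSet<>();

    /** Adds s and returns true, or returns false without adding if s is already present. */
    boolean add(String s) {
        if (!seen.add(s)) { // Set.add returns false when the element is already in the set
            return false;
        }
        items.add(s);
        return true;
    }

    List<String> items() {
        return Collections.unmodifiableList(items);
    }

    public static void main(String[] args) {
        UniqueStringList list = new UniqueStringList();
        System.out.println(list.add("alpha")); // true
        System.out.println(list.add("beta"));  // true
        System.out.println(list.add("alpha")); // false
        System.out.println(list.items());      // [alpha, beta]
    }
}
```

The trade-off is memory: every string is referenced from both the list and the set.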

Java: optimize hashset for large-scale duplicate detection

Question: I am working on a project where I am processing a lot of tweets; the goal is to remove duplicates as I process them. I have the tweet IDs, which come in as strings of the format "166471306949304320". I have been using a HashSet<String> for this, which works fine for a while. But by the time I get to around 10 million items, I am drastically bogged down and eventually get a GC error, presumably from the rehashing. I tried defining a better size and load factor with
    tweetids = new HashSet<String>(220000, 0.80F);
and that lets it get a little farther, but it is still excruciatingly slow (by around 10 million it…
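
One commonly suggested direction is to store each ID as a primitive long instead of a String: an ID like "166471306949304320" fits comfortably in a long, and a set of primitives avoids allocating a String plus a hash-table node per element. The JDK has no primitive-long set, so LongHashSet below is a hypothetical, minimal open-addressing (linear probing) sketch, not a tuned implementation; libraries such as fastutil provide production-quality versions.

```java
import java.util.Arrays;

// Hypothetical minimal set of primitive longs using open addressing with linear probing.
// Storing tweet IDs as long avoids one String object (and its backing array) per ID.
class LongHashSet {
    // Sentinel for empty slots; assumes no real ID equals Long.MIN_VALUE.
    private static final long EMPTY = Long.MIN_VALUE;
    private long[] slots;
    private int size;

    LongHashSet(int expectedSize) {
        int capacity = 16;
        while (capacity < 2L * expectedSize) {
            capacity <<= 1; // power of two, so (hash & mask) selects a slot
        }
        slots = new long[capacity];
        Arrays.fill(slots, EMPTY);
    }

    // Multiply by a 64-bit odd constant and fold the high bits down to spread sequential IDs.
    private static int mix(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32));
    }

    /** Returns true if key was added, false if it was already present. */
    boolean add(long key) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved as the empty marker");
        }
        if (2L * (size + 1) > slots.length) {
            grow(); // keep the table at most half full so probe sequences stay short
        }
        int mask = slots.length - 1;
        int i = mix(key) & mask;
        while (slots[i] != EMPTY) {
            if (slots[i] == key) {
                return false;
            }
            i = (i + 1) & mask;
        }
        slots[i] = key;
        size++;
        return true;
    }

    boolean contains(long key) {
        int mask = slots.length - 1;
        int i = mix(key) & mask;
        while (slots[i] != EMPTY) {
            if (slots[i] == key) {
                return true;
            }
            i = (i + 1) & mask;
        }
        return false;
    }

    int size() {
        return size;
    }

    // Double the table and re-insert every stored key.
    private void grow() {
        long[] old = slots;
        slots = new long[old.length * 2];
        Arrays.fill(slots, EMPTY);
        size = 0;
        for (long key : old) {
            if (key != EMPTY) {
                add(key);
            }
        }
    }
}
```

Usage would look like ids.add(Long.parseLong(tweetId)), which returns false for a duplicate. Passing the expected number of IDs to the constructor up front avoids repeated growth steps, which the question identifies as part of the slowdown.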

Why initialize HashSet<>(0) to zero?

Question: I love HashSet<>() and use it eagerly, initializing it with the default constructor:
    Set<Users> users = new HashSet<>();
Now, my automatic bean creator (JBoss Tools) initializes it as:
    Set<Users> users = new HashSet<>(0);
Why the zero? The API tells me that this is the initial capacity, but what is the advantage of setting it to zero? Is this advised?
Answer: The default initial capacity is 16, so by passing in 0 you may save a few bytes of memory if you end up not putting anything in the set. Other than that there is no real advantage; when you pass 0 the set is created with a…

how to find and return objects in java hashset

Question: According to the HashSet Javadoc, HashSet.contains only returns a boolean. How can I "find" an object in a HashSet and modify it (it's not a primitive data type)? I see that HashTable has a get() method, but I would prefer to use the set.
Answer: You can remove an element and add a different one. Modifying an object while it is in a hash set is a recipe for disaster (if the modification changes the hash value or equality behavior). To quote the source of the stock Sun java.util.HashSet:
    public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, java.io.Serializable {
        static final…
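
The remove-then-add advice can be sketched as follows. User and ReplaceInSet are hypothetical names, and the sketch assumes equality and hash code depend only on an immutable id field. Removing first matters: HashSet.add leaves the set unchanged (keeping the old instance) when an equal element is already present.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value class: equals/hashCode use only the immutable id,
// so a different email never changes which hash bucket the object belongs in.
final class User {
    final String id;
    final String email;

    User(String id, String email) {
        this.id = id;
        this.email = email;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof User && ((User) o).id.equals(id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

class ReplaceInSet {
    // Swap out the element equal to `updated` instead of mutating it in place.
    // add() alone would keep the old instance, because it ignores elements already present.
    static <E> boolean replace(Set<E> set, E updated) {
        boolean existed = set.remove(updated);
        set.add(updated);
        return existed;
    }

    public static void main(String[] args) {
        Set<User> users = new HashSet<>();
        users.add(new User("42", "old@example.com"));
        replace(users, new User("42", "new@example.com"));
        System.out.println(users.iterator().next().email); // new@example.com
    }
}
```

If lookup by key is the main need, a Map keyed by id is usually the more direct structure than a Set.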

HashSet load factor

Question: If I use a HashSet with an initial capacity of 10 and a load factor of 0.5, will the HashSet be enlarged every time 5 elements are added, or will it first grow by 10 elements and then be enlarged again at 15, 20, and so on?
Answer: The load factor is a measure of how full the HashSet is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the…
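
The quoted rule can be simulated to answer the question directly. LoadFactorModel is a hypothetical model, not JDK code: it assumes OpenJDK's HashMap behavior of rounding the requested capacity up to a power of two and doubling the bucket count on each rehash, neither of which the HashSet Javadoc guarantees. Under those assumptions, new HashSet<>(10, 0.5f) starts with 16 buckets, so the first rehash happens on the 9th insertion rather than every 5 or 10 elements.

```java
import java.util.ArrayList;
import java.util.List;

class LoadFactorModel {
    // Hypothetical model of "rehash when size exceeds capacity * loadFactor".
    // Assumes (as OpenJDK's HashMap does) that the requested capacity is rounded up
    // to a power of two and that every rehash doubles the number of buckets.
    static List<Integer> resizePoints(int initialCapacity, float loadFactor, int maxElements) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        List<Integer> points = new ArrayList<>();
        for (int size = 1; size <= maxElements; size++) {
            if (size > capacity * loadFactor) {
                capacity <<= 1;   // rehash into twice as many buckets
                points.add(size); // record which insertion triggered it
            }
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(resizePoints(10, 0.5f, 100)); // [9, 17, 33, 65]
    }
}
```

Because capacity doubles each time, the gaps between rehashes also double, so the total rehashing cost stays proportional to the number of elements.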

Efficient way to clone a HashSet<T>?

Question: A few days ago, I answered an interesting question on SO about HashSet<T>. A possible solution involved cloning the hashset, and in my answer I suggested doing something like this:
    HashSet<int> original = ...
    HashSet<int> clone = new HashSet<int>(original);
Although this approach is quite straightforward, I suspect it's very inefficient: the constructor of the new HashSet<T> needs to add each item from the original hashset separately and check that it isn't already present. This is clearly a waste of time: since the source collection is an ISet<T>, it is guaranteed not to contain duplicates.
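
As a Java analogue (not an answer for .NET), Java offers the same two options: the copy constructor and HashSet.clone(). Both produce shallow copies, so the elements themselves are shared, and in OpenJDK both rebuild the table by re-inserting each entry. CopySets and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class CopySets {
    // Option 1: copy constructor; OpenJDK presizes the new table from source.size().
    static <E> HashSet<E> copyViaConstructor(HashSet<E> source) {
        return new HashSet<>(source);
    }

    // Option 2: clone(), which returns Object and therefore needs an unchecked cast.
    @SuppressWarnings("unchecked")
    static <E> HashSet<E> copyViaClone(HashSet<E> source) {
        return (HashSet<E>) source.clone();
    }

    public static void main(String[] args) {
        HashSet<Integer> original = new HashSet<>(Set.of(1, 2, 3));
        HashSet<Integer> a = copyViaConstructor(original);
        HashSet<Integer> b = copyViaClone(original);
        a.add(4);    // the copies are independent sets...
        b.remove(1); // ...so changing them leaves the original untouched
        System.out.println(original.size() + " " + a.size() + " " + b.size()); // 3 4 2
    }
}
```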

What is the difference between set and hashset in C++ STL?

Question: When should I choose one over the other? Are there any pointers you would recommend for choosing the right STL containers?
Answer: hash_set is an extension that is not part of the C++ standard. Lookups should be O(1) on average rather than O(log n) for set, so it will be faster in most circumstances. Another difference shows up when you iterate through the containers: set delivers its contents in sorted order, while hash_set's order is essentially random (thanks, Lou Franco). Edit: The C++11 update to the C++ standard introduced unordered_set, which should be preferred over hash_set. The performance…
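
The same trade-off exists in Java, sketched here as an analogue rather than C++ code: TreeSet is a balanced tree (comparable to typical std::set implementations) with O(log n) lookups and sorted iteration, while HashSet, like std::unordered_set, offers average O(1) lookups with no defined iteration order. OrderedVsHashed and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class OrderedVsHashed {
    // TreeSet keeps elements sorted (O(log n) per lookup), so iteration is in ascending order.
    static List<String> sortedDistinct(List<String> words) {
        return new ArrayList<>(new TreeSet<>(words));
    }

    // HashSet gives average O(1) lookups, but its iteration order is unspecified.
    static Set<String> hashedDistinct(List<String> words) {
        return new HashSet<>(words);
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "apple", "fig", "apple");
        System.out.println(sortedDistinct(words)); // [apple, fig, pear]
        System.out.println(hashedDistinct(words)); // same three words, in no guaranteed order
    }
}
```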

HashSet vs ArrayList contains performance

Question: When processing large amounts of data, I often find myself doing the following:
    HashSet<String> set = new HashSet<String>();
    // Adding elements to the set
    ArrayList<String> list = new ArrayList<String>(set);
Something like "dumping" the contents of the set into the list. I usually do this because the elements I add often contain duplicates I want to remove, and this seems like an easy way to remove them. With only that objective in mind (avoiding duplicates), I could also write:
    ArrayList<String> list = new ArrayList<String>();
    // Processing here
    if (!list.contains(element)) list.add(element);
…
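
For the stated goal, list.contains scans the whole list, so the second loop does O(n) work per element (O(n²) overall), while a hash-set check is O(1) on average. If the order in which elements first appeared should be kept, LinkedHashSet (a swapped-in suggestion, not from the question) combines both properties; Dedupe is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class Dedupe {
    // LinkedHashSet drops duplicates with average O(1) checks and iterates in first-insertion
    // order, so the resulting list keeps the order in which each element was first seen.
    static List<String> distinctInOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        System.out.println(distinctInOrder(List.of("b", "a", "b", "c", "a"))); // [b, a, c]
    }
}
```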