Hashset vs Treeset
I've always loved trees, that nice
O(n*log(n)) and the tidiness of them. However, every software engineer I've ever known has asked me pointedly why I would use a
TreeSet. From a CS background, I don't think it matters all that much which you use, and I don't care to mess around with hash functions and buckets (in the case of
In which cases should I use a
HashSet over a
HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.
- the class offers constant time performance for the basic operations (add, remove, contains and size).
- it does not guarantee that the order of elements will remain constant over time
- iteration performance depends on the initial capacity and the load factor of the HashSet.
- It's quite safe to accept default load factor but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow.
- guarantees log(n) time cost for the basic operations (add, remove and contains)
- guarantees that elements of set will be sorted (ascending, natural, or the one specified by you via its constructor) (implements
- doesn't offer any tuning parameters for iteration performance
- offers a few handy methods to deal with the ordered set like
- Both guarantee duplicate-free collection of elements
- It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
- None of these implementations are synchronized. That is if multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally.
- LinkedHashSet is in some sense intermediate between
TreeSet. Implemented as a hash table with a linked list running through it, however,it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet.
So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.
SortedSet<String> s = new TreeSet<String>(hashSet);
Read more... Read less...
One advantage not yet mentioned of a
TreeSet is that its has greater "locality", which is shorthand for saying (1) if two entries are nearby in the order, a
TreeSet places them near each other in the data structure, and hence in memory; and (2) this placement takes advantage of the principle of locality, which says that similar data is often accessed by an application with similar frequency.
This is in contrast to a
HashSet, which spreads the entries all over memory, no matter what their keys are.
When the latency cost of reading from a hard drive is thousands of times the cost of reading from cache or RAM, and when the data really is accessed with locality, the
TreeSet can be a much better choice.
HashSet is O(1) to access elements, so it certainly does matter. But maintaining order of the objects in the set isn't possible.
TreeSet is useful if maintaining an order(In terms of values and not the insertion order) matters to you. But, as you've noted, you're trading order for slower time to access an element: O(log n) for basic operations.
From the javadocs for
This implementation provides guaranteed log(n) time cost for the basic operations (
1.HashSet allows null object.
2.TreeSet will not allow null object. If you try to add null value it will throw a NullPointerException.
3.HashSet is much faster than TreeSet.
TreeSet<String> ts = new TreeSet<String>(); ts.add(null); // throws NullPointerException HashSet<String> hs = new HashSet<String>(); hs.add(null); // runs fine
Basing on lovely visual answer on Maps by @shevchyk here is my take:
╔══════════════╦═════════════════════╦═══════════════════╦═════════════════════╗ ║ Property ║ HashSet ║ TreeSet ║ LinkedHashSet ║ ╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣ ║ ║ no guarantee order ║ sorted according ║ ║ ║ Order ║ will remain constant║ to the natural ║ insertion-order ║ ║ ║ over time ║ ordering ║ ║ ╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣ ║ Add/remove ║ O(1) ║ O(log(n)) ║ O(1) ║ ╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣ ║ ║ ║ NavigableSet ║ ║ ║ Interfaces ║ Set ║ Set ║ Set ║ ║ ║ ║ SortedSet ║ ║ ╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣ ║ ║ ║ not allowed ║ ║ ║ Null values ║ allowed ║ 1st element only ║ allowed ║ ║ ║ ║ in Java 7 ║ ║ ╠══════════════╬═════════════════════╩═══════════════════╩═════════════════════╣ ║ ║ Fail-fast behavior of an iterator cannot be guaranteed ║ ║ Fail-fast ║ impossible to make any hard guarantees in the presence of ║ ║ behavior ║ unsynchronized concurrent modification ║ ╠══════════════╬═══════════════════════════════════════════════════════════════╣ ║ Is ║ ║ ║ synchronized ║ implementation is not synchronized ║ ╚══════════════╩═══════════════════════════════════════════════════════════════╝
The reason why most use
HashSet is that the operations are (on average) O(1) instead of O(log n). If the set contains standard items you will not be "messing around with hash functions" as that has been done for you. If the set contains custom classes, you have to implement
hashCode to use
HashSet (although Effective Java shows how), but if you use a
TreeSet you have to make it
Comparable or supply a
Comparator. This can be a problem if the class does not have a particular order.
I have sometimes used
TreeSet (or actually
TreeMap) for very small sets/maps (< 10 items) although I have not checked to see if there is any real gain in doing so. For large sets the difference can be considerable.
Now if you need the sorted, then
TreeSet is appropriate, although even then if updates are frequent and the need for a sorted result is infrequent, sometimes copying the contents to a list or an array and sorting them can be faster.