I came across this good question, which is similar but not at all same since it talks about Java, which has different implementation of hash-tables, by virtue of having synchronized accessor /mutators: What are the differences between a HashMap and a Hashtable in Java?
So what is the difference in C++ implementation of set
and unordered_set
?
This question can be of course extended to map
vs unordered_map
and so on for other C++ containers.
Here is my initial assessment:
set
: While the standard doesn't explicitly ask it to be implemented as trees, the time-complexity constraint asked for its operations for find/insert, means it will always be implemented as a tree.
Usually as RB tree (as seen in GCC 4.8), which is height-balanced.
Since they are height balanced, they have predictable time-complexity for find()
Pros: Compact (compared to other DS in comparison)
Con: Access time complexity is O(lg n)
unordered_set
: While the standard doesn't explicitly asks it to be implemented as trees, the time-complexity constraint asked for its operations for find/insert, means it will always be implemented as a hash-table.
Pros:
- Faster (promises amortized O(1) for search)
- Easy to convert basic primitives to thread-safe, as compared to tree-DS
Cons:
- Look up not guaranteed to be O(1). Theoretical worst case is O(n).
- Not as compact as tree (for practical purposes load factors is never 1).
Note: The O(1), for hashtable comes from the assumption that there are no collision. Even with load-factor of .5, every second variable insertion is leading to collision. It could be observed that the load-factor of hash-table is inversely proportional to the number of operations required for accessing a element in it. More we reduce #operations, sparser hash-table. When the element stored are of size comparable to pointer, then overhead is quite significant.
Did I miss any difference between map/set for performance analysis that one should know?
See Question&Answers more detail:os