The simplest way is to std::sort the vector and then use std::adjacent_find to spot the duplicates.
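For example, a minimal sketch of that approach (using the same test data as the program below; note that std::adjacent_find only finds one pair at a time, so you loop over it):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> v{"a", "b", "c", "a", "c", "d", "a"};

    // Sorting groups equal elements next to each other.
    std::sort(v.begin(), v.end());

    // adjacent_find returns an iterator to the first of two equal neighbours;
    // call it repeatedly to visit every run of duplicates.
    for (auto it = std::adjacent_find(v.begin(), v.end());
         it != v.end();
         it = std::adjacent_find(it, v.end())) {
        std::cout << *it << " occurs more than once" << std::endl;
        // Skip past the whole run of this value.
        it = std::upper_bound(it, v.end(), *it);
    }

    return 0;
}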
However, if you don't want to sort the vector, you can do something like this in C++11:
#include <unordered_map>
#include <functional> // For std::hash<std::string>.
#include <string>
#include <vector>
#include <iostream>

int main() {
    // Test data.
    std::vector<std::string> v;
    v.push_back("a");
    v.push_back("b");
    v.push_back("c");
    v.push_back("a");
    v.push_back("c");
    v.push_back("d");
    v.push_back("a");

    // Hash function for the hashtable.
    auto h = [](const std::string* s) {
        return std::hash<std::string>()(*s);
    };

    // Equality comparer for the hashtable.
    auto eq = [](const std::string* s1, const std::string* s2) {
        return s1->compare(*s2) == 0;
    };

    // The hashtable:
    //   Key:   Pointer to an element of 'v'.
    //   Value: Occurrence count.
    std::unordered_map<const std::string*, size_t, decltype(h), decltype(eq)> m(v.size(), h, eq);

    // Count occurrences.
    for (auto v_i = v.cbegin(); v_i != v.cend(); ++v_i)
        ++m[&(*v_i)];

    // Print strings that occur more than once:
    for (auto m_i = m.begin(); m_i != m.end(); ++m_i)
        if (m_i->second > 1)
            std::cout << *m_i->first << ": " << m_i->second << std::endl;

    return 0;
}
This prints:
a: 3
c: 2
I didn't actually benchmark it, but this has a chance of being rather performant, for the following reasons:
- Assuming the actual vector elements do not produce pathologically lopsided hashes, this is actually an O(n) algorithm, as opposed to O(n*log(n)) for sorting.
- We use a hashtable of pointers into 'v', not of the strings themselves, so no unnecessary copying takes place (contrast this with the string-keyed sketch after this list).
- We can "pre-allocate" hashtable buckets (we pass v.size() when constructing m), so hashtable resizes are minimized.
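For comparison, here is the straightforward string-keyed version (a sketch only; I haven't benchmarked either variant). It is shorter, but every distinct string gets copied into the map as a key, which is exactly the cost the pointer-keyed version avoids:

#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<std::string> v{"a", "b", "c", "a", "c", "d", "a"};

    // Each distinct string is copied into the map as a key.
    std::unordered_map<std::string, size_t> m(v.size());
    for (const auto& s : v)
        ++m[s];

    // Print strings that occur more than once.
    for (const auto& p : m)
        if (p.second > 1)
            std::cout << p.first << ": " << p.second << std::endl;

    return 0;
}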