I am counting the number of times every word occurs in a text file. I would like to ignore case, so I apply tolower to my input before counting. I have a map data struc
You can use a struct or std::pair to keep both the original case and the number of occurrences. Your type would then look like this: map<string, pair<string, int> >
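A minimal sketch of that idea, assuming C++11 and ASCII input. The names toLower, WordCounts, and countWithOriginalCase are made up for illustration and do not come from the question:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns a lowercase copy of a word (ASCII only).
std::string toLower(std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}

// Key: lowercase word. Value: (first spelling seen, number of occurrences).
typedef std::map<std::string, std::pair<std::string, int> > WordCounts;

// Hypothetical helper: counts words case-insensitively, keeping the first spelling.
WordCounts countWithOriginalCase(const std::vector<std::string>& words) {
    WordCounts counts;
    for (const std::string& word : words) {
        std::pair<std::string, int>& entry = counts[toLower(word)];
        if (entry.second == 0) {
            entry.first = word;  // remember the first spelling encountered
        }
        ++entry.second;
    }
    return counts;
}
```

Which spelling to store is a design choice; this sketch keeps the first one it sees, but the last or the most frequent one would work equally well.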
You can use map<string, vector<string> >.
The key is the lowercase word. The value is the vector of all the given cases of this word.
(You can also use multimap<string, string>, which is basically the same, but I usually prefer a map of vectors.)
map<string, vector<string> > m;
m.size(); // number of lowercase words
m["abc"].size(); // number of the given cases of the word "abc"
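For illustration, one way to fill such a map might look like the following, assuming C++11 and ASCII input. CaseVariants and groupByLowercase are hypothetical names:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Key: lowercase word. Value: every spelling of that word, in input order.
typedef std::map<std::string, std::vector<std::string> > CaseVariants;

// Hypothetical helper: groups the words under their lowercase form.
CaseVariants groupByLowercase(const std::vector<std::string>& words) {
    CaseVariants m;
    for (const std::string& word : words) {
        std::string key = word;
        std::transform(key.begin(), key.end(), key.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        m[key].push_back(word);  // the vector's size doubles as the occurrence count
    }
    return m;
}
```

Keep in mind that operator[] inserts an empty vector when the key is missing, so use find() for lookups that must not modify the map.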
What do you want to happen with different case variants of the same word?
One possibility is to use std::multiset with a caseless comparator as its Compare template parameter. In this case, all variants of each word will be preserved in the set. The number of occurrences of each word can be obtained via the count() member function of the set.
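A rough sketch of the multiset approach, assuming C++11 and ASCII input. CaselessLess, WordBag, and makeBag are illustrative names only:

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Strict weak ordering that compares strings character by character, ignoring case.
struct CaselessLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

typedef std::multiset<std::string, CaselessLess> WordBag;

// Hypothetical helper: stores every word as given; equivalent spellings sit together.
WordBag makeBag(const std::vector<std::string>& words) {
    return WordBag(words.begin(), words.end());
}
```

With the input "Foo", "bar", "Bar", "FOO", calling count("foo") on the resulting set yields 2, and equal_range("foo") gives access to the individual spellings.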
This should work. When a word appears in several cases, the map keeps the first spelling encountered as the key, not a lowercased copy. The solution also uses only one map, as you wanted.
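#include <map>
#include <string>
#include <string.h> // _stricmp is MSVC-specific; POSIX systems offer strcasecmp in <strings.h>
using namespace std;
// Orders strings while ignoring case, so "Foo" and "FOO" share one key.
struct StrCaseInsensitive
{
    bool operator() (const string& left , const string& right ) const
    {
        return _stricmp( left.c_str() , right.c_str() ) < 0;
    }
};
int main(void)
{
    const char* input[] = { "Foo" , "bar" , "Bar" , "FOO" };
    std::map<string, int , StrCaseInsensitive> CountMap;
    for( int i = 0 ; i < 4; ++i )
    {
        CountMap[ input[i] ] += 1; // the first spelling seen becomes the key
    }
    return 0;
}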
The third template parameter of std::map is a comparator type. You can provide your own comparison operation, in your case a case-insensitive one.
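#include <algorithm> // std::min
#include <cctype>    // std::tolower
#include <cstddef>   // std::size_t
#include <string>
struct CaseInsensitive {
    bool operator()(std::string const& left, std::string const& right) const {
        std::size_t const size = std::min(left.size(), right.size());
        for (std::size_t i = 0; i != size; ++i) {
            // std::tolower needs a value representable as unsigned char, hence the casts
            int const lowerLeft = std::tolower(static_cast<unsigned char>(left[i]));
            int const lowerRight = std::tolower(static_cast<unsigned char>(right[i]));
            if (lowerLeft < lowerRight) { return true; }
            if (lowerLeft > lowerRight) { return false; }
            // if equal? continue!
        }
        // same prefix? then we compare the length
        return left.size() < right.size();
    }
};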
Then instantiate your map:
typedef std::map<std::string, unsigned, CaseInsensitive> MyWordCountingMap;
Note: only the first spelling is preserved (which seems okay with you)
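To make the note concrete, here is a small usage sketch, assuming C++11. The comparator is restated in compact form with std::lexicographical_compare so that the block compiles on its own, and countIgnoringCase is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Compact restatement of a case-insensitive ordering (ASCII only).
struct CaseInsensitive {
    bool operator()(std::string const& left, std::string const& right) const {
        return std::lexicographical_compare(
            left.begin(), left.end(), right.begin(), right.end(),
            [](unsigned char a, unsigned char b) {
                return std::tolower(a) < std::tolower(b);
            });
    }
};

typedef std::map<std::string, unsigned, CaseInsensitive> MyWordCountingMap;

// Hypothetical helper: counts words; spellings that differ only in case share one entry.
MyWordCountingMap countIgnoringCase(std::vector<std::string> const& words) {
    MyWordCountingMap counts;
    for (std::string const& word : words) {
        ++counts[word];  // inserts the word as written only if no equivalent key exists
    }
    return counts;
}
```

For the input "Foo", "bar", "Bar", "FOO", the map ends up with two entries keyed "bar" and "Foo", each with a count of 2; looking up "foo" or "FOO" reaches the "Foo" entry.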