C++ Finding Anagrams in words

I'm working on a program that looks at whether or not a particular word is an anagram using std:count however, I don't think my function logic is correct and I cannot seem to figure it out.

Assume there are the following words in the file:

Evil
Vile
Veil  
Live

My code is as follows:

#include <iostream>
#include <vector>
#include <fstream>
#include <map>
using namespace std;

struct Compare {
std::string str;
Compare(const std::string& str) : str(str) {}
};

bool operator==(const std::pair<int, std::string>&p, const Compare& c) {
return c.str == p.second;
}
   bool operator==(const Compare& c, const std::pair<int, std::string>&p) {
   return c.str == p.second;
}

std::vector<std::string> readInput(ifstream& file)
{
std::vector<std::string> temp;

string word;

while (file >> word)
{
    temp.push_back(word);
}
std::sort(temp.begin(), temp.end());

return temp;
}

int main(int argc, char *argv[]) {  

string file = "testing.txt";
ifstream ss(file.c_str());

if(!ss.is_open())
{
    cerr << "Cannot open the text file";
}

std::vector<std::string> words = readInput(ss);

std::map<int, std::string> wordsMap; 

//std::map<std::string value, int key> values; 

for(unsigned i=0; (i < words.size()); i++)
{
    wordsMap[i] = words[i];
}


int count = std::count(wordsMap.begin(), wordsMap.end(), Compare("Evil"));
cout << count << endl;
}

I'm pretty sure it's just a case of my logic is wrong in the functions. I hope someone can help :)

The most simple approach would be

To check like following (pseudo code)

bool isAnagram(string s, string t) {return sort(s) == sort(t); }

So, use some think like following, no need of std::map

struct Compare {
std::string str;
Compare(const std::string& x) : str(x) { 
    std::sort(str.begin(),str.end()); std::transform(str.begin(), 
    str.end(),str.begin(), ::toupper);}

    bool operator ()(const std::string& t)
    {
        std::string s= t;
        std::transform(s.begin(), s.end(),s.begin(), ::toupper);
        std::sort(s.begin(),s.end());

    return s == str;
    }
};

And then

int count = std::count_if(words.begin(), words.end(), Compare("Evil"));

See HERE

This is not the most efficient algorithm, but a quick change to your program that would work could be:

bool operator==(const std::pair<int, std::string>&p, const Compare& c) {
  std::string a = c.str;
  std::transform(a.begin(), a.end(), a.begin(), ::tolower);
  std::sort(a.begin(), a.end());

  std::string b = p.second;
  std::transform(b.begin(), b.end(), b.begin(), ::tolower);
  std::sort(b.begin(), b.end());

  return a == b;
}

EDIT: It seems in your present code, you are checking whether the strings are exactly equal to each other (not anagrams).

INSTEAD:
For each word, make an array of 26 elements, each element corresponding to a letter of the alphabet. Parse each word character by character, and increase the count of the particular character in the respective array.

For example for evil, the array would be:

0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0. // It has 1's for letters e,v,i and l
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z

You make this array for each word that you have. In your case, all the words will have the same array. You then compare these arrays element-wise and proceed accordingly.

Now you just need to see which words have the same corresponding array.

If you want to compare all the N words pair-wise, you can do so using two nested loops in O(N^2) complexity.
The complexity for comparing one pair is O(1).
Complexity of creating the arrays = O(L) where L is the length of the string.

Consider the following:

map<string, set<string>> anagrams;

for (auto word : words)
    anagrams[sort(word)].insert(word);

const set<string>& find_anagrams(const string& word)
{
    return anagrams[word];
}

Leeor

When you have a lot of words that are relatively short (or if you can work with large number libs), then you can use a solution similar to what I wrote here -

Generate same unique hash code for all anagrams

Essentially - map each character to a unique prime number (doesn't have to be big, you can map the entire ABC into primes up to 101), and for each word multiply the primes received from it characters. Since multiplication is commutative, anagrams would give the same result, so you just compare that result, hash it, or do whatever you want

Keep in mind that for long words the values would grow pretty fast, so you might need a big numbers lib

来源：https://stackoverflow.com/questions/18834128/c-finding-anagrams-in-words

标签

c++

string

algorithm

logic