Parsing morse code

Posted by 戏子无情 on 2020-03-04 06:13:48

Question


I am trying to solve this problem. The goal is to determine the number of ways a Morse string can be interpreted, given a dictionary of words. I first "translated" the words in my dictionary into Morse. Then I used a naive algorithm that recursively searches for every way the string can be interpreted.

#include <iostream>
#include <vector>
#include <map>
#include <string>
#include <iterator>
using namespace std;

string morse_string;
int morse_string_size;
map<char, string> morse_table;
unsigned int sol;

void matches(int i, int factor, vector<string> &dictionary) {
    int suffix_length = morse_string_size-i;
    if (suffix_length <= 0) {
        sol += factor;
        return;
    }
    map<int, int> c;
    for (vector<string>::iterator it = dictionary.begin() ; it != dictionary.end() ; it++) {
        if (((*it).size() <= suffix_length) && (morse_string.substr(i, (*it).size()) == *it)) {
            // operator[] value-initializes a missing count to 0, so the first match counts as 1
            c[(*it).size()]++;
        }
    }

    for (map<int, int>::iterator it = c.begin() ; it != c.end() ; it++) {
        matches(i+it->first, factor*(it->second), dictionary);
    }
}

string encode_morse(string s) {
    string ret = "";
    for (unsigned int i = 0 ; i < s.length() ; ++i) {
        ret += morse_table[s[i]];
    }
    return ret;
}

int main() {
    morse_table['A'] = ".-"; morse_table['B'] = "-..."; morse_table['C'] = "-.-."; morse_table['D'] = "-.."; morse_table['E'] = "."; morse_table['F'] = "..-."; morse_table['G'] = "--."; morse_table['H'] = "...."; morse_table['I'] = ".."; morse_table['J'] = ".---"; morse_table['K'] = "-.-"; morse_table['L'] = ".-.."; morse_table['M'] = "--"; morse_table['N'] = "-."; morse_table['O'] = "---"; morse_table['P'] = ".--."; morse_table['Q'] = "--.-"; morse_table['R'] = ".-."; morse_table['S'] = "..."; morse_table['T'] = "-"; morse_table['U'] = "..-"; morse_table['V'] = "...-"; morse_table['W'] = ".--"; morse_table['X'] = "-..-"; morse_table['Y'] = "-.--"; morse_table['Z'] = "--..";
    int T, N;
    string tmp;
    vector<string> dictionary;
    cin >> T;

    while (T--) {
        morse_string = "";
        dictionary.clear();  // do not carry words over from the previous test case
        cin >> morse_string;
        morse_string_size = morse_string.size();
        cin >> N;
        for (int j = 0 ; j < N ; j++) {
            cin >> tmp;
            dictionary.push_back(encode_morse(tmp));
        }

        sol = 0;
        matches(0, 1, dictionary);
        cout << sol;

        if (T)
            cout << endl << endl;
    }

    return 0;
}

The problem is that I am allowed only 3 seconds of execution time, and my algorithm does not finish within that limit.

Is this a good way to do this, and if so, what am I missing? Otherwise, can you give some hints about a good strategy?

EDIT: There can be at most 10 000 words in the dictionary and at most 1000 characters in the Morse string.


Answer 1:


A solution that combines dynamic programming with a rolling hash should work for this problem.

Let's start with a simple dynamic programming solution. We allocate a vector dp in which dp[i] stores the number of ways to build the prefix of morse_string of length i. We then iterate over the positions of morse_string, and at each position i we iterate over all dictionary words and look back to see whether the word matches the characters ending at i. If it does, we add dp[i-dictionaryWord.size()], the number of ways the prefix before that word could have been built.

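// dp[i] = number of ways to build the first i characters of morse_string
vector<long long> dp(morse_string.size() + 1, 0);
dp[0] = 1;  // the empty prefix can be built in exactly one way
for (size_t i = 1; i <= morse_string.size(); i++) {
    for (size_t j = 0; j < dictionary.size(); j++) {
        size_t len = dictionary[j].size();
        if (len > i) continue;  // the word is longer than the prefix
        // does the word match the len characters that end at position i?
        if (morse_string.compare(i - len, len, dictionary[j]) == 0) {
            dp[i] += dp[i - len];
        }
    }
}
long long result = dp[morse_string.size()];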
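
To check the recurrence on a small input, the loop can be wrapped in a function. The following is only a sketch: the name count_ways and its parameters are made up for illustration, and the dictionary is assumed to be already encoded into Morse.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts the ways to split `morse` into a sequence of
// already-encoded dictionary words. dp[i] is the count for the prefix of length i.
long long count_ways(const std::string& morse,
                     const std::vector<std::string>& encoded_words) {
    std::vector<long long> dp(morse.size() + 1, 0);
    dp[0] = 1;  // empty prefix: one way (use no words)
    for (std::size_t i = 1; i <= morse.size(); ++i) {
        for (const std::string& w : encoded_words) {
            if (w.empty() || w.size() > i) continue;  // cannot end at position i
            // Compare the word with the w.size() characters ending at position i.
            if (morse.compare(i - w.size(), w.size(), w) == 0) {
                dp[i] += dp[i - w.size()];
            }
        }
    }
    return dp[morse.size()];
}
```

For example, with the words E ("."), T ("-") and A (".-"), the string ".-.-" gives dp = 1, 1, 2, 2, 4, so count_ways returns 4 (ETET, AET, ETA, AA).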

The problem with this solution is that it is too slow. Let N be the length of morse_string, M the number of words in the dictionary, and K the length of the longest word in the dictionary. The solution does O(N*M*K) operations. With N=1000 and M=10 000, and assuming K=1000, this is about 10^10 operations, which is too slow on most machines.

The factor K comes from comparing dictionary[j] with the substring of morse_string that ends at position i, which takes time proportional to the length of the word.

If we could speed up this string matching to constant or logarithmic time we would be okay. This is where rolling hashing comes in. If you build a rolling hash array over morse_string, you can compute the hash of any substring of morse_string in O(1). You can then replace the string comparison with hash(dictionary[j]) == hash(morse_string.substr(i-dictionary[j].size(), dictionary[j].size())), where the hash of each dictionary word is computed once up front. The main loop then takes O(N*M), about 10^7 operations.
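
One common way to get O(1) substring hashes is a polynomial hash over prefixes. The sketch below is an illustration, not the only option: the struct PrefixHash, the helper hash_of, and the constants kHashBase and kHashMod are assumptions chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed constants for this sketch: a prime modulus and a small base.
const std::uint64_t kHashMod = 1000000007ULL;
const std::uint64_t kHashBase = 131;

// Polynomial prefix hashes of a string, modulo kHashMod.
struct PrefixHash {
    std::vector<std::uint64_t> prefix;  // prefix[i] = hash of the first i characters
    std::vector<std::uint64_t> power;   // power[i]  = kHashBase^i mod kHashMod

    explicit PrefixHash(const std::string& s)
        : prefix(s.size() + 1, 0), power(s.size() + 1, 1) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            prefix[i + 1] =
                (prefix[i] * kHashBase + static_cast<unsigned char>(s[i])) % kHashMod;
            power[i + 1] = power[i] * kHashBase % kHashMod;
        }
    }

    // Hash of the len characters starting at pos, in O(1).
    std::uint64_t substring(std::size_t pos, std::size_t len) const {
        return (prefix[pos + len] + kHashMod - prefix[pos] * power[len] % kHashMod) % kHashMod;
    }
};

// Hash of a whole string, computed the same way so the values are comparable.
std::uint64_t hash_of(const std::string& s) {
    return PrefixHash(s).substring(0, s.size());
}
```

Both values stay below kHashMod, so every product fits in 64 bits. Equal substrings always get equal hashes; different substrings usually, but not always, get different ones.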

This is good but in the presence of imperfect hashing you could have multiple words from the dictionary with the same hash. That would mean that after getting a hash match you would still need to match the strings as well as the hashes. In programming contests, people often assume perfect hashing and skip the string matching. This is often a safe bet especially on a small dictionary. In case it doesn't produce a perfect hashing (which you can check in code) you can always adjust your hash function slightly and maybe the adjusted hash function will produce a perfect hashing.
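
Putting the pieces together, a version that does not rely on perfect hashing could look like the sketch below. It uses the hash only to reject non-matching words in O(1) and confirms every hash match with a real string comparison. The function name count_ways_hashed and the constants kMod and kBase are assumptions for this example, and the dictionary is assumed to be already encoded into Morse.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace {
const std::uint64_t kMod = 1000000007ULL;  // assumed prime modulus
const std::uint64_t kBase = 131;           // assumed base
}

long long count_ways_hashed(const std::string& morse,
                            const std::vector<std::string>& encoded_words) {
    const std::size_t n = morse.size();
    // prefix[i] = hash of the first i characters, power[i] = kBase^i, both mod kMod.
    std::vector<std::uint64_t> prefix(n + 1, 0), power(n + 1, 1);
    for (std::size_t i = 0; i < n; ++i) {
        prefix[i + 1] = (prefix[i] * kBase + static_cast<unsigned char>(morse[i])) % kMod;
        power[i + 1] = power[i] * kBase % kMod;
    }
    // Hash every dictionary word once, up front.
    std::vector<std::uint64_t> word_hash;
    for (const std::string& w : encoded_words) {
        std::uint64_t h = 0;
        for (char c : w) h = (h * kBase + static_cast<unsigned char>(c)) % kMod;
        word_hash.push_back(h);
    }
    std::vector<long long> dp(n + 1, 0);
    dp[0] = 1;  // empty prefix: one way
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 0; j < encoded_words.size(); ++j) {
            const std::size_t len = encoded_words[j].size();
            if (len == 0 || len > i) continue;
            // Hash of the len characters ending at position i.
            const std::uint64_t sub =
                (prefix[i] + kMod - prefix[i - len] * power[len] % kMod) % kMod;
            if (sub != word_hash[j]) continue;  // O(1) rejection in the common case
            // Guard against collisions: confirm the match character by character.
            if (morse.compare(i - len, len, encoded_words[j]) == 0) {
                dp[i] += dp[i - len];
            }
        }
    }
    return dp[n];
}
```

The confirming comparison costs O(K) only when the hashes agree. Dropping it gives the contest shortcut described above, at the risk of counting a false match.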



Source: https://stackoverflow.com/questions/23664993/parsing-morse-code
