C++ searching a line from a file for certain words and then inserting a word after those words

感情迁移 posted on 2019-12-25 01:55:42

Question


I'm very new to C++ and I've been struggling for quite a while trying to figure out how to solve this problem. Basically, I need to read from a file, find all instances of an article ("a","A","an","aN","An","AN","the","The","tHe","thE","THe","tHE","ThE","THE") and then insert an adjective after that article. The adjective's capitalization must be based on the word that originally followed the article. For instance, if I found "a SHARK" I would need to make it "a HAPPY SHARK". Can anyone tell me what the best way to do this would be? So far I've scrapped a lot of ideas, and this is what I have now, though I don't think I can do it this way:

#include <iostream>
#include <string>
#include <cctype>
#include <fstream>
#include <sstream>

using namespace std;

void
usage(char *progname, string msg){
    cerr << "Error: " << msg << endl;
    cerr << "Usage is: " << progname << " [filename]" << endl;
    cerr << " specifying filename reads from that file; no filename reads standard input" << endl;
}

int main(int argc, char *argv[])
{
    string adj;
    string file;
    string line;
    string articles[14] = {"a","A","an","aN","An","AN","the","The","tHe","thE","THe","tHE","ThE","THE"};
    ifstream rfile;
    cin >> adj;
    cin >> file;
    rfile.open(file.c_str());
    if(rfile.fail()){
        cerr << "Error while attempting to open the file." << endl;
        return 0;
    }
    while(rfile.good()){
        getline(rfile,line,'\n');
        istringstream iss(line);
        string word;
        while(iss >> word){
            for(int i = 0; i < 14; i++){  // 14 elements: valid indices are 0..13
                if(word == articles[i]){
                    cout << word + " " << endl;
                }else{
                    continue;
                }
            }
        }
    }
}

Answer 1:


So far, pretty good, although if you need to handle an article at the end of a line, then you might be in trouble doing this line by line.

Anyway, ignoring that wrinkle for a second, after you've matched an article, then first you need to get the next word on which you need to base your capitalization. Then you need to create a new string version of your adjective that has the correct capitalization:

string adj_buf;  // std::string resizes itself on assignment, so no manual sizing is needed

while(iss >> word){
    for(int i = 0; i < 14; i++){  // articles has 14 elements: valid indices are 0..13
        if(word == articles[i]){
            cout << word + " ";
            if (!(iss >> word))  // no more words on this line
                break;
            adj_buf = adj;
            // copy the case of the next word onto the adjective, character by character
            for (size_t j = 0; j < word.size() && j < adj.size(); ++j)
                if (isupper(word[j]))
                    adj_buf[j] = toupper(adj[j]);
                else
                    adj_buf[j] = tolower(adj[j]);

            cout << adj_buf + " " + word;
            break;
        }
    }
}
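
The capitalization step in that loop can also be pulled out into a small helper, which makes it easier to test on its own. This is only a sketch: the name match_case is made up for illustration, and it follows the same per-character rule as the loop above (it only touches as many characters as the shorter of the two words has).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (the name is an assumption, not from the original post).
// Returns a copy of `adj` whose first min(adj.size(), model.size()) characters
// take the case of the corresponding characters in `model`.
// Characters of `adj` beyond the length of `model` are left unchanged.
std::string match_case(const std::string& adj, const std::string& model) {
    std::string result = adj;
    for (std::size_t j = 0; j < model.size() && j < result.size(); ++j) {
        // Cast to unsigned char: passing a negative char to <cctype> functions is undefined.
        unsigned char m = static_cast<unsigned char>(model[j]);
        unsigned char a = static_cast<unsigned char>(result[j]);
        result[j] = static_cast<char>(std::isupper(m) ? std::toupper(a) : std::tolower(a));
    }
    return result;
}
```

With this, match_case("happy", "SHARK") gives "HAPPY" and match_case("happy", "Shark") gives "Happy". Note that a short model word only affects the leading characters: match_case("happy", "OX") gives "HAppy", which may or may not be what you want.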

Circling back to the wrinkle we ignored: you probably don't want to do this line by line and then token by token, because handling the article-at-end-of-line case will make your control flow ugly. Instead, you probably want to go token by token in a single loop.

So, you need a helper function or class that operates on the file and can give you the next token. (Applying operator>> directly to the ifstream already behaves this way, since it skips newlines like any other whitespace, but a small wrapper lets you keep your line-based I/O.) Using your I/O, it might look something like:

struct FileTokenizer
{
    FileTokenizer(const string &fileName) : rfile(fileName.c_str()) {}

    // Stores the next whitespace-separated token in `token`;
    // returns false once the file is exhausted.
    bool getNextToken(string &token)
    {
        while (!(iss >> token))
        {
            string line;

            if (!getline(rfile, line))
                return false;

            iss.clear();    // reset the eof/fail flags left by the failed extraction
            iss.str(line);  // point the stream at the new line
        }

        return true;
    }

private:
    ifstream      rfile;
    istringstream iss;
};

And your main loop would then look like:

FileTokenizer tokenizer(file);

while (tokenizer.getNextToken(word))
{
    for(int i = 0; i < 14; i++){  // articles has 14 elements: valid indices are 0..13
        if(word == articles[i]){
            cout << word + " ";

            if (!tokenizer.getNextToken(word))  // article was the last token in the file
                break;

            adj_buf = adj;
            for (size_t j = 0; j < word.size() && j < adj.size(); ++j)
                if (isupper(word[j]))
                    adj_buf[j] = toupper(adj[j]);
                else
                    adj_buf[j] = tolower(adj[j]);

            cout << adj_buf + " " + word;
            break;
        }
    }
}

You probably want to output the rest of the input too?
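
For what it's worth, here is one way the pieces could fit together so that every word is echoed and an article at the end of a line is still handled. It is a sketch under a few assumptions: the function names insert_adjective and is_article are made up, words are read with operator>> straight from the stream (which treats newlines like any other whitespace), and the output is joined with single spaces, so the original line breaks are not preserved.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// True if `word` is "a", "an", or "the" in any mix of upper and lower case.
bool is_article(const std::string& word) {
    std::string lower;
    for (std::size_t i = 0; i < word.size(); ++i)
        lower += static_cast<char>(std::tolower(static_cast<unsigned char>(word[i])));
    return lower == "a" || lower == "an" || lower == "the";
}

// Hypothetical function: copies every whitespace-separated word from `in` to
// `out` and writes `adj` in front of each word that follows an article.
// The adjective takes the case of that word, character by character.
void insert_adjective(std::istream& in, std::ostream& out, const std::string& adj) {
    std::string word;
    bool after_article = false;  // was the previous word an article?
    bool first = true;
    while (in >> word) {         // operator>> skips spaces and newlines alike
        if (!first) out << ' ';
        first = false;
        if (after_article) {
            std::string cased = adj;
            for (std::size_t j = 0; j < word.size() && j < cased.size(); ++j) {
                unsigned char w = static_cast<unsigned char>(word[j]);
                unsigned char a = static_cast<unsigned char>(cased[j]);
                cased[j] = static_cast<char>(std::isupper(w) ? std::toupper(a) : std::tolower(a));
            }
            out << cased << ' ';
        }
        out << word;
        after_article = is_article(word);
    }
}

// Convenience overload for trying the function on an in-memory string.
std::string insert_adjective(const std::string& text, const std::string& adj) {
    std::istringstream in(text);
    std::ostringstream out;
    insert_adjective(in, out, adj);
    return out.str();
}
```

Keeping a flag for "the previous word was an article" instead of reading the next word inside the match avoids the special case entirely: if the input ends right after an article, nothing is inserted. To run it on the file from the question you would pass the opened ifstream and cout as the first two arguments.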




Answer 2:


First, I suggest using three auxiliary functions to transform the case of a string. These will be useful if you work a lot with text. Here they are based on <algorithm>, but many other approaches are possible:

string strtoupper(const string& s) {   // return the uppercase of the string
    string str = s; 
    std::transform(str.begin(), str.end(), str.begin(), ::toupper);
    return str; 
}
string strtolower(const string& s) {    // return the lowercase of the string
    string str = s;
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    return str;
}
string strcapitalize (const string& s) {  // return the capitalisation (1 upper, rest lower) of the string
    string str = s;
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    if (str.size() > 0)
        str[0] = toupper(str[0]); 
    return str;
}

Then a utility function to clone the capitalisation of a word: it sets the adjective to lowercase, to uppercase, or capitalizes it (first letter upper, rest lower), copying the case of the reference word. It's robust enough to handle empty words and words that do not start with a letter:

string clone_capitalisation(const string& a, const string& w) {
    if (w.size() == 0 || !isalpha(w[0]))  // empty or not a letter
        return a;                         //   => use adj as it is
    else {
        if (islower(w[0]))   // lowercase
            return strtolower(a);
        else return w.size() == 1 || isupper(w[1]) ? strtoupper(a) : strcapitalize(a);
    }
}

None of these functions modifies the original strings!
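
To make the rule concrete, here is a compact restatement of it that can be compiled and checked on its own. It is not the code above, just a sketch of the same decision logic; the names clone_case, to_upper and to_lower are made up for this example.

```cpp
#include <cctype>
#include <string>

// Helpers for this sketch only (names are assumptions).
std::string to_upper(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}
std::string to_lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Same decision rule as described above:
//   reference word empty or not starting with a letter -> adjective unchanged
//   first letter lowercase                              -> all lowercase
//   one uppercase letter, or first two uppercase        -> all uppercase
//   otherwise (upper followed by lower)                 -> capitalized
std::string clone_case(const std::string& adj, const std::string& ref) {
    if (ref.empty() || !std::isalpha(static_cast<unsigned char>(ref[0])))
        return adj;
    if (std::islower(static_cast<unsigned char>(ref[0])))
        return to_lower(adj);
    if (ref.size() == 1 || std::isupper(static_cast<unsigned char>(ref[1])))
        return to_upper(adj);
    std::string result = to_lower(adj);
    if (!result.empty())
        result[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(result[0])));
    return result;
}
```

So clone_case("happy", "SHARK") yields "HAPPY", clone_case("happy", "Shark") yields "Happy", clone_case("HAPPY", "shark") yields "happy", and a reference word such as "42" leaves the adjective as it was. Only the first two characters of the reference word are inspected, so a mixed-case word like "McDonald" is treated as capitalized.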

Now to main(): I don't like having to manually list every possible combination of upper and lowercase for the articles, so I compare in uppercase only.

I also don't like looping sequentially through all possible articles for every word. If there were many more articles, this would not perform well. So I prefer to use a <set>:

...
set<string> articles  { "A", "AN", "THE" };   // shorter isn't it ? 
...
while (getline(rfile, line)) {
    istringstream iss(line);
    string word;
    while (iss >> word) {     // loop over the words of the line
        cout << word << " ";  // output the word in any case
        if (articles.find(strtoupper(word))!=articles.end()) {  // article found ?
            if (iss >> word) {  // then read the next word
                cout << clone_capitalisation(adj, word) << " " << word << " ";
            }
            // if there is no next word on the line, the article was already printed above
        }
    }
    cout << endl; 
}
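
The lookup idea used in that loop (normalize the word to uppercase, then search a set) can be checked in isolation. A minimal sketch, assuming C++11; the function name is_article_ci is made up for this example:

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper: case-insensitive article test via a set of uppercase forms.
bool is_article_ci(const std::string& word) {
    static const std::set<std::string> articles = { "A", "AN", "THE" };
    std::string upper = word;
    for (char& c : upper)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return articles.count(upper) != 0;  // count() is 1 if present, 0 otherwise
}
```

This accepts "tHe" and "aN" but rejects "then". It also rejects "the," because operator>> splits on whitespace only, so any punctuation attached to a word stays part of the token. If the input can contain such tokens, the punctuation would have to be stripped before the lookup.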


Source: https://stackoverflow.com/questions/28507718/c-searching-a-line-from-a-file-for-certain-words-and-then-inserting-a-word-aft
