How to strip all non alphanumeric characters from a string in c++?

有些话、适合烂在心里 提交于 2019-12-17 07:18:53

问题


I am writing a piece of software, and It require me to handle data I get from a webpage with libcurl. When I get the data, for some reason it has extra line breaks in it. I need to figure out a way to only allow letters, numbers, and spaces. And remove everything else, including line breaks. Is there any easy way to do this? Thanks.


回答1:


Write a function that takes a char and returns true if you want to remove that character or false if you want to keep it:

bool my_predicate(char c);

Then use the std::remove_if algorithm to remove the unwanted characters from the string:

std::string s = "my data";
s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());

Depending on your requirements, you may be able to use one of the Standard Library predicates, like std::isalnum, instead of writing your own predicate (you said you needed to match alphanumeric characters and spaces, so perhaps this doesn't exactly fit what you need).

If you want to use the Standard Library std::isalnum function, you will need a cast to disambiguate between the std::isalnum function in the C Standard Library header <cctype> (which is the one you want to use) and the std::isalnum in the C++ Standard Library header <locale> (which is not the one you want to use, unless you want to perform locale-specific string processing):

s.erase(std::remove_if(s.begin(), s.end(), (int(*)(int))std::isalnum), s.end());

This works equally well with any of the sequence containers (including std::string, std::vector and std::deque). This idiom is commonly referred to as the "erase/remove" idiom. The std::remove_if algorithm will also work with ordinary arrays. The std::remove_if makes only a single pass over the sequence, so it has linear time complexity.




回答2:


Previous uses of std::isalnum won't compile with std::ptr_fun without passing the unary argument is requires, hence this solution with a lambda function should encapsulate the correct answer:

s.erase(std::remove_if(s.begin(), s.end(), 
[]( auto const& c ) -> bool { return !std::isalnum(c); } ), s.end());



回答3:


You could always loop through and just erase all non alphanumeric characters if you're using string.

#include <cctype>

size_t i = 0;
size_t len = str.length();
while(i < len){
    if (!isalnum(str[i]) || str[i] == ' '){
        str.erase(i,1);
        len--;
    }else
        i++;
}

Someone better with the Standard Lib can probably do this without a loop.

If you're using just a char buffer, you can loop through and if a character is not alphanumeric, shift all the characters after it backwards one (to overwrite the offending character):

#include <cctype>

size_t buflen = something;
for (size_t i = 0; i < buflen; ++i)
    if (!isalnum(buf[i]) || buf[i] != ' ')
        memcpy(buf[i], buf[i + 1], --buflen - i);



回答4:


The remove_copy_if standard algorithm would be very appropriate for your case.




回答5:


#include <cctype>
#include <string>
#include <functional>

std::string s = "Hello World!";
s.erase(std::remove_if(s.begin(), s.end(),
    std::not1(std::ptr_fun(std::isalnum)), s.end()), s.end());
std::cout << s << std::endl;

Results in:

"HelloWorld"

You use isalnum to determine whether or not each character is alpha numeric, then use ptr_fun to pass the function to not1 which NOTs the returned value, leaving you with only the alphanumeric stuff you want.




回答6:


You can use the remove-erase algorithm this way -

// Removes all punctuation       
s.erase( std::remove_if(s.begin(), s.end(), &ispunct), s.end());



回答7:


Just extending James McNellis's code a little bit more. His function is deleting alnum characters instead of non-alnum ones.

To delete non-alnum characters from a string. (alnum = alphabetical or numeric)

  • Declare a function (isalnum returns 0 if passed char is not alnum)

    bool isNotAlnum(char c) {
        return isalnum(c) == 0;
    }
    
  • And then write this

    s.erase(remove_if(s.begin(), s.end(), isNotAlnum), s.end());
    

then your string is only with alnum characters.




回答8:


Below code should work just fine for given string s. It's utilizing <algorithm> and <locale> libraries.

std::string s("He!!llo  Wo,@rld! 12 453");
s.erase(std::remove_if(s.begin(), s.end(), [](char c) { return !std::isalnum(c); }), s.end());



回答9:


The following works for me.

str.erase(std::remove_if(str.begin(), str.end(), &ispunct), str.end());
str.erase(std::remove_if(str.begin(), str.end(), &isspace), str.end());



回答10:


void remove_spaces(string data)
{ int i=0,j=0;
    while(i<data.length())
    {
        if (isalpha(data[i]))
        {
        data[i]=data[i];
        i++;
        }
        else
            {
            data.erase(i,1);}
    }
    cout<<data;
}



回答11:


The mentioned solution

s.erase( std::remove_if(s.begin(), s.end(), &std::ispunct), s.end());

is very nice, but unfortunately doesn't work with characters like 'Ñ' in Visual Studio (debug mode), because of this line:

_ASSERTE((unsigned)(c + 1) <= 256)

in isctype.c

So, I would recommend something like this:

inline int my_ispunct( int ch )
{
    return std::ispunct(unsigned char(ch));
}
...
s.erase( std::remove_if(s.begin(), s.end(), &my_ispunct), s.end());


来源:https://stackoverflow.com/questions/6319872/how-to-strip-all-non-alphanumeric-characters-from-a-string-in-c

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!