I am using std::string's find() method to test if a string is a substring of another. Now I need a case-insensitive version of the same thing. For string comparison I can always turn to stricmp(), but there doesn't seem to be a stristr().
I have found various answers and most suggest using Boost, which is not an option in my case. Additionally, I need to support std::wstring/wchar_t. Any ideas?
You could use std::search with a custom predicate.
#include <algorithm>
#include <iostream>
#include <locale>
#include <string>
// Templated version of my_equal so it works with both char and wchar_t
template<typename charT>
struct my_equal {
    my_equal( const std::locale& loc ) : loc_(loc) {}
    bool operator()(charT ch1, charT ch2) const {
        return std::toupper(ch1, loc_) == std::toupper(ch2, loc_);
    }
private:
    std::locale loc_; // stored by value, so it can never dangle
};
// Find substring (case insensitive); returns the position, or -1 if not found
template<typename T>
int ci_find_substr( const T& str1, const T& str2, const std::locale& loc = std::locale() )
{
    typename T::const_iterator it = std::search( str1.begin(), str1.end(),
        str2.begin(), str2.end(), my_equal<typename T::value_type>(loc) );
    if ( it != str1.end() ) return static_cast<int>(it - str1.begin());
    else return -1; // not found
}
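int main(int argc, char *argv[])
{
    // string test
    std::string str1 = "FIRST HELLO";
    std::string str2 = "hello";
    int f1 = ci_find_substr( str1, str2 );
    // wstring test (Russian for "HELLO AGAIN" / "hello").
    // The default "C" locale may not know Cyrillic case mappings; if f2 is -1,
    // pass a suitable locale as the third argument.
    std::wstring wstr1 = L"ОПЯТЬ ПРИВЕТ";
    std::wstring wstr2 = L"привет";
    int f2 = ci_find_substr( wstr1, wstr2 );
    std::cout << f1 << " " << f2 << std::endl;
    return 0;
}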
The newer C++11 style, with a lambda as the predicate:
#include <algorithm>
#include <string>
#include <cctype>
/// Try to find in the Haystack the Needle - ignore case
bool findStringIC(const std::string & strHaystack, const std::string & strNeedle)
{
    auto it = std::search(
        strHaystack.begin(), strHaystack.end(),
        strNeedle.begin(), strNeedle.end(),
        // unsigned char parameters: std::toupper is undefined for negative char values
        [](unsigned char ch1, unsigned char ch2) { return std::toupper(ch1) == std::toupper(ch2); }
    );
    return (it != strHaystack.end() );
}
An explanation of std::search can be found on cplusplus.com.
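If a position is more useful than a bool, the iterator returned by std::search can be turned into an offset, mirroring std::string::find's use of npos for "not found". A minimal sketch; the name findStringICPos is hypothetical, and the comparison is byte-wise, so only ASCII letters are folded:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: case-insensitive find with std::string::find semantics.
// Returns the match position, or std::string::npos if there is no match.
std::string::size_type findStringICPos(const std::string& haystack,
                                       const std::string& needle)
{
    auto it = std::search(
        haystack.begin(), haystack.end(),
        needle.begin(), needle.end(),
        [](unsigned char ch1, unsigned char ch2) {
            // unsigned char avoids undefined behavior for negative char values
            return std::toupper(ch1) == std::toupper(ch2);
        });
    // An empty needle matches at position 0, like std::string::find
    if (it == haystack.end() && !needle.empty())
        return std::string::npos;
    return static_cast<std::string::size_type>(it - haystack.begin());
}
```

For example, findStringICPos("FIRST HELLO", "hello") yields 6.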
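Why not just convert both strings to lowercase before you call find()?
Caveats:
- Inefficient for long strings, since both strings are copied and converted before the search starts.
- Beware of internationalization issues: per-character case conversion does not handle every language correctly.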
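A minimal sketch of that approach, assuming byte-wise "C" locale semantics (the function names are hypothetical). Each call builds two lowercased copies, which is where the cost for long strings comes from, and per-character conversion cannot express mappings such as German "ß" to "SS", one example of the internationalization issue:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lowercased copy of s (byte-wise, "C" locale semantics).
std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Case-insensitive find built on std::string::find.
std::string::size_type find_ci_lower(const std::string& haystack,
                                     const std::string& needle)
{
    return to_lower_copy(haystack).find(to_lower_copy(needle));
}
```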
Why not use Boost.StringAlgo:
#include <boost/algorithm/string/find.hpp>
#include <string>
bool Foo()
{
    // case-insensitive find
    const std::string str("Hello");
    boost::iterator_range<std::string::const_iterator> rng =
        boost::ifind_first(str, std::string("EL"));
    return !rng.empty(); // an empty range means "not found"
}
Since you're doing substring searches (std::string) and not element (character) searches, there's unfortunately no existing solution I'm aware of that's immediately accessible in the standard library to do this.
Nevertheless, it's easy enough to do: simply convert both strings to upper case (or both to lower case - I chose upper in this example).
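#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
// std::toupper must receive a value representable as unsigned char
static char to_upper_char(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
std::string upper_string(const std::string& str)
{
    std::string upper;
    std::transform(str.begin(), str.end(), std::back_inserter(upper), to_upper_char);
    return upper;
}
std::string::size_type find_str_ci(const std::string& str, const std::string& substr)
{
    return upper_string(str).find(upper_string(substr));
}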
This is not a fast solution (bordering on pessimization), but it's the only one I know of offhand. It's also not that hard to implement your own case-insensitive substring finder if you are worried about efficiency.
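For instance, a plain nested-loop search can compare characters in place instead of building uppercased copies. This is only a sketch (the names are hypothetical): it is still O(n*m) in the worst case and still byte-wise, so it removes the allocations but not the internationalization limits.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Compares two chars case-insensitively ("C" locale, byte-wise).
inline bool ci_char_equal(char a, char b)
{
    return std::toupper(static_cast<unsigned char>(a)) ==
           std::toupper(static_cast<unsigned char>(b));
}

// Naive case-insensitive search that allocates nothing.
// Returns the match position, or std::string::npos if there is none.
std::string::size_type find_ci_noalloc(const std::string& str,
                                       const std::string& substr)
{
    if (substr.size() > str.size())
        return std::string::npos;
    for (std::size_t i = 0; i + substr.size() <= str.size(); ++i) {
        std::size_t j = 0;
        while (j < substr.size() && ci_char_equal(str[i + j], substr[j]))
            ++j;
        if (j == substr.size())
            return i; // every character of substr matched at offset i
    }
    return std::string::npos;
}
```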
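As for the follow-up ("Additionally, I need to support std::wstring/wchar_t. Any ideas?"): the tolower/toupper overloads from <locale> work on wide characters as well, so the solution above carries over by changing std::string to std::wstring, provided you call the <locale> overload (or std::towupper from <cwctype>) rather than the <cctype> toupper, which is only defined for values representable as unsigned char.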
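A wide-character sketch of the same idea, using the std::toupper(charT, const std::locale&) overload from <locale> (the function names are hypothetical). With the default "C" global locale only ASCII letters are converted; for other scripts you would pass a named locale, assuming one is installed on the system.

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <string>

// Uppercases a wide string using the ctype<wchar_t> facet of the given locale.
std::wstring upper_wstring(const std::wstring& str,
                           const std::locale& loc = std::locale())
{
    std::wstring upper;
    upper.reserve(str.size());
    std::transform(str.begin(), str.end(), std::back_inserter(upper),
                   [&loc](wchar_t ch) { return std::toupper(ch, loc); });
    return upper;
}

// Case-insensitive find for wide strings; npos if not found.
std::wstring::size_type find_wstr_ci(const std::wstring& str,
                                     const std::wstring& substr,
                                     const std::locale& loc = std::locale())
{
    return upper_wstring(str, loc).find(upper_wstring(substr, loc));
}
```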
[Edit] An alternative, as pointed out, is to adapt your own case-insensitive string type from basic_string by specifying your own character traits. This works if you can accept all string searches, comparisons, etc. to be case-insensitive for a given string type.
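A minimal sketch of such a traits class for char, along the lines of the well-known ci_string idiom (ci_char_traits and ci_string are illustrative names, and comparison is byte-wise ASCII). It overrides eq, lt, compare and find, which basic_string uses for searching and comparing; everything else is inherited from std::char_traits<char>.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Character traits that compare chars case-insensitively ("C" locale, byte-wise).
struct ci_char_traits : public std::char_traits<char> {
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], a)) return s + i;
        return nullptr;
    }
};

// A string type whose find(), compare() and == ignore case.
typedef std::basic_string<char, ci_char_traits> ci_string;
```

Because ci_string is a distinct type, converting from a std::string goes through c_str() (for example ci_string(s.c_str())), which is the price of making every operation on that type case-insensitive.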
It also makes sense to provide a Boost version. Note that this modifies the original strings.
#include <boost/algorithm/string.hpp>
#include <string>
int main()
{
    std::string str1 = "hello world!!!";
    std::string str2 = "HELLO";
    boost::algorithm::to_lower(str1); // modifies str1 in place
    boost::algorithm::to_lower(str2); // modifies str2 in place
    if (str1.find(str2) != std::string::npos)
    {
        // str1 contains str2
    }
}
Or, using the excellent Boost.Xpressive library:
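#include <boost/xpressive/xpressive.hpp>
#include <iostream>
#include <string>
using namespace boost::xpressive;
int main()
{
    std::string long_string( "very LonG string" );
    std::string word("long");
    smatch what;
    sregex re = sregex::compile(word, regex_constants::icase);
    // regex_search finds a match anywhere in the string; regex_match would
    // require the whole string to match and would fail here
    if( regex_search( long_string, what, re ) )
    {
        std::cout << word << " found!" << std::endl;
    }
}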
In this example, make sure your search word does not contain any regex special characters (or escape them first).
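The same check can be done with the standard <regex> library (C++11) instead of Boost.Xpressive; this is a swapped-in alternative, not part of the answer above. The sketch (helper names are hypothetical) escapes the search word first, which lifts the restriction on special characters, and uses std::regex_search, since std::regex_match only succeeds when the entire string matches.

```cpp
#include <regex>
#include <string>

// Escapes characters that are special in ECMAScript regex syntax,
// so the word is matched literally.
std::string regex_escape(const std::string& word)
{
    static const std::string special = "\\^$.|?*+()[]{}";
    std::string escaped;
    for (char c : word) {
        if (special.find(c) != std::string::npos)
            escaped += '\\';
        escaped += c;
    }
    return escaped;
}

// Case-insensitive "contains" check built on std::regex_search.
bool contains_ci_regex(const std::string& text, const std::string& word)
{
    std::regex re(regex_escape(word),
                  std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(text, re);
}
```

Compiling a regex per call is comparatively expensive, so for repeated searches with the same word you would construct the std::regex once and reuse it.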
If you want "real" comparison according to Unicode and locale rules, use ICU's Collator class.
#include <cctype>
#include <iostream>
#include <string>
using namespace std;
// Character wrapper whose conversion to charT uppercases it, so comparisons
// made through the generic char_traits become case-insensitive
template <typename charT>
struct ichar {
    operator charT() const { return toupper(x); }
    charT x;
};
// Reinterpret a string as a string of ichar (a type-punning cast)
template <typename charT>
static basic_string<ichar<charT> > *istring(basic_string<charT> &s) { return (basic_string<ichar<charT> > *)&s; }
template <typename charT>
static ichar<charT> *istring(const charT *s) { return (ichar<charT> *)s; }
int main()
{
    string s = "The STRING";
    wstring ws = L"The WSTRING";
    cout << istring(s)->find(istring("str")) << " " << istring(ws)->find(istring(L"wstr")) << endl;
}
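A little bit dirty, but short and fast. Be aware that the pointer casts are undefined behavior, and that instantiating basic_string with a user-defined character type relies on the library providing a generic std::char_traits, which the standard does not guarantee.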
I love the answers from Kiril V. Lyadvinsky and CC., but my problem was a little more specific than just case-insensitivity: I needed a lazy, Unicode-supporting command-line argument parser that could eliminate false positives/negatives in alphanumeric string searches where the base string might contain special characters around the alphanumeric keywords I was searching against. For example, Wolfjäger shouldn't match jäger, but <jäger> should.
It's basically just Kiril's/CC's answer with extra handling for alphanumeric exact-length matches.
#include <algorithm>
#include <cwctype>
#include <string>
/* Intended for alphanumeric substring parameters only; results are unspecified otherwise. */
bool find_alphanum_string_CI(const std::wstring& baseString, const std::wstring& subString)
{
    /* Fail fast if the base string is smaller than what we're looking for */
    if (subString.length() > baseString.length())
        return false;
    auto it = std::search(
        baseString.begin(), baseString.end(), subString.begin(), subString.end(),
        [](wchar_t ch1, wchar_t ch2)
        {
            /* Compare as wide characters; a char lambda would truncate wchar_t */
            return std::towupper(ch1) == std::towupper(ch2);
        }
    );
    if(it == baseString.end())
        return false;
    size_t match_start_offset = it - baseString.begin();
    std::wstring match_start = baseString.substr(match_start_offset, std::wstring::npos);
    /* Typical special characters and whitespace to split the substring up. */
    size_t match_end_pos = match_start.find_first_of(L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$#@!~`");
    /* Pass fast if the remainder of the base string where
       the match started is the same length as the substring. */
    if (match_end_pos == std::wstring::npos && match_start.length() == subString.length())
        return true;
    /* Only the end of the (first) match is checked; the character before it is not. */
    std::wstring extracted_match = match_start.substr(0, match_end_pos);
    return (extracted_match.length() == subString.length());
}
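The function above checks only where the first match ends; it does not look at the character preceding the match, so searching for jäger in Wolfjäger still succeeds. A sketch that checks both boundaries and keeps searching past rejected occurrences is below. The name find_whole_word_ci is hypothetical, and it assumes std::iswalnum from <cwctype> as the definition of "alphanumeric", whose results for non-ASCII letters depend on the current C locale.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// True only if subString occurs case-insensitively in baseString as a whole
// alphanumeric token: the characters just before and just after the match
// (if any) must not be alphanumeric.
bool find_whole_word_ci(const std::wstring& baseString, const std::wstring& subString)
{
    if (subString.empty())
        return false;
    auto eq = [](wchar_t a, wchar_t b) { return std::towupper(a) == std::towupper(b); };
    auto it = baseString.begin();
    while (true) {
        it = std::search(it, baseString.end(), subString.begin(), subString.end(), eq);
        if (it == baseString.end())
            return false;
        auto after = it + static_cast<std::wstring::difference_type>(subString.size());
        bool start_ok = (it == baseString.begin()) || !std::iswalnum(*(it - 1));
        bool end_ok = (after == baseString.end()) || !std::iswalnum(*after);
        if (start_ok && end_ok)
            return true;
        ++it; // this occurrence is embedded in a longer word; keep looking
    }
}
```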
Source: https://stackoverflow.com/questions/3152241/case-insensitive-stdstring-find