问题
If I have:
const char *mystr = "cheesecakes";
FILE *myfile = fopen("path/to/file.exe","r");
I need to write a function to determine whether myfile
contains any occurrences of mystr
. Could anyone help me? Thanks!
UPDATE: So it turns out the platform I need to deploy to doesn't have memstr
. Does anyone know of a free implementation I can use in my code?
回答1:
If you can't fit the whole file into memory, and you have access to the GNU memmem()
extension, then:
- Read as much as you can into a buffer;
- Search the buffer with
memmem(buffer, len, mystr, strlen(mystr) + 1)
; - Discard all but the last
strlen(mystr)
characters of the buffer, and move those to the start; - Repeat until end of file reached.
If you don't have memmem
, then you can implement it in plain C using memchr
and memcmp
, like so:
/*
* The memmem() function finds the start of the first occurrence of the
* substring 'needle' of length 'nlen' in the memory area 'haystack' of
* length 'hlen'.
*
* The return value is a pointer to the beginning of the sub-string, or
* NULL if the substring is not found.
*/
void *memmem(const void *haystack, size_t hlen, const void *needle, size_t nlen)
{
int needle_first;
const void *p = haystack;
size_t plen = hlen;
if (!nlen)
return NULL;
needle_first = *(unsigned char *)needle;
while (plen >= nlen && (p = memchr(p, needle_first, plen - nlen + 1)))
{
if (!memcmp(p, needle, nlen))
return (void *)p;
p++;
plen = hlen - (p - haystack);
}
return NULL;
}
回答2:
Because there is no memmem or memstr to find a string in a binary array (others suggested to read it into memory and use strstr - no this doesn't work) you have to read it byte by byte with "fgetch" and write a small state machine to match it while reading.
回答3:
See here:
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
for a Boyer-Moore implementation in C99. This is a very common string searching algorithm and runs in O(n).
回答4:
Here is a slapped together version of it. It has no error checking and probably has overflow bugs. But I think it finds the desired string and accounts for the backtracking necessary for partial substring matches. I doubt there are more than 15 bugs left.
Edit: There was at least one in the first answer. I woke up in the middle of the night and realized the backtracking check was wrong. It didn't find '12123' in '1212123'. It might still be wrong, but at least it finds that one now.
int main( int argc, char* argv[] )
{
FILE *fp;
char *find, *hist;
int len, pos=0, hl=0, i;
char c;
fp = fopen( argv[1], "r" );
find = argv[2];
len = (int)strlen( find );
hist = malloc( len );
memset( hist, 0, len );
while ( !feof( fp )) {
c = fgetc( fp );
if ( find[pos++] == c ) {
if ( pos == len ) {
printf( "Found it\n" );
return 1;
}
}
else {
// check history buffer (kludge for backtracking)
if ( pos > 0 ) {
pos = 0;
for ( i = 0; i < len - 1; i++ )
if ( 0 == memcmp( hist+len-i-1, find, i + 1 )) {
// we had a mismatch, but the history matches up to len i
pos = i;
}
}
}
// update history buffer - this is innefficient - better as circular buffer
memmove( hist, hist + 1, len - 1 );
hist[len-1] = c;
}
printf( "Not found\n" );
}
回答5:
Here's a function that will search for a string in a buffer.
Limitations: it doesn't handle wide characters (in the case of internationalization). You'll have to write your own code to read the file into memory. It won't find the pattern if the pattern is split between 2 read buffers.
/*****************************************************
const char *buffer pointer to your read buffer (the larger, the better).
size_t bufsize the size of your buffer
const char *pattern pattern you are looking for.
Returns an index into the buffer if pattern is found.
-1 if pattern is not found.
Sample:
pos = findPattern (buffer, BUF_SIZE, "cheesecakes");
*****************************************************/
int findPattern (const char *buffer, size_t bufSize, const char *pattern)
{
int i,j;
int patternLen;
// minor optimization. Determine patternLen so we don't
// bother searching buffer if fewer than patternLen bytes remain.
patternLen = strlen (pattern);
for (i=0; i<bufSize-patternLen; ++i)
{
for (j=0; j<patternLen; ++j)
{
if (buffer[i+j] != pattern[j])
{
break;
}
}
if (j == patternLen)
{
return i;
}
}
return -1;
}
回答6:
chat match = "findthis";
int depth = 0;
while(not eof)
{
char ch = getonebyte();
if(ch == match[depth])
{
if (depth == strlen(match))
break;
else
depth++;
}
else
depth = 0;
}
roughly (I am sure there are off by ones in there)
来源:https://stackoverflow.com/questions/2188914/c-searching-for-a-string-in-a-file