Hi :) What I'm trying to do is write a simple program that expands shorthand entries into the full sequence,
for example:
a-z or 0-9 or a-b-c or a-z0-9
Based on the fact that the existing function addresses "a-z" and "0-9" sequences just fine, separately, we should explore what happens when they meet. Trace your code (try printing each variable's value at each step -- yes it will be cluttered, so use line breaks), and I believe you will find a logical short-circuit when iterating, for example, from "current token is 'y' and next token is 'z'" to "current token is 'z' and next token is '0'". Explore the if() condition and you will find that it does not cover all possibilities, i.e. you have covered yourself if you are within a<-->z, within 0<-->9, or exactly equal to '-', but you have not considered being at the end of one (a-z or 0-9) with your next character at the start of the next.
Here is a C version (in about 38 effective lines) that satisfies the same test as my earlier C++ version.
The full test program, including your test cases, mine, and some torture tests, can be seen live at http://ideone.com/sXM7b#info_3915048
I'm pretty sure I'm overstating the requirements, but
a-c-b
can't happen(char*) 0
)printf("%c", c)
each char without using extraneous functions. I put in some comments to explain what happens and why, but overall you'll find that the code is much more legible anyway, by
*it=='-'
or predicate(*it)
will just return false if it is the null character. Shortcut evaluation prevents us from accessing past-the-end input characters.
One caveat: I haven't implemented a proper check for output buffer overrun (the capacity is hardcoded at 2048 chars). I'll leave it as the proverbial exercise for the reader.
Last but not least, the reason I did this:
Without further ado, the implementation, including the testcase:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int alpha_range(char c) { return (c>='a') && (c<='z'); }
int digit_range(char c) { return (c>='0') && (c<='9'); }

char* expand(const char* s)
{
    char buf[2048];

    const char* in  = s;
    char* out = buf;

    // parser state
    int (*predicate)(char) = 0; // either: NULL (free state), alpha_range (in alphabetic range), digit_range (in digit range)
    char lower=0,upper=0;       // tracks lower and upper bound of character ranges in the range parsing states

    // init
    *out = 0;

    while (*in)
    {
        if (!predicate)
        {
            // free parsing state
            if (alpha_range(*in) && (in[1] == '-') && alpha_range(in[2]))
            {
                lower = upper = *in++;
                predicate = &alpha_range;
            }
            else if (digit_range(*in) && (in[1] == '-') && digit_range(in[2]))
            {
                lower = upper = *in++;
                predicate = &digit_range;
            }
            else *out++ = *in;
        } else
        {
            // in a range
            if (*in < lower) lower = *in;
            if (*in > upper) upper = *in;

            if (in[1] == '-' && predicate(in[2]))
                in++; // more coming
            else
            {
                // end of range mode, dump expansion
                char c;
                for (c=lower; c<=upper; *out++ = c++);
                predicate = 0;
            }
        }
        in++;
    }

    *out = 0; // null-terminate buf
    return strdup(buf);
}

void dotest(const char* const input)
{
    char* ex = expand(input);
    printf("input : '%s'\noutput: '%s'\n\n", input, ex);

    if (ex)
        free(ex);
}

int main (int argc, char *argv[])
{
    dotest("a-z or 0-9 or a-b-c or a-z0-9"); // from the original post
    dotest("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6"); // from my C++ answer
    dotest("-x-s a-9 9- a-k-9 9-a-c-7-3"); // assorted torture tests

    return 0;
}
Test output:
input : 'a-z or 0-9 or a-b-c or a-z0-9'
output: 'abcdefghijklmnopqrstuvwxyz or 0123456789 or abc or abcdefghijklmnopqrstuvwxyz0123456789'
input : 'This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6'
output: 'This is some efghijklmnopqrstuvwxyz test in 567 steps; this works: abc. This works too: bcdefghijk. Likewise 45678'
input : '-x-s a-9 9- a-k-9 9-a-c-7-3'
output: '-stuvwx a-9 9- abcdefghijk-9 9-abc-34567'
OK, I tested your program and it seems to work for nearly every case. It correctly expands a-z and other expansions with only two letters/numbers. It fails when there are more letters and numbers. The fix is easy: add a new char that keeps the last printed character, and if the character about to be printed matches the last one, skip it. The a-z0-9 scenario didn't work because you wrote s[i] > '0' instead of s[i] >= '0'. The code is:
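#include <stdio.h>
#include <string.h>

void expand(char s[])
{
    int i,g,n,c,l;
    n=c=0;
    l=0; // last printed character; 0 means nothing has been printed yet
    int len = (int)strlen(s);

    if(len == 0) // nothing to expand; also keeps s[1] below in bounds
    {
        printf("\n");
        return;
    }

    for(i = 1; (s[i] >= '0' && s[i] <= '9') || (s[i] >= 'a' && s[i] <= 'z') || s[i]=='-'; i++)
    {
        c = s[i-1]; // character before the current one
        g = s[i];   // current character
        n = s[i+1]; // character after the current one
        //printf("\nc = %c g = %c n = %c\n", c,g,n);
        if(s[0] == '-')
            printf("%c",s[0]);
        else if(g == '-')
        {
            if(c<n)
            {
                if (c != l){
                    while(c <= n)
                    {
                        printf("%c", c);
                        c++;
                    }
                    l = c - 1;
                    //printf("\nl is %c\n", l);
                }
                else
                {
                    // c was already printed as the end of the previous range, skip it
                    c++;
                    while(c <= n)
                    {
                        printf("%c", c);
                        c++;
                    }
                    l = c - 1;
                    //printf("\nl is %c\n", l);
                }
            }
            else if(c == n)
                printf("%c",g);
            else if(n != '-')
                printf("%c",g);
            else if(c != '-')
                printf("%c",g);
        }
        else if(g == n)
        {
            while(g == n)
            {
                printf("%c",s[i]);
                g++;
            }
        }
        else if( s[len] == '-')
            printf("%c",s[len]);
    }
    printf("\n");
}

int main (int argc, char *argv[])
{
    if(argc < 2) // argv[1] is only valid when an argument was given
    {
        fprintf(stderr, "usage: %s string\n", argv[0]);
        return 1;
    }
    expand(argv[1]);
    return 0;
}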
Isn't this problem from K&R? I think I saw it there. Anyway I hope I helped.
Just for fun, I decided to demonstrate to myself that C++ is really just as suited to this kind of thing.
First, let me define the requirements a little more strictly: I assumed it needs to handle these cases:
int main()
{
    const std::string in("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6");
    std::cout << "input : " << in << std::endl;
    std::cout << "output: " << expand(in) << std::endl;
}
input : This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6
output: This is some efghijklmnopqrstuvwxyz test in 567 steps; this works: abc. This works too: bcdefghijk. Likewise 45678
Here is an implementation (actually a few variants) in 14 lines (23 including whitespace and comments) of C++0x code [1]:
static std::string expand(const std::string& in)
{
    static const regex re(R"([a-z](?:-[a-z])+|[0-9](?:-[0-9])+)");

    std::string out;

    auto tail = in.begin();
    for (auto match : make_iterator_range(sregex_iterator(in.begin(), in.end(), re), sregex_iterator()))
    {
        out.append(tail, match[0].first);

        // char range bounds: the cost of accepting unordered ranges...
        char a=127, b=0;
        for (auto x=match[0].first; x<match[0].second; x+=2)
            { a = std::min(*x,a); b = std::max(*x,b); }

        for (char c=a; c<=b; out.push_back(c++));
        tail = match.suffix().first;
    }
    out.append(tail, in.end());

    return out;
}
Of course I'm cheating a little because I'm using regex iterators from Boost. I will do some timings comparing to the C version for performance. I rather expect the C++ version to compete within a 50% margin. But let's see what kind of surprises the GNU compiler has in store for us :)
Here is a complete program that demonstrates the sample input. It also contains some benchmark timings and a few variations that trade-off
#include <algorithm> // std::min, std::max
#include <iostream>
#include <string>
#include <set> // only needed for the 'slow variant'
#include <boost/regex.hpp>
#include <boost/range.hpp>

using namespace boost;
using namespace boost::range;

static std::string expand(const std::string& in)
{
    // static const regex re(R"([a-z]-[a-z]|[0-9]-[0-9])"); // "a-c-d" --> "abc-d", "a-c-e-g" --> "abc-efg"
    static const regex re(R"([a-z](?:-[a-z])+|[0-9](?:-[0-9])+)");

    std::string out;
    out.reserve(in.size() + 12); // heuristic

    auto tail = in.begin();
    for (auto match : make_iterator_range(sregex_iterator(in.begin(), in.end(), re), sregex_iterator()))
    {
        out.append(tail, match[0].first);

        // char range bounds: the cost of accepting unordered ranges...
#if !SIMPLE_BUT_SLOWER
        // debug 15.149s / release 8.258s (at 1024k iterations)
        char a=127, b=0;
        for (auto x=match[0].first; x<match[0].second; x+=2)
            { a = std::min(*x,a); b = std::max(*x,b); }

        for (char c=a; c<=b; out.push_back(c++));
#else // simpler but slower
        // debug 24.962s / release 10.270s (at 1024k iterations)
        std::set<char> bounds(match[0].first, match[0].second);
        bounds.erase('-');

        for (char c=*bounds.begin(); c<=*bounds.rbegin(); out.push_back(c++));
#endif
        tail = match.suffix().first;
    }
    out.append(tail, in.end());

    return out;
}

int main()
{
    const std::string in("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6");
    std::cout << "input : " << in << std::endl;
    std::cout << "output: " << expand(in) << std::endl;
}
[1] Compiled with g++-4.6 -std=c++0x