Question
I was assigned to make some changes to a C program written by someone else, and I want to understand it first so that I can work on it properly. I came upon a function that generates a histogram of ASCII values from a given long string of data. It is something like this:
//load the symbols the old data
for(int k = 0; k < 256; ++k)
{
    sym[k].Symbol = k;
    sym[k].Count = 0;
}
//Creating the probability distribution for each of the source symbols.
for(int k = size; k; --k)
{
    sym[*in ++].Count ++;
}
Here 'in' is the char array (string) containing the characters to be counted, and sym is a struct variable. I can't quite understand how this works. Can anyone tell me how the second loop generates the count of the symbols 1 to 255 (ASCII) in the string?
Answer 1:
for(int k = 0; k < size; k++)
{
    sym[in[k]].Count++;
}
This is basically what that second loop is doing.
The original just dereferences the pointer and advances it to the next character in a single expression, then increments the counter for the character it just read.
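To check that the two forms really do the same thing, here is a small sketch that lets you run both on the same input. The struct layout (struct SymbolEntry with Symbol and Count) and the function names are assumptions, since the question doesn't show the real declarations. It also adds an unsigned char cast that the original code doesn't have, so that bytes above 127 can't produce a negative index.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: the question does not show the real struct. */
struct SymbolEntry {
    int Symbol;
    unsigned long Count;
};

/* Reset all 256 entries, as the first loop in the question does. */
void reset_table(struct SymbolEntry sym[256])
{
    for (int k = 0; k < 256; ++k) {
        sym[k].Symbol = k;
        sym[k].Count = 0;
    }
}

/* Pointer form, as in the question: read *in, advance in, bump the count. */
void count_with_pointer(struct SymbolEntry sym[256], const char *in, size_t size)
{
    assert(in != NULL);
    for (size_t k = size; k; --k)
        sym[(unsigned char)*in++].Count++;
}

/* Index form, as in this answer. */
void count_with_index(struct SymbolEntry sym[256], const char *in, size_t size)
{
    assert(in != NULL);
    for (size_t k = 0; k < size; k++)
        sym[(unsigned char)in[k]].Count++;
}
```

Calling both on "hello" with a size of 5 should leave the two tables identical, with a count of 2 at index 'l'.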
Answer 2:
In a word, poorly. The basic idea is pretty simple, but the code is needlessly complex. In particular, his Symbol member is completely useless.
What you'd normally want to do is something like this:
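#include <limits.h>   /* UCHAR_MAX */
#include <string.h>   /* strlen */

int counts[UCHAR_MAX + 1] = {0};   /* one slot per possible unsigned char value */
size_t len = strlen(input_string);
for (size_t i = 0; i < len; i++)
    ++counts[(unsigned char)input_string[i]];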
So, the basic idea here is pretty simple: walk through the string, and for each item in the string, increment the count for that character.
He's doing pretty much the same thing, but keeping the Count as a member of a structure, along with the Symbol. Since the Symbol is always equal to the subscript of that item, storing it is pointless and wasteful.
Other than that, he's counting down in his loop. That is probably a micro-optimization: on some machines, at least, the zero flag is set based on the value of the counter when it's decremented, so counting down to zero avoids a comparison in the loop. Given the amount he's wasting with his structure and unnecessarily storing the Symbol values, this makes no sense at all.
If you honestly cared about the code being close to optimum, you could write something more like this:
int counts[UCHAR_MAX + 1] = {0};
while (*in)
    ++counts[(unsigned char)*in++];
For anybody wondering about the cast, it's unnecessary if you're sure your input will always be true ASCII, which never has the high bit set. Since you can rarely guarantee much about the input, however, it's generally safer to cast to unsigned char. Otherwise, a character with its top bit set will typically be interpreted as a negative number and index outside the array bounds. Of course, char may be unsigned by default (whether plain char is signed is implementation-defined), but you can't count on that. On a typical (two's complement) machine, the cast doesn't require any extra operations; it just governs how the existing bit pattern will be interpreted.
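To make the cast concrete, here is a minimal sketch. The helper names byte_index and count_bytes are hypothetical, not from the original code, and it assumes 8-bit bytes. It shows that a byte with the top bit set, such as 0xE9, lands at index 233 once cast, whereas without the cast it would be a negative index on a platform where plain char is signed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Map a char to a valid index in 0..UCHAR_MAX, whatever the signedness of plain char. */
size_t byte_index(char c)
{
    return (unsigned char)c;
}

/* Count every byte of a buffer; counts must hold UCHAR_MAX + 1 zeroed slots. */
void count_bytes(const char *in, size_t size, unsigned long counts[UCHAR_MAX + 1])
{
    assert(in != NULL);
    for (size_t k = 0; k < size; ++k)
        ++counts[byte_index(in[k])];
}
```

Unlike the while (*in) loop above, this version takes an explicit size, so it also counts embedded zero bytes, which matters if the data is not a NUL-terminated string.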
Answer 3:
If 'in' is the input string, then *in++ takes each character in the string in turn and looks up the entry in the ASCII table sym[] corresponding to that character value.
So if the string starts with 'A', then (*in) is 65 and the code references sym[65].
Edit: sym[k].Symbol is a bit redundant; you can just have an array of 256 integers to represent the ASCII chart, since sym[n] must be for the symbol numbered 'n'.
Answer 4:
in++ increments in, a pointer to the character being read. *in++, parsed as *(in++), is the char currently read. It's also a number, and the algorithm takes advantage of this to use it as an index into an array. The appropriate count (the count of the character just read), sym[*in ++].Count, is incremented.
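The grouping matters here. As a small sketch (the function names are made up for illustration, and the test values assume an ASCII character set), compare the effect of *in++ with that of (*in)++, which would change the character itself and leave the pointer alone:

```c
#include <assert.h>
#include <stddef.h>

/* Does what *in++ does to the caller's pointer: returns the current char,
   then leaves the pointer one position further on. */
char read_and_advance(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    return *(*cursor)++;
}

/* Does what (*in)++ would do instead: the pointer stays put and the char
   it points to is incremented. Returns the char's value before the change. */
char bump_char_in_place(char *in)
{
    assert(in != NULL);
    return (*in)++;
}
```

With a pointer to "AB", read_and_advance returns 'A' and leaves the pointer at 'B'. With a writable buffer holding "AB", bump_char_in_place also returns 'A' but turns the buffer into "BB".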
Answer 5:
The second loop uses the value of the character pointed to by the pointer in to index the count array.
A really good way to investigate this code is to put a few printf statements around it. Print the value of *in, print the count after it is incremented. You will soon get the picture this way.
Another option is to run code that you don't understand through the debugger.
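For example, an instrumented copy of the loop might look like the sketch below. count_traced is a hypothetical name, and it uses a plain array of counts instead of the original struct to keep things short; the unsigned char cast is also an addition, to keep the index non-negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same counting loop, with a printf after each step.
   Returns the number of characters processed. */
size_t count_traced(const char *in, size_t size, unsigned long counts[256])
{
    assert(in != NULL);
    size_t processed = 0;
    for (size_t k = size; k; --k) {
        unsigned char c = (unsigned char)*in++;   /* the character just read */
        counts[c]++;
        printf("read %3u ('%c'), count is now %lu\n", (unsigned)c, c, counts[c]);
        ++processed;
    }
    return processed;
}
```

Running it on a short string such as "abca" prints one line per character, so you can watch the count for 'a' go from 1 to 2.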
Answer 6:
something++ means "add 1 to something, and return its value before the addition".
in is a pointer to the first character of the input.
So, *in++ means "move the input pointer one item onwards, and return the item it was pointing to".
So you can see that
sym[*in ++].Count ++;
means "move the input pointer one item onwards, and increment the Count field of the element in the array sym corresponding to the item it was pointing to"; and the enclosing loop does this size times, thus processing the whole input.
Source: https://stackoverflow.com/questions/1869062/histogram-generating-function