printf() debugging library using string table “decoder ring”

Question

I'm writing to see if any of you have ever seen or heard of an implementation of the idea I'm about to describe.

I'm interested in developing a printf-style debugging library for an embedded target. The target is extremely remote, and the comms bandwidth budget between me and the target is extremely tight, so I want to be able to get the debugging messages in a very efficient format.

Quite often, debug statements look something like the following:

myDebugLibraryPrintf("Inside loop, processing item %d out of %d.\n", i, numItems);

When this is expanded into text, the string printed is something like "Inside loop, processing item 5 out of 10.\n", a total of 42 bytes. Over 90% of the data printed by this statement is static and literal, known at compile time; only the "5" and "10" aren't.

What I'd like to do is be able to send back only those two integers (8 bytes instead of 42). Once I've received that data, I'd have some kind of "decoder ring" that lets me "reconstitute" the received data and print out the full debug message here at my location.

I'd generate the "decoder ring" by automatically (as part of the build process) giving every myDebugLibraryPrintf() statement a unique ID at compile time, and generating a table that maps those unique IDs to the original format strings. Then, any time myDebugLibraryPrintf() is called on the target, it transmits the unique ID and any of the "%d", "%f", etc. varargs values seen in the format string, but the format string itself is NOT transmitted. (I'll probably just disallow "%s" items for now...) Back at my location, we'll have a program that looks up the unique IDs in the table, finds the appropriate format string, and uses it to reconstruct the original debug message.
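A minimal sketch of the target-side half of this scheme, assuming 16-bit message IDs, 32-bit integer arguments, and little-endian byte order on the wire. The function name dbg_encode and the record layout are hypothetical, chosen only for illustration; this is not an existing library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire record: a 2-byte message ID followed by nargs 4-byte
   values, all little-endian. The format string itself is never sent. */
size_t dbg_encode(uint8_t *buf, size_t bufsz, uint16_t id,
                  const int32_t *args, size_t nargs)
{
    size_t need = 2 + 4 * nargs;
    if (need > bufsz)
        return 0;                       /* record does not fit: emit nothing */
    buf[0] = (uint8_t)(id & 0xFF);
    buf[1] = (uint8_t)(id >> 8);
    for (size_t i = 0; i < nargs; ++i) {
        uint32_t v = (uint32_t)args[i];
        uint8_t *p = buf + 2 + 4 * i;
        p[0] = (uint8_t)(v & 0xFF);
        p[1] = (uint8_t)((v >> 8) & 0xFF);
        p[2] = (uint8_t)((v >> 16) & 0xFF);
        p[3] = (uint8_t)(v >> 24);
    }
    return need;                        /* number of bytes to transmit */
}
```

With this layout the loop message above costs 10 bytes on the wire (2 for the ID, 4 each for i and numItems) instead of 42.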

I feel like someone has probably had this idea before and I figured maybe someone in the community would have seen something like it (or even know of an open-source library that does this).

Constraints:

To clarify, I'm dealing with C/C++ here, and I'm not interested in a 100%-complete replacement implementation of printf() -- things like non-literal format strings, %s (string) format specifiers, or more advanced format specifiers like putting the width or precision in the varargs list with %*.*d don't need to be supported.
I want the string table to be generated automatically as part of the build process so that adding debug involves no more work than adding a traditional printf(). If any more than the minimum amount of effort is required, nobody on my project will use it.
Doing extra work as part of the build process to generate the string table is pretty much assumed. Fortunately, I have control of all the source code that I'm interested in using this library with, and I have a lot of flexibility within the build process.

Thanks!

Answer 1:

I've only seen this idea implemented with a pre-defined set of strings. The code would look like debug_print(INSIDE_LOOP_MSG_ID, i, n). When developers wanted to add new messages they would have to put the new text in a specific header file and give it a new ID.
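A hand-maintained header like that is often written as an X-macro list so the ID and its text cannot drift apart. The sketch below assumes that style; DEBUG_MESSAGES, LOOP_DONE_MSG_ID, and debug_format are hypothetical names, not taken from any real project.

```c
#include <stddef.h>

/* Hypothetical message list: one line per message, ID and text together. */
#define DEBUG_MESSAGES(X) \
    X(INSIDE_LOOP_MSG_ID, "Inside loop, processing item %d out of %d.\n") \
    X(LOOP_DONE_MSG_ID,   "Loop finished after %d items.\n")

/* Target and host both use the enum; only the host needs the strings. */
#define AS_ENUM(id, text) id,
enum debug_msg_id { DEBUG_MESSAGES(AS_ENUM) DEBUG_MSG_COUNT };

#define AS_TEXT(id, text) text,
static const char *const debug_formats[] = { DEBUG_MESSAGES(AS_TEXT) };

/* Host-side lookup; returns NULL for an ID the table does not know. */
const char *debug_format(int id)
{
    if (id < 0 || id >= DEBUG_MSG_COUNT)
        return NULL;
    return debug_formats[id];
}
```

Adding a message is then a one-line change, but it is still an extra step compared with writing a plain printf(), which is the cost the question wants to avoid.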

I think the idea of generating it on the fly from a normal-looking print statement is an interesting challenge. I haven't come across any existing implementations.

One idea might be a macro/template that turns the first string argument into a hash value at compile time. The developer writes debug_print("test %d",i), which gets compiled to something like debug_port_send(0x1d35, i). Writing a post-processing script to extract the strings and hashes for use on the receiving side should be simple. (The simplest way to resolve hash collisions would be to report an error and force the user to alter the wording slightly.)
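The collision check mentioned above could be sketched as follows. This assumes 32-bit FNV-1a as a stand-in hash; it is not the hash the answer links to, and fnv1a32 and find_collision are hypothetical helper names for a build-time tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a; a stand-in for whatever compile-time hash the build uses. */
uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (; *s; ++s) {
        h ^= (uint8_t)*s;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Returns the index of the first string whose hash equals that of an
   earlier, different string, or -1 if no such collision exists.
   Identical strings are not reported: they decode to the same text. */
int find_collision(const char *const *strings, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        for (size_t j = 0; j < i; ++j)
            if (fnv1a32(strings[i]) == fnv1a32(strings[j]) &&
                strcmp(strings[i], strings[j]) != 0)
                return (int)i;
    return -1;
}
```

A build step could run find_collision over the extracted strings and fail the build with the offending message, which is the "alter the wording slightly" workflow described above.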

edit:
So I tried this with the compile-time hash at the link above.

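#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* CONSTHASH is the compile-time string hash macro from the page linked above;
   its definition is not reproduced here. */
#define QQuot_(x) #x
#define QQuote(x) QQuot_(x)
/* Adjacent string literals concatenate on their own, so the line number and
   the format string are hashed together without a non-portable ## paste. */
#define Debug_Print(s, v) (Send(CONSTHASH(QQuote(__LINE__) s), Bits32(&(v))))

/* Copies the raw bits of a 4-byte int or float, avoiding the aliasing cast. */
static uint32_t Bits32(const void *p)
{
   uint32_t bits;
   memcpy(&bits, p, sizeof bits);
   return bits;
}

void Send(uint32_t hash, uint32_t value)
{
   printf("Sending %x %x\n", (unsigned)hash, (unsigned)value); //replace with COMMS
}


int main()
{
   int i = 1;
   float f = 3.14f;
   Debug_Print("This is a test %d", i);
   i++;
   Debug_Print("This is a test %d", i);
   Debug_Print("This was test %f", f);
}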

With a little more cleverness you could support multiple arguments. Examining the disassembly shows that all the hashes are indeed computed at compile time. Output is as expected: identical strings do not collide, because the line number is part of the hashed text. (This page confirms the hex is correct for 3.14):

Sending 94b7555c 1
Sending 62fce13e 2
Sending 506e9a0c 4048f5c3

All you need now is a text-processing script that can be run on the code, which extracts the strings from Debug_Print, calculates the hashes, and populates a table on the receiver side. The receiver gets a hash value from the Send call, looks up the string that goes with it, and passes that, along with the argument(s), to a normal printf call.
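A rough sketch of that receiver-side lookup, assuming one 32-bit argument per message as in the demo above and a 32-bit IEEE-754 float on both ends. The struct ring_entry and ring_decode names are hypothetical. Because floats were sent as raw bits, the decoder has to reinterpret them before handing them to printf-style formatting.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One row of the generated table: hash sent by the target, original format. */
struct ring_entry { uint32_t hash; const char *format; };

/* Rebuilds one message from a (hash, raw 32-bit value) pair.
   Returns the snprintf result, or -1 if the hash is not in the table. */
int ring_decode(const struct ring_entry *ring, size_t n, uint32_t hash,
                uint32_t raw, char *out, size_t outsz)
{
    for (size_t i = 0; i < n; ++i) {
        if (ring[i].hash != hash)
            continue;
        if (strstr(ring[i].format, "%f")) {
            float f;                    /* value arrived as raw float bits */
            memcpy(&f, &raw, sizeof f);
            return snprintf(out, outsz, ring[i].format, (double)f);
        }
        return snprintf(out, outsz, ring[i].format, (int)raw);
    }
    return -1;
}
```

A fuller decoder would walk every conversion specifier in the format string rather than only testing for "%f", but the lookup-then-format structure stays the same.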

The only problem I see is that the nested macros in the compile time hash are confusing my refactoring plug-in and killing my IDE responsiveness. Disabling the add-in removed that issue.

Answer 2:

I've seen something that accomplishes something similar on the ARM platform. I believe it's called the "Embedded Trace Macrocell". A series of macros translates statements like TRACE_POWER_SYSTEM_VOLTAGE_REGULATOR_TRIGGER(inputX); into two register writes to the ETM registers. Note that this ONLY accepts 16-bit, 32-bit and 64-bit integers as arguments, though.

We can use the ARM tools to extract these (timestamped) buffers. Then we apply a pre-compiled bit of trickery to convert the first (index) register write into an output file that looks like this:

timestamp  | POWER SYSTEM    |    VOLTAGE REGULATOR TRIGGER    | 0x2380FF23

The code has been examined to determine the data type of the argument, so we don't have to bother. It can also be annotated with a "real time" timestamp (instead of ms since powerup), and file and line numbers of the trace statements.

ARM is set up to store this circular buffer internally (and very quickly), so it can be used in production. Even if you don't have the hardware support, though, some aspects of this could be easily reproduced.

Note that when analyzing a trace, it's extremely important to use only a 'decode' file that matches the particular version of the code running on the device.
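One way to enforce that match, sketched here under the assumption that the target can report a 32-bit fingerprint baked in at build time, is to fingerprint the format table itself and refuse to decode on a mismatch. The names table_fingerprint and decode_file_matches are hypothetical, and FNV-1a is just a convenient checksum choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Folds every format string, including its terminating NUL, into one
   32-bit FNV-1a value. Hashing the NUL keeps string boundaries in the
   input, so {"ab","c"} and {"a","bc"} feed different byte sequences. */
uint32_t table_fingerprint(const char *const *formats, size_t n)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (size_t i = 0; i < n; ++i) {
        const char *s = formats[i];
        do {
            h ^= (uint8_t)*s;
            h *= 16777619u;             /* FNV prime */
        } while (*s++);                 /* stop after the NUL was hashed */
    }
    return h;
}

/* Host-side check: decode only when the fingerprint reported by the
   target equals that of the decode file that was loaded. */
int decode_file_matches(uint32_t reported, const char *const *formats, size_t n)
{
    return reported == table_fingerprint(formats, n);
}
```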

Answer 3:

I seem to recall many tools for extracting string literals for the purpose of internationalization. GNU strings can extract the strings directly from the executable. This should help with part of the task.

Answer 4:

I had the same problem, plus I wanted to reduce the image size (due to tiny embedded flash). My solution is to send the file name and line number (which should be 14-20 bytes) and have a source parser on the server side that generates a map of the actual texts. This way the target code contains no format strings, only a single file name string for each file. Furthermore, file names can easily be replaced with an enum (unlike replacing every string in the code) to reduce the comms throughput.

I hope the sample pseudo-code will help clarify the idea:

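/* target code */
/* The format argument is deliberately dropped: only the file, line, and
   values are sent. As written, PRINT needs at least one value argument. */
#define PRINT(format,...) send(__FILE__,__LINE__,__VA_ARGS__)
...

/* host code (c++) */
#include <istream>
#include <string>
#include <vector>
using namespace std;

void PrintComm(istream& in)
{
    string fileName;
    int    line = 0, nParams = 0;
    in>>fileName>>line>>nParams;
    if (!in || nParams<0)
        return;                       /* malformed record */
    vector<int> params(nParams);      /* released automatically, even when empty */
    for (int i=0; i<nParams; ++i)
        in>>params[i];
    /* look the format up in the map built by the source parser */
    const char* format = FindFormat(fileName,line);
    ...
}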
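The source parser behind FindFormat is not shown in the answer. As a rough illustration of its core step, the C sketch below pulls the format literal out of a single source line containing a PRINT call. The name extract_format is hypothetical, and the sketch assumes the whole call sits on one line with a single string literal; it ignores comments and adjacent-literal concatenation.

```c
#include <stddef.h>
#include <string.h>

/* Copies the first string literal after "PRINT(" on a source line into out,
   keeping escape sequences exactly as written in the source.
   Returns 1 on success, 0 if no complete literal fits in out. */
int extract_format(const char *src_line, char *out, size_t outsz)
{
    if (outsz == 0)
        return 0;
    const char *p = strstr(src_line, "PRINT(");
    if (!p)
        return 0;
    p = strchr(p, '"');
    if (!p)
        return 0;
    ++p;                                  /* skip the opening quote */
    size_t n = 0;
    while (*p && *p != '"') {
        if (*p == '\\' && p[1]) {         /* copy the backslash of an escape */
            if (n + 1 >= outsz) return 0;
            out[n++] = *p++;
        }
        if (n + 1 >= outsz) return 0;     /* always leave room for the NUL */
        out[n++] = *p++;
    }
    if (*p != '"')
        return 0;                         /* unterminated literal */
    out[n] = '\0';
    return 1;
}
```

Running this over every line of every file, and recording the file name and line number alongside each extracted literal, would yield the map that FindFormat consults.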

Source: https://stackoverflow.com/questions/6912406/printf-debugging-library-using-string-table-decoder-ring

Tags

debugging

printf

string-table