Error when attempting to use a type from GNU's GMP library as the yylval type for Bison

浪尽此生 提交于 2021-01-27 06:37:19

问题


I'm attempting to use the type mpz_t from the GMP library as the type for yylval by including the following in the Bison file:

%define api.value.type {mpz_t}

I checked the generated parser and it correctly generates the line typedef mpz_t YYSTYPE, with YYSTYPE later being used to create yylval.

mpz_t is typdefed as typedef __mpz_struct mpz_t[1]; in the GMP header file gmp.h. In turn, __mpz_struct is typdefed as

typedef struct
{
    // struct members here - don't believe they're important
} __mpz_struct;

Bison runs without error, but whenever I try to create an executable I get the following error:

calc.tab.c: In function ‘yyparse’:

calc.tab.c:1148:12: error: incompatible types when assigning to type ‘YYSTYPE’ from type ‘struct __mpz_struct *’

*++yyvsp = yylval;

yyvsp is defined as a pointer to YYSTYPE.

Any idea of how to fix this?


回答1:


As you say, mpz_t is typedef'ed as an alias for an array type:

typedef __mpz_struct mpz_t[1];

As a result, assignment to a variable of type mpz_t is illegal:

mpz_t a, b;
mpz_init(b);
a = b;  /* Error: incompatible types when assigning to type ‘mpz_t’ */
        /* from type ‘struct __mpz_struct *’                        */

Instead, it is necessary to use one of the built-in assignment functions:

mpz_t a, b;
mpz_inits(a, b, 0);
mpz_set(a, b);   /* a is now a copy of b */

The ban on direct assignment to an mpz_t is necessary because of the way gmp manages memory. See Note 1 below.

Bison assumes that the semantic type YYSTYPE can be assigned to (see note 2), which means that it cannot be an array type. That's normally not a problem, because normally YYSTYPE is a union type, and a union with an array member can be assigned to. So there is no problem using an array type with bison provided that you include it in a %union declaration.

But you MUST NOT do that with gmp, because although it will compile, it won't work. You'll end up with lots of leaked memory and you're likely to end up with obscure bugs, where gmp computes the wrong values (or fails in more obvious ways, like freeing memory out from under an mpz_t).

Using mpz_t objects directly as semantic values is possible, but it will not be easy. You will end up spending a lot of time thinking about which stack slots have semantic values which have been initialized; which ones have values which need to be mpz_cleared, and a lot of other troubling details.

A simpler (but not simple) solution is to make the semantic value a pointer to an mpz_t. If you are just making a bignum calculator, you could bypass the semantic value altogether and maintain your own value stack. That will work out as long as every reduction action pops all its arguments from the value stack and pushes its result.

This value stack would also be a vector of mpz_t values, but it differs from the parser stack in several important ways, because it is entirely under your control:

  1. You are under no obligation to create the temporary value which bison needs to create (see note 2). If you want to do an addition, for example, which would pop two operands off the stack and push the result back on, you could just do that:

    mpz_add(val_stack[top - 2], val_stack[top - 2], val_stack[top - 1]);
    --top;
    
  2. You can initialize the value stack before parsing, and clear all elements when the parse is done. That makes memory management much simpler, and lets you reuse allocated limb vectors.

  3. Tokens like operators and parentheses, which have no associated semantic value, do not occupy space on the value stack. That doesn't save much space, but it avoids the need to initialize and clear stack slots which never have useful data in them.

Notes

1. Why GMP discourages direct assignment

According to the gmp manual, making mpz_t (and other similar types) arrays of size 1 was simply to compensate for C's lack of pass-by-reference. Since the array decays to a pointer when used as a function argument, you get pass-by-reference without having to explicit mark the argument. But it must have crossed someone's mind that the use of an array type also prevents direct assignment to an mpz_t. Direct assignment cannot work because of the way gmp manages memory.

Gmp values necessarily include a reference to allocated storage. (Necessarily, because there is no limit to the size of a bignum, so different bignums are different sizes.) Generally speaking, there are two ways of managing objects like that:

  1. Make the object immutable. Then it can be shared arbitrarily, because no modification is possible.

  2. Always copy the object on assignment, making sharing impossible. Then objects can be modified without affecting any other object.

These two strategies are exemplified by the Java and C++ approaches to strings, for example. Unfortunately, both strategies depend on some infrastructure in the language:

  • Immutable strings require garbage collection. Without a garbage collector, there is no way to tell when the storage for a string can be released. It would be possible to reference count the memory allocations, but the reference counts need to be incremented and decremented and unless you're prepared to make your code a swamp of reference count maintenance, you need some language support.

  • Copying strings requires overriding the assignment operator. That's possible in C++, but few other languages are so flexible.

There are also performance problems with both of the above strategies.

  • Immutable objects need to be copied when modified, which can turn simple linear complexity into quadratic complexity. This is a well-known problem with repetitive appends to Java or Python strings; Java's StringBuilder is intended to compensate for this issue. Immutable integers would be annoying; it is very common to accumulate sums, for example (sum += value;), and having to copy sum each time through such a loop could drastically slow down the loop.

  • On the other hand, forcing copy on assignment makes it impossible to share constants, or even to rearrange vectors. That can cause a lot of extra copying, again leading to linear algorithms turning into quadratic algorithms.

Gmp chose the mutable object strategy. Bignums must be copied on assignment, and since C does not allow overriding the assignment operator, the easiest solution was to ban use of the assignment operator, forcing the use of library functions.

Since there are occasions when it is useful to move bignums around without copying -- shuffling an array of bignums, for example -- gmp also provides a swap function. And, if you are very very careful and know more about gmp's internals than I do, it is probably possible to just use the union hack mentioned above, or to use memcpy(), in order to do a more complicated rearrangement of gmp objects, provided you maintain the important invariant:

Every vector of limbs must be referenced by precisely one and only one mpz_t object.

The reason that is important is that gmp will resize a bignum if necessary, using realloc. Suppose that a and b are mpz_t, and we use some hack to make them both the same bignum, sharing memory:

memcpy(a, b, sizeof(a));

Now, we make b much bigger:

mpz_mul(b, b, b);  /* Set b to b squared */

This will work fine, but internally it will do something like

tmp = realloc(b->_mp_d, 2 * b->_mp_size);
if (tmp) b->_mp_d = tmp;

in order to make b large enough to hold the result. That will work fine for b, but it could result in the limbs being pointed by a going into limbo, since a successful realloc which allocates new storage will automatically free the old storage.

The same thing would happen with any operation which increased the size of b; squaring it in place was just an example. a could end up with a dangling pointer after almost any modification which increases the size of b: mpz_add(b, tmp1, tmp2); (assuming tmp1 and/or tmp2 are larger than b.)

2. Why Bison requires that semantic values be assignable

Bison creates a temporary YYSTYPE object for every reduction; this temporary is the actual variable represented as $$ in the bison action. Before the reduction action is executed, the parser executes the equivalent of $$ = $1;. Once the action has been completed, $1 through $n are popped off the stack, and $$ is pushed onto it. In effect, this overwrites the old $1 with $$, which is why a temporary must be used. (Otherwise, setting $$ in an action would surprisingly invalidate $1.)



来源:https://stackoverflow.com/questions/32516093/error-when-attempting-to-use-a-type-from-gnus-gmp-library-as-the-yylval-type-fo

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!