Just use a temporary variable: copy the bits you want to move into it, shift them in the direction they need to go, and finally mask the bits from the tmp var back into the original value, and you are done.
Update:
Let's add some code and then you can choose what is more readable.
The working one-liner:
unsigned int data = 0x7654;
data = (data ^ (data & 0xff)) | ((data & 0xf) << 4) | ((data & 0xf0) >> 4);
printf("data %x \n", data);
The same code, but with some tmp vars:
unsigned int data = 0x7654;
unsigned int tmp1 = 0;
unsigned int tmp2 = 0;
tmp1 = (0x0f&data)<<4;
tmp2 = (0xf0&data)>>4;
tmp1 = tmp1 | tmp2;
data = data ^ (data & 0xff);
data = data | tmp1;
printf("data %x \n", data);
Well the one liner is shorter anyway :)
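If you need this in more than one place, the same steps can be wrapped in a small function. This is only a sketch: the name `swap_low_nibbles` is made up for illustration, and it clears the low byte with `& ~0xffu` instead of the XOR used above, which should give the same result.

```c
/* Hypothetical helper (name is an assumption, not from the code above):
 * swaps the two 4-bit halves of the lowest byte, leaves the rest alone. */
unsigned int swap_low_nibbles(unsigned int data)
{
    unsigned int low  = (data & 0x0fu) << 4;  /* low nibble, shifted up    */
    unsigned int high = (data & 0xf0u) >> 4;  /* high nibble, shifted down */

    /* Clear the low byte, then mask the swapped nibbles back in. */
    return (data & ~0xffu) | low | high;
}
```

Calling `swap_low_nibbles(0x7654)` gives `0x7645`, the same value the two snippets above print.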
Update:
And if you look at the asm code that gcc generates with -Os -S, my guess is that the two versions come out more or less identical, since the overhead of the tmp vars is removed during the "compiler optimisation" part.
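If you want to check that guess yourself, one way is to put each variant in its own function and compare the output. The function names and the file name below are assumptions for illustration; I am not claiming anything about what the generated asm will look like on your compiler.

```c
/* Save as e.g. swap.c (file name is an assumption) and run:
 *     gcc -Os -S swap.c
 * then compare the two function bodies in swap.s. */

/* Variant 1: the one-liner. */
unsigned int swap_oneliner(unsigned int data)
{
    return (data ^ (data & 0xff)) | ((data & 0xf) << 4) | ((data & 0xf0) >> 4);
}

/* Variant 2: the same steps spelled out with tmp vars. */
unsigned int swap_with_tmps(unsigned int data)
{
    unsigned int tmp1 = (0x0f & data) << 4;  /* low nibble moved up    */
    unsigned int tmp2 = (0xf0 & data) >> 4;  /* high nibble moved down */

    tmp1 = tmp1 | tmp2;              /* swapped low byte          */
    data = data ^ (data & 0xff);     /* clear the old low byte    */
    return data | tmp1;              /* mask the swapped bits in  */
}
```

Both functions return `0x7645` for an input of `0x7654`, so any difference you see in the asm is down to the compiler and not the logic.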