I need to read instructions one-by-one from a small code segment in memory and I have to find out the size of the instructions which I have in memory.
Intel provides the XED library for working with x86/x86_64 instructions: https://github.com/intelxed/xed. It is the only correct way to work with Intel machine code in both x86 and x86_64 modes. It is used by Intel itself (and was part of their Pin tool): https://software.intel.com/en-us/articles/xed-x86-encoder-decoder-software-library
XED User Guide (2014): https://software.intel.com/sites/landingpage/pintool/docs/67254/Xed/html/main.html
XED2 User Guide (2011): https://software.intel.com/sites/landingpage/pintool/docs/56759/Xed/html/main.html
The xed_decode function gives you all the information about an instruction: https://intelxed.github.io/ref-manual/group__DEC.html
https://intelxed.github.io/ref-manual/group__DEC.html#ga9a27c2bb97caf98a6024567b261d0652
xed_ild_decode, in contrast, decodes only the instruction's length:
https://intelxed.github.io/ref-manual/group__DEC.html#ga4bef6152f61997a47c4e0fe4327a3254
XED_DLL_EXPORT xed_error_enum_t xed_ild_decode ( xed_decoded_inst_t * xedd, const xed_uint8_t * itext, const unsigned int bytes )
This function just does instruction length decoding. It does not return a fully decoded instruction.
Parameters
- xedd the decoded instruction of type xed_decoded_inst_t . Mode/state sent in via xedd; See the xed_state_t .
- itext the pointer to the array of instruction text bytes
- bytes the length of the itext input array. 1 to 15 bytes, anything more is ignored.
Returns:
xed_error_enum_t indicating success (XED_ERROR_NONE) or failure. Only two failure codes are valid for this function: XED_ERROR_BUFFER_TOO_SHORT and XED_ERROR_GENERAL_ERROR. In general this function cannot tell if the instruction is valid or not. For valid instructions, XED can figure out if enough bytes were provided to decode the instruction. If not enough were provided, XED returns XED_ERROR_BUFFER_TOO_SHORT. From this function, the XED_ERROR_GENERAL_ERROR is an indication that XED could not decode the instruction's length because the instruction was so invalid that even its length may vary across implementations.
To get the length from the xedd struct filled in by xed_ild_decode, use xed_decoded_inst_get_length: https://intelxed.github.io/ref-manual/group__DEC.html#gad1051f7b86c94d5670f684a6ea79fcdf
static XED_INLINE xed_uint_t xed_decoded_inst_get_length ( const xed_decoded_inst_t * p )
Return the length of the decoded instruction in bytes.
Example code ("Apache License, Version 2.0", by Intel 2016): https://github.com/intelxed/xed/blob/master/examples/xed-ex-ild.c
    #include "xed/xed-interface.h"
    #include <stdio.h>

    int main()
    {
        xed_bool_t long_mode = 1;
        xed_decoded_inst_t xedd;
        xed_state_t dstate;
        unsigned char itext[15] = { 0xf2, 0x2e, 0x4f, 0x0F, 0x85, 0x99,
                                    0x00, 0x00, 0x00 };

        xed_tables_init(); // one time per process

        if (long_mode)
            dstate.mmode=XED_MACHINE_MODE_LONG_64;
        else
            dstate.mmode=XED_MACHINE_MODE_LEGACY_32;

        xed_decoded_inst_zero_set_mode(&xedd, &dstate);
        xed_ild_decode(&xedd, itext, XED_MAX_INSTRUCTION_BYTES);
        printf("length = %u\n",xed_decoded_inst_get_length(&xedd));

        return 0;
    }
Any other solution, such as manual prefix/opcode parsing or using a third-party disassembler, may give you wrong results in some rare cases. We don't know which library Intel uses internally to verify their hardware instruction decoders, but xed is the library used by their software decoders in various binary tools. The ild decoder of xed has more than 1600 lines of code: https://github.com/intelxed/xed/blob/master/src/dec/xed-ild.c, and should be more precise than any other library.
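The example above decodes a single instruction, while the question asks about reading instructions one-by-one from a code segment. A minimal sketch of that loop follows. To keep it compilable without XED installed, the length decoder is passed in as a callback; the names walk_instructions, insn_length_fn and toy_length are hypothetical and not part of XED, and toy_length is only a stand-in that knows a handful of opcodes.

```c
#include <assert.h>
#include <stddef.h>

/* Architectural x86 limit; XED exposes it as XED_MAX_INSTRUCTION_BYTES. */
#define MAX_INSN_BYTES 15

/* Hypothetical callback: returns the instruction length in bytes, or 0 if
 * the length cannot be determined from the `avail` bytes starting at `p`. */
typedef unsigned (*insn_length_fn)(const unsigned char *p, unsigned avail);

/* Walk `size` bytes of code, storing the start offset of each instruction
 * in `offsets` (capacity `max_offsets`). Returns the number of instructions
 * found. Stops early on a decode failure or when `offsets` is full.
 * If `consumed` is non-NULL it receives the number of bytes walked. */
size_t walk_instructions(const unsigned char *code, size_t size,
                         insn_length_fn get_length,
                         size_t *offsets, size_t max_offsets,
                         size_t *consumed)
{
    size_t pos = 0, count = 0;

    assert(code != NULL && get_length != NULL && offsets != NULL);

    while (pos < size && count < max_offsets) {
        size_t remaining = size - pos;
        /* Never offer the decoder more than 15 bytes or past the buffer end. */
        unsigned avail = remaining > MAX_INSN_BYTES ? MAX_INSN_BYTES
                                                    : (unsigned)remaining;
        unsigned len = get_length(code + pos, avail);

        if (len == 0 || len > avail)
            break;              /* undecodable or truncated instruction */
        offsets[count++] = pos;
        pos += len;
    }
    if (consumed != NULL)
        *consumed = pos;
    return count;
}

/* Toy stand-in decoder, for demonstration ONLY: handles NOP (0x90),
 * RET (0xC3), INT3 (0xCC) and MOV r32, imm32 (0xB8..0xBF, 5 bytes,
 * no prefixes). Everything else is reported as undecodable. */
unsigned toy_length(const unsigned char *p, unsigned avail)
{
    unsigned len;

    if (avail == 0)
        return 0;
    if (p[0] == 0x90 || p[0] == 0xC3 || p[0] == 0xCC)
        len = 1;
    else if (p[0] >= 0xB8 && p[0] <= 0xBF)
        len = 5;
    else
        return 0;
    return len <= avail ? len : 0;
}
```

With XED, the callback would presumably wrap xed_decoded_inst_zero_set_mode, xed_ild_decode and xed_decoded_inst_get_length, returning 0 whenever xed_ild_decode returns anything other than XED_ERROR_NONE. Clamping avail to the bytes remaining in the buffer is what lets XED_ERROR_BUFFER_TOO_SHORT flag an instruction cut off at the end of the segment.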