How to implement CRC32 taking advantage of Intel specific instructions?

Deadly submitted on 2019-12-20 10:02:10

Question


Intel has a specific CRC32 instruction available in the SSE4.2 instruction set. How can I take advantage of this instruction to speed up CRC32 calculations?


Answer 1:


First of all, Intel's CRC32 instruction calculates CRC-32C, which uses a different polynomial than regular CRC32 (see the Wikipedia CRC32 entry).

To use Intel's hardware acceleration for CRC32C using gcc you can:

  1. Inline assembly language in C code via the asm statement
  2. Use the intrinsics _mm_crc32_u8, _mm_crc32_u16, _mm_crc32_u32 or _mm_crc32_u64. See the Intel Intrinsics Guide for a description of them; it is written for Intel's compiler icc, but gcc implements them as well.

This is how you would do it with _mm_crc32_u8, which takes one byte at a time. Using _mm_crc32_u64 would improve performance further, since it takes 8 bytes at a time.

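#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uint8_t, uint32_t */
#include <nmmintrin.h>  /* SSE4.2 intrinsics, including _mm_crc32_u8 */

uint32_t sse42_crc32(const uint8_t *bytes, size_t len)
{
  /* Starts from 0 and returns the raw register value. Standard CRC-32C
     checksums start from 0xFFFFFFFF and invert the final result. */
  uint32_t hash = 0;
  size_t i = 0;
  for (i=0;i<len;i++) {
    hash = _mm_crc32_u8(hash, bytes[i]);  /* fold in one byte */
  }

  return hash;
}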
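
The 8-bytes-at-a-time variant mentioned above could look roughly like the sketch below. It is not part of the original answer: the name sse42_crc32_wide and the extra crc seed parameter are assumptions made for illustration, and the code assumes an x86-64 target with gcc or clang (_mm_crc32_u64 is not available in 32-bit builds). The target attribute lets this one function use SSE4.2 even if the rest of the file is built without -msse4.2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>     /* memcpy */
#include <nmmintrin.h>  /* _mm_crc32_u8, _mm_crc32_u64 */

/* Hypothetical helper: crc is the running value. Pass 0 to get the same
   result as the byte-at-a-time loop above. */
__attribute__((target("sse4.2")))
uint32_t sse42_crc32_wide(uint32_t crc, const uint8_t *bytes, size_t len)
{
  uint64_t acc = crc;

  /* Main loop: consume 8 bytes per crc32 instruction. */
  while (len >= 8) {
    uint64_t chunk;
    memcpy(&chunk, bytes, sizeof chunk);  /* safe unaligned load */
    acc = _mm_crc32_u64(acc, chunk);
    bytes += 8;
    len -= 8;
  }

  /* Tail: finish the remaining 0..7 bytes one at a time. */
  crc = (uint32_t)acc;
  while (len--) {
    crc = _mm_crc32_u8(crc, *bytes++);
  }
  return crc;
}
```

To get the conventional CRC-32C checksum rather than the raw register value, call it with a seed of 0xFFFFFFFF and XOR the result with 0xFFFFFFFF.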

To compile this, pass -msse4.2 in CFLAGS, for example gcc -g -msse4.2 test.c. Otherwise the build will fail, complaining about an undefined reference to _mm_crc32_u8.

If you want to fall back to a plain C implementation when the instruction is not available on the platform where the executable is running, you can use GCC's ifunc attribute, like this:

uint32_t sse42_crc32(const uint8_t *bytes, size_t len)
{
  /* use _mm_crc32_u* here */
}

uint32_t default_crc32(const uint8_t *bytes, size_t len)
{
  /* pure C implementation */
}

/* this is called at load time to decide which function to actually use: */
/* sse42_crc32 if SSE 4.2 is supported, */
/* default_crc32 if not */
static void * resolve_crc32(void) {
  __builtin_cpu_init();
  if (__builtin_cpu_supports("sse4.2")) return sse42_crc32;

  return default_crc32;
}

/* crc32() implementation will be resolved at load time to either */
/* sse42_crc32() or default_crc32() */
uint32_t crc32(const uint8_t *bytes, size_t len) __attribute__ ((ifunc ("resolve_crc32")));
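
The answer leaves the pure C fallback as a placeholder. One possible body is the bit-at-a-time sketch below, which uses the reflected CRC-32C (Castagnoli) polynomial so that it produces the same values as the crc32 instruction. The name crc32c_bitwise and the crc seed parameter are assumptions for illustration, not part of the original answer, and a table-driven version would be considerably faster.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the CRC-32C polynomial 0x1EDC6F41. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical helper: crc is the running value, no inversion is applied. */
uint32_t crc32c_bitwise(uint32_t crc, const uint8_t *bytes, size_t len)
{
  for (size_t i = 0; i < len; i++) {
    crc ^= bytes[i];  /* fold the next byte into the low 8 bits */
    for (int bit = 0; bit < 8; bit++) {
      /* mask is all ones when the low bit is set, zero otherwise */
      uint32_t mask = 0u - (crc & 1u);
      crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
    }
  }
  return crc;
}
```

With this helper, default_crc32 could simply return crc32c_bitwise(0, bytes, len), matching the zero starting value used in sse42_crc32 above. Seeding with 0xFFFFFFFF and inverting the result gives the conventional CRC-32C checksum instead.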



Answer 2:


See this answer for fast hardware and software implementations of CRC-32C. The hardware implementation effectively runs three crc32 instructions in parallel for speed.



Source: https://stackoverflow.com/questions/31184201/how-to-implement-crc32-taking-advantage-of-intel-specific-instructions
