How can I combine nom parsers to get a more bit-oriented interface to the data?

Submitted by 不问归期 on 2020-08-08 04:02:53

Question


I'm working on decoding AIS messages in Rust using nom.

Each AIS message is a bit vector; the fields in a message are an arbitrary number of bits long, and they don't always align on byte boundaries.

This bit vector is then ASCII encoded, and embedded in an NMEA sentence.

From http://catb.org/gpsd/AIVDM.html:

The data payload is an ASCII-encoded bit vector. Each character represents six bits of data. To recover the six bits, subtract 48 from the ASCII character value; if the result is greater than 40 subtract 8. According to [IEC-PAS], the valid ASCII characters for this encoding begin with "0" (64) and end with "w" (87); however, the intermediate range "X" (88) to "_" (95) is not used.

Example

  • !AIVDM,1,1,,A,D03Ovk1T1N>5N8ffqMhNfp0,0*68 is the NMEA sentence
  • D03Ovk1T1N>5N8ffqMhNfp0 is the encoded AIS data
  • 010100000000000011011111111110110011000001100100000001011110001110000101011110001000101110101110111001011101110000011110101110111000000000 is the decoded AIS data as a bit vector
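The decoding rule quoted above can be sketched in plain Rust, without nom, to check it against this example. This is only a minimal sketch: the names sixbit and payload_to_bits are made up for illustration, and the accepted character ranges ("0" to "W" and "`" to "w") are my reading of the rule rather than something stated in the question.

```rust
/// Map one payload character to its six-bit value.
/// Returns None for characters outside the assumed armoring alphabet.
fn sixbit(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'W' => Some(c - 48),     // ASCII 48..=87  -> 0..=39
        b'`'..=b'w' => Some(c - 48 - 8), // ASCII 96..=119 -> 40..=63
        _ => None,                       // includes the unused range 'X'..='_'
    }
}

/// Decode a whole payload into a string of '0'/'1' characters,
/// six bits per input character, most significant bit first.
fn payload_to_bits(payload: &str) -> Option<String> {
    let mut bits = String::with_capacity(payload.len() * 6);
    for c in payload.bytes() {
        let value = sixbit(c)?;
        bits.push_str(&format!("{:06b}", value));
    }
    Some(bits)
}
```

Calling payload_to_bits("D03Ovk1T1N>5N8ffqMhNfp0") yields a 138-character string (23 characters times 6 bits) whose leading bits match the bit vector listed above.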

Problems

I list these together because I think they may be related...

1. Decoding ASCII to bit vector

I can do this manually by iterating over the characters, subtracting the appropriate values, and building up a byte array with a lot of bit shifting. That works, but it seems like I should be able to do this inside nom and chain it with the actual AIS bit parser, eliminating the interim byte array.
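For reference, the manual approach might look roughly like the following. It is a sketch in plain Rust, not nom code; pack_payload and sixbit are hypothetical names, and zero-padding the final partial byte is an assumption of this sketch.

```rust
/// Six-bit value of one payload character (subtract 48, then 8 more
/// for the upper range); None for characters outside the alphabet.
fn sixbit(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'W' => Some(c - 48),
        b'`'..=b'w' => Some(c - 56),
        _ => None,
    }
}

/// Pack the six-bit groups of a payload into bytes, most significant
/// bit first. A trailing partial byte is padded with zero bits.
fn pack_payload(payload: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity((payload.len() * 6 + 7) / 8);
    let mut acc: u32 = 0;   // bits decoded but not yet written out
    let mut nbits: u32 = 0; // how many low bits of `acc` are valid
    for c in payload.bytes() {
        acc = (acc << 6) | u32::from(sixbit(c)?);
        nbits += 6;
        while nbits >= 8 {
            nbits -= 8;
            out.push((acc >> nbits) as u8);
        }
        acc &= (1 << nbits) - 1; // keep only the leftover bits
    }
    if nbits > 0 {
        out.push((acc << (8 - nbits)) as u8);
    }
    Some(out)
}
```

For the example payload this produces 18 bytes (138 bits plus 6 padding bits), starting with 0x50, 0x00, 0xDF.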

2. Reading arbitrary number of bits

nom can read, say, 3 bits from a byte array, but each call to bits! seems to consume a whole byte (when reading into a u8).

For example:

named!(take_3_bits<u8>, bits!(take_bits!(u8, 3)));

will read 3 bits into a u8. But if I run take_3_bits twice, I'll have consumed 16 bits of my stream.
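The 16-bit figure is what you would expect if every bits! call hands back a remainder that starts on the next byte boundary. The sketch below only models that arithmetic; it is not nom code, and the function names are invented for illustration.

```rust
/// Bytes used up by one bit-level read of `bits_read` bits when the
/// remaining input must start on a byte boundary (round up).
fn bytes_consumed(bits_read: usize) -> usize {
    (bits_read + 7) / 8
}

/// Each field is read by its own byte-aligned call.
fn bytes_for_separate_calls(widths: &[usize]) -> usize {
    widths.iter().map(|&w| bytes_consumed(w)).sum()
}

/// All fields are read inside a single byte-aligned call.
fn bytes_for_combined_call(widths: &[usize]) -> usize {
    bytes_consumed(widths.iter().sum::<usize>())
}
```

With widths [3, 3], the separate calls use 2 bytes (16 bits), while a single combined call uses 1 byte.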

I can combine reads:

named!(get_field_1_and_2<(u8, u8)>, bits!(pair!(take_bits!(u8, 2), take_bits!(u8, 3))));

Calling get_field_1_and_2 will get me a (u8, u8) tuple, where the first item contains the first 2 bits, and the second item contains the next 3 bits, but nom will then still advance a full byte after that read.

I can use peek to prevent nom's read pointer from advancing and then manage the position manually, but again, that seems like unnecessary extra work.
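What I am after behaves roughly like a cursor that carries its bit offset from one read to the next, instead of realigning to a byte boundary each time. The following is a plain Rust sketch of that behaviour, not nom's API; BitCursor and its methods are hypothetical names, and the 32-bit limit per read is an arbitrary choice for the sketch.

```rust
/// A read position measured in bits rather than bytes.
struct BitCursor<'a> {
    data: &'a [u8],
    pos: usize, // offset in bits from the start of `data`
}

impl<'a> BitCursor<'a> {
    fn new(data: &'a [u8]) -> Self {
        BitCursor { data, pos: 0 }
    }

    /// Read the next `n` bits (at most 32), most significant bit first.
    /// Returns None, without advancing, if too few bits remain.
    fn take(&mut self, n: usize) -> Option<u32> {
        if n > 32 || self.pos + n > self.data.len() * 8 {
            return None;
        }
        let mut value = 0u32;
        for i in self.pos..self.pos + n {
            let bit = (self.data[i / 8] >> (7 - i % 8)) & 1;
            value = (value << 1) | u32::from(bit);
        }
        self.pos += n; // advance by bits, never rounding up to a byte
        Some(value)
    }
}
```

Reading 2 bits and then 3 bits from such a cursor leaves it at bit 5 of the first byte, so the next read continues from there rather than from the second byte.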

Source: https://stackoverflow.com/questions/48131656/how-can-i-combine-nom-parsers-to-get-a-more-bit-oriented-interface-to-the-data
