问题

I've read tons of web hits related to this issue, and I still haven't come across any definitive answer.

What I'd like to do is to make a database of chess positions, capable of identifying transpositions (generally which pieces are on which squares).

EDIT: it should also be capable to identify similar (but not exactly identical) positions.

This is a discussion almost 20 years ago (when space was an issue): https://groups.google.com/forum/#!topic/rec.games.chess.computer/wVyS3tftZAA

One of the discussants talk about encoding pieces on a square matrix, using 4 x 64 bits plus some bits more for the additional information (castling, en passant etc): there are six pieces (Pawn, Rook, Knight, Bishop, Queen, King) plus an empty square, that would be 3 bits (2^3), and one more bit for the color of the piece.

In total, there would be 4 numbers of 64bits each, plus some additional information.

Question: is there any other, more efficient way of storing a chess position?

I should probably mention this question is database centric, not game centric (i.e. my sole interest is to efficiently store and retrieve, not to create any AI or to generate any moves).

Thanks, Adrian

回答1:

There are 32 pieces on the board, and 64 squares. Square index can be represented with a 6-bit number, so to represent the locations of each piece you need 32 six-bit numbers, or a total of 192 bits, which is less than 4x64.

You can do a bit better by realizing that not all positions are possible (e.g. a pawn cannot reach the end row of its own color) and using less than six bits for the position in these cases. Also, a position already occupied by another piece makes that position unavailable for other pieces.

As a piece may also be totally missing from the board, you should start with the kings' positions, as they are always there - and then, encoding another piece's position as the same of a king would mean that the piece has been taken.

Edit:

A short analysis of the pieces' possible positions:

Kings, queens, knights and rooks can be anywhere on the board (64 positions)
Bishops are restricted to 32 positions each
Pawns are restricted to 21, 26, 30, 32, 32, 30, 26, and 21 positions (columns A-H).

Thus, this set of legal chess positions can be described trivially with an integer from zero up to (64^12 * 32^4 * 21^4 * 26^4 * 30^4 * 32^8)-1, or 391935874857773690005106949814449284944862535808450559999, which fits into 188 bits. Encoding and decoding a position to and from this is very straightforward - however, there are multiple numbers that decode into the same position (e.g. white knight 1 at B1 and white knight 2 at G1; and white knight 1 at G1 and white knight 2 at B1).

Due to the fact that no two pieces can occupy the same square, there is a tighter limit but it is a bit difficult to both encode and decode, so probably not useful in a real application. Also, the number shown above is pretty close to 2^188, so I don't think even this tighter encoding would fit into 187 bits.

回答2:

Take a loot at the Forsyth–Edwards Notation (FEN). It is described here. It is also well known and supported by many engines and chess programs.

Here is the FEN for the starting position:

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Fen is seperated in 6 segments.

Segment 1 contains the pieces. black pieces are in lower case, white pieces are in upper case.

Segment 2 states, who's turn is it. (w or b)

Segment 3 is for castling. KQkq means both can castle on both sides. K = King side white q = queen side black

Segment 4 En passant target square in algebraic notation. If there's no en passant target square, this is "-". If a pawn has just made a two-square move, this is the position "behind" the pawn. This is recorded regardless of whether there is a pawn in position to make an en passant capture

Segment 5 Halfmove clock: This is the number of halfmoves since the last capture or pawn advance. This is used to determine if a draw can be claimed under the fifty-move rule.

Segment 6 Fullmove number: The number of the full move. It starts at 1, and is incremented after Black's move.

回答3:

You could use a modified run length encoding where each piece is encoded as a piece number (3 bits), with 0y111 used to skip ahead spaces. As there are many situations where pieces are next to each other, you end up omitting the positional information:

         All pieces are followed by color bit
0y000c 0 Pawn
0y001c 1 Rook
0y010c 2 Knight
0y011c 3 Bishop
0y100c 4 Queen
0y101c 5 King
0y110 6 Empty space
0y111 7 Repeat next symbol (count is next 6 bits, then symbol)

The decoder starts off at a1 and proceed to the right, moving up at the end of a row, so the encoding for a starting board would be:

12354321      Literal white encoding from a1 to h1    32 bits
7 8 0         repeat white pawn 8 times               13 bits
7 32 6        repeat 32 empty spaces                  12 bits
7 8 8         repeat black pawn 8 times               13 bits
9abcdba9      Literal encoding of black               32 bits
                                                    ---------
                                                     102 bits total

That being said, the additional complexity and uncertainty of a variable length encoding is probably not worth the space savings. Further, it may be worse than a constant width format in certain plays.

回答4:

If you do not need a decodable position representation for comparisons then you could look at Zobrist hashing. This is used by chess engines to produce a 64 bit oneway hash of a position for spotting transpositions in search trees. As it is a oneway hash you obviously cannot reverse the position from the hash. The size of the hash is tunable, but 64 bits seems to be the accepted minimum size that results in few collisions. It would be ideal as a database index key with a fixed length of just 8 bytes. As collisions, though infrequent, are possible you could do a second pass comparing the actual positions to filter out any positions that have hashed to the same value if it is a concern. I use Zobrist hashes in one of my own applications (using SQLite) that I use to manage my openings and it has no trouble in finding transpositions.

回答5:

Consider up to 32 pieces on the board. Each piece can be on one of the 64 squares. Representing the piece positions independendly in predetermined order requires 32*6=192 bits.

In addition to that each pawn can be promoted to a rook, bishop, knight or queen, so for each pawn we need to encode its state in 3 additional bits (4 possible piecetypes and normal pawn).

32*6+16*3 = 240 bit/ 30 byte

In many cases you will need additional information about the state of the game/variant the position arises in:

EnPassent File: 4 bits (8 files and none)

Castlerights: 4 bits (short/long white/black)

sideToMove : 1 bit (white/black)

which adds up to 249 bit/32 bytes.

This might not be the most compact representation, but it is easy to en-/decode.

回答6:

My two cents

Short version

Choose data format that makes it easy to count the similarity of two positions.
Store positional data near the searching program (possibly in memory).
Brute force the search through all the positions when searching similar positions.
Possibly divide the search to multiple threads/processes.

Longer version

32 bytes (4*64 bits) is quite small amount of data. 1000 million chess positions could fit in to 30 gigabytes. 192 bits is 24 bytes this would make in to 23 gigabytes. Probably database use some kind of compression and thus the in disk might be less than these figures. I don't know what kind of limits there are for storage, but because these seems quite tight encodings it might not be worth the effort try to minimize more.

Because ability to find similar positions was required I think the encoding should make it easy to compare different positions. Preferably this could be counted without decoding. For this to work the encoding should probably be constant length (can't think easy way to do this with variable length coding).

Indexing might speed up similarity search. Naive approach would be index all the positions by piece locations in database. This would make 32 indexes (and maybe for additional information also). It would make the search lightning fast at least in theory.

Indexes are going to take quite much space. Probably more than the actual positional data. And still they might not help that much. For example finding positions where black king is in, or near e4 required 9 searches using the index and then hopping around the 30 gigabytes of positional information which is likely need disk access in random locations. And probably finding similar positions is done for more than one piece...

If the storage format is efficient it might just be enough to brute force (like this)through all positional data and check the similarity position by position. This will use the CPU caches efficiently. Also because of the constant length record it is easy to divide the work to multiple processors or machines.

Whether to use piece-centric or board-based storage format depends on how you are going to calculate the similarity of two positions compared to each others. Piece-centric gives easy way to calculate distance of one piece in two different positions. However in piece-centric approach every piece is identified separately thus it is not so easy to find a pawn in certain location. One has to check every pawns location. If the piece identity is not so important, then board-based storage makes it easy to just check if a pawn is in wanted location. On the other hand it is not possible to check which exact pawn there is.

回答7:

There are two simple ways to store the information of the board: either by storing the locations of each piece or by storing for each square what is in it.

As Mitch explains, there is a way to compress a little bit using RLE, however the examples given is the start position, which is particularly simple to describe. In another case where pieces are spread on the board, you could have space and pieces alternating, and RLE would not compress anything. So unless a more complex algorithm is used, you're back to no compression.

I think jlahd made a mistake in the computation by counting twice the center pawns, so that in fact the space required to store the location for each piece is not 188 bits but 168 bits. To that you need to store as well if the pawns have been promoted. So in fact, for pawns there are (32 + 4x64)^16 possibilities. That's total of 223 bits = 28 bytes.

If instead we store for each square its content, we need to count the possibilities for on square. For most squares, there are 6 possible white pieces and same for black. For the top and bottom row, one color of pawn cannot appear. So that is 13 possibilites for center squares and 12 possibilites for top and bottom squares, so 13^48 x 12^16 possibilites. The location of the en-passant is 17 possibilites. So that's about 240 bits.

To conclude, it seems you can gain 12.5% space by storing pieces positions instead of the content of each square.

来源：https://stackoverflow.com/questions/21374846/efficient-storage-of-a-chess-position

标签

database

position

chess

efficient storage of a chess position

问题