How to call a minimax method (with alpha-beta pruning) properly

Question

This is my minimax method, which implements alpha-beta pruning and memoization:

public int[] newminimax499(int a, int b){
    int bestPos=-1;
    int alpha= a;
    int beta= b;
    int currentScore;
    //boardShow();
    String stateString = "";                                                
    for (int i=0; i<state.length; i++) 
        stateString += state[i];                        
    int[] oldAnswer = oldAnswers.get(stateString);                          
    if (oldAnswer != null) 
        return oldAnswer;
    if(isGameOver2()!='N'){
        int[] answer = {score(), bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else{
        for(int x:getAvailableMoves()){
            if(turn=='O'){  //O is maximizer
                setO(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                //revert(x);
                if(currentScore>alpha){
                    alpha=currentScore;
                    bestPos=x;
                }
                /*if(alpha>=beta){
                    break;
                }*/
            }
            else {  //X is minimizer
                setX(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                //revert(x);
                if(currentScore<beta){
                    beta=currentScore;
                    bestPos=x;
                }
                /*if(alpha>=beta)
                    break;*/
            }
            revert(x);
            if(alpha>=beta)
                break;
        }
    }
    if(turn=='O'){ 
        int[] answer = {alpha, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else {
        int[] answer = {beta, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
}

As a test game, in my main method I place an X somewhere (X is the player) and then call newminimax499 to see where I should place O (the computer):

 public static void main(String[] args) {
    State3 s=new State3(3);
    int [] result=new int[2];
    s.setX(4);
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);
    System.out.println("Score: "+result[0]+" Position: "+ result[1]);
    System.out.println("Run time: " + (endTime-startTime));
    s.boardShow();
}

}

The method returns the position where the computer should play its O (in this scenario it's 6), so I place O as instructed, play an X for myself, and run the code again so that newminimax499 tells me where O wants to play next, and so on.

public static void main(String[] args) {
    State3 s=new State3(3);
    int [] result=new int[2];
    s.setX(4);
    s.setO(6);//Position returned from previous code run
    s.setX(2);
    s.setO(8);//Position returned from previous code run
    s.setX(3);
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);
    System.out.println("Score: "+result[0]+" Position: "+ result[1]);
    System.out.println("Run time: " + (endTime-startTime));
    s.boardShow();
}

After this particular run I get the result:

Score: 10 Position: 7

This is good. However, in my GUI this isn't how newminimax499 gets called. There, the board doesn't get reset every time a new X or O is placed. If I were to put it in a main method like in the previous examples, it would look something like this (keep in mind that it's the exact same sequence of input):

public static void main(String[] args) {
    State3 s=new State3(3);
    int [] result=new int[2];
    s.setX(4); //Player makes his move
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);//Where should pc play?
    s.setO(result[1]);//PC makes his move
    s.setX(2);//Player makes his move
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);//Where should PC make his move?
    s.setO(result[1]);//PC makes his move
    s.setX(3);//Player makes his move
    result=s.newminimax499(Integer.MIN_VALUE, Integer.MAX_VALUE);
    System.out.println("Score: "+result[0]+" Position: "+ result[1]);
    System.out.println("Run time: " + (endTime-startTime));
    s.boardShow();
}

Now, when the method is called this way (which is how it's called in the GUI), it returns:

Score: 0 Position: 5

This means that instead of taking the winning move, it blocked the opponent. After playing a few games this way it became clear that the PC actually loses. So why do these two ways of calling newminimax499 return different results?

Note: All methods needed to run the program can be found in this post.

Answer 1:

The problem you have encountered here is the same one that arises in chess with transposition tables and alpha-beta. I have to contradict you on the point that they are incompatible!

As I suggested multiple times before, please read the corresponding chessprogramming wiki articles before you try to implement something!

In order to make memoization and alpha-beta work together, you have to save a flag for every position in your memo table that differentiates between alpha-cut nodes, beta-cut nodes and precise (exact) nodes.

And believe me, I know from experience that they work together ;)
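The flag idea from this answer can be sketched roughly as follows. This is a minimal standalone illustration, not the asker's State3 class: the names FlaggedMemoMinimax, Flag, Entry and search, the '-' marker for empty cells, and the -10 score for an X win are all assumptions made for the example. Each cached score is stored together with a flag saying whether it is exact, only a lower bound (the search was cut off at beta) or only an upper bound (no move beat alpha), and a cached bound is reused only when it is enough to decide the current window.

```java
import java.util.HashMap;
import java.util.Map;

class FlaggedMemoMinimax {
    // Hypothetical flag: how the cached score relates to the true minimax value.
    enum Flag { EXACT, LOWER_BOUND, UPPER_BOUND }

    static final class Entry {
        final int score, bestPos;
        final Flag flag;
        Entry(int score, int bestPos, Flag flag) {
            this.score = score; this.bestPos = bestPos; this.flag = flag;
        }
    }

    static final int[][] LINES = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},
                                  {1,4,7},{2,5,8},{0,4,8},{2,4,6}};

    final char[] state = "---------".toCharArray();   // '-' marks an empty cell (assumption)
    final Map<String, Entry> table = new HashMap<>();

    char winner() {
        for (int[] l : LINES)
            if (state[l[0]] != '-' && state[l[0]] == state[l[1]] && state[l[1]] == state[l[2]])
                return state[l[0]];
        return 'N';
    }

    boolean full() {
        for (char c : state) if (c == '-') return false;
        return true;
    }

    // Returns {score, bestPos}. O maximizes (+10 for an O win), X minimizes (-10 for an X win).
    int[] search(char turn, int alpha, int beta) {
        String key = new String(state) + turn;
        Entry e = table.get(key);
        if (e != null) {
            // An exact score is always reusable; a bound only if it already decides this window.
            if (e.flag == Flag.EXACT
                    || (e.flag == Flag.LOWER_BOUND && e.score >= beta)
                    || (e.flag == Flag.UPPER_BOUND && e.score <= alpha))
                return new int[]{e.score, e.bestPos};
        }
        char w = winner();
        if (w != 'N' || full())
            return new int[]{w == 'O' ? 10 : w == 'X' ? -10 : 0, -1};

        int alphaOrig = alpha, betaOrig = beta;
        int best = (turn == 'O') ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        int bestPos = -1;
        for (int x = 0; x < 9; x++) {
            if (state[x] != '-') continue;
            state[x] = turn;                                   // make the move
            int score = search(turn == 'O' ? 'X' : 'O', alpha, beta)[0];
            state[x] = '-';                                    // undo the move
            if (turn == 'O') {
                if (score > best) { best = score; bestPos = x; }
                alpha = Math.max(alpha, best);
            } else {
                if (score < best) { best = score; bestPos = x; }
                beta = Math.min(beta, best);
            }
            if (alpha >= beta) break;                          // cutoff
        }
        // Classify the result against the window this call started with.
        Flag flag = best <= alphaOrig ? Flag.UPPER_BOUND
                  : best >= betaOrig ? Flag.LOWER_BOUND
                  : Flag.EXACT;
        table.put(key, new Entry(best, bestPos, flag));
        return new int[]{best, bestPos};
    }

    public static void main(String[] args) {
        FlaggedMemoMinimax g = new FlaggedMemoMinimax();
        // Same move sequence as in the question, reusing one table across all calls.
        g.state[4] = 'X';
        g.search('O', Integer.MIN_VALUE, Integer.MAX_VALUE);
        g.state[6] = 'O'; g.state[2] = 'X';
        g.search('O', Integer.MIN_VALUE, Integer.MAX_VALUE);
        g.state[8] = 'O'; g.state[3] = 'X';
        int[] r = g.search('O', Integer.MIN_VALUE, Integer.MAX_VALUE);
        System.out.println("Score: " + r[0] + " Position: " + r[1]);
    }
}
```

In this sketch the table is shared across all three calls, as in the GUI scenario from the question, and the last call should still pick the winning move at position 7, because an entry that only records a bound is never returned as if it were an exact score.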

Answer 2:

After playing around with a bunch of ideas I finally found the answer, so I might as well post it. The method in question, newminimax499, tries to implement both memoization and alpha-beta pruning. For some reason these two techniques seem to be incompatible (or at least my implementation of them makes them incompatible). After removing the parts related to memoization, the method becomes a pure alpha-beta pruning minimax algorithm, works fine, and looks like this:

public int[] newminimax499(int alpha, int beta){
    int bestPos=-1;
    int currentScore;
    if(isGameOver2()!='N'){
        int[] answer = {score(), bestPos};                                    
        return answer;
    }
    else{
        for(int x:getAvailableMoves()){
            if(turn=='O'){  //O is maximizer
                setO(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                if(currentScore>alpha){
                    alpha=currentScore;
                    bestPos=x;
                }
            }
            else {  //X is minimizer
                setX(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                if(currentScore<beta){
                    beta=currentScore;
                    bestPos=x;
                }
            }
            revert(x);
            if(alpha>=beta)
                break;
        }
        if(turn=='O'){ 
            int[] answer = {alpha, bestPos};                                    
            return answer;
        }
        else {
            int[] answer = {beta, bestPos};                                    
            return answer;
        }
    }
}

Not only does this method now work (however you call it from the main method), but it's also much faster than a minimax with memoization. It calculates the 2nd move in a 4x4 game in a mere 7 seconds, whereas a minimax that implements memoization takes about 23 seconds.

Source: https://stackoverflow.com/questions/32154533/how-to-call-a-minimax-methodwith-alpha-beta-pruning-properly

Tags

java

algorithm

artificial-intelligence

tic-tac-toe

minimax