Generate equation with the result value closest to the requested one, have speed problems

问题

I am writing some quiz game and need computer to solve 1 game in the quiz if players fail to solve it.

Given data :

List of 6 numbers to use, for example 4, 8, 6, 2, 15, 50.
Targeted value, where 0 < value < 1000, for example 590.
Available operations are division, addition, multiplication and division.
Parentheses can be used.

Generate mathematical expression which evaluation is equal, or as close as possible, to the target value. For example for numbers given above, expression could be : (6 + 4) * 50 + 15 * (8 - 2) = 590

My algorithm is as follows :

Generate all permutations of all the subsets of the given numbers from (1) above
For each permutation generate all parenthesis and operator combinations
Track the closest value as algorithm runs

I can not think of any smart optimization to the brute-force algorithm above, which will speed it up by the order of magnitude. Also I must optimize for the worst case, because many quiz games will be run simultaneously on the server.

Code written today to solve this problem is (relevant stuff extracted from the project) :

from operator import add, sub, mul, div
import itertools


ops = ['+', '-', '/', '*']
op_map = {'+': add, '-': sub, '/': div, '*': mul}

# iterate over 1 permutation and generates parentheses and operator combinations
def iter_combinations(seq):
    if len(seq) == 1:
        yield seq[0], str(seq[0])
    else:
        for i in range(len(seq)):
            left, right = seq[:i], seq[i:]  # split input list at i`th place
            # generate cartesian product
            for l, l_str in iter_combinations(left):
                for r, r_str in iter_combinations(right):
                    for op in ops:
                        if op_map[op] is div and r == 0:  # cant divide by zero
                            continue
                        else:
                            yield op_map[op](float(l), r), \
                                  ('(' + l_str + op + r_str + ')')

numbers = [4, 8, 6, 2, 15, 50]
target = best_value = 590
best_item = None

for i in range(len(numbers)):
    for current in itertools.permutations(numbers, i+1): # generate perms
        for value, item in iter_combinations(list(current)):
            if value < 0:
                continue

            if abs(target - value) < best_value:
                best_value = abs(target - value)
                best_item = item

print best_item

It prints : ((((4*6)+50)*8)-2). Tested it a little with different values and it seems to work correctly. Also I have a function to remove unnecessary parenthesis but it is not relevant to the question so it is not posted.

Problem is that this runs very slowly because of all this permutations, combinations and evaluations. On my mac book air it runs for a few minutes for 1 example. I would like to make it run in a few seconds tops on the same machine, because many quiz game instances will be run at the same time on the server. So the questions are :

Can I speed up current algorithm somehow (by orders of magnitude)?
Am I missing on some other algorithm for this problem which would run much faster?

回答1:

You can build all the possible expression trees with the given numbers and evalate them. You don't need to keep them all in memory, just print them when the target number is found:

First we need a class to hold the expression. It is better to design it to be immutable, so its value can be precomputed. Something like this:

class Expr:
    '''An Expr can be built with two different calls:
           -Expr(number) to build a literal expression
           -Expr(a, op, b) to build a complex expression. 
            There a and b will be of type Expr,
            and op will be one of ('+','-', '*', '/').
    '''
    def __init__(self, *args):
        if len(args) == 1:
            self.left = self.right = self.op = None
            self.value = args[0]
        else:
            self.left = args[0]
            self.right = args[2]
            self.op = args[1]
            if self.op == '+':
                self.value = self.left.value + self.right.value
            elif self.op == '-':
                self.value = self.left.value - self.right.value
            elif self.op == '*':
                self.value = self.left.value * self.right.value
            elif self.op == '/':
                self.value = self.left.value // self.right.value

    def __str__(self):
        '''It can be done smarter not to print redundant parentheses,
           but that is out of the scope of this problem.
        '''
        if self.op:
            return "({0}{1}{2})".format(self.left, self.op, self.right)
        else:
            return "{0}".format(self.value)

Now we can write a recursive function that builds all the possible expression trees with a given set of expressions, and prints the ones that equals our target value. We will use the itertools module, that's always fun.

We can use itertools.combinations() or itertools.permutations(), the difference is in the order. Some of our operations are commutative and some are not, so we can use permutations() and assume we will get many very simmilar solutions. Or we can use combinations() and manually reorder the values when the operation is not commutative.

import itertools
OPS = ('+', '-', '*', '/')
def SearchTrees(current, target):
    ''' current is the current set of expressions.
        target is the target number.
    '''
    for a,b in itertools.combinations(current, 2):
        current.remove(a)
        current.remove(b)
        for o in OPS:
            # This checks whether this operation is commutative
            if o == '-' or o == '/':
                conmut = ((a,b), (b,a))
            else:
                conmut = ((a,b),)

            for aa, bb in conmut:
                # You do not specify what to do with the division.
                # I'm assuming that only integer divisions are allowed.
                if o == '/' and (bb.value == 0 or aa.value % bb.value != 0):
                    continue
                e = Expr(aa, o, bb)
                # If a solution is found, print it
                if e.value == target:
                    print(e.value, '=', e)
                current.add(e)
                # Recursive call!
                SearchTrees(current, target)
                # Do not forget to leave the set as it were before
                current.remove(e)
        # Ditto
        current.add(b)
        current.add(a)

And then the main call:

NUMBERS = [4, 8, 6, 2, 15, 50]
TARGET = 590

initial = set(map(Expr, NUMBERS))
SearchTrees(initial, TARGET)

And done! With these data I'm getting 719 different solutions in just over 21 seconds! Of course many of them are trivial variations of the same expression.

回答2:

All combinations for six number, four operations and parenthesis are up to 5 * 9! at least. So I think you should use some AI algorithm. Using genetic programming or optimization seems to be the path to follow.

In the book Programming Collective Intelligence in the chapter 11 Evolving Intelligence you will find exactly what you want and much more. That chapter explains how to find a mathematical function combining operations and numbers (as you want) to match a result. You will be surprised how easy is such task.

PD: The examples are written using Python.

回答3:

I would try using an AST at least it will
make your expression generation part easier
(no need to mess with brackets).

http://en.wikipedia.org/wiki/Abstract_syntax_tree

1) Generate some tree with N nodes
(N = the count of numbers you have).
I've read before how many of those you
have, their size is serious as N grows.
By serious I mean more than polynomial to say the least.

2) Now just start changing the operations
in the non-leaf nodes and keep evaluating
the result.

But this is again backtracking and too much degree of freedom.
This is a computationally complex task you're posing. I believe if you
ask the question as you did: "let's generate a number K on the output
such that |K-V| is minimal" (here V is the pre-defined desired result,
i.e. 590 in your example) , then I guess this problem is even NP-complete.

Somebody please correct me if my intuition is lying to me.

So I think even the generation of all possible ASTs (assuming only 1 operation
is allowed) is NP complete as their count is not polynomial. Not to talk that more
than 1 operation is allowed here and not to talk of the minimal difference requirement
(between result and desired result).

回答4:

24 game is 4 numbers to target 24, your game is 6 numbers to target x (0 < x < 1000).

That's much similar.

Here is the quick solution, get all results and print just one in my rMBP in about 1-3s, I think one solution print is ok in this game :), I will explain it later:

def mrange(mask):
    #twice faster from Evgeny Kluev
    x = 0
    while x != mask:
        x = (x - mask) & mask
        yield x 

def f( i ) :
    global s
    if s[i] :
        #get cached group
        return s[i]
    for x in mrange(i & (i - 1)) :
        #when x & i == x
        #x is a child group in group i
        #i-x is also a child group in group i
        fk = fork( f(x), f(i-x) )
        s[i] = merge( s[i], fk )
    return s[i] 

def merge( s1, s2 ) :
    if not s1 :
        return s2
    if not s2 :
        return s1
    for i in s2 :
        #print just one way quickly
        s1[i] = s2[i]
        #combine all ways, slowly
        # if i in s1 :
        #   s1[i].update(s2[i])
        # else :
        #   s1[i] = s2[i]
    return s1   

def fork( s1, s2 ) :
    d = {}
    #fork s1 s2
    for i in s1 :
        for j in s2 :
            if not i + j in d :
                d[i + j] = getExp( s1[i], s2[j], "+" )
            if not i - j in d :
                d[i - j] = getExp( s1[i], s2[j], "-" )
            if not j - i in d :
                d[j - i] = getExp( s2[j], s1[i], "-" )
            if not i * j in d :
                d[i * j] = getExp( s1[i], s2[j], "*" )
            if j != 0 and not i / j in d :
                d[i / j] = getExp( s1[i], s2[j], "/" )
            if i != 0 and not j / i in d :
                d[j / i] = getExp( s2[j], s1[i], "/" )
    return d    

def getExp( s1, s2, op ) :
    exp = {}
    for i in s1 :
        for j in s2 :
            exp['('+i+op+j+')'] = 1
            #just print one way
            break
        #just print one way
        break
    return exp  

def check( s ) :
    num = 0
    for i in xrange(target,0,-1):
        if i in s :
            if i == target :
                print numbers, target, "\nFind ", len(s[i]), 'ways'
                for exp in s[i]:
                    print exp, ' = ', i
            else :
                print numbers, target, "\nFind nearest ", i, 'in', len(s[i]), 'ways'
                for exp in s[i]:
                    print exp, ' = ', i
            break
    print '\n'  

def game( numbers, target ) :
    global s
    s = [None]*(2**len(numbers))
    for i in xrange(0,len(numbers)) :
        numbers[i] = float(numbers[i])
    n = len(numbers)
    for i in xrange(0,n) :
        s[2**i] = { numbers[i]: {str(numbers[i]):1} }   

    for i in xrange(1,2**n) :
        #we will get the f(numbers) in s[2**n-1]
        s[i] = f(i) 

    check(s[2**n-1])    



numbers = [4, 8, 6, 2, 2, 5]
s = [None]*(2**len(numbers))    

target = 590
game( numbers, target ) 

numbers = [1,2,3,4,5,6]
target = 590
game( numbers, target )

Assume A is your 6 numbers list.

We define f(A) is all result that can calculate by all A numbers, if we search f(A), we will find if target is in it and get answer or the closest answer.

We can split A to two real child groups: A1 and A-A1 (A1 is not empty and not equal A) , which cut the problem from f(A) to f(A1) and f(A-A1). Because we know f(A) = Union( a+b, a-b, b-a, a*b, a/b(b!=0), b/a(a!=0) ), which a in A, b in A-A1.

We use fork f(A) = Union( fork(A1,A-A1) ) stands for such process. We can remove all duplicate value in fork(), so we can cut the range and make program faster.

So, if A = [1,2,3,4,5,6], then f(A) = fork( f([1]),f([2,3,4,5,6]) ) U ... U fork( f([1,2,3]), f([4,5,6]) ) U ... U stands for Union.

We will see f([2,3,4,5,6]) = fork( f([2,3]), f([4,5,6]) ) U ... , f([3,4,5,6]) = fork( f([3]), f([4,5,6]) ) U ..., the f([4,5,6]) used in both.

So if we can cache every f([...]) the program can be faster.

We can get 2^len(A) - 2 (A1,A-A1) in A. We can use binary to stands for that.

For example: A = [1,2,3,4,5,6], A1 = [1,2,3], then binary 000111(7) stands for A1. A2 = [1,3,5], binary 010101(21) stands for A2. A3 = [1], then binary 000001(1) stands for A3...

So we get a way stands for all groups in A, we can cache them and make all process faster!

回答5:

1. Fast entirely online algorithm

The idea is to search not for a single expression for target value, but for an equation where target value is included in one part of the equation and both parts have almost equal number of operations (2 and 3). Since each part of the equation is relatively small, it does not take much time to generate all possible expressions for given input values. After both parts of equation are generated it is possible to scan a pair of sorted arrays containing values of these expressions and find a pair of equal (or at least best matching) values in them. After two matching values are found we could get corresponding expressions and join them into a single expression (in other words, solve the equation).

To join two expression trees together we could descend from the root of one tree to "target" leaf, for each node on this path invert corresponding operation ('*' to '/', '/' to '*' or '/', '+' to '-', '-' to '+' or '-'), and move "inverted" root node to other tree (also as root node).

This algorithm is faster and easier to implement when all operations are invertible. So it is best to use with floating point division (as in my implementation) or with rational division. Truncating integer division is most difficult case because it produces same result for different inputs (42/25=1 and 25/25 is also 1). With zero-remainder integer division this algorithm gives result almost instantly when exact result is available, but needs some modifications to work correctly when approximate result is needed.

See implementation on Ideone.

2. Even faster approach with off-line pre-processing

As noticed by @WolframH, there are not so many possible input number combinations. Only 3*3*(₄^9+4-1) = 4455 if repetitions are possible. Or 3*3*(₄⁹) = 1134 without duplicates. Which allows us to pre-process all possible inputs off-line, store results in compact form, and when some particular result is needed quickly unpack one of pre-processed values.

Pre-processing program should take array of 6 numbers and generate values for all possible expressions. Then it should drop out-of-range values and find nearest result for all cases where there is no exact match. All this could be performed by algorithm proposed by @Tim. His code needs minimal modifications to do it. Also it is the fastest alternative (yet). Since pre-processing is offline, we could use something better than interpreted Python. One alternative is PyPy, other one is to use some fast interpreted language. Pre-processing all possible inputs should not take more than several minutes.

Speaking about memory needed to store all pre-processed values, the only problem are the resulting expressions. If stored in string form they will take up to 4455*999*30 bytes or 120Mb. But each expression could be compressed. It may be represented in postfix notation like this: arg1 arg2 + arg3 arg4 + *. To store this we need 10 bits to store all arguments' permutations, 10 bits to store 5 operations, and 8 bits to specify how arguments and operations are interleaved (6 arguments + 5 operations - 3 pre-defined positions: first two are always arguments, last one is always operation). 28 bits per tree or 4 bytes, which means it is only 20Mb for entire data set with duplicates or 5Mb without them.

3. Slow entirely online algorithm

There are some ways to speed up algorithm in OP:

Greatest speed improvement may be achieved if we avoid trying each commutative operation twice and make recursion tree less branchy.
Some optimization is possible by removing all branches where the result of division operation is zero.
Memorization (dynamic programming) cannot give significant speed boost here, still it may be useful.

After enhancing OP's approach with these ideas, approximately 30x speedup is achieved:

from itertools import combinations

numbers = [4, 8, 6, 2, 15, 50]
target = best_value = 590
best_item = None
subsets = {}


def get_best(value, item):
    global best_value, target, best_item

    if value >= 0 and abs(target - value) < best_value:
        best_value = abs(target - value)
        best_item = item

    return value, item


def compare_one(value, op, left, right):
    item = ('(' + left + op + right + ')')
    return get_best(value, item)


def apply_one(left, right):
    yield compare_one(left[0] + right[0], '+', left[1], right[1])
    yield compare_one(left[0] * right[0], '*', left[1], right[1])
    yield compare_one(left[0] - right[0], '-', left[1], right[1])
    yield compare_one(right[0] - left[0], '-', right[1], left[1])

    if right[0] != 0 and left[0] >= right[0]:
        yield compare_one(left[0] / right[0], '/', left[1], right[1])

    if left[0] != 0 and right[0] >= left[0]:
        yield compare_one(right[0] / left[0], '/', right[1], left[1])


def memorize(seq):
    fs = frozenset(seq)

    if fs in subsets:
        for x in subsets[fs].items():
            yield x
    else:
        subsets[fs] = {}
        for value, item in try_all(seq):
            subsets[fs][value] = item
            yield value, item


def apply_all(left, right):
    for l in memorize(left):
        for r in memorize(right):
            for x in apply_one(l, r):
                yield x;


def try_all(seq):
    if len(seq) == 1:
        yield get_best(numbers[seq[0]], str(numbers[seq[0]]))

    for length in range(1, len(seq)):
        for x in combinations(seq[1:], length):
            for value, item in apply_all(list(x), list(set(seq) - set(x))):
                yield value, item


for x, y in try_all([0, 1, 2, 3, 4, 5]): pass

print best_item

More speed improvements are possible if you add some constraints to the problem:

If integer division is only possible when the remainder is zero.
If all intermediate results are to be non-negative and/or below 1000.

回答6:

Well I don't will give up. Following the line of all the answers to your question I come up with another algorithm. This algorithm gives the solution with a time average of 3 milliseconds.

#! -*- coding: utf-8 -*-
import copy

numbers = [4, 8, 6, 2, 15, 50]
target = 590

operations  = {
    '+': lambda x, y: x + y, 
    '-': lambda x, y: x - y,
    '*': lambda x, y: x * y,
    '/': lambda x, y: y == 0 and 1e30 or x / y   # Handle zero division
}

def chain_op(target, numbers, result=None, expression=""):
    if len(numbers) == 0:
        return (expression, result)
    else:
        for choosen_number in numbers:
            remaining_numbers = copy.copy(numbers)
            remaining_numbers.remove(choosen_number)
            if result is None:
                return chain_op(target, remaining_numbers, choosen_number, str(choosen_number))
            else:
                incomming_results = []
                for key, op in operations.items():
                    new_result = op(result, choosen_number)
                    new_expression = "%s%s%d" % (expression, key, choosen_number)
                    incomming_results.append(chain_op(target, remaining_numbers, new_result, new_expression))
                diff = 1e30
                selected = None
                for exp_result in incomming_results:
                    exp, res = exp_result
                    if abs(res - target) < diff:
                        diff = abs(res - target)
                        selected = exp_result
                    if diff == 0:
                        break
                return selected

if __name__ == '__main__':
    print chain_op(target, numbers)

Erratum: This algorithm do not include the solutions containing parenthesis. It always hits the target or the closest result, my bad. Still is pretty fast. It can be adapted to support parenthesis without much work.

回答7:

Actually there are two things that you can do to speed up the time to milliseconds.

You are trying to find a solution for given quiz, by generating the numbers and the target number. Instead you can generate the solution and just remove the operations. You can build some thing smart that will generate several quizzes and choose the most interesting one, how ever in this case you loose the as close as possible option.

Another way to go, is pre-calculation. Solve 100 quizes, use them as build-in in your application, and generate new one on the fly, try to keep your quiz stack at 100, also try to give the user only the new quizes. I had the same problem in my bible games, and I used this method to speed thing up. Instead of 10 sec for question it takes me milliseconds as I am generating new question in background and always keeping my stack to 100.

回答8:

What about Dynamic programming, because you need same results to calculate other options?

来源：https://stackoverflow.com/questions/20555559/generate-equation-with-the-result-value-closest-to-the-requested-one-have-speed

标签

python

performance

algorithm

permutation