I'm working on an algorithm whose goal is to find a minimum set of packages to install package "X".
I'll explain better with an example:
A lot of the answers here focus on how this is a theoretically hard problem due to its NP-hard status. While this means you will see asymptotically poor performance when solving the problem exactly (given current solution techniques), you may still be able to solve it quickly enough for your particular problem data. For instance, we are able to exactly solve enormous traveling salesman problem instances despite the fact that the problem is theoretically challenging.
In your case, a way to solve the problem would be to formulate it as a mixed integer linear program, where there is a binary variable x_i for each package i. You can convert a requirement such as "A requires (B or C or D) and (E or F) and (G)" to constraints of the form x_A <= x_B + x_C + x_D; x_A <= x_E + x_F; x_A <= x_G, and you can require that a package P be included in the final solution with x_P = 1. Solving such a model exactly is relatively straightforward; for instance, you can use the pulp package in Python:
import pulp

# Each package maps to a list of OR-groups; at least one member of every
# group must be installed. Note the trailing comma in one-element tuples.
deps = {"X": [("A",), ("E", "C")],
        "A": [("E",), ("H", "Y")],
        "E": [("B",), ("Z", "Y")],
        "C": [("A", "K")],
        "H": [],
        "B": [],
        "Y": [],
        "Z": [],
        "K": []}
required = ["X"]

# Variables: one binary decision per package (1 = install)
x = pulp.LpVariable.dicts("x", list(deps), cat=pulp.LpBinary)
mod = pulp.LpProblem("Package_Optimization", pulp.LpMinimize)

# Objective: minimize the number of installed packages
mod += pulp.lpSum(x[k] for k in deps)

# Dependencies: if k is installed, each OR-group needs at least one member
for k in deps:
    for dep in deps[k]:
        mod += x[k] <= pulp.lpSum(x[d] for d in dep)

# Include required variables
for r in required:
    mod += x[r] == 1

# Solve
mod.solve()
for k in deps:
    print("Package", k, "used:", x[k].value())
This outputs the minimal set of packages (the order of the lines follows dictionary iteration order, so it may differ on your machine):
Package A used: 1.0
Package C used: 0.0
Package B used: 1.0
Package E used: 1.0
Package H used: 0.0
Package Y used: 1.0
Package X used: 1.0
Package K used: 0.0
Package Z used: 0.0
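As a sanity check on small inputs, the same constraints can be verified by exhaustive search using only the standard library. This is a minimal sketch, not part of the model above: the helper names is_valid and smallest_install_set are hypothetical, and the approach enumerates subsets, so it is only practical for a handful of packages.

```python
from itertools import combinations

# Same dependency data as the example above: each inner tuple is an OR-group.
deps = {
    "X": [("A",), ("E", "C")],
    "A": [("E",), ("H", "Y")],
    "E": [("B",), ("Z", "Y")],
    "C": [("A", "K")],
    "H": [], "B": [], "Y": [], "Z": [], "K": [],
}


def is_valid(selection, deps, required):
    """Return True if the selection contains all required packages and every
    selected package has at least one member of each of its OR-groups."""
    chosen = set(selection)
    if not set(required) <= chosen:
        return False
    return all(
        any(d in chosen for d in group)
        for pkg in chosen
        for group in deps[pkg]
    )


def smallest_install_set(deps, required):
    """Try subsets in order of increasing size and return the first valid one."""
    packages = sorted(deps)
    for size in range(len(required), len(packages) + 1):
        for candidate in combinations(packages, size):
            if is_valid(candidate, deps, required):
                return set(candidate)
    return None  # no consistent selection exists


best = smallest_install_set(deps, ["X"])
print(sorted(best))
```

For the data above this finds the packages A, B, E, X and Y, which agrees with the solver output.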
For very large problem instances, this might take too long to solve. You could either accept a potentially sub-optimal solution using a timeout (see here) or you could move from the default open-source solvers to a commercial solver like Gurobi or CPLEX, which will likely be much faster.
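The timeout option could look roughly like the sketch below. It assumes a recent PuLP release (2.x), where the bundled CBC wrapper pulp.PULP_CBC_CMD accepts a timeLimit argument in seconds; older releases used a different parameter name, so check the version you have installed. The 10-second limit is an arbitrary value chosen for illustration.

```python
import pulp

# Same dependency data as the example above: each inner tuple is an OR-group.
deps = {
    "X": [("A",), ("E", "C")],
    "A": [("E",), ("H", "Y")],
    "E": [("B",), ("Z", "Y")],
    "C": [("A", "K")],
    "H": [], "B": [], "Y": [], "Z": [], "K": [],
}

model = pulp.LpProblem("package_selection", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", list(deps), cat=pulp.LpBinary)

# Objective: install as few packages as possible.
model += pulp.lpSum(x[k] for k in deps)

# An installed package needs at least one member of each OR-group.
for pkg, groups in deps.items():
    for group in groups:
        model += x[pkg] <= pulp.lpSum(x[d] for d in group)

# Package X must be installed.
model += x["X"] == 1

# Assumption: timeLimit (seconds) is supported by this PuLP version.
# msg=False silences the solver log.
solver = pulp.PULP_CBC_CMD(timeLimit=10, msg=False)
model.solve(solver)

status = pulp.LpStatus[model.status]
chosen = sorted(
    k for k in deps if x[k].value() is not None and x[k].value() > 0.5
)
print(status, chosen)
```

If the limit is hit before optimality is proven, the variable values hold the best solution found so far (if any), so it is worth inspecting the reported status before trusting the result.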