As part of my undergraduate CompSci degree, we were assigned the problem of finding optimal jvm flags for the Jikes research virtual machine. This was evaluated using the Dicappo benchmark suite which returns a time to the console. I wrote a distributed gentic alogirthm that switched these flags to improve the runtime of the benchmark suite, although it took days to run to compensate for hardware jitter affecting the results. The only problem was I didn't properly learn about the compiler theory (which was the intent of the assignment).
I could have seeded the initial population with the exisiting default flags, but what was interesting was that the algorithm found a very similar configuration to the O3 optimisation level (but was actually faster in many tests).
Edit: Also I wrote my own genetic algorithm framework in Python for the assignment, and just used the popen commands to run the various benchmarks, although if it wasn't an assessed assignment I would have looked at pyEvolve.