I do not agree with the current accepted answer.
The most important aspect of multicore machines is that the CPU and main memory are far apart in latency. Unless the application is embarrassingly parallel or otherwise easy to parallelize, it is far more likely to be memory bound than CPU bound: a floating-point multiplication takes about 4 clock cycles, while a fetch from main memory takes hundreds. Exploiting cache locality therefore becomes important.
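To make the cache-locality point concrete, here is a small C++ sketch (my own toy example, not part of the original answer): both loops do exactly the same arithmetic on the same matrix, but the row-order traversal walks memory sequentially while the column-order traversal strides across cache lines, so on typical hardware the second loop is noticeably slower.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t n = 4096;            // hypothetical matrix size
    std::vector<double> m(n * n, 1.0);     // row-major storage

    // Small timing helper: runs the body and prints the result and elapsed time.
    auto time = [&](auto body) {
        auto t0 = std::chrono::steady_clock::now();
        double sum = body();
        auto t1 = std::chrono::steady_clock::now();
        std::cout << sum << " in "
                  << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    };

    // Cache-friendly: consecutive elements, the hardware prefetcher keeps up.
    time([&] {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                s += m[i * n + j];
        return s;
    });

    // Cache-hostile: stride of n doubles per access, most fetches miss the cache.
    time([&] {
        double s = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < n; ++i)
                s += m[i * n + j];
        return s;
    });
}
```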
For difficult-to-parallelize applications, if the performance achieved on a single core is sufficient (the majority of applications belong to this class), there is no need to parallelize. But if it is not (or your competitor's application is much more responsive because they parallelized), then you would do better to refactor your application to better exploit parallelism and cache locality. Roughly, the refactored application would consist of relatively independent (or loosely communicating) submodules which run in parallel (see this example, for one).
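As a rough sketch of what "relatively independent submodules" might look like (the function names below are made up for illustration, not taken from any particular example): each submodule works on its own slice of the data, and the only synchronization point is collecting the results.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical submodules: each touches only its own half of the data,
// so they share nothing while running and need no locks.
double process_front(const std::vector<double>& d) {
    return std::accumulate(d.begin(), d.begin() + d.size() / 2, 0.0);
}
double process_back(const std::vector<double>& d) {
    return std::accumulate(d.begin() + d.size() / 2, d.end(), 0.0);
}

double run(const std::vector<double>& data) {
    // Launch both submodules in parallel; the futures are the only channel
    // through which they communicate with the rest of the program.
    auto front = std::async(std::launch::async, process_front, std::cref(data));
    auto back  = std::async(std::launch::async, process_back,  std::cref(data));
    return front.get() + back.get();   // single synchronization point
}

int main() {
    std::vector<double> data(1000, 1.0);
    return run(data) == 1000.0 ? 0 : 1;
}
```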
See http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html for a good overview of multicore and where things are heading. The major points they make are:
- Clock speeds are no longer increasing the way they used to. It is more cost effective to manufacture a larger number of slower, simpler cores than a small number of fast processors.
- Memory is (increasingly) far from the CPU.
- In a few years, there will be thousands of cores in web servers and hundreds on desktops. So plan to scale your application (probably automatically) to hundreds or thousands of cores. This means you should decompose your work into many independent tasks.
- Threads are difficult to work with, so it is better to work with "tasks" (see the sketch after this list).
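A minimal sketch of the "tasks, not threads" idea, assuming C++11 and using std::async as a stand-in for a proper task scheduler (a thread pool, TBB, etc.): the work is described as many independent chunks whose number is derived from the data and the core count, rather than hand-managing one thread per piece of work.

```cpp
#include <algorithm>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Split a big array into independent chunks and submit each chunk as a task.
// The same code runs on a 4-core laptop or a many-core server because the
// task count is a parameter, not hard-coded into the structure of the program.
double parallel_sum(const std::vector<double>& data, std::size_t chunks) {
    std::vector<std::future<double>> tasks;
    const std::size_t step = (data.size() + chunks - 1) / chunks;
    for (std::size_t begin = 0; begin < data.size(); begin += step) {
        const std::size_t end = std::min(begin + step, data.size());
        tasks.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
        }));
    }
    double total = 0.0;
    for (auto& t : tasks) total += t.get();   // join by collecting results
    return total;
}

int main() {
    std::vector<double> v(1'000'000, 1.0);
    // One task per hardware thread is a reasonable default; more is also fine.
    std::size_t n = std::max(1u, std::thread::hardware_concurrency());
    return parallel_sum(v, n) == v.size() ? 0 : 1;
}
```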