I'm not very experienced with subjects such as concurrency and multithreading. In fact, in most of my web-development career I have never needed to touch these subjects.
Well for one thing, multiple threads are not the same as multiple processes, so fork() really does not apply here.
Multithreading/parallel processing is hard. First you have to figure out how to actually partition the task to be done. Then you have to coordinate all of the parallel bits, which may need to talk to each other or share resources. Then you need to consolidate the results, which in some cases can be every bit as difficult as the previous two steps. I'm simplifying here, but hopefully you get the idea.
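To make those three steps concrete, here's a rough sketch in C# (my own toy example, nothing from the question itself): summing a big array by partitioning it into chunks, fanning the chunks out to tasks, and then consolidating the partial sums.

```csharp
using System.Linq;
using System.Threading.Tasks;

class ParallelSum
{
    static long Sum(int[] data)
    {
        // 1. Partition: split the input into one chunk per worker.
        const int workers = 4;
        int chunkSize = (data.Length + workers - 1) / workers;
        var chunks = Enumerable.Range(0, workers)
            .Select(i => data.Skip(i * chunkSize).Take(chunkSize).ToArray())
            .ToArray();

        // 2. Coordinate: give each chunk its own task. The chunks share
        //    nothing, so no locking is needed for this part.
        var tasks = chunks
            .Select(chunk => Task.Run(() => chunk.Sum(x => (long)x)))
            .ToArray();

        // 3. Consolidate: wait for all the partial results and combine them.
        Task.WaitAll(tasks);
        return tasks.Sum(t => t.Result);
    }
}
```

Even in this trivial case the three phases are clearly separate pieces of code, and each one gets harder as the real work gets less uniform.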
So your question is, why would some languages be better at it? Well, several things can make it easier:
Optimized immutable data structures. You want to stick to immutable structures whenever possible in parallel processing, because they are much easier to reason about. Some languages have better support for these, and some have various optimizations, e.g. the ability to splice collections together without any actual copying while still enforcing immutability. You can always build your own structures like these, but it's easier if the language or framework does it for you.
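As a small illustration (sticking with .NET, since it comes up again below), the System.Collections.Immutable package gives you exactly this kind of structure: "adding" to a list produces a new list that shares most of its nodes with the old one instead of copying everything.

```csharp
using System;
using System.Collections.Immutable;

class ImmutableListDemo
{
    static void Main()
    {
        // Every "mutation" returns a new list that shares most of its
        // internal structure with the old one, so nothing gets copied
        // wholesale and the original stays safe to read from any thread.
        ImmutableList<int> original = ImmutableList.Create(1, 2, 3);
        ImmutableList<int> extended = original.AddRange(new[] { 4, 5 });

        Console.WriteLine(original.Count); // 3 -- the original is untouched
        Console.WriteLine(extended.Count); // 5
    }
}
```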
Synchronization primitives and ease of using them. When different threads do share state, they need to be synchronized, and there are many different ways to accomplish this. The wider the array of sync primitives you get, the easier your task will ultimately be. For example, performance will take a hit if you're forced to synchronize with a critical section when a reader-writer lock would have been enough.
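A quick sketch of that last point, again in .NET terms (the cache class itself is just made up for the example):

```csharp
using System.Collections.Generic;
using System.Threading;

// A reader-writer lock lets any number of readers in at the same time and
// only serializes access when a writer needs exclusivity; a plain lock
// (critical section) would force every reader to queue up as well.
class SharedCache
{
    private readonly Dictionary<string, string> _data = new Dictionary<string, string>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public string Get(string key)
    {
        _lock.EnterReadLock();            // many threads can hold this at once
        try
        {
            return _data.TryGetValue(key, out var value) ? value : null;
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }

    public void Put(string key, string value)
    {
        _lock.EnterWriteLock();           // exclusive: blocks readers and writers
        try
        {
            _data[key] = value;
        }
        finally
        {
            _lock.ExitWriteLock();
        }
    }
}
```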
Atomic transactions. Even better than a wide array of sync primitives is not having to use them at all. Database engines are very good at this; instead of you, the programmer, having to figure out exactly which resources you need to lock and when and how, you just say to the compiler or interpreter, "all of the stuff below this line needs to happen together, so make sure nobody else messes around with it while I'm using it." And the engine will figure out the locking for you. You almost never get this kind of simplicity in an abstract programming language, but the closer you can come, the better. Thread-safe objects that combine multiple common operations into one are a start.
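One everyday example of such a combined operation, staying in .NET: ConcurrentDictionary folds "look up, then insert or update" into a single atomic call, so the classic check-then-act race simply isn't your problem anymore.

```csharp
using System.Collections.Concurrent;

class WordCounter
{
    private readonly ConcurrentDictionary<string, int> _counts =
        new ConcurrentDictionary<string, int>();

    public void Record(string word)
    {
        // "Does the key exist? If not, insert 1; otherwise add 1" is a classic
        // check-then-act race if you do it by hand with a plain dictionary.
        // AddOrUpdate folds the whole sequence into one call and the
        // collection handles the synchronization internally.
        _counts.AddOrUpdate(word, 1, (key, current) => current + 1);
    }
}
```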
Automatic parallelism. Let's say you have to iterate through a long list of items and transform them somehow, like multiplying 50,000 10x10 matrices. Wouldn't it be nice if you could just tell the compiler: Hey, each operation can be done independently, so spread them across the available CPU cores? Without having to actually implement the threading yourself? Some languages support this kind of thing; for example, the .NET team has been working on PLINQ.
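To give a feel for it, here's roughly what that looks like with PLINQ (a toy stand-in for the matrix work, not code from the question):

```csharp
using System;
using System.Linq;

class PlinqDemo
{
    static void Main()
    {
        // 50,000 independent work items; AsParallel() asks the runtime to
        // partition them across the available cores. We only declare that
        // each item can be processed on its own and never touch a thread.
        long[] results = Enumerable.Range(0, 50_000)
            .AsParallel()
            .Select(i => Transform(i))
            .ToArray();

        Console.WriteLine(results.Length);
    }

    // Stand-in for the real per-item work (e.g. one 10x10 matrix multiply).
    static long Transform(int i) => (long)i * i;
}
```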
Those are just a few examples of things that can make your life easier in parallel/multi-threaded applications. I'm sure that there are many more.