The big problem with your Haskell code is that you are dividing, which you don't do in the C version.
Yes, you wrote n % 2 and n / 2, but the C compiler replaces those with a bitwise AND and a right shift. GHC has unfortunately not yet been taught to do that.
If you do the substitution yourself:
module Main where

import System.Environment (getArgs)
import Data.Int (Int32, Int64)
import Data.Bits

main :: IO ()
main = do
    args <- getArgs
    print $ maxCol 0 (read (head args) :: Int64)

-- Length of the Collatz sequence starting at x.
col :: Int64 -> Int32
col x = col' x 0

col' :: Int64 -> Int32 -> Int32
col' 1 n            = n
col' x n
    | x .&. 1 == 0  = col' (x `shiftR` 1) (n + 1)
    | otherwise     = col' (3 * x + 1) (n + 1)

-- Maximum sequence length over the starting values 2 .. n.
maxCol :: Int32 -> Int64 -> Int32
maxCol maxS 2   = maxS
maxCol maxS n
    | s > maxS  = maxCol s (n - 1)
    | otherwise = maxCol maxS (n - 1)
    where s = col n
with a 64-bit GHC you get comparable speed (0.35s vs C's 0.32s on my box for a limit of 1000000). If you compile with the LLVM backend, you don't even need to replace the % 2 and / 2 with bitwise operations; LLVM does that for you. Surprisingly, it produces slower code (0.4s) for your original Haskell source; normally, LLVM is no worse than the native code generator at loop optimisation.
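If you want to convince yourself that the substitution is safe, here is a quick standalone check (the helper name agrees is mine, not part of the program above) that the bitwise forms match mod and div for the positive Int64 values the loop works on:

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Int (Int64)

-- For positive Int64 values, x .&. 1 is the lowest bit, i.e. x `mod` 2,
-- and x `shiftR` 1 is an arithmetic shift, i.e. x `div` 2.
agrees :: Int64 -> Bool
agrees x = x .&. 1 == x `mod` 2 && x `shiftR` 1 == x `div` 2

main :: IO ()
main = print (all agrees ([1 .. 1000] ++ [maxBound - 1000 .. maxBound]))
-- prints True
```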
With a 32-bit GHC, you won't get comparable speed, since there the primitive operations on 64-bit integers are implemented through C calls. There was never enough demand for fast 64-bit operations on 32-bit systems for them to be implemented as primops; the few people working on GHC spent their time on other, more important things.
  TL;DR: Is Haskell code quick to write and simple to maintain only for computationally simple tasks and loses this characteristic when performance is crucial?
That depends. You need some idea of what sort of code GHC generates from what sort of input, and you must avoid a few performance traps. With a bit of practice, it is quite easy to get within, say, 2× the speed of gcc -O3 for tasks like this.
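To make that concrete, here is a sketch of one such trap (the names sumToSlow and sumToFast are mine, purely for illustration): letting numeric code run on arbitrary-precision Integer, which is what defaulting gives you, and leaving the accumulator of a loop lazy. With -O2 the strictness analyser often rescues the lazy version on its own, but pinning the fixed-width type and making strictness explicit is the reliable way to get a tight loop:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Int (Int64)

-- Trap: Integer arithmetic plus a lazy accumulator that can
-- build up a chain of thunks before being forced.
sumToSlow :: Integer -> Integer
sumToSlow n = go 0 1
  where go acc i | i > n     = acc
                 | otherwise = go (acc + i) (i + 1)

-- Fix: a fixed-width Int64 and a bang pattern on the accumulator,
-- so GHC can compile this to a loop over unboxed machine words.
sumToFast :: Int64 -> Int64
sumToFast n = go 0 1
  where go !acc i | i > n     = acc
                  | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumToFast 1000000)
-- prints 500000500000
```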