I am trying to train a custom object classifier in Darknet YOLO v2: https://pjreddie.com/darknet/yolo/
I gathered a dataset of images; most of them are 6000 x 4000 px and s
Resizing images before training is very common. 416x416 is actually slightly larger than what is typical; most ImageNet models, for example, resize and square their inputs to 256x256, so I would expect the same here. Training directly on 6000x4000 images would require a farm of GPUs. The standard process is to pad the image to a square whose side equals its larger dimension (height or width), filling the shorter side with 0's, and then resize it with a standard image tool such as PIL.
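As a rough illustration of the pad-then-resize step, here is a minimal sketch using Pillow (the maintained fork of PIL). The function name `pad_to_square_and_resize`, the centered placement of the original image, and the default target size of 416 are my own assumptions for the example, not something Darknet prescribes.

```python
from PIL import Image


def pad_to_square_and_resize(img, size=416, fill=(0, 0, 0)):
    """Pad an image with zeros to a square, then resize it to size x size.

    Hypothetical helper for illustration only; the name and defaults are
    assumptions, not part of Darknet or Pillow.
    """
    img = img.convert("RGB")
    width, height = img.size
    side = max(width, height)  # square side = the larger dimension

    # Black (all-zero) canvas; the original is centered on it, so the
    # padding is split evenly across the shorter dimension.
    square = Image.new("RGB", (side, side), fill)
    square.paste(img, ((side - width) // 2, (side - height) // 2))

    # Resize with Pillow's default resampling filter.
    return square.resize((size, size))
```

For a single file this could be used as `pad_to_square_and_resize(Image.open("photo.jpg")).save("photo_416.jpg")`, where both file names are placeholders. If the images carry bounding-box labels, the boxes would need the same offset and scale applied, since padding changes where objects sit relative to the image edges.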