I am trying to predict total number of items in videos. And every video have different number of items in it? I want to do it by 2dcnn. But i am confused that what exactly w