My dataset contains videos of varying lengths and frame rates (25 fps, with a maximum length of approx. 10 min). My goal is to classify the videos by recognizing a certain type of activity, i.e. a typical video activity recognition task. This is how I'm planning to approach the problem:
- As a first step, prepare equal-sized clips with a fixed number of frames and feed them through the network
- Are there any existing network architectures that can be used for such a task?
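For the first step, one common way to get a fixed number of frames from videos of different lengths is to sample frame indices evenly across each video. The sketch below assumes the frames have already been decoded into a NumPy array of shape `(T, H, W, C)`. The helpers `sample_frame_indices`, `to_fixed_length` and `dense_clip_indices` are hypothetical names used only for illustration. The sampling works on frame indices only, so it does not correct for differing frame rates.

```python
import numpy as np


def sample_frame_indices(num_frames: int, clip_len: int) -> np.ndarray:
    """Return `clip_len` frame indices spread evenly over a video of
    `num_frames` frames. Videos shorter than `clip_len` repeat frames,
    so every video yields exactly `clip_len` indices."""
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    # Evenly spaced positions from the first to the last frame (inclusive),
    # rounded to the nearest integer frame index.
    positions = np.linspace(0, num_frames - 1, clip_len)
    return np.round(positions).astype(np.int64)


def to_fixed_length(video: np.ndarray, clip_len: int) -> np.ndarray:
    """Subsample a (T, H, W, C) video to shape (clip_len, H, W, C)."""
    idx = sample_frame_indices(video.shape[0], clip_len)
    return video[idx]


def dense_clip_indices(num_frames: int, clip_len: int, num_clips: int) -> np.ndarray:
    """Alternative for long videos: `num_clips` windows of `clip_len`
    consecutive frames, with window starts spread evenly over the video.
    Returns an array of shape (num_clips, clip_len). Indices past the end
    (videos shorter than `clip_len`) are clamped to the last frame."""
    max_start = max(num_frames - clip_len, 0)
    starts = np.round(np.linspace(0, max_start, num_clips)).astype(np.int64)
    idx = starts[:, None] + np.arange(clip_len)[None, :]
    return np.minimum(idx, num_frames - 1)


if __name__ == "__main__":
    # A 10-minute video at 25 fps has 25 * 600 = 15000 frames.
    print(sample_frame_indices(15000, 16))
    # A tiny dummy video of 100 frames, 4x4 pixels, 3 channels.
    dummy = np.zeros((100, 4, 4, 3), dtype=np.uint8)
    print(to_fixed_length(dummy, 16).shape)
    print(dense_clip_indices(100, 16, 3).shape)
```

A note on the design choice: sampling 16 frames uniformly from a 10-minute, 25 fps video keeps only about one frame every 40 seconds, which may be too sparse for short activities. Taking several dense clips per video (the second helper) and aggregating the network's per-clip predictions, for example by averaging, is a common alternative.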
Any suggestions?