Action Bank: A High-Level Representation of Activity in Video
Activity recognition in video is dominated by low- and mid-level features. Although demonstrably capable, these features by nature carry little semantic meaning. We present Action Bank, a conceptually simple yet powerful method for high-level activity recognition on a wide variety of realistic videos "in the wild." The method leverages the fact that a large number of smaller action detectors, when pooled appropriately, can provide semantically rich high-level features that are superior to low-level features in discriminating videos. Our method builds a high-level representation from the output of a large bank of individual, viewpoint-tuned action detectors. This representation is applicable to a wide variety of video understanding problems, and we have shown its capability on activity recognition through exhaustive experimentation on most of the benchmark activity datasets in use by the vision community. Our results show a significant improvement on every major benchmark dataset we have attempted. This encourages further research on the subject, particularly on optimizing for speed and the ability to run in real time. Findings from this research have been submitted to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
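The idea of pooling the outputs of many detectors into one feature vector can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each detector has already produced a response volume indexed as [t][y][x], and it uses max-pooling over a fixed grid of space-time cells as one plausible pooling scheme; the abstract does not specify the pooling. The names `max_pool_cells`, `bank_features`, and the `splits` parameter are hypothetical.

```python
from typing import List, Sequence

# A detector response volume indexed as volume[t][y][x] (assumed layout).
Volume = List[List[List[float]]]


def max_pool_cells(volume: Volume, splits: int = 2) -> List[float]:
    """Max-pool one response volume over a splits^3 grid of space-time cells.

    The fixed grid is an illustrative assumption, not the pooling
    scheme stated in the abstract.
    """
    t_len, y_len, x_len = len(volume), len(volume[0]), len(volume[0][0])
    if splits < 1 or min(t_len, y_len, x_len) < splits:
        raise ValueError("each dimension must have at least `splits` samples")

    def bounds(n: int) -> List[range]:
        # Split range(n) into `splits` contiguous, non-empty chunks.
        return [range(i * n // splits, (i + 1) * n // splits)
                for i in range(splits)]

    pooled = []
    for ts in bounds(t_len):
        for ys in bounds(y_len):
            for xs in bounds(x_len):
                pooled.append(max(volume[t][y][x]
                                  for t in ts for y in ys for x in xs))
    return pooled


def bank_features(responses: Sequence[Volume], splits: int = 2) -> List[float]:
    """Concatenate the pooled responses of every detector in the bank."""
    features: List[float] = []
    for volume in responses:
        features.extend(max_pool_cells(volume, splits))
    return features


# Usage: two hypothetical detectors, each with a 4x4x4 response volume.
vol_a = [[[float(16 * t + 4 * y + x) for x in range(4)]
          for y in range(4)] for t in range(4)]
vol_b = [[[1.0 for _ in range(4)] for _ in range(4)] for _ in range(4)]
feature_vector = bank_features([vol_a, vol_b], splits=2)
# The vector has one entry per cell per detector: 2 detectors * 8 cells.
```

The resulting vector could then be passed to any standard classifier. The grid resolution trades spatial and temporal detail against feature length, which grows linearly with the number of detectors in the bank.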