Re-identification for Online Person Tracking using Discriminative Spatio-temporal Features
The goal of this dissertation is to develop an effective online multi-person, multi-camera tracking system that does not rely on unrealistic assumptions. The key technical elements required are: (i) a transformation of each person detection into a feature space that identifies individuals and copes with changes in position and posture, (ii) an automated procedure for extracting features and forming trajectories without prior information about the individuals, and (iii) learning how the frame-by-frame spatial representation evolves over time, capturing its temporal dependencies. The thesis presents a novel model named Continuous Entity Association, which combines tracking within cameras and tracking across cameras and reformulates them as a single problem of continuous re-identification. Unifying the two tasks yields a clearer and simpler online solution with the advantage that tracking does not require temporally contiguous sequences of video frames. This is accomplished by extracting appearance and facial features and by modeling location constraints across cameras. The approach is validated with a simple and efficient inference algorithm. Next, a discriminative spatio-temporal learning approach for online tracking using LSTM networks is proposed. The idea is to exploit the step-by-step temporal processing of the LSTM to identify detections that belong to the same individual and to recover from past errors in which different individuals were associated with the same trajectory. State-of-the-art tracking results are obtained on two large publicly available datasets, CamNeT and DukeMTMC. The Composite Appearance Network (CAN), a simple and novel metadata-based architecture with jointly attentive spatio-temporal pooling, is then proposed, and its implications for inter-camera tracking are studied.
CAN measures the relative quality of every feature map in a trajectory and down-weights noisy features, reducing the variance of the features that belong to one identity and thus making the trajectory representation more discriminative. Finally, the thesis presents a continuous error metric called "Inference Error", which gives a better estimate of tracking error by treating within-camera and inter-camera errors uniformly. The proposed tracking algorithm is fully automatic and gives reliably correct identities even in multi-camera scenarios with complex indoor and outdoor movement and a varying number of persons. The approach is not limited to the detection features mentioned above, and it is intended to encourage research on modeling other constraints such as speed, social grouping, and travel time.
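The abstract does not specify how CAN's attentive pooling or the Continuous Entity Association inference are computed. As a rough illustration only, the following Python sketch assumes that each frame of a trajectory carries a scalar quality score that is softmax-normalized into pooling weights (so noisy frames are down-weighted), and that a new detection is re-identified by cosine similarity against the pooled trajectory descriptors, with the same rule applied within and across cameras. The function names, the 0.5 similarity threshold, and the `allowed_ids` location filter are hypothetical and are not taken from the thesis.

```python
import math
from typing import Dict, List, Optional, Sequence, Set

Vector = List[float]


def quality_weighted_pool(features: Sequence[Vector], quality_scores: Sequence[float]) -> Vector:
    """Pool per-frame feature vectors of one trajectory into a single descriptor.

    Assumption (hypothetical): each frame has a scalar quality score; a softmax
    turns the scores into weights, so low-quality (noisy) frames contribute less.
    """
    if not features or len(features) != len(quality_scores):
        raise ValueError("expected one quality score per frame and at least one frame")
    peak = max(quality_scores)
    exps = [math.exp(q - peak) for q in quality_scores]  # shift by max for numerical stability
    total = sum(exps)
    dim = len(features[0])
    pooled = [0.0] * dim
    for e, frame in zip(exps, features):
        weight = e / total
        for i in range(dim):
            pooled[i] += weight * frame[i]
    return pooled


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def associate(
    detection: Sequence[float],
    trajectories: Dict[str, Vector],
    min_similarity: float = 0.5,
    allowed_ids: Optional[Set[str]] = None,
) -> Optional[str]:
    """Re-identify a detection against existing trajectory descriptors.

    The same rule is used within and across cameras. `allowed_ids` is a
    hypothetical stand-in for camera-location constraints: only trajectories
    that could plausibly have reached this camera are considered.
    Returns the best-matching trajectory id (similarity strictly above
    `min_similarity`), or None to start a new identity.
    """
    best_id: Optional[str] = None
    best_sim = min_similarity
    for tid, descriptor in trajectories.items():
        if allowed_ids is not None and tid not in allowed_ids:
            continue
        sim = cosine_similarity(detection, descriptor)
        if sim > best_sim:
            best_id, best_sim = tid, sim
    return best_id


if __name__ == "__main__":
    tracks = {
        # the second frame has a very low quality score, so it barely affects the descriptor
        "A": quality_weighted_pool([[1.0, 0.0], [0.0, 1.0]], [5.0, -5.0]),
        "B": [0.0, 1.0],
    }
    print(associate([0.9, 0.1], tracks))                     # A
    print(associate([0.9, 0.1], tracks, allowed_ids={"B"}))  # None
```

Because every association in this sketch is a re-identification against stored descriptors, a detection can be matched to a trajectory last seen many frames earlier or in a different camera, which mirrors the abstract's point that temporally contiguous frames are not required.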