# Person Re-identification: Past, Present and Future

Liang Zheng, Yi Yang, Alexander G. Hauptmann, October 2016

University of Technology Sydney, Carnegie Mellon University

## 1. Introduction

“To re-identify a particular, then, is to identify it as (numerically) the same particular as one encountered on a previous occasion.”

### 1.2 A Brief History of Person Re-ID

Research on person re-ID originated in multi-camera tracking [8]. Since then, several important directions have developed. In this survey, we briefly introduce some milestones in person re-ID (Fig. 2).

Multi-camera tracking. In the early years, before the term person re-ID was formally proposed, the task was tightly coupled with multi-camera tracking, in which appearance models were integrated with the geometry calibration among disjoint cameras. In 1997, Huang and Russell [9] proposed a Bayesian formulation to estimate the posterior probability of an object's appearance in one camera given evidence observed in other cameras. Their appearance model includes multiple spatio-temporal features such as color, vehicle length, height and width, velocity, and time of observation. A comprehensive survey of this topic can be found in [8].

Multi-camera tracking with explicit “re-identification”. To our knowledge, the earliest work on multi-camera tracking that explicitly proposed the term “person re-identification” was published in 2005 by Wojciech Zajdel, Zoran Zivkovic and Ben J. A. Kröse [10] from the University of Amsterdam. In their ICRA'05 paper “Keeping track of humans: Have I seen this person before?”, their aim was to “re-identify a person when it leaves the field of view and re-enters later”. In their method, a unique latent label is assumed for each person, and a dynamic Bayesian network is defined to encode the probabilistic relationships between the labels and the features (color and spatio-temporal cues) extracted from the tracklets. The ID of an incoming person is determined by the posterior label distribution computed with an approximate Bayesian inference algorithm.

The independence of re-ID (image-based). In 2006, Gheissari et al. [11] employed only the visual cues of persons, after a spatio-temporal segmentation algorithm for foreground detection. Visual matching based on color and salient edgel histograms was performed using either an articulated pedestrian model or the Hessian-Affine interest point operator. Experiments were conducted on a dataset of 44 persons captured by 3 cameras with moderate view overlap. Note that although their spatio-temporal segmentation method uses video frames, neither the feature design nor the matching process uses video information, so we classify [11] as an image-based re-ID method. This work marks the separation of person re-ID from multi-camera tracking, and its beginning as an independent computer vision task.

Video-based re-ID. Although person re-ID was originally intended for tracking in videos, most re-ID works focus on image matching instead. In 2010, two works [12, 13] proposed multi-shot re-ID, in which frames are randomly selected. Color is the feature used in both works, while Farenzena et al. [13] additionally employ a segmentation model to detect the foreground. For distance measurement, both works compute the minimum distance between bounding boxes of the two image sets, and Bazzani et al. [12] further use the Bhattacharyya distance for the color and generic epitome features. It is shown that using multiple frames per person effectively improves over the single-frame version, and that re-ID accuracy saturates as the number of selected frames increases.

Deep learning for re-ID. In 2014, the success of deep learning in image classification [14] spread to re-ID: Yi et al. [15] and Li et al. [16] both employ a siamese convolutional neural network [17] to determine whether a pair of input images belongs to the same ID. The siamese model was probably chosen because the number of training samples per identity is limited (usually two). Aside from some variations in parameter settings, the major differences are that [15] adds an additional cost function to the network, while [16] uses a finer body partitioning. The two works use different datasets, so the methods cannot be compared directly. Although performance on small datasets is not yet stable, deep learning methods have become a popular option in re-ID.
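The verification idea can be sketched in a few lines: both inputs pass through the same embedding branch (shared weights), and a contrastive loss pulls same-ID pairs together while pushing different-ID pairs beyond a margin. This is a toy NumPy illustration of the siamese principle, not the architecture of [15] or [16]; the single-layer embedding and all parameter values are invented for the example.

```python
import numpy as np

def embed(x, W):
    """Shared embedding branch: both images pass through the SAME weights W."""
    return np.tanh(W @ x)

def contrastive_loss(x1, x2, same_id, W, margin=1.0):
    """Contrastive loss on the Euclidean distance between the two embeddings."""
    d = np.linalg.norm(embed(x1, W) - embed(x2, W))
    if same_id:
        return d ** 2                      # pull positive pairs together
    return max(0.0, margin - d) ** 2       # push negatives beyond the margin

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 128)) * 0.1   # toy shared weights
a, b = rng.standard_normal(128), rng.standard_normal(128)

pos = contrastive_loss(a, a, True, W)      # identical inputs -> zero loss
neg = contrastive_loss(a, b, False, W)     # non-negative by construction
```

In training, the loss would be minimized over W for many labeled pairs; here only the forward computation is shown.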

End-to-end image-based re-ID. While most works conduct experiments on hand-drawn bounding boxes or boxes produced by a fixed detector, it is necessary to study the impact of pedestrian detectors on re-ID accuracy. In 2014, Xu et al. [18] addressed this topic by combining detection (commonness) and re-ID (uniqueness) scores. On the CAMPUS dataset, it is shown that jointly considering detection and re-ID confidence leads to higher person retrieval accuracy than using either alone.

## 2. Image-Based Person Re-ID

### 2.1 Hand-crafted Systems

#### 2.1.2 Distance Metric Learning

The general idea of global metric learning is to keep vectors of the same class closer while pushing vectors of different classes further apart. The most commonly used formulation is based on the Mahalanobis distance, which generalizes Euclidean distance via linear scalings and rotations of the feature space. For two vectors $x_i$ and $x_j$, the squared distance can be written as

$$d_M^2(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j),$$

where $M$ is a positive semidefinite matrix.
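The squared Mahalanobis distance $(x_i - x_j)^\top M (x_i - x_j)$ can be checked numerically. The sketch below verifies two standard properties: with $M = I$ it reduces to the squared Euclidean distance, and any $M = L^\top L$ (positive semidefinite) is equivalent to a Euclidean distance after the linear map $L$; the vectors and dimensions are arbitrary.

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance d^2(xi, xj) = (xi - xj)^T M (xi - xj)."""
    d = xi - xj
    return float(d @ M @ d)

rng = np.random.default_rng(1)
xi, xj = rng.standard_normal(8), rng.standard_normal(8)

# With M = I the metric reduces to the squared Euclidean distance.
assert np.isclose(mahalanobis_sq(xi, xj, np.eye(8)), np.sum((xi - xj) ** 2))

# Any M = L^T L is positive semidefinite: the metric equals the squared
# Euclidean distance after the linear map L ("scaling and rotation").
L = rng.standard_normal((8, 8))
M = L.T @ L
assert np.isclose(mahalanobis_sq(xi, xj, M), np.sum((L @ xi - L @ xj) ** 2))
```

Metric learning methods differ mainly in how they estimate $M$ (or equivalently $L$) from labeled pairs.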

## 3. Video-Based Person Re-ID

### 3.1 Hand-crafted Systems

The first two attempts in 2010 [12, 13] are both hand-crafted systems. They mainly use color-based descriptors, optionally combined with foreground segmentation to detect the pedestrian. They employ image features similar to those of image-based re-ID methods; the major difference lies in the matching function. As mentioned in Section 1.2, both methods compute the minimum Euclidean distance between the features of two bounding box sets as the set similarity. In essence, such methods should be classified as "multi-shot" person re-ID, in which the similarity between two sets of frames plays a critical role. This multi-match strategy is adopted by later works [97, 98]. In [86], multiple frames of a person are used to train a discriminative boosting model based on a set of covariance features. In [99], SURF local features are detected and described at interest points and then indexed in a KD-tree to speed up matching. In [11], a spatio-temporal graph is generated to identify spatio-temporally stable regions for foreground segmentation; local descriptors computed over the time interval are then clustered to improve matching performance. Cong et al. [100] exploit the manifold geometric structure of video sequences to build a more compact spatial descriptor based on color features. Karaman et al. [101] propose a conditional random field (CRF) to incorporate constraints in the spatio-temporal domain. In [102], color and selected frames are used to build an appearance model that captures the distinctive appearance of a person as well as its variation over time. Karanam et al. [103] use multiple frames of a person and represent the probe feature as a linear combination of the images of the same person in the gallery. Multiple shots of an identity can also be used to improve body part alignment. To find accurate part-to-part correspondences, Cheng et al. [85] propose an iterative algorithm in which the pictorial structure fitting becomes more accurate after each iteration, owing to the improved body part detectors. In [104], pedestrian poses are additionally estimated, and frames with the same pose are matched with higher confidence scores.
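The min-distance set matching used by [12, 13] can be expressed compactly. The sketch below assumes each bounding box is already described by a fixed-length feature vector and uses plain Euclidean distance, ignoring the Bhattacharyya variant of [12]; the feature dimensions are arbitrary.

```python
import numpy as np

def set_distance(A, B):
    """Multi-shot matching: minimum pairwise Euclidean distance between two
    sets of per-frame features. A: (m, d) features of person a; B: (n, d)."""
    diffs = A[:, None, :] - B[None, :, :]          # (m, n, d) pairwise diffs
    return float(np.sqrt((diffs ** 2).sum(-1)).min())

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 16))   # 5 frames of person a
B = rng.standard_normal((7, 16))   # 7 frames of person b
B[3] = A[1]                        # the sets share one identical frame feature
```

Ranking a gallery then amounts to sorting persons by `set_distance` to the probe set; a smaller value means a more likely match.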

## 4. Future: Detection, Tracking and Person Re-ID

### 4.2 Future Issues

#### 4.2.1 System Performance Evaluation

An important parameter in computing AP/MR is the IoU threshold: a detected bounding box is considered correct if its IoU with the ground-truth box exceeds the threshold, which is typically set to 0.5. The KITTI benchmark requires an IoU of 0.7 for car detection but 0.5 for pedestrians. Note that a larger threshold imposes a stricter localization requirement than a smaller one. Fig. 6 presents the relationship between detection accuracy (AP) and re-ID accuracy (rank-1 accuracy or mAP) on the PRW dataset. The two are clearly linearly correlated when the IoU threshold is 0.7, whereas the points are more scattered under IoU = 0.5. This correlation suggests that the larger IoU threshold should be used.
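For concreteness, here is a minimal IoU computation over axis-aligned boxes in (x1, y1, x2, y2) format; the example boxes are invented to show a detection that counts as correct under the common 0.5 threshold but not under KITTI's stricter 0.7.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # horizontal overlap
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # vertical overlap
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

gt = (0, 0, 100, 200)        # ground-truth pedestrian box
det = (30, 0, 130, 200)      # horizontally shifted detection
score = iou(gt, det)         # a hit at IoU >= 0.5 but a miss at IoU >= 0.7
```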

#### 4.2.2 The Influence of Detector/Tracker on Re-ID

Person re-ID grew out of pedestrian tracking [9], in which tracklets from multiple cameras are associated if they are believed to belong to the same identity. This line of research treats re-ID as a component of the tracking system and does not evaluate the impact of localization/tracking accuracy on re-ID accuracy. Moreover, even after re-ID became an independent task, most works have been conducted on hand-drawn bounding boxes, an idealized setting far from practice. It is therefore critical to understand the influence of detection/tracking on re-ID in an end-to-end re-ID system.

## 5. Future: Person Re-ID in Very Large Galleries

Inverted index-based. The inverted index is the de facto data structure used in Bag-of-Words (BoW) based retrieval methods [22, 147, 148]. Built on the quantization results of local descriptors, the inverted index has k entries, where k is the codebook size; each entry is attached to an inverted list in which the local descriptors are indexed. The baseline inverted index structure is shown in Fig. 9. In many works, a posting stores the image ID and the term frequency (TF) of the indexed descriptor; various other kinds of metadata can also be stored, such as binary signatures [148], feature coordinates [149], etc. For the fundamentals and recent advances of the inverted index in instance retrieval, we refer readers to a recent survey [19].
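A toy version of this structure might look as follows; the codebook, nearest-codeword quantizer, and TF-accumulation scoring are deliberately minimal stand-ins for the BoW pipelines of [22, 147, 148], which add the metadata and weighting machinery discussed above.

```python
import numpy as np
from collections import defaultdict

def quantize(desc, codebook):
    """Assign each local descriptor to its nearest codeword (visual word)."""
    d = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)

def build_index(images, codebook):
    """k-entry inverted index; each posting stores (image_id, term frequency)."""
    index = defaultdict(list)
    for img_id, desc in images.items():
        words, counts = np.unique(quantize(desc, codebook), return_counts=True)
        for w, tf in zip(words, counts):
            index[int(w)].append((img_id, int(tf)))
    return index

def search(query_desc, index, codebook):
    """Score gallery images by accumulated TF over the query's visual words."""
    scores = defaultdict(int)
    for w in quantize(query_desc, codebook):
        for img_id, tf in index[int(w)]:
            scores[img_id] += tf
    return sorted(scores, key=scores.get, reverse=True)

codebook = np.eye(4)               # 4 codewords in a toy 4-d descriptor space
images = {
    "a": np.array([[0.9, 0.1, 0, 0], [0.8, 0.2, 0, 0], [0.1, 0.9, 0, 0]]),
    "b": np.array([[0, 0, 0.9, 0.1], [0, 0, 0.1, 0.9]]),
}
index = build_index(images, codebook)
```

Only entries sharing a visual word with the query are touched, which is what makes the structure scale to large galleries.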

Hashing-based. Hashing has been intensively studied for approximate nearest neighbor search; its goal is to reduce the cost of finding exact nearest neighbors when the gallery is large or distance computation is expensive [23]. Since the milestone work of Spectral Hashing [150], learning to hash has become popular in the community. It learns a hash function $y = h(x)$ that maps a vector $x$ to a compact code $y$, aiming to rank the true nearest neighbors at high positions in the rank list while keeping the search process efficient. Classic hashing methods include product quantization (PQ) [117] and iterative quantization (ITQ) [151]. These methods are efficient to train and achieve decent retrieval accuracy. Since they do not require annotated data, they are well suited to the re-ID task.
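To illustrate the mapping $y = h(x)$ and Hamming-distance ranking, here is a sign-of-random-projection hash, a data-independent LSH baseline that is much simpler than Spectral Hashing, PQ, or ITQ; the bit length and dimensions are arbitrary.

```python
import numpy as np

def fit_hash(dim, n_bits, seed=0):
    """Draw random hyperplanes; h(x) = sign(R x) yields an n_bits binary code."""
    return np.random.default_rng(seed).standard_normal((n_bits, dim))

def hash_code(X, R):
    """Map (n, dim) vectors to (n, n_bits) binary codes."""
    return (X @ R.T > 0).astype(np.uint8)

def hamming_rank(query_code, gallery_codes):
    """Rank gallery entries by Hamming distance to the query code."""
    dists = (gallery_codes != query_code).sum(1)
    return np.argsort(dists, kind="stable")

R = fit_hash(dim=16, n_bits=64)
gallery = np.random.default_rng(1).standard_normal((10, 16))
codes = hash_code(gallery, R)
query = hash_code(gallery[2:3], R)[0]   # query is gallery item 2 itself
order = hamming_rank(query, codes)
```

Comparing 64-bit codes with XOR/popcount is far cheaper than exact Euclidean distances on the original vectors, which is the point of hashing when the gallery is very large.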

## 6. Other Important Yet Under-Developed Open Issues

### 6.2 Re-ranking Re-ID Results

The re-ID process (Fig. 5(b)) can be viewed as a retrieval task, in which re-ranking is an important step to improve retrieval accuracy. Re-ranking refers to re-ordering the initial ranking result when re-ranking knowledge is available. For a survey of re-ranking methods, we refer readers to [164].

## References

[1] A. Plantinga, “Things and persons,” The Review of Metaphysics, pp. 493–519, 1961.
[2] A. O. Rorty, “The transformations of persons,” Philosophy, vol. 48, no. 185, pp. 261–275, 1973.
[3] N. B. Cocchiarella, “Sortals, natural kinds and re-identification,” Logique et analyse, vol. 80, pp. 439–474, 1977.
[4] T. D’Orazio and G. Cicirelli, “People re-identification and tracking from multiple cameras: a review,” in 2012 19th IEEE International Conference on Image Processing. IEEE, 2012, pp. 1601–1604.
[5] A. Bedagkar-Gala and S. K. Shah, “A survey of approaches and trends in person re-identification,” Image and Vision Computing, vol. 32, no. 4, pp. 270–286, 2014.
[6] S. Gong, M. Cristani, S. Yan, and C. C. Loy, Person re-identification. Springer, 2014, vol. 1.
[7] R. Satta, “Appearance descriptors for person re-identification: a comprehensive review,” arXiv preprint arXiv:1307.5748, 2013.
[8] X. Wang, “Intelligent multi-camera video surveillance: A review,” Pattern recognition letters, vol. 34, no. 1, pp. 3–19, 2013.
[9] T. Huang and S. Russell, “Object identification in a bayesian context,” in IJCAI, vol. 97, 1997, pp. 1276–1282.
[10] W. Zajdel, Z. Zivkovic, and B. Krose, “Keeping track of humans: Have i seen this person before?” in Proceedings of the 2005 IEEE International Conference on Robotics and Automation. IEEE, 2005, pp. 2081–2086.
[11] N. Gheissari, T. B. Sebastian, and R. Hartley, “Person reidentification using spatiotemporal appearance,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 2. IEEE, 2006, pp. 1528–1535.
[12] L. Bazzani, M. Cristani, A. Perina, M. Farenzena, and V. Murino, “Multiple-shot person re-identification by hpe signature,” in Pattern Recognition (ICPR), 2010 20th International Conference on. IEEE, 2010, pp. 1413–1416.
[13] M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani, “Person re-identification by symmetry-driven accumulation of local features,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 2360–2367.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[15] D. Yi, Z. Lei, S. Liao, S. Z. Li et al., “Deep metric learning for person re-identification.” in ICPR, vol. 2014, 2014, pp. 34–39.
[16] W. Li, R. Zhao, T. Xiao, and X. Wang, “Deepreid: Deep filter pairing neural network for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 152–159.
[17] J. Bromley, J. W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, E. Säckinger, and R. Shah, “Signature verification using a siamese time delay neural network,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 7, no. 04, pp. 669–688, 1993.
[18] Y. Xu, B. Ma, R. Huang, and L. Lin, “Person search in a scene by jointly modeling people commonness and person uniqueness,” in Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014, pp. 937–940.
[19] L. Zheng, Y. Yang, and Q. Tian, “Sift meets cnn: A decade survey of instance retrieval,” arXiv preprint arXiv:1608.01807, 2016.
[20] S. Liao, Y. Hu, X. Zhu, and S. Z. Li, “Person re-identification by local maximal occurrence representation and metric learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2197–2206.
[21] L. Zheng, Z. Bie, Y. Sun, J. Wang, C. Su, S. Wang, and Q. Tian, “Mars: A video benchmark for large-scale person re-identification,” in European Conference on Computer Vision, 2016.
[22] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Object retrieval with large vocabularies and fast spatial matching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[23] J. Wang, T. Zhang, J. Song, N. Sebe, and H. T. Shen, “A survey on learning to hash,” arXiv:1606.00185, 2016.
[24] D. Gray and H. Tao, “Viewpoint invariant pedestrian recognition with an ensemble of localized features,” in European conference on computer vision. Springer, 2008, pp. 262–275.
[25] B. Prosser, W.-S. Zheng, S. Gong, T. Xiang, and Q. Mary, “Person re-identification by support vector ranking.” in BMVC, vol. 2, no. 5, 2010, p. 6.
[26] W.-S. Zheng, S. Gong, and T. Xiang, “Reidentification by relative distance comparison,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 3, pp. 653–668, 2013.
[27] A. J. Ma, P. C. Yuen, and J. Li, “Domain transfer support vector ranking for person re-identification without target camera label information,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 3567–3574.
[28] A. Mignon and F. Jurie, “Pcca: A new approach for distance learning from sparse pairwise constraints,” in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 2666–2672.
[29] W.-S. Zheng, X. Li, T. Xiang, S. Liao, J. Lai, and S. Gong, “Partial person re-identification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4678–4686.
[30] R. Zhao, W. Ouyang, and X. Wang, “Unsupervised salience learning for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3586–3593.
[31] Z. Li, S. Chang, F. Liang, T. S. Huang, L. Cao, and J. R. Smith, “Learning locally-adaptive decision functions for person verification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3610–3617.
[32] D. Chen, Z. Yuan, B. Chen, and N. Zheng, “Similarity learning with spatial constraints for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1268–1277.
[33] R. Zhao, W. Ouyang, and X. Wang, “Person re-identification by salience matching,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2528–2535.
[34] ——, “Learning mid-level filters for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 144–151.
[35] Y. Shen, W. Lin, J. Yan, M. Xu, J. Wu, and J. Wang, “Person re-identification with correspondence structure learning,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3200–3208.
[36] A. Das, A. Chakraborty, and A. K. Roy-Chowdhury, “Consistent re-identification in a camera network,” in European Conference on Computer Vision, 2014, pp. 330–345.
[37] X. Zhou, N. Cui, Z. Li, F. Liang, and T. S. Huang, “Hierarchical gaussianization for image classification,” in 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009, pp. 1971–1977.
[38] D. Chen, Z. Yuan, G. Hua, N. Zheng, and J. Wang, “Similarity learning on an explicit polynomial kernel feature map for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1565–1573.
[39] S. Pedagadi, J. Orwell, S. Velastin, and B. Boghossian, “Local fisher discriminant analysis for pedestrian re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3318–3325.
[40] X. Liu, M. Song, D. Tao, X. Zhou, C. Chen, and J. Bu, “Semi-supervised coupled dictionary learning for person reidentification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3550–3557.
[41] Y. Yang, J. Yang, J. Yan, S. Liao, D. Yi, and S. Z. Li, “Salient color names for person re-identification,” in European Conference on Computer Vision. Springer, 2014, pp. 536–551.
[42] L. Zhang, T. Xiang, and S. Gong, “Learning a discriminative null space for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[43] Y. Zhang, B. Li, H. Lu, A. Irie, and X. Ruan, “Sample-specific svm learning for person re-identification.”
[44] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1116–1124.
[45] J. Van De Weijer, C. Schmid, J. Verbeek, and D. Larlus, “Learning color names for real-world applications,” IEEE Transactions on Image Processing, vol. 18, no. 7, pp. 1512–1523, 2009.
[46] T. Matsukawa, T. Okabe, E. Suzuki, and Y. Sato, “Hierarchical gaussian descriptor for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1363–1372.
[47] R. Layne, T. M. Hospedales, S. Gong, and Q. Mary, “Person reidentification by attributes.” in BMVC, vol. 2, no. 3, 2012, p. 8.
[48] X. Liu, M. Song, Q. Zhao, D. Tao, C. Chen, and J. Bu, “Attribute-restricted latent topic model for person re-identification,” Pattern recognition, vol. 45, no. 12, pp. 4204–4213, 2012.
[49] C. Liu, S. Gong, C. C. Loy, and X. Lin, “Person re-identification: What features are important?” in European Conference on Computer Vision Workshops. Springer, 2012, pp. 391–401.
[50] C. Su, F. Yang, S. Zhang, Q. Tian, L. S. Davis, and W. Gao, “Multitask learning with low rank attribute embedding for person reidentification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3739–3747.
[51] Z. Shi, T. M. Hospedales, and T. Xiang, “Transferring a semantic representation for person re-identification and search,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4184–4193.
[52] D. Li, Z. Zhang, X. Chen, H. Ling, and K. Huang, “A richly annotated dataset for pedestrian attribute recognition,” arXiv preprint arXiv:1603.07054, 2016.
[53] L. Yang and R. Jin, “Distance metric learning: A comprehensive survey,” Michigan State Universiy, vol. 2, p. 78, 2006.
[54] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell, “Distance metric learning with application to clustering with side-information,” Advances in neural information processing systems, vol. 15, pp. 505– 512, 2003.
[55] M. Köstinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof, “Large scale metric learning from equivalence constraints,” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2288–2295.
[56] K. Q. Weinberger, J. Blitzer, and L. K. Saul, “Distance metric learning for large margin nearest neighbor classification,” in Advances in neural information processing systems, 2005, pp. 1473– 1480.
[57] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, “Informationtheoretic metric learning,” in Proceedings of the 24th international conference on Machine learning. ACM, 2007, pp. 209–216.
[58] M. Hirzer, P. M. Roth, M. Köstinger, and H. Bischof, “Relaxed pairwise learned metric for person re-identification,” in European Conference on Computer Vision. Springer, 2012, pp. 780–793.
[59] S. Liao and S. Z. Li, “Efficient psd constrained asymmetric metric learning for person re-identification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3685–3693.
[60] Y. Yang, S. Liao, Z. Lei, and S. Z. Li, “Large scale similarity learning using similar pairs for person verification,” in Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[61] B. Schölkopf and K.-R. Müller, “Fisher discriminant analysis with kernels,” Neural networks for signal processing IX, vol. 1, no. 1, p. 1, 1999.
[62] F. Xiong, M. Gou, O. Camps, and M. Sznaier, “Person reidentification using kernel-based metric learning methods,” in European Conference on Computer Vision. Springer, 2014, pp. 1–16.
[63] X. Liu, H. Wang, Y. Wu, J. Yang, and M.-H. Yang, “An ensemble color model for human re-identification,” in IEEE Winter Conference on Applications of Computer Vision, 2015, pp. 868–875.
[64] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[65] F. Radenović, G. Tolias, and O. Chum, “Cnn image retrieval learns from bow: Unsupervised fine-tuning with hard examples,” arXiv:1604.02426, 2016.
[66] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
[67] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556, 2014.
[68] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[69] E. Ahmed, M. Jones, and T. K. Marks, “An improved deep learning architecture for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3908–3916.
[70] L. Wu, C. Shen, and A. v. d. Hengel, “Personnet: Person reidentification with deep convolutional neural networks,” arXiv preprint arXiv:1601.07255, 2016.
[71] R. R. Varior, B. Shuai, J. Lu, D. Xu, and G. Wang, “A siamese long short-term memory architecture for human re-identification,” in European Conference on Computer Vision, 2016.
[72] R. R. Varior, M. Haloi, and G. Wang, “Gated siamese convolutional neural network architecture for human re-identification,” in European Conference on Computer Vision, 2016.
[73] H. Liu, J. Feng, M. Qi, J. Jiang, and S. Yan, “End-to-end comparative attention networks for person re-identification,” arXiv preprint arXiv:1606.04404, 2016.
[74] D. Cheng, Y. Gong, S. Zhou, J. Wang, and N. Zheng, “Person re-identification by multi-channel parts-based cnn with improved triplet loss function,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1335–1344.
[75] C. Su, S. Zhang, J. Xing, W. Gao, and Q. Tian, “Deep attributes driven multi-camera person re-identification,” in European Conference on Computer Vision, 2016.
[76] T. Xiao, S. Li, B. Wang, L. Lin, and X. Wang, “Learning deep feature representations with domain guided dropout for person reidentification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[77] L. Zheng, H. Zhang, S. Sun, M. Chandraker, and Q. Tian, “Person re-identification in the wild,” arXiv preprint arXiv:1604.02531, 2016.
[78] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[79] L. Wu, C. Shen, and A. v. d. Hengel, “Deep linear discriminant analysis on fisher networks: A hybrid architecture for person re-identification,” arXiv preprint arXiv:1606.01595, 2016.
[80] F. Perronnin, J. Sánchez, and T. Mensink, “Improving the fisher kernel for large-scale image classification,” in European Conference on Computer Vision, 2010, pp. 143–156.
[81] S. Wu, Y.-C. Chen, X. Li, A.-C. Wu, J.-J. You, and W.-S. Zheng, “An enhanced deep feature representation for person re-identification,” in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2016, pp. 1–8.
[82] D. Gray, S. Brennan, and H. Tao, “Evaluating appearance models for recognition, reacquisition, and tracking,” in Proc. IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS), vol. 3, no. 5. Citeseer, 2007.
[83] W.-S. Zheng, S. Gong, and T. Xiang, “Associating groups of people,” in Proceedings of the British Machine Vision Conference, 2009, pp. 23.1–23.11.
[84] C. C. Loy, T. Xiang, and S. Gong, “Multi-camera activity correlation analysis,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1988–1995.
[85] D. S. Cheng, M. Cristani, M. Stoppa, L. Bazzani, and V. Murino, “Custom pictorial structures for re-identification,” in British Machine Vision Conference, 2011.
[86] M. Hirzer, C. Beleznai, P. M. Roth, and H. Bischof, “Person reidentification by descriptive and discriminative classification,” in Scandinavian conference on Image analysis, 2011, pp. 91–102.
[87] N. Martinel and C. Micheloni, “Re-identify people in wide area camera network,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 31–36.
[88] W. Li, R. Zhao, and X. Wang, “Human reidentification with transferred metric learning,” in Asian Conference on Computer Vision, 2012, pp. 31–44.
[89] W. Li and X. Wang, “Locally aligned feature transforms across views,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3594–3601.
[90] P. M. Roth, M. Hirzer, M. Koestinger, C. Beleznai, and H. Bischof, “Mahalanobis distance learning for person re-identification,” in Person Re-Identification, ser. Advances in Computer Vision and Pattern Recognition, S. Gong, M. Cristani, S. Yan, and C. C. Loy, Eds. London, United Kingdom: Springer, 2014, pp. 247–267.
[91] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 9, pp. 1627–1645, 2010.
[92] P. Dollár, R. Appel, S. Belongie, and P. Perona, “Fast feature pyramids for object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 8, pp. 1532–1545, 2014.
[93] W. Huang, R. Hu, C. Liang, Y. Yu, Z. Wang, X. Zhong, and C. Zhang, “Camera network based person re-identification by leveraging spatial-temporal constraint and multiple cameras relations,” in International Conference on Multimedia Modeling. Springer, 2016, pp. 174–186.
[94] J. Garcia, N. Martinel, C. Micheloni, and A. Gardel, “Person re-identification ranking optimisation by discriminant context information analysis,” in ICCV, 2015.
[95] S. Paisitkriangkrai, C. Shen, and A. van den Hengel, “Learning to rank in person re-identification with metric ensembles,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1846–1855.
[96] H. Wang, S. Gong, X. Zhu, and T. Xiang, “Human-in-the-loop person re-identification,” in European Conference on Computer Vision, 2016.
[97] B. Ma, Y. Su, and F. Jurie, “Bicov: a novel image representation for person re-identification and face verification,” in British Machive Vision Conference, 2012, p. 11.
[98] ——, “Local descriptors encoded by fisher vectors for person re-identification,” in European Conference on Computer Vision. Springer, 2012, pp. 413–422.
[99] O. Hamdoun, F. Moutarde, B. Stanciulescu, and B. Steux, “Person re-identification in multi-camera system by signature based on interest point descriptors collected on short video sequences,” in ACM/IEEE International Conference on Distributed Smart Cameras, 2008, pp. 1–6.
[100] D. N. T. Cong, C. Achard, L. Khoudour, and L. Douadi, “Video sequences association for people re-identification across multiple non-overlapping cameras,” in International Conference on Image Analysis and Processing. Springer, 2009, pp. 179–189.
[101] S. Karaman and A. D. Bagdanov, “Identity inference: generalizing person re-identification scenarios,” in European Conference on Computer Vision. Springer, 2012, pp. 443–452.
[102] A. Bedagkar-Gala and S. K. Shah, “Part-based spatio-temporal model for multi-person re-identification,” Pattern Recognition Letters, vol. 33, no. 14, pp. 1908–1915, 2012.
[103] S. Karanam, Y. Li, and R. J. Radke, “Sparse re-id: Block sparsity for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 33–40.
[104] Y.-J. Cho and K.-J. Yoon, “Improving person re-identification via pose-aware multi-shot matching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1354–1362.
[105] T. Wang, S. Gong, X. Zhu, and S. Wang, “Person re-identification by video ranking,” in European Conference on Computer Vision, 2014, pp. 688–703.
[106] A. Klaser, M. Marszałek, and C. Schmid, “A spatio-temporal descriptor based on 3d-gradients,” in BMVC 2008-19th British Machine Vision Conference. British Machine Vision Association, 2008, pp. 275–1.
[107] J. Man and B. Bhanu, “Individual recognition using gait energy image,” IEEE transactions on pattern analysis and machine intelligence, vol. 28, no. 2, pp. 316–322, 2006.
[108] K. Liu, B. Ma, W. Zhang, and R. Huang, “A spatiotemporal appearance representation for video-based pedestrian re-identification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3810–3818.
[109] C. Gao, J. Wang, L. Liu, J.-G. Yu, and N. Sang, “Temporally aligned pooling representation for video-based person re-identification,” in IEEE International Conference on Image Processing (ICIP), 2016, pp. 4284–4288.
[110] Z. Liu, J. Chen, and Y. Wang, “A fast adaptive spatio-temporal 3d feature for video-based person re-identification,” in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016, pp. 4294–4298.
[111] W.-S. Zheng, S. Gong, and T. Xiang, “Transfer re-identification: From person to set-based verification,” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2650–2657.
[112] X. Zhu, X.-Y. Jing, F. Wu, and H. Feng, “Video-based person re-identification by simultaneously learning intra-video and intervideo distance metrics,” in IJCAI, 2016.
[113] J. You, A. Wu, X. Li, and W.-S. Zheng, “Top-push video-based person re-identification,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[114] N. McLaughlin, J. Martinez del Rincon, and P. Miller, “Recurrent convolutional network for video-based person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[115] Y. Yan, B. Ni, Z. Song, C. Ma, Y. Yan, and X. Yang, “Person re-identification via recurrent feature aggregation,” in European Conference on Computer Vision, 2016.
[116] Z. Xu, Y. Yang, and A. G. Hauptmann, “A discriminative cnn video representation for event detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1798–1807.
[117] H. Jégou, M. Douze, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3304–3311.
[118] B. Fernando, E. Gavves, J. Oramas, A. Ghodrati, and T. Tuytelaars, “Rank pooling for action recognition,” IEEE transactions on pattern analysis and machine intelligence, 2016.
[119] P. Wang, Y. Cao, C. Shen, L. Liu, and H. T. Shen, “Temporal pyramid pooling based convolutional neural networks for action recognition,” arXiv preprint arXiv:1503.01224, 2015.
[120] L. Wu, C. Shen, and A. van den Hengel, “Deep recurrent convolutional networks for video-based person re-identification: An end-to-end approach,” arXiv:1606.01595, 2016.
[121] Z. Wu, X. Wang, Y.-G. Jiang, H. Ye, and X. Xue, “Modeling spatial-temporal clues in a hybrid deep learning framework for video classification,” in Proceedings of the 23rd ACM international conference on Multimedia. ACM, 2015, pp. 461–470.
[122] A. Ess, B. Leibe, and L. Van Gool, “Depth and appearance for mobile scene analysis,” in IEEE International Conference on Computer Vision. IEEE, 2007, pp. 1–8.
[123] D. Baltieri, R. Vezzani, and R. Cucchiara, “3dpes: 3d people dataset for surveillance and forensics,” in Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding. ACM, 2011, pp. 59–64.
[124] G. Lisanti, I. Masi, A. D. Bagdanov, and A. Del Bimbo, “Person re-identification by iterative re-weighted sparse ranking,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 8, pp. 1629–1642, 2015.
[125] A. Dehghan, S. Modiri Assari, and M. Shah, “Gmmcp tracker: Globally optimal generalized maximum multi clique problem for multiple object tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4091–4099.
[126] N. Martinel, A. Das, C. Micheloni, and A. K. Roy-Chowdhury, “Re-identification in the function space of feature warps,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 8, pp. 1656–1669, 2015.
[127] Y. Zhang, B. Li, H. Lu, A. Irie, and X. Ruan, “Sample-specific svm learning for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[128] T. Xiao, S. Li, B. Wang, L. Lin, and X. Wang, “End-to-end deep learning for person search,” arXiv preprint arXiv:1604.01850, 2016.
[129] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards realtime object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[130] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587.
[131] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 886–893.
[132] W. Nam, P. Dollár, and J. H. Han, “Local decorrelation for improved pedestrian detection,” in Advances in Neural Information Processing Systems, 2014, pp. 424–432.
[133] J. Berclaz, F. Fleuret, E. Turetken, and P. Fua, “Multiple object tracking using k-shortest paths optimization,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 9, pp. 1806–1819, 2011.
[134] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” International journal of computer vision, vol. 88, no. 2, pp. 303–338, 2010.
[135] P. Dollar, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: An evaluation of the state of the art,” IEEE transactions on pattern analysis and machine intelligence, vol. 34, no. 4, pp. 743–761, 2012.
[136] S. Zhang, R. Benenson, M. Omran, J. Hosang, and B. Schiele, “How far are we from solving pedestrian detection?” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[137] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3354–3361.
[138] J. Hosang, R. Benenson, P. Dollár, and B. Schiele, “What makes for effective detection proposals?” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 4, pp. 814–830, 2016.
[139] L. Leal-Taixé, A. Milan, I. Reid, S. Roth, and K. Schindler, “Motchallenge 2015: Towards a benchmark for multi-target tracking,” arXiv preprint arXiv:1504.01942, 2015.
[140] K. Bernardin and R. Stiefelhagen, “Evaluating multiple object tracking performance: the clear mot metrics,” EURASIP Journal on Image and Video Processing, vol. 2008, no. 1, pp. 1–10, 2008.
[141] R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[142] S.-I. Yu, Y. Yang, and A. Hauptmann, “Harry potter’s marauder’s map: Localizing and tracking multiple persons-of-interest by nonnegative discretization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3714–3720.
[143] L. Deng, M. L. Seltzer, D. Yu, A. Acero, A.-r. Mohamed, and G. E. Hinton, “Binary coding of speech spectrograms using a deep auto-encoder.” in Interspeech. Citeseer, 2010, pp. 1692–1695.
[144] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[145] J. Wang and S. Li, “Query-driven iterated neighborhood graph search for large scale indexing,” in Proceedings of the 20th ACM international conference on Multimedia. ACM, 2012, pp. 179–188.
[146] A. Chakraborty, A. Das, and A. Roy-Chowdhury, “Network consistent data association,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.
[147] D. Nistér and H. Stewénius, “Scalable recognition with a vocabulary tree,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 2161–2168.
[148] H. Jégou, M. Douze, and C. Schmid, “Hamming embedding and weak geometric consistency for large scale image search,” in European Conference on Computer Vision, 2008, pp. 304–317.
[149] Y. Zhang, Z. Jia, and T. Chen, “Image retrieval with geometry-preserving visual phrases,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 809–816.
[150] Y. Weiss, A. Torralba, and R. Fergus, “Spectral hashing,” in Advances in Neural Information Processing Systems, 2009, pp. 1753–1760.
[151] Y. Gong and S. Lazebnik, “Iterative quantization: A procrustean approach to learning binary codes,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 817–824.
[152] X. Liu, X. Fan, C. Deng, Z. Li, H. Su, and D. Tao, “Multilinear hyperplane hashing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5119–5127.
[153] Z. Zhang, Y. Chen, and V. Saligrama, “Efficient training of very deep neural networks for supervised hashing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1487–1495.
[154] F. Zhao, Y. Huang, L. Wang, and T. Tan, “Deep semantic ranking based hashing for multi-label image retrieval,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1556–1564.
[155] V. Erin Liong, J. Lu, G. Wang, P. Moulin, and J. Zhou, “Deep hashing for compact binary codes learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2475–2483.
[156] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” 2009.
[157] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, “NUS-WIDE: A real-world web image database from National University of Singapore,” in Proceedings of the ACM International Conference on Image and Video Retrieval, 2009, p. 48.
[158] R. Zhang, L. Lin, R. Zhang, W. Zuo, and L. Zhang, “Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification,” IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 4766–4779, 2015.
[159] L. Xie, J. Wang, Z. Wei, M. Wang, and Q. Tian, “DisturbLabel: Regularizing CNN on the loss layer,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[160] H. Wang, S. Gong, and T. Xiang, “Unsupervised learning of generative topic saliency for person re-identification,” in Proceedings of the British Machine Vision Conference, 2014.
[161] X. Wang, W.-S. Zheng, X. Li, and J. Zhang, “Cross-scenario transfer person re-identification,” 2015.
[162] A. J. Ma, J. Li, P. C. Yuen, and P. Li, “Cross-domain person re-identification using domain adaptation ranking SVMs,” IEEE Transactions on Image Processing, vol. 24, no. 5, pp. 1599–1613, 2015.
[163] P. Peng, T. Xiang, Y. Wang, M. Pontil, S. Gong, T. Huang, and Y. Tian, “Unsupervised cross-dataset transfer learning for person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[164] T. Mei, Y. Rui, S. Li, and Q. Tian, “Multimedia search reranking: A literature survey,” ACM Computing Surveys, vol. 46, no. 3, p. 38, 2014.
[165] C. Liu, C. Change Loy, S. Gong, and G. Wang, “POP: Person re-identification post-rank optimisation,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 441–448.
[166] N. Martinel, A. Das, C. Micheloni, and A. K. Roy-Chowdhury, “Temporal model adaptation for person re-identification,” in European Conference on Computer Vision, 2016.
[167] L. Zheng, S. Wang, L. Tian, F. He, Z. Liu, and Q. Tian, “Query-adaptive late fusion for image search and person re-identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1741–1750.
[168] Q. Leng, R. Hu, C. Liang, Y. Wang, and J. Chen, “Person re-identification with content and context re-ranking,” Multimedia Tools and Applications, vol. 74, no. 17, pp. 6989–7014, 2015.
[169] D. Qin, S. Gammeter, L. Bossard, T. Quack, and L. Van Gool, “Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 777–784.
[170] X. Shen, Z. Lin, J. Brandt, S. Avidan, and Y. Wu, “Object retrieval and localization with spatially-constrained similarity measure and k-nn re-ranking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3013–3020.
[171] R. Arandjelović and A. Zisserman, “Three things everyone should know to improve object retrieval,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2911–2918.
[172] W.-S. Zheng, S. Gong, and T. Xiang, “Towards open-world person re-identification by one-shot group-based verification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 3, pp. 591–606, 2016.
[173] S. Liao, Z. Mo, J. Zhu, Y. Hu, and S. Z. Li, “Open-set person re-identification,” arXiv preprint arXiv:1408.0872, 2014.
[174] B. DeCann and A. Ross, “Modelling errors in a biometric re-identification system,” IET Biometrics, vol. 4, no. 4, pp. 209–219, 2015.