Notes on a paper that brings pedestrian attribute recognition into person Re-ID. An upcoming project of mine needs attribute recognition, so I read this to learn. Finished in 4 hours; happy with the pace.
Improving Person Re-identification by Attribute and Identity Learning
Yutian Lin, Liang Zheng, Zhedong Zheng, Yu Wu, Yi Yang
University of Technology Sydney, 2017
1. Introduction
The goal of this paper is to use attribute labels as complementary cues to improve person Re-ID performance.
The main motivation: person Re-ID relies on a global descriptor, while attribute recognition typically points to local structures of a person. The conjecture is that correct attribute predictions can sharpen the discriminative power of Re-ID. In the fourth row of Figure 1, Re-ID fails to tell apart people in similar clothing, while attributes could distinguish them by gender, hat, and bag.
This paper differs from previous work on Re-ID with attributes in two ways. Most methods use attributes to strengthen image-pair or triplet relations$^{[33, 34, 16, 21]}$. This paper instead focuses on ID-level rather than instance-level attributes. ID-level attributes belong to the person, such as gender and age; instance-level attributes describe short-term appearance or the external environment, such as making a phone call or riding a bicycle.
The authors are the first to integrate attributes into a CNN classification model for Re-ID. They propose the attribute-person recognition (APR) network, which combines the two tasks at the loss level; both baselines are CNN classification architectures. APR combines the person Re-ID loss with the attribute prediction losses (Figure 2). Experiments show state-of-the-art Re-ID accuracy, together with improved attribute recognition.
2. Related Work
CNN-based person Re-ID. CNN-based methods dominate the Re-ID community and fall into two classes: deep metric learning and deep representation learning. The former typically feeds image pairs or triplets into the network; representative works include [44, 23]. Spatial constraints are commonly integrated into the similarity learning process$^{[1, 23, 44, 5]}$. For example, [38] inserts a gating function after each convolutional layer to capture subtle differences between the two input images, and [5] proposes a multi-task method that enforces both a ranking loss and a verification loss on a triplet input. Overall, deep metric learning has an edge when training on small datasets, but its efficiency becomes a problem on large galleries.
The second class has become popular for its excellent accuracy and decent efficiency, including [41, 49, 42, 9, 53]. [41] learns generalizable features by training a classification model over multiple domains. In [53, 9], the combination of verification and classification losses is shown to be effective.
Attributes for person Re-ID. The Re-ID field already has several explorations of attributes, mainly as auxiliary information for Re-ID. [21, 20, 19] train attribute detectors with low-level descriptors and SVMs and integrate the attributes into several metric learning methods. [33] trains a discriminative model via multi-task learning, exploiting features and attributes shared across cameras. [16] proposes to jointly optimize a triplet loss for Re-ID and an attribute classification loss, but shows no improvement in attribute recognition.
Attributes for face applications. Attributes for face applications have been studied for a long time. Early on, [29] used Haar features with an SVM to predict gender, and [18] compared various classifiers for age estimation. Recently many deep learning methods have been proposed: [48] treats face attribute recognition as an auxiliary task to improve CNN-based face alignment, and [27] cascades two CNNs, fine-tuned jointly, to predict face attributes from attribute labels.
3. Attribute Annotation
The authors manually re-annotated Market-1501 and DukeMTMC-reID for two reasons. First, RAP, currently the largest pedestrian attribute dataset, contains no ID annotations. Second, PETA is assembled from relatively small Re-ID datasets, so each ID has very few samples, which is unfavorable for deep learning.
Although Market-1501 and DukeMTMC-reID were both collected on university campuses and the IDs are mostly students, they were captured in different seasons (summer vs. winter), so clothing differs a lot; for instance, dresses and shorts dominate Market-1501. The two datasets therefore get two different attribute sets. The attributes were chosen carefully with the dataset characteristics in mind, to avoid severely skewed attribute distributions (e.g., wearing a hat or not).
For Market-1501, 27 attributes were annotated: gender, hair length, sleeve length, length of lower-body clothing, type of lower-body clothing (pants or dress), wearing a hat, carrying a bag, carrying a backpack, carrying a handbag, 8 upper-body colors (black, white, red, purple, yellow, gray, blue, green), 9 lower-body colors (black, white, pink, purple, yellow, gray, blue, green, brown), and age (child, teenager, adult, old). All color attributes are binary. Figure 3 shows some representative positive and negative samples from Market-1501.
For DukeMTMC-reID, 23 attributes were annotated: fewer upper- and lower-body colors, no age attribute, plus shoe color. Figure 4 shows correlations among some representative attributes, and Figure 5 shows the attribute distributions.
All attributes are annotated at the identity level. In Figure 3, for example, the first two images in the second row belong to the same identity, so even though the backpack is not visible in the second image, its label is still "backpack".
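The ID-level rule can be made concrete with a tiny sketch: every image inherits the label vector of its identity, even when the attribute is occluded in that image. The filenames, keys, and values below are my own illustration (Market-1501-style names), not the released annotation format.

```python
# Attribute labels attach to identities, not to individual images: every image
# of an ID inherits the same label set, even when the attribute is occluded.
annotations = {
    "0002": {"gender": "female", "backpack": True, "hat": False, "age": "young"},
    "0007": {"gender": "male",  "backpack": False, "hat": True,  "age": "adult"},
}

# Two images of the same person, e.g. from different cameras.
images = [("0002_c1s1_000451.jpg", "0002"), ("0002_c3s1_000551.jpg", "0002")]

def labels_for(image_name, pid):
    """Look up the ID-level labels for one image via its person ID."""
    return annotations[pid]

a, b = (labels_for(name, pid) for name, pid in images)
print(a == b)  # prints True -- both images of ID 0002 share one label set
```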
4. Proposed Method
4.1 Baseline Methods
Two baselines are built, one for Re-ID and one for pedestrian attribute recognition. The base model is ResNet-50 pre-trained on ImageNet, fine-tuned with the newly annotated attribute labels and with the identity labels, respectively.
Baseline 1 (person Re-ID). The number of neurons in the last FC layer of the base model is set to K, the number of training identities. To reduce overfitting, a dropout layer with rate 0.9 is inserted before this FC layer. At test time, a 2048-dim feature vector is extracted from the pool5 layer for each image; Euclidean distances between query and gallery features are computed and sorted to produce the ranking. Results are in Table 1.
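The retrieval step of Baseline 1 boils down to nearest-neighbor search in feature space. A minimal sketch with toy low-dimensional features standing in for the 2048-dim pool5 descriptors (function and variable names are mine):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Sort gallery indices by Euclidean distance to the query descriptor.

    query_feat: (D,) pool5-style feature of the query image.
    gallery_feats: (N, D) features of the N gallery images.
    Returns gallery indices, nearest first.
    """
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(dists)

# Toy example with 4-dim features instead of the 2048-dim pool5 vectors.
q = np.array([1.0, 0.0, 0.0, 0.0])
g = np.array([
    [0.9, 0.1, 0.0, 0.0],   # close to the query
    [0.0, 1.0, 0.0, 0.0],   # far away
    [1.0, 0.0, 0.1, 0.0],   # closest
])
print(rank_gallery(q, g))   # -> [2 0 1], closest gallery feature first
```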
Baseline 2 (pedestrian attribute recognition & Re-ID). M FC layers, each followed by a softmax, are used to recognize the attributes, where M is the number of attributes. In CaffeNet the M FC layers replace FC8; in ResNet-50 they replace the FC layer. For an attribute with m classes, its FC layer is m-dimensional. As in Baseline 1, a dropout layer is inserted. Results are in Table 3.
4.2 Attribute-Person Recognition (APR) Network
Architecture. The APR network consists of a base model, M+1 FC layers before the loss computation, and M losses for attribute classification plus one for ID classification, where M is the number of attributes. The new FC layers are denoted $FC_0, FC_1, \ldots, FC_M$: $FC_0$ is for ID classification and the rest are for attribute recognition, with dimensions matching those in Baselines 1 and 2. The network predicts the ID and the attribute set of an input image simultaneously. The pre-trained model can be either ResNet-50 or CaffeNet.
In ResNet-50 the FC layers are attached to Pool5; in CaffeNet, to FC7. Their inputs are 224×224 and 227×227, respectively.
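The multi-head layout can be sketched as a shared descriptor feeding M+1 linear classifiers. A toy numpy version with random weights and an illustrative subset of attributes; apart from K = 751 (Market-1501 training identities) and D = 2048 (pool5), all names and numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 2048, 751          # pool5 dim; number of training identities (Market-1501)
attr_classes = [2, 2, 4]  # e.g. gender, hat, age -- a tiny illustrative subset

# FC_0 for ID classification, FC_1..FC_M for attributes (weights random here).
W_id = rng.standard_normal((K, D)) * 0.01
W_attrs = [rng.standard_normal((m, D)) * 0.01 for m in attr_classes]

def apr_forward(f):
    """Shared pool5 descriptor f -> one ID logit vector plus M attribute logit vectors."""
    z_id = W_id @ f
    z_attrs = [W @ f for W in W_attrs]
    return z_id, z_attrs

f = rng.standard_normal(D)   # stand-in for the pool5 output of one image
z_id, z_attrs = apr_forward(f)
print(z_id.shape, [z.shape for z in z_attrs])
```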
Loss computation. Suppose we have n images of K identities, each identity annotated with M attributes. Let $D_i = \{ x_i, d_i, l_i \}$ denote the training samples, where $x_i$ is the i-th image, $d_i$ its ID, and $l_i = \{ l_i^1, \ldots, l_i^M \}$ the M attribute labels of the i-th image.
Taking ResNet-50 as an example, given a training sample x, the model first computes the pool5 descriptor f, a $1 \times 1 \times 2048$ output. The output of $FC_0$ is $z = [z_1, z_2, \ldots, z_K] \in \mathbb{R}^K$, so the probability of each identity label $k \in 1, \ldots, K$ is $p(k|x) = \frac{\exp(z_k)}{\sum_{i=1}^{K} \exp(z_i)}$. Temporarily ignoring the correlation between k and x, the cross-entropy loss for ID classification is $L_{ID} = -\sum_{k=1}^{K} q(k) \log p(k|x)$.
Let y be the ground-truth ID label, so $q(y) = 1$ and $q(k) = 0$ for all $k \neq y$. In this case, minimizing the cross-entropy loss is equivalent to maximizing the probability assigned to the ground-truth class.
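With a one-hot target q, the loss above reduces to the standard softmax cross-entropy, i.e. the negative log-probability of the ground-truth class. A minimal numpy check:

```python
import numpy as np

def softmax(z):
    """p(k|x) = exp(z_k) / sum_i exp(z_i), computed stably."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    """With one-hot q (q(y)=1, q(k)=0 otherwise), the CE loss is -log p(y|x)."""
    return -np.log(softmax(z)[y])

z = np.array([2.0, 0.5, -1.0])   # toy FC_0 logits for K = 3 identities
assert np.isclose(softmax(z).sum(), 1.0)
# Raising the ground-truth logit raises p(y|x) and lowers the loss:
assert cross_entropy(np.array([3.0, 0.5, -1.0]), 0) < cross_entropy(z, 0)
```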
For attribute prediction, M softmax losses are likewise used. Suppose an attribute has m classes; the probability of assigning attribute class $j \in 1, \ldots, m$ to sample x is $p(j|x) = \frac{\exp(z_j)}{\sum_{i=1}^{m} \exp(z_i)}$. Similarly, the attribute classification loss is $L_{att} = -\sum_{j=1}^{m} q(j) \log p(j|x)$.
Let $y_m$ be the ground-truth attribute label, so $q(y_m) = 1$ and $q(j) = 0$ for all $j \neq y_m$; the other notation follows Equation 1.
The final loss function is a weighted combination of the ID classification loss and the M attribute losses, balanced by a parameter $\lambda$.
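A sketch of how the combined objective might be computed, assuming the form $L = \lambda L_{ID} + \frac{1}{M}\sum_{i=1}^{M} L_{att}^{i}$; the 1/M averaging and which side $\lambda$ sits on are my reading of the paper, so treat them as assumptions:

```python
import numpy as np

def ce(z, y):
    """Softmax cross-entropy for one logit vector z and ground-truth index y."""
    z = z - z.max()
    return -(z[y] - np.log(np.exp(z).sum()))

def apr_loss(z_id, y_id, z_attrs, y_attrs, lam=8.0):
    """Assumed form: L = lam * L_ID + (1/M) * sum of the M attribute losses."""
    l_id = ce(z_id, y_id)
    l_att = sum(ce(z, y) for z, y in zip(z_attrs, y_attrs)) / len(z_attrs)
    return lam * l_id + l_att

z_id = np.array([4.0, 0.0, 0.0])                         # ID head, K = 3
z_attrs = [np.array([2.0, -1.0]), np.array([0.0, 1.0])]  # two binary attributes
print(apr_loss(z_id, 0, z_attrs, [0, 1]))                # lam = 8 as validated below
```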
Figure 6 visualizes feature maps inside the CNN, illustrating that after integrating attributes the network responds more strongly to the discriminative regions of a person.
5. Experiment
5.1 Datasets and Evaluation Protocol
Evaluation metrics. For the Re-ID task, the Cumulative Matching Characteristic (CMC) curve and mAP are used. For each query, average precision (AP) is computed from its precision-recall curve, and mAP is the mean of AP over all queries. Roughly speaking, CMC reflects retrieval precision, while mAP also reflects recall. The evaluation packages of [51, 54] are used.
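AP from a ranked relevance list and the rank-k CMC score can be computed as follows; a simplified sketch of the standard definitions, ignoring junk images and the exact evaluation package the paper uses:

```python
import numpy as np

def average_precision(good):
    """AP for one query from a ranked 0/1 relevance list (1 = true match)."""
    good = np.asarray(good)
    hits = np.cumsum(good)
    precisions = hits / (np.arange(len(good)) + 1)  # precision at each rank
    return (precisions * good).sum() / good.sum()   # average over the hit ranks

def cmc_rank_k(good, k):
    """1 if a true match appears within the top k results, else 0."""
    return int(np.asarray(good)[:k].any())

ranked = [0, 1, 0, 1]                 # gallery relevance in ranked order
print(average_precision(ranked))      # mean of precision at ranks 2 and 4 -> 0.5
print(cmc_rank_k(ranked, 1), cmc_rank_k(ranked, 2))
```

mAP is then just the mean of `average_precision` over all queries, and the CMC curve is the mean of `cmc_rank_k` over queries for each k.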
5.2 Implementation Details
5.3 Evaluation of Person Re-ID
Parameter validation. The effect of the parameter $\lambda$ is validated in Figure 7. Both mAP and rank-1 accuracy first rise and then fall, and $\lambda = 8$ is finally used.
Attribute recognition improves re-ID over the baselines. Does APR surpass the two baselines? Results are in Tables 1 and 2.
First, B1 achieves good performance, as expected. But B2 also reaches decent accuracy, even though it uses only the attribute labels without any ID loss. This indicates that attributes by themselves can discriminate between different people.
Figure 8 shows some Re-ID results on Market-1501. Baseline 1 has no correct match in its top 8: images with a backpack or of males are retrieved. With APR, all six true matches are found; in this example, bag and gender are the key attributes.
Results between camera pairs. To further understand the performance on Market-1501, Re-ID results for all camera pairs are given in Figure 10. Although camera 6 is a $720 \times 576$ SD camera whose background differs clearly from the other HD cameras, Re-ID accuracy between Cam-6 and the other cameras is relatively high.
Scalability of the learned representation. To test scalability, results are reported on Market-1501+500k. The 500k distractor set consists of a large amount of background and irrelevant pedestrians. The Re-ID accuracy of the model (ResNet, 751 training identities) is shown in Figure 11.
Ablation studies. The contribution of individual attributes to Re-ID accuracy is evaluated by removing one attribute at a time with $\lambda$ fixed at 8; results are in Figure 9. The most influential attributes on the two datasets are bag type and shoe color, respectively, suggesting that pedestrian appearance differs between the two datasets.
5.4 Evaluation of Attribute Recognition
Attribute recognition is evaluated in Tables 3 and 4.
Figure 12 shows two examples of attribute prediction. The system correctly predicts all labels for the person on the left; for the person on the right, it errs on hair length and on wearing a hat.
6. Conclusions
References
[1] E. Ahmed, M. Jones, and T. K. Marks. An improved deep learning architecture for person re-identification. In CVPR, 2015.
[2] I. B. Barbosa, M. Cristani, B. Caputo, A. Rognhaugen, and T. Theoharis. Looking beyond appearances: Synthetic training data for deep CNNs in re-identification. arXiv:1701.03153, 2017.
[3] L. Bourdev, S. Maji, and J. Malik. Describing people: A poselet-based approach to attribute classification. In ICCV, 2011.
[4] D. Chen, Z. Yuan, B. Chen, and N. Zheng. Similarity learning with spatial constraints for person re-identification. In CVPR, 2016.
[5] W. Chen, X. Chen, J. Zhang, and K. Huang. A multi-task deep network for person re-identification. In AAAI, 2017.
[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[7] Y. Deng, P. Luo, C. C. Loy, and X. Tang. Pedestrian attribute recognition at far distance. In ACM MM, 2014.
[8] K. Duan, D. Parikh, D. Crandall, and K. Grauman. Discovering localized attributes for fine-grained recognition. In CVPR, 2012.
[9] M. Geng, Y. Wang, T. Xiang, and Y. Tian. Deep transfer learning for person re-identification. arXiv:1611.05244, 2016.
[10] G. Gkioxari, R. Girshick, and J. Malik. Actions and attributes from wholes and parts. In ICCV, 2015.
[11] G. Gkioxari, R. Girshick, and J. Malik. Contextual action recognition with R*CNN. In ICCV, 2015.
[12] D. Gray and H. Tao. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In ECCV, 2008.
[13] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[14] M. Hirzer, C. Beleznai, P. M. Roth, and H. Bischof. Person re-identification by descriptive and discriminative classification. In Scandinavian Conference on Image Analysis, pages 91–102. Springer, 2011.
[15] C. Jose and F. Fleuret. Scalable metric learning via weighted approximate rank component analysis. arXiv:1603.00370, 2016.
[16] S. Khamis, C.-H. Kuo, V. K. Singh, V. D. Shet, and L. S. Davis. Joint learning for attribute-consistent person re-identification. In ECCV, 2014.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[18] A. Lanitis, C. Draganova, and C. Christodoulou. Comparing different classifiers for automatic age estimation. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(1):621–628, 2004.
[19] R. Layne, T. M. Hospedales, and S. Gong. Attributes-based re-identification. In Person Re-Identification, pages 93–117. Springer, 2014.
[20] R. Layne, T. M. Hospedales, and S. Gong. Re-id: Hunting attributes in the wild. In BMVC, 2014.
[21] R. Layne, T. M. Hospedales, S. Gong, and Q. Mary. Person re-identification by attributes. In BMVC, 2012.
[22] D. Li, Z. Zhang, X. Chen, H. Ling, and K. Huang. A richly annotated dataset for pedestrian attribute recognition. arXiv:1603.07054, 2016.
[23] W. Li, R. Zhao, T. Xiao, and X. Wang. DeepReID: Deep filter pairing neural network for person re-identification. In CVPR, 2014.
[24] S. Liao, Y. Hu, X. Zhu, and S. Z. Li. Person re-identification by local maximal occurrence representation and metric learning. In CVPR, 2015.
[25] Y. Lin, L. Zheng, Z. Zheng, Y. Wu, and Y. Yang. Improving person re-identification by attribute and identity learning. arXiv:1703.07220, 2017.
[26] C. Liu, S. Gong, C. C. Loy, and X. Lin. Person re-identification: What features are important? In ECCV, pages 391–401. Springer, 2012.
[27] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In ICCV, 2015.
[28] C. C. Loy, C. Liu, and S. Gong. Person re-identification by manifold ranking. In ICIP, 2013.
[29] B. Moghaddam and M.-H. Yang. Learning gender with support faces. TPAMI, 24(5):707–711, 2002.
[30] D. Parikh and K. Grauman. Relative attributes. In ICCV, 2011.
[31] E. Ristani, F. Solera, R. Zou, R. Cucchiara, and C. Tomasi. Performance measures and a data set for multi-target, multi-camera tracking. In ECCV, 2016.
[32] S. Li, T. Xiao, H. Li, B. Zhou, D. Yue, and X. Wang. Person search with natural language description. arXiv:1702.05729, 2017.
[33] C. Su, F. Yang, S. Zhang, Q. Tian, L. S. Davis, and W. Gao. Multi-task learning with low rank attribute embedding for person re-identification. In ICCV, 2015.
[34] C. Su, S. Zhang, J. Xing, W. Gao, and Q. Tian. Deep attributes driven multi-camera person re-identification. arXiv:1605.03259, 2016.
[35] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In NIPS, 2014.
[36] T. Matsukawa and E. Suzuki. Person re-identification using CNN features learned from combination of attributes. In ICPR, 2016.
[37] E. Ustinova, Y. Ganin, and V. Lempitsky. Multiregion bilinear convolutional neural networks for person re-identification. arXiv:1512.05300, 2015.
[38] R. R. Varior, M. Haloi, and G. Wang. Gated siamese convolutional neural network architecture for human re-identification. In ECCV, 2016.
[39] R. R. Varior, B. Shuai, J. Lu, D. Xu, and G. Wang. A siamese long short-term memory architecture for human re-identification. In ECCV, 2016.
[40] L. Wu, C. Shen, and A. van den Hengel. Deep linear discriminant analysis on Fisher networks: A hybrid architecture for person re-identification. arXiv:1606.01595, 2016.
[41] T. Xiao, H. Li, W. Ouyang, and X. Wang. Learning deep feature representations with domain guided dropout for person re-identification. arXiv:1604.07528, 2016.
[42] T. Xiao, S. Li, B. Wang, L. Lin, and X. Wang. End-to-end deep learning for person search. arXiv:1604.01850, 2016.
[43] S. Yang, P. Luo, C.-C. Loy, and X. Tang. From facial parts responses to face detection: A deep learning approach. In ICCV, 2015.
[44] D. Yi, Z. Lei, S. Liao, and S. Z. Li. Deep metric learning for person re-identification. In ICPR, 2014.
[45] L. Zhang, T. Xiang, and S. Gong. Learning a discriminative null space for person re-identification. arXiv:1603.02139, 2016.
[46] N. Zhang, R. Farrell, F. Iandola, and T. Darrell. Deformable part descriptors for fine-grained recognition and attribute prediction. In ICCV, 2013.
[47] N. Zhang, M. Paluri, M. Ranzato, T. Darrell, and L. Bourdev. PANDA: Pose aligned networks for deep attribute modeling. In CVPR, 2014.
[48] Z. Zhang, P. Luo, C. C. Loy, and X. Tang. Facial landmark detection by deep multi-task learning. In ECCV, 2014.
[49] L. Zheng, Z. Bie, Y. Sun, J. Wang, C. Su, S. Wang, and Q. Tian. MARS: A video benchmark for large-scale person re-identification. In ECCV, 2016.
[50] L. Zheng, Y. Huang, H. Lu, and Y. Yang. Pose invariant embedding for deep person re-identification. arXiv:1701.07732, 2017.
[51] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian. Scalable person re-identification: A benchmark. In ICCV, 2015.
[52] L. Zheng, Y. Yang, and A. G. Hauptmann. Person re-identification: Past, present and future. arXiv:1610.02984, 2016.
[53] Z. Zheng, L. Zheng, and Y. Yang. A discriminatively learned CNN embedding for person re-identification. arXiv:1611.05666, 2016.
[54] Z. Zheng, L. Zheng, and Y. Yang. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. arXiv:1701.07717, 2017.