Cross-Age LFW (CALFW) Database

Motivation

Attention! We updated the positive/negative lists and baselines for CALFW on September 19, 2018. Please use the new lists for your experiments.

Welcome to the Cross-Age LFW (CALFW) database, a renovation of Labeled Faces in the Wild (LFW), the de facto standard testbed for unconstrained face verification.

The Labeled Faces in the Wild (LFW) database has been widely used as the benchmark for unconstrained face verification and, thanks to big-data-driven machine learning methods, performance on the database approaches 100%. However, we argue that this accuracy may be too optimistic. Besides differences in pose, illumination, occlusion and expression, cross-age face matching is another challenge in face recognition, yet LFW pays little attention to it. We therefore construct Cross-Age LFW (CALFW), for which we deliberately search for and select 3,000 positive face pairs with age gaps to add the aging process to the intra-class variance. Negative pairs with the same gender and race are also selected to reduce the influence of attribute differences between positive and negative pairs. We evaluate several metric learning and deep learning methods on the new database. Compared to the accuracy on LFW, the accuracy drops by about 10%-17% on CALFW. There are three motivations behind the construction of the CALFW benchmark:

1. Establishing a more difficult database for evaluating real-world face verification, so that the effectiveness of face verification methods can be fully assessed.

2. CALFW emphasizes the age gap of positive pairs to further enlarge the intra-class variance while still covering other intra-class variations. In addition, negative pairs are deliberately selected so that they do not differ in gender or race. CALFW thus considers large intra-class variance and small inter-class variance simultaneously.

3. Maintaining the data size, the 'same/different' face verification protocol and the identities of LFW, so that CALFW can easily be applied to evaluate face verification performance.

Comparison with LFW

Age gap comparison

Compared to the positive pairs in LFW, the age gaps of positive pairs in CALFW are larger, which shows that the aging process has been successfully added to the intra-class variations. Moreover, in LFW the age gaps of most positive pairs are less than 10 years while those of most negative pairs are larger than 10 years. In CALFW there is no such clear boundary between the two kinds of pairs, so the age gap cannot be a strong cue for face verification.

Positive pairs comparison

CALFW was collected through crowdsourcing efforts that searched the Internet for pictures of the people in LFW with age gaps as large as possible. Compared to LFW, the positive pairs in CALFW show obvious age differences.

Compared to LFW, the negative pairs in CALFW have the same gender and race, which reduces the influence of attribute differences between positive and negative pairs in face verification.

We are dedicated to maintaining the protocols, dataset size, and identities in each fold of the LFW database in order to encourage fair and meaningful comparisons. You can find more information about the standard LFW protocol on the Labeled Faces in the Wild (LFW) website.

We expect that CALFW will encourage algorithms that make reliable verification judgements and help close the large gap between the performance reported on benchmarks and the performance on real-world tasks.
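As a rough illustration of what a 'same/different' 10-fold protocol involves, the Python sketch below picks a decision threshold on nine folds and measures accuracy on the held-out fold. The function names, the similarity-score input and the threshold-selection rule are assumptions made for illustration; they are not taken from the CALFW or LFW evaluation code.

```python
from statistics import mean

def accuracy(scores, labels, threshold):
    """Fraction of pairs classified correctly: 'same' iff score >= threshold."""
    return mean(
        1.0 if (s >= threshold) == bool(y) else 0.0
        for s, y in zip(scores, labels)
    )

def ten_fold_accuracy(scores, labels, folds, n_folds=10):
    """Mean verification accuracy over folds (hypothetical helper).

    scores: similarity per pair (higher means more likely the same person)
    labels: 1 for a positive (same) pair, 0 for a negative (different) pair
    folds:  fold index in [0, n_folds) for each pair
    """
    fold_accs = []
    for k in range(n_folds):
        train = [(s, y) for s, y, f in zip(scores, labels, folds) if f != k]
        test = [(s, y) for s, y, f in zip(scores, labels, folds) if f == k]
        train_scores, train_labels = zip(*train)
        test_scores, test_labels = zip(*test)
        # Assumed rule: try every training score as a threshold and keep the
        # one with the highest training accuracy (quadratic, fine for a sketch).
        best_t = max(
            sorted(set(train_scores)),
            key=lambda t: accuracy(train_scores, train_labels, t),
        )
        fold_accs.append(accuracy(test_scores, test_labels, best_t))
    return mean(fold_accs)

# Toy data: 20 perfectly separable pairs, two pairs per fold.
toy_scores = [0.9, 0.1] * 10
toy_labels = [1, 0] * 10
toy_folds = [i // 2 for i in range(20)]
toy_result = ten_fold_accuracy(toy_scores, toy_labels, toy_folds)
```

Because the threshold is never tuned on the fold it is tested on, the averaged accuracy is comparable across methods that use the same pair lists and fold assignment.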

Baseline Results

We select four SOTA deep face recognition methods that have achieved top performance on major benchmark databases: LFW, IJB-A and MegaFace.

COMPARISON OF VERIFICATION ACCURACY (%) ON LFW AND CALFW USING FOUR SOTA DEEP FACE RECOGNITION MODELS.

Method	LFW	CALFW
Centerface¹	98.75%	85.48%
SphereFace²	99.27%	90.30%
VGGFace2³	99.43%	90.57%
ArcFace⁴	99.82%	95.87%
HUMAN-Individual	97.27%	82.32%
HUMAN-Fusion	99.85%	86.50%

COMPARISON OF 10-FOLD VALIDATION ERROR (%) OF FOUR SOTA DEEP FACE RECOGNITION MODELS. THE INCREASE OF ERROR IS ALSO ENUMERATED WHEN TRANSFERRING FROM LFW TO CALFW.

Method	LFW	CALFW
Centerface¹	1.17	14.52 ( ↑ 1241%)
SphereFace²	0.65	9.70 ( ↑ 1492%)
VGGFace2³	0.49	9.43 ( ↑ 1924%)
ArcFace⁴	0.10	4.55 ( ↑ 4550%)
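The parenthesized percentages in the error table can be checked with a few lines of arithmetic. The Python sketch below uses only the error rates listed in the table; the function names are hypothetical. On this reading, which is inferred from the numbers and not stated on this page, each parenthesized figure matches the CALFW error expressed as a percentage of the LFW error (ratio times 100). The relative increase, (CALFW - LFW) / LFW, is computed alongside for comparison.

```python
# Error rates (%) copied from the table: (LFW, CALFW).
errors = {
    "Centerface": (1.17, 14.52),
    "SphereFace": (0.65, 9.70),
    "VGGFace2": (0.49, 9.43),
    "ArcFace": (0.10, 4.55),
}

def error_ratio_percent(lfw_err, calfw_err):
    """CALFW error as a percentage of the LFW error, rounded."""
    return round(100.0 * calfw_err / lfw_err)

def relative_increase_percent(lfw_err, calfw_err):
    """Relative increase of the error from LFW to CALFW, rounded."""
    return round(100.0 * (calfw_err - lfw_err) / lfw_err)

for name, (lfw, calfw) in errors.items():
    print(name,
          error_ratio_percent(lfw, calfw),
          relative_increase_percent(lfw, calfw))
```

For Centerface, for example, the ratio form gives 1241, the value shown in the table, while the relative-increase form gives 1141.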

1. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, Springer, 2016, pp. 499–515.

2. Deep hyperspherical learning. In NIPS, 2017, pp. 3953–3963.

3. Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. VGGFace2: A dataset for recognising faces across pose and age. arXiv preprint arXiv:1710.08092, 2017.

4. ArcFace: Additive angular margin loss for deep face recognition. arXiv preprint arXiv:1801.05599, 2018.
