# *SegFace* : Face Segmentation of Long-Tail Classes
AAAI 2025
[Kartik Narayan](https://kartik-3004.github.io/portfolio/) [Vibashan VS](https://vibashan.github.io) [Vishal M. Patel](https://engineering.jhu.edu/faculty/vishal-patel/)
Johns Hopkins University
## Contributions
Figure 1. The qualitative comparison highlights the superior performance of our method, SegFace, compared to DML-CSR. In (a), SegFace effectively segments long-tail classes such as earrings and necklaces as well as head classes such as hair and neck. In (b), it also excels in challenging scenarios involving multiple faces, human-resembling features, poor lighting, and occlusion, where DML-CSR struggles.
The key contributions of our work are:
1️⃣ We introduce a lightweight transformer decoder with learnable class-specific tokens, ensuring that each token is dedicated to a specific class and enabling independent modeling of classes. This design effectively addresses the poor segmentation performance on long-tail classes that is prevalent in existing methods.
2️⃣ Our multi-scale feature extraction and MLP fusion strategy, combined with a transformer decoder that leverages learnable class-specific tokens, mitigates the dominance of head classes during training and enhances the feature representation of long-tail classes.
3️⃣ SegFace establishes a new state-of-the-art performance on the LaPa dataset (93.03 mean F1 score) and the CelebAMask-HQ dataset (88.96 mean F1 score). Moreover, our model can be adapted for fast inference by simply swapping the backbone with a MobileNetV3 backbone. The mobile version achieves a mean F1 score of 87.91 on the CelebAMask-HQ dataset with 95.96 FPS.
> **Abstract:** *Face parsing refers to the semantic segmentation of human faces into
> key facial regions such as eyes, nose, hair, etc. It serves as a prerequisite for various advanced applications,
> including face editing, face swapping, and facial makeup, which often require segmentation masks for classes
> like eye-glasses, hats, earrings, and necklaces. These infrequently occurring classes are called long-tail
> classes, which are overshadowed by more frequently occurring classes known as head classes. Existing methods,
> primarily CNN-based, tend to be dominated by head classes during training, resulting in suboptimal representation
> for long-tail classes. Previous works have largely overlooked the problem of poor segmentation performance of
> long-tail classes. To address this issue, we propose SegFace, a simple and efficient approach that uses a
> lightweight transformer-based model which utilizes learnable class-specific tokens. The transformer decoder
> leverages class-specific tokens, allowing each token to focus on its corresponding class, thereby enabling
> independent modeling of each class. The proposed approach improves the performance of long-tail classes, thereby
> boosting overall performance. To the best of our knowledge, SegFace is the first work to employ transformer models
> for face parsing. Moreover, our approach can be adapted for low-compute edge devices, achieving 95.96 FPS. We
> conduct extensive experiments demonstrating that SegFace significantly outperforms previous state-of-the-art models,
> achieving a mean F1 score of 88.96 (+2.82) on the CelebAMask-HQ dataset and 93.03 (+0.65) on the LaPa dataset.*
# Framework
Figure 2. The proposed architecture, SegFace, addresses face segmentation by enhancing the performance on long-tail classes through a transformer-based approach. Specifically, multi-scale features are first extracted from an image encoder and then fused using an MLP fusion module to form face tokens. These tokens, along with class-specific tokens, undergo self-attention, face-to-token, and token-to-face cross-attention operations, refining both class and face tokens to enhance class-specific features. Finally, the upscaled face tokens and learned class tokens are combined to produce segmentation maps for each facial region.
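The decoding step described in the caption can be illustrated with a toy sketch. This is not the SegFace implementation: the tensor sizes, the single attention step, and the dot-product mask head are deliberate simplifications of the self-attention and bidirectional cross-attention operations described above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_masks(face_tokens, class_tokens):
    """One simplified token-to-face cross-attention step, followed by
    mask prediction via a dot product between pixels and class tokens."""
    d = face_tokens.shape[-1]
    # class tokens attend to the face (pixel) tokens
    attn = softmax(class_tokens @ face_tokens.T / np.sqrt(d), axis=-1)  # (K, N)
    refined = attn @ face_tokens                                        # (K, d)
    # per-pixel, per-class logits: each pixel scored against each class token
    logits = face_tokens @ refined.T                                    # (N, K)
    return logits

rng = np.random.default_rng(0)
H = W = 8; d = 16; K = 3                   # tiny spatial grid, 3 classes
face = rng.standard_normal((H * W, d))     # stand-in for the MLP-fused features
cls = rng.standard_normal((K, d))          # learnable class-specific tokens
seg = decode_masks(face, cls).argmax(-1).reshape(H, W)
print(seg.shape)  # (8, 8)
```

The key property this mirrors is that each class token is refined independently before being matched against every pixel, which is what lets long-tail classes keep their own representation.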
# :rocket: News
- [12/11/2024] 🔥 We release *SegFace*.
# Installation
```bash
conda env create --file environment.yml
conda activate segface
# Create a .env file inside the main directory (SegFace) and set LOG_PATH, DATA_PATH, and ROOT_PATH in it.
# The example below matches the directory structure we used.
# DATA_PATH: Path to your dataset folder.
# ROOT_PATH: Path to your code directory.
# LOG_PATH: Path where the model checkpoints are stored and the training is logged.
touch .env
echo 'ROOT_PATH=/data/knaraya4/SegFace' >> .env
echo 'DATA_PATH=/data/knaraya4/data/SegFace' >> .env
echo 'LOG_PATH=/mnt/store/knaraya4/SegFace' >> .env
```
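The scripts read these three variables at runtime. If you want to load the `.env` file from your own Python code without extra dependencies (the repository itself may rely on a package such as python-dotenv; `load_env` below is a hypothetical helper), a minimal parser is enough:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()

# Example with the paths from the README:
with open(".env", "w") as fh:
    fh.write("ROOT_PATH=/data/knaraya4/SegFace\n"
             "DATA_PATH=/data/knaraya4/data/SegFace\n"
             "LOG_PATH=/mnt/store/knaraya4/SegFace\n")
load_env()
print(os.environ["DATA_PATH"])  # /data/knaraya4/data/SegFace
```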
# Download Data
The datasets can be downloaded from their respective webpages or by emailing their authors:
1. [CelebAMask-HQ](https://mmlab.ie.cuhk.edu.hk/projects/CelebA/CelebAMask_HQ.html)
2. [LaPa](https://github.com/jd-opensource/lapa-dataset)
3. [Helen](https://github.com/JPlin/Relabeled-HELEN-Dataset)
Arrange the dataset in the following manner:
```text
[DATA_PATH]/SegFace/
├── CelebAMask-HQ/
│ ├── CelebA-HQ-img/
│ ├── CelebA-HQ-to-CelebA-mapping.txt
│ ├── CelebAMask-HQ-attribute-anno.txt
│ ├── CelebAMask-HQ-mask-anno/
│ ├── CelebAMask-HQ-pose-anno.txt
│ ├── list_eval_partition.txt
│ └── README.txt
├── helen/
│ ├── f1_score.py
│ ├── label_names.txt
│ ├── landmarks.txt
│ ├── list_68pt_rect_attr_test.txt
│ ├── list_68pt_rect_attr_train.txt
│ ├── list_annos_trn.txt
│ ├── list_annos_tst.txt
│ ├── README.md
│ ├── test/
│ ├── test_resize/
│ └── train/
└── LaPa/
├── test/
├── train/
└── val/
```
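A quick sanity check that the tree is in place can save a failed training run. The subdirectory lists below are taken from the layout above; the helper itself is illustrative and not part of the repository:

```python
import os

# A representative subset of the expected layout shown above.
EXPECTED = {
    "CelebAMask-HQ": ["CelebA-HQ-img", "CelebAMask-HQ-mask-anno"],
    "LaPa": ["train", "val", "test"],
    "helen": ["train", "test"],
}

def missing_paths(data_path):
    """Return any expected dataset subdirectories absent under DATA_PATH."""
    missing = []
    for dataset, subdirs in EXPECTED.items():
        for sub in subdirs:
            p = os.path.join(data_path, "SegFace", dataset, sub)
            if not os.path.isdir(p):
                missing.append(p)
    return missing

print(len(missing_paths("/no/such/root")))  # 7: nothing exists under a bogus root
```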
| Arch | Resolution | Dataset | Link | Mean F1 |
|------|------------|-----------------|---------------------------------------------------------------------------------|---------|
| ConvNext | 512 | CelebAMask-HQ | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/convnext_celeba_512) | 89.22 |
| EfficientNet | 512 | CelebAMask-HQ | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/efficientnet_celeba_512) | 88.94 |
| MobileNet | 512 | CelebAMask-HQ | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/mobilenet_celeba_512) | 87.91 |
| ResNet100 | 512 | CelebAMask-HQ | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/resnet_celeba_512) | 87.50 |
| Swin_Base | 224 | CelebAMask-HQ | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/swinb_celeba_224) | 87.47 |
| Swin_Base | 256 | CelebAMask-HQ | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/swinb_celeba_256) | 87.66 |
| Swin_Base | 448 | CelebAMask-HQ | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/swinb_celeba_448) | 88.77 |
| Swin_Base | 512 | CelebAMask-HQ | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/swinb_celeba_512) | 88.96 |
| Swinv2_Base | 512 | CelebAMask-HQ | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/swinv2b_celeba_512) | 88.73 |
| | | | | |
| Swin_Base | 224 | LaPa | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/swinb_lapa_224) | 92.50 |
| Swin_Base | 256 | LaPa | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/swinb_lapa_256) | 92.61 |
| Swin_Base | 448 | LaPa | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/swinb_lapa_448) | 93.03 |
| Swin_Base | 512 | LaPa | [HuggingFace](https://huggingface.co/kartiknarayan/SegFace/tree/main/swinb_lapa_512) | 93.03 |
# Download Model weights
The pre-trained models can be downloaded manually from [HuggingFace](https://huggingface.co/kartiknarayan/SegFace) or via Python:
```python
from huggingface_hub import hf_hub_download
# The folder name "convnext_celeba_512" indicates a ConvNeXt backbone trained
# on the CelebAMask-HQ dataset at 512x512 resolution.
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="convnext_celeba_512/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="efficientnet_celeba_512/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="mobilenet_celeba_512/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="resnet_celeba_512/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinb_celeba_224/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinb_celeba_256/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinb_celeba_448/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinb_celeba_512/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinb_lapa_224/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinb_lapa_256/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinb_lapa_448/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinb_lapa_512/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinv2b_celeba_512/model_299.pt", local_dir="./weights")
hf_hub_download(repo_id="kartiknarayan/SegFace", filename="swinb_helen_512/model_299.pt", local_dir="./weights")
```
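The calls above can be collapsed into a loop. The sketch below only assembles the checkpoint paths (the `model_299.pt` naming follows the calls above) and keeps the actual network transfer behind a flag, so nothing is downloaded unless you opt in:

```python
CHECKPOINTS = (
    [f"{b}_celeba_512" for b in
     ("convnext", "efficientnet", "mobilenet", "resnet", "swinb", "swinv2b")]
    + [f"swinb_celeba_{r}" for r in (224, 256, 448)]
    + [f"swinb_lapa_{r}" for r in (224, 256, 448, 512)]
    + ["swinb_helen_512"]
)

def checkpoint_files():
    """Each repo subfolder holds a final checkpoint named model_299.pt."""
    return [f"{name}/model_299.pt" for name in CHECKPOINTS]

if __name__ == "__main__":
    DOWNLOAD = False  # flip to True to actually fetch the weights
    if DOWNLOAD:
        from huggingface_hub import hf_hub_download
        for f in checkpoint_files():
            hf_hub_download(repo_id="kartiknarayan/SegFace",
                            filename=f, local_dir="./weights")
    print(len(checkpoint_files()))  # 14 checkpoints
```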
# Usage
Download the trained weights from [HuggingFace](https://huggingface.co/kartiknarayan/SegFace) and ensure the data is downloaded with appropriate directory structure.
### Training
```bash
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=29440 /data/knaraya4/SegFace/train.py \
--ckpt_path ckpts \
--expt_name swin_base_celeba_512 \
--dataset celebamask_hq \
--backbone segface_celeb \
--model swin_base \
--lr 1e-4 \
--lr_schedule 80,200 \
--input_resolution 512 \
--train_bs 4 \
--val_bs 1 \
--test_bs 1 \
--num_workers 4 \
--epochs 300
### You can change the model backbone by changing --model
# --model swin_base, swinv2_base, swinv2_small, swinv2_tiny
# --model convnext_base, convnext_small, convnext_tiny
# --model mobilenet
# --model efficientnet
### You can change the dataset the model is trained on by changing --dataset and --backbone
# CelebAMask-HQ: --backbone segface_celeb --dataset celebamask_hq
# LaPa: --backbone segface_lapa --dataset lapa
# Helen: --backbone segface_helen --dataset helen
```
The trained models are stored at [LOG_PATH]/&lt;ckpt_path&gt;/&lt;expt_name&gt;/.
NOTE: The training scripts are provided at [SegFace/scripts](scripts).
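For reference, `--lr 1e-4 --lr_schedule 80,200` reads like a step-decay schedule with milestones at epochs 80 and 200. The decay factor is defined in `train.py`; the `gamma=0.1` used below is an assumption, so treat this only as a sketch of how such a schedule behaves:

```python
def lr_at_epoch(epoch, base_lr=1e-4, milestones=(80, 200), gamma=0.1):
    """Step decay: multiply the learning rate by gamma at each milestone passed."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

for e in (0, 100, 250):  # before, between, and after the milestones
    print(e, lr_at_epoch(e))
```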
### Inference
```bash
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0 python /data/knaraya4/SegFace/test.py \
--ckpt_path ckpts \
--expt_name <expt_name> \
--dataset <dataset> \
--backbone <backbone> \
--model <model> \
--input_resolution 512 \
--test_bs 1 \
--model_path [LOG_PATH]/<ckpt_path>/<expt_name>/model_299.pt
# --dataset celebamask_hq
# --dataset lapa
# --dataset helen
# --backbone segface_celeb
# --backbone segface_lapa
# --backbone segface_helen
# --model swin_base, swinv2_base, swinv2_small, swinv2_tiny
# --model convnext_base, convnext_small, convnext_tiny
# --model mobilenet
# --model efficientnet
```
NOTE: The inference script is provided at [SegFace/scripts](scripts).
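The tables above report the mean F1 score over classes. As a reference for what that metric computes, here is a small per-class F1 from pixel counts; the repository's evaluation code may differ in details such as which classes (e.g. background) are excluded:

```python
import numpy as np

def mean_f1(pred, target, num_classes, ignore_background=True):
    """Per-class F1 from pixel counts, averaged over foreground classes."""
    f1s = []
    start = 1 if ignore_background else 0  # class 0 assumed to be background
    for c in range(start, num_classes):
        tp = np.sum((pred == c) & (target == c))
        fp = np.sum((pred == c) & (target != c))
        fn = np.sum((pred != c) & (target == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))

# Toy 2x2 "segmentation maps": one pixel of class 1 is mispredicted as class 2.
target = np.array([[0, 1], [1, 2]])
pred   = np.array([[0, 1], [2, 2]])
print(round(mean_f1(pred, target, 3), 3))  # 0.667
```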
## Citation
If you find *SegFace* useful for your research, please consider citing us:
```bibtex
@inproceedings{narayan2025segface,
title={Segface: Face segmentation of long-tail classes},
author={Narayan, Kartik and Vs, Vibashan and Patel, Vishal M},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={6},
pages={6182--6190},
year={2025}
}
```
## Contact
If you have any questions, please create an issue on this repository or contact us at knaraya4@jhu.edu.