# seq2seq

**Repository Path**: mirrors_keon/seq2seq

## Basic Information

- **Project Name**: seq2seq
- **Description**: Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-01-10
- **Last Updated**: 2026-06-28

## README

# mini seq2seq
Minimal Seq2Seq model with attention for neural machine translation in PyTorch.

This implementation focuses on the following features:

- Modular structure to be used in other projects
- Minimal code for readability
- Full use of batching and the GPU

The dataset (Multi30k, DE→EN) is loaded via Hugging Face [`datasets`](https://github.com/huggingface/datasets); tokenization uses [spaCy](https://spacy.io/).

## Model description

* Encoder: Bidirectional GRU
* Decoder: GRU with Attention Mechanism
* Attention: [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473)

![](http://www.wildml.com/wp-content/uploads/2015/12/Screen-Shot-2015-12-30-at-1.16.08-PM.png)
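The additive attention step from the cited paper can be sketched in dependency-free Python. This is an illustration only, not the repository's code, which presumably works on batched PyTorch tensors; the function names and the parameter names `W_s`, `W_h`, and `v` are assumptions made for this example.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(decoder_state, encoder_outputs, W_s, W_h, v):
    """Score each encoder output h_j as v . tanh(W_s s + W_h h_j).

    Returns the attention weights (one per source position) and the
    context vector (the weighted sum of the encoder outputs).
    """
    projected_state = matvec(W_s, decoder_state)
    scores = []
    for h in encoder_outputs:
        projected_h = matvec(W_h, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_outputs))
        for d in range(dim)
    ]
    return weights, context


# Toy inputs: three source positions with 2-dimensional encoder outputs.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_s = [[0.5, -0.5], [0.25, 0.75]]  # hypothetical learned weights
W_h = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical learned weights
v = [1.0, -1.0]                    # hypothetical learned weights
weights, context = additive_attention([0.2, -0.1], encoder_outputs, W_s, W_h, v)
print(weights, context)
```

In a decoder step, the context vector would typically be combined with the previous target embedding before being fed to the GRU.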

## Requirements

* Python 3.9+
* PyTorch >= 2.0 (CPU, CUDA, or Apple MPS)
* `datasets` (HuggingFace, replaces torchtext)
* spaCy >= 3.7

```
pip install -r requirements.txt
python -m spacy download de_core_news_sm
python -m spacy download en_core_web_sm
```

## Train

```
python train.py -epochs 30 -batch_size 32 -lr 3e-4
```
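For readers adapting the script, single-dash flags like those in the command above can be parsed with `argparse`. The sketch below is a guess at the shape of such a parser, not the actual contents of `train.py`; the function name `parse_arguments` and all default values are assumptions.

```python
import argparse


def parse_arguments(argv=None):
    """Parse training flags; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Hypothetical training options for a seq2seq model"
    )
    # Defaults below are assumptions taken from this README's examples;
    # the real script may use different values.
    parser.add_argument("-epochs", type=int, default=30,
                        help="number of training epochs")
    parser.add_argument("-batch_size", type=int, default=32,
                        help="sentence pairs per batch")
    parser.add_argument("-lr", type=float, default=3e-4,
                        help="learning rate")
    parser.add_argument("-hidden_size", type=int, default=128,
                        help="GRU hidden size")
    parser.add_argument("-embed_size", type=int, default=64,
                        help="embedding size")
    return parser.parse_args(argv)


args = parse_arguments(["-epochs", "30", "-batch_size", "32", "-lr", "3e-4"])
print(args.epochs, args.batch_size, args.lr)
```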

The device is auto-detected (CUDA → MPS → CPU). Smaller `-hidden_size` and `-embed_size` values are useful for quick CPU smoke runs.
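The CUDA → MPS → CPU fallback order can be sketched as follows. The helper names `pick_device_name` and `detect_device` are hypothetical and may not match the repository's code; `torch` is imported lazily so the pure selection logic can be exercised without PyTorch installed.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name given what the machine supports."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Query PyTorch for available backends and build a torch.device."""
    import torch  # lazy import: only needed when actually selecting a device

    name = pick_device_name(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
    return torch.device(name)


# The selection logic alone, with availability passed in explicitly.
print(pick_device_name(cuda_available=False, mps_available=True))
```

The model and each batch would then be moved to the selected device with `.to(device)`.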

Sanity check (CPU, 500 batches, hidden=128/embed=64):

| step | train loss | perplexity |
|------|-----------:|-----------:|
| init |       9.19 |      9803 |
|   50 |       6.98 |      1071 |
|  100 |       5.48 |       239 |
|  250 |       5.15 |       173 |
|  500 |       4.84 |       127 |

Final validation loss: **4.93**. For reference, a uniform prediction over the vocabulary gives a loss of `log(|V|) ≈ 9.19`, which matches the loss at initialization.
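The two columns of the table are related by perplexity = exp(loss), assuming the loss is the mean per-token cross-entropy in nats. The short check below recomputes perplexity from the rounded losses, so small differences from the reported values are expected. The vocabulary size of 9803 is an assumption inferred from the initial perplexity, not a figure stated by the repository.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity of a model whose mean per-token cross-entropy is `loss` (nats)."""
    return math.exp(loss)


def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy of a uniform distribution over `vocab_size` tokens."""
    return math.log(vocab_size)


# (step, train loss, reported perplexity) copied from the table above.
reported = [
    ("init", 9.19, 9803),
    ("50", 6.98, 1071),
    ("100", 5.48, 239),
    ("250", 5.15, 173),
    ("500", 4.84, 127),
]

for step, loss, reported_ppl in reported:
    recomputed = perplexity(loss)
    print(f"step {step:>4}: reported {reported_ppl}, recomputed {recomputed:.0f}")

# Assumed vocabulary size, inferred from the initial perplexity.
print(f"uniform loss for |V| = 9803: {uniform_loss(9803):.2f}")
```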

## References

Based on the following implementations:

* [PyTorch Tutorial](http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html)
* [@spro/practical-pytorch](https://github.com/spro/practical-pytorch)
* [@AuCson/PyTorch-Batch-Attention-Seq2seq](https://github.com/AuCson/PyTorch-Batch-Attention-Seq2seq)