# SSD_MobileNet

**Repository Path**: lu_lee/SSD_MobileNet

## Basic Information

- **Project Name**: SSD_MobileNet
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-06-03
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# SSD_MobileNet
SSD: Single Shot MultiBox Detector | a PyTorch Model for Object Detection | VOC , COCO | Custom Object Detection  

This repo contains code for [Single Shot Multibox Detector (SSD)](https://arxiv.org/abs/1512.02325) with custom backbone networks. The authors' original implementation can be found [here](https://github.com/weiliu89/caffe/tree/ssd).

### Dataset

* Pascal Visual Object Classes (VOC) data from the years 2007 and 2012.
* COCO.
* Custom Dataset.

## VOC dataset
VOC dataset contains images with twenty different types of objects.
```python
{'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'}
```

Each image can contain one or more ground truth objects.

Each object is represented by –

- a bounding box in absolute boundary coordinates

- a label (one of the object types mentioned above)

-  a perceived detection difficulty (either `0`, meaning _not difficult_, or `1`, meaning _difficult_)

### Download

Specfically, you will need to download the following VOC datasets –

- [2007 _trainval_](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (460MB)

- [2012 _trainval_](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2GB)

- [2007 _test_](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar) (451MB)


### Inputs to model

We will need three inputs.

#### Images

* For SSD300 variant, the images would need to be sized at `300, 300` pixels and in the RGB format.
* PyTorch follows the NCHW convention, which means the channels dimension (C) must precede the size dimensions(1, 3, 300, 300).

Therefore, **images fed to the model must be a `Float` tensor of dimensions `N, 3, 300, 300`**, and must be normalized by the aforesaid mean and standard deviation. `N` is the batch size.


#### Objects' Bounding Boxes

For each image, the bounding boxes of the ground truth objects follows (x_min, y_min, x_max, y_max) format`.

# Training
* In config.json change the paths. 
* "backbone_network" : "MobileNetV2" or "MobileNetV1"
* For training run
  ```
  python train.py config.json
  ```
# Inference 
  ```
  python inference.py image_path checkpoint
  ```