# DeepLearning-vs-HighLevelVision

**Repository Path**: cjy777/DeepLearning-vs-HighLevelVision

## Basic Information

- **Project Name**: DeepLearning-vs-HighLevelVision
- **Description**: Code and database for Jacquot et al. CVPR 2020. Can we decode subtle human activities?
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-25
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

This repository contains the code and images used for the publication ["Can Deep Learning Recognize Subtle Human Activities?"](https://arxiv.org/abs/2003.13852).

## Summary

Our work builds on the observation that image datasets used in machine learning contain many biases. Those biases help convolutional neural networks to classify images. For example, in the UCF101 dataset, algorithms can rely on the background color to classify human activities.


To address this issue, we followed a rigorous method to build three image datasets corresponding to three human behaviors: drinking, reading, and sitting. Below are some example images from our dataset. The models misclassified the bottom left, middle top, and bottom right pictures, whereas humans correctly classified all six pictures.


We reduced biases in our image datasets by applying 100 to 300 cross-validations of a fine-tuned deep convolutional network ([computer-vision/keras/misclassification_rate](https://github.com/kreimanlab/DeepLearning-vs-HighLevelVision/tree/master/computer-vision/keras) and [computer-vision/matlab/alexnet_misclass_rate.m](https://github.com/kreimanlab/DeepLearning-vs-HighLevelVision/blob/master/computer-vision/matlab/alexnet_misclass_rate.m)). The many cross-validations allowed us to rank images by their misclassification rate. We then excluded images that were classified too easily. The resulting datasets are less biased and more difficult for algorithms to classify.
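
The ranking-and-filtering idea can be illustrated with a minimal sketch. It is not the code from this repository. It assumes repeated random train/test splits, uses a nearest-centroid classifier as a stand-in for the fine-tuned convolutional network, and the names `misclassification_rates`, `keep_hard_images`, and the `min_rate` threshold are hypothetical.

```python
import random
from collections import defaultdict


def nearest_centroid_predict(train, test_features):
    """Stand-in classifier (NOT the fine-tuned CNN): assign the nearest class centroid."""
    sums, counts = {}, defaultdict(int)
    for feats, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

    def closest(feats):
        # Label whose centroid has the smallest squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((f - c) ** 2 for f, c in zip(feats, centroids[lab])),
        )

    return [closest(feats) for feats in test_features]


def misclassification_rates(features, labels, n_rounds=100, test_fraction=0.2, seed=0):
    """Per-image fraction of held-out rounds in which the image was misclassified."""
    rng = random.Random(seed)
    n = len(labels)
    n_test = max(1, round(test_fraction * n))
    errors = [0] * n    # times each image was misclassified while held out
    held_out = [0] * n  # times each image was held out
    for _ in range(n_rounds):
        order = list(range(n))
        rng.shuffle(order)
        test_idx, train_idx = order[:n_test], order[n_test:]
        train = [(features[i], labels[i]) for i in train_idx]
        predictions = nearest_centroid_predict(train, [features[i] for i in test_idx])
        for i, predicted in zip(test_idx, predictions):
            held_out[i] += 1
            errors[i] += predicted != labels[i]
    # None marks images that were never held out.
    return [e / h if h else None for e, h in zip(errors, held_out)]


def keep_hard_images(rates, min_rate=0.1):
    """Indices of images misclassified at least `min_rate` of the time (assumed threshold)."""
    return [i for i, r in enumerate(rates) if r is not None and r >= min_rate]
```

Images with a rate near zero are the ones a model classifies "too easily". Dropping them leaves the harder subset. The threshold actually used for the published datasets is not stated here, so `min_rate` should be treated as a placeholder.
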
The ground truth labels for each image were created by asking 3 participants to assign each image to a yes or no class for each action. We also conducted a separate psychophysics experiment ([human-vision](https://github.com/kreimanlab/DeepLearning-vs-HighLevelVision/tree/master/human-vision)): images were presented to human participants, and each trial consisted of fixation (500 ms), image presentation (50, 150, 400, or 800 ms), and a forced-choice yes/no question that remained on screen until the participant answered.
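
As an illustration of the trial structure, the sketch below builds a randomized trial list with the timings given above. It is not the experiment code in `human-vision`. The `Trial` class, the `build_trials` function, the image file names, and the random assignment of presentation durations are all assumptions made for the example.

```python
import random
from dataclasses import dataclass

FIXATION_MS = 500                      # fixation duration stated above
PRESENTATION_MS = (50, 150, 400, 800)  # image presentation durations stated above


@dataclass(frozen=True)
class Trial:
    image: str            # hypothetical image file name
    action: str           # action asked about, e.g. "drinking"
    fixation_ms: int
    presentation_ms: int
    # The yes/no response phase is untimed, so it has no duration field.


def build_trials(images, action, seed=0):
    """Assign each image a random presentation duration and shuffle the trial order."""
    rng = random.Random(seed)
    trials = [
        Trial(image, action, FIXATION_MS, rng.choice(PRESENTATION_MS))
        for image in images
    ]
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    for trial in build_trials(["img_001.jpg", "img_002.jpg"], "drinking"):
        print(trial)
```

A fixed `seed` makes the schedule reproducible, which is convenient when comparing responses across participants. Whether the original experiment balanced durations per image is not stated here.
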

Example GIF with an 800 ms image presentation, played on repeat.

## License

We release our work under the Kreiman Lab's [license](http://klab.tch.harvard.edu//resources/license_agreement.pdf).

## Citation

If you find our dataset useful in your research, please consider citing:

    @InProceedings{Jacquot_2020_CVPR,
      author = {Jacquot, Vincent and Ying, Zhuofan and Kreiman, Gabriel},
      title = {Can Deep Learning Recognize Subtle Human Activities?},
      booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      month = {June},
      year = {2020}
    }