<img src="./images/vit.gif" width="500px"></img>

## Table of Contents

- [Vision Transformer - Pytorch](#vision-transformer---pytorch)
- [Install](#install)
- [Usage](#usage)
- [Parameters](#parameters)
- [Distillation](#distillation)
- [Deep ViT](#deep-vit)
- [CaiT](#cait)
- [Token-to-Token ViT](#token-to-token-vit)
- [CCT](#cct)
- [Cross ViT](#cross-vit)
- [PiT](#pit)
- [LeViT](#levit)
- [CvT](#cvt)
- [Twins SVT](#twins-svt)
- [RegionViT](#regionvit)
- [NesT](#nest)
- [Masked Autoencoder](#masked-autoencoder)
- [Masked Patch Prediction](#masked-patch-prediction)
- [Dino](#dino)
- [Accessing Attention](#accessing-attention)
- [Research Ideas](#research-ideas)
  * [Efficient Attention](#efficient-attention)
  * [Combining with other Transformer improvements](#combining-with-other-transformer-improvements)
- [FAQ](#faq)
- [Resources](#resources)
- [Citations](#citations)

## Vision Transformer - Pytorch

Implementation of <a href="https://openreview.net/pdf?id=YicbFdNTTy">Vision Transformer</a>, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch. Its significance is further explained in <a href="https://www.youtube.com/watch?v=TrdevFK_am4">Yannic Kilcher's</a> video. There's really not much to code here, but we may as well lay it out for everyone, to expedite the attention revolution.
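As a quick taste before the full Usage section, here is a minimal sketch of how the model is constructed and called. The constructor arguments follow the parameters documented in this README; the specific values below are illustrative, not prescriptive.

```python
import torch
from vit_pytorch import ViT

# a minimal usage sketch; argument values are illustrative
v = ViT(
    image_size = 256,     # input images are 256 x 256
    patch_size = 32,      # each image is split into 32 x 32 patches
    num_classes = 1000,   # number of classification classes
    dim = 1024,           # embedding dimension of the patch tokens
    depth = 6,            # number of transformer encoder layers
    heads = 16,           # number of attention heads
    mlp_dim = 2048,       # hidden dimension of the feedforward layers
    dropout = 0.1,
    emb_dropout = 0.1
)

img = torch.randn(1, 3, 256, 256)  # (batch, channels, height, width)

preds = v(img)  # (1, 1000) raw class logits
```

Since the forward pass returns raw class logits, the model should drop straight into a standard classification training loop with a cross-entropy loss.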