Mirror of https://github.com/lucidrains/vit-pytorch.git
synced 2025-12-30 08:02:29 +00:00
cleanup
@@ -1,4 +1,4 @@
-<img src="./vit.gif" width="500px"></img>
+<img src="./images/vit.gif" width="500px"></img>
 
 ## Vision Transformer - Pytorch
 
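For orientation, the hunks below reference the repository's base usage (the "Embedding dropout rate." parameter doc and the `preds = v(img) # (1, 1000)` line). A minimal sketch of that base usage, assuming the top-level `ViT` class exported by `vit_pytorch`; the hyperparameter values are illustrative:

```python
import torch
from vit_pytorch import ViT

# a plain ViT classifier; values here are illustrative, not prescriptive
v = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048,
    dropout = 0.1,
    emb_dropout = 0.1   # the embedding dropout rate referenced in the hunk header below
)

img = torch.randn(1, 3, 256, 256)

preds = v(img)  # (1, 1000)
```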
@@ -63,7 +63,7 @@ Embedding dropout rate.
 
 ## Distillation
 
-<img src="./distill.png" width="300px"></img>
+<img src="./images/distill.png" width="300px"></img>
 
 A recent <a href="https://arxiv.org/abs/2012.12877">paper</a> has shown that using a distillation token to distill knowledge from convolutional nets into vision transformers can yield small, efficient vision transformers. This repository offers the means to do distillation easily.
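A minimal sketch of that distillation path, assuming the `DistillableViT` and `DistillWrapper` classes from `vit_pytorch.distill` and a torchvision ResNet-50 as the teacher; the temperature and alpha values are illustrative:

```python
import torch
from torchvision.models import resnet50
from vit_pytorch.distill import DistillableViT, DistillWrapper

# pretrained convnet acting as the teacher
teacher = resnet50(pretrained = True)

# the student: a ViT augmented with a distillation token
v = DistillableViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 8,
    mlp_dim = 2048,
    dropout = 0.1,
    emb_dropout = 0.1
)

distiller = DistillWrapper(
    student = v,
    teacher = teacher,
    temperature = 3,   # softens both logit distributions before comparison
    alpha = 0.5,       # trade-off between the label loss and the distillation loss
    hard = False       # soft distillation; True would train on the teacher's argmax labels
)

img = torch.randn(2, 3, 256, 256)
labels = torch.randint(0, 1000, (2,))

loss = distiller(img, labels)  # combined classification + distillation loss
loss.backward()
```

Soft distillation (`hard = False`) matches the student's softened logits against the teacher's, while hard distillation treats the teacher's predicted class as an additional training label, as in the paper linked above.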
@@ -145,7 +145,7 @@ preds = v(img) # (1, 1000)
 
 ## Token-to-Token ViT
 
-<img src="./t2t.png" width="400px"></img>
+<img src="./images/t2t.png" width="400px"></img>
 
 <a href="https://arxiv.org/abs/2101.11986">This paper</a> proposes that the first couple of layers should downsample the image sequence by unfolding, leading to overlapping image data in each token, as shown in the figure above. You can use this variant of the `ViT` as follows.
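A minimal sketch of that usage, assuming the `T2TViT` class from `vit_pytorch.t2t`; the `t2t_layers` tuples (kernel size and stride of each unfolding stage) are illustrative values:

```python
import torch
from vit_pytorch.t2t import T2TViT

v = T2TViT(
    dim = 512,
    image_size = 224,
    depth = 5,
    heads = 8,
    mlp_dim = 512,
    num_classes = 1000,
    t2t_layers = ((7, 4), (3, 2), (3, 2))  # (kernel size, stride) of each consecutive token-to-token unfolding layer
)

img = torch.randn(1, 3, 224, 224)

preds = v(img)  # (1, 1000)
```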
Binary image files renamed into `images/` with contents unchanged (sizes before and after: 49 KiB, 109 KiB, 5.8 MiB).