From ab7315cca1e1ecdb398f6a1ae794421717c378b1 Mon Sep 17 00:00:00 2001
From: Phil Wang
Date: Sat, 27 Mar 2021 22:14:16 -0700
Subject: [PATCH] cleanup

---
 README.md                         |   6 +++---
 distill.png => images/distill.png | Bin
 t2t.png => images/t2t.png         | Bin
 vit.gif => images/vit.gif         | Bin
 4 files changed, 3 insertions(+), 3 deletions(-)
 rename distill.png => images/distill.png (100%)
 rename t2t.png => images/t2t.png (100%)
 rename vit.gif => images/vit.gif (100%)

diff --git a/README.md b/README.md
index 953fac6..115f2b7 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-<img src="./vit.gif"></img>
+<img src="./images/vit.gif"></img>
 
 ## Vision Transformer - Pytorch
 
@@ -63,7 +63,7 @@ Embedding dropout rate.
 
 ## Distillation
 
-<img src="./distill.png"></img>
+<img src="./images/distill.png"></img>
 
 A recent paper has shown that use of a distillation token for distilling knowledge from convolutional nets to vision transformers can yield small and efficient vision transformers. This repository offers the means to do distillation easily.
 
@@ -145,7 +145,7 @@ preds = v(img) # (1, 1000)
 
 ## Token-to-Token ViT
 
-<img src="./t2t.png"></img>
+<img src="./images/t2t.png"></img>
 
 This paper proposes that the first couple layers should downsample the image sequence by unfolding, leading to overlapping image data in each token as shown in the figure above. You can use this variant of the `ViT` as follows.
 
diff --git a/distill.png b/images/distill.png
similarity index 100%
rename from distill.png
rename to images/distill.png
diff --git a/t2t.png b/images/t2t.png
similarity index 100%
rename from t2t.png
rename to images/t2t.png
diff --git a/vit.gif b/images/vit.gif
similarity index 100%
rename from vit.gif
rename to images/vit.gif