additional diagram

This commit is contained in:
Phil Wang
2021-11-22 14:05:39 -08:00
parent 6665fc6cd1
commit de0b8ba189
2 changed files with 2 additions and 0 deletions


@@ -497,6 +497,8 @@ pred = model(img) # (1, 1000)
<img src="./images/crossformer.png" width="400px"></img>
<img src="./images/crossformer2.png" width="400px"></img>
This <a href="https://arxiv.org/abs/2108.00154">paper</a> beats PVT and Swin using alternating local and global attention. The global attention is done across the windowing dimension for reduced complexity, much like the scheme used for axial attention.
They also propose a cross-scale embedding layer, which they show to be a generic layer that can improve all vision transformers. Dynamic relative positional bias was also formulated to allow the network to generalize to images of greater resolution.
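
The surrounding README context (`pred = model(img) # (1, 1000)`) suggests this section is accompanied by a usage snippet. A minimal sketch is below, assuming the repo exposes a `CrossFormer` class under `vit_pytorch.crossformer` with per-stage dimension, depth, and window-size parameters; the exact argument names follow the repo's usual pattern and should be checked against the module itself.

```python
import torch
from vit_pytorch.crossformer import CrossFormer  # assumed module path

model = CrossFormer(
    num_classes = 1000,
    dim = (64, 128, 256, 512),          # feature dimension per stage (cross-scale embedding)
    depth = (2, 2, 8, 2),               # transformer blocks per stage
    global_window_size = (8, 4, 2, 1),  # window size for long-distance (global) attention per stage
    local_window_size = 7               # window size for short-distance (local) attention
)

img = torch.randn(1, 3, 224, 224)
pred = model(img)  # (1, 1000)
```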

BIN images/crossformer2.png (new file, 237 KiB; binary file not shown)