Compare commits


8 Commits
xcit ... 1.6.3a

Author SHA1 Message Date

lucidrains
0ad09c4cbc allow channels to be customizable for cvt 2023-10-25 14:47:58 -07:00

Phil Wang
92b69321f4 1.6.2 2023-10-24 12:47:38 -07:00

Artem Lukin
fb4ac25174 Fix typo in LayerNorm (#285) 2023-10-24 12:47:21 -07:00
    Co-authored-by: Artem Lukin <artyom.lukin98@gmail.com>

lucidrains
53fe345e85 no longer needed with einops 0.7 2023-10-19 18:16:46 -07:00

Phil Wang
efb94608ea readme 2023-10-19 09:38:35 -07:00

lucidrains
51310d1d07 add xcit diagram 2023-10-13 09:18:12 -07:00

Phil Wang
1616288e30 add xcit (#284) 2023-10-13 09:15:13 -07:00
    * add xcit
    * use Rearrange layers
    * give cross correlation transformer a final norm at end
    * document

Jason Chou
9e1e824385 Update README.md (#283) 2023-10-09 11:33:56 -07:00
    `patch_size` is size of patches, not number of patches
5 changed files with 7 additions and 13 deletions

View File

@@ -93,7 +93,7 @@ preds = v(img) # (1, 1000)
 - `image_size`: int.
 Image size. If you have rectangular images, make sure your image size is the maximum of the width and height
 - `patch_size`: int.
-Number of patches. `image_size` must be divisible by `patch_size`.
+Size of patches. `image_size` must be divisible by `patch_size`.
 The number of patches is: ` n = (image_size // patch_size) ** 2` and `n` **must be greater than 16**.
 - `num_classes`: int.
 Number of classes to classify.
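
To illustrate the corrected semantics, here is a minimal usage sketch (the hyperparameter values are the README's own example, not part of this diff):

```python
import torch
from vit_pytorch import ViT

v = ViT(
    image_size = 256,    # must be divisible by patch_size
    patch_size = 32,     # side length of each square patch, not the patch count
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048
)

img = torch.randn(1, 3, 256, 256)
preds = v(img)  # (1, 1000); number of patches n = (256 // 32) ** 2 = 64 > 16
```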
@@ -777,7 +777,7 @@ pred = mbvit_xs(img) # (1, 1000)
 <img src="./images/xcit.png" width="400px"></img>
-This <a href="https://arxiv.org/abs/2106.09681">paper</a> introduces the cross correlation attention (abbreviated XCA). One can think of it as doing attention across the features dimension rather than the spatial one (another perspective would be a dynamic 1x1 convolution, the kernel being attention map defined by spatial correlations).
+This <a href="https://arxiv.org/abs/2106.09681">paper</a> introduces the cross covariance attention (abbreviated XCA). One can think of it as doing attention across the features dimension rather than the spatial one (another perspective would be a dynamic 1x1 convolution, the kernel being attention map defined by spatial correlations).
 Technically, this amounts to simply transposing the query, key, values before executing cosine similarity attention with learned temperature.
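
To make the transposition concrete, a rough sketch of XCA follows. This illustrates the idea only, not the repository's exact implementation; `temperature` is assumed to be a learned per-head scalar:

```python
import torch
import torch.nn.functional as F

def xca(q, k, v, temperature):
    # q, k, v: (batch, heads, seq_len, dim_head)
    # transpose so attention mixes the feature dimension, not the spatial one
    q, k, v = (t.transpose(-2, -1) for t in (q, k, v))   # (b, h, d, n)
    q, k = (F.normalize(t, dim = -1) for t in (q, k))    # cosine similarity
    attn = (q @ k.transpose(-2, -1)) * temperature       # (b, h, d, d) feature-to-feature map
    attn = attn.softmax(dim = -1)
    out = attn @ v                                       # (b, h, d, n)
    return out.transpose(-2, -1)                         # back to (b, h, n, d)
```

Note the attention map is d x d rather than n x n, so the cost scales with feature width instead of sequence length.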

View File

@@ -3,7 +3,7 @@ from setuptools import setup, find_packages
 setup(
   name = 'vit-pytorch',
   packages = find_packages(exclude=['examples']),
-  version = '1.6.0',
+  version = '1.6.3',
   license='MIT',
   description = 'Vision Transformer (ViT) - Pytorch',
   long_description_content_type = 'text/markdown',

View File

@@ -1,10 +1,3 @@
-import torch
-from packaging import version
-
-if version.parse(torch.__version__) >= version.parse('2.0.0'):
-    from einops._torch_specific import allow_ops_in_compiled_graph
-    allow_ops_in_compiled_graph()
-
 from vit_pytorch.vit import ViT
 from vit_pytorch.simple_vit import SimpleViT
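
The removed shim manually registered einops ops so they could appear inside a `torch.compile` graph; per the commit message, einops 0.7 handles this on its own. A minimal sketch of what should still work without the shim (assuming torch >= 2.0 and einops >= 0.7):

```python
import torch
from vit_pytorch import ViT

model = ViT(image_size = 256, patch_size = 32, num_classes = 1000,
            dim = 1024, depth = 6, heads = 16, mlp_dim = 2048)

# einops layers compile without the manual allow_ops_in_compiled_graph() call
compiled = torch.compile(model)
preds = compiled(torch.randn(1, 3, 256, 256))  # (1, 1000)
```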

View File

@@ -140,12 +140,13 @@ class CvT(nn.Module):
         s3_heads = 6,
         s3_depth = 10,
         s3_mlp_mult = 4,
-        dropout = 0.
+        dropout = 0.,
+        channels = 3
     ):
         super().__init__()
         kwargs = dict(locals())
-        dim = 3
+        dim = channels
         layers = []
         for prefix in ('s1', 's2', 's3'):
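
With the new `channels` argument, CvT is no longer hardcoded to RGB input. A usage sketch assuming the stage defaults (the grayscale example is hypothetical, not from this diff):

```python
import torch
from vit_pytorch.cvt import CvT

v = CvT(num_classes = 1000, channels = 1)  # e.g. single-channel grayscale input

img = torch.randn(1, 1, 224, 224)
pred = v(img)  # (1, 1000)
```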

View File

@@ -10,7 +10,7 @@ class FeedForward(nn.Module):
     def __init__(self, dim, hidden_dim, dropout = 0.):
         super().__init__()
         self.net = nn.Sequential(
-            nn.Layernorm(dim),
+            nn.LayerNorm(dim),
             nn.Linear(dim, hidden_dim),
             nn.GELU(),
             nn.Dropout(dropout),
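
For reference, the corrected module in full. This is a sketch following the `FeedForward` pattern used throughout this repo; the trailing projection and dropout are assumed from that pattern, since the hunk cuts off above:

```python
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, dim, hidden_dim, dropout = 0.):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),   # the fix: LayerNorm, not the misspelled Layernorm
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout)
        )

    def forward(self, x):
        return self.net(x)
```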