diff --git a/README.md b/README.md
index ce3b4ce..19192b8 100644
--- a/README.md
+++ b/README.md
@@ -338,7 +338,7 @@ pred = v(img) # (1, 1000)
 
-This paper mixes local and global attention, along with positiona encoding generator (proposed in CPVT) and global average pooling, to achieve the same results as Swin, without the extra complexity of shifted windows, etc.
+This paper mixes local and global attention, along with position encoding generator (proposed in CPVT) and global average pooling, to achieve the same results as Swin, without the extra complexity of shifted windows, etc.
 
 ```python
 import torch
diff --git a/setup.py b/setup.py
index d6744fc..e159a8d 100644
--- a/setup.py
+++ b/setup.py
@@ -3,7 +3,7 @@ from setuptools import setup, find_packages
 setup(
   name = 'vit-pytorch',
   packages = find_packages(exclude=['examples']),
-  version = '0.16.13',
+  version = '0.17.0',
   license='MIT',
   description = 'Vision Transformer (ViT) - Pytorch',
   author = 'Phil Wang',
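The README hunk above describes the Twins-SVT addition but truncates the fenced usage example right after `import torch`. Below is a minimal usage sketch of what that example might look like, assuming a `TwinsSVT` class exported from `vit_pytorch.twins_svt`; the stage-wise keyword arguments (`s1_emb_dim`, `s1_local_patch_size`, `s1_global_k`, `peg_kernel_size`, ...) and their defaults are assumptions based on the library's conventions, not a confirmed signature.

```python
import torch
# assumed import path; Twins-SVT is introduced alongside the 0.17.0 version bump above
from vit_pytorch.twins_svt import TwinsSVT

model = TwinsSVT(
    num_classes = 1000,        # number of output classes
    s1_emb_dim = 64,           # stage 1 embedding dimension (parameter names are assumptions)
    s1_patch_size = 4,         # stage 1 patch size
    s1_local_patch_size = 7,   # window size for the local (grouped) attention
    s1_global_k = 7,           # key/value downsampling factor for the global attention
    s1_depth = 1,              # number of stage 1 transformer blocks
    s2_emb_dim = 128,          # stages 2-4 follow the same pattern
    s2_patch_size = 2,
    s2_local_patch_size = 7,
    s2_global_k = 7,
    s2_depth = 1,
    s3_emb_dim = 256,
    s3_patch_size = 2,
    s3_local_patch_size = 7,
    s3_global_k = 7,
    s3_depth = 5,
    s4_emb_dim = 512,
    s4_patch_size = 2,
    s4_local_patch_size = 7,
    s4_global_k = 7,
    s4_depth = 4,
    peg_kernel_size = 3,       # kernel size of the position encoding generator (CPVT)
    dropout = 0.
)

img = torch.randn(1, 3, 224, 224)
pred = model(img)  # (1, 1000), via global average pooling then a classification head
```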