Lecturer: Borja Espejo-Garcia
Duration: 2 hours
Participants: 15 people
In this event, we analyzed the differences between CNNs and Transformers by implementing computer vision systems with both architectures. CNNs are particularly well suited to grid-like data such as images, where spatial relationships matter. A CNN uses convolutional filters and pooling layers to extract hierarchical features from the input, which enables it to recognize complex patterns. However, while each unit in a CNN has a fixed receptive field (a limited context window for each element), Transformers have a global receptive field: through self-attention, every element can consider the entire input at once. This makes Transformers better at capturing long-range dependencies, which may be helpful in some image recognition problems.
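The receptive-field contrast can be illustrated with a small NumPy sketch. It is not code from the event; the helper names (`conv_receptive_field`, `conv1d_same`, `self_attention`, `depends_on_last_token`) and the toy sizes are hypothetical. The first function applies the standard receptive-field recurrence for stacked conv/pooling layers along one axis. The demo then perturbs the last element of a sequence and checks whether the output at position 0 changes: for a kernel-3 convolution it does not (local context), while for single-head self-attention it does (global context).

```python
import numpy as np


def conv_receptive_field(kernels, strides):
    """Receptive field (in input positions) of one output unit after a stack
    of conv/pooling layers along one spatial axis."""
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump  # each layer widens the window by (k - 1) input steps
        jump *= s             # strides make later layers step further in the input
    return rf


def conv1d_same(x, w):
    """1D cross-correlation with zero padding; output has the same length as x."""
    k = len(w)
    pad = k // 2
    xp = np.pad(x, (pad, pad))
    return np.array([xp[i:i + k] @ w for i in range(len(x))])


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over all tokens of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # softmax: every token gets weight > 0
    return A @ V


def depends_on_last_token(seq_len=16, dim=4, seed=0):
    """Does the output at position 0 change when only the LAST input changes?"""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=seq_len)            # 1D signal for the conv
    w = rng.normal(size=3)                  # kernel of size 3 (local window)
    X = rng.normal(size=(seq_len, dim))     # token embeddings for attention
    Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))

    x2 = x.copy()
    x2[-1] += 1.0
    X2 = X.copy()
    X2[-1] += 1.0

    conv_changed = not np.isclose(conv1d_same(x, w)[0], conv1d_same(x2, w)[0])
    attn_changed = not np.allclose(
        self_attention(X, Wq, Wk, Wv)[0], self_attention(X2, Wq, Wk, Wv)[0]
    )
    return conv_changed, attn_changed


if __name__ == "__main__":
    print(conv_receptive_field([3, 3, 3], [1, 1, 1]))        # three 3x3 convs
    print(conv_receptive_field([3, 2, 3, 2], [1, 2, 1, 2]))  # conv-pool-conv-pool
    print(depends_on_last_token())                            # (conv, attention)
```

With these assumptions, three stacked 3-wide convolutions see only 7 input positions per output, and a conv-pool-conv-pool stack sees 10, so the CNN's context grows with depth but stays bounded. The attention output at position 0, by contrast, reacts to a change at the far end of the sequence.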
This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101070496.