Lecturer: Borja Espejo-Garcia
Duration: 2 hours
Participants: 15 people
In this event, we analyzed the differences between CNNs and transformers by implementing computer vision systems with both technologies. CNNs are particularly well suited to grid-like data, such as images, where spatial relationships matter. A CNN uses convolutional filters and pooling layers to extract hierarchical features, from simple local patterns in early layers to more complex ones in deeper layers, which enables it to recognize complex patterns in the data. However, each convolutional layer has a fixed, local receptive field (a limited context window around each element) that only widens as more layers are stacked. Transformers, by contrast, have a global receptive field: through self-attention, every element can take the entire input into account at once. This makes transformers better at capturing long-range dependencies, which can help in image recognition problems where distant regions of an image are related.
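The contrast between a local and a global receptive field can be illustrated with a small, dependency-free Python sketch. It is not the code used in the event; the layer configurations, the 3-tap kernel, the token values and the identity query/key/value projections are all illustrative assumptions. The first function applies the standard receptive-field recurrence for stacked convolution/pooling layers. The second part shows that a convolution output at position 0 ignores a change at the far end of the input, while a simplified self-attention output at position 0 does not.

```python
import math


def receptive_field(layers):
    """Receptive field (in input positions) of one output unit after a stack
    of conv/pool layers, each given as a (kernel_size, stride) pair."""
    rf, jump = 1, 1  # jump = distance in input positions between adjacent units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the window by (k - 1) * jump
        jump *= stride             # striding spreads later units further apart
    return rf


def conv1d(x, kernel):
    """Zero-padded 'same' 1D convolution (cross-correlation, as in CNN libraries).
    Output i only depends on inputs within kernel // 2 positions of i."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(x))]


def self_attention(x):
    """Single-head self-attention over scalar tokens. Assumption: identity
    query/key/value projections; real transformers learn these weights."""
    out = []
    for q in x:
        scores = [q * k for k in x]          # every token scores every other token
        m = max(scores)                      # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w / total * v for w, v in zip(weights, x)))
    return out


if __name__ == "__main__":
    # Hypothetical stacks: three 3x3 convs, then convs with a 2x2/stride-2 pool.
    print(receptive_field([(3, 1)] * 3))                       # 7
    print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))   # 10

    x = [1.0, 0.5, -0.3, 0.2, 0.8, -1.0, 0.4, 0.9]
    x_changed = x[:-1] + [-2.0]  # only the last (most distant) element changes
    kernel = [0.25, 0.5, 0.25]
    print(conv1d(x, kernel)[0] == conv1d(x_changed, kernel)[0])   # True: local
    print(self_attention(x)[0] == self_attention(x_changed)[0])   # False: global
```

Stacking layers widens a CNN's receptive field only gradually (and faster with strided pooling), whereas a single attention layer already lets every position depend on the whole input, which is the property described above.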
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.