Retail product detection in images captured by fisheye cameras frequently suffers from heavy object occlusion and deformation, and products that differ only in small, fine-grained details are hard to tell apart. Accurately classifying and localizing products in these images therefore remains a challenge for computer vision. We propose EPformer, an efficient product detection network that fuses a visual transformer with a convolutional neural network to reliably detect retail products in fisheye images. We employ a shifted window strategy for feature information interaction across windows to more p...