Despite significant advances in camouflaged object detection achieved by convolutional neural network (CNN) methods and vision transformer (ViT) methods, both approaches have limitations. CNN-based methods struggle to model long-range dependencies because of their limited receptive fields, while ViT-based methods lose fine-grained detail through large-span aggregation. To address these issues, we introduce a novel model, the double-extraction and triple-fusion network (DTNet), which leverages the global context modeling capabilities of ViT-based encoders and the detail capture capabilities of CNN-based encoders.
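To make the "double extraction" idea concrete, below is a minimal PyTorch sketch of a dual-branch extractor: a ViT-style branch attends over all patches for global context, a CNN branch keeps small receptive fields for local detail, and the two feature maps are concatenated. All module names, dimensions, and the 1x1-conv fusion are illustrative assumptions, not DTNet's actual configuration.

```python
import torch
import torch.nn as nn

class CNNBranch(nn.Module):
    """Local-detail extractor: small receptive fields, high resolution."""
    def __init__(self, dim=64):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):            # (B, 3, H, W)
        return self.convs(x)         # (B, dim, H/4, W/4)

class ViTBranch(nn.Module):
    """Global-context extractor: self-attention across all patch tokens."""
    def __init__(self, dim=64, patch=16, img_size=224, depth=2, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, x):            # (B, 3, H, W)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.blocks(tokens + self.pos)
        b, n, c = tokens.shape
        side = int(n ** 0.5)
        return tokens.transpose(1, 2).reshape(b, c, side, side)  # (B, dim, H/16, W/16)

class DoubleExtraction(nn.Module):
    """Run both branches in parallel and fuse; the 1x1 conv is a stand-in
    for the paper's triple-fusion stage."""
    def __init__(self, dim=64):
        super().__init__()
        self.cnn, self.vit = CNNBranch(dim), ViTBranch(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        local_feat = self.cnn(x)
        # Upsample the coarse global features to the CNN branch's resolution.
        global_feat = nn.functional.interpolate(
            self.vit(x), size=local_feat.shape[-2:],
            mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))

feats = DoubleExtraction()(torch.randn(1, 3, 224, 224))
print(feats.shape)  # torch.Size([1, 64, 56, 56])
```

The parallel design means neither branch bottlenecks the other: the CNN branch never sees aggregated tokens, so fine detail survives, while the ViT branch attends globally from the first layer.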