Recent research on text-guided image style transfer using CLIP (Contrastive Language-Image Pre-training) models has made notable progress. Existing methods do not rely on additional generative models, but they cannot guarantee the quality of the generated images and often suffer from problems such as distortion of the content image and uneven stylization of the output. To address these problems, this work proposes TextStyler, a CLIP-based model for text-guided style transfer. In TextStyler, we propose a style transformation network, STNet, which consists of an enco...