Corresponding institution:
[Zhao, HH] Hengyang Normal Univ, Sch Comp Sci & Technol, Hengyang, Peoples R China.;Hunan Univ, Natl Engn Lab Robot Visual Percept & Control Techn, Changsha, Peoples R China.
Keywords:
Auto-encoder;Computer vision;De-raining;LSTM;SSIM loss function
Abstract:
Video de-raining is an important problem in computer vision, as rain streaks degrade the visual quality of images and hinder subsequent vision-related tasks. Existing video de-raining methods still face challenges such as black shadows and loss of details. In this paper, we introduce a novel de-raining framework called STVDNet, which effectively addresses the issues of black shadows and detail loss after de-raining. STVDNet utilizes a Spatial Detail Feature Extraction Module based on an auto-encoder to capture the spatial characteristics of the video. Additionally, we introduce an innovative interaction between the extracted spatial features and Spatio-Temporal features using LSTM to generate initial de-raining results. Finally, we employ 3D convolution and 2D convolution for the detailed processing of the coarse videos. During training, we use three loss functions, among which the SSIM loss function is applied to the generated videos to enhance their detail structure and color recovery. Through extensive experiments conducted on three public datasets, we demonstrate the superiority of our proposed method over state-of-the-art approaches. We also provide our code and pre-trained models at https://github.com/O-Y-ZONE/STVDNet.git.
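The SSIM loss named in the abstract above can be illustrated with a minimal sketch. It assumes a single global window over flattened grayscale pixels in [0, 1] and the customary constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2; the function names are hypothetical, and the windowing and loss weighting actually used by STVDNet are not stated in this listing.

```python
def ssim_global(x, y, data_range=1.0):
    """SSIM between two equal-length pixel lists, using one global window."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that stabilise the ratio when means or variances are near zero.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(pred, target, data_range=1.0):
    """Loss that falls to zero as the prediction becomes structurally identical."""
    return 1.0 - ssim_global(pred, target, data_range)
```

In practice SSIM is usually averaged over local sliding windows rather than computed once per frame; the global form is used here only to keep the arithmetic visible.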
Abstract:
Hair editing is challenging due to the complexity and variety of hair materials and shapes. Existing methods employ reference images or user-painted masks to edit hair and have achieved promising results. However, discrepancies in color and shape between the source and target hair can occasionally result in unrealistic results. Therefore, we propose a new hair editing method named HairManip, which decouples the hair information from the input source image into shape and color components. We then train hairstyle and hair color editing sub-networks to handle this complex information independently. To further enhance editing efficiency and accuracy, we introduce a latent code preprocessing module that effectively extracts meaningful features from hair regions, thereby improving the model’s editing capabilities. The experimental results demonstrate that our method achieves significant results in editing accuracy and authenticity, thanks to the carefully designed network structure and loss functions. Code can be found at https://github.com/Zlin0530/HairManip .
Keywords:
3D pose shape estimator;Generative adversarial networks;Motion transfer;Video synthesis
Abstract:
Human motion transfer is challenging due to the complexity and diversity of human motion and clothing textures. Existing methods use 2D pose estimation to obtain poses, which can easily lead to unsmooth motion and artifacts. Therefore, this paper proposes a highly robust motion transfer model based on image deformation, called the Filter-Deform Attention Generative Adversarial Network (FDA GAN). This method can transfer complex human motion videos using only a few human images. First, we use a 3D pose shape estimator instead of traditional 2D pose estimation to address the problem of unsmooth motion. Then, to tackle the artifact problem, we design a new attention mechanism and integrate it with the GAN, proposing a new network capable of effectively extracting image features and generating human motion videos. Finally, to further transfer the style of the source human, we propose a two-stream style loss, which enhances the model's learning ability. Experimental results demonstrate that the proposed method outperforms recent methods in overall performance and various evaluation metrics. Project page: https://github.com/mioyeah/FDA-GAN.
Authors:
Zhao, Hui-huang;Ji, Tian-le;Rosin, Paul L.;Lai, Yu-Kun;Meng, Wei-liang;...
Journal:
Pattern Recognition, 2024, vol. 155, ISSN: 0031-3203
Corresponding author:
Zhao, HH
Author affiliations:
[Zhao, Hui-huang; Wang, Yao-nan] Hengyang Normal Univ, Sch Comp Sci & Technol, Hengyang 421002, Peoples R China.;[Ji, Tian-le; Zhao, Hui-huang] Hunan Univ, Natl Engn Lab Robot Visual Percept & Control Techn, Changsha, Peoples R China.;[Lai, Yu-Kun; Rosin, Paul L.] Cardiff Univ, Sch Comp Sci & Informat, Cardiff, Wales.;[Meng, Wei-liang] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100098, Peoples R China.;[Meng, Wei-liang] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China.
Corresponding institution:
[Zhao, HH] Hengyang Normal Univ, Sch Comp Sci & Technol, Hengyang 421002, Peoples R China.;Hunan Univ, Natl Engn Lab Robot Visual Percept & Control Techn, Changsha, Peoples R China.
Keywords:
Cross-lingual;Full-domain convolutional attention;Multi-layer perceptual discriminator;Font style transfer
Abstract:
In this paper, we propose a new cross-lingual font style transfer model, FCAGAN, which enables font style transfer between different languages by observing a small number of samples. Most previous work has been on style transfer of different fonts for single language content, but in our task we can learn the font style of one language and migrate it to another. We investigated the drawbacks of related studies and found that existing cross-lingual approaches cannot perfectly learn styles from other languages and maintain the integrity of their own content. Therefore, we designed a new full-domain convolutional attention (FCA) module in combination with other modules to better learn font styles, and a multi-layer perceptual discriminator to ensure character integrity. Experiments show that using this model provides more satisfying results than the current cross-lingual font style transfer methods. Code can be found at https://github.com/jtlxlf/FCAGAN.
Corresponding institution:
[Zhao, HH] Hengyang Normal Univ, Coll Comp Sci & Technol, Hengyang 421008, Peoples R China.
Keywords:
Convex optimization;block compressive sensing;split Bregman iteration;Poisson function
Abstract:
To improve reconstruction performance in image compressive sensing, this paper recasts block image compressive sensing reconstruction as a convex optimization problem. First, a constrained Total-Variation norm minimization model that includes both L1 and L2 norm functions is established. The split Bregman iterative method solves the model with convex optimization. Then, a robust adaptive image block compressive sensing algorithm is studied based on an analysis of the image features. The image is divided into blocks, and an overlap image block compressive reconstruction method is proposed. Finally, to solve the block effect caused by block compressive sensing reconstruction, a novel image overlap block compressive sensing reconstruction based on the Poisson function is suggested to avoid the block effect in the reconstruction process. The experimental results show that, compared with other traditional compressive sensing reconstruction algorithms, the proposed method generates a better image reconstruction result. According to the PSNR evaluation, when the sampling rate is 0.3, the proposed method improves on conventional techniques by more than 20.98%, and according to the SSIM evaluation, it improves on traditional methods by more than 11.92%. We also find that the proposed method has a better reconstruction effect for traffic sign image recognition than for ordinary natural image reconstruction. When the sampling rate is only 0.1, the PSNR value reaches 44.28 dB, and the SSIM reconstruction accuracy reaches 98.14%. After reconstructing images of different types and characteristics, the results indicate that the proposed algorithm has good robustness and anti-noise performance.
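Split Bregman schemes for L1 and total-variation terms generally reduce the non-smooth subproblem to a closed-form soft-thresholding (shrinkage) step. The sketch below shows only that step, under the assumption that the paper's model follows this standard pattern; the function name and the threshold value are hypothetical, and the full iteration is not reproduced.

```python
import math


def shrink(values, threshold):
    """Soft-thresholding: pull each value toward zero by `threshold`.

    Values whose magnitude is at most `threshold` become exactly 0.0.
    """
    out = []
    for v in values:
        magnitude = max(abs(v) - threshold, 0.0)
        out.append(math.copysign(magnitude, v) if magnitude > 0.0 else 0.0)
    return out


# Example: gradients smaller than the threshold are suppressed entirely.
print(shrink([3.0, -0.5, -2.0, 0.2], 1.0))
```

In a split Bregman loop the threshold is typically the reciprocal of a penalty parameter, and the shrinkage is applied to the image gradients plus the Bregman variable at each iteration.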
Abstract:
Recent research on text-guided image style transfer using CLIP (Contrastive Language-Image Pre-training) models has made good progress. Existing work does not rely on additional generative models, but it cannot guarantee the quality of the generated images, and often suffers from problems such as distortion of content images and uneven stylization of the generated images. To address such problems, this work proposes the TextStyler model, a CLIP-based approach for text-guided style transfer. In the TextStyler model, we propose a style transformation network STNet, which consists of an encoder and a multi-scale decoder. The network can capture the hierarchical features of the content image, and the decoder feature fusion module in the network, designed based on the channel attention mechanism, helps the network to maximize the retention of the detailed information of the content image while realizing texture transfer. In addition, we design a patch-wise perceptual loss, which is able to transfer the stylized texture to each local region of the image and improve the balance of model stylization. The experimental results show that the TextStyler model can achieve a wider range of style transfer than existing methods using stylized images, and the generated artistic images are more in line with human visual perception than state-of-the-art text-guided style transfer methods.
Abstract:
Recent research in arbitrary style transfer has highlighted challenges in maintaining the balance between content structure and style patterns. Moreover, the improper application of style patterns onto the content image often results in suboptimal quality. In this paper, a novel style transfer network, called MCNet, is proposed. It is based on multi-feature correlations. To better explore the intrinsic relationship between the style image and the content image and to transfer the most suitable style onto the content image, a novel Global Style-Attentional Transfer Module, named GSATM, is introduced in this work. GSATM comprises two parts: Forward Adaptive Style Transformation (FAST) and Delayed Style Transformation (DST). The former analyzes the relationship between style and content features and fine-tunes the style features, whereas the latter transfers the content features based on the fine-tuned style features. Moreover, a new encoding and decoding structure is designed to effectively handle the output of GSATM. Extensive quantitative and qualitative experiments fully demonstrate the superiority of our algorithm. Project page: https://github.com/XiangJinCherry/MCNet.
Corresponding institution:
[Huihuang Zhao] College of Computer Science and Technology, Hengyang Normal University, Hengyang, 421002, China; Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, 421002, China
Author affiliations:
[Hui-huang Zhao] College of Computer Science and Technology, Hengyang Normal University, Hengyang 421008, China; Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang; [Han Liu] School of Computer Science and Informatics, Cardiff University, Queen’s Buildings, 5 The Parade, Cardiff CF24 3AA, United Kingdom
Corresponding institution:
[Liu, H.] School of Computer Science and Informatics, Queen’s Buildings, 5 The Parade, United Kingdom
Abstract:
This paper develops a novel adaptive gradient-based block compressive sensing (AGbBCS_SP) methodology for noisy image compression and reconstruction. The AGbBCS_SP approach splits an image into blocks by maximizing their sparsity, and reconstructs images by solving a convex optimization problem. In block compressive sensing, the commonly used square block shapes cannot always produce the best results. The main contribution of our paper is to provide an adaptive method for block shape selection, improving noisy image reconstruction performance. The proposed algorithm can adaptively achieve better results by using the sparsity of pixels to adaptively select block shape. Experimental results with different image sets demonstrate that our AGbBCS_SP method is able to achieve better performance, in terms of peak signal to noise ratio (PSNR) and computational cost, than several classical algorithms.
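The PSNR metric used for evaluation above has a simple definition, sketched here for flattened 8-bit images. The function name and the default peak of 255 are assumptions for illustration and are not taken from the paper.

```python
import math


def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value means the reconstruction is closer to the reference; because the scale is logarithmic, halving the mean squared error raises PSNR by about 3 dB.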
Author affiliations:
[Zhao, Hui-Huang] Hengyang Normal Univ, Coll Comp Sci & Technol, Hengyang, Peoples R China.;[Lai, Yu-Kun; Rosin, Paul L.] Cardiff Univ, Sch Comp Sci & Informat, Cardiff, Wales.;[Wang, Yao-Nan] Hunan Univ, Coll Elect & Informat Engn, Changsha, Peoples R China.
Corresponding institution:
[Zhao, Hui-Huang] Hengyang Normal Univ, Coll Comp Sci & Technol, Hengyang, Peoples R China.
Keywords:
Deep neural networks;Style transfer;Soft mask;Semantic segmentation
Abstract:
This paper presents an automatic image synthesis method to transfer the style of an example image to a content image. When standard neural style transfer approaches are used, the textures and colours in different semantic regions of the style image are often applied inappropriately to the content image, ignoring its semantic layout and ruining the transfer result. In order to reduce or avoid such effects, we propose a novel method based on automatically segmenting the objects and extracting their soft semantic masks from the style and content images, in order to preserve the structure of the content image while having the style transferred. Each soft mask of the style image represents a specific part of the style image, corresponding to the soft mask of the content image with the same semantics. Both the soft masks and source images are provided as multichannel input to an augmented deep CNN framework for style transfer which incorporates a generative Markov random field model. The results on various images show that our method outperforms the most recent techniques.
Journal:
The Journal of Engineering, 2019, 2019(23): 8923-8926, ISSN: 2051-3305
Corresponding author:
Huihuang Zhao
Author affiliations:
Department of Computer Science and Technology, College of Computer Science and Technology, Hengyang Normal University, Hengyang, Hunan, People's Republic of China;Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, People's Republic of China;[Yun Zhang] Institute of Radio and TV Technology, Communication University of Zhejiang, 310018 Hangzhou, People's Republic of China;[Yaonan Wang] Department of Electrical and Engineering, College of Electrical and Information Engineering, Hunan University, Changsha, People's Republic of China;[Zhijun Qiao] School of Mathematical and Statistical Sciences, University of Texas, Rio Grande Valley, TX, USA
Corresponding institution:
[Huihuang Zhao] Department of Computer Science and Technology, College of Computer Science and Technology, Hengyang Normal University, Hengyang, Hunan, People's Republic of China; Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, People's Republic of China
Abstract:
This study aims to improve performance in solder joint image compression and reconstruction. A novel adaptive block compressive sensing methodology with convex optimisation and Gini index (Ad_BCSGB_Gini) is proposed for solder joint image compression and reconstruction. First, the image is split into square blocks, and each block is reshaped into a row; together, the rows form a new image. Then, the new image is transformed into a sparse signal by an orthogonal basis matrix, and the image reconstruction is handled as a convex optimisation problem. Moreover, a gradient-based method with fast computational speed is used to reconstruct the image. A control factor weights the l1 norm in the optimisation problem. Finally, to achieve the best performance, the proposed method adaptively selects the best result by comparing the Gini index of the reconstruction results obtained with different control factor values. Experimental results with different methods indicate that the Ad_BCSGB_Gini method achieves better performance in quantitative comparison than several classical algorithms, and that Ad_BCSGB_Gini has good robustness.
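The selection step described above can be sketched with one common sparsity measure, the Gini index over sorted coefficient magnitudes. This is an assumption: the listing does not state which Gini formulation the paper uses, nor whether the largest value is the selection criterion, and both function names are hypothetical.

```python
def gini_index(coeffs):
    """Gini sparsity index: 0 for a uniform vector, near 1 for a very sparse one."""
    mags = sorted(abs(c) for c in coeffs)
    n = len(mags)
    l1 = sum(mags)
    if n == 0 or l1 == 0:
        return 0.0
    # Magnitudes are ranked in ascending order, with k running from 1 to n.
    weighted = sum((m / l1) * ((n - k + 0.5) / n) for k, m in enumerate(mags, start=1))
    return 1.0 - 2.0 * weighted


def pick_by_gini(candidates):
    """Return the control factor whose result has the highest Gini index.

    `candidates` maps each control factor value to a coefficient list.
    """
    return max(candidates, key=lambda factor: gini_index(candidates[factor]))
```

For example, `gini_index([0, 0, 0, 1])` is 0.75 while `gini_index([1, 1, 1, 1])` is 0.0, so a result that concentrates energy in few coefficients scores higher.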
Abstract:
Glass bottles are widely used as containers in the food and beverage industry, especially for beer and carbonated beverages. As the key part of a glass bottle, the bottle bottom and its quality are closely related to product safety. Therefore, the bottle bottom must be inspected before the bottle is used for packaging. In this paper, an apparatus based on machine vision is designed for real-time bottle bottom inspection, and a framework for the defect detection mainly using saliency detection and template matching is presented. Following a brief description of the apparatus, our emphasis is on the image analysis. First, we locate the bottom by combining Hough circle detection with the size prior, and we divide the region of interest into three measurement regions: central panel region, annular panel region, and annular texture region. Then, a saliency detection method is proposed for finding defective areas inside the central panel region. A multiscale filtering method is adopted to search for defects in the annular panel region. For the annular texture region, we combine template matching with multiscale filtering to detect defects. Finally, the defect detection results of the three measurement regions are fused to distinguish the quality of the tested bottle bottom. The proposed defect detection framework is evaluated on bottle bottom images acquired by our designed apparatus. The experimental results demonstrate that the proposed methods achieve the best performance in comparison with many conventional methods.
Abstract:
This paper provides a novel method that can achieve better results in solder joint imagery compression and reconstruction. Wavelet packet decomposition is used to generate some frequency coefficients of images. The higher and lower frequency coefficients of the reconstruction signal are used separately to improve the reconstruction performance. A threshold that only relates to the higher frequency coefficients is defined to remove the noise in the reconstruction result in each iteration. A new control factor is further defined to control the threshold value. The control factor relates to the wavelet packet low-frequency coefficients and is updated by the wavelet packet low-frequency coefficients in each iteration. The experimental results reveal that the proposed algorithm is able to improve the performance in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) compared with classical algorithms in reconstruction of different types of solder joint images. When the sample rate is increased, the proposed method improves the reconstruction results and maintains low computational cost. The proposed algorithm can retain more image structure and achieve better results than some common methods.
Corresponding institution:
[Zhao, Hui-Huang] Hengyang Normal Univ, Coll Comp Sci & Technol, Hengyang 421008, Peoples R China.;Hunan Prov Key Lab Intelligent Informat Proc & Ap, Hengyang 421008, Hunan, Peoples R China.
Keywords:
Deep neural networks;gram matrix;local patch;Markov random field;style transfer
Abstract:
This paper presents a new image synthesis method for image style transfer. For some common methods, the textures and colors in the style image are sometimes applied inappropriately to the content image, which generates artifacts. In order to improve the results, we propose a novel method based on a new strategy that combines both local and global style losses. On the one hand, a style loss function based on a local approach is used to keep the style details. On the other hand, another style loss function based on global measures is used to capture more global structural information. The results on various images show that the proposed method reduces artifacts while faithfully transferring the style image's characteristics and preserving the structure and color of the content image.
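A global style loss of the kind described above is commonly built on Gram matrices of feature maps, which the keywords also mention. The sketch below assumes features are given as C channels of N flattened activations and that the loss is a mean squared difference between Gram matrices; the names and the normalisation are illustrative choices, not the paper's definition.

```python
def gram_matrix(features):
    """Channel-by-channel correlation matrix of a feature map.

    `features` is a list of C channels, each a flat list of N activations.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def global_style_loss(feat_output, feat_style):
    """Mean squared difference between the two Gram matrices."""
    g_out = gram_matrix(feat_output)
    g_sty = gram_matrix(feat_style)
    c = len(g_out)
    return sum((g_out[i][j] - g_sty[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)
```

Because the Gram matrix sums over all spatial positions, it discards layout and keeps only which channels co-activate, which is why such a term captures global statistics while a patch-based term is needed for local detail.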
Abstract:
Recognition of handwritten digits is a very popular application of machine learning. In this context, each of the ten digits (0-9) is defined as a class in the setting of machine learning based classification tasks. In general, popular learning methods, such as support vector machines, neural networks and K nearest neighbours, have been used for classifying instances of handwritten digits into one of the ten classes. However, due to the diversity of handwriting styles from different people, it can happen that some handwritten digits (e.g. 4 and 9) are very similar and are thus difficult to distinguish. Also, each single learning algorithm may have its own advantages and disadvantages, which means that a single algorithm would be capable of learning some but not all specific characteristics of handwritten digits. From this point of view, a method for handwritten digit recognition is proposed in the setting of ensemble learning, towards encouraging the diversity among different classifiers trained by different learning algorithms. In particular, the image features of handwritten digits are extracted by using the Convolutional Neural Network architecture. Furthermore, single classifiers trained respectively by K nearest neighbours and random forests are fused into an ensemble classifier. The experimental results show that the ensemble classifier was able to achieve a recognition accuracy of ≥ 98% using the MNIST data set.
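The fusion of two classifiers described above can be sketched as weighted averaging of their class probabilities (soft voting). The abstract does not specify the fusion rule, so the averaging, the equal default weight and the function name are all assumptions made for illustration.

```python
def fuse_probabilities(prob_a, prob_b, weight_a=0.5):
    """Blend two class-probability vectors and return (predicted_class, fused)."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both classifiers must score the same set of classes")
    fused = [weight_a * pa + (1.0 - weight_a) * pb
             for pa, pb in zip(prob_a, prob_b)]
    # The predicted class is the index with the highest fused probability.
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Hypothetical scores over three classes from two classifiers.
knn_scores = [0.1, 0.6, 0.3]
forest_scores = [0.2, 0.2, 0.6]
label, fused = fuse_probabilities(knn_scores, forest_scores)
```

Here the two classifiers disagree on their top class, and the fused vector decides between them; shifting `weight_a` toward 1.0 makes the first classifier dominate.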