With advances in computer technology, speech synthesis techniques are becoming increasingly sophisticated. Speech cloning, a subtask of speech synthesis, uses deep learning techniques to extract acoustic information from a human voice and combine it with text to output natural-sounding human speech. However, traditional speech cloning technology still has limitations: excessively long text inputs cannot be processed adequately, and the synthesized audio may contain artifacts such as breaks and unclear phrases. In this study, we add a text determinatio...