Parallel corpora are widely used in machine translation, lexicography, bilingual teaching, and so on. Web bilingual resources have the following characteristics: 1) Multiple fields. The information on the Web covers a very broad range of fields; for example, the bilingual resources of open course programs contain material for every subject. 2) Regularity of page layout. After observing a large amount of bilingual data on the Web, we found that in the vast majority of cases the two sentences of a pair correspond to each other and occupy logically adjacent positions on the page. 3) Noise. We...