To address the problem of document image segmentation, we propose a topic-model-based method that segments document images into several regions, such as text, background, tables, and figures. Previous work on document image segmentation has focused on threshold-based or supervised learning methods. In our work, we first build a codebook using PCA and K-means, which requires only unsupervised learning. The document images are then encoded with the codebook, and the probability of each code is calculated using an LDA-based method, followed by a Markov random field based labeling proc...