I have published my work on generalized language models with the Association for Computational Linguistics. The corresponding software, the Generalized Language Model Toolkit, can be found on GitHub. You might also want to check out the openly licensed course materials on basic modeling of the similarity of text corpora, which I created in the past. I have also analyzed the Wikipedia corpus.

If you think my experience could be of help to you, I will be happy to talk with you about any NLP problems you might have. Whether you are a media company, a blogger, or a library, or you just happen to have some large text corpora that you want to analyze and use to develop your business, it is very likely that I will be able to help you out.