I am trying to compute pointwise mutual information (PMI) using Wikipedia as the data source. Given two words, PMI measures how strongly the two words are associated. The formula is given below.
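pmi(word1, word2) = log [ P(word1, word2) / (P(word1) * P(word2)) ]
where P(word1, word2) is the probability that both words appear together in a document, and P(word1) and P(word2) are the probabilities that each word appears in a document.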
Hence, to compute PMI, I need the joint probability of word1 and word2 as well as the individual probability of each word. I looked at the Wikipedia Miner relatedness score between two words, which implements the Milne and Witten algorithm. However, for defining topic similarity, PMI is a better score.
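To make the computation concrete, here is a minimal sketch of document-level PMI in Python. It assumes each document has already been reduced to a set of lowercase tokens; the function name document_pmi and the toy corpus are hypothetical and only for illustration, not part of Wikipedia Miner or DBpedia. It uses the natural logarithm and returns None when any count is zero, since the logarithm is undefined in that case.

```python
import math
from typing import Iterable, Optional, Set


def document_pmi(word1: str, word2: str,
                 documents: Iterable[Set[str]]) -> Optional[float]:
    """Document-level PMI (natural log). Hypothetical helper for illustration.

    Probabilities are estimated as document frequencies:
    P(w) = (documents containing w) / (all documents).
    Returns None if the corpus is empty or any count is zero.
    """
    n_docs = n_w1 = n_w2 = n_both = 0
    for tokens in documents:
        n_docs += 1
        has1 = word1 in tokens
        has2 = word2 in tokens
        n_w1 += has1
        n_w2 += has2
        n_both += has1 and has2

    if n_docs == 0 or n_w1 == 0 or n_w2 == 0 or n_both == 0:
        return None

    p_w1 = n_w1 / n_docs
    p_w2 = n_w2 / n_docs
    p_both = n_both / n_docs
    return math.log(p_both / (p_w1 * p_w2))


# Toy corpus (assumed data): each document is a set of tokens.
docs = [
    {"neural", "network", "training"},
    {"neural", "network", "layer"},
    {"river", "bank", "water"},
    {"bank", "loan", "money"},
]

related = document_pmi("neural", "network", docs)
unrelated = document_pmi("neural", "bank", docs)
```

On real Wikipedia data the same counts would come from an article dump rather than an in-memory list, so the open question is which tool can supply those document frequencies efficiently.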
Does anyone know how to compute the PMI score for two words using DBpedia, Wikipedia Miner, or any other software?
------------------------------
navya sri
software developer
hkr trainings
------------------------------

