
I am trying to compute pointwise mutual information (PMI) using Wikipedia as the data source. Given two words, PMI quantifies how strongly the two words are associated. The formula is as follows:

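pmi(word1, word2) = log [ p(word1, word2) / (p(word1) * p(word2)) ]

where p(word1, word2) is the probability that both words appear together in a document, and p(word1) and p(word2) are the probabilities that each word appears in a document.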
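To make the formula concrete, here is a minimal sketch that turns document counts into PMI, assuming all probabilities are estimated as document frequencies (number of documents containing the word(s) divided by the total number of documents) and using the natural log. The function name and parameters are hypothetical, chosen only for illustration.

```python
import math


def pmi(n_both: int, n_word1: int, n_word2: int, n_docs: int) -> float:
    """Document-level PMI from raw counts, using the natural log.

    n_both  -- number of documents containing both words (hypothetical input)
    n_word1 -- number of documents containing word1
    n_word2 -- number of documents containing word2
    n_docs  -- total number of documents in the corpus
    """
    if n_both == 0:
        # log(0) is undefined; treat words that never co-occur as -infinity
        return float("-inf")
    p_both = n_both / n_docs    # estimate of p(word1, word2)
    p_word1 = n_word1 / n_docs  # estimate of p(word1)
    p_word2 = n_word2 / n_docs  # estimate of p(word2)
    return math.log(p_both / (p_word1 * p_word2))


# 1000 documents: word1 in 100, word2 in 50, both in 10.
# p(w1, w2) = 0.01 and p(w1) * p(w2) = 0.005, so PMI = log(2), about 0.693.
print(pmi(10, 100, 50, 1000))
```

A positive value means the two words co-occur more often than they would if they were independent; zero means exactly as often as chance predicts.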

Hence, to compute PMI, I need the joint probability of word1 and word2 as well as their individual probabilities. I looked at the Wikipedia Miner relatedness score between two words, which implements the Milne and Witten algorithm. However, for measuring topic similarity, PMI is a better score.

Does anyone know how to compute the PMI score for two words using DBpedia, Wikipedia Miner, or any other software?
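Without relying on a particular tool, the counts can be gathered directly from article text. The following is a minimal sketch, assuming the Wikipedia articles have already been extracted to plain-text strings (the extraction step is not shown) and that a simple lowercase `\w+` regex is an acceptable tokenizer. The helper name `document_pmi` is hypothetical.

```python
import math
import re
from typing import Iterable


def document_pmi(docs: Iterable[str], word1: str, word2: str) -> float:
    """Hypothetical helper: document-level PMI of two words over a corpus.

    Assumes `docs` yields one plain-text article per item; a word counts as
    present in a document if it appears at least once (case-insensitive).
    """
    word1, word2 = word1.lower(), word2.lower()
    n_docs = n_word1 = n_word2 = n_both = 0
    for text in docs:
        n_docs += 1
        # Simple tokenizer (an assumption): lowercase runs of word characters
        tokens = set(re.findall(r"\w+", text.lower()))
        has1 = word1 in tokens
        has2 = word2 in tokens
        n_word1 += int(has1)
        n_word2 += int(has2)
        n_both += int(has1 and has2)
    if n_both == 0:
        # The words never co-occur, so the log ratio is undefined (-inf)
        return float("-inf")
    # log[(n_both/N) / ((n_word1/N) * (n_word2/N))] = log[n_both * N / (n_word1 * n_word2)]
    return math.log(n_both * n_docs / (n_word1 * n_word2))


docs = ["the cat sat on the mat", "a dog and a cat", "the dog barked", "birds fly"]
print(document_pmi(docs, "cat", "dog"))  # cat: 2 docs, dog: 2 docs, both: 1 -> log(1)
print(document_pmi(docs, "cat", "mat"))  # cat: 2 docs, mat: 1 doc, both: 1 -> log(2)
```

On a full Wikipedia dump, scanning the corpus once per word pair is slow; a more practical design would compute document frequencies and pair counts in a single pass and store them, then apply the same formula per query.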



------------------------------
navya sri
software developer
hkr trainings
------------------------------