Class GND

    • Constructor Detail

      • GND

        public GND(boolean backgroundIsSuperset)
    • Method Detail

      • equals

        public boolean equals(Object other)
      • hashCode

        public int hashCode()
      • getScore

        public double getScore(long subsetFreq,
                               long subsetSize,
                               long supersetFreq,
                               long supersetSize)
        Calculates the Google Normalized Distance, as described in "The Google Similarity Distance" (Cilibrasi and Vitanyi, 2007): http://arxiv.org/pdf/cs/0412098v3.pdf
        Specified by:
        getScore in class SignificanceHeuristic
        Parameters:
        subsetFreq - The frequency of the term in the selected sample
        subsetSize - The size of the selected sample (typically number of docs)
        supersetFreq - The frequency of the term in the superset from which the sample was taken
        supersetSize - The size of the superset from which the sample was taken (typically number of docs)
        Returns:
        a "significance" score
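
        For orientation, here is a minimal sketch of the distance formula from the cited paper, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). It is not this class's implementation. The class name NgdSketch, the method ngd, and its parameter names are hypothetical. How getScore maps subsetFreq, subsetSize, supersetFreq and supersetSize onto f(x), f(y), f(x, y) and N, and how it turns a distance (where low means closely related) into a "significance" score, is not documented here, so the sketch takes the paper's quantities directly.

```java
// Hypothetical illustration of the NGD formula from Cilibrasi and Vitanyi (2007).
// This is NOT the GND class; names and argument mapping are assumptions.
final class NgdSketch {

    private NgdSketch() {
    }

    /**
     * @param fx  number of documents containing x (must be > 0)
     * @param fy  number of documents containing y (must be > 0)
     * @param fxy number of documents containing both x and y
     * @param n   total number of documents (must exceed min(fx, fy))
     * @return the normalized distance; smaller values mean the terms co-occur more
     */
    static double ngd(long fx, long fy, long fxy, long n) {
        if (fx <= 0 || fy <= 0 || n <= Math.min(fx, fy)) {
            throw new IllegalArgumentException("frequencies out of range");
        }
        if (fxy == 0) {
            // Terms that never co-occur are infinitely far apart in the paper's formulation.
            return Double.POSITIVE_INFINITY;
        }
        double logFx = Math.log(fx);
        double logFy = Math.log(fy);
        double numerator = Math.max(logFx, logFy) - Math.log(fxy);
        double denominator = Math.log(n) - Math.min(logFx, logFy);
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Example counts are made up for illustration only.
        System.out.println(ngd(100, 10, 10, 1000));
    }
}
```

        Because a distance ranks strongly related terms low, a heuristic that must rank them high needs some monotone decreasing transform of this value; which transform GND applies is not stated in this page.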