This paper aims to provide a methodology for analyzing the reliability of human evaluation in machine translation (MT). Using the human evaluation of English-to-Spanish translation carried out during the second TC-STAR evaluation campaign, we first demonstrate the reliability of that evaluation. We then define several methods for detecting judges who could bias the evaluation with judgments that are too strict...