Friday, April 4, 2014

Finding the head word in a Noun Phrase

Recently I came across the problem of finding head words in phrases. This problem matters in a coreference resolution setup. (More on coreference resolution in another post.) I was trying to do a simple string match on two phrases after stripping each phrase of closed-class words (or stopwords) and punctuation marks. Upon analyzing the results, I realized that a simple string match does not always solve the problem.

For example, consider matching "a glass bowl" and "glass of milk".

If we follow the heuristic of matching them after stripping them of stopwords and punctuation marks, we would end up saying that the two phrases match (as both contain the string "glass").
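
To make the false positive concrete, here is a minimal sketch of that naive matcher. The stopword list is a tiny hypothetical one, and the matching criterion (the two phrases share at least one remaining token) is my assumption about what "simple string match" means here; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class NaivePhraseMatch {
    // Hypothetical, deliberately tiny stopword list; a real one would be much longer.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("a", "an", "the", "of", "in", "on"));

    // Lowercase the phrase, replace punctuation with spaces, split on whitespace,
    // and keep only the tokens that are not stopwords.
    static Set<String> contentTokens(String phrase) {
        Set<String> tokens = new HashSet<>();
        String cleaned = phrase.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", " ");
        for (String token : cleaned.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Assumed criterion: the phrases "match" if they share at least one content token.
    static boolean naiveMatch(String first, String second) {
        Set<String> shared = contentTokens(first);
        shared.retainAll(contentTokens(second));
        return !shared.isEmpty();
    }

    public static void main(String[] args) {
        // Both phrases keep the token "glass", so the naive matcher reports a match.
        System.out.println(naiveMatch("a glass bowl", "glass of milk")); // prints true
    }
}
```

The matcher has no notion of which token names the entity, so any shared content word is enough to trigger a match.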

That is when I realized the need to identify the entity being talked about in a phrase. The head words for the two phrases listed above are "bowl" and "milk" respectively.

It turns out that in the NLP literature this is called the head word. Prof. Michael Collins, around 1999 [2], seems to have come up with heuristics (rules based on syntax tree paths, etc.) to identify these head words. Upon further research, it seems to me that this is the state of the art, and the heuristics are implemented in the Stanford CoreNLP suite [1].
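
To give a feel for what such a rule looks like, here is a heavily simplified sketch. It is not the actual rule table from the thesis or from CoreNLP: it assumes a flat noun phrase whose words already carry Penn Treebank part-of-speech tags, scans right to left for the first noun tag, and falls back to the last word. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class SimpleNpHead {
    // A word paired with its Penn Treebank part-of-speech tag.
    static final class TaggedWord {
        final String word;
        final String tag;

        TaggedWord(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    // Simplified rule for a flat noun phrase: scan right to left and return the
    // first noun (tags NN, NNS, NNP, NNPS); if there is none, return the last word.
    static String headOf(List<TaggedWord> nounPhrase) {
        if (nounPhrase.isEmpty()) {
            throw new IllegalArgumentException("empty phrase");
        }
        for (int i = nounPhrase.size() - 1; i >= 0; i--) {
            if (nounPhrase.get(i).tag.startsWith("NN")) {
                return nounPhrase.get(i).word;
            }
        }
        return nounPhrase.get(nounPhrase.size() - 1).word;
    }

    public static void main(String[] args) {
        List<TaggedWord> phrase = Arrays.asList(
            new TaggedWord("a", "DT"),
            new TaggedWord("glass", "NN"),
            new TaggedWord("bowl", "NN"));
        System.out.println(headOf(phrase)); // prints bowl
    }
}
```

Real head finders work on the full parse tree and apply a separate rule per phrase type, so nested phrases need the tree-based approach rather than a flat scan like this one.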

Here is some sample code based on Stanford CoreNLP to identify head words for noun phrases [3]:

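import java.util.List;

import edu.stanford.nlp.trees.HeadFinder;
import edu.stanford.nlp.trees.Tree;

// The wrapper class name is arbitrary; it only makes the method compilable.
public class HeadWords {

  // Depth-first walk over a parse tree that prints every NP and its head word.
  // headFinder can be any HeadFinder implementation, e.g. CollinsHeadFinder.
  public static void dfs(Tree node, Tree parent, HeadFinder headFinder) {
    if (node == null || node.isLeaf()) {
      return;
    }
    if ("NP".equals(node.value())) {
      System.out.println(" Noun Phrase is ");
      List<Tree> leaves = node.getLeaves();
      for (Tree leaf : leaves) {
        System.out.print(leaf.toString() + " ");
      }
      System.out.println();
      System.out.println(" Head string is ");
      // headTerminal follows the head finder's rules down to a leaf.
      System.out.println(node.headTerminal(headFinder, parent));
    }
    // Recurse into the children, passing the current node as their parent.
    for (Tree child : node.children()) {
      dfs(child, node, headFinder);
    }
  }
}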

References:
[1] http://nlp.stanford.edu/software/corenlp.shtml
[2] Prof. Michael Collins' thesis describes the head-finding rules: http://www.cs.columbia.edu/~mcollins/publications.html
[3] More details on Stanford CoreNLP HeadFinders: http://stackoverflow.com/a/22841952/1019673
