A General Tool for Anaphora Resolution, GuiTAR
GuiTAR stands for "General-purpose Tool for Anaphora Resolution". It was developed in Java at Essex University between 2003 and 2005 (see the older pages, which are no longer updated).
It was later tidied up and published on SourceForge in the hope that it may be of use to others. Details of the experiments and of how it was used for research can be found in my PhD thesis, which is also available as a published manuscript.
You could cite it as follows:
- Kabadjov, Mijail (2010). Anaphora Resolution and Discourse-new Classification: A Comprehensive Evaluation. VDM Verlag Dr. Müller. ISBN: 978-3639244472.
Download GuiTAR sources from here
Untar and unzip the sources to a working directory. Then, start from the README.
Remember to set the classpath to include all the .jar files in the lib subdirectory (there is a simple shell script included, called 'print-classpath.sh', which returns the correct classpath on standard output).
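As a rough sketch of the classpath step, the snippet below shows two options. The first uses the bundled 'print-classpath.sh' as described above. The second is a hypothetical helper, build_classpath, which is not part of GuiTAR and simply joins every .jar file in a directory with ':'. It assumes a Unix-like shell and that the jars sit directly in the lib subdirectory.

```shell
# Hypothetical helper (not shipped with GuiTAR): join every .jar file in a
# directory into a colon-separated classpath string.
build_classpath() {
    dir="$1"
    cp=""
    for jar in "$dir"/*.jar; do
        [ -e "$jar" ] || continue      # skip the unexpanded glob when no jars exist
        cp="${cp:+$cp:}$jar"           # add ':' only between entries
    done
    printf '%s\n' "$cp"
}

# Option 1: use the bundled script, which prints the classpath on standard output.
# export CLASSPATH="$(sh print-classpath.sh)"

# Option 2: fall back to the helper, assuming the jars are in ./lib.
# export CLASSPATH="$(build_classpath lib)"
```

On Windows the classpath separator is ';' rather than ':', so the helper would need adjusting there.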
If you have used ltchunk previously, you may want to try it here.
It may work as well, but I have not tested it myself.
This step has changed: the first time you run the system over your input, add the command-line option '-prepro' to the command described in sec. III below.
This is probably the recommended way of using it, and it should work as in previous versions. With the option "-i" you can provide an open-ended list of input files. Although the list is open-ended, there is a practical limit on how many files you can pass on the command line (especially at the Windows prompt), so for many input files a more useful option is "-f" followed by the name of a text file containing the list of input file names (one per line). Additionally, the "-t" option is almost a must: it selects the Penn Treebank tag set used by Charniak's parser, and if it is not provided the system defaults to the tag set of the proprietary software XELDA (which I have not used for a long time).
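One possible way to prepare the list file for "-f" is sketched below. The helper make_file_list, the directory name corpus, the list name inputFiles.txt and the '.xml' extension are all assumptions made for illustration. Only the "-t" and "-f" options and the jar and ini file names come from this page.

```shell
# Hypothetical helper: write the names of all .xml files under a directory
# to a list file, one name per line, in sorted order.
# Assumption: the input files end in .xml.
make_file_list() {
    input_dir="$1"
    list_file="$2"
    find "$input_dir" -type f -name '*.xml' | sort > "$list_file"
}

# Example use (directory and list names are placeholders):
# make_file_list corpus inputFiles.txt
# java -jar gtar3.0.3.jar -t penntagSet.ini -f inputFiles.txt
```

Keeping the list in a file also avoids the command-line length limit mentioned above.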
This new version of GuiTAR also features a discourse-new classifier, and the zip file contains two trained models: one for a Support Vector Machine (SVM) classifier built with the LIBSVM software, and one for a Maximum Entropy classifier built with the OpenNLP MAXENT package.
To use this facility, however, you must provide a valid Google key, because some of the classifier's input features are computed by querying Google through its API.
To provide the key, edit the file "penntagSet.ini" (or "tagSet.ini" if using XELDA) and replace the value "kkk" of the parameter "GOOGLE_KEY". Otherwise the Google features will not be computed, and the system may not behave as expected. Either classifier can be invoked as follows:
java -jar gtar3.0.3.jar -log -t penntagSet.ini -verbose -svm gnmvpc.libsvm -f masxmlFilesVPCGNM.txt
java -jar gtar3.0.3.jar -log -t penntagSet.ini -verbose -maxent gnmvpc.maxent -f masxmlFilesVPCGNM.txt
Note that the libsvm model is composed of two files: one with the normalization ranges of the input features and one with the model itself. The maxent model consists only of the model, so any feature normalization must be done externally.
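Before invoking either classifier, it may be worth checking that the placeholder Google key has actually been replaced. The sketch below is one way to do that. The function google_key_is_set is hypothetical, and it assumes that the GOOGLE_KEY entry sits on a single line with its value at the end of that line. The real layout of the ini file may differ, in which case the pattern needs adjusting.

```shell
# Hypothetical check: succeed only if the file has a GOOGLE_KEY entry whose
# value is no longer the placeholder "kkk".
# Assumption: the entry is on one line and the value comes last on that line.
google_key_is_set() {
    ini_file="$1"
    # A file with no GOOGLE_KEY entry at all counts as "not set".
    grep -q 'GOOGLE_KEY' "$ini_file" || return 1
    # Fail if the value is still the placeholder, optionally quoted.
    if grep -Eq 'GOOGLE_KEY[[:space:]]*[=:]?[[:space:]]*"?kkk"?[[:space:]]*$' "$ini_file"; then
        return 1
    fi
    return 0
}

# Example use:
# google_key_is_set penntagSet.ini || echo "GOOGLE_KEY still needs a real value" >&2
```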
Feedback welcome.