Example Use of the JAWS WordNet Library

      public List<Synset> getSynsets(String word) {
        return Arrays.asList(database.getSynsets(word));
      }
      public static void main(String[] args) {

The constant PropertyNames.DATABASE_DIRECTORY is equal to “wordnet.database.dir.” It is a good idea to make sure that you have this Java property set; if the value prints as null, then either fix the way you set Java properties, or just set it explicitly:

        System.setProperty(PropertyNames.DATABASE_DIRECTORY,
                           "/Users/markw/temp/wordnet3/dict");
        WordNetTest tester = new WordNetTest();
        String word = "bank";
        List<Synset> synset_list = tester.getSynsets(word);
        System.out.println("\n\nProcess word: " + word);
        for (Synset synset : synset_list) {
          System.out.println("\nsynset type: " +
              SYNSET_TYPES[synset.getType().getCode()]);
          System.out.println("  definition: " + synset.getDefinition());
          // word forms are synonyms:
          for (String wordForm : synset.getWordForms()) {
            if (!wordForm.equals(word)) {
              System.out.println("  synonym: " + wordForm);

Antonyms are the opposites of synonyms. Notice that antonyms are specific to individual senses of a word. This is why I have the following code to display antonyms inside the loop over word forms for each word sense of “bank”:

              // antonyms mean the opposite:
              for (WordSense antonym : synset.getAntonyms(wordForm)) {
                for (String opposite :
                     antonym.getSynset().getWordForms()) {
                  System.out.println("    antonym of " +
                      wordForm + ": " + opposite);
                }
              }
            }
          }
          System.out.println("\n");
        }
      }
      private WordNetDatabase database;
      private final static String[] SYNSET_TYPES =
          {"", "noun", "verb"};
    }

Using this example program, we can see that the word “bank” has 18 different “senses”: 10 noun senses and 8 verb senses:

    Process word: bank

    synset type: noun
      definition: sloping land especially the slope beside a body of water

    synset type: noun
      definition: a financial institution that accepts deposits and channels the money into lending activities
      synonym: depository financial institution
      synonym: banking concern
      synonym: banking company

    synset type: noun
      definition: a long ridge or pile

    synset type: noun
      definition: an arrangement of similar objects in a row or in tiers

    synset type: noun
      definition: a supply or stock held in reserve for future use especially in emergencies

    synset type: noun
      definition: the funds held by a gambling house or the dealer in some gambling games

    synset type: noun
      definition: a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
      synonym: cant
      synonym: camber

    synset type: noun
      definition: a container usually with a slot in the top for keeping money at home
      synonym: savings bank
      synonym: coin bank
      synonym: money box

    synset type: noun
      definition: a building in which the business of banking transacted
      synonym: bank building

    synset type: noun
      definition: a flight maneuver; aircraft tips laterally about its longitudinal axis especially in turning

    synset type: verb
      definition: tip laterally

    synset type: verb
      definition: enclose with a bank

    synset type: verb
      definition: do business with a bank or keep an account at a bank

    synset type: verb
      definition: act as the banker in a game or in gambling

    synset type: verb
      definition: be in the banking business

    synset type: verb
      definition: put into a bank account
      synonym: deposit
        antonym of deposit: withdraw
        antonym of deposit: draw
        antonym of deposit: take out
        antonym of deposit: draw off

    synset type: verb
      definition: cover with ashes so to control the rate of burning

    synset type: verb
      definition: have confidence or faith in
      synonym: trust
        antonym of trust: distrust
        antonym of trust: mistrust
        antonym of trust: suspect
        antonym of trust: distrust
        antonym of trust: mistrust
        antonym of trust: suspect
      synonym: swear
      synonym: rely

WordNet provides a rich linguistic database for human linguists, but although I have been using WordNet since 1999, I do not often use it in automated systems. I tend to use it for manual reference and sometimes for simple tasks like augmenting a list of terms with synonyms. In the next two sub-sections I suggest two possible projects, both involving the use of synset synonyms. I have used both of these suggested ideas in my own projects with some success.

9.3.3 Suggested Project: Using a Part of Speech Tagger to Use the Correct WordNet Synonyms

We saw in Section 9.3 that WordNet will give us both synonyms and antonyms (words of opposite meaning). The problem is that we can only get words with similar and opposite meanings for specific “senses” of a word. Using the example in Section 9.3, synonyms of the word “bank” in the sense of a verb meaning “have confidence or faith in” are:

• trust
• swear
• rely

while synonyms for “bank” in the sense of a noun meaning “a financial institution that accepts deposits and channels the money into lending activities” are:

• depository financial institution
• banking concern
• banking company

So, it does not make much sense to try to maintain a single data map of synonyms for a given word. It does make some sense to use information about the context of a word. We can do this with some degree of accuracy by using the part of speech tagger from Section 9.1 to at least determine whether a word in a sentence is a noun or a verb, and thus limit the mapping of possible synonyms for the word in its current context.
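The idea of limiting synonyms by part of speech can be sketched in a few lines. The following is only an illustration under stated assumptions: the class PosSynonymFilter, its Sense helper, and the sensesFor method are hypothetical stand-ins for a real WordNet lookup, and the data is copied from two of the “bank” senses listed for Section 9.3. A real version would get the part of speech from the tagger and the senses from WordNet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PosSynonymFilter {
    /** One word sense: its part of speech and the synonyms listed for that sense. */
    static class Sense {
        final String pos;
        final List<String> synonyms;
        Sense(String pos, String... synonyms) {
            this.pos = pos;
            this.synonyms = Arrays.asList(synonyms);
        }
    }

    /** Hypothetical stand-in for a WordNet lookup; it only knows two senses of "bank". */
    static List<Sense> sensesFor(String word) {
        List<Sense> senses = new ArrayList<>();
        if (word.equals("bank")) {
            senses.add(new Sense("noun", "depository financial institution",
                                 "banking concern", "banking company"));
            senses.add(new Sense("verb", "trust", "swear", "rely"));
        }
        return senses;
    }

    /** Collect synonyms only from senses whose part of speech matches the tag. */
    static List<String> synonymsFor(String word, String posTag) {
        List<String> result = new ArrayList<>();
        for (Sense sense : sensesFor(word)) {
            if (sense.pos.equals(posTag)) {
                result.addAll(sense.synonyms);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In "I bank on you" a tagger would mark "bank" as a verb.
        System.out.println(synonymsFor("bank", "verb"));
        // In "the bank closed early" it would mark "bank" as a noun.
        System.out.println(synonymsFor("bank", "noun"));
    }
}
```

Note that the part of speech only narrows the choice: with all 10 noun senses of “bank” loaded, a noun tag would still merge synonyms from unrelated senses such as “camber” and “coin bank.” Whatever tags the tagger emits would also need to be mapped onto the noun and verb labels used here.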

9.3.4 Suggested Project: Using WordNet Synonyms to Improve Document Clustering

Another suggestion for a WordNet-based project is to use the Tagger to identify the probable part of speech for each word in all text documents that you want to cluster, and augment the documents with synset synonym data. You can then cluster the documents similarly to how we will calculate document similarity in Section 9.5.
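To make the effect of synonym augmentation concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the class SynonymAugmenter, the small hand-built SYNONYMS table (derived from the “have confidence or faith in” sense of “bank” shown earlier), the hand-assigned part of speech labels, and the use of Jaccard overlap as the similarity measure. It is not the similarity calculation of Section 9.5.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SynonymAugmenter {
    // Hypothetical synonym table keyed by "word/partOfSpeech".
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("trust/verb", Arrays.asList("swear", "rely", "bank"));
        SYNONYMS.put("rely/verb", Arrays.asList("trust", "swear", "bank"));
    }

    /** The plain set of words in a document; each token is {word, partOfSpeech}. */
    static Set<String> words(String[][] taggedTokens) {
        Set<String> terms = new HashSet<>();
        for (String[] token : taggedTokens) {
            terms.add(token[0]);
        }
        return terms;
    }

    /** The words of a document plus the synonyms that match each word's tag. */
    static Set<String> augment(String[][] taggedTokens) {
        Set<String> terms = words(taggedTokens);
        for (String[] token : taggedTokens) {
            List<String> synonyms = SYNONYMS.get(token[0] + "/" + token[1]);
            if (synonyms != null) {
                terms.addAll(synonyms);
            }
        }
        return terms;
    }

    /** Jaccard overlap: size of the intersection divided by size of the union. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return (double) common.size() / all.size();
    }

    public static void main(String[] args) {
        String[][] doc1 = {{"i", "other"}, {"trust", "verb"}, {"you", "other"}};
        String[][] doc2 = {{"i", "other"}, {"rely", "verb"}, {"you", "other"}};
        System.out.println("plain:     " + jaccard(words(doc1), words(doc2)));
        System.out.println("augmented: " + jaccard(augment(doc1), augment(doc2)));
    }
}
```

The two tiny documents share only two of four distinct words before augmentation, so the plain overlap is 0.5. After augmentation both become the same set of six terms and the overlap is 1.0. On real documents the gain would be much smaller, and adding synonyms for the wrong sense can also make unrelated documents look more alike.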

9.4 Automatically Assigning Tags to Text

By tagging I mean assigning zero or more categories like “politics”, “economy”, etc. to text based on the words contained in the text. While the code for doing this is simple, there is usually much work to do to build a word count database for different classifications.

I have been working on commercial products for automatic tagging and semantic extraction for about ten years (see www.knowledgebooks.com if you are interested). In this section I will show you some simple techniques for automatically assigning tags or categories to text, using some code snippets from my own commercial product. We will use a set of tags for which I have collected word frequency statistics. For example, a tag of “Java” might be associated with the use of the words “Java,” “JVM,” “Sun,” etc. You can find my pre-trained tag data in the file:

    test_data/classification_tags.xml

The Java source code for the class AutoTagger is in the file:

    src-statistical-nlp/com/knowledgebooks/nlp/AutoTagger.java

The AutoTagger class uses a few data structures to keep track of both the names of tags and the word count statistics for words associated with each tag name. I use a temporary hash table for processing the XML input data:

    private static Hashtable<String, Hashtable<String, Float>> tagClasses;

The names of the tags used are defined in the XML tag data file: change this file, and you alter both the tags and the behavior of this utility class. Here is a snippet of data