GNU ASpell Library and Jazzy

    File dict =
        new File("test_data/dictionary/english.0");
    SpellChecker checker =
        new SpellChecker(new SpellDictionaryHashMap(dict));
    int THRESHOLD = 10; // computational cost threshold
    System.out.println(checker.getSuggestions("runnng", THRESHOLD));
    System.out.println(checker.getSuggestions("season", THRESHOLD));
    System.out.println(checker.getSuggestions("advantagius", THRESHOLD));

The method getSuggestions returns an ArrayList of spelling suggestions. This example code produces the following output:

    [running]
    [season, seasons, reason]
    [advantageous, advantages]

The file test_data/dictionary/english.0 contains an alphabetically ordered list of words, one per line. You may want to add words appropriate for the type of text that your applications use. For example, if you were adding spelling correction to a web site for selling sailboats, then you would want to insert manufacturer and product names into this word list in the correct alphabetical order.

The title of this book contains the word “Practical,” so I feel fine about showing you how to use a useful Open Source package like Jazzy without digging into its implementation or ASpell’s implementation. The next section contains the implementation of a simple algorithm, and we will study its implementation in some detail.
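Returning briefly to the word list: adding domain-specific words by hand and keeping the file sorted is error prone, so a small helper can do the merge. The following is a minimal sketch, not part of Jazzy. The class name WordListMerger and its methods are hypothetical, and the code assumes that the word list is UTF-8 text and that "alphabetical" means case-insensitive ordering. Check both assumptions against your copy of the dictionary file before using it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper (not part of Jazzy): merges extra words into a
// one-word-per-line word list while keeping the list sorted.
class WordListMerger {

  // Returns the union of both collections, sorted and without duplicates.
  // Assumption: the word list uses case-insensitive alphabetical order.
  // Words that differ only in case count as duplicates; the first one
  // encountered (existing words are added first) is kept.
  static List<String> merge(Collection<String> existing,
                            Collection<String> extra) {
    TreeSet<String> merged = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    for (String word : existing) {
      addTrimmed(merged, word);
    }
    for (String word : extra) {
      addTrimmed(merged, word);
    }
    return new ArrayList<>(merged);
  }

  // Skips blank lines and strips surrounding whitespace.
  private static void addTrimmed(TreeSet<String> set, String word) {
    String trimmed = word.trim();
    if (!trimmed.isEmpty()) {
      set.add(trimmed);
    }
  }

  // Reads the word list, merges in the extra words, and rewrites the file.
  // Assumption: the file is UTF-8 encoded.
  static void addWordsToFile(Path wordList, Collection<String> extra)
      throws IOException {
    List<String> existing =
        Files.readAllLines(wordList, StandardCharsets.UTF_8);
    Files.write(wordList, merge(existing, extra), StandardCharsets.UTF_8);
  }
}
```

For the sailboat example, a call such as WordListMerger.addWordsToFile(Paths.get("test_data/dictionary/english.0"), Arrays.asList("Catalina", "Beneteau")) would insert two illustrative manufacturer names. Because the method rewrites the file in place, it is worth keeping a backup copy of the original word list.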

9.6.2 Peter Norvig’s Spelling Algorithm

Peter Norvig designed and implemented a spelling corrector in about 20 lines of Python code. I will implement his algorithm in Java in this section, and in Section 9.6.3 I will extend my implementation to also use word pair statistics.

The class SpellingSuggestions uses static data to create an in-memory spelling dictionary. This initialization will be done at class load time, so creating instances of this class will be inexpensive. Here is the static initialization code with error handling removed for brevity:

    private static Map<String, Integer> wordCounts =
        new HashMap<String, Integer>();
    static {
      // Use Peter Norvig’s training file big.txt:
      // http://www.norvig.com/spell-correct.html
      FileInputStream fstream =
          new FileInputStream("/tmp/big.txt");
      DataInputStream in = new DataInputStream(fstream);
      BufferedReader br =
          new BufferedReader(new InputStreamReader(in));
      String line;
      while ((line = br.readLine()) != null) {
        List<String> words = Tokenizer.wordsToList(line);
        for (String word : words) {
          if (wordCounts.containsKey(word)) {
            Integer count = wordCounts.get(word);
            wordCounts.put(word, count + 1);
          } else {
            wordCounts.put(word, 1);
          }
        }
      }
      in.close();
    }

The class has two static methods that implement the algorithm. The first method, edits, seen in the following listing, is private and returns a list of permutations for a string containing a word. Permutations are created by removing characters, by reversing the order of two adjacent characters, by replacing single characters with all other characters, and by adding all possible letters to each space between characters in the word:

    private static List<String> edits(String word) {
      int wordL = word.length(), wordLm1 = wordL - 1;
      List<String> possible = new ArrayList<String>();
      // drop a character:
      for (int i=0; i < wordL; ++i) {
        possible.add(word.substring(0, i) +
                     word.substring(i+1));
      }
      // reverse order of 2 characters:
      for (int i=0; i < wordLm1; ++i) {
        possible.add(word.substring(0, i) +
                     word.substring(i+1, i+2) +
                     word.substring(i, i+1) +