Open Calais Practical Artificial Intelligence Programming With Java
string xmlns=http:clearforest.com --Use of the Calais Web Service is governed by the Terms
of Service located at http:www.opencalais.com. By using this service or the results of the service you
agree to these terms of service. --
--Relations: Country: France, United States, Spain
Person: Hillary Clinton, Doug Hattaway, Al Gore City: San Francisco
ProvinceOrState: Texas
-- rdf:RDF xmlns:rdf=http:www.w3.org1 ...
xmlns:c=http:s.opencalais.com1pred ...
rdf:type ... ... .
rdf:RDF string
Here we will simply parse out the relations from the comment block. If you want to use Sesame to parse the RDF payload and load it into a local RDF repository
then you can alternatively load the returned Open Calais response by modifying the example code from Chapter 4 using:
StringReader sr = new StringReaderresult; RepositoryConnection connection =
repository.getConnection; connection.addsr, , RDFFormat.RDFXML;
Here are a few code snippets incomplete code: please see the Java source file for more details from the file OpenCalaisClient.java:
public HashtableString, ListString getPropertyNamesAndValuesString text
throws MalformedURLException, IOException { HashtableString, ListString ret =
new HashtableString, ListString; You need an Open Calais license key. The following code sets up the data for a REST
style web service call and opens a connection to the server, makes the request, and retrieves the response in the string variable payload. The Java libraries for handling
178
HTTP connections make it simple to make a architecture style web service call and get the response as a text string:
String licenseID = System.getenvOPEN_CALAIS_KEY; String content = text;
String paramsXML = c:params ... c:params;
StringBuilder sb = new StringBuildercontent.length + 512;
sb.appendlicenseID=.appendlicenseID; sb.appendcontent=.appendcontent;
sb.appendparamsXML=.appendparamsXML; String payload = sb.toString;
URLConnection connection =
new URLhttp:api.opencalais.com .... openConnection;
connection.addRequestPropertyContent-Type, applicationx-www-form-urlencoded;
connection.addRequestPropertyContent-Length, String.valueOfpayload.length;
connection.setDoOutputtrue; OutputStream out = connection.getOutputStream;
OutputStreamWriter writer = new OutputStreamWriterout;
writer.writepayload; writer.flush;
get response from Open Calais server: String result = new Scanner
connection.getInputStream. useDelimiter\\Z.next;
result = result.replaceAlllt;, . replaceAllgt;, ;
The text that we are parsing looks like: Country: France, United States, Spain
Person: Hillary Clinton, Doug Hattaway, Al Gore
so the text response is parsed to extract a list of values for each property name contained in the string variable result:
int index1 = result.indexOfterms of service.--;
179
index1 = result.indexOf--, index1; int index2 = result.indexOf--, index1;
result = result.substringindex1 + 4, index2 - 1 + 1;
String[] lines = result.split\\n; for String line : lines {
int index = line.indexOf:; if index -1 {
String relation = line.substring0, index.trim; String[] entities =
line.substringindex + 1.trim.split,; for int i = 0, size = entities.length;
i size; i++ { entities[i] = entities[i].trim;
} ret.putrelation, Arrays.asListentities;
} }
return ret; }
Again, I want to point out that the above code depends on the format of XML com- ments in the returned XML payload so this code may break in the future and require
modification. Here is an example use of this API:
String content = Hillary Clinton likes to remind Texans that ...;
MapString, ListString results = new OpenCalaisClient.
getPropertyNamesAndValuescontent; for String key : results.keySet {
System.out.println + key + : +
results.getkey; }
In this example the string value assigned to the variable content was about 500 words of text from a news article; the full text can be seen in the example data files.
The output of this example code is:
Person: [Hillary Clinton, Doug Hattaway, Al Gore] Relations: []
City: [San Francisco] Country: [France, United States, Spain]
ProvinceOrState: [Texas]
180
There are several ways that you might want to use named entity identification. One idea is to create a search engine that identifies people, places, and products in search
results and offers users a linked set of documents or web pages that discuss the same people, places, andor products. Another idea is to load the RDF payload returned
by the Open Calais web service calls to an RDF repository and support SPARQL queries. You may also want to modify any content management systems CMS that
you use to add tags for documents maintained in a CMS; using Open Calais you are limited to the types of entities that they extract. This limitation is one reason why
I maintain and support my own system for named entity and classification knowl- edgebooks.com – I like some flexibility in the type of semantic information that I
extract from text data. I covered some of the techniques that I use in my own work in Section 9.2 if you decide to implement your own system to replace or augment
Open Calais.