Suggestions for Further Study

Knowledge Representation and Logic are huge subjects, and I will close out this chapter by recommending a few books that have been the most helpful to me:

• Knowledge Representation by John Sowa. This has always been my favorite reference for knowledge representation, logic, and ontologies.
• Artificial Intelligence, A Modern Approach by Stuart Russell and Peter Norvig. A very good theoretical treatment of logic and knowledge representation.
• The Art of Prolog by Leon Sterling and Ehud Shapiro. Prolog implements a form of predicate logic that is less expressive than the description logics supported by PowerLoom and OWL (Chapter 4). That said, Prolog is very efficient and fairly easy to learn and so is sometimes a better choice. This book is one of my favorite general Prolog references.

The Prolog language is a powerful AI development tool. Both the open source SWI-Prolog and the commercial Amzi Prolog systems have good Java interfaces. I don’t cover Prolog in this book but there are several very good tutorials on the web if you decide to experiment with Prolog.

We will continue in Chapter 4 with our study of logic-based reasoning systems in the context of the Semantic Web.

4 Semantic Web

The Semantic Web is intended to provide a massive linked set of data for use by software systems just as the World Wide Web provides a massive collection of linked web pages for human reading and browsing. The Semantic Web is like the web in that anyone can generate any content that they want. This freedom to publish anything works for the web because we use our ability to understand natural language to interpret what we read, and often to dismiss material that, based upon our own knowledge, we consider to be incorrect.

The core concept for the Semantic Web is integrating and using data from different sources. As we will soon see, the tools for implementing the Semantic Web are designed for encoding data and sharing data from many different sources.

There are several very good Semantic Web toolkits for the Java language and platform. I will use Sesame because it is what I often use in my own work and I believe that it is a good starting technology for your first experiments with Semantic Web technologies. This chapter provides an incomplete coverage of Semantic Web technologies and is intended merely as a gentle introduction to a few useful techniques and how to implement those techniques in Java.

Figure 4.1 shows a layered set of data models that are used to implement Semantic Web applications. To design and implement these applications we need to think in terms of physical models (storage and access of RDF, RDFS, and perhaps OWL data), logical models (how we use RDF and RDFS to define relationships between data represented as unique URIs and string literals, and how we logically combine data from different sources), and conceptual modeling (higher level knowledge representation using OWL).

I am currently writing a separate book, Practical Semantic Web Programming in Java, that goes into much more detail on the use of Sesame, Jena, Protege, OwlApis, RDF/RDFS/OWL modeling, and Description Logic reasoners. This chapter is meant to get you interested in this technology but is not intended as a detailed guide.

• OWL: extends RDFS to allow expression of richer class relationships, cardinality, etc.
• XML: a syntax for tree structured documents
• XML Schema: a language for placing restrictions on XML documents
• RDF: modeling subject, predicate and object links
• RDFS: vocabulary for describing properties and class membership by properties

Figure 4.1: Layers of data models used in implementing Semantic Web applications

4.1 Relational Database Model Has Problems Dealing with Rapidly Changing Data Requirements

When people are first introduced to Semantic Web technologies their first reaction is often something like, “I can just do that with a database.” The relational database model is an efficient way to express and work with slowly changing data models. There are some clever tools for dealing with data change requirements in the database world (ActiveRecord and migrations being a good example) but it is awkward to have end users and even developers tagging on new data attributes to relational database tables.

This same limitation also applies to object oriented programming and object modeling. Even with dynamic languages that facilitate modifying classes at runtime, the options for adding attributes to existing models are just too limiting. The same argument can be made against the use of XML constrained by conformance to either DTDs or XML Schemas. It is true that RDF and RDFS can be serialized to XML using many pre-existing XML namespaces for different knowledge sources and their schemas, but it turns out that this is done in a way that does not reduce the flexibility for extending data models. XML storage is really only a serialization of RDF, and many developers who are just starting to use Semantic Web technologies initially get confused trying to read the XML serialization of RDF. It is almost like trying to read a PDF file with a plain text editor, and is something to be avoided.

A major goal for the rest of this chapter is convincing you that modeling data with RDF and RDFS facilitates freely extending data models and also allows fairly easy integration of data from different sources using different schemas without explicitly converting data from one schema to another for reuse.
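To make the flexibility argument a little more concrete, here is a minimal sketch in plain Java that uses no Semantic Web toolkit at all. It stores facts as (subject, predicate, object) statements, the form that RDF uses, so adding a new kind of attribute means adding another statement instead of changing a table or class definition. The class TripleSketch, the method objectsOf, the example.com URI, and the containsCity predicate are all made up for this illustration, and the predicates are plain strings only to keep the sketch short; a real application would use a toolkit such as Sesame.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a toy in-memory statement store, not an RDF library.
public class TripleSketch {

    // One statement: subject, predicate, and object, all plain strings here.
    public static final class Triple {
        public final String subject;
        public final String predicate;
        public final String object;

        public Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Collect the objects of every statement with the given subject and predicate.
    public static List<String> objectsOf(List<Triple> store, String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : store) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                result.add(t.object);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> store = new ArrayList<>();
        String article = "http://example.com/news/article1"; // hypothetical article URI

        store.add(new Triple(article, "containsPerson", "Bill Clinton"));

        // A new kind of attribute (hypothetical "containsCity") needs no schema
        // change: it is just one more statement about the same subject.
        store.add(new Triple(article, "containsCity", "Chicago"));

        System.out.println(objectsOf(store, article, "containsPerson"));
        System.out.println(objectsOf(store, article, "containsCity"));
    }
}
```

With a relational table or a fixed Java class, the second statement would have required a new column or a new field. Here the store itself never changes shape, which is the property the rest of this chapter relies on.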

4.2 RDF: The Universal Data Format

The Resource Description Framework (RDF) is used to encode information and the RDF Schema (RDFS) facilitates using data with different RDF encodings without the need to convert data formats.

RDF data was originally encoded as XML and intended for automated processing. In this chapter we will use two simple to read formats called “N-Triples” and “N3.” Sesame can be used to convert between all RDF formats so we might as well use formats that are easier to read and understand. RDF data consists of a set of triple values:

• subject
• predicate
• object

Some of my work with Semantic Web technologies deals with processing news stories, extracting semantic information from the text, and storing it in RDF. I will use this application domain for the examples in this chapter. I deal with triples like:

• subject: a URL or URI of a news article
• predicate: a relation like “containsPerson”
• object: a value like “Bill Clinton”

As previously mentioned, we will use either URIs or string literals as values for subjects and objects. We will always use URIs for the values of predicates. In any case URIs are usually preferred to string literals because they are unique. We will see an example of this preferred use but first we need to learn the N-Triples and N3 RDF formats.

In Section 4.1 I proposed the idea that RDF was more flexible than Object Modeling in programming languages, relational databases, and XML with schemas. If we can tag new attributes on the fly to existing data, how do we prevent what I might call “data chaos” as we modify existing data sources? It turns out that the solution to this problem is also the solution for encoding real semantics or meaning with data: we usually use unique URIs for RDF subjects, predicates, and objects, and usually