
The right solution to this problem is to contact the people in charge of security and convince them that your application is important enough, and used by enough people outside the firewall, that the firewall policy should be amended and the firewall selectively breached. It is often more expedient, however, to simply cheat. If the firewall allows web-related traffic, you can use this fact to your advantage: you can bypass the firewall by disguising your RMI-related network traffic to look like web-related network traffic. How? Encode each remote method invocation inside the content of an HTTP POST command. This practice is known as HTTP tunneling.
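To make the idea concrete, here is a minimal sketch of wrapping an encoded method call inside an HTTP POST message. It is only an illustration under stated assumptions: the class TunnelSketch, its method names, and the payload layout (a method name followed by a serialized argument array) are hypothetical, and this is not the wire format that RMI itself uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of RMI.
class TunnelSketch {
    // Encodes a method name and its arguments as bytes (an assumed, made-up layout).
    static byte[] encodeCall(String methodName, Object[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeUTF(methodName);
            out.writeObject(args);   // every argument must be Serializable
        }
        return buffer.toByteArray();
    }

    // Wraps the payload in an HTTP POST so that it looks like ordinary web traffic.
    static byte[] wrapInPost(String host, String path, byte[] payload) throws IOException {
        String headers = "POST " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "Content-Length: " + payload.length + "\r\n"
                + "\r\n";
        ByteArrayOutputStream message = new ByteArrayOutputStream();
        message.write(headers.getBytes(StandardCharsets.US_ASCII));
        message.write(payload);      // the disguised method invocation is the POST body
        return message.toByteArray();
    }
}
```

A firewall that inspects only the request line and headers sees an ordinary POST; the receiving side would have to strip the headers and decode the body to recover the call.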

22.2 CGI and Dynamic Content

We've already discussed HTTP a little in previous chapters. For example, in Chapter 19, when we discussed how classes are actually vended over a network, we discussed the basic structure of an HTTP message. We will expand on those discussions a little bit here and talk about how web servers have evolved over the past 10 years.

HTTP is a communications protocol that was designed for the World Wide Web. It was designed to facilitate the retrieval of static pages of information. That is, the protocol assumed that the pages being sent already existed and were merely being retrieved from a server. A URL such as http://www.salon.com/ent/tv/temptation/index.html meant, "Go to the web server running on the machine named www.salon.com and fetch me the contents of the file to which /ent/tv/temptation/index.html refers." This is certainly what happens with our dynamic class server. When the RMI runtime requests the bytecode for a class, all our class server does is send the contents of a .class file back over the network.
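The static-page model amounts to mapping the path of a URL onto a file beneath some directory. The following is a rough sketch of that mapping, not the code of any particular web server; the class name StaticMapping and the notion of a single "document root" directory are assumptions made for illustration.

```java
import java.net.URI;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
class StaticMapping {
    // Maps the path component of a URL onto a file beneath a document root.
    static Path resolve(Path documentRoot, String url) {
        Path root = documentRoot.normalize();
        String urlPath = URI.create(url).getPath();
        String relative = urlPath.startsWith("/") ? urlPath.substring(1) : urlPath;
        Path candidate = root.resolve(relative).normalize();
        // Refuse paths such as "/../etc/passwd" that would leave the document root.
        if (!candidate.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes the document root: " + urlPath);
        }
        return candidate;
    }
}
```

A static server would then simply read the resulting file and send its bytes back as the body of the response.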

22.2.1 The Common Gateway Interface

The next step in the evolution of the Web was the invention of dynamic pages, which really boiled down to the realization that the client didn't care whether the information being returned already existed. All a web browser needs is a response; whether the web server reads the response from a file (commonly referred to as returning a static page) or somehow generates the page on the fly (returning a dynamic page) is irrelevant to the browser. This means that a web server can run a program to generate an HTML page on the fly, instead of simply returning the contents of a file, without the browser ever noticing.

The first mechanism for creating dynamic pages was the Common Gateway Interface (CGI, or CGI-BIN). In this system, and in all subsequent mechanisms for generating dynamic pages, requests are still specified to the web server using URLs. The difference is that, instead of mapping the path of the URL to a file, the web server is configured to forward the request to a second application. This second application returns an HTTP response to the web server, and this response is then relayed to the application that made the original request.

For example, Figure 22-2 shows a snapshot of the configuration applet for Sun's JavaWebServer.[2] In it, we see that a number of different URL patterns are mapped to various programs that will dynamically generate pages. For example:

• Any request with a path ending in .jhtml will be forwarded to the pageCompile servlet.
• All requests with the path /cgi-bin/java-rmi.cgi will be forwarded to the HTTP Tunneler servlet.

Figure 22-2. Sun's JavaWebServer

[2] The use of JavaWebServer for examples in no way implies an endorsement of JWS for real-world applications. For these applications, the combination of the Apache web server and the Tomcat servlet engine is a much better choice.
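The forwarding step can be pictured as a lookup table from URL patterns to handler names. The sketch below is a simplified illustration, not JavaWebServer's actual configuration mechanism: the Dispatcher class, the "*" suffix-pattern syntax, and the "static-file" fallback name are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical dispatcher, for illustration only.
class Dispatcher {
    // Configuration table: URL pattern -> name of the program that handles it.
    private final Map<String, String> mappings = new LinkedHashMap<>();

    void map(String pattern, String handlerName) {
        mappings.put(pattern, handlerName);
    }

    // Returns the handler for a request path, or "static-file" if no pattern matches.
    String handlerFor(String path) {
        for (Map.Entry<String, String> entry : mappings.entrySet()) {
            String pattern = entry.getKey();
            boolean matches = pattern.startsWith("*")
                    ? path.endsWith(pattern.substring(1))   // suffix pattern such as "*.jhtml"
                    : path.equals(pattern);                 // exact path
            if (matches) {
                return entry.getValue();
            }
        }
        return "static-file";
    }
}
```

With the two mappings listed above registered as "*.jhtml" and "/cgi-bin/java-rmi.cgi", a request for a .jhtml page would go to pageCompile, a tunneled RMI call would go to the HTTP Tunneler, and everything else would fall through to ordinary file retrieval.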

22.2.2 Servlets

The development of CGI, and the encoding of HTTP request information so that it can be passed to an external process, were major steps forward for dynamic web pages. However, the initial implementations of CGI performed badly. For example, in 1996, the most common CGI implementation was:

• Create a new process for every request.
• Set a special set of environment variables corresponding to the request parameters, and then invoke the appropriate request-handling program as defined in the web server configuration.
• Whatever the request-handling program writes to standard output is the response that should be sent back to the initiating browser.

This way of writing and installing programs to generate dynamic content is flexible and easy to implement. Moreover, it naturally lends itself to scripting languages such as Perl. However, naively using such an implementation of CGI leads to problems. Among them are:

• Performance problems: Forking off a separate process, or even creating a new thread, for every request is expensive.
• Language problems: There's a reason that programmers have gradually converted over to well-typed, object-oriented languages. Blithely ignoring the past 30 years of experience and reverting to scripting languages is, quite probably, a mistake.
• Compatibility problems: Ideally, you'd like your dynamic page-generation programs to be part of the same codebase as, and reuse components and libraries from, your main enterprise applications.

Javasoft created the Servlet API to solve these problems.[3] As of this writing, the Servlet API is an impressive document; it is a mature specification that weighs in at over 300 pages. Fortunately, understanding the basic implementation of HTTP tunneling in RMI doesn't require a thorough mastery of the specification. Instead, the following six paragraphs are sufficient.

[3] Javasoft probably had other reasons as well. But these are the important ones from a technical viewpoint.

The heart of the Servlet specification is the definition of the abstract class HttpServlet in the javax.servlet.http package. A servlet class extends HttpServlet and adds request-specific functionality. The servlet class corresponds, more or less, to a program in the CGI specification. That is, the HttpServlet class defines a way for the web server (or, more precisely, the servlet runner) to pass HTTP requests to an instance of the servlet class and then receive responses from the same instance.

Servlets are instantiated by a servlet runner. The servlet runner is responsible for managing the lifecycle of specific servlet instances and for maintaining connections with the web server. It usually exists in a separate process from the web server's.

The most important methods in the HttpServlet class are:

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)

These two methods, which correspond to the HTTP GET and POST commands, have trivial implementations in HttpServlet. As written, they do no useful work; the default implementations simply report to the client that the method is not supported. However, almost every servlet class overrides both of these methods, and very few servlet classes override any other methods defined in HttpServlet.

The final point worth noticing is that the servlet specification defines a whole set of abstractions related to HTTP. The two most important data objects are HttpServletRequest, which encapsulates an incoming HTTP request, and HttpServletResponse, which encapsulates the response that will be sent to the HTTP client.

It's interesting to note the layers of abstraction. Network programming starts with IP; TCP and UDP are layered on top of it. The next layer consists of the basic sockets library, as implemented by the operating system. Above that is the java.net package, which contains a set of classes that define an object-oriented interface for sockets.
The servlet specification then adds another layer of abstraction; from one instance of Socket, we get an instance of HttpServletRequest and an instance of HttpServletResponse. The instance of HttpServletRequest is a wrapper around the socket's input stream, and the instance of HttpServletResponse is a wrapper around the socket's output stream.

If you want to learn more about servlets, Java Servlet Programming, Second Edition by Jason Hunter (O'Reilly) is a good place to start. I also highly recommend downloading and reading the latest version of the servlet specification from Javasoft's servlet pages (see http://www.javasoft.com/products/servlet/index.html).
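As a rough illustration of the wrapping described above, the following sketch builds a request object from an input stream and a response object from an output stream. The classes SimpleRequest and SimpleResponse are hypothetical stand-ins invented for this example; they are far simpler than HttpServletRequest and HttpServletResponse and are not part of the Servlet API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the role HttpServletRequest plays: it wraps an input stream.
class SimpleRequest {
    final String method;
    final String path;

    SimpleRequest(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
        String requestLine = reader.readLine();   // e.g. "POST /some/path HTTP/1.0"
        if (requestLine == null) {
            throw new IOException("Empty request");
        }
        String[] parts = requestLine.split(" ");
        if (parts.length < 2) {
            throw new IOException("Malformed request line: " + requestLine);
        }
        method = parts[0];
        path = parts[1];
    }
}

// Hypothetical stand-in for the role HttpServletResponse plays: it wraps an output stream.
class SimpleResponse {
    private final PrintWriter writer;

    SimpleResponse(OutputStream out) {
        writer = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.US_ASCII));
    }

    // Writes a minimal plain-text HTTP response and flushes it to the underlying stream.
    void send(String body) {
        int length = body.getBytes(StandardCharsets.US_ASCII).length;
        writer.print("HTTP/1.0 200 OK\r\n");
        writer.print("Content-Type: text/plain\r\n");
        writer.print("Content-Length: " + length + "\r\n\r\n");
        writer.print(body);
        writer.flush();
    }
}
```

With a real connection, the two streams would come from the getInputStream() and getOutputStream() methods of a java.net.Socket; using plain streams here keeps the sketch easy to exercise without a network.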

22.3 HTTP Tunneling