The right solution to this problem is to contact the people in charge of security and convince them that your application is important enough, and used by enough people outside the firewall, that
the firewall policy should be amended and the firewall selectively breached.
It is often more expedient, however, to simply cheat. If the firewall allows web-related traffic, then you can use this fact to your advantage. You can bypass the firewall by disguising your RMI-related network traffic to look like web-related network traffic. How? Encode each remote method invocation inside the content of an HTTP POST command. This practice is known as HTTP tunneling.
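To make the idea concrete, here is a minimal sketch of the wrapping step: an opaque block of bytes is placed in the body of an HTTP POST request. It is not RMI's actual wire format. The class name, the host name, and the placeholder payload are assumptions made for illustration; the path is the java-rmi.cgi path that appears later in this chapter.

```java
import java.nio.charset.StandardCharsets;

class HttpTunnelSketch {

    /** Wraps an opaque payload in the body of an HTTP/1.0 POST request. */
    static byte[] wrapInPost(String host, String path, byte[] payload) {
        // Request line and headers; the blank line separates headers from body.
        String head = "POST " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "Content-Length: " + payload.length + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);

        // Concatenate headers and payload into a single request.
        byte[] request = new byte[headBytes.length + payload.length];
        System.arraycopy(headBytes, 0, request, 0, headBytes.length);
        System.arraycopy(payload, 0, request, headBytes.length, payload.length);
        return request;
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a marshalled call (not real RMI data).
        byte[] marshalledCall = {0x4a, 0x52, 0x4d, 0x49};
        byte[] request = wrapInPost("server.example.com", "/cgi-bin/java-rmi.cgi", marshalledCall);
        System.out.println(request.length + " bytes in total, "
                + marshalledCall.length + " of them payload");
    }
}
```

To a firewall that inspects only the protocol, a request built this way looks like ordinary web traffic. The receiving side has to unwrap the body and hand it to the real server.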
22.2 CGI and Dynamic Content
We've already discussed HTTP a little in previous chapters. For example, in Chapter 19, when we discussed how classes are actually vended over a network, we discussed the basic structure of an HTTP message. We will expand on those discussions a little bit here and talk about how web servers have evolved over the past 10 years.
HTTP is a communications protocol that was designed for the World Wide Web. It was designed to facilitate the retrieval of static pages of information. That is, the protocol assumed that the pages being sent already existed and were merely being retrieved from a server. A URL such as http://www.salon.com/ent/tv/temptation/index.html meant, "Go to the web server running on the machine named www.salon.com and fetch me the contents of the file to which /ent/tv/temptation/index.html refers." This is certainly what happens with our dynamic class server. When the RMI runtime requests the bytecode for a class, all our class server does is send the contents of a .class file back over the network.
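The static case can be sketched in a few lines: take the path part of the URL, map it onto a file beneath some document root, and return the bytes. This is an illustration rather than the class server from the earlier chapter. The class name, the method names, and the check that rejects paths escaping the root are assumptions of this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StaticFileSketch {

    /** Maps the path part of a URL onto a file beneath the document root. */
    static Path resolve(Path documentRoot, String urlPath) {
        String relative = urlPath.startsWith("/") ? urlPath.substring(1) : urlPath;
        Path root = documentRoot.toAbsolutePath().normalize();
        Path target = root.resolve(relative).normalize();
        // Refuse paths such as /../secret that would leave the document root.
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes the document root: " + urlPath);
        }
        return target;
    }

    /** Returns the contents of the file that the URL path refers to. */
    static byte[] fetch(Path documentRoot, String urlPath) throws IOException {
        return Files.readAllBytes(resolve(documentRoot, urlPath));
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway document root containing one static page.
        Path root = Files.createTempDirectory("docroot");
        Path page = root.resolve("ent/tv/temptation/index.html");
        Files.createDirectories(page.getParent());
        Files.write(page, "<html>static page</html>".getBytes(StandardCharsets.UTF_8));

        byte[] body = fetch(root, "/ent/tv/temptation/index.html");
        System.out.println(new String(body, StandardCharsets.UTF_8));
    }
}
```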
22.2.1 The Common Gateway Interface
The next step in the evolution of the Web was the invention of dynamic pages. The invention of dynamic pages really boiled down to the realization that the client didn't care whether the information being returned already existed. All a web browser needs is a response; whether the web server read the response from a file (commonly referred to as returning a static page) or is somehow generating the page on the fly (returning a dynamic page) is irrelevant to a web browser. This means that a web server can run a program to generate an HTML page on the fly (instead of simply returning the contents of a file) without the browser ever noticing.
The first mechanism for creating dynamic pages was the Common Gateway Interface (CGI or CGI-BIN). In this system, and in all subsequent mechanisms for generating dynamic pages, requests are still specified to the web server using URLs. The difference is that, instead of mapping the path of the URL to a file, the web server is configured to forward the request to a second application. This second application returns an HTTP response to the web server, and this response is then relayed to the application that made the original request.
For example, Figure 22-2 shows a snapshot of the configuration applet for Sun's JavaWebServer.[2] In it, we see that a number of different URL patterns are mapped to various programs that will dynamically generate pages. For example:
• Any request whose path ends in .jhtml will be forwarded to the pageCompile servlet.
• All requests with the path /cgi-bin/java-rmi.cgi will be forwarded to the HTTP Tunneler servlet.
[2] The use of JavaWebServer for examples in no way implies an endorsement of JWS for real-world applications. For these applications, the combination of the Apache web server and the Tomcat servlet engine is a much better choice.
Figure 22-2. Sun's JavaWebServer
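The two mappings listed above can be modeled as a small lookup table. The sketch below is only an illustration of the idea. It does not reflect how JavaWebServer stores its configuration, and the class and method names are hypothetical. The handler names are simply the labels from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UrlMappingSketch {
    private final Map<String, String> exactPaths = new LinkedHashMap<>();
    private final Map<String, String> suffixes = new LinkedHashMap<>();

    void mapExact(String path, String handler) {
        exactPaths.put(path, handler);
    }

    void mapSuffix(String suffix, String handler) {
        suffixes.put(suffix, handler);
    }

    /** Returns the handler for a request path, or null to serve a static file. */
    String handlerFor(String path) {
        String exact = exactPaths.get(path);
        if (exact != null) {
            return exact;
        }
        // Suffix rules are tried in the order in which they were registered.
        for (Map.Entry<String, String> rule : suffixes.entrySet()) {
            if (path.endsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UrlMappingSketch mappings = new UrlMappingSketch();
        mappings.mapSuffix(".jhtml", "pageCompile");
        mappings.mapExact("/cgi-bin/java-rmi.cgi", "HTTP Tunneler");

        System.out.println(mappings.handlerFor("/reports/summary.jhtml"));
        System.out.println(mappings.handlerFor("/cgi-bin/java-rmi.cgi"));
        System.out.println(mappings.handlerFor("/index.html"));
    }
}
```

A path that matches no rule yields null, which a server would treat as a request for an ordinary static file.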
22.2.2 Servlets
The development of CGI, and the encoding of HTTP request information so that it can be passed to an external process, were major steps forward for dynamic web pages. However, the initial
implementations of CGI performed badly. For example, in 1996, the most common CGI implementation was:
Create a new process for every request. Set a special set of environment variables corresponding to the request parameters and then invoke the
appropriate request-handling program as defined in the web server configuration. Whatever the request-handling program writes to standard-out is the response
that should be sent back to the initiating browser.
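The process-per-request model just described can be sketched with ProcessBuilder. This is a simplified illustration, not the code of any particular web server. REQUEST_METHOD and QUERY_STRING are standard CGI variable names, but the class name, the method names, and the choice to pass only those two variables are assumptions of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class CgiLauncherSketch {

    /** Describes the request to the handler program through environment variables. */
    static ProcessBuilder prepare(List<String> command, String method, String queryString) {
        ProcessBuilder builder = new ProcessBuilder(command);
        Map<String, String> env = builder.environment();
        env.put("REQUEST_METHOD", method);
        env.put("QUERY_STRING", queryString);
        // Let the handler's error output go to the server's own stderr.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        return builder;
    }

    /** Starts one new process and returns whatever it wrote to standard out. */
    static byte[] run(ProcessBuilder builder) throws IOException, InterruptedException {
        Process process = builder.start();
        process.getOutputStream().close(); // this sketch sends no request body
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        try (InputStream stdout = process.getInputStream()) {
            byte[] buffer = new byte[4096];
            int count;
            while ((count = stdout.read(buffer)) != -1) {
                response.write(buffer, 0, count);
            }
        }
        process.waitFor();
        return response.toByteArray();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: CgiLauncherSketch <handler-program> [args...]");
            return;
        }
        byte[] response = run(prepare(Arrays.asList(args), "GET", "name=value"));
        System.out.write(response, 0, response.length);
        System.out.flush();
    }
}
```

Every call to run pays the full cost of starting a process.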
This way of writing and installing programs to generate dynamic content is flexible and easy to implement. Moreover, it naturally lends itself to scripting languages such as Perl. However, naively using such an implementation of CGI leads to problems. Among them are:
Performance problems
    Forking off a separate process, or even creating a new thread for every request, is expensive.
Language problems
    There's a reason that programmers have gradually converted over to well-typed, object-oriented languages. Blithely ignoring the past 30 years of experience and reverting to scripting languages is, quite probably, a mistake.
Compatibility problems
    Ideally, you'd like your dynamic page-generation programs to be part of the same codebase as, and reuse components and libraries from, your main enterprise applications.
Javasoft created the Servlet API to solve these problems.[3] As of this writing, the Servlet API is an impressive document; it is a mature specification that weighs in at over 300 pages. Fortunately, understanding the basic implementation of HTTP tunneling in RMI doesn't require a thorough mastery of the specification. Instead, the following six paragraphs are sufficient.
[3] Javasoft probably had other reasons as well. But these are the important ones from a technical viewpoint.
The heart of the Servlet specification is the definition of the abstract class HttpServlet in the javax.servlet.http package. A servlet class extends HttpServlet and adds request-specific functionality. The servlet class corresponds, more or less, to a program in the CGI specification. That is, the HttpServlet class defines a way for the web server (or, more precisely, the servlet runner) to pass HTTP requests to an instance of the servlet class and then receive responses from the same instance.
Servlets are instantiated by a servlet runner. The servlet runner is responsible for managing the lifecycle of specific servlet instances and for maintaining connections with the web server. It usually runs in a separate process from the web server.
The most important methods in the HttpServlet class are:
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
These two methods, which correspond to the HTTP GET and POST commands, have trivial implementations in HttpServlet. As written, they do nothing useful; the default implementations simply report that the method is not supported. However, almost every servlet class overrides both of these methods, and very few servlet classes override any other methods defined in HttpServlet.
The final point worth noticing is that the servlet specification defines a whole set of abstractions related to HTTP. The two most important data objects are HttpServletRequest, which encapsulates an incoming HTTP request, and HttpServletResponse, which encapsulates the response that will be sent to the HTTP client.
It's interesting to note the layers of abstraction. Network programming starts with IP, and TCP is layered on top. The next layer consists of the basic sockets library, as implemented by the operating system. Above that is the java.net package, which contains a set of classes that define an object-oriented interface for sockets. The servlet specification then adds another layer of abstraction; from one instance of Socket, we get an instance of HttpServletRequest and an instance of HttpServletResponse. The instance of HttpServletRequest is a wrapper around the socket's input stream, and the instance of HttpServletResponse is a wrapper around the socket's output stream.
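The layering can be illustrated without the servlet library at all. The sketch below uses hypothetical stand-in types (MiniRequest, MiniResponse, MiniServlet, and HelloServlet) that only imitate the shape of the real classes; they are not part of javax.servlet. A request object wraps an input stream, a response object wraps an output stream, and a base class dispatches to doGet or doPost. In-memory streams stand in for a socket's streams, and the default handlers answer with a 405 status.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ServletLayersSketch {

    /** Stand-in for HttpServletRequest: a parsed view of the input stream. */
    static final class MiniRequest {
        final String method;
        final String path;

        MiniRequest(String method, String path) {
            this.method = method;
            this.path = path;
        }

        static MiniRequest parse(InputStream in) throws IOException {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.ISO_8859_1));
            String requestLine = reader.readLine(); // e.g. "GET /hello HTTP/1.0"
            String[] parts = requestLine == null ? new String[0] : requestLine.split(" ");
            if (parts.length < 2) {
                throw new IOException("Malformed request line: " + requestLine);
            }
            return new MiniRequest(parts[0], parts[1]);
        }
    }

    /** Stand-in for HttpServletResponse: writes to the output stream. */
    static final class MiniResponse {
        private final OutputStream out;

        MiniResponse(OutputStream out) {
            this.out = out;
        }

        void send(int status, String reason, String body) throws IOException {
            byte[] content = body.getBytes(StandardCharsets.UTF_8);
            String head = "HTTP/1.0 " + status + " " + reason + "\r\n"
                    + "Content-Type: text/plain; charset=UTF-8\r\n"
                    + "Content-Length: " + content.length + "\r\n\r\n";
            out.write(head.getBytes(StandardCharsets.US_ASCII));
            out.write(content);
            out.flush();
        }
    }

    /** Stand-in for HttpServlet: subclasses override the handlers they need. */
    abstract static class MiniServlet {
        protected void doGet(MiniRequest req, MiniResponse resp) throws IOException {
            resp.send(405, "Method Not Allowed", "GET is not supported\n");
        }

        protected void doPost(MiniRequest req, MiniResponse resp) throws IOException {
            resp.send(405, "Method Not Allowed", "POST is not supported\n");
        }

        final void service(MiniRequest req, MiniResponse resp) throws IOException {
            if ("GET".equals(req.method)) {
                doGet(req, resp);
            } else if ("POST".equals(req.method)) {
                doPost(req, resp);
            } else {
                resp.send(501, "Not Implemented", req.method + " is not handled\n");
            }
        }
    }

    /** A servlet that overrides only doGet. */
    static final class HelloServlet extends MiniServlet {
        @Override
        protected void doGet(MiniRequest req, MiniResponse resp) throws IOException {
            resp.send(200, "OK", "Hello from " + req.path + "\n");
        }
    }

    /** The runner's job in miniature: wrap both streams, then dispatch. */
    static void handle(InputStream in, OutputStream out, MiniServlet servlet) {
        try {
            servlet.service(MiniRequest.parse(in), new MiniResponse(out));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "GET /hello HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        handle(new ByteArrayInputStream(raw), out, new HelloServlet());
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

With a real Socket, the two arguments to handle would be the results of getInputStream() and getOutputStream().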
If you want to learn more about servlets, Java Servlet Programming, Second Edition by Jason Hunter (O'Reilly) is a good place to start. I also highly recommend downloading and reading the latest version of the servlet specification from Javasoft's servlet pages (see http://www.javasoft.com/products/servlet/index.html).
22.3 HTTP Tunneling