Tell me if I'm going crazy or if there is really much difference between a CGI program and a servlet. When some people speak of CGI just the tone of their voices seems to say "that's old, inefficient and we should never use CGI" and "servlets are better, more robust, more capable and will save the world".
I've studied the CGI environment and the servlet spec and have written programs to both interfaces and while there are certain differences, there's nothing in the Java servlet spec that makes it radically different from the CGI model, and in comparison to a slightly more advanced environment, that of Apache mod_perl, the differences between programming in Java and CGI programming in Perl shrink to nearly nothing. The areas in question are:
3) error handling,
4) session management, and
5) access to and ability to extend the server itself.
servlets. It's just PERL servlets instead of Java servlets.
The whole point of servlets is to integrate your application code into the
web server itself whence the name servlets, i.e. little pieces of server code.
CGI, on the other hand, was originally created as an interface to
applications and that is still primarily how CGI is used today.
I'm missing the point of this diatribe...
Please folks, if we follow the name Common Gateway Interface then the Java Servlet Spec, at least the HTTP part of it, just names and describes the data that passes across this interface, and thus is the same as used in what most call CGI programming. On the other hand the Servlet Spec implements this typically with threads and much more efficient passing of the information for processing, and many other systems do so as well. There are two things that are called CGI quite often, the interface has not changed much for years, and the implementation changes quite frequently.
Of course a CGI can be written in any language you choose, even Java. All the issues raised will still apply. This paper will only focus on Java and Perl because they are full-fledged programming languages with wide support.
There seems to be little difference between the servlet and CGI model from the web client side. In both cases, the client connects to the web server, sends an HTTP POST or GET request with associated encoded arguments. The server sends back a response as HTML with the same HTTP headers as would be returned by the results of a CGI handler. The transaction is basically stateless per the HTTP protocol unless the client and server implement keep-alive, but nevertheless each request/response is independent from the previous and state must be kept through cookies or other mechanism shared by the client and server.
A servlet must extend the javax.servlet.http.H
ttpServlet class by at least defining implementations of one of doGet, doPut, or other do<method> operations, to handle HTTP requests. Any of those methods has for its parameters an H
ttpServletResponse and an H
ttpServletRequest has methods to get information about the request from the browser, e.g. getQueryString() getRequestURI(), which correspond to the CGI variables (this correspondence is even mentioned in the servlet API docs).
ttpServletResponse class has methods setting the various HTTP headers, setting the Status of the responses, and encoding the response URL.
There are also utility methods in HttpUtils?
for parsing the Query String or Post data, which return hashtables of key-value pairs, and a Cookie object for creating and manipulating cookies.
Servlets can access additional services through the ServletContext?
Interface. These include methods to write to the log and get info about the server, such as the software version and the virtual host. These services are not normally available to CGIs.
There is also a lifecycle for each servlet instance, and is implemented by the servlet writer with the init() and destroy() methods. These allow the servlet to establish and gracefully remove references to any resources that servlet needs, such as database connections, open files, etc.
Let's get specific with definitions. According to the specification, "The Common Gateway Interface (CGI) is a standard for interfacing external applications with information servers, such as HTTP or Web servers." In fact, nothing in the CGI 1.1 spec defines a programming interface at all. I'll make a leap of faith here and say that when most people say CGI, they are talking about a program that is not part of the Web server, but is executed by the server in response to a specific URL request with certain information about the Web server that executed the program and about the request that it is expected to handle. The "standard input" and "standard output" of the CGI program are connected up through the client connection. The term CGI Request refers to the specific connection from the browser to the URL of the CGI.
Outside the specs, there are a few differences. Normally, CGI programs are run in a different process context than the server which starts them. This may mean a costly fork()/exec() pair. If you write your CGI in C and compile it to native code, the startup is slightly different from a Perl or sh CGI because there is no compilation or interpretation. Generally the CGI does not have any access to the server's configuration. For servlets, the intent is that it will run in the same process space, share resources, and possibly be multithreaded.
In the usual model, each time an HTTP request calls a CGI the server will startup and initialize the entire state of the CGI program. However, for servlets, the init() method is only called once. This improves performance because such costly operations as opening database connections need to be done only once for many HTTP requests to that servlet. Given the stateless nature of the original CGI spec and HTTP, any application state must be handled by some workaround, commonly cookies or appending state info to the URL.
Perhaps surprisingly to those who think of servlets as a magic solution servlets introduce no new technology for session management
. Aside from a class (HttpSession
) in the servlet API, it simply is a useful service to the servlet writer to hide details of cookie handling. Given the services available even to ordinary Perl CGI programs with modules such as CGI.pm, you can also count on having access to some session-management assistance, and you don't need mod_perl for that.
Exception and error handling are also touted as a benefit of servlets. At its most primitive a CGI is only going to result in a program failure and a 500 error to the user's browser. With Java servlets you get exception handling, so even if you application completely fails you still return something to the user; however that info is likely to be cryptic to most users who don't know what a stack trace is [Of course a servlet can catch the exception and write a suitable error to the client
]. One of the benefits of mod_perl is its ability to trap errors and hook into the Apache server's handlers, so that you should be able to return useful info to the browser in a robustly written application.
What else? Oh yeah, actual output the the client. The simplest thing is to print out raw HTML to the request object's writer. It's also possible to use templates or to have on-the-fly page compilation with servlet code embedded in the HTML pages. Servlets can also make connections directly to applets and pass objects directly to the applet (or act as a proxy for the applet to backend services).
Java servlets are remarkably portable, and come closer to the write once, run anywhere hype, mostly because they don't do any complicated user interface handling. But let's be honest, Perl runs on far more platforms than any Java VM [but the important platforms are well supported.
And So On
Now, lets step back a minute and think about something I'm sure has come to mind in reading this: mod_perl. In mod_perl wit Apache, you get essentially the same process model as described above. You get many of the services of the HttpRequest?
objects in the servlet API, and modules like CGI.pm give you state management and cookie handling. You also get access to the server's state through the Apache module. But actually with mod_perl, you get something you don't with servlets, the ability to actually extend the functionality of the server within Perl programs. In limited ways, you can extend the server with servlets, but you can't hook directly into the server's behavior. Yes, there is a mod_java for Apache coming soon, but it doesn't depend on the servlet API. Actually, mod_jserv for now is dependent on *external* JDK binaries, so you still get the separate process model. On the fly page compilation? Extensions like ePerl do that for you also.
So, that's my analysis. My question is, am I missing something? Is there anything about servlets that makes them the greatest thing in web services since HTTP/1.0? Isn't it correct for me to say I (and you) are writing Perl servlets? Performance? With mod_perl you're sharing process space, and with mod_jserv you're not, though other implementations might, so that's a wash. Portability? Hmm, how many platforms support Perl vs Java? Server context available to the servlet writer? The servlet API doesn't define any. Services available to the servlet? Well, java does define some standard things, but I'm sure there are Perl modules to duplicate any of that - DB connections, math functions, whatever.
- http://perl.apache.org/ The Apache/Perl Integration Project With mod_perl it is possible to write Apache modules entirely in Perl. In addition, the persistent interpreter embedded in the server avoids the overhead of starting an external interpreter and the penalty of Perl start-up time.
Surely one of the advantages of a Java-based solution is the built-in support for threads and their synchronization. I'm thinking here about the ease of synchronizing business logic that might be run in the CGI/servlet. (Take the wiki locking mechanism for example.) How can you deal with threads in Perl? - has that been sorted yet? Presumably people will use Unix style semaphores to synchronize separate processes.
Perl 5.005 now has threads. It's still developmental, though. http://language.perl.com/doc/manual/html/lib/Thread.html -- StevenNewton
It's also conceivable that some people will want to run a CGI/servlet developed by a third party within a defined security policy.
Jakarta's Tomcat has support for the Java SecurityManager
. See http://jakarta.apache.org/tomcat/jakarta-tomcat/src/doc/uguide/tomcat-security.html
. -- MattBehrens
On advantage of CGIs that may be worth mentioning: If you don't control the web server (e.g., if you're using a shared server), chances are good that either (a) the HTTP server doesn't support servlets, and/or (b) you might be restricted to using CGI scripts. This discussion implicitly assumes control over the web server. Given how easy it is to set up Linux/Apache, this is getting to be a safer assumption.
One advantage of servlets (either Java or mod_perl-based scripts) is that they let you take the hit for expensive application startup once, rather than once per CGI fork/exec. For example, a Perl script that uses DBM can suffer a startup cost that is roughly linear with the size of the database. There are ways around this, including migrating "business logic" and persistence logic into a separate application server.
I don't know if this applies to servlets, but mod_perl-based scripts require stopping and restarting the server to update the scripts. This can slow development to the point where it is often faster to develop and debug using CGIs, then convert to mod_perl once the scripts are stable. -- DaveSmith
I can't say what it is like in other environments, but we had a 60,000 line total mod_perl based application server, and we started and stopped our personal test servers and the integration server all the time. In fact, I had a one letter alias to do just that. It didn't seem to slow development at all...
I believe that most (all?) servlet-enabled servers and server add-ons detect that a servlet has been changed and reload it. They generally do not detect changes to classes used by the servlet
. That can be worked around by 'touching' the servlet class file when you change a supporting class. -- KielHodges
JavaApache? will reload your servlet if any of the classes it depends upon change, as long as that class is within the user classpath rather than the system classpath. If things are configured properly there should be no need to touch the servlet class.
The biggest win I've seen with servlets is code reuse. I have a pretty well-developed class library, focused on content management but also including nice things like user accounts, and by using servlets I've been able to reuse that code. I have a servlet, a GUI program, and a "batch" program that the same set of business objects; I've used the same User class for four sites. In theory Perl is as flexible, but it's not as easy to sell. -- RobCrawford
The most recent versions of mod_perl will also detect when the program has changed on disk and reload it. Like Java servlets, it will not detect modules used by the program, though the latest version of mod_perl includes support for StatINC, which will force the reload of all changed modules, at the cost of performance. DaveSmith
mentions the expensive application startup issue, which I mention though perhaps not clearly enough under Performance
. He also makes a good point about the situation where you do not control the server, but if that's true you might even end up running on NT/IIS.
As far as security, there is of course taint checking, and there are numerous ways to add security to your Perl scripts, including the Safe module. What else do you mean by security policy?
I'm wondering about servlet synchronization. It seems to me that once a web-based application climbs past a certain point in the complexity curve, Java-based synchronization isn't sufficient, because agents outside of the process space of the server will need synchronized access to the application's persistent data. For this to happen safely, a synchronization method external to the server/servlets is required. (For example, file locking).
It seems prudent to isolate synchronization mechanisms, rather than leaving them strewn throughout an application where some future maintainer might overlook one. So might it not be better, once there are external agents accessing persistent data, to move synchronization out of the servlet portion of the application? Say, into a separate application server? -- DaveSmith
I think it's really a matter of choosing a language to develop in: Java is perhaps a bit more robust, Perl better for glue, PHP3 for ease of use and rapid development. ZopeApplicationServer
) is worth a look if you're dissatisfied with both servlets and Perl -- it's like an ORB/ODB binding to the web. -- MartinPool
One thing to remember is that CGI doesn't necessarily imply perl. I've written perfectly acceptable CGI code in sh, awk, C and C++. In many cases these are simpler to maintain and quicker to start up than perl, especially on the poor little SparcStation?
LX which I use as a web server here. From my experience, though, for equivalent complexity, Java servlets are much quicker in use than perl CGI.
Yes, traditional CGI, in any language, implies a fork()/exec() for every request. Hence the focus on mod_perl (Anybody wanna do a mod_bash?). If you are writing in C you can go both CGI and servlet one better and hook into your server's extension API
Another point against perl is that, despite the hype, you do really need root access to set up and maintain a perl system. If the host you are using lacks a certain module it can be a real pain to get it installed. With Java servlets, once the basic servlet engine is installed you can put any classes you like where you put your servlets and you've got access to those features. No root access or server configuration required.
No root access is needed for the knowledgeable Perl programmer. See CPAN.pm and the @INC variable or the use lib directive.
One point against servlets, though, is the immaturity of the servlet support in current servlet engines. Key features like automatic support for user directories (eg. http://server/~frank/
...) are either missing or virtually impossible to configure, so servlet engines require more expensive and skilled setup and maintenance, thus increasing the cost of servlet support by web providers, and thus decreasing the take-up by cost conscious customers. -- FrankCarver
The goal of the Servlet API that it will HIDE the dependencies on your particular Server API. That way you won't have to learn ISAPI, NSAPI, etc., and will be reasonably guaranteed that all of your code will run on any servlet engine from any vendor. So if you switch from Apache to Netscape to IIS it's no big loss, and no additional development time. As long as you only write to the JSDK 2.0, that seems to hold pretty true.
That's why it said "if you are writing in C". The nice thing about Perl is that it too is the same no matter what server you're running under, and there aren't many other languages that are that cross-platform that also have extensive collection of libraries for web applications. Does IIS even support the servlet API? And if it does now, will it continue to do so now that MS is making noises about abandoning Java?
In hindsight, yes and yes ... though there are differences in the relationships. A robust J2EE implementation might draw a Java developer to Apache just as mod_perl might attract a CGI developer, .NET to IIS, etc. APache with mod_perl, however, makes assumptions about the type of web server being used that puts it in a class apart from a something like a J2EE-compliant server. In my opinion, we can see in retrospect that both are server agnostic, with Java having a wider scope to the standard, increasing the likelihood of code working well without modification on a variety of web server implementations.
After re-reading this and trying to assimilate the comments into the main text, I realized this isn't going where I'd wanted to go. Comments so far have been over either language or implementation details. But what about the fundamental infrastructure of two technologies? I'll make this bold statement: the Servlet spec is just a gussied-up implementation of a CGI request handling API. There's no new functionality introduced, HTTP and the CGI spec are still there, under the hood, doing and not doing the same things in Servlets as before. State is still managed independently by cookies or URLs, request/responses are still URLs and HTML. In short, it's a nice little API specific to a certain language buried under a load of marketing hype. The hype may convince the unwary to make a big switch in language or platform when all that may really be needed is to modify and modernize the implementation of what works.
Perhaps we can say, since only second order issues surfaced, that Steven's assertion holds: mod_perl style cgi and java servlets are to the first order equivalent.
As far as I can tell, Steven isn't saying there's anything wrong
with the Java servlet approach. He's just come across some undue hype.
I use servlets because (a) I prefer Java over Perl generally; (b) I believed servlets would be more portable. mod_perl wasn't around then and seems relatively rare now. I kinda wish I had used Smalltalk, but I felt (and still feel) that few independent ISPs support Smalltalk servlets. -- DaveHarris
Advantages of Servlets
I will take the contrary view. Servlets are great, and Steven has overlooked many of their advantages.
First and foremost has to be ServletDesign
. The design space possible is much bigger than with CGI. It is my view that a ModelViewController
design has always been latent in web technologies, but only a persistent OO server technology can hope to exploit it.
mod_perl is persistent, perl is OO
Well, yes, but that's wandering from the topic a little. This page is CgiVsServlet, and that's CgiPerlVsModPerl?. CGI as a technology has no persistence, in the same way as HTTP.
package mentioned above is a free servlet framework I created to implement that idea.
As for session management - with servlets, you do not need it. All you need is session tracking: the servlet API does support that, via both cookies and URL rewriting.
Yes, Sun has created a decent session management API. There are Perl modules that do that too. It's still just cookies and URL rewriting, so why all the hype?
That's a bit like saying: "CGI is still just HTTP and HTML, so why all the hype?". For me the key is that, by providing session tracking as a fundamental part of the Servlet concept, a Servlet programmer shouldn't need to worry about the details. Sure, there are no doubt Perl modules for this, but as I said above, CGI does not imply Perl. CGI has no built-in idea of sessions, so if I write a CGI in sh or awk or C++ or whatever, I can't rely on sessions like I can with the standard CGI variables.
Another thing you can do with servlets is resolve expensive resources in a separate thread. The user may continue reading further pages until their resource is available. In particular, you can acquire the expensive resource between connections rather than during a connection.
Perl has threads, see the link above
Yeah but Perl threads are experimental and nobody uses them yet for production. A feature you can't use yet doesn't count.
Finally, the Java language itself provides a bunch of interesting technology that can be useful in a servlet: dynamic loading of classes (possibly supplied over the network by the user), a strong security model, and the powerful reflection/introspection API which can be used to automatically generate forms and objects.
-- Justin Wells, WebMacro
No-one has mentioned the collaboration between Servlets and JavaServerPages
(JSP) to build a web application around the MVC model. It allows very clear separation between the request processing (done in servlets) and the actual output returned back to the user (in JSP), since the forwarding is done in the servlet, it is very easy to have the same request showing completely different pages based on the result of the processing. Is that possible at all using CGI (or ASP)?
Another thing is the taglib stuff introduced in JSP 1.1, any corresponding stuff in CGI?
Servlets and Related technology is standardized
. That makes a BIG difference in portability. Moreover, I find code reuse to be greater in case of Servlets than in case of CGI since Java is present in many different spaces and technologies.
I'd have to argue that comparing Servlets out of context of the J2EE spec/framework to another technology isn't a fair comparison. What framework is CGI integrated with? Servlets are just one piece of an overall architecture. CGI is a little more constrained in scope and use than that. -- CS graduate school student