Mark Eschbach

Software Developer && System Analyst

Writing a Simple CGI

I ran into a requirement to implement a simple web application that takes an identifier appended to a URL, sends a message, then dumps the response. I’ve seen a lot of references to CGI, including in the Java Servlet Specification, but I’ve never built a real CGI application. I have written many applications in interpreted languages that sit on top of CGI, such as PHP and Perl, but I never had to deal with the CGI interface directly. So I will chronicle my attempt here. For those of you playing along at home, I will be using C++ as my implementation language. If you aren’t fluent in C++, I will provide examples and descriptions.

I do assume you have prior knowledge of a C based language, such as C, C++, Java, Objective-C, etc, and some experience with a Web based system such as Perl, PHP, Servlets, etc.

The CGI Interface

CGI, or Common Gateway Interface, is an interface between a web server, such as Apache’s HTTPD, and an executable (script or binary) on the local system. When a request is received for a URL mapped to the executable, the web server spawns an instance of the executable. The web server sets a number of environment variables which fill in the details of the request, and passes the request body in via standard input for methods that carry one, such as POST and PUT.

The protocol for CGI 1.1 is defined in RFC 3875, which appears to be current at the time of writing this post. Section 6 of the RFC specifies the expected response, and I will go over the output format first. Section 4 defines the request. If you are going to dive into the RFC you should read section 1.4, Terminology, because the authors clarify the terminology used to refer to the various components of the system.

The CGI application is expected to provide an HTTP-like response. The format is as follows:

Content-Type: type
Status: Status-Code Status-Text
Header-Name: Header-Value

Response body, perhaps an HTML or XML document.
Detailed description of the CGI application response

Where           What
type            The MIME type of the response.

Status Line

Status-Code     The response status code issued by the CGI application. Use 200 for normal output.
Status-Text     The text to go along with the status code (e.g. OK for a 200 status).

Headers

You may have any number of headers; just repeat the form of the Header-Name line. I assume your server software may impose a limit on the number of headers in a response.

Header-Name     The name of the outgoing HTTP response header to set, followed by a colon and a space.
Header-Value    The value to be used.
Empty line      Denotes the end of the HTTP response headers and the beginning of the response body.

Response Body

Body            The response of your CGI application.

So the response is pretty simple. It is like writing your own web server minus the request processing... because parsing is as easy as pie...
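To make the layout above concrete, here is a minimal sketch of a helper that assembles such a response as a string. The name buildCgiResponse and its parameters are made up for illustration; they are not defined by the RFC and are not used elsewhere in this post. The sketch assumes a C++11 compiler.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (an assumption for illustration): assembles a CGI
// response from its parts, in the layout described above.
std::string buildCgiResponse(const std::string& contentType,
                             int statusCode,
                             const std::string& statusText,
                             const std::map<std::string, std::string>& headers,
                             const std::string& body) {
	std::ostringstream out;
	out << "Content-Type: " << contentType << "\n";
	out << "Status: " << statusCode << " " << statusText << "\n";
	// Any additional headers follow the same "Name: Value" form.
	for (const auto& header : headers) {
		out << header.first << ": " << header.second << "\n";
	}
	// The empty line ends the headers; everything after it is the body.
	out << "\n" << body;
	return out.str();
}
```

Building the whole response in memory first is a design choice of this sketch: it keeps the header block and the body together, so nothing reaches the web server until the response is complete.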

So how is request data passed? Mostly through environment variables. Here is a brief list of the variables I typically find useful in a web application.

Request variables exposed via the process environment

Variable          Value
PATH_INFO         Contains the ‘sub-resource’ requested from the application. This includes a leading ‘/’. So if a user requested something like http://example.com/cgi-bin/{cgi-name}/test then the result would be /test
PATH_TRANSLATED   The sub-resource from PATH_INFO mapped to a file system path under the document root of the virtual host the script is executing under.
QUERY_STRING      The part of the URL after the ? character.
SERVER_NAME       The name of the virtual host the request was made to.
REQUEST_METHOD    Which method was requested against the resource.
CONTENT_TYPE      The content type of the incoming request body.
CONTENT_LENGTH    The length, in bytes, of the incoming request body.
HTTP_*            The headers sent with the incoming request. The header name is upper-cased and dashes become underscores, so User-Agent arrives as HTTP_USER_AGENT.
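As a rough illustration of how these variables could be gathered in one place, here is a minimal sketch. The struct CgiRequest and the functions envOr and readCgiRequest are hypothetical names chosen for this example. It assumes a C++11 compiler and treats an unset variable as an empty string (or 0 for the length).

```cpp
#include <cstdlib>
#include <string>

// Returns the value of an environment variable, or the fallback when the
// variable is not set (getenv returns a null pointer in that case).
std::string envOr(const char* name, const std::string& fallback = "") {
	const char* value = std::getenv(name);
	return value ? std::string(value) : fallback;
}

// Hypothetical container for a handful of the variables listed above.
struct CgiRequest {
	std::string method;
	std::string pathInfo;
	std::string queryString;
	std::string contentType;
	unsigned long contentLength;
};

// Reads the request details from the process environment.
CgiRequest readCgiRequest() {
	CgiRequest request;
	request.method = envOr("REQUEST_METHOD");
	request.pathInfo = envOr("PATH_INFO");
	request.queryString = envOr("QUERY_STRING");
	request.contentType = envOr("CONTENT_TYPE");
	// CONTENT_LENGTH is a decimal count; a missing or empty value becomes 0.
	request.contentLength =
		std::strtoul(envOr("CONTENT_LENGTH", "0").c_str(), nullptr, 10);
	return request;
}
```

A CGI program would call readCgiRequest() once at the top of main and then branch on request.method or request.pathInfo.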

Nose to the grindstone

Alright, so now we know the input and expected output from our application. Let us start with a simple C++ stub of an application for those who aren’t familiar with the language.

#include <iomanip>
#include <iostream>
#include <stdint.h>

using namespace std;

int main(int argc, char** argv){
	uint8_t result;
	result = 0;
	return result;
}

All this does is return an exit code of 0 to the calling process, so let us output the string “Hello World!”. We add the following code after line #8:

	cout << "Hello World!" << endl;

The object cout is an output stream attached to the standard output of the process. The word endl is known as a stream manipulator, or rather a function which operates on the stream object. The endl stream manipulator writes a newline character then flushes the stream buffer. You could safely buffer all of your output and I don’t think CGI would care.
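To show that endl only adds a flush on top of the newline, here is a small sketch that writes the greeting both ways into an in-memory stream. The function names writeWithEndl, writeWithNewline and capture are hypothetical and exist only for this comparison.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Writes the greeting using endl: a newline plus a flush of the stream.
void writeWithEndl(std::ostream& out) {
	out << "Hello World!" << std::endl;
}

// Writes the same characters with a plain '\n' and leaves flushing to the
// stream, which happens at the latest when the process exits normally.
void writeWithNewline(std::ostream& out) {
	out << "Hello World!" << '\n';
}

// Hypothetical helper: captures what a writer function emits as a string.
std::string capture(void (*writer)(std::ostream&)) {
	std::ostringstream buffer;
	writer(buffer);
	return buffer.str();
}
```

Both writers produce the same characters; the only difference is when they leave the buffer.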

Standard Output to Web

Well, we now have a first-day C++ application. How about we move on to our problem domain? As our next requirement, why don’t we output "Hello World!" to the web! First up is writing the CGI output header. We need to write the content type, status, then an empty line to prepare for the body. The following code gets inserted at line #8, right above the last added fragment.

	cout << "Content-type: text/plain" << endl;
	cout << "Status: 200 OK" << endl;
	cout << endl;

Dynamism

I can make up words too :-p. Anyways, my requirements are to interpret the portion of the URL after the CGI script. If you recall from the section above, that is provided in the PATH_INFO environment variable. I’m running my CGI application under a POSIX/*nix system, so we use the C function char* getenv(const char*). The return value is NULL if the variable is not set. So here is a simple application which outputs the extra resource information minus the leading /.
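The core of that idea can be sketched as follows. This is a minimal sketch under my own assumptions: the helper names stripLeadingSlash, requestedIdentifier and renderResponse are hypothetical, an unset PATH_INFO is treated as an empty identifier, and the response is always plain text with a 200 status.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: drops a single leading '/' from a PATH_INFO value.
std::string stripLeadingSlash(const std::string& pathInfo) {
	if (!pathInfo.empty() && pathInfo[0] == '/') {
		return pathInfo.substr(1);
	}
	return pathInfo;
}

// Looks up PATH_INFO. getenv returns NULL when the variable is not set,
// which this sketch treats as an empty identifier.
std::string requestedIdentifier() {
	const char* pathInfo = std::getenv("PATH_INFO");
	if (pathInfo == NULL) {
		return std::string();
	}
	return stripLeadingSlash(pathInfo);
}

// Builds the CGI response: headers, an empty line, then the identifier.
std::string renderResponse() {
	std::ostringstream out;
	out << "Content-Type: text/plain\n";
	out << "Status: 200 OK\n";
	out << "\n";
	out << requestedIdentifier() << "\n";
	return out.str();
}
```

Inside main, writing the result with cout << renderResponse(); and returning 0 would hand the response to the web server.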

The completed application