What Actually Happens When a CGI Script Runs

You type a URL. A web page appears. Between those two events, the server spawns a process, passes it environment variables, reads its output, and waits for it to die. Every single time. Here is what happens, step by step.

The Common Gateway Interface is often described in abstractions: “the server runs a script and returns the output.” That description is technically correct and practically useless. It hides the actual mechanics (the system calls, the data flow, the failure modes) that every web developer in the 1990s had to understand intimately to get anything working. This article walks through every stage of CGI execution, from the moment a browser sends an HTTP request to the moment the script’s process exits and its memory is reclaimed. No abstractions. Just what actually happens.

The Request

It begins with a browser. A user on a Windows 98 machine running Internet Explorer 5 clicks a link on a personal homepage. The browser constructs an HTTP request and sends it over a TCP connection to the server at www.worldwidemart.com, port 80:

GET /cgi-bin/counter.pl?style=led&digits=6 HTTP/1.1
Host: www.worldwidemart.com
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)
Accept: text/html, image/gif, image/jpeg, */*
Referer: http://www.worldwidemart.com/
Connection: keep-alive

This looks like any other HTTP request. The browser does not know or care that the URL points to an executable script rather than a static HTML file. It sends the same headers, the same connection negotiation, the same bytes on the wire. The distinction between “static file” and “CGI script” is entirely the server’s concern.

The critical detail is the path: /cgi-bin/counter.pl. That /cgi-bin/ prefix is about to trigger a fundamentally different code path inside Apache. Instead of reading a file from disk and sending its contents, the server is about to execute a program.

Step 1: Apache Receives the Request

The Apache HTTP Server (version 1.3.x, the dominant web server of the late 1990s) reads the incoming request and begins mapping the URL to a filesystem path. This is where configuration directives determine the request’s fate.

In httpd.conf, the system administrator has defined a special mapping:

# httpd.conf
ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

<Directory "/var/www/cgi-bin">
    AllowOverride None
    Options +ExecCGI
    AddHandler cgi-script .pl .cgi
    Order allow,deny
    Allow from all
</Directory>

The ScriptAlias directive does two things simultaneously. First, it maps the URL path /cgi-bin/ to the filesystem directory /var/www/cgi-bin/. Second, and this is the critical part, it tells Apache to treat every file in that directory as a CGI program, whatever its extension. Unlike a regular Alias, ScriptAlias means files here are programs to be run, not documents to be served. The file must still carry the Unix execute bit; ScriptAlias does not set it.

Apache performs several checks before proceeding:

  1. File existence: Does /var/www/cgi-bin/counter.pl exist on disk? If not, Apache returns 404 Not Found.
  2. Execute permission: Does the file have the executable bit set (chmod 755)? On Unix systems, the file must be executable by the user Apache runs as (typically nobody or www-data). If the permission check fails, Apache returns 403 Forbidden or 500 Internal Server Error, depending on the configuration.
  3. Handler match: Is the file mapped to the cgi-script handler? Inside a ScriptAlias directory every file is, regardless of extension. The AddHandler cgi-script .pl .cgi directive does the same job by extension and matters mainly for directories outside ScriptAlias.
  4. Shebang line: On Unix, Apache does not choose the interpreter itself. It hands the script to the kernel’s exec(), and the kernel reads the first two bytes of the file. If they are #!, the kernel runs the interpreter named on the rest of that line, with the script path as its argument. For a Perl script, the first line is typically #!/usr/bin/perl.

All checks pass. Apache is ready to execute /var/www/cgi-bin/counter.pl, which the kernel will run under /usr/bin/perl. The next step involves the operating system itself.
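
The existence and permission checks can be approximated with Perl's file test operators. This is a minimal sketch, not Apache's actual code: cgi_precheck is a hypothetical name, and the status codes it returns simply mirror the ones described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: approximates the checks a server makes before
# running a CGI script. An illustration only, not Apache's implementation.
sub cgi_precheck {
    my ($path) = @_;
    return 404 unless -e $path;   # no such file
    return 403 unless -f $path;   # directories and devices are not scripts
    return 403 unless -x $path;   # not executable by the current user
    return 200;                   # safe to fork() and exec()
}

# The result depends on what exists on the machine running the sketch.
print cgi_precheck("/var/www/cgi-bin/counter.pl"), "\n";
```

Note that -x tests the permissions of whoever runs the sketch, just as the real check applies to the user Apache runs as.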

Step 2: fork() — Creating a New Process

This is the most consequential step in the entire CGI lifecycle, and the one that ultimately led to CGI’s replacement by faster technologies. Apache calls the fork() system call.

[Apache parent process]  (PID 1234)
        |
        +--- fork() ---> [Child process]  (PID 5678)
                              |
                              +--- exec("/usr/bin/perl", "/var/www/cgi-bin/counter.pl")
                              |
                              [Perl interpreter running counter.pl]

fork() is a Unix system call that creates a copy of the calling process. The child gets the same memory contents, open file descriptors, environment, and execution context as the parent (kernels typically share the memory pages copy-on-write rather than physically duplicating them up front). For a brief moment, there are two identical Apache processes running. The original (parent) stays behind to read the script’s output and relay it to the browser. The copy (child) is about to become something else entirely.

Immediately after fork(), the child process calls exec(). This is a family of system calls (execve, execvp, etc.) that replaces the current process image with a new program. The child process stops being Apache and becomes the Perl interpreter. The Perl binary is loaded from disk, its runtime is initialized, and it begins parsing counter.pl.
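
The same fork-then-exec pattern can be reproduced from Perl itself. The sketch below is an illustration under stated assumptions, not Apache's C implementation: run_child is a hypothetical name, and $^X (the path of the running perl binary) stands in for /usr/bin/perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command the way a CGI-capable server would: fork, point the
# child's STDOUT at a pipe, exec, then read the output in the parent.
sub run_child {
    my (@cmd) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: still a copy of this script until exec() replaces it.
        close($reader);
        open(STDOUT, ">&", $writer) or die "dup failed: $!";
        exec { $cmd[0] } @cmd or die "exec failed: $!";
    }

    # Parent: collect everything the child writes, then reap it.
    close($writer);
    my $output = do { local $/; <$reader> };
    close($reader);
    waitpid($pid, 0);
    return $output;
}

print run_child($^X, '-e', 'print "Content-type: text/plain\n\nhello\n"');
```

The parent closes its copy of the write end before reading; otherwise the pipe would never report end-of-file and the read would hang.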

The cost of this operation is substantial:

Operation                                 | Approximate time (1998 hardware)
fork(): duplicate process                 | 10–50 ms
exec(): load Perl binary                  | 20–80 ms
Perl startup and script parsing           | 30–100 ms
Module loading (use CGI, etc.)            | 50–200 ms
Total overhead before script logic begins | 110–430 ms

On a shared hosting server in 1998 — a Pentium II at 300 MHz with 128 MB of RAM — this overhead was enormous. If ten users hit the same CGI script simultaneously, the server had to create ten separate processes, load ten copies of the Perl interpreter, and parse the same script ten times. Each process consumed 2–8 MB of RAM. A server that could comfortably serve 500 static pages per second might handle only 15–30 CGI requests per second.

Compare this with mod_perl, which appeared in 1996. With mod_perl, the Perl interpreter was embedded directly inside the Apache process. No fork(), no exec(), no interpreter startup. The script was precompiled and cached in memory. The same request that took 300 ms via CGI could be served in 5–15 ms via mod_perl, which is 20 to 60 times faster. But mod_perl required more expertise to set up, and most shared hosting providers in the 1990s did not offer it. So CGI remained the default.

Step 3: Environment Variables — Passing the Request Data

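Before the child process calls exec(), Apache populates the process’s environment with a set of standardized variables. These environment variables are the primary mechanism by which the web server communicates with the CGI script. The core set is documented in RFC 3875 (the CGI/1.1 specification, published in 2004 to record existing practice) and has changed little since the earliest implementations on the NCSA HTTPd server in 1993. A few of the variables shown here, such as SCRIPT_FILENAME, DOCUMENT_ROOT, and SERVER_ADMIN, are Apache additions rather than part of the specification.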

For our request to /cgi-bin/counter.pl?style=led&digits=6, Apache sets the following variables:

GATEWAY_INTERFACE = CGI/1.1
SERVER_PROTOCOL   = HTTP/1.1
SERVER_SOFTWARE   = Apache/1.3.37 (Unix)
REQUEST_METHOD    = GET
QUERY_STRING      = style=led&digits=6
SCRIPT_NAME       = /cgi-bin/counter.pl
SCRIPT_FILENAME   = /var/www/cgi-bin/counter.pl
PATH_INFO         =
PATH_TRANSLATED   =
REMOTE_ADDR       = 198.51.100.42
REMOTE_HOST       = dialup-42.aol.com
HTTP_HOST         = www.worldwidemart.com
HTTP_USER_AGENT   = Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)
HTTP_ACCEPT       = text/html, image/gif, image/jpeg, */*
HTTP_REFERER      = http://www.worldwidemart.com/
CONTENT_TYPE      =
CONTENT_LENGTH    = 0
SERVER_NAME       = www.worldwidemart.com
SERVER_PORT       = 80
SERVER_ADMIN      = [email protected]
DOCUMENT_ROOT     = /var/www/html

These variables fall into three categories:

Request metadata: REQUEST_METHOD, QUERY_STRING, CONTENT_TYPE, CONTENT_LENGTH. These describe what the client is asking for and how the data is encoded.

Client information: REMOTE_ADDR, REMOTE_HOST. The client’s IP address and (if DNS reverse lookup is enabled) hostname. In the dial-up era, REMOTE_HOST often revealed the user’s ISP: dialup-42.aol.com, ppp-123.earthlink.net, dyn-56.mindspring.com.

HTTP headers: Incoming HTTP headers are converted to environment variables by uppercasing the name, replacing hyphens with underscores, and prepending HTTP_. So User-Agent becomes HTTP_USER_AGENT, Referer becomes HTTP_REFERER, and Accept-Language becomes HTTP_ACCEPT_LANGUAGE. There are exceptions: Content-Type and Content-Length arrive as CONTENT_TYPE and CONTENT_LENGTH without the prefix, and Apache withholds Authorization so that credentials are not exposed to scripts.
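
The header-to-variable naming rule is mechanical enough to write down. The following is a small sketch of the rule only; header_to_env is a hypothetical name, and the special case reflects the fact that Content-Type and Content-Length map to CONTENT_TYPE and CONTENT_LENGTH with no HTTP_ prefix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map an HTTP header name to its CGI variable name.
sub header_to_env {
    my ($header) = @_;
    (my $name = uc $header) =~ tr/-/_/;    # uppercase, hyphens to underscores
    # These two request headers keep their own unprefixed variables.
    return $name if $name eq 'CONTENT_TYPE' || $name eq 'CONTENT_LENGTH';
    return "HTTP_$name";
}

print header_to_env($_), "\n"
    for qw(User-Agent Referer Accept-Language Content-Type);
# Prints:
#   HTTP_USER_AGENT
#   HTTP_REFERER
#   HTTP_ACCEPT_LANGUAGE
#   CONTENT_TYPE
```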

Inside the Perl script, all of these are accessible through the global %ENV hash:

#!/usr/bin/perl

my $method = $ENV{'REQUEST_METHOD'};    # "GET"
my $query  = $ENV{'QUERY_STRING'};      # "style=led&digits=6"
my $ip     = $ENV{'REMOTE_ADDR'};       # "198.51.100.42"
my $agent  = $ENV{'HTTP_USER_AGENT'};   # "Mozilla/4.0 (compatible; ...)"
my $host   = $ENV{'HTTP_HOST'};         # "www.worldwidemart.com"

The QUERY_STRING is everything after the ? in the URL, in its raw URL-encoded form. Parsing it means splitting on &, then splitting each pair on =, then URL-decoding both the name and value (replacing + with spaces and %XX hex sequences with their character equivalents). This parsing was so common that Lincoln Stein’s CGI.pm module (bundled with Perl since 5.004 in 1997) became the de facto standard for handling it.

Step 4: STDIN — For POST Requests

For GET requests, all data arrives through QUERY_STRING and the story is simple. POST requests add another data channel: standard input.

When a user fills out an HTML form and clicks “Submit,” the browser sends the form data in the HTTP request body. Apache reads this body from the network socket and pipes it directly into the CGI script’s STDIN. The script reads it the same way it would read from a file or the keyboard:

#!/usr/bin/perl
# Reading POST data from STDIN

my $buffer;
if ($ENV{'REQUEST_METHOD'} eq 'POST') {
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    # $buffer now contains: "name=John+Doe&email=john%40example.com&message=Hello+World"
}

The CONTENT_LENGTH environment variable tells the script exactly how many bytes to read. This is critical — reading too few bytes gives you truncated data; reading too many causes the script to hang, waiting for input that will never arrive. The read() call blocks until it receives exactly that many bytes or the connection is closed.
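
A defensive way to honor CONTENT_LENGTH is to loop until the expected number of bytes has arrived or the input ends. This is a sketch with a hypothetical helper name, read_body. Perl's buffered read() normally keeps going until it has the requested length or reaches end-of-file, so the loop is a safety net rather than a requirement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read up to $length bytes from $fh, never more.
sub read_body {
    my ($fh, $length) = @_;
    my $body = '';
    while (length($body) < $length) {
        # Append at the current end of $body, asking only for what is missing.
        my $got = read($fh, $body, $length - length($body), length($body));
        die "read error: $!" unless defined $got;
        last if $got == 0;    # EOF before CONTENT_LENGTH bytes arrived
    }
    return $body;
}

# In a CGI script the call would look like this:
#   my $body = read_body(\*STDIN, $ENV{'CONTENT_LENGTH'} || 0);
```

Because the helper never asks for more than the declared length, it cannot block waiting for bytes the server will never send.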

For an ordinary form submission (content type application/x-www-form-urlencoded), the raw POST data is URL-encoded, the same format as a query string: name=value pairs separated by &, with spaces encoded as + and special characters as %XX hex sequences. Parsing it by hand:

# Manual parsing of URL-encoded POST data
my %form;
foreach my $pair (split(/&/, $buffer)) {
    my ($name, $value) = split(/=/, $pair, 2);
    $value =~ tr/+/ /;                         # + back to space
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;  # %XX to char
    $name  =~ tr/+/ /;
    $name  =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    $form{$name} = $value;
}
# $form{'name'}    = "John Doe"
# $form{'email'}   = "john@example.com"
# $form{'message'} = "Hello World"

This manual parsing is what every CGI script had to do before CGI.pm automated it. And this parsing code was a security minefield: it did not handle multiple values for the same field name, it did not limit input size (enabling denial-of-service attacks via megabyte-sized POST bodies), and it did not validate or sanitize the input in any way. The raw values went straight into whatever the script did next — writing to a file, inserting into a database, or, in the worst case, passing into a shell command.
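
Two of the gaps named above, repeated field names and unbounded input, can be closed with a few extra lines. This is a minimal sketch, not how CGI.pm does it: parse_form and url_decode are hypothetical names, and the 64 KB cap is an arbitrary figure chosen for illustration. It still performs no validation of the decoded values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $MAX_BYTES = 64 * 1024;    # arbitrary cap chosen for this sketch

sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                               # + back to space
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # %XX to character
    return $s;
}

# Returns a hash ref mapping each field name to an array ref of values,
# so repeated names (checkboxes, multi-selects) are all preserved.
sub parse_form {
    my ($raw) = @_;
    die "form data too large\n" if length($raw) > $MAX_BYTES;
    my %form;
    foreach my $pair (split /&/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        push @{ $form{ url_decode($name) } }, url_decode($value);
    }
    return \%form;
}

my $form = parse_form('name=John+Doe&topic=perl&topic=cgi&email=john%40example.com');
print join(',', @{ $form->{topic} }), "\n";   # perl,cgi
print $form->{email}[0], "\n";                # john@example.com
```

In a real script the size check belongs before the read: compare CONTENT_LENGTH against the cap and refuse the request without touching STDIN.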

GET vs. POST in CGI: GET requests pass data through QUERY_STRING (in the URL, limited to roughly 2,048 characters by most browsers). POST requests pass data through STDIN (in the body, with no practical size limit enforced by the CGI protocol itself). Both use the same URL-encoded format. The choice between GET and POST was supposed to follow HTTP semantics — GET for retrieval, POST for modification — but in practice, many early CGI scripts used GET for everything, including form submissions that modified server-side data.

Step 5: The Script Executes

With environment variables set and STDIN ready, the Perl interpreter begins executing the script. Here is counter.pl — a simplified version of the kind of hit counter that appeared on millions of personal web pages in the late 1990s:

#!/usr/bin/perl
# counter.pl — a simple hit counter for worldwidemart.com
# Typical CGI script circa 1997

use strict;
use warnings;

my $count_file = "/var/www/cgi-bin/data/count.dat";
my $count = 0;

# --- Read current count from file ---
if (open(my $fh, "<", $count_file)) {
    my $line = <$fh>;
    close($fh);
    # An empty file yields undef; treat that and non-numeric text as zero
    $count = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
}

# --- Increment ---
$count++;

# --- Write new count back to file ---
if (open(my $fh, ">", $count_file)) {
    print $fh $count;
    close($fh);
}

# --- Parse query string for display options ---
my %params;
foreach my $pair (split(/&/, $ENV{'QUERY_STRING'} || '')) {
    my ($k, $v) = split(/=/, $pair, 2);
    $params{$k} = $v if defined $k;
}

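my $style  = $params{'style'}  || 'plain';   # parsed but unused in this simplified version
# Accept only a small numeric width; a huge value would make sprintf build a huge string
my $digits = ($params{'digits'} || '') =~ /^(\d{1,2})$/ ? $1 : 6;
$digits = 6 if $digits < 1 || $digits > 12;
my $display = sprintf("%0${digits}d", $count);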

# --- Output HTTP headers + HTML ---
print "Content-type: text/html\n\n";
print "<html><body>\n";
print "<p>You are visitor number: <b>$display</b></p>\n";
print "</body></html>\n";

This script does four things: read a number from a file, increment it, write it back, and output an HTML page. The actual computation takes microseconds. The file I/O takes a few milliseconds at most. The overhead of getting to this point — fork(), exec(), Perl startup — dwarfs the script’s own execution time by an order of magnitude.

But there is a subtle bug here that plagued CGI scripts for years: a race condition. If two requests arrive simultaneously, both processes read count.dat at the same time, both see the same number (say, 41), both increment to 42, and both write 42 back. One visit is lost. Under heavy traffic, counters would routinely undercount. The correct solution was file locking (flock() in Perl), but most scripts found on the internet in the 1990s did not implement it. This was one of the hazards of learning web programming from copied code snippets.
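
A locked version of the read-increment-write cycle might look like the sketch below. It is one reasonable way to apply flock(), not the code any particular 1990s script used, and increment_counter is a hypothetical name. The key point is that the file is opened once and the exclusive lock is held across both the read and the write.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock :seek O_RDWR O_CREAT);

# Hypothetical helper: atomically increment the number stored in $file.
sub increment_counter {
    my ($file) = @_;
    # Open read/write without truncating, creating the file if needed.
    sysopen(my $fh, $file, O_RDWR | O_CREAT) or die "open $file: $!";
    flock($fh, LOCK_EX) or die "flock $file: $!";   # blocks until we own the lock

    my $line  = <$fh>;
    my $count = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    $count++;

    seek($fh, 0, SEEK_SET) or die "seek: $!";
    truncate($fh, 0)       or die "truncate: $!";
    print {$fh} $count;
    close($fh) or die "close: $!";   # flushes the write, then releases the lock
    return $count;
}
```

A second process calling increment_counter at the same moment waits at flock() until the first has closed the file, so it reads 42 rather than 41.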

More complex CGI scripts would do much more during this phase: connect to a MySQL or PostgreSQL database, read template files, process uploaded files (multipart form data), send emails via sendmail, or generate images using GD.pm. Every one of these operations happened fresh, from scratch, on every request. No connection pooling. No template caching. No persistent state of any kind.

Step 6: STDOUT — The Response

The CGI script communicates its response to Apache through one channel: standard output (STDOUT). Everything the script prints goes into a pipe that Apache reads from the other end.

The output must follow a strict format. First, one or more HTTP response headers. Then a completely blank line. Then the response body. Here is what counter.pl writes to STDOUT:

Content-type: text/html

<html><body>
<p>You are visitor number: <b>000042</b></p>
</body></html>

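The first line is a CGI header: Content-type: text/html. This tells Apache what kind of content the script is producing. Apache takes this header, adds its own headers (Date, Server, Connection), and constructs a proper HTTP response. Because the script did not send a Content-Length and Apache cannot know the length before the script finishes, a kept-alive HTTP/1.1 response goes out with chunked transfer encoding (the body is shown here without the chunk framing):

HTTP/1.1 200 OK
Date: Wed, 15 Jul 1998 14:30:00 GMT
Server: Apache/1.3.37 (Unix)
Content-type: text/html
Transfer-Encoding: chunked
Connection: keep-alive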

<html><body>
<p>You are visitor number: <b>000042</b></p>
</body></html>

A CGI script could output several different headers to control the response:

Header       | Purpose                 | Example
Content-type | MIME type of the body   | Content-type: text/html
Location     | Redirect to another URL | Location: http://www.example.com/
Status       | HTTP status code        | Status: 404 Not Found
Set-Cookie   | Set a browser cookie    | Set-Cookie: session=abc123; path=/

The single most common CGI error:

# WRONG: missing blank line after headers
print "Content-type: text/html\n";
print "<html><body>Hello</body></html>\n";

# RIGHT: note the TWO newlines (\n\n) ending the header
print "Content-type: text/html\n\n";
print "<html><body>Hello</body></html>\n";

The blank line between headers and body is mandatory. It is what tells Apache where headers end and content begins. If you forget it, Apache cannot parse the response and returns 500 Internal Server Error. This single missing newline character was responsible for more time spent debugging CGI scripts than perhaps any other issue. Every person who learned CGI programming made this mistake at least once.
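
Seen from the server's side, the rule is easy to state in code. The sketch below shows how a server could separate headers from body; it is an illustration of the format, not Apache's parser, and split_cgi_output is a hypothetical name. It returns an empty list when the blank line is missing or a header line is malformed, which is the situation that produces the 500 error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split raw CGI output into (\%headers, $body).
sub split_cgi_output {
    my ($raw) = @_;
    # Headers end at the first blank line (LF or CRLF line endings).
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    return unless defined $body;              # no blank line: malformed
    my %headers;
    foreach my $line (split /\r?\n/, $head) {
        my ($name, $value) = $line =~ /^([\w-]+):\s*(.*)$/
            or return;                        # not a "Name: value" line
        $headers{lc $name} = $value;
    }
    return (\%headers, $body);
}

my @ok  = split_cgi_output("Content-type: text/html\n\n<html>Hello</html>\n");
my @bad = split_cgi_output("Content-type: text/html\n<html>Hello</html>\n");
print scalar(@ok), " ", scalar(@bad), "\n";   # 2 0
```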

For scripts that needed to redirect instead of returning content, the output was even simpler:

#!/usr/bin/perl
# redirect.cgi — send the user elsewhere
print "Location: http://www.worldwidemart.com/scripts/\n\n";

No body at all — just a Location header and the mandatory blank line. Apache would translate this into an HTTP 302 Found redirect response.

Step 7: Process Termination

The Perl interpreter reaches the end of counter.pl. The exit(0) is implicit — when there is no more code to execute, Perl exits with status 0 (success). At this point, the operating system reclaims everything:

  • All open file handles are closed.
  • All allocated memory is freed.
  • Apache’s parent process receives a SIGCHLD signal indicating the child has terminated.
  • Once Apache collects the exit status with wait(), the process entry is removed from the kernel’s process table. Until then, the dead child lingers as a zombie.
  • Apache reads any remaining data from the pipe, finishes sending the response to the client, and moves on to the next request.

And here is the defining characteristic of CGI: nothing survives between requests. No variables persist. No database connections remain open. No cached templates stay in memory. No session data is preserved in the process. The next request to the same script starts from absolute zero: fork(), exec(), load Perl, parse the script, initialize variables, open files, open database connections, do the work, output the result, and die.

This statelessness was both CGI’s greatest simplicity and its greatest weakness. Simple, because you never had to worry about memory leaks, stale connections, or corrupted state — every request got a pristine environment. Weak, because you paid the full startup cost on every single request, and any data you needed to persist between requests had to be stored externally: in files, in databases, or in cookies sent back to the browser.

Sessions, for example, typically worked by generating a random ID, storing it in a cookie, and saving session data to a file on disk named after that ID. Every CGI request would read the cookie, open the corresponding file, deserialize the session data, do its work, serialize the data back, and write the file again. Compared to modern in-memory session stores, this was glacially slow — but it was the only option available.
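
A file-backed session store of this kind fits in a few subroutines. What follows is a sketch under stated assumptions, not a reconstruction of any specific script: all four helper names are hypothetical, Storable (a core Perl module) does the serialization, and rand() is used only for brevity since it is not cryptographically strong.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Spec;

# Hypothetical helper: 32 hex characters. rand() is NOT secure enough
# for real session IDs; it keeps the sketch dependency-free.
sub new_session_id {
    return join '', map { sprintf '%02x', int rand 256 } 1 .. 16;
}

# Map an ID to a file path, rejecting anything that is not plain hex
# so a forged cookie cannot point outside the session directory.
sub session_path {
    my ($dir, $id) = @_;
    die "bad session id\n" unless $id =~ /^[0-9a-f]{32}$/;
    return File::Spec->catfile($dir, "sess_$id");
}

# Deserialize the session, or start with an empty one.
sub load_session {
    my ($dir, $id) = @_;
    my $path = session_path($dir, $id);
    return -e $path ? retrieve($path) : {};
}

# Serialize the session back to disk at the end of the request.
sub save_session {
    my ($dir, $id, $data) = @_;
    store($data, session_path($dir, $id));
    return;
}
```

Each request would call load_session with the ID from the cookie, modify the returned hash, and call save_session before exiting.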

The Full Lifecycle

Here is the complete sequence of events, from the browser’s perspective to the server’s internal operations, for a single CGI request:

Browser                        Apache                          CGI Script
=======                        ======                          ==========

GET /cgi-bin/counter.pl
?style=led&digits=6
 ─────────────────────────────>
                               Parse request
                               Match ScriptAlias /cgi-bin/
                               Check file exists (755)
                               
                               fork()
                                ├── Parent: wait for child
                                └── Child:
                                    Set ENV variables
                                    exec("counter.pl")
                                    (kernel reads #! line,
                                     loads /usr/bin/perl)
                                     ──────────────────────────>
                                                                Perl starts
                                                                Parse counter.pl
                                                                Read %ENV
                                                                
                                                                open(count.dat)
                                                                $count = 41
                                                                $count++ → 42
                                                                write(count.dat)
                                                                
                                                                print headers
                                                                print HTML
                                     <──────────────────────────
                                                                exit(0)
                                                                [process dies]
                               
                               Read pipe (headers + body)
                               Add HTTP headers
                               (Date, Server, etc.)
                               
 <─────────────────────────────
HTTP/1.1 200 OK
Content-type: text/html

<html>...visitor 000042...</html>

 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  [Connection closed or kept alive for next request]

Total elapsed time for this request on typical 1998 hardware: 200–600 ms. Of that, perhaps 5–20 ms was spent in the actual script logic. The rest was overhead: process creation, interpreter loading, and process destruction.

On a modern server, the same CGI script would execute in 10–30 ms total. The fork()/exec() overhead on 2020s hardware is under 5 ms. But nobody runs CGI on modern servers — not because the overhead is still painful, but because vastly better alternatives exist.
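
The per-request startup cost is easy to measure on whatever machine is at hand. This sketch times a fork(), an exec() of the Perl interpreter, and the interpreter's startup and exit, averaged over several runs. time_spawn is a hypothetical name, $^X stands in for /usr/bin/perl, and the result will vary widely with hardware and system load.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: average milliseconds to fork, exec perl, and
# wait for it to exit, over $runs iterations.
sub time_spawn {
    my ($runs) = @_;
    my $start = time();
    for (1 .. $runs) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: become a fresh interpreter that does nothing.
            exec { $^X } $^X, '-e', '1' or exit 127;
        }
        waitpid($pid, 0);    # parent blocks until the child exits
    }
    return (time() - $start) / $runs * 1000;
}

printf "%.2f ms per fork + exec + interpreter startup\n", time_spawn(20);
```

The empty script leaves out module loading and the script's own work, so the figure is a lower bound on what a real CGI request would cost.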

Why This Was Slow (And What Replaced It)

The CGI model has four fundamental performance problems, all stemming from its process-per-request architecture:

  1. Process creation overhead: fork() + exec() on every request. On 1990s hardware, that meant roughly 30–130 ms per request just for process creation, going by the estimates above. On a busy shared hosting server, this alone could saturate the CPU.
  2. Interpreter loading: The Perl interpreter (or Python, or whatever language the script used) had to be loaded from disk and initialized on every request. For Perl with common modules, this meant loading and parsing 50,000+ lines of library code before a single line of the actual script ran.
  3. No connection pooling: If the script used a database, it opened a new connection, authenticated, ran its queries, and closed the connection. Database connection setup typically took 20–50 ms. For database-heavy scripts, this could double the total request time.
  4. No shared state: Compiled templates, configuration files, cached data: everything had to be read and parsed fresh on every request. There was no mechanism to share data between requests because each request ran in a completely separate process.

The solutions appeared in stages, each keeping more state alive between requests:

Technology    | Year | Approach | Speed vs. CGI
mod_perl      | 1996 | Perl interpreter embedded in Apache; scripts precompiled and cached | 10–30x faster
FastCGI       | 1996 | Persistent process communicates with server via Unix socket; no fork() per request | 5–20x faster
PHP (mod_php) | 1997 | PHP interpreter embedded as Apache module; scripts compiled on each request but no process overhead | 5–15x faster
Java Servlets | 1997 | Persistent JVM process; compiled bytecode; connection pooling; shared application state | 20–50x faster
ASP (IIS)     | 1996 | VBScript/JScript embedded in IIS; compiled and cached; session management built in | 10–20x faster

Each of these technologies addressed the same core problem: eliminate the process-per-request overhead by keeping the runtime alive between requests. The details differed — some embedded the interpreter in the server, some used persistent external processes, some ran in their own application server — but the principle was identical. Stop creating and destroying processes. Keep the interpreter running. Cache compiled code. Pool database connections. Share state in memory.
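
The "cache compiled code" idea can be shown in miniature. In the sketch below, a long-lived process compiles a script's source into a subroutine once and reuses it for later requests. It is loosely in the spirit of how persistent Perl environments avoid recompiling, not a copy of any of them; handler_for and the sample source string are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %compiled;             # survives across requests in a persistent process
my $compile_count = 0;    # how many times we actually had to compile

# Hypothetical helper: return a cached code ref, compiling on first use.
sub handler_for {
    my ($name, $source) = @_;
    return $compiled{$name} ||= do {
        $compile_count++;
        my $code = eval "sub { $source }" or die "compile error: $@";
        $code;
    };
}

my $source = 'my ($n) = @_; return "visitor " . $n;';
for my $request (1 .. 3) {
    my $handler = handler_for('counter', $source);
    print $handler->($request), "\n";
}
print "compiled $compile_count time(s)\n";
# Prints:
#   visitor 1
#   visitor 2
#   visitor 3
#   compiled 1 time(s)
```

Under plain CGI the %compiled hash would be empty at the start of every request, because every request is a new process.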

For a deeper look at how each of these technologies emerged and eventually replaced CGI, see What Replaced CGI?

The Elegance of the Model

Despite its performance limitations, the CGI model has an elegance that modern web frameworks often lack. A CGI script is just a program that reads from environment variables and stdin, and writes to stdout. It can be written in any language. It can be tested from the command line by setting environment variables manually. It requires no framework, no dependency manager, no build system. If your program can print text to the terminal, it can be a CGI script.
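
Testing without a web server amounts to setting the variables and running the program. The sketch below does this from Perl: it writes a stand-in CGI script to a temporary file, sets two environment variables, and captures the output. run_cgi and the stand-in script are hypothetical, and $^X is the running perl binary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A stand-in CGI script, written to a temp file for the demonstration.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'END_SCRIPT';
print "Content-type: text/plain\n\n";
print "method=$ENV{REQUEST_METHOD} query=$ENV{QUERY_STRING}\n";
END_SCRIPT
close($fh) or die "close: $!";

# Hypothetical helper: run a script with extra environment variables
# and return everything it prints, as a server would see it.
sub run_cgi {
    my ($path, %env) = @_;
    local @ENV{ keys %env } = values %env;    # restored when the sub returns
    open(my $out, '-|', $^X, $path) or die "cannot run $path: $!";
    my $response = do { local $/; <$out> };
    close($out);
    return $response;
}

print run_cgi($script,
    REQUEST_METHOD => 'GET',
    QUERY_STRING   => 'style=led&digits=6',
);
# Prints:
#   Content-type: text/plain
#
#   method=GET query=style=led&digits=6
```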

This simplicity is why CGI dominated the early web. You did not need to understand event loops, middleware stacks, or dependency injection to make a webpage that said “Hello World.” You needed to understand print. And that was the genius of it — CGI turned every programming language into a web programming language, with nothing more than environment variables and standard I/O.

#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "Hello, World.";

Three lines. No dependencies. No configuration files. No package manager. No build step. Just a program that prints a header and a greeting. In 1995, this was revolutionary. In 2026, it is worth understanding — because everything that came after was built to solve the problems that these three lines, multiplied by millions of requests, created.
