How to Read Legacy Perl Code: A Guide for Modern Developers

You inherited a server. There is a /cgi-bin/ directory. Inside: files ending in .pl with code that looks like someone encrypted it. This is Perl. Here is how to read it.

Why You Might Need This

Few teams start a new project in Perl in 2026, but Perl is everywhere on older servers. You take over a VPS from a previous developer. You audit a government website that has been running since 2003. You help a university department migrate from Apache on Solaris. In every one of these situations, you will find Perl scripts, and you will need to understand what they do before you can safely remove, replace, or update them.

The most common scenario is discovering a /cgi-bin/ directory on a web server. Inside, there will be .pl and .cgi files handling form submissions, hit counters, guestbooks, and redirects. Many of these scripts are from Matt’s Script Archive, originally distributed from this very domain in the 1990s. FormMail.pl alone was installed on hundreds of thousands of servers. Many of those installations are still running.

Here is the key insight: you do not need to write Perl. You need to read it. You need to look at a script and understand what it does, what files it touches, what environment variables it reads, and whether it is a security risk. This guide teaches exactly that. Every example in this article comes from real-world scripts you are likely to encounter.

If you know Python, JavaScript, PHP, or really any modern language, you already understand 80% of programming concepts. Perl just has different syntax for the same ideas. Let us bridge that gap.

The Shebang Line

Every Perl script starts with a line that tells the operating system which interpreter to use:

#!/usr/bin/perl

This is called the shebang (from “hash-bang”: #!). When you run ./script.pl, the OS reads this first line and passes the file to /usr/bin/perl for execution. To Perl the line looks like an ordinary comment, but to the Unix kernel it is an instruction, and Perl itself also honors switches such as -w and -T that appear on it.

You will see variations depending on where Perl was installed on a given system:

#!/usr/local/bin/perl     # FreeBSD, older systems
#!/usr/bin/env perl       # portable — searches PATH for perl
#!/usr/bin/perl -w        # -w enables warnings (like Python -W)
#!/usr/bin/perl -T        # -T enables taint mode (security feature)

If a CGI script is not executing and you see a 500 error in the server logs, check the shebang first. On CentOS and AlmaLinux, Perl lives at /usr/bin/perl. On FreeBSD and some shared hosting, it is at /usr/local/bin/perl. A wrong path means the OS literally cannot find the interpreter. Run which perl on the server to find the correct path.
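
When there are many scripts to audit, the same check can be scripted. A minimal sketch, assuming a hypothetical helper named shebang_interpreter that is handed the first line of a file:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the interpreter path out of a shebang line.
# Returns undef when the line is not a shebang at all.
sub shebang_interpreter {
    my ($first_line) = @_;
    return undef unless defined $first_line;
    return $first_line =~ /\A#!\s*(\S+)/ ? $1 : undef;
}

my $interpreter = shebang_interpreter("#!/usr/local/bin/perl -w\n");

# -x is a file test: true only if the path exists and is executable
my $status = (defined $interpreter && -x $interpreter) ? "found" : "missing";
print "$interpreter: $status\n";
```

A "missing" result on the server that hosts the script points to the wrong-path problem described above.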

The -T flag is especially important for CGI scripts. Taint mode forces the script to validate all external input before using it in dangerous operations like file access or system calls. If you see -T in the shebang, that is a good sign — the original developer cared about security.
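
In practice, taint mode accepts data only after it has passed through a regex capture, so validation code in -T scripts tends to look like the sketch below. The helper name and the allowed character set are assumptions chosen for illustration, not taken from any particular script:

```perl
use strict;
use warnings;

# Hypothetical validator: accept only plain file names (no slashes).
# Under -T, the captured $1 counts as clean; the original $raw stays tainted.
sub clean_filename {
    my ($raw) = @_;
    return undef unless defined $raw;
    return $raw =~ /\A([\w.-]{1,64})\z/ ? $1 : undef;
}

my $good = clean_filename("guestbook.txt");
my $bad  = clean_filename("../../etc/passwd");

print defined $good ? "accepted: $good\n" : "rejected\n";
print defined $bad  ? "accepted: $bad\n"  : "rejected\n";
```

When you read a -T script, a pattern that captures everything, such as /(.*)/, is a sign that the check was bypassed rather than satisfied.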

Variables: $, @, %

Perl uses sigils — special characters at the beginning of variable names that tell you what type of data the variable holds. There are exactly three to learn:

$name = "FormMail";                                        # scalar (one value)
@fields = ("name", "email", "message");                    # array (ordered list)
%config = (
    mailprog  => "/usr/sbin/sendmail",
    recipient => "admin\@example.com",
);                                                         # hash (key-value pairs)

This is the single most important concept in Perl. Once you recognize the sigils, you can read almost any Perl code:

Sigil  Type                   Perl                 Python             JavaScript
$      Scalar (one value)     $name = "hello";     name = "hello"     let name = "hello";
@      Array (ordered list)   @items = (1, 2, 3);  items = [1, 2, 3]  let items = [1, 2, 3];
%      Hash (key-value map)   %map = (a => 1);     map = {"a": 1}     let map = {a: 1};

The tricky part: when you access a single element from an array or hash, the sigil changes to $ because you are getting one value back:

$fields[0]        # "name"    — one element from @fields (use $ because it is a scalar)
$fields[1]        # "email"   — index starts at 0, as in most languages
$config{mailprog} # "/usr/sbin/sendmail" — one value from %config

@fields           # the whole array: ("name", "email", "message")
%config           # the whole hash

Notice the brackets: square brackets [] for array indexing, curly braces {} for hash key lookup. Python and JavaScript use square brackets for both, so the only new things here are the curly braces for hashes and the $ in front.

You will also see $#array which gives you the last index of an array (not the length). So if @fields has 3 elements, $#fields is 2. To get the count of elements, you use scalar(@fields) or put the array in a scalar context.
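
A short sketch of the difference, reusing the @fields array from above:

```perl
use strict;
use warnings;

my @fields = ("name", "email", "message");

my $last_index = $#fields;          # index of the last element
my $count      = scalar(@fields);   # number of elements
my $also_count = @fields;           # assigning to a scalar forces scalar context

print "last index $last_index, count $count, also $also_count\n";
```

Legacy loops often use the last index directly, as in for ($i = 0; $i <= $#fields; $i++).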

One more thing: => in Perl hashes is called the fat comma. It is just a comma that also automatically quotes the string on its left side. So mailprog => "/usr/sbin/sendmail" is the same as "mailprog", "/usr/sbin/sendmail". A hash is really just a flat list of alternating keys and values.
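
The equivalence is easy to confirm. The sketch below builds the same hash both ways; the subject key is a made-up second entry:

```perl
use strict;
use warnings;

my %with_fat_comma   = (mailprog => "/usr/sbin/sendmail", subject => "Hello");
my %with_plain_comma = ("mailprog", "/usr/sbin/sendmail", "subject", "Hello");

# keys returns hash keys in no particular order, so sort before comparing
my $same_keys = join(",", sort keys %with_fat_comma)
             eq join(",", sort keys %with_plain_comma);

print $same_keys ? "same hash\n" : "different\n";
```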

The Magic Variable: $_

$_ is Perl’s default variable. Think of it as the pronoun “it” in English. When you see a function called without an argument, it is almost certainly operating on $_. This is the single most confusing thing for people coming from other languages — but once you understand it, Perl code becomes dramatically more readable.

foreach (@fields) {
    print "$_\n";       # $_ is set to each element in turn
}

# The above is equivalent to:
foreach $field (@fields) {
    print "$field\n";   # explicit variable — same result
}

The foreach loop without a named variable automatically assigns each element to $_. This is the pattern you will see most often. Here is another common one:

while (<STDIN>) {        # read one line at a time from input
    chomp;               # remove trailing newline from $_
    next if /^#/;        # skip comment lines (matches $_ against regex)
    print;               # print $_ to stdout
}

In this example, four things happen implicitly with $_:

  1. <STDIN> reads a line and assigns it to $_
  2. chomp without arguments operates on $_, removing the newline
  3. /^#/ without =~ matches against $_
  4. print without arguments prints $_

If you wrote this explicitly, it would be:

while ($_ = <STDIN>) {
    chomp($_);
    next if ($_ =~ /^#/);
    print($_);
}

Both versions produce identical results. Experienced Perl programmers use the implicit form because it is shorter and, once you know the convention, just as clear. The implicit form is the one you will find in legacy code.
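
One detail matters when auditing loops like these: inside foreach, $_ is an alias for the array element rather than a copy, so changing $_ changes the array. A small sketch with made-up data:

```perl
use strict;
use warnings;

my @fields = ("Name\n", "Email\n");

foreach (@fields) {
    chomp;        # strips the newline from the element itself
    $_ = lc;      # lc with no argument reads $_; the assignment writes back
}

print join(",", @fields), "\n";
```

After the loop, @fields itself holds the lowercased, newline-free values.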

Here are the most common functions that default to $_:

  • chomp — remove trailing newline
  • chop — remove last character
  • print — output to stdout
  • split — split string (when called with just a pattern)
  • lc, uc — lowercase, uppercase
  • defined — check if value exists
  • /regex/ — match against pattern

Other special variables you may encounter: $! (system error message, like errno in C), $0 (script name), @ARGV (command-line arguments, like sys.argv in Python), and %ENV (environment variables, like os.environ).
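
A quick sketch that touches each of these. The path is a hypothetical one that should not exist, used only to force a failure so that $! has something to report:

```perl
use strict;
use warnings;

my $error = '';
# Hypothetical path that should not exist, to make open fail
open(my $fh, '<', '/nonexistent/example.txt') or $error = "$!";

print "script:    $0\n";                                       # how the script was started
print "arg count: ", scalar(@ARGV), "\n";                      # command-line arguments
print "PATH set:  ", (exists $ENV{PATH} ? "yes" : "no"), "\n"; # one environment variable
print "open said: $error\n";                                   # text form of the OS error
```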

Regular Expressions

Perl is where modern regular expressions were popularized. The regex syntax you know from Python, JavaScript, PHP, Java, and Go largely traces back to Perl. When you see regex in a Perl script, you already know most of the pattern syntax. You just need to learn Perl’s operators for applying them.

Matching with =~

if ($email =~ /^[\w.+-]+@[\w.-]+\.\w{2,}$/) {
    print "Valid email\n";
}

# =~ means "match this string against this pattern"
# The regex itself is between the // delimiters
# This is roughly equivalent to Python's re.search() or JavaScript's regex.test()

The =~ operator is called the binding operator. It connects a string (on the left) to a regex operation (on the right). The negated form !~ returns true when the pattern does not match.
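
Both operators side by side, in a minimal sketch with a made-up user agent string:

```perl
use strict;
use warnings;

my $agent = "Mozilla/5.0 (X11; Linux x86_64)";   # made-up sample value

my $looks_like_bot = ($agent =~ /bot|crawler|spider/i) ? 1 : 0;
my $looks_human    = ($agent !~ /bot|crawler|spider/i) ? 1 : 0;

print "bot: $looks_like_bot, human: $looks_human\n";
```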

Substitution with s///

$line =~ s/\r\n/\n/g;       # replace all CRLF with LF
$value =~ s/^\s+|\s+$//g;  # trim whitespace from both ends
$html =~ s/<[^>]*>//g;      # strip HTML tags (crude but common)

The s/old/new/flags syntax is a substitution. It works much like the s command in sed, but with Perl’s regex dialect. The g flag means “global”: replace all occurrences, not just the first. Without g, only the first match is replaced.

Common flags you will see:

  • g — global (replace all)
  • i — case-insensitive
  • m — multiline (^ and $ match line boundaries)
  • s — single-line (dot matches newline)
  • e — evaluate replacement as code
  • x — extended (allows comments and whitespace in pattern)
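
The e and x flags are the two that most change how a line reads. The sketch below, with a made-up input string, shows e, i, and x in use:

```perl
use strict;
use warnings;

my $text = "Width: 10px, HEIGHT: 20px";

# e: the replacement is Perl code, here doubling each number
(my $doubled = $text) =~ s/(\d+)/$1 * 2/ge;

# i: case-insensitive, so "HEIGHT" matches /height/
my $has_height = ($text =~ /height/i) ? 1 : 0;

# x: whitespace and comments inside the pattern are ignored
my ($width) = $text =~ /
    width: \s*   # the label
    (\d+)        # capture the number
/xi;

print "$doubled | $has_height | $width\n";
```

The (my $doubled = $text) =~ s/// form copies the string first and then edits the copy, which is how older code avoids changing the original.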

Split and Capture

($name, $value) = split(/=/, $pair);         # split "key=value" on "="
@words = split(/\s+/, $line);                # split on whitespace
@parts = split(/,\s*/, "one, two, three");   # split on comma+optional space

# Capturing groups work with parentheses, just like every other language:
if ($url =~ /^https?:\/\/([\w.-]+)\/(.*)$/) {
    $host = $1;    # first capture group
    $path = $2;    # second capture group
}

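Capture groups use $1, $2, $3, etc. to reference matched groups. This is the same as \1 in sed or group(1) in Python. The variables are set by a successful match and keep their values until the next successful match. A failed match does not clear them, which is why careful code reads $1 only inside an if that tested the match.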
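
Because capture variables keep their old values when a match fails, legacy code that reads $1 without testing the match can pick up a stale value. A minimal sketch of the problem and of a safer idiom; both helper names are hypothetical:

```perl
use strict;
use warnings;

# The second match fails, yet $1 still holds the value from the first.
sub stale_capture_demo {
    my $first_ok  = ("key=value" =~ /^(\w+)=/) ? 1 : 0;   # succeeds, $1 is "key"
    my $second_ok = ("no digits" =~ /^(\d+)=/) ? 1 : 0;   # fails, $1 is untouched
    my $leftover  = $1;
    return ($second_ok, $leftover);
}

# Safer idiom: a match in list context returns its captures,
# or an empty list on failure, so $digits ends up undef.
sub leading_digits {
    my ($string) = @_;
    my ($digits) = $string =~ /^(\d+)=/;
    return $digits;
}

my ($matched, $leftover) = stale_capture_demo();
print "matched=$matched leftover=$leftover\n";
```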

The Transliteration Operator: tr///

$value =~ tr/+/ /;         # replace all + with space (URL decoding)
$text =~ tr/a-z/A-Z/;     # uppercase (like tr command in Unix)
$count = ($str =~ tr/,//); # count commas (tr returns count of replacements)

tr/// (also written as y///) is character-for-character transliteration. It is not regex. It maps each character in the first set to the corresponding character in the second set. You will encounter tr/+/ / in virtually every CGI script that processes form data — it converts URL-encoded spaces back to actual spaces.

File Operations

Perl’s file handling uses filehandles — uppercase bareword identifiers that represent an open file or stream. You will see this pattern in almost every legacy script:

open(FILE, "<data.txt") or die "Cannot open data.txt: $!";
while (<FILE>) {
    chomp;
    print "$_\n";
}
close(FILE);

The file modes use the same symbols as shell redirects:

Mode                          Meaning              Shell Equivalent
<                             Read                 cat file
>                             Write (truncate)     > file
>>                            Append               >> file
leading | (three-arg: |-)     Pipe to command      | command
trailing | (three-arg: -|)    Pipe from command    command |

<FILE> inside angle brackets reads one line at a time. In a while loop, it reads the entire file line by line. The die function exits the script with an error message, and $! contains the system error (like “Permission denied” or “No such file or directory”).
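
The three file modes can be tried safely on a scratch file. This sketch uses the legacy two-argument style so that it matches what you will read, and File::Temp (a core module) so that it touches no real data:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file so the example does not touch real data
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

open(OUT, ">$path") or die "Cannot write $path: $!";     # > truncates
print OUT "first\n";
close(OUT);

open(OUT, ">>$path") or die "Cannot append $path: $!";   # >> appends
print OUT "second\n";
close(OUT);

open(IN, "<$path") or die "Cannot read $path: $!";       # < reads
my @lines = <IN>;      # in list context, <IN> returns every remaining line
close(IN);

print "read ", scalar(@lines), " lines\n";
```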

The most important file pattern in CGI scripts is the pipe to sendmail:

open(MAIL, "|/usr/sbin/sendmail -t") or die "Cannot open sendmail: $!";
print MAIL "To: $recipient\n";
print MAIL "From: $sender\n";
print MAIL "Subject: Form Submission\n";
print MAIL "\n";
print MAIL $message;
close(MAIL);

This is how FormMail.pl sends email. The | before the command means “open a pipe to this program.” Everything written with print MAIL goes to sendmail’s standard input. When close(MAIL) is called, sendmail processes and delivers the message. This pattern — piping directly to sendmail — was the standard way to send email from a web server for over a decade.
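
To see exactly what such a script hands to sendmail, the same print statements can be pointed at an in-memory filehandle instead of a pipe. The sketch below does that with made-up addresses and a hypothetical build_message helper. It also shows why unvalidated input is dangerous here: a newline inside the sender value becomes a new header line.

```perl
use strict;
use warnings;

sub build_message {
    my ($recipient, $sender, $message) = @_;
    my $captured = '';
    # Opening a reference to a scalar gives a filehandle that writes into it
    open(my $mail, '>', \$captured) or die "Cannot open in-memory handle: $!";
    print $mail "To: $recipient\n";
    print $mail "From: $sender\n";
    print $mail "Subject: Form Submission\n";
    print $mail "\n";
    print $mail $message;
    close($mail);
    return $captured;
}

my $normal   = build_message('owner@example.com', 'visitor@example.org', "Hello\n");
my $injected = build_message('owner@example.com',
                             "visitor\@example.org\nBcc: victim\@example.net",
                             "Hello\n");
print $injected;
```

The second message carries a Bcc header that the script author never wrote, which is the mechanism behind the spam relay problems discussed later in this guide.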

Modern Perl uses three-argument open with lexical filehandles:

open(my $fh, '<', 'data.txt') or die "Cannot open: $!";
while (my $line = <$fh>) {
    chomp $line;
    print "$line\n";
}
close($fh);

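If you see my $fh instead of a bareword like FILE, the code was written or updated more recently. Both forms read and write the same way, but the two-argument form takes its mode from the string itself, so a file name that starts with > or ends with | changes what open does. A two-argument open on user-supplied input is therefore worth flagging in an audit. Legacy scripts from the 1990s and early 2000s almost always use bareword filehandles.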
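
The practical difference between the two forms shows up when the file name comes from outside the script. A minimal sketch on a scratch file, where $user_input stands in for a hypothetical attacker-controlled value:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print $tmp_fh "important data\n";
close($tmp_fh);
my $size_before = (stat($path))[7];

my $user_input = ">$path";   # hypothetical hostile "file name"

# Three-argument open: the mode is fixed, so this simply fails to find the file
my $three_arg_ok = open(my $safe_fh, '<', $user_input) ? 1 : 0;
my $size_after_three_arg = (stat($path))[7];

# Two-argument open: the leading > switches to write mode and truncates the file
my $two_arg_ok = open(LEGACY, $user_input) ? 1 : 0;
close(LEGACY) if $two_arg_ok;
my $size_after_two_arg = (stat($path))[7];

print "before=$size_before three-arg=$size_after_three_arg two-arg=$size_after_two_arg\n";
```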

CGI-Specific Patterns

If you are reading Perl because you found it in a /cgi-bin/ directory, you will encounter specific patterns related to the Common Gateway Interface. CGI is the protocol that connects a web server (typically Apache; Nginx has no built-in CGI support and needs a wrapper such as fcgiwrap) to an executable script. The server passes form data and request information through environment variables and standard input. The script processes the data and writes HTTP headers and HTML to standard output.

Reading Form Data

Before Perl’s CGI.pm module became standard, every script parsed form data manually. Code very close to this appears in FormMail.pl and dozens of similar scripts:

# Read POST data from STDIN
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});

# Split into individual form fields
@pairs = split(/&/, $buffer);

# Decode each field
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);

    # URL decoding: + becomes space
    $value =~ tr/+/ /;

    # URL decoding: %XX becomes the actual character
    $value =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;

    # Store in hash
    $FORM{$name} = $value;
}

Let us break this down line by line:

  1. read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}) — reads exactly CONTENT_LENGTH bytes from standard input. The web server sets this environment variable to tell the script how many bytes of POST data to expect.
  2. split(/&/, $buffer) — form data is URL-encoded as name=John&email=john%40example.com. This splits it into individual name=value pairs.
  3. split(/=/, $pair) — splits each pair into the field name and field value.
  4. tr/+/ / — in URL encoding, spaces become +. This reverses that.
  5. s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg — this is the URL decode. The %XX sequences (like %40 for @) are converted back to characters. hex($1) converts the hex string to a number, and pack("C", ...) converts that number to its ASCII character. The e flag means the replacement side is evaluated as Perl code.
  6. $FORM{$name} = $value — stores the decoded value in a hash, accessible later as $FORM{'email'}, $FORM{'message'}, etc.
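
The steps above can be tried without a web server by feeding the decoder a string instead of STDIN. In this sketch, parse_form is a hypothetical wrapper around the same logic and the sample input is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: the same decoding steps, applied to a string
sub parse_form {
    my ($buffer) = @_;
    my %form;
    foreach my $pair (split(/&/, $buffer)) {
        my ($name, $value) = split(/=/, $pair, 2);
        $value = '' unless defined $value;          # field sent without "="
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;
        $form{$name} = $value;
    }
    return %form;
}

my %FORM = parse_form("name=John+Smith&email=john%40example.com&note=a%3Db");
print "$_ = $FORM{$_}\n" foreach sort keys %FORM;
```

The third argument to split (the limit of 2) keeps any literal = that appears inside a value, which the two-argument split in the legacy code would cut off.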

Environment Variables

CGI scripts get request information from %ENV, Perl’s hash of environment variables. The most common ones you will see:

$ENV{'QUERY_STRING'}     # GET parameters (after ? in URL)
$ENV{'REQUEST_METHOD'}   # GET, POST, HEAD
$ENV{'CONTENT_LENGTH'}   # size of POST body in bytes
$ENV{'CONTENT_TYPE'}     # usually application/x-www-form-urlencoded
$ENV{'REMOTE_ADDR'}      # client IP address
$ENV{'HTTP_REFERER'}     # page that linked to this script
$ENV{'HTTP_USER_AGENT'}  # browser identification string
$ENV{'SERVER_NAME'}      # hostname of the web server
$ENV{'SCRIPT_NAME'}      # path to the CGI script
$ENV{'DOCUMENT_ROOT'}    # web server document root

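FormMail compares the HTTP_REFERER variable against its @referers array to prevent unauthorized access. The header is supplied by the client and can be forged, so treat this check as a speed bump rather than real access control. See the FormMail troubleshooting guide for details on that specific check.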
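
A referer check of this kind generally boils down to extracting the host from the header and comparing it with an allow-list. The sketch below illustrates the idea only; it is not FormMail's actual code, and the host names and the referer_allowed helper are assumptions:

```perl
use strict;
use warnings;

my @referers = ('example.com', 'www.example.com');   # hypothetical allow-list

sub referer_allowed {
    my ($referer) = @_;
    # No header, or one that is not an http(s) URL, counts as not allowed
    return 0 unless defined $referer && $referer =~ m{^https?://([^/:]+)}i;
    my $host = lc $1;
    foreach my $allowed (@referers) {
        return 1 if $host eq lc $allowed;
    }
    return 0;
}

my $verdict = referer_allowed($ENV{'HTTP_REFERER'}) ? "allowed" : "refused";
print "$verdict\n";
```

When reading a real script, look at whether the comparison is exact, as here, or a loose substring match that a crafted URL could satisfy.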

Outputting HTTP Headers

A CGI script must output HTTP headers before any HTML content. The minimum is a Content-Type header followed by a blank line:

print "Content-type: text/html\n\n";    # header + blank line
print "<html><body>Thank you!</body></html>\n";

# Or for a redirect:
print "Location: https://example.com/thanks.html\n\n";

If you see “Internal Server Error” and the script looks correct, check whether it outputs the Content-Type header before any other output. A single print statement before the header will break CGI. Along with a wrong shebang path, this is one of the most common sources of 500 errors.
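
One way to test for this is to run the script from the command line, capture its output, and check that it begins with a header block. A minimal sketch, assuming a hypothetical starts_with_headers helper and two made-up outputs:

```perl
use strict;
use warnings;

# Hypothetical check: does the output start with one or more
# "Name: value" lines followed by a blank line?
sub starts_with_headers {
    my ($output) = @_;
    return $output =~ /\A(?:[A-Za-z-]+:[^\n]*\n)+\r?\n/ ? 1 : 0;
}

my $good = "Content-type: text/html\n\n<html><body>Hi</body></html>\n";
my $bad  = "Processing form\nContent-type: text/html\n\n<html></html>\n";

print "good: ", starts_with_headers($good), "\n";
print "bad:  ", starts_with_headers($bad), "\n";
```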

Common Patterns Cheat Sheet

This table covers the Perl patterns you will encounter most often in legacy code, with equivalents in Python for reference:

Perl                      What It Does                   Python Equivalent
$_                        Default variable (“it”)        (no equivalent)
chomp                     Remove trailing newline        line.rstrip('\n')
die "msg"                 Exit with error                raise Exception("msg")
unless ($x)               Execute if condition is false  if not x:
$x =~ /pattern/           Regex match                    re.search(r'pattern', x)
$x =~ s/a/b/g             Regex substitute               re.sub(r'a', 'b', x)
qw(a b c)                 Quote words into list          ['a', 'b', 'c']
my $x = 5;                Declare local variable         x = 5
use strict;               Require variable declarations  (default in Python)
use warnings;             Enable warning messages        python -W all
$hash{$key}               Hash lookup                    dict[key]
push @arr, $val           Add to end of array            arr.append(val)
pop @arr                  Remove from end of array       arr.pop()
shift @arr                Remove from start of array     arr.pop(0)
join(",", @arr)           Join array into string         ",".join(arr)
defined($x)               Check if value exists          x is not None
chomp(my $x = <STDIN>)    Read one line from input       x = input()
foreach my $x (@arr) { }  Loop over array                for x in arr:
next                      Skip to next iteration         continue
last                      Break out of loop              break
# comment                 Single-line comment            # comment
eq, ne, lt, gt            String comparison              ==, !=, <, >
==, !=, <, >              Numeric comparison             ==, !=, <, >

String vs. Numeric Comparison: Perl has separate operators for comparing strings and numbers. == compares numerically (so "42" == "42.0" is true). eq compares as strings (so "42" eq "42.0" is false). If you see unexpected behavior in a condition, check which operator is being used.
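
The difference is easiest to see by running both operators on the same values. In this sketch the user and role strings are made up; they show what happens when == is applied to text by mistake:

```perl
use strict;
use warnings;

my $numeric_equal = ("42" == "42.0") ? 1 : 0;   # both become the number 42
my $string_equal  = ("42" eq "42.0") ? 1 : 0;   # different characters

my ($user, $role) = ("admin", "guest");
my $wrong_operator;
{
    no warnings 'numeric';   # silence "isn't numeric" for this demonstration
    # Non-numeric strings both convert to 0, so == calls them equal
    $wrong_operator = ($user == $role) ? 1 : 0;
}

print "$numeric_equal $string_equal $wrong_operator\n";
```

A legacy script that compares a password or user name with == instead of eq will accept almost any text value, so this is worth searching for during an audit.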

A few more syntax elements worth recognizing:

# Postfix conditionals (read right-to-left):
print "Found it\n" if $found;          # print only if $found is true
die "Missing config" unless -e $file;  # die unless file exists

# -e, -f, -d, -r, -w are file test operators:
if (-e "/cgi-bin/formmail.pl") { }     # file exists?
if (-d "/var/www/html") { }            # is directory?
if (-r $config_file) { }              # is readable?

# String repetition:
$line = "-" x 40;                      # "----------------------------------------"

# Here-doc (multiline string):
print <<END_HTML;
<html>
<body>Thank you, $name.</body>
</html>
END_HTML

The here-doc syntax (<<END_HTML) closely resembles here-docs in bash and PHP (PHP writes <<< instead of <<). Everything between the marker and the closing marker is treated as a string, with variable interpolation. You will see this in CGI scripts that output HTML.
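
Interpolation depends on how the marker is quoted: a bare or double-quoted marker interpolates variables, a single-quoted one does not. A short sketch with a made-up name:

```perl
use strict;
use warnings;

my $name = "Ada";

# Bare marker: variables are interpolated
my $interpolated = <<END_HTML;
<p>Thank you, $name.</p>
END_HTML

# Single-quoted marker: the text is taken literally
my $literal = <<'END_TEXT';
<p>Thank you, $name.</p>
END_TEXT

print $interpolated, $literal;
```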

What To Do If You Find FormMail.pl

Security Warning: If you found FormMail.pl on a server, check its version immediately. Matt Wright’s original versions (1.0 through 1.6) have known vulnerabilities including open relay exploits that spammers actively scan for.

Open the file and look near the top for a version string:

grep -i "version\|formmail" /path/to/formmail.pl | head -5

If the version is 1.6 or below, the script is vulnerable to email injection attacks (CVE-2001-0357). Spammers use it as an open relay to send thousands of messages through your server. This can get your IP blacklisted and your hosting account suspended.
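
The same lookup can be done in Perl when many copies need checking. This is a sketch only: script_version is a hypothetical helper, and it assumes the script announces itself with the word Version followed by a number, which may not hold for every copy you find.

```perl
use strict;
use warnings;

# Hypothetical helper: find the first "Version <number>" in the script text.
# Assumes the script states its version that way; adjust the pattern if not.
sub script_version {
    my ($source) = @_;
    return $source =~ /version\s+(\d+(?:\.\d+)*)/i ? $1 : undef;
}

# Made-up header comment standing in for the top of a real file
my $sample  = "#!/usr/bin/perl\n# FormMail    Version 1.6\n";
my $version = script_version($sample);

print "version: $version\n";
```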

Your options, starting with the safest:

  1. Remove it entirely. If the form is no longer needed, delete the script and the HTML form that references it. This is the safest option.
  2. Replace with NMS FormMail. The NMS project created secure drop-in replacements for Matt’s scripts. Their FormMail (v1.93+) fixes all known vulnerabilities while maintaining the same configuration format.
  3. Migrate to a modern solution. Replace the Perl script with Formspree, Web3Forms, or a simple PHP handler. See the migration guide for step-by-step instructions.

While you are at it, check for other scripts from that era: guestbook.pl, wwwboard.pl, counter.pl. These all have similar security issues. The complete script guide lists every program from Matt’s Script Archive and its known vulnerabilities.

Putting It All Together

Here is a representative snippet that combines everything from this article. It is a simplified version of the core logic from a CGI contact form script:

#!/usr/bin/perl -T
use strict;
use warnings;

# Read form data
my %FORM;
if ($ENV{'REQUEST_METHOD'} eq 'POST') {
    my $buffer;
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    foreach my $pair (split(/&/, $buffer)) {
        my ($name, $value) = split(/=/, $pair, 2);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;
        $value =~ s/[<>]//g;        # basic XSS prevention
        $FORM{$name} = $value;
    }
}

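# Validate (\z rather than $, so a trailing newline cannot reach the mail headers)
die "No valid email provided"
    unless defined $FORM{'email'} && $FORM{'email'} =~ /^[\w.+-]+@[\w.-]+\.\w{2,}\z/;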

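# Send email via sendmail (taint mode refuses to run a program until PATH is cleaned)
$ENV{'PATH'} = '/usr/sbin:/usr/bin:/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
open(my $mail, '|-', '/usr/sbin/sendmail', '-t')
    or die "Cannot open sendmail: $!";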
print $mail "To: admin\@example.com\n";
print $mail "From: $FORM{'email'}\n";
print $mail "Subject: Contact Form\n\n";
print $mail $FORM{'message'};
close($mail);

# Output response
print "Content-type: text/html\n\n";
print "<html><body><p>Thank you, $FORM{'name'}.</p></body></html>\n";

You should now be able to read every line of this script and understand exactly what it does. The sigils ($, %) identify variable types. The pattern operators (=~, s///, tr///) handle URL decoding and validation. The filehandle ($mail) pipes data to sendmail. The %ENV hash provides CGI environment variables. The die and or provide error handling.

You do not need to become a Perl programmer. But you do need to recognize these patterns when you encounter them on a server. Now you can.

Related

FormMail.pl

The original FormMail script from Matt’s Script Archive. Documentation, download, and version history.

FormMail Troubleshooting

Fix Bad Referer, 500 errors, blank pages, and spam relay issues. Migration guide to modern alternatives.

The History of CGI

How the Common Gateway Interface worked and why it mattered. From NCSA httpd to Apache and beyond.

Glossary: Perl

Quick definition and context for Perl in web development history.

Glossary: cgi-bin

What is the cgi-bin directory and why does it exist on your server.

Complete Script Guide

All 18 programs from Matt’s Script Archive documented in one place.