Website Keyword Search Engine Script
Simple Search allows you to integrate a keyword and boolean search program into your website. Users can search specified text and HTML documents, and the script returns a list of matching files displayed by their title tags. Perfect for small to medium-sized sites without database-driven search.
Site search has evolved dramatically. Today's solutions offer typo tolerance, instant results, faceting, and AI-powered relevance.
Meilisearch: Lightning fast (<50ms), written in Rust. Typo-tolerant, instant search. Great documentation. Inspired by Algolia but open source.
Typesense: Written in C++, blazing fast. Typo correction, tunable ranking, faceting. Battle-tested since 2015. Great Algolia alternative.
OpenSearch: Fork of Elasticsearch (Apache 2.0). Full-text search, analytics, dashboards. Enterprise-grade, AWS-backed.
Pagefind: Fully static search for static sites. Auto-indexes at build time. 10KB JS, no server needed. Perfect for Hugo, Jekyll, Astro.
Lunr.js: Client-side search with pre-built index. Stemming, boosting, field search. No server required, works offline.
Fuse.js: Lightweight fuzzy search. No dependencies, works in browser and Node. Perfect for small datasets (<10K items).
Algolia: Industry leader. Used by Stripe, Twitch, Slack. Instant, typo-tolerant. Enterprise pricing. Free for 10K searches/mo; paid from $1 per 1K searches.
Elastic Cloud: Managed Elasticsearch. Full-text, analytics, APM. Enterprise features, global infrastructure. From $95/mo.
Algolia DocSearch: Free for open source documentation. Auto-crawls your docs. Used by React, Vue, Bootstrap docs. Free for OSS.

| Solution | Type | Best For | Typo Tolerance | Self-Hosted | Cost |
|---|---|---|---|---|---|
| Simple Search (1990s) | CGI | Historical interest | No | Yes | Free |
| Meilisearch | Server | Apps, e-commerce | Yes | Yes | Free / Cloud |
| Typesense | Server | Apps, high traffic | Yes | Yes | Free / $19+ |
| Pagefind | Static | Static sites, docs | Basic | Yes | Free |
| Lunr.js | Client-side | Small static sites | Basic | Yes | Free |
| Algolia | SaaS | Enterprise, e-commerce | Yes | No | $$-$$$ |
Simple Search is a lightweight Perl CGI script that provides search functionality for static HTML and text files. It reads through your specified documents, searches for keywords, and returns results with links to matching pages.
The script scans through files in specified directories, looking for keyword matches. When a match is found, it extracts the page title from the HTML <title> tag and creates a clickable link in the results. Boolean operators (AND, OR) can combine multiple keywords.
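The flow described above (scan, match, extract the title, link) can be sketched in a few lines. This is an illustrative JavaScript sketch rather than the Perl script itself: the `docs` array and the names `extractTitle` and `findMatches` are hypothetical, and the documents are held in memory instead of being read from disk.

```javascript
// Illustrative sketch only: in-memory documents stand in for files on disk.
// extractTitle and findMatches are hypothetical names, not part of search.pl.

// Pull the text of the <title> tag, falling back to the file path.
function extractTitle(html, fallback) {
  const m = /<title>([^<]+)<\/title>/i.exec(html);
  return m ? m[1].trim() : fallback;
}

// Return { title, url } for every document matching the keywords.
// mode 'AND' requires all keywords; 'OR' requires at least one.
function findMatches(docs, keywords, mode = 'AND') {
  const terms = keywords.map(k => k.toLowerCase());
  return docs
    .filter(doc => {
      const text = doc.content.toLowerCase();
      const hit = term => text.includes(term);
      return mode === 'OR' ? terms.some(hit) : terms.every(hit);
    })
    .map(doc => ({ title: extractTitle(doc.content, doc.path), url: doc.path }));
}

const docs = [
  { path: '/docs/perl.html', content: '<title>Perl Basics</title><p>Perl and CGI</p>' },
  { path: '/docs/notes.txt', content: 'Plain notes about PHP' },
];
console.log(findMatches(docs, ['perl', 'cgi'], 'AND'));
console.log(findMatches(docs, ['perl', 'php'], 'OR'));
```

The text file has no title tag, so the sketch falls back to its path for the link text.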
| File | Description |
|---|---|
| search.pl | Main Perl script that performs the search and displays results |
| search.html | HTML form template for the search interface |
| README | Installation instructions and configuration guide |
Search for single keywords or phrases across all specified documents on your website.
Support for AND/OR operators to combine multiple keywords for refined search results.
Automatically extracts page titles from HTML documents to display meaningful result links.
Configure multiple directories to search, allowing you to cover your entire site structure.
Search both plain text (.txt) and HTML files (.html, .htm) for maximum coverage.
Modify the results page template to match your website's design and branding.
| Option | Description | Example |
|---|---|---|
| Single Keyword | Search for a single word in all documents | perl |
| Multiple Keywords (AND) | Find documents containing ALL specified keywords | perl cgi script |
| Multiple Keywords (OR) | Find documents containing ANY of the keywords | perl OR php OR python |
| Case Insensitive | Searches are case insensitive by default | PERL = perl = Perl |
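The OR example in the table is typed into the query itself, whereas the sample form further down selects the mode with radio buttons. If you want the query text to carry the operator, one possible approach is sketched below. `parseQuery` is a hypothetical helper, and treating an uppercase OR as the only mode switch is an assumption, not documented behavior of the original script.

```javascript
// Hypothetical helper: derive the keyword list and boolean mode from a raw query.
// Assumption: an uppercase OR between words selects OR mode; anything else is AND.
function parseQuery(raw) {
  const words = raw.trim().split(/\s+/).filter(Boolean);
  const mode = words.includes('OR') ? 'OR' : 'AND';
  const terms = words
    .filter(w => w !== 'OR' && w !== 'AND') // drop the operators themselves
    .map(w => w.toLowerCase());             // searches are case insensitive
  return { mode, terms };
}

console.log(parseQuery('perl cgi script'));
console.log(parseQuery('perl OR php OR python'));
console.log(parseQuery('PERL'));
```

Lowercasing the terms after the operator check keeps a literal lowercase "or" searchable as an ordinary keyword.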
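1. Upload search.pl to your cgi-bin directory.
2. Edit the first line of the script to point to your Perl interpreter (usually #!/usr/bin/perl).
3. Update the @directories array with the paths to directories you want to search.
4. Set the $baseurl variable to match your website's URL structure.
5. Make the script executable: chmod 755 search.pl
6. Customize search.html to create your search interface.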
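The @directories and $baseurl settings together decide how a file found on disk becomes a link in the results. A minimal sketch of that mapping follows, written in JavaScript for illustration. `pathToUrl` is a hypothetical helper, and the document root and base URL are example values rather than settings taken from the script.

```javascript
// Hypothetical helper: map a file path on disk to a public URL.
// docRoot and baseUrl are example values; use your server's own settings.
function pathToUrl(filePath, docRoot, baseUrl) {
  const root = docRoot.replace(/\/+$/, '') + '/'; // normalize to one trailing slash
  if (!filePath.startsWith(root)) {
    return null; // file lives outside the web root, so it has no URL
  }
  const relative = filePath.slice(root.length);
  return baseUrl.replace(/\/+$/, '') + '/' + relative;
}

console.log(pathToUrl('/var/www/html/docs/faq.html', '/var/www/html', 'https://example.com'));
console.log(pathToUrl('/etc/passwd', '/var/www/html', 'https://example.com'));
```

Returning null for paths outside the document root makes a misconfigured search directory visible instead of producing broken links.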
<!DOCTYPE html>
<html>
<head>
<title>Search Our Site</title>
</head>
<body>
<h1>Search</h1>
<form action="/cgi-bin/search.pl" method="GET">
<p>
<label for="keywords">Enter Keywords:</label><br>
<input type="text" name="keywords" id="keywords" size="40">
</p>
<p>
Search Type:<br>
<input type="radio" name="boolean" value="AND" id="and" checked>
<label for="and">Match ALL keywords (AND)</label><br>
<input type="radio" name="boolean" value="OR" id="or">
<label for="or">Match ANY keyword (OR)</label>
</p>
<p>
<input type="submit" value="Search">
<input type="reset" value="Clear">
</p>
</form>
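<!-- Bootstrap 5 version: use in place of the form above (both forms reuse the same element ids); requires the Bootstrap CSS and Bootstrap Icons stylesheets -->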
<form action="/cgi-bin/search.pl" method="GET" class="needs-validation">
<div class="mb-3">
<label for="keywords" class="form-label">Enter Keywords</label>
<input type="text" class="form-control" name="keywords" id="keywords"
placeholder="Search..." required>
</div>
<div class="mb-3">
<label class="form-label">Search Type</label>
<div class="form-check">
<input class="form-check-input" type="radio" name="boolean"
value="AND" id="and" checked>
<label class="form-check-label" for="and">
Match ALL keywords (AND)
</label>
</div>
<div class="form-check">
<input class="form-check-input" type="radio" name="boolean"
value="OR" id="or">
<label class="form-check-label" for="or">
Match ANY keyword (OR)
</label>
</div>
</div>
<button type="submit" class="btn btn-primary">
<i class="bi bi-search"></i> Search
</button>
</form>
</body>
</html>
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use File::Find;
my $cgi = CGI->new;
# Configuration
my @directories = ('/var/www/html/docs', '/var/www/html/pages');
my $baseurl = 'http://example.com';
my @extensions = qw(html htm txt);
# Get search parameters
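my $keywords = $cgi->param('keywords') // '';
my $boolean = uc($cgi->param('boolean') // 'AND');
# Security: accept only the two supported modes
$boolean = 'AND' unless $boolean eq 'OR';
# Security: sanitize input (keep only word characters and whitespace)
$keywords =~ s/[^\w\s]//g;
# Drop the empty string that split produces for leading whitespace
my @terms = grep { length } split(/\s+/, lc($keywords));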
# Output HTML header
print $cgi->header('text/html');
print $cgi->start_html('Search Results');
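print "<h1>Search Results</h1>\n";
if (!@terms) {
print "<p>Please enter search keywords.</p>\n";
print $cgi->end_html;
exit;
}
# Escape $boolean because it comes straight from the query string
print "<p>Searching for: <strong>$keywords</strong> (" . $cgi->escapeHTML($boolean) . ")</p>\n";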
# Search files
my @results;
find(sub {
return unless -f;
my $file = $File::Find::name;
# Check extension
my ($ext) = $file =~ /\.(\w+)$/;
return unless $ext && grep { $_ eq lc($ext) } @extensions;
# Read file content
open(my $fh, '<', $file) or return;
my $content = do { local $/; <$fh> };
close($fh);
# No lc() here: the matches below use /i, and the title must keep its original case
# Check for matches
my $match = 0;
if ($boolean eq 'AND') {
$match = 1;
for my $term (@terms) {
unless ($content =~ /\b\Q$term\E\b/i) {
$match = 0;
last;
}
}
} else { # OR
for my $term (@terms) {
if ($content =~ /\b\Q$term\E\b/i) {
$match = 1;
last;
}
}
}
if ($match) {
# Extract title
my ($title) = $content =~ /<title>([^<]+)<\/title>/i;
$title ||= $file;
# Convert path to URL
my $url = $file;
$url =~ s{^/var/www/html}{$baseurl};
push @results, { title => $title, url => $url };
}
}, @directories);
# Display results
if (@results) {
print "<p>Found " . scalar(@results) . " result(s):</p>\n";
print "<ul>\n";
for my $result (@results) {
print qq{<li><a href="$result->{url}">$result->{title}</a></li>\n};
}
print "</ul>\n";
} else {
print "<p>No results found.</p>\n";
}
print $cgi->end_html;
exit 0;
<?php
/**
* Simple Search - PHP Version
*/
// Configuration
$config = [
'directories' => [
'/var/www/html/docs',
'/var/www/html/pages'
],
'baseurl' => 'https://example.com',
'extensions' => ['html', 'htm', 'txt', 'php'],
'max_results' => 100
];
// Get search parameters
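$keywords = $_GET['keywords'] ?? '';
$boolean = $_GET['boolean'] ?? 'AND';
// Security: reject array-valued parameters and accept only the two supported modes
if (!is_string($keywords)) $keywords = '';
$boolean = ($boolean === 'OR') ? 'OR' : 'AND';
// Security: sanitize input (keep only word characters and whitespace)
$keywords = preg_replace('/[^\w\s]/u', '', $keywords) ?? '';
// Split on any whitespace and drop empty terms
$terms = preg_split('/\s+/', strtolower($keywords), -1, PREG_SPLIT_NO_EMPTY);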
function searchFiles($directories, $extensions, $baseurl) {
$files = [];
foreach ($directories as $dir) {
if (!is_dir($dir)) continue;
$iterator = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator($dir)
);
foreach ($iterator as $file) {
if (!$file->isFile()) continue;
$ext = strtolower($file->getExtension());
if (!in_array($ext, $extensions)) continue;
$files[] = [
'path' => $file->getPathname(),
'url' => str_replace('/var/www/html', $baseurl, $file->getPathname())
];
}
}
return $files;
}
function extractTitle($content, $fallback) {
if (preg_match('/<title>([^<]+)<\/title>/i', $content, $matches)) {
return htmlspecialchars($matches[1]);
}
return htmlspecialchars(basename($fallback));
}
function matchesSearch($content, $terms, $boolean) {
$content = strtolower($content);
if ($boolean === 'AND') {
foreach ($terms as $term) {
if (stripos($content, $term) === false) {
return false;
}
}
return true;
} else { // OR
foreach ($terms as $term) {
if (stripos($content, $term) !== false) {
return true;
}
}
return false;
}
}
// Perform search
$results = [];
if (!empty($terms)) {
$files = searchFiles(
$config['directories'],
$config['extensions'],
$config['baseurl']
);
foreach ($files as $file) {
$content = @file_get_contents($file['path']);
if ($content === false) continue;
if (matchesSearch($content, $terms, $boolean)) {
$results[] = [
'title' => extractTitle($content, $file['path']),
'url' => $file['url']
];
if (count($results) >= $config['max_results']) {
break;
}
}
}
}
?>
<!DOCTYPE html>
<html>
<head>
<title>Search Results</title>
</head>
<body>
<h1>Search Results</h1>
<?php if (empty($terms)): ?>
<p>Please enter search keywords.</p>
<?php else: ?>
<p>Searching for: <strong><?= htmlspecialchars($keywords) ?></strong>
(<?= htmlspecialchars($boolean) ?>)</p>
<?php if (!empty($results)): ?>
<p>Found <?= count($results) ?> result(s):</p>
<ul>
<?php foreach ($results as $result): ?>
<li>
<a href="<?= htmlspecialchars($result['url']) ?>">
<?= $result['title'] ?>
</a>
</li>
<?php endforeach; ?>
</ul>
<?php else: ?>
<p>No results found.</p>
<?php endif; ?>
<?php endif; ?>
</body>
</html>
/**
* Simple Search - JavaScript Version
* Client-side search for static sites (requires pre-built index)
*/
class SimpleSearch {
constructor(options = {}) {
this.options = {
indexUrl: '/search-index.json',
inputSelector: '#search-input',
resultsSelector: '#search-results',
minChars: 2,
maxResults: 20,
highlightMatches: true,
...options
};
this.index = [];
this.init();
}
async init() {
await this.loadIndex();
this.bindEvents();
}
async loadIndex() {
try {
const response = await fetch(this.options.indexUrl);
this.index = await response.json();
} catch (error) {
console.error('Failed to load search index:', error);
}
}
bindEvents() {
const input = document.querySelector(this.options.inputSelector);
if (input) {
input.addEventListener('input', (e) => this.handleSearch(e.target.value));
// Handle form submission
input.closest('form')?.addEventListener('submit', (e) => {
e.preventDefault();
this.handleSearch(input.value);
});
}
}
handleSearch(query) {
if (query.length < this.options.minChars) {
this.displayResults([]);
return;
}
const results = this.search(query);
this.displayResults(results, query);
}
search(query) {
const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 0);
if (terms.length === 0) return [];
return this.index
.map(item => ({
...item,
score: this.calculateScore(item, terms)
}))
.filter(item => item.score > 0)
.sort((a, b) => b.score - a.score)
.slice(0, this.options.maxResults);
}
calculateScore(item, terms) {
let score = 0;
const titleLower = item.title.toLowerCase();
const contentLower = (item.content || '').toLowerCase();
for (const term of terms) {
// Escape regex metacharacters so input such as "c++" cannot break the pattern
const safeTerm = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
// Title matches are worth more
if (titleLower.includes(term)) {
score += 10;
// Exact word match in title
if (new RegExp(`\\b${safeTerm}\\b`).test(titleLower)) {
score += 5;
}
}
// Content matches
const contentMatches = (contentLower.match(new RegExp(safeTerm, 'g')) || []).length;
score += Math.min(contentMatches, 5); // Cap at 5 points per term
}
return score;
}
displayResults(results, query = '') {
const container = document.querySelector(this.options.resultsSelector);
if (!container) return;
if (results.length === 0) {
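container.innerHTML = query.length >= this.options.minChars
? '<p>No results found.</p>'
: '';
return;
}
// Build one pattern from all terms so a later term cannot match inside
// the <mark> tags inserted for an earlier one
const terms = query.split(/\s+/).filter(Boolean)
.map(term => term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
const regex = terms.length ? new RegExp(`(${terms.join('|')})`, 'gi') : null;
const html = results.map(result => {
let title = result.title;
let snippet = result.snippet || '';
if (this.options.highlightMatches && regex) {
title = title.replace(regex, '<mark>$1</mark>');
snippet = snippet.replace(regex, '<mark>$1</mark>');
}
// Minimal list markup; adjust to match your page design
return `<li><a href="${result.url}">${title}</a><p>${snippet}</p></li>`;
}).join('');
container.innerHTML = `
<p>Found ${results.length} result(s):</p>
<ul>${html}</ul>
`;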
}
}
// Usage
const search = new SimpleSearch({
indexUrl: '/search-index.json',
inputSelector: '#search-input',
resultsSelector: '#search-results'
});
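// Building a search index (Node.js build script example; requires the cheerio package)
/*
const fs = require('fs');
const path = require('path');
const cheerio = require('cheerio');
// Recursively list every file under a directory
function walkDir(dir) {
return fs.readdirSync(dir, { withFileTypes: true }).flatMap(entry => {
const full = path.join(dir, entry.name);
return entry.isDirectory() ? walkDir(full) : [full];
});
}
function buildIndex(directory) {
const index = [];
const files = walkDir(directory);
files.forEach(file => {
if (!file.endsWith('.html')) return;
const content = fs.readFileSync(file, 'utf-8');
const $ = cheerio.load(content);
index.push({
// Site-relative URL with forward slashes on every platform
url: '/' + path.relative(directory, file).split(path.sep).join('/'),
title: $('title').text() || path.basename(file),
content: $('body').text().replace(/\s+/g, ' ').slice(0, 500),
snippet: $('meta[name="description"]').attr('content') || ''
});
});
return index;
}
fs.writeFileSync('search-index.json', JSON.stringify(buildIndex('./public')));
*/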
search.pl: Main Perl script that performs keyword searches
search.html: HTML form template for the search interface
README: Installation instructions and configuration guide
Community-contributed enhancements and localizations:
A modified version with built-in debugging options to help troubleshoot search issues. Created by the MSA help list community. (Community Contribution)
A localized version modified to search Japanese character sets (Shift-JIS, EUC-JP). Demonstrates internationalization of the script. (Localization)
To exclude certain files or directories from the search, define a list of exclusion patterns (for example, @exclude = ('admin', 'private', '*.bak');) and check each file against these patterns before searching. This is useful for excluding administrative pages, backup files, or development directories.
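The exclusion check described above can be sketched as follows. This is an illustrative JavaScript version, not code from the original script: `patternToRegExp` and `isExcluded` are hypothetical names, and supporting only the `*` wildcard is an assumption about how such patterns would be interpreted.

```javascript
// Hypothetical sketch of the exclusion check; the pattern list mirrors the
// example above. Only the * wildcard is supported here (an assumption).
const exclude = ['admin', 'private', '*.bak'];

// Convert a simple wildcard pattern to an anchored regular expression.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape all but *
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// A path is excluded if any directory or file name in it matches a pattern.
function isExcluded(filePath, patterns = exclude) {
  const segments = filePath.split('/').filter(Boolean);
  return patterns.some(p => {
    const re = patternToRegExp(p);
    return segments.some(s => re.test(s));
  });
}

console.log(isExcluded('/var/www/html/admin/index.html'));
console.log(isExcluded('/var/www/html/docs/old.bak'));
console.log(isExcluded('/var/www/html/docs/guide.html'));
```

Matching whole path segments rather than substrings keeps a directory such as "administrator" from being excluded by the "admin" pattern.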
To highlight matched keywords in the results, wrap them in <mark> or <strong> tags. You'll need to find the position of the keyword in the content, extract the surrounding text (50-100 characters before and after), and apply the highlighting.
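A minimal sketch of that excerpt-and-highlight step, in JavaScript for illustration. `makeSnippet` and `escapeHtml` are hypothetical helpers, and the sketch assumes the content has already been reduced to plain text.

```javascript
// Hypothetical helper: build a highlighted excerpt around the first match.
// Assumes `text` is plain text (tags already stripped); `context` is the
// number of characters kept on each side of the keyword.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function makeSnippet(text, keyword, context = 50) {
  const pos = text.toLowerCase().indexOf(keyword.toLowerCase());
  if (pos === -1) return null; // keyword not present
  const start = Math.max(0, pos - context);
  const end = Math.min(text.length, pos + keyword.length + context);
  const before = escapeHtml(text.slice(start, pos));
  const match = escapeHtml(text.slice(pos, pos + keyword.length));
  const after = escapeHtml(text.slice(pos + keyword.length, end));
  return (start > 0 ? '...' : '') + before + '<mark>' + match + '</mark>' + after +
    (end < text.length ? '...' : '');
}

console.log(makeSnippet('Learn Perl scripting for the web', 'perl', 6));
```

Escaping the three pieces separately means the only markup in the output is the `<mark>` pair added by the helper.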
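To make searches case sensitive, remove the lc() functions that convert text to lowercase before comparison (in the Perl example above, also remove the /i modifier from the match expressions). You can also add a checkbox to the search form letting users choose case sensitivity, then check that parameter in the script.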
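The checkbox approach amounts to one extra flag in the comparison. A small JavaScript sketch, where `containsTerm` and its `caseSensitive` parameter are hypothetical names:

```javascript
// Hypothetical matcher with an optional case-sensitive mode.
// The caseSensitive flag would come from a checkbox on the search form.
function containsTerm(content, term, caseSensitive = false) {
  if (caseSensitive) {
    return content.includes(term);
  }
  return content.toLowerCase().includes(term.toLowerCase());
}

console.log(containsTerm('Intro to Perl', 'perl'));
console.log(containsTerm('Intro to Perl', 'perl', true));
console.log(containsTerm('Intro to Perl', 'Perl', true));
```

Defaulting the flag to false preserves the existing case-insensitive behavior for forms that do not send the new parameter.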
To paginate results, add a page parameter to the query string and modify the script to: (1) count the total results, (2) calculate the offset from the page number and results per page, (3) slice the results array to show only the current page, (4) generate pagination links with the search query and page numbers. Example: ?keywords=perl&page=2.
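The four steps above can be sketched as two small functions. This is an illustrative JavaScript version: `paginate` and `pageLinks` are hypothetical names, and the default of 10 results per page is an example value, not a setting from the original script.

```javascript
// Hypothetical pagination helpers following the four steps above.
// perPage = 10 is an example default, not a value from the original script.
function paginate(results, page = 1, perPage = 10) {
  const total = results.length;                           // (1) count total results
  const totalPages = Math.max(1, Math.ceil(total / perPage));
  // Clamp missing, non-numeric, or out-of-range page values
  const current = Math.min(Math.max(1, Math.floor(Number(page)) || 1), totalPages);
  const offset = (current - 1) * perPage;                 // (2) calculate offset
  const items = results.slice(offset, offset + perPage);  // (3) slice current page
  return { total, totalPages, page: current, items };
}

// (4) Build one link per page, keeping the search query in the URL.
function pageLinks(keywords, totalPages) {
  const links = [];
  for (let p = 1; p <= totalPages; p++) {
    const qs = new URLSearchParams({ keywords, page: String(p) });
    links.push('?' + qs.toString());
  }
  return links;
}

const sample = Array.from({ length: 25 }, (_, i) => i + 1);
console.log(paginate(sample, 2));
console.log(pageLinks('perl', 3));
```

Clamping the page number means a hand-edited URL such as ?page=99 shows the last page instead of an empty list.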