Basically, I've got an old static HTML site ( http://www.brownwatson.co.uk/brochure/page1.html ). I need to add a search box to it that searches a folder called /brochure within it.
I was looking for a search solution for my Jekyll blog but didn't find a good one. Google Custom Search showed ads and returned results from subdomains, so it wasn't suitable, and I created my own solution. I've written an article about how to add search to a static site such as a Jekyll blog; it's in Polish and was translated using Google Translate.
I'll probably write a better manual translation, or rewrite it on my English blog, soon.
The solution is a Python script that creates an SQLite database from the HTML files, plus a small PHP script that shows the search results. It does require that your static site hosting also supports PHP.
In case the article goes down, here is the code. It was written just for my blog (my HTML and file structure), so it needs to be tweaked to work with your site.
Python script:
import os
import re
import sqlite3
import sys

from bs4 import BeautifulSoup


def get_data(html):
    """Return a dictionary with the title, URL and content of a blog post."""
    tree = BeautifulSoup(html, 'html5lib')
    body = tree.body
    if body is None:
        return None
    # drop scripts, styles and <figure> elements (my code snippets live in <figure>)
    for selector in ('script', 'style', 'figure'):
        for tag in body.select(selector):
            tag.decompose()
    text = tree.find_all("div", {"class": "body"})
    text = text[0].get_text(separator='\n') if text else None
    title = tree.find_all("h2", {"itemprop": "title"})  # my h2 tags have this attribute
    title = title[0].get_text() if title else None
    url = tree.find_all("link", {"rel": "canonical"})  # get the URL
    url = url[0]['href'] if url else None
    return {"title": title, "url": url, "text": text}


if __name__ == '__main__':
    if len(sys.argv) == 2:
        db_file = 'index.db'
        # remove the old database file
        if os.path.exists(db_file):
            os.remove(db_file)
        conn = sqlite3.connect(db_file)
        c = conn.cursor()
        c.execute('CREATE TABLE page(title text, url text, content text)')
        for root, dirs, files in os.walk(sys.argv[1]):
            for name in files:
                # my files are in 20* directories (e.g. 2018);
                # [/\\] matches both Windows and Unix path separators
                if name.endswith(".html") and re.search(r"[/\\]20[0-9]{2}", root):
                    fname = os.path.join(root, name)
                    # assumes the HTML files are UTF-8 encoded
                    with open(fname, "r", encoding="utf-8") as f:
                        data = get_data(f.read())
                    if data is not None:
                        row = (data['title'], data['url'], data['text'])
                        c.execute('INSERT INTO page VALUES(?, ?, ?)', row)
                        print("indexed %s" % data['url'])
                        sys.stdout.flush()
        conn.commit()
        conn.close()
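Since the script above depends on my blog's markup (`div.body`, `h2[itemprop=title]`, a canonical link), here is a rough sketch of how the extraction step could be adapted for plain pages like the /brochure ones in the question. Everything in it is an assumption rather than part of my setup: the names `PageText` and `get_data_generic` are made up, the title is taken from `<title>`, the URL is guessed from the file path relative to the site root, and it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages.

```python
import os
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collect the <title> and the visible text of one HTML page."""

    SKIPPED = {"script", "style"}  # content of these tags is never indexed

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def get_data_generic(html, fname, site_root):
    """Hypothetical variant of get_data() for pages without blog markup."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip() or None
    # Assumption: the public URL mirrors the file path below the site root.
    url = "/" + os.path.relpath(fname, site_root).replace(os.sep, "/")
    return {"title": title, "url": url, "text": "\n".join(parser.text_parts)}
```

The returned dictionary has the same keys as `get_data()`, so it could be dropped into the indexing loop in place of it, together with a directory filter that matches your own folder layout instead of the `20xx` one.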
And the PHP search script:
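<?php
// NOTE: the HTML tags in this script were lost when the code was pasted;
// the <mark>, <h3>, <a> and <p> markup below is an assumption, adjust it to your layout.

function mark($query, $str) {
    // highlight every occurrence of the (already regex-quoted) query
    return preg_replace("%(" . $query . ")%i", '<mark>$1</mark>', $str);
}

if (isset($_GET['q'])) {
    $db = new PDO('sqlite:index.db');
    $stmt = $db->prepare('SELECT * FROM page WHERE content LIKE :var OR title LIKE :var');
    $wildcarded = '%'. $_GET['q'] .'%';
    $stmt->bindParam(':var', $wildcarded);
    $stmt->execute();
    $data = $stmt->fetchAll(PDO::FETCH_ASSOC);
    // quote the query for use inside a regex that uses % as its delimiter
    $query = str_replace("%", "\\%", preg_quote($_GET['q']));
    // capture up to 10 words before and after the matched query
    $re = "%(?>\S+\s*){0,10}(" . $query . ")\s*(?>\S+\s*){0,10}%i";
    if (count($data) == 0) {
        echo "<p>No results</p>";
    } else {
        foreach ($data as $row) {
            if (preg_match($re, $row['content'], $match)) {
                echo '<h3><a href="' . $row['url'] . '">' . mark($query, $row['title']) . '</a></h3>';
                $text = trim($match[0], " \t\n\r\0\x0B,.{}()-");
                echo '<p>' . mark($query, $text) . '</p>';
            }
        }
    }
}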
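If you want to check the index locally before uploading it, the same lookup can be sketched in Python. This is only an approximation of the PHP logic, not a port of it: the function names `snippet` and `search` are invented for the example, and instead of the atomic-group regex it locates the first match and then takes up to 10 words on each side of it.

```python
import re
import sqlite3


def snippet(text, query, words=10):
    """Return up to `words` words on each side of the first match, or None."""
    m = re.search(re.escape(query), text, re.IGNORECASE)
    if m is None:
        return None
    # words (with their trailing/leading whitespace) around the match
    before = re.findall(r"\S+\s*", text[:m.start()])[-words:]
    after = re.findall(r"\s*\S+", text[m.end():])[:words]
    return ("".join(before) + m.group(0) + "".join(after)).strip()


def search(conn, query):
    """Yield (title, url, snippet) for pages whose title or content matches."""
    pattern = "%" + query + "%"
    rows = conn.execute(
        "SELECT title, url, content FROM page"
        " WHERE content LIKE ? OR title LIKE ?",
        (pattern, pattern),
    )
    for title, url, content in rows:
        excerpt = snippet(content or "", query)
        if excerpt is not None:  # like the PHP script, skip title-only hits
            yield title, url, excerpt
```

For example, `list(search(sqlite3.connect('index.db'), 'jekyll'))` should list the pages the PHP script would show for `?q=jekyll`, assuming `index.db` was built by the indexing script above.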
In my code and in the article, I've wrapped this PHP script in the same layout as the other pages by adding front matter to the PHP file.
If you can't use PHP on your hosting, you can try sql.js, which is SQLite compiled to JavaScript with Emscripten. Here is an example of how to use Ajax to load a file.