squeeze

A static site generator that can put the toothpaste back in the tube.
git clone https://git.mulligrubs.me/squeeze

commit 6c976763f193e3016e660facc7a15558a79dc069
parent 9c85393bd337f771268514e1667e4b032403de13
Author: St John Karp <contact@stjo.hn>
Date:   Tue, 17 Nov 2020 04:40:00 -0600

Strip HTML entities from the RSS title

HTML entities in the RSS title weren't being escaped, which caused some
strict XML parsers to fail to read it properly. I tried XML escaping
the title, as we already do with the description, but some RSS
readers don't read the title as HTML and so don't display the entities
correctly. The best thing to do is to replace any HTML entities in the
title with their plain-text equivalents and treat it as plain text.
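For comparison, the XML-escaping approach described above might look like the following minimal SWI-Prolog sketch. It is not the repository's code: `replace_all/4` and `escape_xml/2` are hypothetical names, and `replace_all/4` only stands in for the repository's own `replace/4`, which this commit does not show. The escaping order mirrors the description code in the diff. `&` is handled first so that the `&` in a freshly produced `&lt;` is not escaped a second time.

```prolog
% replace_all(+Find, +Replace, +In, -Out): replace every occurrence of
% Find in In. Hypothetical helper standing in for the repo's replace/4.
replace_all(Find, Replace, In, Out) :-
    atom_string(FindAtom, Find),
    atom_string(ReplaceAtom, Replace),
    atom_string(InAtom, In),
    atomic_list_concat(Parts, FindAtom, InAtom),     % split on Find
    atomic_list_concat(Parts, ReplaceAtom, OutAtom), % rejoin with Replace
    atom_string(OutAtom, Out).

% escape_xml(+Text, -Escaped): escape the characters that break XML.
% "&" goes first; escaping it last would turn "&lt;" into "&amp;lt;".
escape_xml(Text, Escaped) :-
    replace_all("&", "&amp;", Text, T1),
    replace_all("<", "&lt;", T1, T2),
    replace_all(">", "&gt;", T2, Escaped).
```

Calling `escape_xml("It&rsquo;s", E)` gives `E = "It&amp;rsquo;s"`. That is valid XML. However, a reader that treats the title as plain text then displays `&rsquo;` literally, which is the problem the commit message describes.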

This fix covers the entities most likely to appear in a title (&amp;,
the curly-quote entities &lsquo;, &rsquo;, &ldquo; and &rdquo;, and
&hellip;), but support for others can be added easily enough.
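One way to make adding entities easy is to keep them in a table and fold over it, rather than chaining a new `replace/4` call for each entity. Below is a minimal SWI-Prolog sketch of that idea. It assumes `foldl/4` from `library(apply)` and SWI-Prolog's double-quoted strings. The names `entity/2`, `decode_title/2`, `decode_entity/3`, and `replace_all/4` are hypothetical and do not come from the repository.

```prolog
:- use_module(library(apply)).   % foldl/4

% replace_all(+Find, +Replace, +In, -Out): replace every occurrence of
% Find in In. Hypothetical helper standing in for the repo's replace/4.
replace_all(Find, Replace, In, Out) :-
    atom_string(FindAtom, Find),
    atom_string(ReplaceAtom, Replace),
    atom_string(InAtom, In),
    atomic_list_concat(Parts, FindAtom, InAtom),     % split on Find
    atomic_list_concat(Parts, ReplaceAtom, OutAtom), % rejoin with Replace
    atom_string(OutAtom, Out).

% entity(?Entity, ?PlainText): hypothetical table of entities to decode.
% Supporting another entity means adding one fact. "&amp;" comes last so
% that the literal text "&amp;hellip;" becomes "&hellip;", not "...".
entity("&lsquo;", "'").
entity("&rsquo;", "'").
entity("&ldquo;", "\"").
entity("&rdquo;", "\"").
entity("&hellip;", "...").
entity("&amp;", "&").

% decode_title(+Title, -Plain): apply every table entry, in order.
decode_title(Title, Plain) :-
    findall(Entity-Text, entity(Entity, Text), Pairs),
    foldl(decode_entity, Pairs, Title, Plain).

% decode_entity(+Entity-Text, +In, -Out): one fold step.
decode_entity(Entity-Text, In, Out) :-
    replace_all(Entity, Text, In, Out).
```

For example, `decode_title("It&rsquo;s &ldquo;fine&rdquo;", T)` gives `T = "It's \"fine\""`. Unlike the diff, this sketch decodes `&amp;` last. With that order, a title that literally contains `&amp;hellip;` is decoded once, to `&hellip;`, rather than twice, to `...`.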

Diffstat:
M generate_rss.pl | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/generate_rss.pl b/generate_rss.pl
@@ -50,7 +50,7 @@ file_list([File|FileList]) -->

 % Read in each file as an article predicate.
 files_to_articles([], []).
-files_to_articles([Filename|Filenames], [article(FormattedDate, Title, Link, Description)|Articles]):-
+files_to_articles([Filename|Filenames], [article(FormattedDate, FormattedTitle, Link, Description)|Articles]):-
 	open(Filename, read, Stream),
 	read_file(Stream, HTML),
 	close(Stream),
@@ -64,6 +64,13 @@ files_to_articles([Filename|Filenames], [article(FormattedDate, Title, Link, Des
 	replace("&", "&amp;", Entry, EntryAmp),
 	replace("<", "&lt;", EntryAmp, EntryLT),
 	replace(">", "&gt;", EntryLT, Description),
+	% Strip HTML entities from the title.
+	replace("&amp;", "&", Title, TitleAmp),
+	replace("&lsquo;", "'", TitleAmp, TitleLSQuo),
+	replace("&rsquo;", "'", TitleLSQuo, TitleRSQuo),
+	replace("&ldquo;", "\"", TitleRSQuo, TitleLDQuo),
+	replace("&rdquo;", "\"", TitleLDQuo, TitleRDQuo),
+	replace("&hellip;", "...", TitleRDQuo, FormattedTitle),
 	files_to_articles(Filenames, Articles).