A static site generator that can put the toothpaste back in the tube.
git clone https://git.mulligrubs.me/squeeze

commit 6dacf6db82a636db9434a84fc479860fe7a03c04
parent 193747ba2cd502be711ee8a621e94ec57a1a1ce7
Author: St John Karp <stjohn@mulligrubs.me>
Date:   Tue, 17 Nov 2020 04:40:00 -0600

Strip HTML entities from the RSS title

HTML entities in the RSS title weren't being escaped, which caused some
strict XML parsers to fail to read it properly. I tried XML escaping
the title, as we already do with the description, but some RSS
readers don't read the title as HTML and so don't display the entities
correctly. Best thing to do is just strip out any HTML entities from
the title and treat it as plain text.

This fix covers the most likely entities to be used, but support
for others can be added easily enough.

M generate_rss.pl | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/generate_rss.pl b/generate_rss.pl
@@ -50,7 +50,7 @@ file_list([File|FileList]) -->
 % Read in each file as an article predicate.
 files_to_articles([], []).
-files_to_articles([Filename|Filenames], [article(FormattedDate, Title, Link, Description)|Articles]):-
+files_to_articles([Filename|Filenames], [article(FormattedDate, FormattedTitle, Link, Description)|Articles]):-
 	open(Filename, read, Stream),
 	read_file(Stream, HTML),
 	close(Stream),
@@ -64,6 +64,13 @@ files_to_articles([Filename|Filenames], [article(FormattedDate, Title, Link, Des
 	replace("&", "&amp;", Entry, EntryAmp),
 	replace("<", "&lt;", EntryAmp, EntryLT),
 	replace(">", "&gt;", EntryLT, Description),
+	% Strip HTML entities from the title.
+	replace("&amp;", "&", Title, TitleAmp),
+	replace("&lsquo;", "'", TitleAmp, TitleLSQuo),
+	replace("&rsquo;", "'", TitleLSQuo, TitleRSQuo),
+	replace("&ldquo;", "\"", TitleRSQuo, TitleLDQuo),
+	replace("&rdquo;", "\"", TitleLDQuo, TitleRDQuo),
+	replace("&hellip;", "...", TitleRDQuo, FormattedTitle),
 	files_to_articles(Filenames, Articles).
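
The commit message notes that support for other entities can be added easily enough. One way that might look is a lookup table folded over the title, so that each new entity is a single fact instead of another link in the chain of replace calls. This is only a sketch and not part of the commit: it assumes SWI-Prolog, works on atoms, and the names entity_text/2, replace_pair/3 and strip_entities/2 are hypothetical. It does not use the repository's own replace/4, whose definition is not shown in this diff. Unlike the diff, it handles &amp; last, so a title containing the literal text &amp;lsquo; is not decoded twice.

```prolog
:- use_module(library(apply)).   % foldl/4

% entity_text(?Entity, ?Text): hypothetical table of HTML entities and
% their plain-text stand-ins. '&amp;' is deliberately listed last.
entity_text('&lsquo;',  '\'').
entity_text('&rsquo;',  '\'').
entity_text('&ldquo;',  '"').
entity_text('&rdquo;',  '"').
entity_text('&hellip;', '...').
entity_text('&amp;',    '&').

% replace_pair(+Entity-Text, +In, -Out): replace every occurrence of
% Entity in the atom In by splitting on Entity and rejoining with Text.
replace_pair(Entity-Text, In, Out) :-
    atomic_list_concat(Parts, Entity, In),
    atomic_list_concat(Parts, Text, Out).

% strip_entities(+Title, -Plain): apply each table entry in order.
strip_entities(Title, Plain) :-
    findall(Entity-Text, entity_text(Entity, Text), Pairs),
    foldl(replace_pair, Pairs, Title, Plain).

% ?- strip_entities('&ldquo;Tom &amp; Jerry&rdquo;&hellip;', T).
% T = '"Tom & Jerry"...'.
```

Under these assumptions, covering another entity such as &mdash; would mean adding one more entity_text/2 fact above the '&amp;' line.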