[Privoxy-devel] privoxy debian patches

Roland Rosenfeld roland at spinnaker.de
Mon Oct 15 13:51:01 UTC 2018


Hi Lee!

On Fri, 12 Oct 2018, Lee wrote:

> >>> I just tried building the privoxy docs with & without the patch
> >>> and I don't see the difference.
> >>
> >> So it may no longer be necessary, at least on your system :-)
> >
> > Which is not all that reassuring.  Along with "&copy; converted to
> > 8bit char" type things seem to still be happening:
> >
> > Lee at i3668 /source/privoxy/privoxy/doc/source
> > $ file *sgml | grep -v ASCII
> > p-authors.sgml:        exported SGML document, ISO-8859 text
> >
> > Lee at i3668 /source/privoxy/privoxy/doc/webserver/user-manual
> > $ file *html |grep -v ASCII
> > configuration.html: HTML document, ISO-8859 text
> > copyright.html:     HTML document, ISO-8859 text
> > index.html:         HTML document, ISO-8859 text
> > quickstart.html:    HTML document, ISO-8859 text

So maybe my Debian system has some different docbook configuration:
$ find . -name \*.html | xargs file | grep -v ASCII
./user-manual/copyright.html:             HTML document, ISO-8859 text

So here only the copyright file contains ISO-8859 chars (in the
names), while all other files are 7bit (using SGML entities instead of
8bit chars).  I'd prefer that for all files, since it works the same
whether the charset is declared as ISO-8859-1 or UTF-8.
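
To illustrate what "7bit with entities" means in practice, here is a
rough Python sketch.  It is not part of the Privoxy build; the function
names are made up for this example, and it assumes the input really is
ISO-8859-1.  It emits numeric character references (such as &#169;)
rather than named entities like &copy;, because that is what Python's
"xmlcharrefreplace" error handler produces.

```python
def has_eight_bit(raw: bytes) -> bool:
    """Return True if any byte has the high bit set (data is not 7bit)."""
    return any(b > 0x7F for b in raw)


def to_seven_bit(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode raw as pure ASCII.

    Every non-ASCII character is replaced by a numeric character
    reference.  source_encoding is an assumption about the input;
    a wrong guess silently yields wrong references.
    """
    text = raw.decode(source_encoding)
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # 0xA9 is the copyright sign in ISO-8859-1.
    sample = b"Copyright \xa9 2018"
    print(has_eight_bit(sample))
    print(to_seven_bit(sample))
```

The result contains only ASCII bytes, so a browser renders it the same
way no matter which of the two charsets is announced.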

> > I'm guessing that if everything was utf8 there wouldn't be as many
> > opportunities for things to break (in a locale specific way?)

It may depend on the webserver serving the HTML files.  You can still
break everything if the server announces one charset in the HTTP
header while the HTML file announces a different charset.  Using only
7bit and SGML entities works around every issue with this.
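
A minimal Python sketch of that mismatch check, under some
assumptions: the helper names are invented for this example, the META
detection is a simple regular expression rather than a real HTML
parser, and charset aliases (e.g. "latin1" vs. "iso-8859-1") are not
normalized.

```python
import re
from email.message import Message
from typing import Optional

# Matches charset=... inside a <meta ...> tag, case-insensitively.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def header_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def meta_charset(html: bytes) -> Optional[str]:
    """Extract the charset announced by the first matching META tag."""
    match = META_CHARSET.search(html)
    return match.group(1).decode("ascii").lower() if match else None


def charsets_conflict(content_type: str, html: bytes) -> bool:
    """True only if both sides announce a charset and they differ."""
    from_header = header_charset(content_type)
    from_meta = meta_charset(html)
    return (
        from_header is not None
        and from_meta is not None
        and from_header != from_meta
    )


if __name__ == "__main__":
    page = b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">'
    print(charsets_conflict("text/html; charset=UTF-8", page))
```

A file that is pure 7bit decodes identically under either charset, so
such a conflict is harmless for it.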

> > That sounds like an interesting project; see if I can get everything
> > handled as utf8

It's at least better than ISO-8859-1 in this century :-)

> If I edit
>   /usr/share/sgml/docbook/xsl-stylesheets/html/profile-docbook.xsl
> and make this change
>   <!-- xsl:output method="html" encoding="ISO-8859-1" indent="no"/ -->
>   <xsl:output method="html" encoding="utf-8" indent="no"/>
> docbook generates utf-8 output.  But changing the standard profile
> seems like the wrong way to do it.

Fully agree.

> Maybe I could figure out how to do a privoxy-specific profile
> (stylesheet? catalog?) and leave the docbook system files alone, but
> that seems like a lot of work for doubtful benefits.

No idea how to do this; I haven't dug deep into it myself.

> Anyone know if there's a way to tell docbook to leave things like
> &copy; or &nbsp; alone instead of turning them into the character
> they represent when creating html output?

It seems that on Debian systems the old 06_8bit_manual.patch still has
this effect.  At least
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=203697#19 says that
this is the cause.

Maybe doc/source/ldp.dsl.in is the right place to look?  At least it
requests ISO-8859-1:

(define %html-header-tags%
  '(("META" ("HTTP-EQUIV" "Content-Type")
     ("CONTENT" "text/html; charset=ISO-8859-1"))))

Greetings
Roland