[Privoxy-devel] 0007-Create-a-fast-CGI-function

Lee ler762 at protonmail.com
Tue Sep 12 14:49:26 CEST 2023


On Sunday, September 10th, 2023 at 6:52 AM, Fabian Keil wrote:

> Lee wrote on 2023-09-08 at 19:05:16:
> 
> > On Thursday, September 7th, 2023 at 12:06 PM, Fabian Keil wrote:
> > 
> > > I suppose some filled-out templates could be cached as another
> > > mechanism to speed things up.
> > 
> > But then you'll need to check the cache to see if it's out of
> > date & how do you do that without accessing the disk?
> 
> Some of the templates could be cached until the config file
> gets reloaded, one example for this would be mod-title.
> 
> Even if we would continue to check the config file for each
> request, caching mod-title and friends should help a bit
> by reducing the number of system calls Privoxy makes.

My apologies - I'm feeling really slow today because I'm just not following you :(

What we have now seems plenty good enough for human use.  I wouldn't put any effort into speeding up the results of show-url-info for a person.

It's for programmatic use that I thought we were considering adding another parameter, something like 'fast=1', to the show-url-info function, so that it would use an in-memory template and not reference things like mod-title at all.  I took another look at templates/show-url-info, and aside from the fact that it pulls in things like mod-title, it's _long_:

$ wc templates/show-url-info
  300  1104 10292 templates/show-url-info

At roughly 10K bytes it won't fit into a single packet, so it doesn't seem like a good candidate for caching, even if all the other templates it pulls in are cached.


> > Maybe you can come up with something that is fast; all
> > I could come up with is what I did in cgi_show_url_final_info:
> > char body[] = \
> > "<!DOCTYPE html><html lang=\"en\"><head><title>URL Block Info</title></head>\n"\
> > ... </html>\n";
> > ...
> > /* return template_fill_for_cgi(csp, "show-url-final-info", exports, rsp); -LR- */
> > rsp->body = strdup_or_die(body);
> > template_fill(&rsp->body, exports);
> > free_map(exports);
> > return 0;
> 
> Doing it without template_fill_for_cgi() (which leads to lots
> of pcre compilations and pcrs substitutions) should be even faster,
> but of course every small step would help.

If I were any good, I could figure out how to do it without calling template_fill.  But I'm turning into a cargo-cult programmer, and copying working code is my limit here :(

> > > > I expect it to be noticeably slower when everything is happening
> > > > on the same machine. I originally had this in my awk program for
> > > > talking to Privoxy:
> > > > 
> > > > print "GET http://config.privoxy.org/show-url-final-info?url=" url " HTTP/1.1" |& webserver
> > > > print "Host: config.privoxy.org" |& webserver
> > > > print "Accept: text/html" |& webserver
> > > > print "Connection: Keep-Alive" |& webserver
> > > > print "" |& webserver
> > > > 
> > > > That's slower than
> > > > 
> > > > printf("GET http://config.privoxy.org/show-url-final-info?url=%s HTTP/1.1\r\n"\
> > > > "Host: config.privoxy.org\r\n"\
> > > > "Accept: text/html\r\n"\
> > > > "Connection: Keep-Alive\r\n\r\n", url ) |& webserver
> > > > 
> > > > because the multiple print statements tend to cause multiple writes
> > > > to Privoxy. The single printf causes a single write to Privoxy and
> > > > is clearly faster.
> > > 
> > > That's what I would expect.
> > 
> > Not me. Or at least, not the me back then :)
> > I was seeing multiple tcp packets with wireshark and not coming up
> > with a reasonable explanation for why. It took me a while before
> > trying to do everything in a single write.
> 
> 
> While the topic is TCP packets, you could also experiment
> with letting Privoxy not set TCP_NODELAY (see set_no_delay_flag()
> in jbsockets.c) for the socket Privoxy accepts the connection
> on.
>
> It's not guaranteed to make things faster (after all we set
> the TCP_NODELAY flag for a reason) but it could reduce the
> number of packets Privoxy sends which may reduce the context
> switches your awk program does.

I don't think clearing TCP_NODELAY will help.  I've got it down to Privoxy sending just two packets in response to the show-url-final-info request: one for the HTTP headers and the other for the body.  It seems like enabling Nagle would only slow things down.

> > > Is your awk program free software and publicly available?
> > 
> > GPL yes. Available, maybe. I tried to update my github repository
> > and after multiple pushes finally saw something in the contrib directory.
> > https://github.com/ler762/privoxy/tree/lee/contrib
> 
> 
> Thanks, I'll take a look.
> 
> > Hopefully the "nothing, nothing, nothing, YaY! there it is,
> > nothing, nothing" flakiness was because I'd just updated it
> > and they took way too long to update all their web servers.
> 
> 
> I'm not sure I follow. Are you referring to Microsoft/Github
> or who are "they"?

"they" is GitHub.

I did a 'git push' from the command line and used a browser to check the results on GitHub.  The contrib directory was empty :(
I made a trivial change to the files in contrib and did another git commit/push; still nothing in the contrib directory.  I ended up doing several pushes before seeing something in contrib.  Then when I looked again, contrib was empty once more, so I wrote it off as a flaky problem on their end.

Lee
