[Privoxy-devel] any interest in a faster way to check blacklists?

Lee ler762 at gmail.com
Mon Feb 3 11:36:11 UTC 2020


On 1/31/20, Fabian Keil <fk at fabiankeil.de> wrote:
> Lee <ler762 at gmail.com> wrote:
>
>> I use various host file blacklists to generate action files & I try to
>> keep the number of duplicates down..  So I crank the new list of
>> hostnames thru http://config.privoxy.org/show-url-info to see if it's
>> already blocked or not
>>
>> The problem is that it takes a while to check every hostname, so I
>> came up with a faster way
>> https://github.com/ler762/privoxy/commit/ff0bf9b850d539bbf037ff8616412bf99b5a1d2c
>>
>> tl;dr: about 90 seconds vs. 197 seconds to check the lightswitch05 hosts
>> file
>>
>> Any interest in including the patch?
>
> I think with a few modifications the patch would be nice to have.

What's your use case?

I'm guessing speeding up the privoxy regression test, but do you have
anything else in mind?


> actions.c
> +  *      no, i have no idea what the diff is between single & multi actions
>
> Multi actions can be applied multiple times (example filter{}) while
> single actions overrule each other (example block{}). Now you know
> and the comment can be removed.

thank you :)
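
The single/multi distinction quoted above can be sketched as a toy model. This is not Privoxy's actual data structure (actions.c is C and looks nothing like this); the class, the function and the sample section contents are hypothetical and only mirror the behaviour Fabian describes: a later block{} overrules an earlier one, while filter{} entries accumulate.

```python
from dataclasses import dataclass, field


@dataclass
class CurrentActions:
    # Single actions: one slot per action name, the last matching section wins.
    single: dict = field(default_factory=dict)
    # Multi actions: every matching section adds to the list for that name.
    multi: dict = field(default_factory=dict)


def apply_section(current, single=None, multi=None):
    """Merge one matching action-file section into the running result."""
    for name, value in (single or {}).items():
        current.single[name] = value              # overrules any earlier value
    for name, values in (multi or {}).items():
        bucket = current.multi.setdefault(name, [])
        for value in values:
            if value not in bucket:               # applied once per distinct value
                bucket.append(value)
    return current


# Two hypothetical sections that both match the same URL.
acts = CurrentActions()
apply_section(acts, single={"block": "ad server"}, multi={"filter": ["banners-by-size"]})
apply_section(acts, single={"block": "tracker"}, multi={"filter": ["js-annoyances"]})

print(acts.single)   # {'block': 'tracker'}
print(acts.multi)    # {'filter': ['banners-by-size', 'js-annoyances']}
```

A checker that only looks at the single actions still answers "is it blocked?" correctly in this model, but it says nothing about which filters would apply.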

> It concerns me that the patch only seems to deal with single actions
> as this makes the output misleading.

I suppose it could be misleading.. but it'd be documented as something
for a program to use to see if a url would be blocked or not, so only
confusing to those that don't RTFM :)
  (and if they don't RTFM, how are they going to find this function name?)

> Are multi actions really that
> expensive to compute?

I finally gave up trying to figure that out & measured the difference
between the new cgi function that doesn't do multi actions & the old
cgi_show_url_info, which I changed to not touch the disk:
- the template (with included style section) comes from memory instead of disk
   ie: +   char body[] = \
       +"<!DOCTYPE html><html lang=\"en\"><head>
    ... etc
- skip the 'if (run_loader(csp))' check
- do 'rsp->body = strdup_or_die(body); template_fill(&rsp->body,... etc'
  instead of 'return template_fill_for_cgi'
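
The first change in the list above (template from memory instead of disk) can be sketched roughly like this. It is only an illustration of the idea, not the patch: the template text, the @name@ placeholder style and all function names here are assumptions, and the real code is C.

```python
import os
import tempfile

# Assumed stand-in for the page template; the real one is much longer.
TEMPLATE_IN_MEMORY = "<html><body>URL: @url@ Result: @result@</body></html>"


def fill(template, exports):
    """Replace each @name@ placeholder with its value (illustrative syntax)."""
    for name, value in exports.items():
        template = template.replace("@" + name + "@", value)
    return template


def render_from_disk(path, exports):
    # One open/read/close per request: the cost the in-memory variant avoids.
    with open(path, encoding="utf-8") as fh:
        return fill(fh.read(), exports)


def render_from_memory(exports):
    # The template is a constant in the program image, so no I/O per request.
    return fill(TEMPLATE_IN_MEMORY, exports)


exports = {"url": "ads.example.com", "result": "blocked"}

# Write the same template to a temporary file to compare both paths.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(TEMPLATE_IN_MEMORY)
    template_path = tmp.name

try:
    page_disk = render_from_disk(template_path, exports)
finally:
    os.remove(template_path)

page_mem = render_from_memory(exports)
print(page_mem == page_disk)
```

Both paths produce the same page; the difference is only the per-request file access, which adds up when the function is called once per hostname.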

There is a noticeable difference:
calling   http://config.privoxy.org/show-url-info?url=
 time: 141.62 seconds
 time: 142.59 seconds
 time: 141.19 seconds
 time: 143.70 seconds
 time: 144.30 seconds

calling   http://config.privoxy.org/show-url-final-info?url=
 time: 108.73 seconds
 time: 112.99 seconds
 time: 109.02 seconds
 time: 112.88 seconds
 time: 112.29 seconds

Compare that to 240.29 seconds for the original show-url-info, which checks
the timestamps on all the action/filter files and reads all those
template/xxx files from disk.

And if you're paying attention <grin>: the earlier
>> tl;dr: about 90 seconds vs. 197 seconds
was measured on a new Intel i5 PC that I haven't quite finished setting up.
These tests were run on my old Intel i3 PC, where it's about 110 vs 240 seconds.
I'm not sure how to sum up.  Other than testing a new release, I
rarely run the privoxy regression test.  The only other program I have
calling a privoxy cgi function is the script that turns a host file
into an action file, which I run a couple of times a week.
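
A minimal sketch of that kind of checking loop, for illustration only. It is not my actual script: the function names are made up, the fetcher and the "is it blocked?" test are passed in by the caller because the details of the CGI output are not shown in this thread, and it assumes one hostname per hosts-file line.

```python
from urllib.parse import quote

# CGI prefix as used earlier in this thread.
CGI = "http://config.privoxy.org/show-url-info?url="


def hostnames(hosts_lines):
    """Yield hostnames from hosts-file style lines such as '0.0.0.0 ads.example.com'."""
    for line in hosts_lines:
        line = line.split("#", 1)[0].strip()     # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # Either 'address hostname' or a bare hostname (assumed simplification).
        yield (fields[1] if len(fields) > 1 else fields[0]).lower()


def unblocked(hosts_lines, fetch, looks_blocked):
    """Yield each distinct hostname that the CGI does not report as blocked.

    fetch(url) returns the response body; looks_blocked(body) decides the
    outcome. Both are supplied by the caller (hypothetical hooks).
    """
    seen = set()
    for host in hostnames(hosts_lines):
        if host in seen:                         # skip duplicates within the list
            continue
        seen.add(host)
        body = fetch(CGI + quote(host, safe=""))
        if not looks_blocked(body):
            yield host


# Stand-ins so the sketch runs without a proxy.
sample = [
    "# comment",
    "0.0.0.0 ads.example.com",
    "0.0.0.0 new.example.net  # note",
    "",
    "0.0.0.0 ads.example.com",
]
fake_fetch = lambda url: "BLOCKED" if "ads.example.com" in url else "ok"
result = list(unblocked(sample, fake_fetch, lambda body: body == "BLOCKED"))
print(result)   # ['new.example.net']
```

A real fetch would have to send the request through the local Privoxy, since that is what answers for config.privoxy.org; one request per hostname is why the per-call cost matters.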

Lee

