[Privoxy-devel] PATCH for pcre2 support

Gagan Sidhu broly at mac.com
Sat Mar 11 22:12:52 CET 2023


hi fabian,

so i have done a bit of digging and i hope you can easily see the issue.

when i look at the output of the function “pcrs_compile” for pcre and pcre2, things look okay at startup.

at this point, the outputs are identical for capturecount, e.g.:

> pcre2:substitute:, capturecount:0
> pcre2:substitute:, capturecount:0
> pcre2:substitute:, capturecount:0
> pcre2:substitute:, capturecount:0
> pcre2:substitute:, capturecount:0
> pcre2:substitute:, capturecount:0
> pcre2:substitute:$1blafasel, capturecount:1

and

> pcre1:substitute:, capturecount:0
> pcre1:substitute:, capturecount:0
> pcre1:substitute:, capturecount:0
> pcre1:substitute:, capturecount:0
> pcre1:substitute:, capturecount:0
> pcre1:substitute:, capturecount:0
> pcre1:substitute:$1blafasel, capturecount:1

but once the regression tests start running, i noticed that the pcre2 version just stops at a given point:

pcre2:

> pcre2:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./match-all.action</td><td class="buttons"><a href="/show-status?file=actions&index=0">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.action</td><td class="buttons"><a href="/show-status?file=actions&index=1">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.action</td><td class="buttons"><a href="/show-status?file=actions&index=2">View</a></td></tr>
> , capturecount:0
> pcre2:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.filter</td><td class="buttons"><a href="/show-status?file=filter&index=0">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.filter</td><td class="buttons"><a href="/show-status?file=filter&index=1">View</a></td></tr>
> , capturecount:0
> pcre2:substitute:None specified, capturecount:0
> pcre2:substitute:, capturecount:0
> pcre2:substitute:/PRIVOXY-FORCE, capturecount:0


whereas for pcre1 it hums along to the next “chunk”:

> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.action</td><td class="buttons"><a href="/show-status?file=actions&index=1">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.action</td><td class="buttons"><a href="/show-status?file=actions&index=2">View</a></td></tr>
> , capturecount:0
> pcre1:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.filter</td><td class="buttons"><a href="/show-status?file=filter&index=0">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.filter</td><td class="buttons"><a href="/show-status?file=filter&index=1">View</a></td></tr>
> , capturecount:0
> pcre1:substitute:None specified, capturecount:0
> pcre1:substitute:, capturecount:0
> pcre1:substitute:/PRIVOXY-FORCE, capturecount:0
> pcre1:substitute:3.0.33, capturecount:0
> pcre1:substitute:Sat Mar 11 13:51:17 MST 2023, capturecount:0
> pcre1:substitute:127.0.0.1, capturecount:0
> pcre1:substitute:8118, capturecount:0
> pcre1:substitute:localhost, capturecount:0
> pcre1:substitute:https://www.privoxy.org/, capturecount:0
> pcre1:substitute:http://config.privoxy.org/, capturecount:0
> pcre1:substitute:<li><a href="http://config.privoxy.org/">Privoxy main page</a></li><li><a href="http://config.privoxy.org/client-tags">View or toggle the tags that can be set based on the client's address</a></li><li><a href="http://config.privoxy.org/show-request">View the request headers</a></li><li><a href="http://config.privoxy.org/show-url-info">Look up which actions apply to a URL and why</a></li>, capturecount:0
> pcre1:substitute:stable, capturecount:0
> pcre1:substitute:https://www.privoxy.org/3.0.33/user-manual/, capturecount:0

in short, i believe everything in pcrs_compile is behaving because the capturecounts are correct. i don’t see why it stops here unless there’s an issue with pcrs_execute.

so i then started printing the subject lengths in pcrs_execute. this is where the problem shows up: subject_length is 0 on every call for pcre2, whereas pcre1 reports lengths around 15000 for the same substitutes. i do not know why:

> pcre2:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./match-all.action</td><td class="buttons"><a href="/show-status?file=actions&index=0">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.action</td><td class="buttons"><a href="/show-status?file=actions&index=1">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.action</td><td class="buttons"><a href="/show-status?file=actions&index=2">View</a></td></tr>
> , capturecount:0
> pcre2:subject_length:0
> pcre2:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.filter</td><td class="buttons"><a href="/show-status?file=filter&index=0">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.filter</td><td class="buttons"><a href="/show-status?file=filter&index=1">View</a></td></tr>
> , capturecount:0
> pcre2:subject_length:0
> pcre2:substitute:None specified, capturecount:0
> pcre2:subject_length:0
> pcre2:substitute:, capturecount:0
> pcre2:subject_length:0
> pcre2:substitute:/PRIVOXY-FORCE, capturecount:0
> pcre2:subject_length:0

pcre1:
> pcre1:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./match-all.action</td><td class="buttons"><a href="/show-status?file=actions&index=0">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.action</td><td class="buttons"><a href="/show-status?file=actions&index=1">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.action</td><td class="buttons"><a href="/show-status?file=actions&index=2">View</a></td></tr>
> , capturecount:0
> pcre1:subject_length:14638
> pcre1:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.filter</td><td class="buttons"><a href="/show-status?file=filter&index=0">View</a></td></tr>
> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.filter</td><td class="buttons"><a href="/show-status?file=filter&index=1">View</a></td></tr>
> , capturecount:0
> pcre1:subject_length:15116
> pcre1:substitute:None specified, capturecount:0
> pcre1:subject_length:15422
> pcre1:substitute:, capturecount:0
> pcre1:subject_length:15420
> pcre1:substitute:/PRIVOXY-FORCE, capturecount:0
> pcre1:subject_length:15294


in short, it seems to me that if the subject_length being passed to pcrs_execute is fixed for pcre2, we hopefully have no other problems.

i am just not familiar enough with your software to pinpoint where exactly this length is being calculated, and what adjustments need to be made for pcre2 so that the right value is used.
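
a minimal sketch of how the printed length could be cross-checked against the buffer itself, assuming the subject is NUL-terminated text. the helper name subject_length_is_consistent is made up for illustration and is not part of privoxy or pcrs:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * hypothetical debugging helper, not part of privoxy or pcrs.
 *
 * logs the length handed to the callee next to strlen(subject) and
 * returns 1 if the two agree, 0 otherwise.
 *
 * assumption: the subject is NUL-terminated text. for content with
 * embedded NUL bytes the strlen() comparison is not meaningful.
 */
int subject_length_is_consistent(const char *tag,
                                 const char *subject,
                                 size_t subject_length)
{
   size_t actual;

   assert(tag != NULL);
   assert(subject != NULL);

   actual = strlen(subject);

   /* %zu is the portable printf conversion for size_t */
   fprintf(stderr, "%s:subject_length:%zu strlen:%zu\n",
      tag, subject_length, actual);

   return subject_length == actual;
}
```

called at the top of pcrs_execute, e.g. subject_length_is_consistent("pcre2", subject, subject_length), this would at least separate two cases: if subject_length is 0 but strlen is large, the caller is handing over the wrong length; if both are 0, the buffer itself is already empty by the time it arrives.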

i am hoping this is a simple fix. at least pcrs_compile looks okay :P

> On Mar 11, 2023, at 11:51 AM, Gagan Sidhu <broly at mac.com> wrote:
> 
> my mistake, there was something different in the source tree.
> 
> i will look into this and get back to you! 
> 
> thanks for telling me how to test it!
> 
> Thanks,
> Gagan
> 
>> On Mar 11, 2023, at 11:47 AM, Gagan Sidhu <broly at mac.com> wrote:
>> 
>> hi fabian,
>> 
>> can i get a little more clarification? i ran the test on both a pcre2-compiled and a pcre-compiled version, and the result is the same:
>> 
>> pcre:
>> 
>>> gcc -L/opt/local/lib  -Dunix -o privoxy actions.o cgi.o cgiedit.o cgisimple.o deanimate.o encode.o errlog.o filters.o gateway.o jbsockets.o jcc.o list.o loadcfg.o loaders.o miscutil.o parsers.o ssplit.o urlmatch.o client-tags.o   pcrs.o     -lpcre -lz -lpcre -lpcreposix   
>>> grep -v '^#MASTER#' default.action.master > default.action
>>> GagansMacPro:privoxy Gagan$ ./privoxy
>>> GagansMacPro:privoxy Gagan$ ps -A | grep privoxy
>>> 14540 ??         0:00.04 ./privoxy
>>> 57690 ttys001    0:01.42 vim privoxy/pcrs.c
>>> 14534 ttys002    0:00.91 vim tools/privoxy-regression-test.pl
>>> 14542 ttys002    0:00.00 grep privoxy
>>> 90105 ttys004    0:00.15 vim rules/privoxy.mk
>>> GagansMacPro:privoxy Gagan$ tools/privoxy-regression-test.pl 
>>> 2023-03-11 11:03:02: Asking Privoxy for the number of action files available ...
>>> 2023-03-11 11:03:02: Gathering regression tests from 3 action file(s) delivered by Privoxy 3.0.33.
>>> 2023-03-11 11:03:02: Executing regression tests ...
>>> 2023-03-11 11:03:06: Failure for test 70. Supposedly-blocked URL: 'https://elsa.memoinsights.com/t?pid=62012a7a19351c07620394e0&url=https%3A%2F%2Farstechnica.com%2Ftech-policy%2F2022%2F08%2Fthe-women-calling-out-apples-handling-of-misconduct-claims%2F&author%5B%5D=Financial%20Times&title=The%20women%20calling%20out%20Apple%E2%80%99s%20handling%20of%20misconduct%20claims&date=2022-08-04T13%3A39%3A42Z&referrer=&ref_url=&page_url=https%3A%2F%2Farstechnica.com%2Ftech-policy%2F2022%2F08%2Fthe-women-calling-out-apples-handling-of-misconduct-claims%2F%3Fcomments%3D1&cb=MEMO.API.callbacks.cbakynzcplf&v=v3.0.6&t=5000&e=5000&s=7362'
>>> 2023-03-11 11:03:25: Executed 452 regression tests. Skipped 17. 451 successes, 1 failures.
>> 
>> pcre2:
>> 
>>> gcc -L/opt/local/lib  -Dunix -o privoxy actions.o cgi.o cgiedit.o cgisimple.o deanimate.o encode.o errlog.o filters.o gateway.o jbsockets.o jcc.o list.o loadcfg.o loaders.o miscutil.o parsers.o ssplit.o urlmatch.o client-tags.o   pcrs.o     -lz -lpcre2-8 -lpcre2-posix   
>>> grep -v '^#MASTER#' default.action.master > default.action
>>> GagansMacPro:privoxy Gagan$ ./privoxy
>>> GagansMacPro:privoxy Gagan$ ps -A | grep privoxy
>>> 17837 ??         0:00.21 ./privoxy
>>> 17256 ttys001    0:00.26 vim privoxy/pcrs.c
>>> 21572 ttys002    0:00.00 grep privoxy
>>> 90105 ttys004    0:00.15 vim rules/privoxy.mk
>>> GagansMacPro:privoxy Gagan$ tools/privoxy-regression-test.pl 
>>> 2023-03-11 11:37:31: Asking Privoxy for the number of action files available ...
>>> 2023-03-11 11:37:31: Gathering regression tests from 3 action file(s) delivered by Privoxy 3.0.33.
>>> 2023-03-11 11:37:31: Executing regression tests ...
>>> 2023-03-11 11:37:35: Failure for test 70. Supposedly-blocked URL: 'https://elsa.memoinsights.com/t?pid=62012a7a19351c07620394e0&url=https%3A%2F%2Farstechnica.com%2Ftech-policy%2F2022%2F08%2Fthe-women-calling-out-apples-handling-of-misconduct-claims%2F&author%5B%5D=Financial%20Times&title=The%20women%20calling%20out%20Apple%E2%80%99s%20handling%20of%20misconduct%20claims&date=2022-08-04T13%3A39%3A42Z&referrer=&ref_url=&page_url=https%3A%2F%2Farstechnica.com%2Ftech-policy%2F2022%2F08%2Fthe-women-calling-out-apples-handling-of-misconduct-claims%2F%3Fcomments%3D1&cb=MEMO.API.callbacks.cbakynzcplf&v=v3.0.6&t=5000&e=5000&s=7362'
>>> 2023-03-11 11:37:55: Executed 452 regression tests. Skipped 17. 451 successes, 1 failures.
>> 
>> 
>> i will note that when i tried to run the git version on my mac, it does not even manage to run successfully in daemon mode. it quietly terminates.
>> 
>> i applied the patch against 3.0.33 and it runs in the background and outputs the same result for the regression tests for pcre1 or pcre2.
>> 
>> Thanks,
>> Gagan
>>> 
>>>> On Mar 11, 2023, at 5:45 AM, Gagan Sidhu <beatlesnut at mac.com> wrote:
>>>> 
>>>> Hi fabian
>>>> 
>>>> Thanks for the information. May I ask how you're testing so I can test it on my own before presenting any further changes?
>>>> 
>>>> Thank you!
>>>> 
>>>> Original Message  
>>>> From: Fabian Keil
>>>> Sent: Saturday, 11 March 2023 2:56 AM
>>>> To: Gagan Sidhu
>>>> Reply To: privoxy-devel at lists.privoxy.org
>>>> Cc: privoxy-devel at lists.privoxy.org
>>>> Subject: Re: [Privoxy-devel] PATCH for pcre2 support
>>>> 
>>>> Gagan Sidhu <broly at mac.com> wrote on 2023-03-09 at 08:49:52:
>>>> 
>>>>> sorry about this (lol), but upon further analysis i totally didn’t need the pcre2_matches_dummy variable.
>>>>> 
>>>>> i didn’t pay enough attention to the fact that outputs was originally declared with size PCRS_MAX_SUBMATCHES.
>>>>> 
>>>>> so if i declare pcre2_matches to this size, then no reallocation is necessary at all because these are the *submatches*,
>>>>> and only the *matches* structure should grow.
>>>>> 
>>>>> in any case, since PCRS_MAX_MATCH_INIT > PCRS_MAX_SUBMATCHES, this may not have caused any problems at all (40 > 33).
>>>>> 
>>>>> i also added a pcre2_jit_compile call that some say is the pcre2 equivalent of pcre_study.
>>>>> 
>>>>> by calling pcre2_match on the pcre2_code variable that has been processed by pcre2_jit_compile, we should get jit capability (if available).
>>>>> 
>>>>> new sha256 is : 
>>>>> 
>>>>> 3986178a0dd241c18ef61297c4a1f48252033610921bc80dfa4d8f5bb0035117
>>>> 
>>>> I tried the latest version:
>>>> 
>>>> fk at t520 ~/git/privoxy $sha256 substandard_pcre2.patch 
>>>> SHA256 (substandard_pcre2.patch) = 142f99e4b685fee8c6592ea47a89cc4ea29622744458e70d3ae1f370abd9df27
>>>> 
>>>> and the CGI pages are still empty.
>>>> 
>>>> Fabian
>>>> 
>>> 
>> 
> 


