[Privoxy-devel] PATCH for pcre2 support

Gagan Sidhu broly at mac.com
Mon Mar 13 18:28:42 CET 2023


hello everyone.

i just wanted to let you know that i made similar changes to midnight commander without issue, and have proposed a patch for pcre2 support:

https://midnight-commander.org/ticket/4450

this was pretty close to a ‘drop in’ replacement. it confirms that moving to pcre2_match (from pcre_exec) with an ovector pointer works great.
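
for reference, a minimal sketch of what consuming the ovector looks like after the switch. in pcre2 the offsets are PCRE2_SIZE (size_t) rather than int, and an unset group is PCRE2_UNSET instead of -1. the ovector below is filled by hand rather than fetched with pcre2_get_ovector_pointer(), so the snippet builds without libpcre2; copy_group and SKETCH_UNSET are made-up names for illustration and are not from the privoxy or mc patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for PCRE2_UNSET, which pcre2.h defines as ~(PCRE2_SIZE)0.
 * PCRE2_SIZE is size_t, so offsets are unsigned; pcre1 used int and -1. */
#define SKETCH_UNSET (~(size_t)0)

/*
 * Hypothetical helper: copy capture group `group` out of `subject` using
 * pcre2-style offset pairs. ovector[2*group] is the start offset and
 * ovector[2*group + 1] is one past the end.
 * Returns the number of bytes copied, or -1 if the group is unset or
 * does not fit into `out` (including the terminating NUL).
 */
long copy_group(const char *subject, const size_t *ovector,
                int group, char *out, size_t out_size)
{
    assert(subject != NULL && ovector != NULL && out != NULL && group >= 0);

    size_t start = ovector[2 * group];
    size_t end   = ovector[2 * group + 1];

    /* A signed "< 0" test, as used with pcre1, would never fire here. */
    if (start == SKETCH_UNSET || end == SKETCH_UNSET || end < start)
        return -1;

    size_t len = end - start;
    if (len + 1 > out_size)
        return -1;

    memcpy(out, subject + start, len);
    out[len] = '\0';
    return (long)len;
}
```

any code that still stores these offsets in int variables, or tests them against -1, is worth a second look after the move to pcre2.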

therefore it is not my changes per se that are the problem; it is the combination of privoxy’s string preprocessing/postprocessing and pcre2.
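
since the logs quoted below show subject_length:0 reaching pcrs_execute under pcre2, one way to narrow down where the length gets lost is to compare the length that is passed in against what strlen() sees at each hand-off. this is only a debugging sketch: trace_subject_length is a hypothetical helper, not something that exists in privoxy or pcrs, and it assumes the subject is NUL-terminated text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical debugging helper (not part of Privoxy or pcrs).
 * Formats one trace line into `line` comparing the length the caller
 * passed in with what strlen() reports for the same buffer.
 * Returns 1 if the two agree, 0 otherwise.
 */
int trace_subject_length(const char *where, const char *subject,
                         size_t subject_length,
                         char *line, size_t line_size)
{
    assert(where != NULL && line != NULL && line_size > 0);

    /* Treat a NULL subject as empty so the trace itself cannot crash. */
    size_t actual = (subject != NULL) ? strlen(subject) : 0;
    int ok = (actual == subject_length);

    snprintf(line, line_size, "%s: passed=%zu strlen=%zu %s",
             where, subject_length, actual, ok ? "ok" : "MISMATCH");
    return ok;
}
```

printing the resulting line with fprintf(stderr, "%s\n", line) keeps the trace from sitting in a stdout buffer, and calling it in both the caller and the callee shows on which side of the call the value changes.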

i will also add that the ‘server’ is up: when you go to http://127.0.0.1:8118, you get the exact same output for pcre2 and pcre1.

some people were kind enough to share some things that may break in pcre2 (that worked in pcre1):

https://stackoverflow.com/a/73767663

any assistance on fixing this issue would be great.

Thanks,
Gagan

> On Mar 11, 2023, at 5:15 PM, Gagan Sidhu <broly at mac.com> wrote:
> 
> i’ve done more digging and it seems to be the header_tagger function.
> 
> i put a printf before:
> 
>> const int hits = pcrs_execute(job, tag, size, &modified_tag, &size)
> 
> and it does not print upon initiating the regression tests, which must mean that the job list is shorter than we are expecting (for some reason or another):
> 
> 
>> pcre2:substitute:$1blafasel, capturecount:1 < - last line output before starting regression test
>> hi
>> pcre2:subject_length:35
>> hi
>> pcre2:subject_length:35
>> hi
>> pcre2:subject_length:35
>> hi
>> pcre2:subject_length:9
>> hi
>> pcre2:subject_length:9
>> hi
>> pcre2:subject_length:9
>> hi
>> pcre2:subject_length:41
>> hi
>> pcre2:subject_length:41
>> hi
>> pcre2:subject_length:41
>> hi
>> pcre2:subject_length:11
>> hi
>> pcre2:subject_length:11
>> hi
>> pcre2:subject_length:11
>> hi
>> pcre2:subject_length:17
>> hi
>> pcre2:subject_length:17
>> hi
>> pcre2:subject_length:17
>> pcre2:substitute:3.0.33, capturecount:0
>> pcre2:subject_length:0
>> pcre2:substitute:Sat Mar 11 16:24:43 MST 2023, capturecount:0
>> pcre2:subject_length:0
>> pcre2:substitute:127.0.0.1, capturecount:0
> 
> pcre1:
> 
>> pcre1:substitute:$1blafasel, capturecount:1 < - again, last line before regression test starts.
>> hi
>> pcre1:subject_length:35
>> hi
>> pcre1:subject_length:35
>> hi
>> pcre1:subject_length:35
>> hi
>> pcre1:subject_length:9
>> hi
>> pcre1:subject_length:9
>> hi
>> pcre1:subject_length:9
>> hi
>> pcre1:subject_length:41
>> hi
>> pcre1:subject_length:41
>> hi
>> pcre1:subject_length:41
>> hi
>> pcre1:subject_length:11
>> hi
>> pcre1:subject_length:11
>> hi
>> pcre1:subject_length:11
>> hi
>> pcre1:subject_length:17
>> hi
>> pcre1:subject_length:17
>> hi
>> pcre1:subject_length:17
>> pcre1:substitute:3.0.33, capturecount:0
>> pcre1:subject_length:15988
>> pcre1:substitute:Sat Mar 11 16:34:56 MST 2023, capturecount:0
>> pcre1:subject_length:15985
>> pcre1:substitute:127.0.0.1, capturecount:0
>> pcre1:subject_length:15985
>> pcre1:substitute:8118, capturecount:0
>> 
> 
> capturecount is correct up until we call the regression tests, and we never get a chance to see its output after that.
> 
> this explains what you’ve observed regarding the program’s failure to find the action files:
> 
>> GagansMacPro:privoxy Gagan$ tools/privoxy-regression-test.pl 
>> 2023-03-11 17:06:11: Asking Privoxy for the number of action files available ...
>> 2023-03-11 17:06:11: Gathering regression tests from 0 action file(s) delivered by Privoxy (Unknown version!).
>> 2023-03-11 17:06:11: No regression tests found.
> 
> i am hoping this is good information that will help diagnose the issue.
> 
> it’s possible that the pcre2_match transition could be an issue, but the way i’ve done it is ‘standard’ for all examples i’ve seen:
> 
> https://github.com/i3/i3/issues/4682#issuecomment-973076704
> 
> and this example seems to be referenced quite a bit, though william rowe’s is superior ( https://gist.github.com/wrowe/73f655d13bbe0f12030aa4557e804d8a )
> 
> is it possible that the regression tests may need to be updated for pcre2? i don’t know why it can’t even find the action files.
> 
> i haven’t touched anything other than what is in the patch.
> 
> Thanks,
> Gagan
> 
>> On Mar 11, 2023, at 2:12 PM, Gagan Sidhu <broly at mac.com> wrote:
>> 
>> hi fabian,
>> 
>> so i have done a bit of digging and i hope you can easily see the issue.
>> 
>> when i look at the output of the function “pcrs_compile” for pcre and pcre2, things look okay at startup.
>> 
>> at this point, the outputs are identical for capturecount, eg:
>> 
>>> pcre2:substitute:, capturecount:0
>>> pcre2:substitute:, capturecount:0
>>> pcre2:substitute:, capturecount:0
>>> pcre2:substitute:, capturecount:0
>>> pcre2:substitute:, capturecount:0
>>> pcre2:substitute:, capturecount:0
>>> pcre2:substitute:$1blafasel, capturecount:1
>> 
>> and
>> 
>>> pcre1:substitute:, capturecount:0
>>> pcre1:substitute:, capturecount:0
>>> pcre1:substitute:, capturecount:0
>>> pcre1:substitute:, capturecount:0
>>> pcre1:substitute:, capturecount:0
>>> pcre1:substitute:, capturecount:0
>>> pcre1:substitute:$1blafasel, capturecount:1
>> 
>> but when we start doing the regression tests, i noticed that the pcre2 version just stops at a given point:
>> 
>> pcre2:
>> 
>>> pcre2:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./match-all.action</td><td class="buttons"><a href="/show-status?file=actions&index=0">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.action</td><td class="buttons"><a href="/show-status?file=actions&index=1">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.action</td><td class="buttons"><a href="/show-status?file=actions&index=2">View</a></td></tr>
>>> , capturecount:0
>>> pcre2:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.filter</td><td class="buttons"><a href="/show-status?file=filter&index=0">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.filter</td><td class="buttons"><a href="/show-status?file=filter&index=1">View</a></td></tr>
>>> , capturecount:0
>>> pcre2:substitute:None specified, capturecount:0
>>> pcre2:substitute:, capturecount:0
>>> pcre2:substitute:/PRIVOXY-FORCE, capturecount:0
>> 
>> whereas for pcre1 it hums along to the next “chunk”:
>> 
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.action</td><td class="buttons"><a href="/show-status?file=actions&index=1">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.action</td><td class="buttons"><a href="/show-status?file=actions&index=2">View</a></td></tr>
>>> , capturecount:0
>>> pcre1:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.filter</td><td class="buttons"><a href="/show-status?file=filter&index=0">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.filter</td><td class="buttons"><a href="/show-status?file=filter&index=1">View</a></td></tr>
>>> , capturecount:0
>>> pcre1:substitute:None specified, capturecount:0
>>> pcre1:substitute:, capturecount:0
>>> pcre1:substitute:/PRIVOXY-FORCE, capturecount:0
>>> pcre1:substitute:3.0.33, capturecount:0
>>> pcre1:substitute:Sat Mar 11 13:51:17 MST 2023, capturecount:0
>>> pcre1:substitute:127.0.0.1, capturecount:0
>>> pcre1:substitute:8118, capturecount:0
>>> pcre1:substitute:localhost, capturecount:0
>>> pcre1:substitute:https://www.privoxy.org/, capturecount:0
>>> pcre1:substitute:http://config.privoxy.org/, capturecount:0
>>> pcre1:substitute:<li><a href="http://config.privoxy.org/">Privoxy main page</a></li><li><a href="http://config.privoxy.org/client-tags">View or toggle the tags that can be set based on the client's address</a></li><li><a href="http://config.privoxy.org/show-request">View the request headers</a></li><li><a href="http://config.privoxy.org/show-url-info">Look up which actions apply to a URL and why</a></li>, capturecount:0
>>> pcre1:substitute:stable, capturecount:0
>>> pcre1:substitute:https://www.privoxy.org/3.0.33/user-manual/, capturecount:0
>> 
>> in short, i believe everything in pcrs_compile is behaving because the capturecounts are correct. i don’t see why it stops here unless there’s an issue with pcrs_execute.
>> 
>> so i then started printing the subject lengths in pcrs_execute. this is the problem: the subject_length is wrong for pcre2. i do not know why:
>> 
>>> pcre2:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./match-all.action</td><td class="buttons"><a href="/show-status?file=actions&index=0">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.action</td><td class="buttons"><a href="/show-status?file=actions&index=1">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.action</td><td class="buttons"><a href="/show-status?file=actions&index=2">View</a></td></tr>
>>> , capturecount:0
>>> pcre2:subject_length:0
>>> pcre2:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.filter</td><td class="buttons"><a href="/show-status?file=filter&index=0">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.filter</td><td class="buttons"><a href="/show-status?file=filter&index=1">View</a></td></tr>
>>> , capturecount:0
>>> pcre2:subject_length:0
>>> pcre2:substitute:None specified, capturecount:0
>>> pcre2:subject_length:0
>>> pcre2:substitute:, capturecount:0
>>> pcre2:subject_length:0
>>> pcre2:substitute:/PRIVOXY-FORCE, capturecount:0
>>> pcre2:subject_length:0
>> 
>> pcre1:
>>> pcre1:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./match-all.action</td><td class="buttons"><a href="/show-status?file=actions&index=0">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.action</td><td class="buttons"><a href="/show-status?file=actions&index=1">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.action</td><td class="buttons"><a href="/show-status?file=actions&index=2">View</a></td></tr>
>>> , capturecount:0
>>> pcre1:subject_length:14638
>>> pcre1:substitute:<tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./default.filter</td><td class="buttons"><a href="/show-status?file=filter&index=0">View</a></td></tr>
>>> <tr><td>/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxy/./user.filter</td><td class="buttons"><a href="/show-status?file=filter&index=1">View</a></td></tr>
>>> , capturecount:0
>>> pcre1:subject_length:15116
>>> pcre1:substitute:None specified, capturecount:0
>>> pcre1:subject_length:15422
>>> pcre1:substitute:, capturecount:0
>>> pcre1:subject_length:15420
>>> pcre1:substitute:/PRIVOXY-FORCE, capturecount:0
>>> pcre1:subject_length:15294
>> 
>> 
>> in short, it seems to me that if the subject_length being passed to pcrs_execute is fixed for pcre2, then hopefully we have no other problems.
>> 
>> i am just not familiar enough with your software to pinpoint where exactly this length is being calculated, and what adjustments need to be made for pcre2 so that the right value is used.
>> 
>> i am hoping this is a simple fix. at least pcrs_compile looks okay :P
>> 
>> 
> 
