[Privoxy-devel] PATCH for pcre2 support

Gagan Sidhu broly at mac.com
Fri Jun 23 21:25:24 CEST 2023


found the problem fabes,

it’s this call:

> static int path_matches(const char *path, const struct pattern_spec *pattern)
> {
> 
>    return ((NULL == pattern->pattern.url_spec.preg)
>       || (0 == regexec(pattern->pattern.url_spec.preg, path, 0, NULL, 0)));
> }
> 
interestingly, even if i manually enable 
> #define FEATURE_PCRE_HOST_PATTERNS 1

in config.h, everything else is fine.

it seems it’s this specific call.

i tried storing the outcome in a separate variable, in case the return was somehow ‘obliterating’ the stack:

> static int path_matches(const char *path, const struct pattern_spec *pattern)
> {
>    int regexoutcome = regexec(pattern->pattern.url_spec.preg, path, 0, NULL, 0);
> 
>    return ((NULL == pattern->pattern.url_spec.preg)
>       || (0 == regexoutcome));
> }

and it failed even quicker.

it seems the regexec (in pcre2 maybe?) does not like you putting in a NULL variable, or something.
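
one possible explanation for the quicker failure: in the second version regexec() is called before the NULL check, so a pattern without a compiled path regex passes a NULL preg straight into the library, whereas the original short-circuits on the NULL test first. a minimal sketch that keeps the separate variable but checks first (the struct below is a cut-down, hypothetical stand-in for privoxy's real pattern_spec, only there so the example compiles on its own):

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for Privoxy's struct pattern_spec.
 * Only the member used by path_matches() is modelled here. */
struct pattern_spec
{
   struct
   {
      struct
      {
         regex_t *preg;
      } url_spec;
   } pattern;
};

static int path_matches(const char *path, const struct pattern_spec *pattern)
{
   int regexoutcome;

   /* No compiled path regex: treat it as a match, like the original
    * short-circuit did, and never hand NULL to regexec(). */
   if (NULL == pattern->pattern.url_spec.preg)
   {
      return 1;
   }

   regexoutcome = regexec(pattern->pattern.url_spec.preg, path, 0, NULL, 0);

   return (0 == regexoutcome);
}
```

this only addresses the ordering; it would not by itself explain crashes inside the library when preg is valid.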

if i remove that second check, it obviously hurts the regression tests, failing at a lucky number 81.

i assume you know how to fix this and this is “easy” for you.
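
in case it helps narrow things down: in the gdb output quoted below, threads 116 and 117 are both inside match() with the same preg (0x800ed8380) and the same match_data (0x800ecce10), which would be consistent with concurrent regexec() calls on one compiled pattern sharing state. that is only a reading of the backtrace, not something checked against the pcre2 source. a rough way to test the theory would be to serialize the call; locked_regexec and regexec_mutex below are made-up names, not existing privoxy code:

```c
#include <pthread.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical lock, not part of Privoxy: serializes every regexec() call
 * that goes through locked_regexec(). */
static pthread_mutex_t regexec_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Runs regexec() while holding the lock and returns its result
 * (0 on match, REG_NOMATCH otherwise). */
static int locked_regexec(const regex_t *preg, const char *string)
{
   int result;

   pthread_mutex_lock(&regexec_mutex);
   result = regexec(preg, string, 0, NULL, 0);
   pthread_mutex_unlock(&regexec_mutex);

   return result;
}
```

if the crashes stop with path_matches() calling this instead of regexec() directly, that would point at shared per-pattern state rather than at the NULL arguments. a global lock would cost throughput, so this is meant as a diagnostic, not a fix.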

Thanks,
Gagan

> On Jun 23, 2023, at 12:22 PM, Gagan Sidhu <broly at mac.com> wrote:
> 
>> i don’t know if this is a threading issue. lldb tells me something slightly different than what you’ve given below.
>> 
>> (lldb) run --no-daemon
>> Process 68171 launched: '/Volumes/xtoolshit/misc/dd-wrt/src/router/privoxygit/privoxy' (x86_64)
>> 2023-06-23 12:19:20.478 0010011b Info: Privoxy version 3.0.35
>> 2023-06-23 12:19:20.479 0010011b Info: Program name: /Volumes/xtoolshit/misc/dd-wrt/src/router/privoxygit/privoxy
>> Process 68171 stopped
>> * thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x1034fffff)
>>    frame #0: 0x00000001001ac905 libpcre2-8.0.dylib`match + 29985
>> libpcre2-8.0.dylib`match:
>> ->  0x1001ac905 <+29985>: movzbl (%rcx), %esi
>>    0x1001ac908 <+29988>: cmpb   (%rax,%rsi), %dl
>>    0x1001ac90b <+29991>: je     0x1001ae4d2               ; <+37102>
>>    0x1001ac911 <+29997>: jmp    0x1001b124b               ; <+48743>
>> Target 0: (privoxy) stopped.
>> 
> 
> as i understand it, pcre2 is supposed to be threadsafe. it’s very interesting that it is failing in the library call and not elsewhere.
> 
> really nowhere else to point the finger, hey?
> 
> Thanks,
> Gagan
> 
>> On Jun 23, 2023, at 7:54 AM, Gagan Sidhu <broly at mac.com> wrote:
>> 
>> i ran the below, without using gdb, and i do get a segfault at different points of the regression test on separate runs.
>> 
>> so you must be right about this (of course you are, you seem very thorough!)
>> 
>> pcre2 does afford us the ability to play with the heap memory limit, backtracking match limit, and backtracking depth limit.
>> 
>> i wonder if any of these would help avoid this issue.
>> 
>> this is not directly related, but i found something about the issue (though it references regex, not match): https://github.com/opencog/link-grammar/pull/1354
>> 
>> i’ll look into this later today.
>> 
>> Thanks,
>> Gagan
>> 
>>> On Jun 23, 2023, at 7:44 AM, Fabian Keil <fk at fabiankeil.de> wrote:
>>> 
>>> Fabian Keil <fk at fabiankeil.de> wrote on 2023-06-19 at 15:05:06:
>>> 
>>>> Gagan Sidhu <broly at mac.com> wrote on 2023-06-19 at 06:40:17:
>>>> 
>>>>>> On Jun 19, 2023, at 4:00 AM, Fabian Keil <fk at fabiankeil.de> wrote:
>>>> 
>>>>>> I'm not seeing different crashes like this one, though:
>>>> 
>>>> I meant to write "I'm now seeing ...".
>>>> 
>>>>>> #0  0x0000000823c95acd in ?? () from /usr/local/lib/libpcre2-8.so.0
>>>>>> [Current thread is 1 (LWP 108957)]
>>>>>> (gdb) where
>>>>>> #0  0x0000000823c95acd in ?? () from /usr/local/lib/libpcre2-8.so.0
>>>>>> #1  0x0000000823c9002a in pcre2_match_8 () from /usr/local/lib/libpcre2-8.so.0
>>>>>> #2  0x00000008255e60fe in pcre2_regexec () from /usr/local/lib/libpcre2-posix.so.3
>>>>>> #3  0x000000000027309b in path_matches (path=0x856e3c040 "/iomm/latest/bootstrap/stub.js", pattern=0x82a2c5f20) at urlmatch.c:1360
>>>>>> #4  0x0000000000272e7e in url_match (pattern=0x82a2c5f20, http=0x835520a58) at urlmatch.c:1387
>>>>>> #5  0x000000000024cfa8 in apply_url_actions (action=0x835520810, http=0x835520a58, client_tags=0x835520ba0, b=0x82a2c5f20) at filters.c:2884
>>>>>> #6  0x000000000024cf11 in get_url_actions (csp=0x835520808, http=0x835520a58) at filters.c:2845
>>>>>> #7  0x00000000002580ac in process_encrypted_request_headers (csp=0x835520808) at jcc.c:2799
>>>>>> #8  0x0000000000256a03 in chat (csp=0x835520808) at jcc.c:4472
>>>>>> #9  0x0000000000255796 in serve (csp=0x835520808) at jcc.c:5056
>>>>>> #10 0x0000000825a07a7a in thread_start (curthread=0x8355a6000) at /usr/src/lib/libthr/thread/thr_create.c:292
>>>>>> #11 0x0000000000000000 in ?? ()
>>>>>> Backtrace stopped: Cannot access memory at address 0x843d01000
>>>>>> 
>>>>>> I haven't been able to reproduce this yet.
>>> 
>>> Looks like this is a threading issue.
>>> 
>>> It can be reproduced in a couple of seconds with:
>>> 
>>> privoxy-regression-test.pl --forks 1
>>> 
>>> which results in crashes like:
>>> 
>>> Thread 117 received signal SIGSEGV, Segmentation fault.
>>> Address not mapped to object.
>>> [Switching to LWP 109325 of process 10990]
>>> 0x000000080062e25e in match (start_eptr=0x80280a0c0 "/show-url-info?url=http%3A%2F%2Fib.adnxs.com%2Fbounce?%252Fseg%253Fadd%253D279412", 
>>>   start_ecode=0x800eecc50 "\036u\036t\036m\036_\036m\036e\036d\036i\036u\036m\036=\036f\036e\036e\036dy", top_bracket=0, frame_size=128, match_data=0x800ecce10, mb=0x7fffdf5f6458) at src/pcre2_match.c:1049
>>> 1049	src/pcre2_match.c: No such file or directory.
>>> (gdb) info threads
>>> Id   Target Id                   Frame 
>>> 1    LWP 101941 of process 10990 _poll () at _poll.S:4
>>> 116  LWP 109324 of process 10990 0x000000080062e25e in match (
>>>   start_eptr=0x801fc8480 "/show-url-info?url=http%3A%2F%2Ffarm.plista.com%2Fwidgetdata.php?clientrev=12%26domainid=4211%26publickey=fdc5a7f9d15be004aa03fc4d%26cb=PLISTA5_7ed57c93e0d17%26requestID=5%265=widgetintegration%253A%2"..., start_ecode=0x800eecc50 "\036u\036t\036m\036_\036m\036e\036d\036i\036u\036m\036=\036f\036e\036e\036dy", top_bracket=0, frame_size=128, match_data=0x800ecce10, mb=0x7fffdf7f7458) at src/pcre2_match.c:1049
>>> * 117  LWP 109325 of process 10990 0x000000080062e25e in match (start_eptr=0x80280a0c0 "/show-url-info?url=http%3A%2F%2Fib.adnxs.com%2Fbounce?%252Fseg%253Fadd%253D279412", 
>>>   start_ecode=0x800eecc50 "\036u\036t\036m\036_\036m\036e\036d\036i\036u\036m\036=\036f\036e\036e\036dy", top_bracket=0, frame_size=128, match_data=0x800ecce10, mb=0x7fffdf5f6458) at src/pcre2_match.c:1049
>>> (gdb) where
>>> #0  0x000000080062e25e in match (start_eptr=0x80280a0c0 "/show-url-info?url=http%3A%2F%2Fib.adnxs.com%2Fbounce?%252Fseg%253Fadd%253D279412", 
>>>   start_ecode=0x800eecc50 "\036u\036t\036m\036_\036m\036e\036d\036i\036u\036m\036=\036f\036e\036e\036dy", top_bracket=0, frame_size=128, match_data=0x800ecce10, mb=0x7fffdf5f6458) at src/pcre2_match.c:1049
>>> #1  0x000000080062c041 in pcre2_match_8 (code=0x800eecbc0, subject=0x80280a0c0 "/show-url-info?url=http%3A%2F%2Fib.adnxs.com%2Fbounce?%252Fseg%253Fadd%253D279412", length=81, start_offset=0, options=0, 
>>>   match_data=0x800ecce10, mcontext=0x80065a508 <_pcre2_default_match_context_8>) at src/pcre2_match.c:7289
>>> #2  0x000000080065e39d in pcre2_regexec (preg=0x800ed8380, string=0x80280a0c0 "/show-url-info?url=http%3A%2F%2Fib.adnxs.com%2Fbounce?%252Fseg%253Fadd%253D279412", nmatch=0, pmatch=0x0, eflags=0) at src/pcre2posix.c:388
>>> #3  0x000000000027305b in path_matches (path=0x80280a0c0 "/show-url-info?url=http%3A%2F%2Fib.adnxs.com%2Fbounce?%252Fseg%253Fadd%253D279412", pattern=0x800eee300) at urlmatch.c:1360
>>> #4  0x0000000000272e3e in url_match (pattern=0x800eee300, http=0x800ef4a58) at urlmatch.c:1387
>>> #5  0x000000000024cf68 in apply_url_actions (action=0x800ef4810, http=0x800ef4a58, client_tags=0x800ef4ba0, b=0x800eee300) at filters.c:2884
>>> #6  0x000000000024ced1 in get_url_actions (csp=0x800ef4808, http=0x800ef4a58) at filters.c:2845
>>> #7  0x000000000025cb68 in receive_client_request (csp=0x800ef4808) at jcc.c:1960
>>> #8  0x0000000000256627 in chat (csp=0x800ef4808) at jcc.c:4310
>>> #9  0x0000000000255756 in serve (csp=0x800ef4808) at jcc.c:5056
>>> #10 0x0000000800708a7a in thread_start (curthread=0x800e14300) at /usr/src/lib/libthr/thread/thr_create.c:292
>>> #11 0x0000000000000000 in ?? ()
>>> Backtrace stopped: Cannot access memory at address 0x7fffdf5f9000
>>> (gdb) t 116
>>> [Switching to thread 116 (LWP 109324 of process 10990)]
>>> #0  0x000000080062e25e in match (
>>>   start_eptr=0x801fc8480 "/show-url-info?url=http%3A%2F%2Ffarm.plista.com%2Fwidgetdata.php?clientrev=12%26domainid=4211%26publickey=fdc5a7f9d15be004aa03fc4d%26cb=PLISTA5_7ed57c93e0d17%26requestID=5%265=widgetintegration%253A%2"..., start_ecode=0x800eecc50 "\036u\036t\036m\036_\036m\036e\036d\036i\036u\036m\036=\036f\036e\036e\036dy", top_bracket=0, frame_size=128, match_data=0x800ecce10, mb=0x7fffdf7f7458) at src/pcre2_match.c:1049
>>> 1049	in src/pcre2_match.c
>>> (gdb) where
>>> #0  0x000000080062e25e in match (
>>>   start_eptr=0x801fc8480 "/show-url-info?url=http%3A%2F%2Ffarm.plista.com%2Fwidgetdata.php?clientrev=12%26domainid=4211%26publickey=fdc5a7f9d15be004aa03fc4d%26cb=PLISTA5_7ed57c93e0d17%26requestID=5%265=widgetintegration%253A%2"..., start_ecode=0x800eecc50 "\036u\036t\036m\036_\036m\036e\036d\036i\036u\036m\036=\036f\036e\036e\036dy", top_bracket=0, frame_size=128, match_data=0x800ecce10, mb=0x7fffdf7f7458) at src/pcre2_match.c:1049
>>> #1  0x000000080062c041 in pcre2_match_8 (code=0x800eecbc0, 
>>>   subject=0x801fc8480 "/show-url-info?url=http%3A%2F%2Ffarm.plista.com%2Fwidgetdata.php?clientrev=12%26domainid=4211%26publickey=fdc5a7f9d15be004aa03fc4d%26cb=PLISTA5_7ed57c93e0d17%26requestID=5%265=widgetintegration%253A%2"..., length=365, start_offset=0, options=0, match_data=0x800ecce10, mcontext=0x80065a508 <_pcre2_default_match_context_8>) at src/pcre2_match.c:7289
>>> #2  0x000000080065e39d in pcre2_regexec (preg=0x800ed8380, 
>>>   string=0x801fc8480 "/show-url-info?url=http%3A%2F%2Ffarm.plista.com%2Fwidgetdata.php?clientrev=12%26domainid=4211%26publickey=fdc5a7f9d15be004aa03fc4d%26cb=PLISTA5_7ed57c93e0d17%26requestID=5%265=widgetintegration%253A%2"..., nmatch=0, pmatch=0x0, eflags=0) at src/pcre2posix.c:388
>>> #3  0x000000000027305b in path_matches (
>>>   path=0x801fc8480 "/show-url-info?url=http%3A%2F%2Ffarm.plista.com%2Fwidgetdata.php?clientrev=12%26domainid=4211%26publickey=fdc5a7f9d15be004aa03fc4d%26cb=PLISTA5_7ed57c93e0d17%26requestID=5%265=widgetintegration%253A%2"..., pattern=0x800eee300) at urlmatch.c:1360
>>> #4  0x0000000000272e3e in url_match (pattern=0x800eee300, http=0x800ef2e58) at urlmatch.c:1387
>>> #5  0x000000000024cf68 in apply_url_actions (action=0x800ef2c10, http=0x800ef2e58, client_tags=0x800ef2fa0, b=0x800eee300) at filters.c:2884
>>> #6  0x000000000024ced1 in get_url_actions (csp=0x800ef2c08, http=0x800ef2e58) at filters.c:2845
>>> #7  0x000000000025cb68 in receive_client_request (csp=0x800ef2c08) at jcc.c:1960
>>> #8  0x0000000000256627 in chat (csp=0x800ef2c08) at jcc.c:4310
>>> #9  0x0000000000255756 in serve (csp=0x800ef2c08) at jcc.c:5056
>>> #10 0x0000000800708a7a in thread_start (curthread=0x800e13c00) at /usr/src/lib/libthr/thread/thr_create.c:292
>>> #11 0x0000000000000000 in ?? ()
>>> Backtrace stopped: Cannot access memory at address 0x7fffdf7fa000
>>> 
>>> It seems to happen with and without JIT compilation.
>>> 
>>> Fabian
>>> _______________________________________________
>>> Privoxy-devel mailing list
>>> Privoxy-devel at lists.privoxy.org
>>> https://lists.privoxy.org/mailman/listinfo/privoxy-devel
>> 
> 


