[Privoxy-devel] PATCH for pcre2 support

Gagan Sidhu broly at mac.com
Thu Mar 16 12:41:42 CET 2023


thanks for your incredibly insightful response.

> On Mar 16, 2023, at 3:56 AM, Fabian Keil <fk at fabiankeil.de> wrote:
> 
> Gagan Sidhu <broly at mac.com> wrote on 2023-03-13 at 11:28:42:
> 
>> therefore it is not my changes per-se that are the problem.
>> it is a combination of privoxy’s string preprocessing/postprocessing
>> and pcre2 that is the problem.
>> 
>> i will also add the ’server’ is up. when you go to
>> http://127.0.0.1:8118, you get the exact same output for pcre2 and pcre1.
> 
> Are you saying you are using the patch:
> SHA256 (substandard_pcre2.patch) = 142f99e4b685fee8c6592ea47a89cc4ea29622744458e70d3ae1f370abd9df27
> and get actual content when requesting http://127.0.0.1:8118/ and http://p.p:8118/?

no. what i’m saying is i get this message when visiting 127.0.0.1:8118 from either the pcre1 or pcre2 builds:

"Invalid header received from client."

> 
>> some people were kind enough to share some things that may break in pcre2 (that worked in pcre1):
>> 
>> https://stackoverflow.com/a/73767663
>> 
>> any assistance on fixing this issue would be great.
> 
> Very interesting.
> 
> I noticed another problem with the patch.
> 
> Destination rewriting seems to reproducibly result in a stack overflow:
> 
> fk at t520 ~/git/privoxy $gdb-privoxy
> [...]
> [New LWP 101318 of process 33334]
> 2023-03-16 10:47:16.223 801012700 Connect: Accepted connection from 127.0.0.1 on socket 6
> 2023-03-16 10:47:16.223 801012000 Connect: Waiting for the next client connection. Currently active threads: 1
> 2023-03-16 10:47:16.224 801012700 Header: scan: CONNECT twitter.com:443 HTTP/1.1
> 2023-03-16 10:47:16.225 801012700 Tagging: Tagger 'listen-address' added tag 'LISTEN-ADDRESS: 127.0.1.1:8118'. No action bits update necessary.
> 2023-03-16 10:47:16.225 801012700 Tagging: Tagger 'http-method' added tag 'CONNECT'. Action bits updated accordingly.
> 2023-03-16 10:47:16.225 801012700 Tagging: Tagger 'client-ip-address' added tag 'IP-ADDRESS: 127.0.0.1'. No action bits update necessary.
> 2023-03-16 10:47:16.225 801012700 Header: scan: User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:102.0) Gecko/20100101 Firefox/102.0
> 2023-03-16 10:47:16.225 801012700 Tagging: Tagger 'user-agent' added tag 'User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:102.0) Gecko/20100101 Firefox/102.0'. No action bits update necessary.
> 2023-03-16 10:47:16.225 801012700 Header: scan: Proxy-Connection: keep-alive
> 2023-03-16 10:47:16.225 801012700 Header: scan: Connection: keep-alive
> 2023-03-16 10:47:16.226 801012700 Header: scan: Host: twitter.com:443
> 2023-03-16 10:47:16.226 801012700 Header: Modified: User-Agent: Mozilla/5.0 (X11; NetBSD i386; rv:102.0) Gecko/20100101 Firefox/102.0
> 2023-03-16 10:47:16.226 801012700 Header: crumble crunched: Proxy-Connection: keep-alive!
> 2023-03-16 10:47:16.226 801012700 Header: Keeping the client header 'Connection: keep-alive' around. The server connection will be kept alive if possible.
> 2023-03-16 10:47:16.267 801012700 Connect: Performing the TLS/SSL handshake with client. Hash of host: 7905d1c4e12c54933a44d19fcd5f9356
> 2023-03-16 10:47:16.291 801012700 Connect: Client successfully connected over TLSv1.3 (TLS_AES_128_GCM_SHA256).
> 2023-03-16 10:47:16.291 801012700 Header: Waiting for encrypted client headers
> 2023-03-16 10:47:16.291 801012700 Header: Encrypted headers received completely
> 2023-03-16 10:47:16.291 801012700 Header: Destination extracted from "Host" header. New request URL: /TCNOco/status/1634620446002774018
> 2023-03-16 10:47:16.292 801012700 Header: scan: GET /TCNOco/status/1634620446002774018 HTTP/1.1
> 2023-03-16 10:47:16.292 801012700 Tagging: Tagger 'listen-address' added tag 'LISTEN-ADDRESS: 127.0.1.1:8118'. No action bits update necessary.
> 2023-03-16 10:47:16.292 801012700 Tagging: Tagger 'http-method' added tag 'GET'. No action bits update necessary.
> 2023-03-16 10:47:16.292 801012700 Tagging: Tagger 'client-ip-address' added tag 'IP-ADDRESS: 127.0.0.1'. No action bits update necessary.
> 2023-03-16 10:47:16.292 801012700 Header: scan: Host: twitter.com
> 2023-03-16 10:47:16.292 801012700 Header: scan: User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:102.0) Gecko/20100101 Firefox/102.0
> 2023-03-16 10:47:16.293 801012700 Tagging: Tagger 'user-agent' added tag 'User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:102.0) Gecko/20100101 Firefox/102.0'. No action bits update necessary.
> 2023-03-16 10:47:16.293 801012700 Header: scan: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
> 2023-03-16 10:47:16.293 801012700 Header: scan: Accept-Language: en-CA
> 2023-03-16 10:47:16.293 801012700 Header: scan: Accept-Encoding: gzip, deflate, br
> 2023-03-16 10:47:16.293 801012700 Header: scan: DNT: 1
> 2023-03-16 10:47:16.293 801012700 Header: scan: Connection: keep-alive
> 2023-03-16 10:47:16.293 801012700 Header: scan: Upgrade-Insecure-Requests: 1
> 2023-03-16 10:47:16.293 801012700 Header: scan: Sec-Fetch-Dest: document
> 2023-03-16 10:47:16.294 801012700 Header: scan: Sec-Fetch-Mode: navigate
> 2023-03-16 10:47:16.294 801012700 Header: scan: Sec-Fetch-Site: none
> 2023-03-16 10:47:16.294 801012700 Header: scan: Sec-Fetch-User: ?1
> 2023-03-16 10:47:16.294 801012700 Header: Modified: User-Agent: Mozilla/5.0 (X11; NetBSD i386; rv:102.0) Gecko/20100101 Firefox/102.0
> 2023-03-16 10:47:16.294 801012700 Header: Keeping the client header 'Connection: keep-alive' around. The server connection will be kept alive if possible.
> 2023-03-16 10:47:16.294 801012700 Header: Accept-Language header crunched and replaced with: Accept-Language: en-nz
> 2023-03-16 10:47:16.294 801012700 Header: Encrypted request headers processed
> 2023-03-16 10:47:16.294 801012700 Request: https://twitter.com/TCNOco/status/1634620446002774018
> 2023-03-16 10:47:16.294 801012700 Redirect: pcrs command "s@^https?://twitter.com/([^?]*)@http://vfaomgh4jxphpbdfizkm5gbtjahmei234giqj4facbwhrfjtcldauqad.onion/$1@" changed "https://twitter.com/TCNOco/status/1634620446002774018" to "http://vfaomgh4jxphpbdfizkm5gbtjahmei234giqj4facbwhrfjtcldauqad.onion/TCNOco/status/1634620446002774018" (1 hit).
> 
> Thread 2 received signal SIGABRT, Aborted.
> Sent by kill() from pid 33334 and user 1001.
> [Switching to LWP 101318 of process 33334]
> kill () at kill.S:4
> 4	kill.S: No such file or directory.
> (gdb) where
> #0  kill () at kill.S:4
> #1  0x000000080089b4e0 in __fail (msg=0x8007a57a4 "stack overflow detected; terminated") at /usr/src/lib/libc/secure/stack_protector.c:130
> #2  0x000000080089b450 in __stack_chk_fail () at /usr/src/lib/libc/secure/stack_protector.c:137
> #3  0x000000000024abfd in rewrite_url (old_url=0x801c28100 "https://twitter.com/TCNOco/status/1634620446002774018", 
>    pcrs_command=0x801c10000 "s@^https?://twitter.com/([^?]*)@http://vfaomgh4jxphpbdfizkm5gbtjahmei234giqj4facbwhrfjtcldauqad.onion/$1@") at filters.c:1038
> #4  0x000000000024ad07 in redirect_url (csp=0x8010f1008) at filters.c:1257
> #5  0x00000000002583b5 in crunch_response_triggered (csp=0x8010f1008, crunchers=0x218920 <crunchers_all>) at jcc.c:953
> #6  0x00000000002569d6 in chat (csp=0x8010f1008) at jcc.c:4482
> #7  0x0000000000255736 in serve (csp=0x8010f1008) at jcc.c:5056
> #8  0x0000000800745a7a in thread_start (curthread=0x801012700) at /usr/src/lib/libthr/thread/thr_create.c:292
> #9  0x0000000000000000 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
> 
> The rewrite is enabled with an action like:
> 
> {+redirect{s@^https?://twitter.com/([^?]*)@http://vfaomgh4jxphpbdfizkm5gbtjahmei234giqj4facbwhrfjtcldauqad.onion/$1@}}
> twitter.com
> 

interesting. if the result is a stack overflow/SIGABRT, should the program not terminate? 
	- i will admit i have not had to address this kind of problem in quite some time, so i am rusty.

> 2023-03-16 10:47:16.294 801012700 Redirect: pcrs command "s@^https?://twitter.com/([^?]*)@http://vfaomgh4jxphpbdfizkm5gbtjahmei234giqj4facbwhrfjtcldauqad.onion/$1@" changed "https://twitter.com/TCNOco/status/1634620446002774018" to "http://vfaomgh4jxphpbdfizkm5gbtjahmei234giqj4facbwhrfjtcldauqad.onion/TCNOco/status/1634620446002774018" (1 hit).


i would prefer we stick to the regression tests because, from the above, it seems we are replacing the original string with a bigger string, right?
	- i’m not sure how this would be a problem.
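
for reference, a rough sketch of what the redirect's pcrs command does to the URL. Python's re module stands in for PCRE here (an assumption: it is neither pcre1 nor pcre2), and `rewrite` is a hypothetical helper, not Privoxy's rewrite_url(). it only shows that the pattern is plain and that the output is longer than the input; it does not model how the C code sizes its buffers.

```python
import re

# The pcrs command from the log, split by hand into pattern and replacement.
# Python's re module is a stand-in only; it is not PCRE1 or PCRE2.
PATTERN = r"^https?://twitter.com/([^?]*)"
# pcrs writes the first capture group as $1; Python writes it as \1.
REPLACEMENT = (
    r"http://vfaomgh4jxphpbdfizkm5gbtjahmei234giqj4facbwhrfjtcldauqad.onion/\1"
)


def rewrite(url):
    """Apply the substitution once and return (new_url, hit_count)."""
    return re.subn(PATTERN, REPLACEMENT, url, count=1)


old_url = "https://twitter.com/TCNOco/status/1634620446002774018"
new_url, hits = rewrite(old_url)

print(new_url)
print("hits:", hits)
print("grew by", len(new_url) - len(old_url), "characters")
```

the result and the hit count agree with the "Redirect:" log line quoted above, which suggests the match and substitution had already completed by the time the stack protector fired in rewrite_url().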

alternatively, running the first line from the regression test, this is what i get on pcre2:

> GagansMacPro:privoxy Gagan$  curl --proxy 'http://127.0.0.1:8118/' --include  -H 'Proxy-Connection:'  -H 'Connection: close'  -s  -S  --user-agent 'Privoxy-Regression-Test 0.7.3'  --max-time '5'  --globoff http://p.p/show-status 2>&1
> HTTP/1.1 200 OK
> Content-Length: 0
> Content-Type: text/html
> Cache-Control: no-cache
> Date: Thu, 16 Mar 2023 11:33:43 GMT
> Last-Modified: Thu, 16 Mar 2023 11:33:43 GMT
> Expires: Sat, 17 Jun 2000 12:00:00 GMT
> Pragma: no-cache
> Connection: close
> 

whereas from pcre1 it manages to produce far more:

> GagansMacPro:privoxy Gagan$  curl --proxy 'http://127.0.0.1:8118/' --include  -H 'Proxy-Connection:'  -H 'Connection: close'  -s  -S  --user-agent 'Privoxy-Regression-Test 0.7.3'  --max-time '5'  --globoff http://p.p/show-status 2>&1
> HTTP/1.1 200 OK
> Content-Length: 14946
> Content-Type: text/html
> Cache-Control: no-cache
> Date: Thu, 16 Mar 2023 11:38:43 GMT
> Last-Modified: Thu, 16 Mar 2023 11:38:43 GMT
> Expires: Sat, 17 Jun 2000 12:00:00 GMT
> Pragma: no-cache
> Connection: close
> 
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
> *snip*

the question i have is what could cause this? is it the string replacement? i am not sure if the API changes alone are the problem. 
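
to narrow down what differs between the two responses, here is a small sketch that compares the two header blocks pasted above. `parse_headers` and `differing_headers` are hypothetical helper names, and Date and Last-Modified are ignored on the assumption that they only reflect when each request was made.

```python
def parse_headers(text):
    """Return a dict of header name -> value, skipping the status line."""
    headers = {}
    for line in text.strip().splitlines()[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


def differing_headers(a, b, ignore=("Date", "Last-Modified")):
    """Return {name: (value_in_a, value_in_b)} for headers that differ."""
    names = (set(a) | set(b)) - set(ignore)
    return {n: (a.get(n), b.get(n)) for n in sorted(names) if a.get(n) != b.get(n)}


# Header blocks copied from the two curl runs in this message.
PCRE2_RESPONSE = """HTTP/1.1 200 OK
Content-Length: 0
Content-Type: text/html
Cache-Control: no-cache
Date: Thu, 16 Mar 2023 11:33:43 GMT
Last-Modified: Thu, 16 Mar 2023 11:33:43 GMT
Expires: Sat, 17 Jun 2000 12:00:00 GMT
Pragma: no-cache
Connection: close
"""

PCRE1_RESPONSE = """HTTP/1.1 200 OK
Content-Length: 14946
Content-Type: text/html
Cache-Control: no-cache
Date: Thu, 16 Mar 2023 11:38:43 GMT
Last-Modified: Thu, 16 Mar 2023 11:38:43 GMT
Expires: Sat, 17 Jun 2000 12:00:00 GMT
Pragma: no-cache
Connection: close
"""

diff = differing_headers(parse_headers(PCRE2_RESPONSE), parse_headers(PCRE1_RESPONSE))
print(diff)
```

only Content-Length differs, so the pcre2 build appears to answer the request normally but with an empty page body.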

i will have to look at that string ("s@^https?://twitter.com/([^?]*)@http://vfaomgh4jxphpbdfizkm5gbtjahmei234giqj4facbwhrfjtcldauqad.onion/$1@") and see if it would break in pcre2, since there are changes in regex behaviour: https://stackoverflow.com/a/73767663

> Fabian


