[Privoxy-devel] TODO 157

Lee ler762 at gmail.com
Fri May 26 14:36:11 UTC 2017


Sorry for the delay in responding - I got wrapped up in something else
yesterday :(

On 5/25/17, Fabian Keil <fk at fabiankeil.de> wrote:
> Lee <ler762 at gmail.com> wrote:
>
>> On 5/24/17, Fabian Keil <fk at fabiankeil.de> wrote:
>
>> > Lee <ler762 at gmail.com> wrote on ijbswa-developers@:
>
>> >> - what do you hope to gain making it a user-controlled size
>> >
>> > The "best" size depends on the environment Privoxy runs in
>> > and on the Privoxy admin's requirements so hard-coding it
>> > complicates tuning.
>> >
>> > From the commit message in my local tree (which will change
>> > before I commit it):
>>
>> in other words, you've already got a patch?
>
> Yes. I intend to commit it in a couple of days.
>
>> >     Add a receive-buffer-size directive
>> >
>> >     ... that can be used to set the size of the previously statically
>> >     allocated buffer in handle_established_connection().
>> >
>> >     Increasing the buffer size increases Privoxy's memory usage but
>> >     can lower the number of context switches and thereby reduce the
>> >     cpu usage and potentially increase the throughput.
>> >
>> >     This is mostly relevant for fast network connections and
>> >     large downloads that don't require filtering.
>>
>> for whatever it's worth - it makes a difference for me and I don't
>> have a fast network connection.
>
> It's supposed to mean that the impact is bigger when using
> fast network connections.
>
>> How do you feel about turning off filtering for things that most
>> probably don't need it?
  <.. snip ..>
>
> Have you observed these file types to be frequently served
> with a Content-Type that Privoxy considers to be filterable?

Remember this thread?
  https://sourceforge.net/p/ijbswa/mailman/message/26831156/
The last time I remember looking at this was around then, and I'd
forgotten all about filtering not working on encrypted connections -
I just kept adding to the "don't bother filtering this" list.
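
(For context, the list is basically a pile of entries along these lines
in my user.action -- the patterns here are just an illustration, not my
actual list:)

  {-filter -prevent-compression}
  /.*\.(gif|jpe?g|png|zip|gz|bz2|pdf|exe|mp3|mp4|avi)$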

Is there an easy way to tell if something is being filtered?

>> >     Currently BUFFER_SIZE is kept as default and lower limit.
>> >
>> >     We should probably change the default to 16384, though,
>> >     while some users may further increase it to 32768 or 65536.
>>
>> take a look at   http://httparchive.org/interesting.php
>
> Which of the graphs do you consider relevant?

Average Individual Response Size

The more I think about it, the more I think my logic is wrong -- TCP
takes a few round trips to ramp up, so by the time 16KB stops being
large enough the transfer is usually already done, and 32KB is larger
than the average response size for everything except JPEGs and video.

>> Unless you're expecting privoxy to run on a 16 bit machine I'd set the
>> default to 32KB
>
> I don't see the connection here either.

Personal preference?  I'll trade memory usage for speed, and 32KB is a
nice round number :)

Or if you mean the 16-bit machine exception: with 16-bit pointers you
have at most a 64KB address space, so a single 32KB allocation would
be half of the available memory.


>> >     A dtrace command like:
>> >     sudo dtrace -n 'syscall::read:return /execname == "privoxy"/ { @[execname] = llquantize(arg0, 10, 0, 5, 20); @m = max(arg0)}'
>> >     can be used to properly tune the receive-buffer-size.

I probably slowed things down considerably by adding a call to
log_error() in read_socket() and write_socket() to show the number of
bytes read or written.  ^shrug^  I don't know if there's anything
comparable to dtrace on Windows, but I figured it was better than
nothing.
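
In case it helps, the logging was nothing fancier than roughly this
near the end of read_socket() in jbsockets.c -- variable names and the
log level are from memory, so treat it as approximate:

  /* Log how much a single recv() actually returned. */
  if (ret > 0)
  {
     log_error(LOG_LEVEL_INFO,
        "read_socket: %d bytes from socket %d", ret, fd);
  }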

>> >
>> >     If the buffer is too large it will increase Privoxy's memory
>> >     footprint without any benefit. As the memory is (currently)
>> >     cleared before using it, a buffer that is too large can
>> >     actually reduce the throughput.
>>
>> Is it really necessary to clear the buffer?  Seems like no, but my
>> ability to read & understand privoxy code leaves something to be
>> desired.
>
> Initialising the buffer makes it less likely that bugs result
> in stack or heap data being leaked.

which is a good enough reason to do it :)

> Some code paths currently
> depend on it.

that's the part I'm not seeing.  *sigh*  I'll look again..

> Obviously they could be changed but this is
> unrelated to this commit.
>
> For reasonable buffer sizes I expect the performance impact
> to be minimal so I don't consider this a priority.

Just curious - what do you consider a reasonable buffer size?
I've had it set to 46720 for I don't know how long and haven't noticed
any problems.
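
(With your patch that would presumably just be a one-line directive in
the config file -- directive name taken from your commit message, value
is what I've been using:)

  receive-buffer-size 46720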

>> >     Things could be improved further by upwards scaling the buffer
>> >     dynamically based on how much of the previous allocation
>> >     was actually used.

You might want to think about limiting how much you scale up the
buffer each time.  If the user has window scaling and selective ACKs
enabled, you could see a very large amount of data suddenly become
available (see the sketch after the captures below).

For example - I had Wireshark running while downloading a 42MB file
and at one point my PC sent 248 duplicate ACKs.  The server did a
single fast retransmit and the ack counter went from 160611961 to
160971153 -- in other words, all of a sudden the OS had 359192 bytes
available for privoxy to read.

I don't know how well this will survive in an email, but here's the
snippet from Wireshark:

 No.    Delta     Clock            Source         Destination    Prot     Sport  Dport  Len  Info                                       Seq#        Ack#
 44027  0.000011  03:02:32.492782  10.10.2.33     37.187.48.136  TCP      1122   443     78  [TCP Dup ACK 43531#248] 1122 → 443 [ACK]   1537250603  160611961
 44028  0.000029  03:02:32.492811  37.187.48.136  10.10.2.33     TLSv1.2   443   1122    154  [TCP Fast Retransmission]                  160611961   1537250603
 44029  0.000556  03:02:32.493367  10.10.2.33     37.187.48.136  TCP      1122   443     66  1122 → 443 [ACK]                           1537250603  160971153

160971153 - 160611961 = 359192

or a messier situation with 326 duplicate ACKs and three holes to fill:

 24754  0.000047  03:02:20.071421  10.10.2.33     37.187.48.136  TCP      1122   443     94  [TCP Dup ACK 24102#326] 1122 → 443 [ACK]   1537250603  142235481
        SACK: 142374489-142803097 142361457-142362905 142336841-142360009

When the last hole was filled in, the ack counter went from 142371593
to 142803097 -- in other words, 431504 more bytes available for
privoxy to read in one fell swoop.
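
Here's roughly what I mean by limiting the step -- just a sketch, none
of the names or limits are from your patch:

  /*
   * Hypothetical helper: pick the next receive buffer size based on
   * how much of the current one the last read actually used.  Grow by
   * at most a factor of two per step and never beyond 'ceiling', so a
   * burst like the ones above can't make the buffer jump straight to
   * 350+ KB.
   */
  static size_t next_receive_buffer_size(size_t current, size_t bytes_used,
                                         size_t ceiling)
  {
     size_t wanted = current;

     if (bytes_used == current)
     {
        /* The last read filled the buffer, so it was probably too small. */
        wanted = current * 2;
     }
     if (wanted > ceiling)
     {
        wanted = ceiling;
     }

     return wanted;
  }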


>> Do you have any idea how often the headers include a 'Content-Length:
>> nnn' line?  Seems like allocating a larger buffer once would be
>> better/faster than trying to figure it out on the fly.
>
> I'm proposing to scale the buffer based on how much data
> Privoxy could read from the network with one read_socket()
> call. Whether or not a Content-Length header is available
> isn't relevant for this to work.

I'm just thinking it might be a pretty good hint as to what the
initial buffer size should be.
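
Something like this is all I had in mind -- purely a sketch, nothing
here is from your patch or from Privoxy's header parsing:

  /*
   * Hypothetical: clamp a Content-Length hint into [minimum, maximum]
   * to pick the initial receive buffer size.  Without a usable hint,
   * fall back to the configured minimum.
   */
  static size_t initial_receive_buffer_size(long content_length,
                                            size_t minimum, size_t maximum)
  {
     if (content_length <= 0)
     {
        return minimum;   /* no Content-Length header or a bogus value */
     }
     if ((size_t)content_length < minimum)
     {
        return minimum;
     }
     if ((size_t)content_length > maximum)
     {
        return maximum;
     }

     return (size_t)content_length;
  }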

> If a large buffer isn't needed, allocating a large buffer
> anyway is unlikely to help.

right - zeroing out a large buffer where most of it won't be used is
just wasted overhead

>> >     Additionally the buffer should be referenced through
>> >     csp and also be used for other receive-related functions.
>>
>> I'm lazy/ignorant/can't read code/whatever.. take your pick :) but
>> what other receive-related functions are there?  Seems like
>> jcc.c:handle_established_connection() does it all.
>
> You can grep for read_socket().

thanks!  I'll try that


>> >     Measured throughput when using four connections to
>> >     constantly request a 10 MB file:
>> >
>> >     ~320 MB/s with the default
>> >     ~400 MB/s with "receive-buffer-size    8192"
>> >     ~490 MB/s with "receive-buffer-size   16384"
>> >     ~610 MB/s with "receive-buffer-size   32768"
>> >     ~700 MB/s with "receive-buffer-size   65536"
>> >     ~755 MB/s with "receive-buffer-size  131072"
>> >     ~795 MB/s with "receive-buffer-size  262144"
>> >     ~804 MB/s with "receive-buffer-size  524288"
>> >     ~798 MB/s with "receive-buffer-size 1048576"
>> >     ~780 MB/s with "receive-buffer-size 2097152"
>>
>> Interesting test setup.  804 MB/s is roughly 6.4 Gb/s, which means
>> the test was done with at least 10Gb/s links.
>>
>> >     Sponsored by: Robert Klemme
>>
>> Robert Klemme has a 10Gb/s Internet connection!!?
>
> I don't know. I made those tests in a bhyve vm.
> While it has an emulated 10Gb/s interface, the tests
> used the loopback interface and not the external network
> or the Internet.

Is it possible to have a dropped packet when you're using the loopback
interface?

Doing your testing on a vm seems like a good way to figure out where
all the bottlenecks are, but it seems like you'd be missing most, if
not all, of the nasty things that happen on the Internet, like dropped
packets.

>> >> https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Declaring-Arrays
>> >>   Another GNU extension allows you to declare an array size using
>> >> variables, rather than only constants.
>> >
>> > I don't like GNU extensions in general but in this case a
>> > stack-allocated array is also inappropriate as the "best" buffer size
>> > for many environments would likely result in Privoxy reaching the
>> > stack size limit when there are more than a couple of threads at the
>> > same time.
>>
>> How do you make the buffer size allocation dynamic then?  Seems like
>> handle_established_connection is only called once per request, so how
>> do you change the buffer size after the initial allocation?
>
> Once the buffer is allocated on the heap it can be resized
> using realloc() or free'd and allocated again with the new size.

OK
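
For my own notes, I'm picturing something like this (generic C, not
Privoxy code):

  #include <stdlib.h>

  /*
   * Grow (or shrink) a heap buffer to new_size, keeping the old
   * allocation intact if realloc() fails.
   */
  static int resize_buffer(char **buffer, size_t *buffer_size,
                           size_t new_size)
  {
     char *tmp = realloc(*buffer, new_size);

     if (tmp == NULL)
     {
        return -1;   /* *buffer and *buffer_size are unchanged */
     }
     *buffer = tmp;
     *buffer_size = new_size;

     return 0;
  }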

Thanks,
Lee

