[Privoxy-users] Help filtering text or syntax in a web page.
Fabian Keil
fk at fabiankeil.de
Tue Dec 28 11:53:05 UTC 2021
Jason Parson <jasonlparson at gmail.com> wrote on 2021-12-20 at 23:28:58
(forwarding with his permission):
> Is it possible to filter text in a web page? I want to block any reference
> to places like those shown below. I use xmatrix or uBlock to see if the
> pages have the sites attached to them. Here is an example. I don't want
> adobedtm.com, go-mpulse.net, unpkg.com or googleapis.com to show up. I want
> Privoxy to block the sites completely. Is it possible to filter these
> and other sites like them?
Yes. Privoxy can remove URLs from pages, in which case the client
will not request them.
Letting Privoxy block the requests is usually easier, though.
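
A minimal sketch of the blocking approach, assuming the rules are added
to user.action (the usual place for local additions) and using the host
names as they are spelled in the uBlock rules quoted further down. The
reason text inside the braces is free-form and only an example. A
leading dot makes a pattern match the domain itself and all of its
subdomains.

```
# user.action (sketch; adjust the host list to taste)
{ +block{Unwanted third-party host.} }
.adobedtm.com
.go-mpulse.net
.unpkg.com
.googleapis.com
```

Blocking widely used hosts this way may break pages that depend on
them, so it is probably best to add them one at a time.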
> I have reviewed the documentation a number of times. Either I am missing it
> or it is not there.
Filters are documented at:
https://www.privoxy.org/user-manual/filter-file.html
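
As a rough, untested sketch of what such a filter could look like: the
filter name "strip-tracker-scripts" and the regular expression below are
made up for illustration and are not part of Privoxy's default filters.
The filter is assumed to live in user.filter (which may first have to be
enabled with a "filterfile user.filter" line in the main config) and is
switched on from an action file.

```
# user.filter (hypothetical example filter)
FILTER: strip-tracker-scripts Remove script elements loaded from selected hosts.
s@<script[^>]*\ssrc=["']?https?://[^"'>\s]*(?:adobedtm\.com|go-mpulse\.net)[^>]*>.*?</script>@@sig

# user.action: apply the filter to all URLs
{ +filter{strip-tracker-scripts} }
/
```

The expression only covers external script elements with a src
attribute. References in other elements would need their own
substitution lines.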
> I would also like to find out if it is possible to filter content the same
> way uBlock does. Here is an example of my filters. I hope you understand
> it. I want to be able to filter third-party sites, inline-scripts,
> inline-fonts etc. and other parameters.
>
> ||amazonaws*^$important,all
>
> ||google*^$important,all
>
> ||googletagmanager^$important,all
>
> ||google-analytics^$important,all
>
> ||googletagservices^$important,all
>
> ||adobedtm^$important,all
>
> ||launchdarkly^$important,all
>
> ||newrelic^$important,all
>
> ||segment^$important,all
>
> ||amazon-adsystem.com^$important,all
>
> ||aswpsdkus*^$important,all
>
> ||chartbeat*^$important,all
>
> ||criteo.net^$important,all
>
> ||demdex*^$important,all
>
> ||doubleclick.net^$important,all
>
> ||go-mpulse^$important,all
>
> ||h-cdn*^$important,all
>
> ||imrworldwide.com^$important,all
>
> ||lytics*^$important,all
>
> ||omny*^$important,all
>
> ||opecloud.com^$important,all
>
> ||rlcdn.com^$important,all
>
> ||taplytics*^$important,all
>
> ||yimg*^$important,all
>
> ||litix*^$important,all
>
> ||cachefly*^$important,all
>
> ||s2.pluralsight.com/analytics/*^$important,all
>
> ||pluralsight.com/analytics/*^$important,all
>
> ||demdex.net^$important,all
>
> ||indexww*^$important,all
>
> ||ispot*^$important,all
>
> ||kampyle*^$important,all
>
> ||scene7*^$important,all
>
> ||facebook*^$important,all
>
> ||romote*^$important,all
>
> ||hibu*^$important,all
>
> ||audioeye*^$important,all
>
> ||flipboard*^$important,all
>
> #||arkoselabs.com*^$important,all
>
> #||.^$important,all
>
>
>
> *$1p,important,all
>
> *$3p,important,all
>
> *$image,important,all
>
> *$third-party,important,all
>
> *$frame,important,all
>
> *$script,important,all
>
> *$document,important,all
>
> *$subdocument,important,all
>
> *$stylesheet,important,all
>
> *$css,important,all
>
> *$xhr,important,all
>
> *$beacon,important,all
>
> *$font,important,all
>
> *$inline-font,important,all
>
> *$inline-script,important,all
>
> *$ghide,important,all
>
> *$first-party,important,all
>
> *$doc,important,all
>
> *$shide,important,all
>
> *$strict1p,important,all
>
> *$popunder,important,all
>
> *$strict3p,important,all
>
> *$empty,important,all
>
> #*$all,important
>
>
>
> *$removeparam
>
> *$removeparam=/^*
>
> *$removeparam=/^/
>
> *$removeparam=/^utm*_/
>
> *$removeparam=/^utm*_*/
>
> *$removeparam=/^__*/
>
> *$removeparam=/sp_
>
> *$removeparam=/^_*/
>
> *$removeparam=/^-/
>
>
>
> #@#+js()
>
>
>
> #*/*.js$important,all,redirect=noopjs:100
>
> #||*/*.js$important,all,redirect-rule=noopjs
>
>
>
> #||google.*/*.js$all,redirect=noopjs:100
>
> #||googleapis.*/*.*$all,redirect=noopjs:100
>
>
>
> @@||app.pluralsight.com^$stylesheet,domain=app.pluralsight.com
>
> @@||app.pluralsight.com^$image,domain=app.pluralsight.com
>
> @@||app.pluralsight.com^$script,domain=app.pluralsight.com
>
> @@||app.pluralsight.com^$xhr,domain=app.pluralsight.com
>
> @@||img.pluralsight.com^$image,domain=app.pluralsight.com
>
> @@||s2.pluralsight.com^$stylesheet,domain=app.pluralsight.com
>
> @@||s2.pluralsight.com^$image,domain=app.pluralsight.com
>
> @@||s2.pluralsight.com^$script,domain=app.pluralsight.com
>
> @@||vid21.pluralsight.com^$xhr,domain=app.pluralsight.com
>
> @@||vid5.pluralsight.com^$xhr,domain=app.pluralsight.com
>
> @@||gravatar.com^$image,domain=app.pluralsight.com
>
> @@||vid5.pluralsight.com^$xhr,domain=app.pluralsight.com
>
>
>
> @@||imgix.net^$image,domain=pluralsight.imgix.net
>
> @@||imgix2.net^$image,domain=pluralsight.imgix.net
>
>
>
> ! 2021-10-29 https://www.pluralsight.com
>
> @@||www.pluralsight.com.cdn.cloudflare.net^$stylesheet,domain=www.pluralsight.com
>
> @@||www.pluralsight.com.cdn.cloudflare.net^$image,domain=www.pluralsight.com
>
>
>
> @@||google.com^$stylesheet,domain=google.com
>
> @@||google.com^$image,domain=google.com
>
>
>
> @@||gstatic.com^$stylesheet,domain=paypal.com
>
> @@||gstatic.com^$image,domain=paypal.com
>
> @@||gstatic.com^$script,domain=paypal.com
>
>
>
> @@||gstatic.com^$stylesheet,domain=paypalobjects.com
>
> @@||gstatic.com^$image,domain=paypalobjects.com
>
> @@||gstatic.com^$script,domain=paypalobjects.com
>
>
>
> @@||gstatic.com^$stylesheet,domain=recaptcha.net
>
> @@||gstatic.com^$image,domain=recaptcha.net
>
> @@||gstatic.com^$script,domain=recaptcha.net
I'm not familiar with this syntax but maybe others on this list are.
> And my last question: when I go to a site, sometimes it works and other
> times it does not. Do you have or know of a tool that can tell me why a
> page fails as it relates to Privoxy? I want my system security very strict.
> I need to know why a given site does not work.
In general it helps to enable debugging as described at:
<https://www.privoxy.org/user-manual/contact.html>
and check the logs.
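
For example, lines like the following in Privoxy's main config file turn
on some of the more useful log levels. This is a selection for
illustration, not necessarily the complete set recommended on that page.

```
# Privoxy main config file (sketch)
debug     1 # Log the destination for each request Privoxy let through.
debug  1024 # Log the destination for requests Privoxy didn't let through, and the reason why.
debug  4096 # Startup banner and warnings.
debug  8192 # Non-fatal errors.
```

With these enabled, a request that was blocked should show up in the
log together with the reason, which helps to tell a Privoxy rule apart
from a problem on the site itself.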
Fabian