[Privoxy-devel] add a new cgi function?

Lee ler762 at gmail.com
Sat Aug 4 11:41:03 UTC 2018


On 8/4/18, Fabian Keil <fk at fabiankeil.de> wrote:
> Lee <ler762 at gmail.com> wrote:
>
>> On 8/1/18, Lee <ler762 at gmail.com> wrote:
>
>> > How do you feel about adding a 'show url info' cgi function more
>> > suitable for use by a program?
>
> Seems reasonable.
>
>> > show-url-info hits the disk way too much.  I had Process Monitor running
>> >   ( https://docs.microsoft.com/en-us/sysinternals/downloads/procmon )
>> > and each call to show-url-info reads
>> > templates\show-url-info   Offset: 0, Length: 4,096, Priority: Normal
>> > templates\show-url-info   Offset: 4,096, Length: 4,096
>> > templates\show-url-info   Offset: 8,192, Length: 1,998
>> > templates\mod-title
>> > templates\mod-unstable-warning
>> > templates\mod-local-help
>> > templates\mod-support-and-service
>> >
>> > and takes about 0.001 second, which seems fast enough until you
>> > realize disk i/o is going to take about a minute for every 60,000
>> > calls.  And the hosts file I'm working with is
>> > $ grep -v "^#" unified-hosts.txt | wc -l
>> > 63266
>> >
>> >
>> > The other big change was not checking for file system changes in
>> > jcc.c::serve
>> > $ git diff jcc.c
>> > diff --git a/jcc.c b/jcc.c
>> > index d1fca148..df4f1448 100644
>> > --- a/jcc.c
>> > +++ b/jcc.c
>> > @@ -3311,7 +3311,10 @@ static void serve(struct client_state *csp)
>> >           }
>> >        }
>> >
>> > -      if (continue_chatting && any_loaded_file_changed(csp))
>> > +/*
>> > + *    don't check for action/filter file changes if processing cgi requests
>> > + */
>> > +      if (continue_chatting && !(csp->flags & CSP_FLAG_CRUNCHED) && any_loaded_file_changed(csp))
>> >        {
>> >           continue_chatting = 0;
>> >           config_file_change_detected = 1;
>> >
>> >
>> > I was thinking I'd just make another cgi function that would call
>> > any_loaded_file_changed & that would take care of it.
>>
>> except it looks like any_loaded_file_changed doesn't actually reload
>> anything that changed.  Rather than add another function I added this
>> bit to cgisimple.c so that show-status would make sure everything was
>> current before showing the status:
>> diff --git a/cgisimple.c b/cgisimple.c
>> index 1f13d5ce..6e1c9387 100644
>> --- a/cgisimple.c
>> +++ b/cgisimple.c
>> @@ -1084,6 +1084,15 @@ jb_err cgi_show_status(struct client_state *csp,
>>     assert(rsp);
>>     assert(parameters);
>>
>> +   /*
>> +    *  make sure config files are current
>> +    */
>> +   if (run_loader(csp))
>> +   {
>> +      log_error(LOG_LEVEL_FATAL, "a loader failed - must exit");
>> +      /* Never get here - LOG_LEVEL_FATAL causes program exit */
>> +   }
>> +
>>     if ('\0' != *(lookup(parameters, "file")))
>>     {
>>
>> are you ok with that change?
>
> Why is this necessary? run_loader() is called from listen_loop()
> already.

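My guess[1] is that it's because I've got this configured
  default-server-timeout 300
my program adds a
  "Connection: keep-alive"
header to the request, and I didn't want disk accesses slowing things
down while I'm checking urls, so I changed serve to not call
any_loaded_file_changed for cgi requests.  With the connection kept
alive and that check skipped, nothing seems to send my requests back
through listen_loop, so run_loader doesn't get called again.
  (see attached diff.txt)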
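
To make that guess concrete, here is a toy model of the control flow as I
understand it.  It is not Privoxy code: every name in it (toy_state,
toy_run_loader, toy_file_changed, toy_serve) is made up for illustration,
and the only assumption taken from the thread is that the loader runs once
per accepted connection while the change check runs between requests on a
kept-alive connection.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not Privoxy's data structures. */
struct toy_state
{
   int disk_version;    /* version of the config files "on disk" */
   int loaded_version;  /* version the proxy currently has loaded */
};

/* Stand-in for the reload done once per accepted connection. */
static void toy_run_loader(struct toy_state *s)
{
   assert(s != NULL);
   s->loaded_version = s->disk_version;
}

/* Stand-in for the change check done between requests. */
static int toy_file_changed(const struct toy_state *s)
{
   assert(s != NULL);
   return s->loaded_version != s->disk_version;
}

/*
 * Handle up to `requests` requests on one kept-alive connection.
 * The config changes on disk right after request `change_after`.
 * If skip_check is non-zero the change check is skipped, as in the
 * patched serve for cgi requests.
 * Returns the number of requests answered with a stale config.
 */
static int toy_serve(struct toy_state *s, int requests,
                     int change_after, int skip_check)
{
   int i;
   int stale = 0;

   toy_run_loader(s);   /* happens once, when the connection is accepted */

   for (i = 1; i <= requests; i++)
   {
      if (toy_file_changed(s))
      {
         stale++;       /* this request was answered using old config */
      }
      if (i == change_after)
      {
         s->disk_version++;
      }
      if (!skip_check && toy_file_changed(s))
      {
         break;         /* stop chatting so the next connection reloads */
      }
   }
   return stale;
}
```

With a zeroed toy_state, toy_serve(&s, 5, 2, 1) counts three stale answers,
while toy_serve(&s, 5, 2, 0) stops at the change and counts none.  In this
model an explicit reload in the request handler is what makes the stale
answers go away when the check is skipped.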

We had a "TODO list proposal" thread back in April 2009 where I came up with
# awk script to see if a URL is blocked by Privoxy or not
I ended up making two curl calls to show-url-info before running the
script because one call didn't always reload the config.  Adding this
code to show-status guarantees that a single curl call is enough to
refresh the config files.

Background: Windows 10 isn't so much an operating system as an O/S As A
Service.  Major updates tend to reset things back to the default.  My
original block-test.awk was taking almost 20 minutes to run, but after
a Windows update last week that required a reboot, the next run of the
original program took almost 36 minutes.  I probably need to exclude
things from anti-virus checking again, but even with that it's still
going to take way too long.  Using the gawk |& extension to talk to a
co-process and changing the privoxy code to not read from disk gets it
down to about 1 minute.

Lee

[1] I have a hard time following the code & it's quite possible I
missed or misunderstood something.
-------------- next part --------------
diff --git a/actions.c b/actions.c
index 6249de9e..d87b0cc9 100644
--- a/actions.c
+++ b/actions.c
@@ -778,6 +778,54 @@ jb_err merge_current_action (struct current_action_spec *dest,
 }
 
 
+/*********************************************************************
+ *
+ * Function    :  merge_single_actions
+ *      same thing as merge_current_action except
+ *      skip processing of multi actions
+ *      no, i have no idea what the diff is between single & multi actions
+ *
+ * Description :  Merge two actions together.
+ *                Similar to "dest += src".
+ *                Differences between this and merge_actions()
+ *                is that this one doesn't allocate memory for
+ *                strings (so "src" better be in memory for at least
+ *                as long as "dest" is, and you'd better free
+ *                "dest" using "free_current_action").
+ *                Also, there is no  mask or remove lists in dest.
+ *                (If we're applying it to a URL, we don't need them)
+ *
+ * Parameters  :
+ *          1  :  dest = Current actions, to modify.
+ *          2  :  src = Action to add.
+ *
+ * Returns  0  :  no error
+ *        !=0  :  error, probably JB_ERR_MEMORY.
+ *
+ *********************************************************************/
+jb_err merge_single_actions (struct current_action_spec *dest,
+                             const struct action_spec *src)
+{
+   int i;
+   jb_err err = JB_ERR_OK;
+
+   dest->flags  &= src->mask;
+   dest->flags  |= src->add;
+
+   for (i = 0; i < ACTION_STRING_COUNT; i++)
+   {
+      char * str = src->string[i];
+      if (str)
+      {
+         str = strdup_or_die(str);
+         freez(dest->string[i]);
+         dest->string[i] = str;
+      }
+   }
+   return err;
+}
+
+
 /*********************************************************************
  *
  * Function    :  update_action_bits_for_tag



diff --git a/actions.h b/actions.h
index af401766..a36c74f9 100644
--- a/actions.h
+++ b/actions.h
@@ -70,6 +70,8 @@ extern void init_current_action     (struct current_action_spec *dest);
 extern void free_current_action     (struct current_action_spec *src);
 extern jb_err merge_current_action  (struct current_action_spec *dest,
                                      const struct action_spec *src);
+extern jb_err merge_single_actions  (struct current_action_spec *dest,
+                                     const struct action_spec *src);
 extern char * current_action_to_html(const struct client_state *csp,
                                      const struct current_action_spec *action);
 extern char * actions_to_line_of_text(const struct current_action_spec *action);



diff --git a/cgi.c b/cgi.c
index 22601760..bfcab1c9 100644
--- a/cgi.c
+++ b/cgi.c
@@ -112,6 +112,10 @@ static const struct cgi_dispatcher cgi_dispatchers[] = {
          cgi_show_url_info,
          "Look up which actions apply to a URL and why",
          TRUE },
+   { "show-url-final-info",
+         cgi_show_url_final_info,
+         "Look up the final actions that apply to a URL",
+         TRUE },
 #ifdef FEATURE_TOGGLE
    { "toggle",
          cgi_toggle,



diff --git a/cgisimple.c b/cgisimple.c
index 1f13d5ce..6e1c9387 100644
--- a/cgisimple.c
+++ b/cgisimple.c
@@ -1084,6 +1084,15 @@ jb_err cgi_show_status(struct client_state *csp,
    assert(rsp);
    assert(parameters);
 
+   /*
+    *  make sure config files are current
+    */
+   if (run_loader(csp))
+   {
+      log_error(LOG_LEVEL_FATAL, "a loader failed - must exit");
+      /* Never get here - LOG_LEVEL_FATAL causes program exit */
+   }
+
    if ('\0' != *(lookup(parameters, "file")))
    {
       return cgi_show_file(csp, rsp, parameters);
@@ -1691,6 +1700,245 @@ jb_err cgi_show_url_info(struct client_state *csp,
 }
 
 
+/*********************************************************************
+ *
+ * Function    :  cgi_show_url_final_info
+ *
+ * Description :  CGI function that shows just the "Final results:"
+ *                section from cgi_show_url_info.
+ *                If all you want to know is if a URL would be blocked
+ *                or not, this is the function for you!
+ *
+ * Parameters  :
+ *          1  :  csp = Current client state (buffers, headers, etc...)
+ *          2  :  rsp = http_response data structure for output
+ *          3  :  parameters = map of cgi parameters
+ *
+ * CGI Parameters :
+ *            url : The url whose actions are to be determined.
+ *                  If url is unset, the url-given conditional will be
+ *                  set, so that all but the form can be suppressed in
+ *                  the template.
+ *
+ * Returns     :  JB_ERR_OK on success
+ *                JB_ERR_MEMORY on out-of-memory error.
+ *
+ *********************************************************************/
+jb_err cgi_show_url_final_info(struct client_state *csp,
+                               struct http_response *rsp,
+                               const struct map *parameters)
+{
+   char *url_param;
+   struct map *exports;
+
+   assert(csp);
+   assert(rsp);
+   assert(parameters);
+
+   if (NULL == (exports = default_exports(csp, "show-url-final-info")))
+   {
+      return JB_ERR_MEMORY;
+   }
+
+   /*
+    * Get the url= parameter (if present) and remove any leading/trailing spaces.
+    */
+   url_param = strdup_or_die(lookup(parameters, "url"));
+   chomp(url_param);
+
+   /*
+    * Handle prefixes.  4 possibilities:
+    * 1) "http://" or "https://" prefix present and followed by URL - OK
+    * 2) Only the "http://" or "https://" part is present, no URL - change
+    *    to empty string so it will be detected later as "no URL".
+    * 3) Parameter specified but doesn't start with "http(s?)://" - add a
+    *    "http://" prefix.
+    * 4) Parameter not specified or is empty string - let this fall through
+    *    for now, next block of code will handle it.
+    */
+   if (0 == strncmp(url_param, "http://", 7))
+   {
+      if (url_param[7] == '\0')
+      {
+         /*
+          * Empty URL (just prefix).
+          * Make it totally empty so it's caught by the next if ()
+          */
+         url_param[0] = '\0';
+      }
+   }
+   else if (0 == strncmp(url_param, "https://", 8))
+   {
+      if (url_param[8] == '\0')
+      {
+         /*
+          * Empty URL (just prefix).
+          * Make it totally empty so it's caught by the next if ()
+          */
+         url_param[0] = '\0';
+      }
+   }
+   else if ((url_param[0] != '\0')
+      && ((NULL == strstr(url_param, "://")
+            || (strstr(url_param, "://") > strstr(url_param, "/")))))
+   {
+      /*
+       * No prefix or at least no prefix before
+       * the first slash - assume http://
+       */
+      char *url_param_prefixed = strdup_or_die("http://");
+
+      if (JB_ERR_OK != string_join(&url_param_prefixed, url_param))
+      {
+         free_map(exports);
+         return JB_ERR_MEMORY;
+      }
+      url_param = url_param_prefixed;
+   }
+
+   if (url_param[0] == '\0')
+   {
+      /* URL parameter not specified, display query form only. */
+      free(url_param);
+      if (map_block_killer(exports, "url-given")
+        || map(exports, "url", 1, "", 1))
+      {
+         free_map(exports);
+         return JB_ERR_MEMORY;
+      }
+   }
+   else
+   {
+      /* Given a URL, so query it. */
+      jb_err err;
+      char *s;
+      struct file_list *fl;
+      struct url_actions *b;
+      struct http_request url_to_query[1];
+      struct current_action_spec action[1];
+      int i;
+
+      if (map(exports, "url", 1, html_encode(url_param), 0))
+      {
+         free(url_param);
+         free_map(exports);
+         return JB_ERR_MEMORY;
+      }
+
+      init_current_action(action);
+
+      if (map(exports, "default", 1, current_action_to_html(csp, action), 0))
+      {
+         free_current_action(action);
+         free(url_param);
+         free_map(exports);
+         return JB_ERR_MEMORY;
+      }
+
+      memset(url_to_query, '\0', sizeof(url_to_query));
+      err = parse_http_url(url_param, url_to_query, REQUIRE_PROTOCOL);
+      assert((err != JB_ERR_OK) || (url_to_query->ssl == !strncmpic(url_param, "https://", 8)));
+
+      free(url_param);
+
+      if (err == JB_ERR_MEMORY)
+      {
+         free_http_request(url_to_query);
+         free_current_action(action);
+         free_map(exports);
+         return JB_ERR_MEMORY;
+      }
+      else if (err)
+      {
+         /* Invalid URL */
+
+         err = map(exports, "matches", 1, "<b>[Invalid URL specified!]</b>" , 1);
+         if (!err) err = map(exports, "final", 1, lookup(exports, "default"), 1);
+         if (!err) err = map_block_killer(exports, "valid-url");
+
+         free_current_action(action);
+         free_http_request(url_to_query);
+
+         if (err)
+         {
+            free_map(exports);
+            return JB_ERR_MEMORY;
+         }
+
+         return template_fill_for_cgi(csp, "show-url-final-info", exports, rsp);
+      }
+
+      for (i = 0; i < MAX_AF_FILES; i++)
+      {
+         if (NULL == csp->config->actions_file_short[i]
+             || !strcmp(csp->config->actions_file_short[i], "standard.action")) continue;
+
+         b = NULL;
+         if ((fl = csp->actions_list[i]) != NULL)
+         {
+            if ((b = fl->f) != NULL)
+            {
+               b = b->next;
+            }
+         }
+
+         for ( ; b != NULL; b = b->next)
+         {
+            if (url_match(b->url, url_to_query))
+            {
+               /* if (merge_current_action(action, b->action))   -LR-  orig */
+               if (merge_single_actions(action, b->action))
+               {
+                  free_http_request(url_to_query);
+                  free_current_action(action);
+                  free_map(exports);
+                  return JB_ERR_MEMORY;
+               }
+            }
+         }
+      }
+
+      free_current_action(csp->action);
+      get_url_actions(csp, url_to_query);
+
+      free_http_request(url_to_query);
+
+      s = current_action_to_html(csp, action);
+
+      free_current_action(action);
+
+      if (map(exports, "final", 1, s, 0))
+      {
+         free_map(exports);
+         return JB_ERR_MEMORY;
+      }
+   }
+
+   /* return template_fill_for_cgi(csp, "show-url-final-info", exports, rsp);   -LR- */
+   rsp->body = \
+"<!DOCTYPE html><html lang=\"en\"><head><title>URL Block Info</title></head>\n"\
+"<body><table cellpadding=\"20\" cellspacing=\"10\" border=\"0\" width=\"100%\">\n"\
+"<!-- @if-url-given-start -->\n"\
+"<!-- @if-valid-url-start -->\n"\
+"<tr><td><h2>Final results:</h2>\n"\
+"<b>@final@</b>\n"\
+"</td></tr>\n"\
+"<!-- if-valid-url-end@ -->\n"\
+"<!-- if-url-given-end@ -->\n"\
+"<tr><td><h2>Look up the actions for a URL:</h2>\n"\
+"<form method=\"GET\" action=\"@default-cgi@show-url-final-info\">\n"\
+"<p><input type=\"text\" name=\"url\" size=\"80\" value=\"@url@\"><input type=\"submit\" value=\"Go\"></p>\n"\
+"</form>\n"\
+"</td></tr></table>\n"\
+"</body></html>\n";
+
+   template_fill(&rsp->body, exports);
+   free_map(exports);
+   return 0;
+
+}
+
+
 /*********************************************************************
  *
  * Function    :  cgi_robots_txt



diff --git a/cgisimple.h b/cgisimple.h
index 52642a40..790574f3 100644
--- a/cgisimple.h
+++ b/cgisimple.h
@@ -61,6 +61,9 @@ extern jb_err cgi_show_status  (struct client_state *csp,
 extern jb_err cgi_show_url_info(struct client_state *csp,
                                 struct http_response *rsp,
                                 const struct map *parameters);
+extern jb_err cgi_show_url_final_info(struct client_state *csp,
+                                      struct http_response *rsp,
+                                      const struct map *parameters);
 extern jb_err cgi_show_request (struct client_state *csp,
                                 struct http_response *rsp,
                                 const struct map *parameters);



diff --git a/jcc.c b/jcc.c
index d1fca148..df4f1448 100644
--- a/jcc.c
+++ b/jcc.c
@@ -3311,7 +3311,10 @@ static void serve(struct client_state *csp)
          }
       }
 
-      if (continue_chatting && any_loaded_file_changed(csp))
+/*
+ *    don't check for action/filter file changes if processing cgi requests
+ */
+      if (continue_chatting && !(csp->flags & CSP_FLAG_CRUNCHED) && any_loaded_file_changed(csp))
       {
          continue_chatting = 0;
          config_file_change_detected = 1;


