URL command documentation

URL commands can be used to show information about a webpage and to bypass certain features.

Postby jasonmc » Sat Jul 10, 2004 2:33 am

One of the most important and useful features of SafeSquid seems to be completely missing from the documentation, so here's a list of all the URL commands and what they do:


profiles - display a list of enabled profiles
https - make an HTTPS (SSL) request from a non-SSL client; it can also be used to process HTTPS content (remove banners, scan for viruses)
e.g. http://https..www.cibc.com would be the same as https://www.cibc.com
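To illustrate the rewrite the proxy performs for the https command, here's a minimal Python sketch (not SafeSquid code; the function name is made up):

```python
# Hypothetical illustration of the `https` URL command: the proxy
# strips the "https.." prefix from the host and makes an SSL request
# to the real site instead.
def rewrite_https_command(url):
    prefix = "http://https.."
    if url.startswith(prefix):
        return "https://" + url[len(prefix):]
    return url

# rewrite_https_command("http://https..www.cibc.com")
#   -> "https://www.cibc.com"
```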

these 2 features are sorta designed to work together:
prefetch - prefetch a file in the background without downloading it to the client.
template - display a template instead of the requested file.

I put these in because someone once requested a way to download banner ads but not display them, so the website
still gets money for ad impressions. What you would do with this is a regular expression substitution inside the webpage
body to prefix the ad image URL with "template[tinygif]..prefetch.." like http://template[tinygif]..prefetch..www.adsite.com/ad.jpg
Or I suppose you could also use the redirect feature to redirect requests for banners to this.
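As a sketch of the kind of substitution described above (the ad host, the .jpg pattern, and the function name are made-up examples, not a real SafeSquid rewrite rule):

```python
import re

# Hypothetical rewrite rule: prefix ad image URLs with the
# template[tinygif]..prefetch.. commands so the ad is still fetched
# (counting the impression) but a tiny placeholder is shown instead.
def hide_ads(html, ad_host="www.adsite.com"):
    pattern = r"http://(" + re.escape(ad_host) + r"/\S*\.jpg)"
    return re.sub(pattern, r"http://template[tinygif]..prefetch..\1", html)
```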

offline - browse in offline mode, only cached files can be viewed.. and cached files won't be revalidated if they're stale.
filter - display any matching filter entry for requested URL.
cache - display information about a cached file
proxytest - this one is neat :) when forwarding to another proxy, this will make that proxy connect back to safesquid,
and safesquid will display the headers that would have been passed on to the website... I added this so that anyone who wishes to surf
anonymously through open proxies can see whether the website can still identify them.

bypass[OPTIONS] - this can selectively bypass (or unbypass) most features. OPTIONS is a string of letters representing the features.
here are the available letters:
f - url filtering
h - header filtering (both client and server)
m - mime filtering
r - URL redirection
c - cookie filtering
w - rewriting
e - external programs (both request and response)
p - forwarding
k - keyword filtering
d - dns blacklist
l - limits (I should probably remove this one :))
a - antivirus scanning
i - ICAP

A + or - symbol can be used to switch between bypassing and un-bypassing; a feature listed after the - is un-bypassed if it was bypassed in the access entry.

some examples:
bypass[fh]..www.slashdot.org <- bypasses URL and header filtering
bypass[e-i]..www.safesquid.com <- bypass external programs and UN-bypass ICAP
bypass..www.exn.ca <- bypass everything

bypassing is useful to work around sites that are having problems with some types of filtering.
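The bypass examples above can be built up programmatically; here's a small sketch (the helper name is made up, and the letters are the ones listed above):

```python
# Hypothetical helper to build a bypass[...] URL command prefix.
# `off` = feature letters to bypass; `on` = letters to UN-bypass
# (placed after the - symbol). With neither, bypass everything.
def bypass_url(host, off="", on=""):
    if not off and not on:
        return "bypass.." + host
    opts = off + ("-" + on if on else "")
    return "bypass[" + opts + "].." + host
```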

fresh - fetch a fresh copy of the file from the website, instead of using the cache. Sometimes the cache refresh logic
gets things wrong (not my fault, I followed the RFC.. but not every website does), so this can fix that.
raw - show raw file (HTML), on FTP directory lists it'll show the raw listing.
cookies - display cookies sent to and received from website.
mime - show matching mime entry for requested URL.
headers - show headers sent by browser and received from website.
score - show score for page when doing keyword filtering.
diff - this will show diff-like output of the changes made by the rewrite feature to a website, useful for debugging
regular expression patterns.
htmltree - pretty much useless, I put it there to debug my HTML parser when prefetching. It'll show a parsed HTML tree.
might be useful for people wanting to debug their HTML *shrug*
process - I just added this, it'll bypass the maxbuffer setting and buffer/process the file anyway, so if someone wants to scan
a large file for viruses they can use this.

There are a few other things to note:

- when a URL command is used on a site that sends back a 302 redirect, the URL command is added to the URL in the Location header,
so that the URL command still applies when the browser follows the redirect.
- when a request is made that has a URL command in the Referer header but not in the URL (like when someone clicks a link
on a page they used a URL command on), the proxy will send a 302 redirect to the same URL but with URL commands. This makes it
possible to continuously browse with features bypassed.
- URL commands are also extracted from the Host header, so they work when the proxy server is transparent.
- URL commands are also prefixed to URLs sent by the Redirect feature, well.. except if 'bypass' or 'bypass[r]' is used, since the redirect feature would be bypassed.
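The Host-header extraction mentioned above can be sketched like this (a guess at the parsing, not SafeSquid's actual code: everything before the final ".." is treated as command prefixes, the rest is the real host):

```python
# Hypothetical sketch of pulling URL commands off the front of a
# Host header, as the transparent-proxy case would need: commands
# are ".."-separated prefixes before the real hostname.
def split_commands(host):
    if ".." in host:
        commands, real_host = host.rsplit("..", 1)
        return commands.split(".."), real_host
    return [], host
```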