Mind Chasers Inc.

Basic Python Squid Redirector / Rewriter for Content Filtering / Ad blocking

A basic Python URL rewriter is developed and tested, matching both keywords and regular expressions. The test configuration is also reviewed.


Overview

The Squid caching proxy is an excellent, long-established open source project with an active mailing list. Aside from its core proxy and caching functionality, Squid is also great for managing, filtering, and analyzing HTTP and HTTPS accesses. One example is using a content filter to rewrite or redirect URLs; a typical application is blocking tracking sites and objectionable content such as porn. This article briefly reviews writing a simple content filter in Python and testing it.

For development and testing, we set up two Squid servers:

  1. An embedded Linux system running a Yocto-built distribution (similar to what we use to protect our own network)
  2. A PC running Ubuntu Linux 18.04

Each Squid server sits between our ISP firewall and our test client. The test client is a second PC running Ubuntu Linux 18.04, and we use the command-line wget utility to test our Squid servers.

Figure 1. High Level Test Network (Squid server system block diagram)

Server Configuration

On our Yocto system, we are running Squid version 3.5.26, and on Ubuntu we are running version 3.5.27. We installed squid on Ubuntu using the apt package manager:

$ lsb_release -a

Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.1 LTS
Release:	18.04
Codename:	bionic

$ sudo apt install squid

$ squid -v
Squid Cache: Version 3.5.27
Service Name: squid
Ubuntu linux
configure options:  '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' 
'--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' 
'--libexecdir=${prefix}/lib/squid3' '--srcdir=.' '--disable-maintainer-mode' '--disable-dependency-tracking' 
...

On Yocto, we make a few tweaks to the server. For example, we make sure the Squid log directory /var/log/squid exists and is owned by the user "squid":

# mkdir -p /var/log/squid
# chown -R squid:squid /var/log/squid

Note that on Ubuntu, Squid runs as the user "proxy" by default.
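Since the redirector's log file (covered below) must be writable by the account Squid runs as ("squid" on our Yocto build, "proxy" on Ubuntu), it can help to double-check ownership from a script. This is a minimal standard-library sketch; owner_of and can_write are hypothetical helper names of our own, and can_write only reports what the user running the check may do, not what the Squid user may do.

```python
import os
import pwd


def owner_of(path):
    """Return the name of the user that owns path (like `ls -ld`)."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # uid has no /etc/passwd entry; fall back to the numeric id
        return str(uid)


def can_write(path):
    """True if the user running this check may write to path."""
    return os.access(path, os.W_OK)
```

For example, after the chown above, owner_of("/var/log/squid") should return "squid" on the Yocto server.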

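On both servers, we need to set up an access control list (ACL) so our remote client is allowed to use the proxy. Squid checks http_access rules in order and applies the first one that matches, so the allow line must appear before the final "http_access deny all" in squid.conf: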

squid.conf
acl localnet src 192.168.3.0/24

http_access allow localnet
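The src ACL matches the client's source address against the CIDR block, so 192.168.3.0/24 covers 192.168.3.0 through 192.168.3.255. If you're unsure whether a client falls inside it, a quick check with Python's ipaddress module mirrors the match. in_localnet is a hypothetical name, and this models only the address test, not Squid's full http_access evaluation.

```python
import ipaddress

# Same block as the squid.conf line: acl localnet src 192.168.3.0/24
LOCALNET = ipaddress.ip_network("192.168.3.0/24")


def in_localnet(client_ip, network=LOCALNET):
    """True if client_ip lies inside network, i.e. would match the src ACL."""
    return ipaddress.ip_address(client_ip) in network
```

in_localnet("192.168.3.50") returns True, while a client at 192.168.4.1 would fall through to Squid's later rules (typically the final deny).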

Client Configuration

For testing, we temporarily configure the client to use the proxy only within one shell (e.g., gnome-terminal). Before doing this, make sure wget works without the proxy:

$ wget http://example.com 

$ export http_proxy=<server ip address>:3128
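If you'd rather drive the same tests from Python than from wget, the standard library's urllib can be pointed at the proxy explicitly. This is a sketch under assumptions: 192.0.2.10 is a placeholder for your server's address, 3128 is Squid's default port, and make_proxy_opener and fetch are our own hypothetical helpers. Note that urllib, like wget, also honors an exported http_proxy variable on its own.

```python
import urllib.request

# Placeholder: replace with your Squid server's IP (3128 is Squid's default port)
PROXY = "http://192.0.2.10:3128"


def make_proxy_opener(proxy_url):
    """Build an opener that sends plain-HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, proxy_url=PROXY, timeout=10):
    """Fetch url through the proxy; return (HTTP status, body bytes).

    Redirects (e.g., a 301 from the content filter) are followed automatically.
    """
    opener = make_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as resp:
        return resp.status, resp.read()
```

With Squid reachable and the client allowed by the ACL, fetch("http://example.com") returns the status code and page body, much like wget saving index.html.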

Basic Squid Test

As a basic sanity test, we'll run Squid on each server without our rewriter (content filter) enabled and run a few wget requests on our client. On your Ubuntu server, you can start squid with "$ sudo squid".

$ wget http://cnn.com

$ wget http://foxnews.com 

View the contents of the files that should have been returned (e.g., index.html). You should see HTML content from the sites you requested. If it's not working properly, check Squid's cache.log on the server in /var/log/squid.

While you're testing, keep the following Squid commands in mind (on Ubuntu, prefix each with sudo):

squid  # start the server
squid -k reconfigure # reload squid.conf and restart the helpers; run this each time you tweak your redirector
squid -k interrupt # bring squid to a stop
squid -k check # is it running? 

For additional help configuring Squid, Kulbir Saini's Squid Proxy Server 3.1 is an excellent reference, even for Squid version 4.

Testing the Custom Content Filter

Now it's time to enable our rewriter and test it out. The relevant excerpt of our squid.conf file is shown below. If you're unfamiliar with these settings, refer to the Squid configuration documentation for url_rewrite_extras, url_rewrite_children, and url_rewrite_program.

squid.conf
url_rewrite_extras "%>a %>rm %un"
url_rewrite_children 3 startup=0 idle=1 concurrency=10
url_rewrite_program /build/squid_redirect/squid-redirect.py

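Copy the Python script squid-redirect.py (listed below) to the location given by the url_rewrite_program directive in your squid.conf, and make sure it is executable (chmod +x) and readable by the user Squid runs as. The url_rewrite_program documentation describes the line-based protocol between the Squid server and the redirector helper: Squid writes one request per line to the helper's stdin, and the helper answers with one line on stdout.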
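Because our squid.conf sets concurrency=10, Squid prefixes each request line with a channel ID and expects the helper to echo that ID at the start of its reply, so replies can be matched to requests. The remaining fields follow url_rewrite_extras. The sketch below parses and builds those lines; parse_request and format_response are hypothetical helpers (not part of Squid or of the script listed later), and the field names assume the "%>a %>rm %un" extras used here.

```python
def parse_request(line):
    """Split one helper request line into named fields.

    Assumes concurrency > 0 and url_rewrite_extras "%>a %>rm %un":
        <channel-ID> <URL> <client IP> <method> <user>
    Returns None for blank or truncated lines.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    names = ("channel_id", "url", "client_ip", "method", "user")
    # zip() stops at the shorter sequence, so missing extras are simply omitted
    return dict(zip(names, fields))


def format_response(channel_id, new_url=None, status=None):
    """Build a reply line in the Squid 3.4+ key=value style.

    no new_url   -> "<id> OK"                          (URL left unchanged)
    with status  -> "<id> OK status=<n> url=<new_url>" (client is redirected)
    otherwise    -> "<id> OK rewrite-url=<new_url>"    (Squid fetches new_url itself)
    """
    if new_url is None:
        return f"{channel_id} OK\n"
    if status is not None:
        return f"{channel_id} OK status={status} url={new_url}\n"
    return f"{channel_id} OK rewrite-url={new_url}\n"
```

Using zip() means a short line yields fewer keys instead of raising ValueError, which is the failure a strict five-field unpacking hits on a blank line.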

After changing squid.conf or squid-redirect.py, don't forget to reconfigure Squid (this also restarts the helper processes) and check that it's running. On our Ubuntu Squid server, we enter:

$ sudo squid -k reconfigure

$ ps -e | grep squid
24122 ?        00:00:00 squid
24124 ?        00:00:00 squid

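This particular redirector will perform:

  • a rewrite if "sex" appears anywhere in the URL (not a good idea in practice; e.g., it would also block http://oasisunisex.com/)
  • a redirect (HTTP 301) if the URL ends in ".xxx/" (the pattern \.xxx?/$ also matches ".xx/", and it does not match a path such as http://example.xxx/index.html)

Because the keyword test runs first, a URL such as http://sex.xxx/ is rewritten rather than redirected.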
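To see exactly which URLs each rule catches, the decision logic can be exercised on its own. The classify function below is a hypothetical extraction of the script's if/elif checks (same order, same regular expression); is_xxx_host is a swapped-in, stricter alternative of our own that matches on the host name via urllib.parse instead of the URL's tail.

```python
import re
from urllib.parse import urlsplit

# Same pattern as squid-redirect.py: ".xx/" or ".xxx/" at the very end of the URL
XXX = re.compile(r'\.xxx?/$')


def classify(url):
    """Return 'rewrite', 'redirect', or None, in the script's if/elif order."""
    if 'sex' in url:
        return 'rewrite'
    if XXX.search(url):
        return 'redirect'
    return None


def is_xxx_host(url):
    """Hypothetical stricter check: does the host name end in .xxx?"""
    host = urlsplit(url).hostname or ''
    return host.endswith('.xxx')


for u in ("http://sex.xxx/", "http://example.xxx/",
          "http://example.xxx/index.html", "http://oasisunisex.com/"):
    print(u, classify(u), is_xxx_host(u))
```

The last two cases show the trade-offs: the tail-anchored pattern misses http://example.xxx/index.html, and the plain substring test flags http://oasisunisex.com/.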

Please keep in mind that this is just example / demo code to show how things work.

Debugging the Custom Content Filter

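Notice that the content filter script uses Python's logging facility with a relative filename, so squid-redirect.log is created in the helper's working directory. On our Ubuntu 18.04 system, we find it in /var/spool/squid (on our Yocto system, in /var/cache/squid). In it, we find lines such as: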

squid-redirect.log
DEBUG:root:2018-11-24 00:53:09: 0 http://sex.xxx/ 127.0.0.1 GET -

The "0 http://sex.xxx/ 127.0.0.1 GET -" string is what was passed to our content filter script by Squid.
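Because the script passes a relative filename to logging.basicConfig, the log lands in whatever directory the helper was started from; under Squid this appears to be its coredump_dir (/var/spool/squid in Ubuntu's default squid.conf), which is why the file can seem to go missing. One option is to log to an explicit directory that the Squid user can write, such as /var/log/squid. This is a sketch: make_logger and the directory you pass it are hypothetical choices, and the format string is only meant to resemble the DEBUG:root:... lines shown above.

```python
import logging
import os


def make_logger(log_dir, name="squid-redirect"):
    """Hypothetical helper: log to <log_dir>/squid-redirect.log regardless of cwd."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid duplicate handlers if called twice
        handler = logging.FileHandler(os.path.join(log_dir, "squid-redirect.log"))
        handler.setFormatter(logging.Formatter(
            "%(levelname)s:%(name)s:%(asctime)s: %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S"))
        logger.addHandler(handler)
    return logger
```

In the script, you might call make_logger("/var/log/squid") once at startup and then log request.strip() in the loop; asctime then supplies the timestamp the script currently formats by hand with datetime.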

For development and testing, the squid-redirect.py file can be executed in a shell without running Squid. Open up a new shell on the server in the folder where squid-redirect.py is located, execute squid-redirect.py, and start passing it requests using stdin (type them in yourself):

$ cd /build/squid_redirect/

$ python3 squid-redirect.py
0 http://sex.xxx/ 127.0.0.1 GET -
0 OK rewrite-url=http://example.com

When you run the script directly in your shell, Python logging writes squid-redirect.log to your current working directory (here, /build/squid_redirect).
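Typing requests by hand gets tedious, so the same stdin/stdout exchange can be scripted. The sketch below feeds a list of request lines to a helper and collects its replies; run_helper is a hypothetical name, and the command you pass (for example ["python3", "squid-redirect.py"], run from /build/squid_redirect) is an assumption about your layout. It uses stdout=PIPE and universal_newlines rather than the newer capture_output/text arguments so it also runs on the Python 3.6 that ships with Ubuntu 18.04.

```python
import subprocess


def run_helper(cmd, requests, timeout=10):
    """Send request lines to a helper's stdin and return its reply lines.

    The helper sees EOF after the last request, so a readline() loop like
    the one in squid-redirect.py exits cleanly.
    """
    stdin_text = "".join(r.rstrip("\n") + "\n" for r in requests)
    result = subprocess.run(cmd, input=stdin_text, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True,
                            timeout=timeout, check=True)
    return result.stdout.splitlines()
```

Against the demo script, run_helper(["python3", "squid-redirect.py"], ["0 http://sex.xxx/ 127.0.0.1 GET -"]) should return ["0 OK rewrite-url=http://example.com"], matching the interactive session above.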

Python script: squid-redirect.py
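#!/usr/bin/env python3
"""
Copyright 2018 Mind Chasers Inc,
file: squid-redirect.py
Demo code.  No warranty of any kind.  Use at your own risk
"""

import re
import sys
import logging
from datetime import datetime

# Relative filename: the log is created in the helper's current working directory
logging.basicConfig(filename='squid-redirect.log', level=logging.DEBUG)
# Raw string avoids an invalid-escape warning; matches URLs ending in ".xx/" or ".xxx/"
xxx = re.compile(r'\.xxx?/$')

def main():
    """
        keep looping and processing requests until stdin is closed
        request format is based on url_rewrite_extras "%>a %>rm %un":
        <channel-ID> <URL> <client IP> <method> <user>
    """
    request = sys.stdin.readline()
    while request:
        logging.debug(datetime.now().strftime('%Y-%m-%d %H:%M:%S') + ': ' + request.strip())
        fields = request.split()
        if len(fields) < 2:
            # skip blank or malformed lines (e.g., pressing Enter while testing by hand)
            request = sys.stdin.readline()
            continue
        ch_id, url = fields[0], fields[1]
        response = ch_id + ' OK'  # bare "OK" tells Squid to leave the URL unchanged
        if 'sex' in url:
            # rewrite: Squid fetches the new URL itself; the client never sees a redirect
            response += ' rewrite-url=http://example.com'
        elif xxx.search(url):
            # redirect: Squid returns a 301 to the client pointing at the new URL
            response += ' status=301 url=http://example.com'
        sys.stdout.write(response + '\n')
        sys.stdout.flush()  # deliver the reply to Squid immediately
        request = sys.stdin.readline()

if __name__ == '__main__':
    main()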

References


Date: July 26, 2018

Author: dinusha

Comment:

I tried the redirect program on Squid 3.5, but it doesn't redirect anything.

Date: Nov. 21, 2018

Author: Aiden

Comment:

I just can't understand this code. How can I print the values (url, ch_id, ...) for testing?

Date: Nov. 21, 2018

Author: Mind Chasers

Comment:

Hi Aiden,

The easiest way to test and debug this code is to execute it on the command line and pass in requests yourself using stdin:

$ python3 squid-redirect.py

The difficulty can be figuring out a valid request. Therefore, I suggest you add an additional log line to the script after request is assigned:

request = sys.stdin.readline()
logging.debug('request: ' + request)

Restart Squid and perform a wget from your client (e.g., wget sex.xxx). You should see something like this in squid-redirect.log:

0 http://sex.xxx/ 192.168.3.50 GET -

On our embedded system, the squid-redirect.log file is found in /var/cache/squid.

When debugging the script using this direct call method, you can also add the following to your Python script so it enters debug mode:

import pdb; pdb.set_trace()

And when debugging it this way, the log file should be in the local directory where the script is being called (e.g., /build/squid_redirect). Please let us know if this helps.

Date: Nov. 22, 2018

Author: Aiden

Comment:

The problem is that I don't see anything in squid-redirect.log. Do I have to chown it?

Date: Nov. 22, 2018

Author: Aiden

Comment:

It said: ValueError: not enough values to unpack (expected 5, got 0)

Date: Nov. 22, 2018

Author: Aiden

Comment:

Here is my problem now: the script runs fine (when I go to a website with "SEX" in the URL, it is blocked), but there is nothing in squid-redirect.log, and I can't see the output of squid-redirect.py.

Date: Nov. 22, 2018

Author: Mind Chasers

Comment:

You may have multiple squid-redirect.log files on your drive: one from running Squid and one from running the script directly in the shell. When you're running Squid, the squid-redirect.log file is most likely under /var. Please run the following from /var: # find . -name squid-redirect.log Do you see this log file under /var? If so, make sure your permissions are set properly to allow Squid to write it. Regarding output from squid-redirect.py, it is passed back to Squid, so you won't see it (unless you call the script directly yourself during testing). We're going to update this article to make it clearer and explain ways to test it. Thank you for your feedback.

Date: Nov. 24, 2018

Author: Mind Chasers

Comment:

Hi Aiden, Were you able to find data in your squid-redirect.log file? I was just testing this using Ubuntu 18.04, and it seems to work fine. I used '$ sudo apt install squid' and added an 'acl localnet src <subnet>' to the default squid.conf file in addition to the three rewrite lines shown above. After restarting squid, I was able to perform a "wget sex.xxx" and see the following in /var/spool/squid/squid-redirect.log: DEBUG:root:2018-11-24 00:56:40: 0 http://sex.xxx/ 127.0.0.1 GET - What OS are you using? We're going to update this page to show usage with both Yocto and Ubuntu 18.04.

Date: Nov. 24, 2018

Author: Aiden

Comment:

My squid-redirect.log is in the same folder as squid-redirect.py, and I want to see the output of the Python script. How can I do that?

Date: Nov. 25, 2018

Author: Aiden

Comment:

I ran cat squid-redirect.log and got this: DEBUG:root:request:

Date: Nov. 26, 2018

Author: Mind Chasers

Comment:

Hi Aiden, Your post indicates that you're not getting a request line in the log, but you previously wrote that the script is working, so this is confusing. Please check your email and reply with the files and information we're requesting. Thank you.
