Basic Python Squid Redirector / Rewriter

A basic Python URL rewriter for Squid is developed and tested. It searches URLs for both keywords and regular expressions. The test configuration is also reviewed.


Overview

The Squid caching proxy is an excellent, long-established open source project with an active mailing list. Aside from its core proxy and cache functionality, Squid is also great for managing, filtering, and analyzing HTTP and HTTPS accesses. One example is using a content filter (a helper program) to either rewrite or redirect URLs. This article briefly reviews writing such a helper in Python and testing it.

Configure a Test Server and System

We set up both a test server and client for testing our prototype redirector. We make use of IPv4 local routing on a 192.168.3.0/24 network.

Our server is an embedded Linux development board running a Yocto-built distribution with Linux 4.1. It is positioned between our ISP firewall and our test client.

We are running Squid 3.5.20, built using the Yocto recipe meta-openembedded/meta-networking/recipes-daemons/squid/squid_3.5.20.bb and extended with our own custom configuration (bbappend):

# squid -v
Squid Cache: Version 3.5.20
Service Name: squid
configure options:  '--build=x86_64-linux' '--host=arm-poky-linux-gnueabi' '--target=arm-poky-linux-gnueabi' '--prefix=/usr' '--exec_prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' 
'--libexecdir=/usr/libexec' '--datadir=/usr/share' '--sysconfdir=/etc' '--sharedstatedir=/com' '--localstatedir=/var' '--libdir=/usr/lib' '--includedir=/usr/include' '--oldincludedir=/usr/include' 
'--infodir=/usr/share/info' '--mandir=/usr/share/man' '--disable-silent-rules' '--disable-dependency-tracking' 
'--with-default-user=squid' '--enable-auth-basic=DB SASL LDAP NIS PAM' '--sysconfdir=/etc/squid' '--enable-ssl' '--disable-inlined' '--disable-optimizations' '--enable-arp-acl' '--disable-wccp' 
'--disable-wccp2' '--disable-htcp' '--enable-delay-pools' '--enable-linux-netfilter' '--disable-translation' '--disable-auto-locale' '--with-logdir=/var/log/squid' '--with-pidfile=/var/run/squid.pid' 
'--disable-static' '--enable-ipv6' '--with-netfilter-conntrack=/usr/include' 'squid_cv_gnu_atomics=yes' 'build_alias=x86_64-linux' 'host_alias=arm-poky-linux-gnueabi' 'target_alias=arm-poky-linux-gnueabi' 
'CC=arm-poky-linux-gnueabi-gcc  -march=armv7ve -mfpu=neon  -mfloat-abi=hard -mcpu=cortex-a7 'CFLAGS=-O2 -pipe -g -feliminate-unused-debug-types 
'LDFLAGS=-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed' 'CPPFLAGS=' 'CXX=arm-poky-linux-gnueabi-g++  -march=armv7ve -mfpu=neon  -mfloat-abi=hard 
-mcpu=cortex-a7 'CXXFLAGS=-O2 -pipe -g -feliminate-unused-debug-types 
...

We make a few tweaks to the server before running Squid. For example, we make sure the Squid log directory /var/log/squid exists and is owned by the "squid" user. We also set up NAT using iptables:

# In our squid.conf, we have access_log daemon:/var/log/squid/access.log minimal
root@:~# mkdir -p /var/log/squid
root@:~# chown -R squid:squid /var/log/squid

# Our Ethernet port on our server is eth1.  Configure it for NAT using iptables:
# We're just setting up NAT for our preliminary network testing.  Squid doesn't need it.  
iptables --table nat -A POSTROUTING -s 192.168.3.0/24 -o eth1 -j MASQUERADE

Next, set up the test client, on which we'll test our redirector with both wget and a browser. The client is an Ubuntu 16.04 PC. For our testing, we temporarily modify its default route and its proxy setting. Note that our server's host address on our LAN is 192.168.3.214.

$ sudo ip route replace default via 192.168.3.214 dev enp2s0
$ export http_proxy=192.168.3.214:3128

It might be a good idea to test that you can ping through the test server at this point. For this, unset http_proxy and run a basic tcpdump on the server:

# on the client:
$ unset http_proxy
$ ping twitter.com

# on the server:
root@:~# tcpdump -i eth1 -vv
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
07:31:21.513263 IP (tos 0x0, ttl 63, id 48056, offset 0, flags [DF], proto UDP (17), length 57)
    192.168.0.100.35209 > 192.168.0.1.domain: [udp sum ok] 21143+ A? twitter.com. (29)
07:31:21.516468 IP (tos 0x0, ttl 64, id 29283, offset 0, flags [DF], proto UDP (17), length 72)
    192.168.0.100.51602 > 192.168.0.1.domain: [bad udp cksum 0x81fb -> 0xd6f0!] 7789+ PTR? 100.0.168.192.in-addr.arpa. (44)

Next, as a sanity check, we run Squid on the server without a rewriter enabled. For help configuring Squid, you might be interested in our write-up, which needs updating. Also, Kulbir Saini's "Squid Proxy Server 3.1" is still an excellent reference, even for the Squid 4 builds.

Now it's time to enable our rewriter and test it out. The relevant excerpt of our squid.conf file is shown below. If you're unfamiliar with these settings, then refer to the references provided below.

squid.conf
url_rewrite_extras "%>a %>rm %un"
url_rewrite_children 3 startup=0 idle=1 concurrency=10
url_rewrite_program /build/squid_redirect/squid-redirect.py
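
With concurrency enabled and the url_rewrite_extras format shown above, each line that Squid writes to the helper should consist of a channel ID, the URL, and then the three extras (client address, request method, and user name). The sketch below unpacks one such line into named fields. The HelperRequest and parse_request names, the client address, and the sample line are all made up for illustration; the line was not captured from a real Squid session.

```python
from collections import namedtuple

# Hypothetical container; field order follows url_rewrite_extras "%>a %>rm %un"
HelperRequest = namedtuple('HelperRequest', 'ch_id url ipaddr method user')

def parse_request(line):
    """Split one helper input line into named fields, or return None if malformed."""
    fields = line.split()
    if len(fields) != 5:
        return None
    return HelperRequest(*fields)

# Made-up sample line (assumption: an unauthenticated user is reported as "-")
sample = '0 http://example.com/ 192.168.3.100 GET -\n'
req = parse_request(sample)
print(req.ch_id, req.url, req.method)
```

Returning None for an unexpected field count is one way to keep a helper from dying on a line it does not understand; how to reply in that case is a separate design decision.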

While you're testing, keep in mind the following useful squid commands:

# squid -k reconfigure # re-read squid.conf and restart the helpers each time you tweak your redirector
# squid -k interrupt # bring squid to a stop
# squid -k check # is it running? 

Copy the Python script provided below to the location specified by the url_rewrite_program directive in your squid.conf file, and make sure that it is executable. Also, refer to the reference for this directive for an understanding of the communication between the Squid server and the redirector helper.

This particular redirector will perform:

  • a rewrite if "sex" appears anywhere in the URL (not a good idea in practice; for example, it would block http://oasisunisex.com/)
  • a redirect if the URL suffix is ".xxx".

Please keep in mind that this is just example / demo code to show how things work.
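
To illustrate why a plain substring test over-blocks, the sketch below compares it with one possible alternative that only matches the keyword as a whole token of the host name or path. The function names and the tokenizing rule are our own assumptions for illustration and are not part of the demo script.

```python
import re
from urllib.parse import urlsplit

def substring_match(url, word='sex'):
    """The demo's approach: flag the URL if the word appears anywhere in it."""
    return word in url

def token_match(url, word='sex'):
    """Alternative: flag only when the word is a whole token of the host or path."""
    parts = urlsplit(url)
    text = (parts.hostname or '').lower() + '/' + parts.path.lower()
    # split on anything that is not a letter or digit: dots, slashes, hyphens, ...
    tokens = re.split(r'[^a-z0-9]+', text)
    return word in tokens

print(substring_match('http://oasisunisex.com/'))  # True: the false positive
print(token_match('http://oasisunisex.com/'))      # False
print(token_match('http://example.com/sex/'))      # True
```

Token matching has its own blind spots (it misses the keyword when it is embedded in a longer label), so it is a trade-off rather than a fix.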

Python: squid-redirect.py
#!/usr/bin/env python3
""" 
Copyright 2017 Mind Chasers Inc,
file: squid-redirect.py
Demo code.  No warranty of any kind.  Use at your own risk

Todo:
    process expression list into db
    mem cache outside of db for hot handling: mru, mou
    hot reject list
    hot accept list
    what can be done in C?
"""

import re
import sys
import logging
from datetime import datetime

logging.basicConfig(filename='squid-redirect.log',level=logging.DEBUG)
# match URLs that end in ".xxx", with or without a trailing slash
xxx = re.compile(r'\.xxx/?$')

def main():
    """
        keep looping and processing requests
        request format is based on url_rewrite_extras "%>a %>rm %un"
    """
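    request  = sys.stdin.readline()
    while request:
        # with concurrency enabled, each line is: channel-ID URL client-IP method user
        [ch_id,url,ipaddr,method,user]=request.split()
        logging.debug(datetime.now().strftime('%Y-%m-%d %H:%M:%S') + ': ' + request.rstrip())
        # a bare "OK" tells Squid to leave the URL unchanged
        response  = ch_id + ' OK'
        if 'sex' in url:
            # rewrite: Squid fetches the replacement URL on the client's behalf
            response +=  ' rewrite-url=http://mindchasers.com/blocked'
        elif xxx.search(url):
            # redirect: Squid sends the client a 301 pointing at the new URL
            response +=  ' status=301 url=https://mindchasers.net/blocked'
        response += '\n'
        # flush after every reply so that Squid is not left waiting on a buffered line
        sys.stdout.write(response)
        sys.stdout.flush()
        request = sys.stdin.readline()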
    
if __name__ == '__main__':
    main()
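
Before wiring the script into Squid, the reply logic can be exercised offline by feeding it a few request lines from an in-memory stream. The sketch below mirrors the decisions made in the script above but reads from and writes to file-like objects. The serve function, the sample request lines, and the client address are hypothetical, and the pattern assumes the intent is "URL ends in .xxx, with an optional trailing slash".

```python
import io
import re

# Assumed pattern: URL ends in ".xxx", with or without a trailing slash
XXX = re.compile(r'\.xxx/?$')

def serve(infile, outfile):
    """Read helper requests from infile and write one reply per request to outfile."""
    for request in infile:
        fields = request.split()
        if len(fields) < 2:
            continue  # this sketch simply skips blank or malformed lines
        ch_id, url = fields[0], fields[1]
        response = ch_id + ' OK'
        if 'sex' in url:
            response += ' rewrite-url=http://mindchasers.com/blocked'
        elif XXX.search(url):
            response += ' status=301 url=https://mindchasers.net/blocked'
        outfile.write(response + '\n')

# Made-up request lines in the "channel-ID URL client-IP method user" layout
fake_stdin = io.StringIO(
    '0 http://example.com/ 192.168.3.100 GET -\n'
    '1 http://oasisunisex.com/ 192.168.3.100 GET -\n'
    '2 http://example.xxx/ 192.168.3.100 GET -\n'
)
fake_stdout = io.StringIO()
serve(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end='')
```

The three sample lines cover the pass-through, rewrite, and redirect cases. Because serve takes its streams as arguments, the same function could be called with sys.stdin and sys.stdout when run under Squid, provided the output is flushed after every reply.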

References
