URL blacklisting

This article describes a Squid + TPROXY setup running on a separate machine. As it turned out, this topic is barely covered on the Internet.

As part of my job, I had to filter the blacklist of sites published by ROSKOMNADZOR. During an audit we were threatened with sanctions if we did not comply. No sooner said than done: within a few minutes we had a crude implementation, not very friendly to our customers, that simply denied access to every blacklisted site. Filtering was done purely by IP address, so it naturally blocked access to every site hosted on a blacklisted IP.

It had to be redone, and redone properly. First, the original solution was ugly and clumsy; second, the task itself is very interesting.

In this article I will not cover how to set up automatic retrieval of the list. I will say only that I did not bother signing the request every time; I signed it once on a computer that had the required software and key installed.

Let's start with the machine with Squid3 installed.


Almost all settings were taken from the official Squid + TPROXY documentation.

ip -f inet rule add fwmark 1 lookup 100
ip -f inet route add local default dev eth0 table 100


cat /etc/sysctl.conf:
....
net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0

So that we can tell which traffic has already been checked and which has not, we create a separate VLAN for outgoing packets, VLAN 5 in our case. Recall that in TPROXY mode the source address does not change. The routing table on the Squid machine:
default via 192.168.70.2 dev eth0.5 
192.168.1.35/25 dev eth0  proto kernel  scope link  src 192.168.1.36 
192.168.70.0/30 dev eth0.5  proto kernel  scope link  src 192.168.70.1
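
The article does not show how VLAN 5 itself is created. A minimal sketch follows, assuming the interface names and addresses from the routing table above (eth0, eth0.5, 192.168.70.1/30, gateway 192.168.70.2). The `run` helper, the `setup_vlan5` function and the `DRY_RUN` variable are hypothetical names introduced for illustration; by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: create VLAN 5 on eth0 and send outgoing traffic through it.
# DRY_RUN defaults to 1, so the commands are printed instead of executed.
# Set DRY_RUN=0 and run as root to actually apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_vlan5() {
    # 802.1Q sub-interface with VLAN id 5 on top of eth0
    run ip link add link eth0 name eth0.5 type vlan id 5
    # Address taken from the routing table shown in the article
    run ip addr add 192.168.70.1/30 dev eth0.5
    run ip link set dev eth0.5 up
    # Already-checked traffic leaves through the VLAN 5 gateway
    run ip route replace default via 192.168.70.2 dev eth0.5
}

setup_vlan5
```

Printing first makes it easy to review the commands before touching a machine that is reachable only over the network.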

Iptables rules
iptables -t mangle -N DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT

To keep packets of already established connections from hitting the TPROXY rule
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT

The TPROXY rule itself
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 3129

Basic Squid settings.
http_port 3129 tproxy disable-pmtu-discovery=off
...
acl bad_urls url_regex "/etc/squid3/bad_hosts.list"
....
http_access deny bad_urls
http_access allow localnet
deny_info http://www.somehost.ru bad_urls
...

The deny_info line makes Squid redirect the user to the specified site whenever one of the banned sites is requested.

The rest of the squid3 settings are a matter of taste.
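
Because the bad_urls ACL uses url_regex, every line of /etc/squid3/bad_hosts.list is treated as a regular expression, so a raw URL containing `.` or `?` would match more than intended. The sketch below shows one possible way to turn a plain list of URLs into escaped, anchored patterns. The function name `urls_to_squid_regex` and the input file `urls.txt` are hypothetical; how the author actually builds the file is not described in the article.

```shell
#!/bin/sh
# Sketch only: read literal URLs (one per line) on stdin and print
# patterns suitable for a Squid url_regex ACL file.
# Assumption: the input contains plain URLs, not regular expressions.
urls_to_squid_regex() {
    # 1. drop blank lines (an empty pattern would match every URL)
    # 2. backslash-escape regex metacharacters
    # 3. anchor each pattern at the start of the URL
    sed -e '/^[[:space:]]*$/d' \
        -e 's/[][\.*^$+?(){}|]/\\&/g' \
        -e 's/^/^/'
}

# Possible usage (paths are assumptions):
#   urls_to_squid_regex < urls.txt > /etc/squid3/bad_hosts.list
```

After regenerating the file, Squid has to re-read its configuration, for example with `squid3 -k reconfigure`.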

Settings on the router


The following settings need to be applied on the router.
cat /etc/sysctl.conf:
....
net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0

Naturally, NAT is enabled for the private (gray) subnets.

We create the chain in which we will do the filtering
iptables -t mangle -N DIVERT

In this chain we mark all packets whose source or destination port is 80 (only while writing this article did I realize that the port matches are redundant here, but I left them in, in case they make it easier for someone to understand)
iptables -t mangle -A DIVERT ! -i eth1.5 -p tcp --dport 80 -j MARK --set-mark 1
iptables -t mangle -A DIVERT ! -i eth1.5 -p tcp --sport 80 -j MARK --set-mark 1

We send to this chain all packets whose source or destination port is 80, whose source or destination address is in the badip set, and which did not arrive from VLAN 5 (can someone tell me why the connection is not tracked in this case, so that the reverse direction has to be matched explicitly?)
iptables -t mangle -A PREROUTING ! -i eth1.5 -p tcp -m set --match-set badip dst -m tcp --dport 80 -j DIVERT
iptables -t mangle -A PREROUTING ! -i eth1.5 -p tcp -m set --match-set badip src -m tcp --sport 80 -j DIVERT

We create a rule that sends every packet carrying mark 1 to routing table 101
ip rule add fwmark 1 lookup 101

The routing table itself
ip route list table 101
default via 192.168.1.35 dev eth1.3
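
The listing above is the output of `ip route list`; the command that populates table 101 is not shown in the article. A minimal sketch of how the rule and the route could be set up together follows, assuming the gateway and interface from that listing. The `run` helper, the `setup_table_101` function and the `DRY_RUN` variable are hypothetical names; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: policy routing that sends marked packets to the Squid machine.
# DRY_RUN defaults to 1, so the commands are printed instead of executed.
# Set DRY_RUN=0 and run as root to actually apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_table_101() {
    # Packets with firewall mark 1 are looked up in table 101
    run ip rule add fwmark 1 lookup 101
    # Table 101 has a single default route via the address from the listing
    run ip route replace default via 192.168.1.35 dev eth1.3 table 101
}

setup_table_101
```

Rules and routes added this way do not survive a reboot, so a script of this kind would normally be called from the system's network startup hooks.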

And finally, the set of IP addresses of the bad sites
ipset create badip hash:ip
ipset flush badip
ipset add badip 111.111.111.111
ipset add badip 2.2.2.2
...
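
Flushing the set and adding the addresses one by one leaves a short window in which badip is empty. One possible alternative, sketched below, is to generate input for `ipset restore` that fills a temporary set and then swaps it in. The function name `gen_badip_restore`, the temporary set name `badip_tmp` and the input file `ips.txt` are hypothetical; this is not necessarily how the author loads the list.

```shell
#!/bin/sh
# Sketch only: read IPv4 addresses (one per line) on stdin and print
# commands for `ipset restore` that replace the contents of badip.
gen_badip_restore() {
    echo "create badip hash:ip -exist"
    echo "create badip_tmp hash:ip -exist"
    echo "flush badip_tmp"
    # Keep only lines that look like dotted-quad addresses, without duplicates
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' | sort -u | sed 's/^/add badip_tmp /'
    # Exchange the contents of the two sets, then drop the temporary one
    echo "swap badip_tmp badip"
    echo "destroy badip_tmp"
}

# Possible usage (as root; the file name is an assumption):
#   gen_badip_restore < ips.txt | ipset restore
```

Since the iptables rules refer to the set by name, the swap lets them keep matching against badip while its contents are replaced.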


The lists of blocked sites are loaded automatically on both machines but are used differently: the router matches by IP address (the badip set), while Squid matches by URL (the bad_urls ACL).

Packet flow diagram


image
Blue line: outgoing packet.
Green line: return packet.
