NAME
pf.conf - packet filter configuration file
DESCRIPTION
The pf(4) packet filter modifies, drops, or passes packets according to rules or definitions specified in pf.conf.
This is an overview of the sections in this manual page:
- Packet Filtering
- Packet filtering, including network address translation (NAT).
- Options
- Global options tune the behaviour of the packet filtering engine.
- Queueing
- Queueing provides rule-based bandwidth control.
- Tables
- Tables provide a method for dealing with large numbers of addresses.
- Anchors
- Anchors are containers for rules and tables.
- Stateful Filtering
- Stateful filtering tracks packets by state.
- Traffic Normalisation
- Including scrub, fragment handling, and blocking spoofed traffic.
- Operating System Fingerprinting
- A method for detecting a host's operating system.
- Examples
- Some example rulesets.
The current line can be extended over multiple lines using a backslash (‘\’). Comments can be put anywhere in the file using a hash mark (‘#’), and extend to the end of the current line. Care should be taken when commenting out multi-line text: the comment is effective until the end of the entire block.
Argument names not beginning with a letter, digit, or underscore must be quoted.
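As a brief illustration of continuation lines and comments, the following sketch splits one rule across two lines. The egress interface group and the port list are placeholders chosen for this example only:

```
# A comment runs to the end of the line.
pass in on egress proto tcp \
    to port { 22, 80 }    # the backslash joins this line to the one above
```

Placing a ‘#’ in front of the first line would comment out both lines, since the continuation makes them one logical line.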
Additional configuration files can be included with the include keyword, for example:
include "/etc/pf/sub.filter.conf"
Macros can be defined that will later be expanded in context. Macro names must start with a letter, digit, or underscore, and may contain any of those characters. Macro names may not be reserved words (for example pass, in, out). Macros are not expanded inside quotes.
For example:
ext_if = "kue0"
all_ifs = "{" $ext_if lo0 "}"
pass out on $ext_if from any to any
pass in on $ext_if proto tcp from any to any port 25
PACKET FILTERING
pf(4) has the ability to block, pass, and match packets based on attributes of their layer 3 and layer 4 headers. Filter rules determine which of these actions are taken; filter parameters specify the packets to which a rule applies.
For each packet processed by the packet filter, the filter rules are evaluated in sequential order, from first to last. For block and pass, the last matching rule decides what action is taken; if no rule matches the packet, the default action is to pass the packet without creating a state. For match, rules are evaluated every time they match; the pass/block state of a packet remains unchanged.
Most parameters are optional. If a parameter is specified, the rule only applies to packets with matching attributes. Certain parameters can be expressed as lists, in which case pfctl(8) generates all needed rule combinations.
By default pf(4) filters packets statefully: the first time a packet matches a pass rule, a state entry is created. The packet filter examines each packet to see if it matches an existing state. If it does, the packet is passed without evaluation of any rules. After the connection is closed or times out, the state entry is automatically removed.
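The last-match behaviour described above can be sketched with two rules. This is only an illustration; the choice of port 22 is an assumption made for the example:

```
block in all                    # every inbound packet matches this rule
pass in proto tcp to port 22    # inbound TCP to port 22 also matches this later rule
```

An inbound TCP packet to port 22 matches both rules; the pass rule is the last match, so the packet is passed and a state entry is created. Any other inbound packet matches only the block rule and is blocked.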
The following actions can be used in the filter:
- block
- The packet is blocked. There are a number of ways in which a
block rule can behave when blocking a packet. The
default behaviour is to drop packets silently,
however this can be overridden or made explicit either globally, by
setting the block-policy option, or on a per-rule
basis with one of the following options:
- drop
- The packet is silently dropped.
- return
- This causes a TCP RST to be returned for TCP packets and an ICMP UNREACHABLE for other types of packets.
- return-icmp
- return-icmp6
- This causes ICMP messages to be returned for packets which match the rule. By default this is an ICMP UNREACHABLE message, however this can be overridden by specifying a message as a code or number.
- return-rst
- This applies only to TCP packets, and issues a TCP RST which closes the connection. An optional parameter, ttl, may be given with a TTL value.
Options returning ICMP packets currently have no effect if pf(4) operates on a bridge(4), as the code to support this feature has not yet been implemented.
The simplest mechanism to block everything by default and only pass packets that match explicit rules is to specify a first filter rule of:
block all
- match
- The packet is matched. This mechanism is used to provide fine grained
filtering without altering the block/pass state of a packet.
match rules differ from block and pass rules in that
parameters are set every time a packet matches the rule, not only on the
last matching rule. For the following parameters, this means that the
parameter effectively becomes “sticky” until explicitly
overridden: nat-to, binat-to,
rdr-to, queue,
rtable, and scrub.
log is different still, in that the action happens every time a rule matches i.e. a single packet can get logged more than once.
- pass
- The packet is passed; state is created unless the no state option is specified.
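The per-rule block behaviours listed above might be combined as in the following sketch. The port numbers are arbitrary examples, not recommendations:

```
block drop in all                                   # default: drop silently
block return-rst in proto tcp to port 113           # answer with a TCP RST
block return-icmp in proto udp to port 33434:33534  # answer with an ICMP UNREACHABLE
```

Because the last matching rule decides the action, the more specific rules are placed after the general block drop rule.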
The following parameters can be used in the filter:
- in or out
- A packet always comes in on, or goes out through, one interface. in and out apply to incoming and outgoing packets; if neither is specified, the rule will match packets in both directions.
- log
- In addition to the action specified, a log message is generated. Only the packet that establishes the state is logged, unless the no state option is specified. The logged packets are sent to a pflog(4) interface, by default pflog0. This interface is monitored by the pflogd(8) logging daemon, which dumps the logged packets to the file /var/log/pflog in pcap(3) binary format.
- log (all)
- Used to force logging of all packets for a connection. This is not necessary when no state is explicitly specified. As with log, packets are logged to pflog(4).
- log (matches)
- Used to force logging of this packet on all subsequent matching rules.
- log (user)
- Logs the UID and PID of the socket on the local host used to send or receive a packet, in addition to the normal information.
- log (to ⟨interface⟩)
- Send logs to the specified pflog(4) interface instead of pflog0.
- quick
- If a packet matches a rule which has the quick option set, this rule is considered the last matching rule, and evaluation of subsequent rules is skipped.
- on ⟨interface⟩
- This rule applies only to packets coming in on, or going out through, this particular interface or interface group. For more information on interface groups, see the group keyword in ifconfig(8). any will match any existing interface except loopback ones.
- on rdomain ⟨number⟩
- This rule applies only to packets coming in on, or going out through, this particular routing domain.
- ⟨af⟩
- This rule applies only to packets of this address family. Supported values are inet and inet6.
- proto ⟨protocol⟩
- This rule applies only to packets of this protocol. Common protocols are ICMP, ICMP6, TCP, and UDP. For a list of all the protocol name to number mappings used by pfctl(8), see the file /etc/protocols.
- from ⟨source⟩ port ⟨source⟩ os ⟨source⟩ to ⟨dest⟩ port ⟨dest⟩
- This rule applies only to packets with the specified source and
destination addresses and ports.
Addresses can be specified in CIDR notation (matching netblocks), as symbolic host names, interface names or interface group names, or as any of the following keywords:
- any
- Any address.
- no-route
- Any address which is not currently routable.
- route ⟨label⟩
- Any address matching the given route(8) label.
- self
- Expands to all addresses assigned to all interfaces.
- ⟨table⟩
- Any address matching the given table.
- urpf-failed
- Any source address that fails a unicast reverse path forwarding (URPF) check, i.e. packets coming in on an interface other than that which holds the route back to the packet's source address.
Ranges of addresses are specified using the ‘-’ operator. For instance: “10.1.1.10 - 10.1.1.12” means all addresses from 10.1.1.10 to 10.1.1.12, hence addresses 10.1.1.10, 10.1.1.11, and 10.1.1.12.
Interface names, interface group names, and self can have modifiers appended:
- :0
- Do not include interface aliases.
- :broadcast
- Translates to the interface's broadcast address(es).
- :network
- Translates to the network(s) attached to the interface.
- :peer
- Translates to the point-to-point interface's peer address(es).
Host names may also have the :0 option appended to restrict the name resolution to the first of each v4 and v6 address found.
Host name resolution and interface to address translation are done at ruleset load-time. When the address of an interface (or host name) changes (under DHCP or PPP, for instance), the ruleset must be reloaded for the change to be reflected in the kernel. Surrounding the interface name (and optional modifiers) in parentheses changes this behaviour. When the interface name is surrounded by parentheses, the rule is automatically updated whenever the interface changes its address. The ruleset does not need to be reloaded. This is especially useful with nat.
Ports can be specified either by number or by name. For example, port 80 can be specified as www. For a list of all port name to number mappings used by pfctl(8), see the file /etc/services.
Ports and ranges of ports are specified using these operators:
=   (equal)
!=  (unequal)
<   (less than)
<=  (less than or equal)
>   (greater than)
>=  (greater than or equal)
:   (range including boundaries)
><  (range excluding boundaries)
<>  (except range)
‘><’, ‘<>’ and ‘:’ are binary operators (they take two arguments). For instance:
- port 2000:2004
- means ‘all ports ≥ 2000 and ≤ 2004’, hence ports 2000, 2001, 2002, 2003, and 2004.
- port 2000 >< 2004
- means ‘all ports > 2000 and < 2004’, hence ports 2001, 2002, and 2003.
- port 2000 <> 2004
- means ‘all ports < 2000 or > 2004’, hence ports 1–1999 and 2005–65535.
The operating system of the source host can be specified in the case of TCP rules with the os modifier. See the OPERATING SYSTEM FINGERPRINTING section for more information.
The host, port, and OS specifications are optional, as in the following examples:
pass in all
pass in from any to any
pass in proto tcp from any port < 1024 to any
pass in proto tcp from any to any port 25
pass in proto tcp from 10.0.0.0/8 port >= 1024 \
    to ! 10.1.2.3 port != ssh
pass in proto tcp from any os "OpenBSD"
pass in proto tcp from route "DTAG"
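A further sketch combines several of the parameters described above: log, quick, address keywords, a parenthesised interface group, and address and port ranges. The pflog1 interface is hypothetical and would have to exist on the system; the ports are arbitrary:

```
# Drop and log packets whose source is unroutable or fails the URPF check.
block in log quick on egress from no-route
block in log quick on egress from urpf-failed

# (egress) in parentheses tracks address changes without a ruleset reload.
pass in on egress inet proto tcp to (egress) port { 80, 443 }

# Address range and inclusive port range.
pass in proto tcp from 10.1.1.10 - 10.1.1.12 to any port 2000:2004

# Log every packet of the connection to a second pflog(4) interface.
pass in log (all, to pflog1) on egress proto tcp to (egress) port 22
```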
The following additional parameters can be used in the filter:
- all
- This is equivalent to "from any to any".
- allow-opts
- By default, IPv4 packets with IP options or IPv6 packets with routing extension headers are blocked. When allow-opts is specified for a pass rule, packets that pass the filter based on that rule (last matching) do so even if they contain IP options or routing extension headers. For packets that match state, the rule that initially created the state is used. The implicit pass rule that is used when a packet does not match any rules does not allow IP options.
- divert-packet port ⟨port⟩
- Used to send matching packets to divert(4) sockets bound to port port. If the default option of fragment reassembly is enabled, scrubbing with reassemble tcp is also enabled for divert-packet rules.
- divert-reply
- Used to receive replies for sockets that are bound to addresses which are not local to the machine. See setsockopt(2) for information on how to bind these sockets.
- divert-to ⟨host⟩ port ⟨port⟩
- Used to redirect packets to a local socket bound to host and port. The packets will not be modified, so getsockname(2) on the socket will return the original destination address of the packet.
- flags ⟨a⟩ /⟨b⟩ | any
- This rule only applies to TCP packets that have the flags
⟨a⟩ set out of set
⟨b⟩. Flags not specified in
⟨b⟩ are ignored. For stateful
connections, the default is flags S/SA. To indicate
that flags should not be checked at all, specify flags
any. The flags are: (F)IN, (S)YN, (R)ST, (P)USH, (A)CK, (U)RG,
(E)CE, and C(W)R.
- flags S/S
- Flag SYN is set. The other flags are ignored.
- flags S/SA
- This is the default setting for stateful connections. Out of SYN and ACK, exactly SYN may be set. SYN, SYN+PSH, and SYN+RST match, but SYN+ACK, ACK, and ACK+RST do not. This is more restrictive than the previous example.
- flags /SFRA
- If the first set is not specified, it defaults to none. All of SYN, FIN, RST, and ACK must be unset.
Because flags S/SA is applied by default (unless no state is specified), only the initial SYN packet of a TCP handshake will create a state for a TCP connection. It is possible to be less restrictive, and allow state creation from intermediate (non-SYN) packets, by specifying flags any. This will cause pf(4) to synchronize to existing connections, for instance if one flushes the state table. However, states created from such intermediate packets may be missing connection details such as the TCP window scaling factor. States which modify the packet flow, such as those affected by af-to, modulate, nat-to, rdr-to, or synproxy state options, or scrubbed with reassemble tcp, will also not be recoverable from intermediate packets. Such connections will stall and time out.
- group ⟨group⟩
- Similar to user, this rule only applies to packets of sockets owned by the specified group.
- icmp-type ⟨type⟩ code ⟨code⟩
- icmp6-type ⟨type⟩ code ⟨code⟩
- This rule only applies to ICMP or ICMP6 packets with the specified type and code. Text names for ICMP types and codes are listed in icmp(4) and icmp6(4). The protocol and the ICMP type indicator (icmp-type or icmp6-type) must match.
- label ⟨string⟩
- Adds a label to the rule, which can be used to identify the rule. For
instance, “pfctl -s labels” shows per-rule statistics for
rules that have labels.
The following macros can be used in labels:
- $dstaddr
- The destination IP address.
- $dstport
- The destination port specification.
- $if
- The interface.
- $nr
- The rule number.
- $proto
- The protocol name.
- $srcaddr
- The source IP address.
- $srcport
- The source port specification.
For example:
ips = "{ 1.2.3.4, 1.2.3.5 }"
pass in proto tcp from any to $ips \
    port > 1023 label "$dstaddr:$dstport"
Expands to:
pass in inet proto tcp from any to 1.2.3.4 \
    port > 1023 label "1.2.3.4:>1023"
pass in inet proto tcp from any to 1.2.3.5 \
    port > 1023 label "1.2.3.5:>1023"
The macro expansion for the label directive occurs only at configuration file parse time, not during runtime.
- once
- Creates a one shot rule that will remove itself from an active ruleset after the first match. In case this is the only rule in the anchor, the anchor will be destroyed automatically after the rule is matched.
- probability ⟨number⟩
- A probability attribute can be attached to a rule, with a value set
between 0 and 100%, in which case the rule is honoured using the given
probability value. For example, the following rule will drop 20% of
incoming ICMP packets:
block in proto icmp probability 20%
- received-on ⟨interface⟩
- Only match packets which were received on the specified interface (or interface group). any will match any existing interface except loopback ones.
- rtable ⟨number⟩
- Used to select an alternate routing table for the routing lookup. Only effective before the route lookup happened, i.e. when filtering inbound.
- set prio ⟨priority⟩ | (⟨priority⟩, ⟨priority⟩)
- Packets matching this rule will be assigned a specific queueing priority.
Priorities are assigned as integers 0 through 7, with a default priority
of 3. If the packet is transmitted on a
vlan(4) interface, the queueing priority will also be written as
the priority code point in the 802.1Q VLAN header. If two priorities are
given, packets which have a TOS of lowdelay and TCP
ACKs with no data payload will be assigned to the second one. Packets with
a higher priority number are processed first, and packets with the same
priority are processed in the order in which they are received.
For example:
pass in proto tcp to port 25 set prio 2
pass in proto tcp to port 22 set prio (2, 5)
The interface priority queues accessed by the set prio keyword are always enabled and do not require any additional configuration, unlike the queues described below and in the QUEUEING section.
- set queue ⟨queue⟩ | (⟨queue⟩, ⟨queue⟩)
- Packets matching this rule will be assigned to the specified queue. If two
queues are given, packets which have a TOS of
lowdelay and TCP ACKs with no data payload will be
assigned to the second one. See
QUEUEING for setup details.
For example:
pass in proto tcp to port 25 set queue mail
pass in proto tcp to port 22 set queue(ssh_bulk, ssh_prio)
- set tos ⟨string⟩ | ⟨number⟩
- Enforces a TOS for matching packets. string may be one of critical, inetcontrol, lowdelay, netcontrol, throughput, reliability, or one of the DiffServ Code Points: ef, af11 ... af43, cs0 ... cs7; number may be either a hex or decimal number.
- tag ⟨string⟩
- Packets matching this rule will be tagged with the specified string. The tag acts as an internal marker that can be used to identify these packets later on. This can be used, for example, to provide trust between interfaces and to determine if packets have been processed by translation rules. Tags are "sticky", meaning that the packet will be tagged even if the rule is not the last matching rule. Further matching rules can replace the tag with a new one but will not remove a previously applied tag. A packet is only ever assigned one tag at a time. Tags take the same macros as labels (see above).
- tagged ⟨string⟩
- Used with filter or translation rules to specify that packets must already be tagged with the given tag in order to match the rule. Inverse tag matching can also be done by specifying the ! operator before the tagged keyword.
- tos ⟨string⟩ | ⟨number⟩
- This rule applies to packets with the specified TOS bits set.
string may be one of critical,
inetcontrol, lowdelay,
netcontrol, throughput,
reliability, or one of the DiffServ Code Points:
ef, af11 ... af43,
cs0 ... cs7; number may be
either a hex or decimal number.
For example, the following rules are identical:
pass all tos lowdelay
pass all tos 0x10
pass all tos 16
- user ⟨user⟩
- This rule only applies to packets of sockets owned by the specified user.
For outgoing connections initiated from the firewall, this is the user
that opened the connection. For incoming connections to the firewall
itself, this is the user that listens on the destination port.
When listening sockets are bound to the wildcard address, pf(4) cannot determine if a connection is destined for the firewall itself. To avoid false matches on just the destination port, combine a user rule with source or destination address self.
All packets, both outgoing and incoming, of one connection are associated with the same user and group. Only TCP and UDP packets can be associated with users.
User and group refer to the effective (as opposed to the real) IDs, in case the socket is created by a setuid/setgid process. User and group IDs are stored when a socket is created; when a process creates a listening socket as root (for instance, by binding to a privileged port) and subsequently changes to another user ID (to drop privileges), the credentials will remain root.
User and group IDs can be specified as either numbers or names. The syntax is similar to the one for ports. The following example allows only selected users to open outgoing connections:
block out proto tcp all
pass out proto tcp from self user { < 1000, dhartmei }
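The tag, tagged, label, and icmp-type parameters described above could be used together roughly as follows. This is a sketch only: em1 is a hypothetical internal interface, and FROM_LAN and "lan-out" are made-up names:

```
match in on em1 tag FROM_LAN                        # mark packets arriving from the inside
block out on egress
pass out on egress tagged FROM_LAN label "lan-out"  # only tagged packets may leave
pass in inet proto icmp icmp-type echoreq
```

Because tags are sticky, the tag applied by the match rule is still present when the packet is later evaluated on its way out. Traffic originating on the firewall itself carries no tag and would be blocked by this ruleset.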
Translation
Translation options modify either the source or destination address and port of the packets associated with a stateful connection. pf(4) modifies the specified address and/or port in the packet and recalculates IP, TCP, and UDP checksums as necessary.
Subsequent rules will see packets as they look after any addresses and ports have been translated. These rules will therefore have to filter based on the translated address and port number.
The state entry created permits pf(4) to keep track of the original address for traffic associated with that state and correctly direct return traffic for that connection.
Different types of translation are possible with pf:
- af-to
- Translation between different address families (NAT64) is handled using
af-to rules. Because address family translation
overrides the routing table, it's only possible to use
af-to on inbound rules, and a source address for the
resulting translation must always be specified.
The optional second argument is the host or subnet the original addresses are translated into for the destination. The lowest bits of the original destination address form the host part of the new destination address according to the specified subnet. It is possible to embed a complete IPv4 address into an IPv6 address using a network prefix of /96 or smaller.
When a destination address is not specified it is assumed that the host part is 32-bit long. For IPv6 to IPv4 translation this would mean using only the lower 32 bits of the original IPv6 destination address. For IPv4 to IPv6 translation the destination subnet defaults to the subnet of the new IPv6 source address with a prefix length of /96. See RFC 6052 Section 2.2 for details on how the prefix determines the destination address encoding.
For example, the following rules are identical:
pass in inet af-to inet6 from 2001:db8::1 to 2001:db8::/96
pass in inet af-to inet6 from 2001:db8::1
In the above example the matching IPv4 packets will be modified to have a source address of 2001:db8::1, and their destination address will be prefixed with 2001:db8::/96, e.g. 198.51.100.100 will be translated to 2001:db8::c633:6464.
In the reverse case the following rules are identical:
pass in inet6 af-to inet from 198.51.100.1 to 0.0.0.0/0
pass in inet6 af-to inet from 198.51.100.1
The destination IPv4 address is assumed to be embedded inside the original IPv6 destination address, e.g. 64:ff9b::c633:6464 will be translated to 198.51.100.100.
The current implementation will only extract IPv4 addresses from the IPv6 addresses with a prefix length of /96 and greater.
- binat-to
- A binat-to rule specifies a bidirectional mapping between an external IP netblock and an internal IP netblock. It expands to an outbound nat-to rule and an inbound rdr-to rule.
- nat-to
- A nat-to option specifies that IP addresses are to
be changed as the packet traverses the given interface. This technique
allows one or more IP addresses on the translating host to support network
traffic for a larger range of machines on an "inside" network.
Although in theory any IP address can be used on the inside, it is
strongly recommended that one of the address ranges defined by RFC 1918 be
used. Those netblocks are:
10.0.0.0 – 10.255.255.255 (all of net 10, i.e. 10/8)
172.16.0.0 – 172.31.255.255 (i.e. 172.16/12)
192.168.0.0 – 192.168.255.255 (i.e. 192.168/16)
nat-to is usually applied outbound. If applied inbound, nat-to to a local IP address is not supported.
- rdr-to
- The packet is redirected to another destination and possibly a different
port. rdr-to can optionally specify port ranges
instead of single ports. For instance:
- match in ... port 2000:2999 rdr-to ... port 4000
- redirects ports 2000 to 2999 (inclusive) to port 4000.
- match in ... port 2000:2999 rdr-to ... port 4000:*
- redirects port 2000 to 4000, port 2001 to 4001, ..., port 2999 to 4999.
rdr-to is usually applied inbound. If applied outbound, rdr-to to a local IP address is not supported.
In addition to modifying the address, some translation rules may modify source or destination ports for TCP or UDP connections; implicitly in the case of nat-to options and explicitly in the case of rdr-to ones. Port numbers are never translated with a binat-to rule.
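The three address-rewriting options could look roughly like this in a ruleset. The addresses are illustrative values from private and documentation ranges and are assumptions of this sketch:

```
# Source NAT for an internal network leaving via the egress group.
match out on egress inet from 10.0.0.0/8 to any nat-to (egress)

# Redirect inbound web traffic to an internal server.
pass in on egress inet proto tcp to (egress) port 80 rdr-to 10.0.0.10

# One-to-one bidirectional mapping between an internal and an external address.
pass on egress from 10.0.0.20 to any binat-to 203.0.113.20
```

The binat-to rule carries no direction, since it expands to both an outbound nat-to rule and an inbound rdr-to rule.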
Translation options apply only to packets that pass through the specified interface, and if no interface is specified, translation is applied to packets on all interfaces. For instance, redirecting port 80 on an external interface to an internal web server will only work for connections originating from the outside. Connections to the address of the external interface from local hosts will not be redirected, since such packets do not actually pass through the external interface. Redirections cannot reflect packets back through the interface they arrive on; they can only be redirected to hosts connected to different interfaces or to the firewall itself.
However, packets may be redirected to hosts connected to the interface the packet arrived on by using redirection with NAT. For example:
pass in on $int_if proto tcp from $int_net to $ext_if port 80 \
    rdr-to $server
pass out on $int_if proto tcp to $server port 80 \
    received-on $int_if nat-to $int_if
Note that redirecting external incoming connections to the loopback address will effectively allow an external host to connect to daemons bound solely to the loopback address, circumventing the traditional blocking of such connections on a real interface. For example:
pass in on egress proto tcp from any to any port smtp \
    rdr-to 127.0.0.1 port spamd
Unless this effect is desired, any of the local non-loopback addresses should be used instead as the redirection target, which allows external connections only to daemons bound to this address or not bound to any address.
For af-to, nat-to and rdr-to options for which there is a single redirection address which has a subnet mask smaller than 32 for IPv4 or 128 for IPv6 (more than one IP address), a variety of different methods for assigning this address can be used:
- bitmask
- The bitmask option applies the network portion of the redirection address to the address to be modified (source with nat-to, destination with rdr-to).
- least-states [sticky-address]
- The least-states option selects the address with the
least active states from a given address pool and considers given weights
associated with address(es). Weights can be specified between 1 and 65535.
Addresses with higher weights are selected more often.
sticky-address can be specified to ensure that multiple connections from the same source are mapped to the same redirection address. Associations are destroyed as soon as there are no longer states which refer to them; in order to make the mappings last beyond the lifetime of the states, increase the global src.track timeout with set timeout src.track.
- random [sticky-address]
- The random option selects an address at random within the defined block of addresses. sticky-address is as described above.
- round-robin [sticky-address]
- The round-robin option loops through the redirection address(es) and considers given weights associated with address(es). Weights can be specified between 1 and 65535. Addresses with higher weights are selected more often. sticky-address is as described above.
- source-hash [key]
- The source-hash option uses a hash of the source address to determine the redirection address, ensuring that the redirection address is always the same for a given source. An optional key can be specified after this keyword either in hex or as a string; by default pfctl(8) randomly generates a key for source-hash every time the ruleset is reloaded.
- static-port
- With nat rules, the static-port option prevents pf(4) from modifying the source port on TCP and UDP packets.
When more than one redirection address or a table is specified, round-robin and least-states are the only permitted pool types.
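The pool options above might be applied as in the following sketch. The server addresses and the 203.0.113.0/28 netblock are illustrative assumptions:

```
# Spread inbound web connections over two servers; keep each client on one server.
pass in on egress proto tcp to port 80 \
    rdr-to { 10.0.0.10, 10.0.0.11 } round-robin sticky-address

# Pick the translation address from a netblock by hashing the source address.
match out on egress inet from 10.0.0.0/8 nat-to 203.0.113.0/28 source-hash
```

The first rule lists more than one redirection address, so only round-robin or least-states may be used there; the second uses a single address with a netmask, which permits the other pool types as well.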
Routing
If a packet matches a rule with one of the following route options set, the packet filter will route the packet according to the type of route option. When such a rule creates state, the route option is also applied to all packets matching the same connection.
- dup-to
- The dup-to option creates a duplicate of the packet and routes it like route-to. The original packet gets routed as it normally would.
- reply-to
- The reply-to option is similar to route-to, but routes packets that pass in the opposite direction (replies) to the specified interface. Opposite direction is only defined in the context of a state entry, and reply-to is useful only in rules that create state. It can be used on systems with multiple external connections to route all outgoing packets of a connection through the interface the incoming connection arrived through (symmetric routing enforcement).
- route-to
- The route-to option routes the packet to the specified interface with an optional address for the next hop. When a route-to rule creates state, only packets that pass in the same direction as the filter rule specifies will be routed in this way. Packets passing in the opposite direction (replies) are not affected and are routed normally.
For the dup-to, reply-to, and route-to route options for which there is a single redirection address which has a subnet mask smaller than 32 for IPv4 or 128 for IPv6 (more than one IP address), the methods least-states, random, round-robin, and source-hash, as described above, can be used.
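A minimal sketch of route-to and reply-to follows. It assumes the parenthesised interface and next-hop form, which matches the description above; the argument syntax of these options has differed between pf versions and should be checked against the installed one. The interface names em0 and em1 and the 192.0.2.1 gateway are hypothetical:

```
# Send traffic from one internal network out through em1 via a specific next hop.
pass in on em0 from 10.0.1.0/24 route-to (em1 192.0.2.1)

# Make replies to connections that arrived on em1 leave through em1 again.
pass in on em1 proto tcp to port 80 reply-to (em1 192.0.2.1)
```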
OPTIONS
pf(4) may be tuned for various situations using the set command.
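As a rough illustration, an options block near the top of a ruleset might look like the following. Each option is described in the list that follows; the values here are examples only, not recommendations:

```
set block-policy return
set optimization normal
set loginterface egress
set limit { states 20000, frags 2000 }
set reassemble yes no-df
set skip on lo
```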
- set block-policy
- The block-policy option sets the default behaviour
for the packet block action:
- drop
- Packet is silently dropped.
- return
- A TCP RST is returned for blocked TCP packets, an ICMP UNREACHABLE is returned for blocked UDP packets, and all other packets are silently dropped.
- set debug
- Set the debug level, which limits the severity of log messages printed by pf(4). This should be a keyword from the following ordered list (highest to lowest): emerg, alert, crit, err, warning, notice, info, and debug. These keywords correspond to the similar (LOG_) values specified to the syslog(3) library routine.
- set fingerprints
- Load fingerprints of known operating systems from the given filename. By default fingerprints of known operating systems are automatically loaded from pf.os(5), but can be overridden via this option. Setting this option may leave a small period of time where the fingerprints referenced by the currently active ruleset are inconsistent until the new ruleset finishes loading.
- set hostid
- The 32-bit hostid identifies this firewall's state table entries to other firewalls in a pfsync(4) failover cluster. By default the hostid is set to a pseudo-random value, however it may be desirable to manually configure it, for example to more easily identify the source of state table entries. The hostid may be specified in either decimal or hexadecimal.
- set limit
- Sets hard limits on the memory pools used by the packet filter. See
pool(9) for an explanation of memory pools.
For example, to set the maximum number of entries in the memory pool used by state table entries (generated by pass rules which do not specify no state) to 20000:
set limit states 20000
To set the maximum number of entries in the memory pool used for fragment reassembly to 2000:
set limit frags 2000
This maximum may not exceed, and should be well below, the maximum number of mbuf clusters (sysctl kern.maxclusters) in the system.
To set the maximum number of entries in the memory pool used for tracking source IP addresses (generated by the sticky-address and src.track options) to 2000:
set limit src-nodes 2000
To set limits on the memory pools used by tables:
set limit tables 1000
set limit table-entries 100000
The first limits the number of tables that can exist to 1000. The second limits the overall number of addresses that can be stored in tables to 100000.
Various limits can be combined on a single line:
set limit { states 20000, frags 2000, src-nodes 2000 }
- set loginterface
- Enable collection of packet and byte count statistics for the given
interface or interface group. These statistics can be viewed using:
# pfctl -s info
In this example pf(4) collects statistics on the interface named dc0:
set loginterface dc0
One can disable the loginterface using:
set loginterface none
- set optimization
- Optimize state timeouts for one of the following network environments:
- aggressive
- Aggressively expire connections. This can greatly reduce the memory usage of the firewall at the cost of dropping idle connections early.
- conservative
- Extremely conservative settings. Avoid dropping legitimate connections at the expense of greater memory utilization (possibly much greater on a busy network) and slightly increased processor utilization.
- high-latency
- A high-latency environment (such as a satellite connection).
- normal
- A normal network environment. Suitable for almost all networks.
- satellite
- Alias for high-latency.
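For instance, a firewall that is short on state table memory might select the aggressive profile. This is only a sketch; whether early expiry of idle connections is acceptable depends on the site:
set optimization aggressive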
- set reassemble
- The reassemble option is used to enable or disable the reassembly of fragmented packets, and can be set to yes (the default) or no. If no-df is also specified, fragments with the dont-fragment bit set are reassembled too, instead of being dropped; the reassembled packet will have the dont-fragment bit cleared.
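A sketch of the setting described above, assuming fragments with the dont-fragment bit set should be reassembled rather than dropped:
# reassemble fragments, including those with dont-fragment set
set reassemble yes no-df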
- set ruleset-optimization
-
- basic
- Enable basic ruleset optimization. This is the default behaviour.
Basic ruleset optimization does four things to improve the performance
of ruleset evaluations:
- remove duplicate rules
- remove rules that are a subset of another rule
- combine multiple rules into a table when advantageous
- re-order the rules to improve evaluation performance
- none
- Disable the ruleset optimizer.
- profile
- Uses the currently loaded ruleset as a feedback profile to tailor the ordering of quick rules to actual network traffic.
It is important to note that the ruleset optimizer will modify the ruleset to improve performance. A side effect of the ruleset modification is that per-rule accounting statistics will have different meanings than before. If per-rule accounting is important, for example for billing purposes, either the ruleset optimizer should not be used or a label field should be added to all of the accounting rules to act as optimization barriers.
Optimization can also be set as a command-line argument to pfctl(8), overriding the settings in pf.conf.
- set skip on ⟨ifspec⟩
- List interfaces for which packets should not be filtered. Packets passing in or out on such interfaces are passed as if pf was disabled, i.e. pf does not process them in any way. This can be useful on loopback and other virtual interfaces, when packet filtering is not desired and can have unexpected effects. ifspec is only evaluated when the ruleset is loaded; interfaces created later will not be skipped.
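A common use, shown here as a sketch that assumes the loopback interface is named lo0:
set skip on lo0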
- set state-defaults
- The state-defaults option sets the state options for states created from rules without an explicit keep state. For example:
set state-defaults pflow, no-sync
- set state-policy
- The state-policy option sets the default behaviour
for states:
- if-bound
- States are bound to an interface.
- floating
- States can match packets on any interfaces (the default).
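For example, to bind states to the interface on which they were created, for all rules that do not override the policy (an illustrative choice, not a recommendation):
set state-policy if-bound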
- set timeout
-
- frag
- Seconds before an unassembled fragment is expired.
- interval
- Interval between purging expired states and fragments.
- src.track
- Length of time to retain a source tracking entry after the last state expires.
When a packet matches a stateful connection, the seconds to live for the connection will be updated to that of the protocol and modifier which corresponds to the connection state. Each packet which matches this state will reset the TTL. Tuning these values may improve the performance of the firewall at the risk of dropping valid idle connections.
- tcp.closed
- The state after one endpoint sends an RST.
- tcp.closing
- The state after the first FIN has been sent.
- tcp.established
- The fully established state.
- tcp.finwait
- The state after both FINs have been exchanged and the connection is closed. Some hosts (notably web servers on Solaris) send TCP packets even after closing the connection. Increasing tcp.finwait (and possibly tcp.closing) can prevent blocking of such packets.
- tcp.first
- The state after the first packet.
- tcp.opening
- The state after the second packet but before both endpoints have acknowledged the connection.
ICMP and UDP are handled in a fashion similar to TCP, but with a much more limited set of states:
- icmp.error
- The state after an ICMP error came back in response to an ICMP packet.
- icmp.first
- The state after the first packet.
- udp.first
- The state after the first packet.
- udp.multiple
- The state if both hosts have sent packets.
- udp.single
- The state if the source host sends more than one packet but the destination host has never sent one back.
Other protocols are handled similarly to UDP:
- other.first
- other.multiple
- other.single
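As a sketch, the UDP and generic protocol timeouts can be set together using the same brace syntax as other timeouts. The values below are illustrative assumptions, not tuned recommendations:
# illustrative values, in seconds
set timeout { udp.first 60, udp.single 30, udp.multiple 60 }
set timeout { other.first 60, other.single 30, other.multiple 60 }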
Timeout values can be reduced adaptively as the number of state table entries grows.
- adaptive.end
- When reaching this number of state entries, all timeout values become zero, effectively purging all state entries immediately. This value is used to define the scale factor; it should not actually be reached (set a lower state limit, see below).
- adaptive.start
- When the number of state entries exceeds this value, adaptive scaling begins. All timeout values are scaled linearly with factor (adaptive.end - number of states) / (adaptive.end - adaptive.start).
Adaptive timeouts are enabled by default, with an adaptive.start value equal to 60% of the state limit, and an adaptive.end value equal to 120% of the state limit. They can be disabled by setting both adaptive.start and adaptive.end to 0.
The adaptive timeout values can be defined both globally and for each rule. When used on a per-rule basis, the values relate to the number of states created by the rule, otherwise to the total number of states.
For example:
set timeout tcp.first 120
set timeout tcp.established 86400
set timeout { adaptive.start 6000, adaptive.end 12000 }
set limit states 10000
With 9000 state table entries, the timeout values are scaled to 50% (tcp.first 60, tcp.established 43200).
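Two further sketches of the points above. The first line disables adaptive timeouts globally. The rule that follows applies adaptive scaling to a single rule instead; the thresholds 500 and 1000 are hypothetical and relate only to states created by that rule, so with 750 such states its timeouts would be scaled by (1000 - 750) / (1000 - 500) = 50%:
# disable adaptive timeouts globally
set timeout { adaptive.start 0, adaptive.end 0 }
# per-rule adaptive scaling, with hypothetical thresholds
pass in proto tcp from any to any port www keep state \
    (adaptive.start 500, adaptive.end 1000)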
QUEUEING
Packets can be assigned to queues for the purpose of bandwidth
control. At least one declaration is required to configure queues, and later
any packet filtering rule can reference the defined queues by name. During
the filtering component of pf.conf, the last
referenced queue name is where any passed packets will
be queued, while for blocked packets it specifies where any resulting ICMP
or TCP RST packets should be queued. If the referenced queue does not exist
on the outgoing interface the default queue for that interface is used.
Queues attached to an interface build a tree, thus each queue can have
further child queues. Packets can only be assigned to leaf queues, i.e.
queues without children. The root queue must specifically reference an
interface; all other queues pick up the interface(s) they should be created
on from their parent queues unless explicitly specified.
In the following example, a queue named std is created on the interface em0, with 3 child queues ssh, mail and http.
queue std on em0 bandwidth 100M
queue ssh parent std bandwidth 10M
queue mail parent std bandwidth 10M
queue http parent std bandwidth 80M default
The specified bandwidth is the target bandwidth; every queue can receive more bandwidth as long as the parent still has some available. The maximum bandwidth that should be assigned to a given queue can be limited using the max keyword. Similarly, a minimum (reserved) bandwidth can be specified using the min keyword.
queue ssh parent std bandwidth 10M, min 5M, max 25M
For each of these 3 bandwidth specifications an additional burst bandwidth and time can be specified.
queue ssh parent std bandwidth 10M burst 90M for 100ms
All bandwidth values must be specified as an absolute value. A value without a suffix is in bits per second; the suffixes K, M, and G represent kilobits, megabits, and gigabits per second, respectively. The value must not exceed the interface bandwidth.
In addition to the bandwidth specifications queues support the following options:
- default
- Packets not matched by another queue are assigned to this queue. Exactly one default queue per interface is required.
- on ⟨interface⟩
- Specifies the interface the queue operates on. If not given, it operates on all matching interfaces.
- parent ⟨name⟩
- Defines which parent queue the queue should be attached to. Mandatory for all queues except root queues. The parent queue must exist.
- qlimit ⟨limit⟩
- The maximum number of packets held in the queue. The default is 50.
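As a sketch of these options, the mail queue from the earlier example could be given a deeper packet queue. The limit of 100 is an illustrative assumption:
# hypothetical qlimit; the default is 50
queue mail parent std bandwidth 10M qlimit 100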
Packets can be assigned to queues based on filter rules by using the queue keyword. Normally only one queue is specified; when a second one is specified it will instead be used for packets which have a TOS of lowdelay and for TCP ACKs with no data payload.
To continue the previous example, the examples below would specify the four referenced queues, plus a few child queues. Interactive ssh(1) sessions get a queue with a minimum bandwidth; scp(1) and sftp(1) bulk transfers go to a separate queue. The queues are then referenced by filtering rules (see PACKET FILTERING, above).
queue rootq on em0 bandwidth 100M max 100M
queue http parent rootq bandwidth 60M burst 90M for 100ms
queue developers parent http bandwidth 45M
queue employees parent http bandwidth 15M
queue mail parent rootq bandwidth 10M
queue ssh parent rootq bandwidth 20M
queue ssh_interactive parent ssh bandwidth 10M min 5M
queue ssh_bulk parent ssh bandwidth 10M
queue std parent rootq bandwidth 20M default

block return out on em0 inet all set queue std
pass out on em0 inet proto tcp from $developerhosts to any port 80 \
    set queue developers
pass out on em0 inet proto tcp from $employeehosts to any port 80 \
    set queue employees
pass out on em0 inet proto tcp from any to any port 22 \
    set queue(ssh_bulk, ssh_interactive)
pass out on em0 inet proto tcp from any to any port 25 \
    set queue mail
TABLES
Tables are named structures which can hold a collection of addresses and networks. Lookups against tables in pf(4) are relatively fast, making a single rule with tables much more efficient, in terms of processor usage and memory consumption, than a large number of rules which differ only in IP address (either created explicitly or automatically by rule expansion).
Tables can be used as the source or destination of filter or translation rules. They can also be used for the redirect address of nat-to and rdr-to and in the routing options of filter rules, but only for least-states and round-robin pools.
Tables can be defined with any of the following pfctl(8) mechanisms. As with macros, reserved words may not be used as table names.
- manually
- Persistent tables can be manually created with the add or replace option of pfctl(8), before or after the ruleset has been loaded.
- pf.conf
- Table definitions can be placed directly in this file and loaded at the same time as other rules are loaded, atomically. Table definitions inside pf.conf use the table statement, and are especially useful to define non-persistent tables. The contents of a pre-existing table defined without a list of addresses to initialize it are not altered when pf.conf is loaded. A table initialized with the empty list, { }, will be cleared on load.
Tables may be defined with the following attributes:
- const
- The const flag prevents the user from altering the contents of the table once it has been created. Without that flag, pfctl(8) can be used to add or remove addresses from the table at any time, even when running with securelevel(7) = 2.
- counters
- The counters flag enables per-address packet and byte counters, which can be displayed with pfctl(8).
- persist
- The persist flag forces the kernel to keep the table even when no rules refer to it. If the flag is not set, the kernel will automatically remove the table when the last rule referring to it is flushed.
This example creates a table called private, to hold RFC 1918 private network blocks, and a table called badhosts, which is initially empty. A filter rule is set up to block all traffic coming from addresses listed in either table:
table <private> const { 10/8, 172.16/12, 192.168/16 }
table <badhosts> persist
block on fxp0 from { <private>, <badhosts> } to any
The private table cannot have its contents changed and the badhosts table will exist even when no active filter rules reference it. Addresses may later be added to the badhosts table, so that traffic from these hosts can be blocked by using the following:
# pfctl -t badhosts -Tadd 204.92.77.111
A table can also be initialized with an address list specified in one or more external files, using the following syntax:
table <spam> persist file "/etc/spammers" file "/etc/openrelays"
block on fxp0 from <spam> to any
The files /etc/spammers and /etc/openrelays list IP addresses, one per line. Any lines beginning with a ‘#’ are treated as comments and ignored. In addition to being specified by IP address, hosts may also be specified by their hostname. When the resolver is called to add a hostname to a table, all resulting IPv4 and IPv6 addresses are placed into the table. IP addresses can also be entered in a table by specifying a valid interface name, a valid interface group, or the self keyword, in which case all addresses assigned to the interface(s) will be added to the table.
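A sketch of a table populated from the firewall's own interface addresses rather than from literal addresses. The table name <local_addrs> is hypothetical; self is the keyword described above:
# hypothetical table holding all addresses assigned to local interfaces
table <local_addrs> const { self }
pass in on fxp0 proto tcp from any to <local_addrs> port 22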
ANCHORS
Besides the main ruleset, pf.conf can specify anchor attachment points. An anchor is a container that can hold rules, address tables, and other anchors. When evaluation of the main ruleset reaches an anchor rule, pf(4) will proceed to evaluate all rules specified in that anchor.
The following example blocks all packets on the external interface by default, then evaluates all rules in the anchor named "spam", and finally passes all outgoing connections and incoming connections to port 25:
ext_if = "kue0"
block on $ext_if all
anchor spam
pass out on $ext_if all
pass in on $ext_if proto tcp from any to $ext_if port smtp
Anchors can be manipulated through pfctl(8) without reloading the main ruleset or other anchors. This loads a single rule into the anchor, which blocks all packets from a specific address:
# echo "block in quick from 1.2.3.4 to any" | pfctl -a spam -f -
The anchor can also be populated by adding a load anchor rule after the anchor rule. When pfctl(8) loads pf.conf, it will also load all the rules from the file /etc/pf-spam.conf into the anchor.
anchor spam
load anchor spam from "/etc/pf-spam.conf"
Filter rule anchors can also be loaded inline in the ruleset within a brace-delimited block. Brace delimited blocks may contain rules or other brace-delimited blocks. When anchors are loaded this way the anchor name becomes optional. Since the parser specification for anchor names is a string, double quote characters (‘"’) should be placed around the anchor name.
anchor "external" on egress {
    block
    anchor out {
        pass proto tcp from any to port { 25, 80, 443 }
    }
    pass in proto tcp to any port 22
}
Anchor rules can also specify packet filtering parameters using the same syntax as filter rules. When parameters are used, the anchor rule is only evaluated for matching packets. This allows conditional evaluation of anchors, like:
block on $ext_if all
anchor spam proto tcp from any to any port smtp
pass out on $ext_if all
pass in on $ext_if proto tcp from any to $ext_if port smtp
The rules inside anchor "spam" are only evaluated for TCP packets with destination port 25. Hence, the following will only block connections from 1.2.3.4 to port 25:
# echo "block in quick from 1.2.3.4 to any" | pfctl -a spam -f -
Matching filter and translation rules marked with the quick option are final and abort the evaluation of the rules in other anchors and the main ruleset. If the anchor itself is marked with the quick option, ruleset evaluation will terminate when the anchor is exited if the packet is matched by any rule within the anchor.
An anchor references other anchor attachment points using the following syntax:
- anchor ⟨name⟩
- Evaluates the filter rules in the specified anchor.
An anchor has a name which specifies the path where pfctl(8) can be used to access the anchor to perform operations on it, such as attaching child anchors to it or loading rules into it. Anchors may be nested, with components separated by ‘/’ characters, similar to how file system hierarchies are laid out. The main ruleset is actually the default anchor, so filter and translation rules, for example, may also be contained in any anchor.
Anchor rules are evaluated relative to the anchor in which they are contained. For example, all anchor rules specified in the main ruleset will reference anchor attachment points underneath the main ruleset, and anchor rules specified in a file loaded from a load anchor rule will be attached under that anchor point.
Anchors may end with the asterisk (‘*’) character, which signifies that all anchors attached at that point should be evaluated in the alphabetical ordering of their anchor name. For example, the following will evaluate each rule in each anchor attached to the "spam" anchor:
anchor "spam/*"
Note that it will only evaluate anchors that are directly attached to the "spam" anchor, and will not descend to evaluate anchors recursively.
Since anchors are evaluated relative to the anchor in which they are contained, there is a mechanism for accessing the parent and ancestor anchors of a given anchor. Similar to file system path name resolution, if the sequence ‘..’ appears as an anchor path component, the parent anchor of the current anchor in the path evaluation at that point will become the new current anchor. As an example, consider the following:
# printf 'anchor "spam/allowed"\n' | pfctl -f -
# printf 'anchor "../banned"\npass\n' | pfctl -a spam/allowed -f -
Evaluation of the main ruleset will lead into the spam/allowed anchor, which will evaluate the rules in the spam/banned anchor, if any, before finally evaluating the pass rule.
STATEFUL FILTERING
pf(4) filters packets statefully, which has several advantages. For TCP connections, comparing a packet to a state involves checking its sequence numbers, as well as TCP timestamps if a rule using the reassemble tcp parameter applies to the connection. If these values are outside the narrow windows of expected values, the packet is dropped. This prevents spoofing attacks, such as when an attacker sends packets with a fake source address/port but does not know the connection's sequence numbers. Similarly, pf(4) knows how to match ICMP replies to states. For example, to allow echo requests (such as those created by ping(8)) out statefully and match incoming echo replies correctly to states:
pass out inet proto icmp all icmp-type echoreq
Also, looking up states is usually faster than evaluating rules. If there are 50 rules, all of them are evaluated sequentially in O(n). Even with 50000 states, only 16 comparisons are needed to match a state, since states are stored in a binary search tree that allows searches in O(log2 n).
Furthermore, correct handling of ICMP error messages is critical to many protocols, particularly TCP. pf(4) matches ICMP error messages to the correct connection, checks them against connection parameters, and passes them if appropriate. For example if an ICMP source quench message referring to a stateful TCP connection arrives, it will be matched to the state and get passed.
Finally, state tracking is required for nat-to and rdr-to options, in order to track address and port translations and reverse the translation on returning packets.
pf(4) will also create state for other protocols which are effectively stateless by nature. UDP packets are matched to states using only host addresses and ports, and other protocols are matched to states using only the host addresses.
If stateless filtering of individual packets is desired, the no state keyword can be used to specify that state will not be created if this is the last matching rule. Note that packets which match neither block nor pass rules, and thus are passed by default, are effectively passed as if no state had been specified.
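For instance, a stateless pair of rules for DNS queries might look like the sketch below. Because no state is created, replies are not matched automatically and need a rule of their own. $ext_if is assumed to be a macro naming the external interface:
# outgoing queries, no state created
pass out on $ext_if proto udp from any to any port 53 no state
# replies must be passed explicitly
pass in on $ext_if proto udp from any port 53 to any no state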
A number of parameters can also be set to affect how pf(4) handles state tracking, as detailed below.
State Modulation
Much of the security derived from TCP is attributable to how well the initial sequence numbers (ISNs) are chosen. Some popular stack implementations choose very poor ISNs and thus are normally susceptible to ISN prediction exploits. By applying a modulate state rule to a TCP connection, pf(4) will create a high quality random sequence number for each connection endpoint.
The modulate state directive implicitly keeps state on the rule and is only applicable to TCP connections.
For instance:
block all
pass out proto tcp from any to any modulate state
pass in proto tcp from any to any port 25 flags S/SFRA \
    modulate state
Note that modulated connections will not recover when the state table is lost (firewall reboot, flushing the state table, etc.). pf(4) will not be able to infer a connection again after the state table flushes the connection's modulator. When the state is lost, the connection may be left dangling until the respective endpoints time out the connection. It is possible on a fast local network for the endpoints to start an ACK storm while trying to resynchronize after the loss of the modulator. The default flags settings (or a more strict equivalent) should be used on modulate state rules to prevent ACK storms.
Note that alternative methods are available to prevent loss of the state table and allow for firewall failover. See carp(4) and pfsync(4) for further information.
SYN Proxy
By default, pf(4) passes packets that are part of a TCP handshake between the endpoints. The synproxy state option can be used to cause pf(4) itself to complete the handshake with the active endpoint, perform a handshake with the passive endpoint, and then forward packets between the endpoints.
No packets are sent to the passive endpoint before the active endpoint has completed the handshake, hence so-called SYN floods with spoofed source addresses will not reach the passive endpoint, as the sender can't complete the handshake.
The proxy is transparent to both endpoints; they each see a single connection from/to the other endpoint. pf(4) chooses random initial sequence numbers for both handshakes. Once the handshakes are completed, the sequence number modulators (see previous section) are used to translate further packets of the connection. synproxy state includes modulate state.
Rules with synproxy will not work if pf(4) operates on a bridge(4).
Example:
pass in proto tcp from any to any port www synproxy state
Stateful Tracking Options
A number of options related to stateful tracking can be applied on a per-rule basis. One of keep state, modulate state, or synproxy state must be specified explicitly to apply these options to a rule.
- floating
- States can match packets on any interfaces (the opposite of if-bound). This is the default.
- if-bound
- States are bound to an interface (the opposite of floating).
- max ⟨number⟩
- Limits the number of concurrent states the rule may create. When this limit is reached, further packets that would create state are dropped until existing states time out.
- no-sync
- Prevent state changes for states created by this rule from appearing on the pfsync(4) interface.
- pflow
- States created by this rule are exported on the pflow(4) interface.
- sloppy
- Uses a sloppy TCP connection tracker that does not check sequence numbers at all, which makes insertion and ICMP teardown attacks much easier. This is intended to be used in situations where one does not see all packets of a connection, e.g. in asymmetric routing situations. It cannot be used with modulate state or synproxy state.
- ⟨timeout⟩ ⟨seconds⟩
- Changes the timeout values used for states created by this rule. For a list of all valid timeout names, see OPTIONS above.
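A single option is given in parentheses after the state keyword. The sketch below assumes $int_if is a macro naming an interface that sees only one direction of some connections:
# $int_if is a hypothetical macro
pass in on $int_if proto tcp from any to any keep state (sloppy)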
Multiple options can be specified, separated by commas:
pass in proto tcp from any to any \
    port www keep state \
    (max 100, source-track rule, max-src-nodes 75, \
    max-src-states 3, tcp.established 60, tcp.closing 5)
When the source-track keyword is specified, the number of states per source IP is tracked.
- source-track global
- The number of states created by all rules that use this option is limited. Each rule can specify different max-src-nodes and max-src-states options, however state entries created by any participating rule count towards each individual rule's limits.
- source-track rule
- The maximum number of states created by this rule is limited by the rule's max-src-nodes and max-src-states options. Only state entries created by this particular rule count toward the rule's limits.
The following limits can be set:
- max-src-nodes ⟨number⟩
- Limits the maximum number of source addresses which can simultaneously have state table entries.
- max-src-states ⟨number⟩
- Limits the maximum number of simultaneous state entries that a single source address can create with this rule.
For stateful TCP connections, limits on established connections (connections which have completed the TCP 3-way handshake) can also be enforced per source IP.
- max-src-conn ⟨number⟩
- Limits the maximum number of simultaneous TCP connections which have completed the 3-way handshake that a single host can make.
- max-src-conn-rate ⟨number⟩ / ⟨seconds⟩
- Limit the rate of new connections over a time interval. The connection rate is an approximation calculated as a moving average.
When one of these limits is reached, further packets that would create state are dropped until existing states time out.
Because the 3-way handshake ensures that the source address is not being spoofed, more aggressive action can be taken based on these limits. With the overload ⟨table⟩ state option, source IP addresses which hit either of the limits on established connections will be added to the named table. This table can be used in the ruleset to block further activity from the offending host, redirect it to a tarpit process, or restrict its bandwidth.
The optional flush keyword kills all states created by the matching rule which originate from the host which exceeds these limits. The global modifier to the flush command kills all states originating from the offending host, regardless of which rule created the state.
For example, the following rules will protect the webserver against hosts making more than 100 connections in 10 seconds. Any host which connects faster than this rate will have its address added to the ⟨bad_hosts⟩ table and have all states originating from it flushed. Any new packets arriving from this host will be dropped unconditionally by the block rule.
block quick from <bad_hosts>
pass in on $ext_if proto tcp to $webserver port www keep state \
    (max-src-conn-rate 100/10, overload <bad_hosts> flush global)
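A variation on the rules above, as a sketch: here both a concurrent connection limit and a rate limit are set, and flush without global kills only the states created by this rule. The macro $mailserver, the table name <abusers>, and all numeric limits are illustrative assumptions:
block quick from <abusers>
# hypothetical limits: 10 concurrent connections, 15 new connections per 5 seconds
pass in on $ext_if proto tcp to $mailserver port smtp keep state \
    (max-src-conn 10, max-src-conn-rate 15/5, overload <abusers> flush)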
TRAFFIC NORMALISATION
Traffic normalisation is a broad umbrella term for aspects of the packet filter which deal with verifying packets, packet fragments, spoofed traffic, and other irregularities.
Scrub
Scrub involves sanitising packet content in such a way that there are no ambiguities in packet interpretation on the receiving side. It is invoked with the scrub option, added to regular rules.
Parameters are specified enclosed in parentheses. At least one of the following parameters must be specified:
- max-mss ⟨number⟩
- Enforces a maximum segment size (MSS) for matching TCP packets.
- min-ttl ⟨number⟩
- Enforces a minimum TTL for matching IP packets.
- no-df
- Clears the dont-fragment bit from a matching IPv4
packet. Some operating systems have NFS implementations which are known to
generate fragmented packets with the dont-fragment
bit set. pf(4) will drop such fragmented dont-fragment
packets unless no-df is specified.
Unfortunately some operating systems also generate their dont-fragment packets with a zero IP identification field. Clearing the dont-fragment bit on packets with a zero IP ID may cause deleterious results if an upstream router later fragments the packet. Using random-id is recommended in combination with no-df to ensure unique IP identifiers.
- random-id
- Replaces the IPv4 identification field with random values to compensate for predictable values generated by many hosts. This option only applies to packets that are not fragmented after the optional fragment reassembly.
- reassemble tcp
- Statefully normalises TCP connections. reassemble
tcp performs the following normalisations:
- TTL
- Neither side of the connection is allowed to reduce their IP TTL. An attacker may send a packet such that it reaches the firewall, affects the firewall state, and expires before reaching the destination host. reassemble tcp will raise the TTL of all packets back up to the highest value seen on the connection.
- Timestamp Modulation
- Modern TCP stacks will send a timestamp on every TCP packet and echo the other endpoint's timestamp back to them. Many operating systems will merely start the timestamp at zero when first booted, and increment it several times a second. The uptime of the host can be deduced by reading the timestamp and multiplying by a constant. Also observing several different timestamps can be used to count hosts behind a NAT device. And spoofing TCP packets into a connection requires knowing or guessing valid timestamps. Timestamps merely need to be monotonically increasing and not derived off a guessable base time. reassemble tcp will cause scrub to modulate the TCP timestamps with a random number.
- Extended PAWS Checks
- There is a problem with TCP on long fat pipes, in that a packet might get delayed for longer than it takes the connection to wrap its 32-bit sequence space. In such an occurrence, the old packet would be indistinguishable from a new packet and would be accepted as such. The solution to this is called PAWS: Protection Against Wrapped Sequence numbers. It protects against it by making sure the timestamp on each packet does not go backwards. reassemble tcp also makes sure the timestamp on the packet does not go forward more than the RFC allows. By doing this, pf(4) artificially extends the security of TCP sequence numbers by 10 to 18 bits when the host uses appropriately randomized timestamps, since a blind attacker would have to guess the timestamp as well.
For example:
match in all scrub (no-df random-id max-mss 1440)
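Stateful TCP normalisation is requested the same way. A minimal sketch, assuming it should apply to all inbound traffic:
match in all scrub (reassemble tcp)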
Fragment Handling
The size of IP datagrams (packets) can be significantly larger than the maximum transmission unit (MTU) of the network. In cases when it is necessary or more efficient to send such large packets, the large packet will be fragmented into many smaller packets that will each fit onto the wire. Unfortunately for a firewalling device, only the first logical fragment will contain the necessary header information for the subprotocol that allows pf(4) to filter on things such as TCP ports or to perform NAT.
One alternative is to filter individual fragments with filter rules. If packet reassembly is turned off, each fragment is passed to the filter. Filter rules with matching IP header parameters decide whether the fragment is passed or blocked, in the same way as complete packets are filtered. Without reassembly, fragments can only be filtered based on IP header fields (source/destination address, protocol), since subprotocol header fields are not available (TCP/UDP port numbers, ICMP code/type). The fragment option can be used to restrict filter rules to apply only to fragments, but not complete packets. Filter rules without the fragment option still apply to fragments, if they only specify IP header fields. For instance:
pass in proto tcp from any to any port 80
The rule above never applies to a fragment, even if the fragment is part of a TCP packet with destination port 80, because without reassembly this information is not available for each fragment. This also means that fragments cannot create new or match existing state table entries, which makes stateful filtering and address translation (NAT, redirection) for fragments impossible.
In most cases, the benefits of reassembly outweigh the additional memory cost, so reassembly is on by default.
The memory allocated for fragment caching can be limited using pfctl(8). Once this limit is reached, fragments that would have to be cached are dropped until other entries time out. The timeout value can also be adjusted.
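In pf.conf these settings correspond to the frags limit and the frag timeout described in OPTIONS. A sketch with illustrative values:
# hypothetical values: at most 5000 cached fragments, expired after 30 seconds
set limit frags 5000
set timeout frag 30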
When forwarding reassembled IPv6 packets, pf refragments them with the original maximum fragment size. This allows the sender to determine the optimal fragment size by path MTU discovery.
Blocking Spoofed Traffic
Spoofing is the faking of IP addresses, typically for malicious purposes. The antispoof directive expands to a set of filter rules which will block all traffic with a source IP from the network(s) directly connected to the specified interface(s) from entering the system through any other interface.
For example:
antispoof for lo0
Expands to:
block drop in on ! lo0 inet from 127.0.0.1/8 to any
block drop in on ! lo0 inet6 from ::1 to any
For non-loopback interfaces, there are additional rules to block incoming packets with a source IP address identical to the interface's IP(s). For example, assuming the interface wi0 had an IP address of 10.0.0.1 and a netmask of 255.255.255.0:
antispoof for wi0 inet
Expands to:
block drop in on ! wi0 inet from 10.0.0.0/24 to any
block drop in inet from 10.0.0.1 to any
Caveat: Rules created by the antispoof directive interfere with packets sent over loopback interfaces to local addresses. One should pass these explicitly.
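One way to do this, sketched here under the assumption that lo0 is the only loopback interface and that all loopback traffic should be allowed, is to pass loopback traffic with quick before the antispoof rules are evaluated:

pass quick on lo0 all
antispoof quick for wi0 inet

Because the pass rule uses quick and comes first, packets on lo0 never reach the rules generated by antispoof. The set skip on lo0 option, listed under option in the grammar, would instead exempt the interface from filtering altogether.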
OPERATING SYSTEM FINGERPRINTING
Passive OS fingerprinting is a mechanism to inspect nuances of a TCP connection's initial SYN packet and guess at the host's operating system. Unfortunately these nuances are easily spoofed by an attacker so the fingerprint is not useful in making security decisions. But the fingerprint is typically accurate enough to make policy decisions upon.
The fingerprints may be specified by operating system class, by version, or by subtype/patchlevel. The class of an operating system is typically the vendor or genre and would be OpenBSD for the pf(4) firewall itself. The version of the oldest available OpenBSD release on the main FTP site would be 2.6 and the fingerprint would be written as:
"OpenBSD 2.6"
The subtype of an operating system is typically used to describe the patchlevel if that patch led to changes in the TCP stack behavior. In the case of OpenBSD, the only subtype is for a fingerprint that was normalised by the no-df scrub option and would be specified as:
"OpenBSD 3.3 no-df"
Fingerprints for most popular operating systems are provided by pf.os(5). Once pf(4) is running, a complete list of known operating system fingerprints may be listed by running:
# pfctl -so
Filter rules can enforce policy at any level of operating system specification assuming a fingerprint is present. Policy could limit traffic to approved operating systems or even ban traffic from hosts that aren't at the latest service pack.
The unknown class can also be used as the fingerprint which will match packets for which no operating system fingerprint is known.
Examples:
pass out proto tcp from any os OpenBSD
block out proto tcp from any os Doors
block out proto tcp from any os "Doors PT"
block out proto tcp from any os "Doors PT SP3"
block out from any os "unknown"
pass on lo0 proto tcp from any os "OpenBSD 3.3 lo0"
Operating system fingerprinting is limited only to the TCP SYN packet. This means that it will not work on other protocols and will not match a currently established connection.
Caveat: operating system fingerprints are occasionally wrong. There are three problems: an attacker can trivially craft packets to appear as any chosen operating system; an operating system patch could change the stack behavior, so that no fingerprint matches it until the database is updated; and multiple operating systems may have the same fingerprint.
EXAMPLES
In this example, the external interface is kue0. We use a macro for the interface name, so it can be changed easily. All incoming traffic is "normalised", and everything is blocked and logged by default.
ext_if = "kue0"
match in all scrub (no-df max-mss 1440)
block return log on $ext_if all
Here we specifically block packets we don't want: anything coming from a source we have no route back to; packets whose ingress interface does not match the one in the route back to their source address; outgoing packets that do not have our address (157.161.48.183) as source; broadcasts (cable modem noise); and anything from reserved address space or invalid addresses.
block in from no-route to any
block in from urpf-failed to any
block out log quick on $ext_if from ! 157.161.48.183 to any
block in quick on $ext_if from any to 255.255.255.255
block in log quick on $ext_if from { 10.0.0.0/8, 172.16.0.0/12, \
    192.168.0.0/16, 255.255.255.255/32 } to any
For ICMP, pass out/in ping queries. State matching is done on host addresses and ICMP ID (not type/code), so replies (like 0/0 for 8/0) will match queries. ICMP error messages (which always refer to a TCP/UDP packet) are handled by the TCP/UDP states.
pass on $ext_if inet proto icmp all icmp-type 8 code 0
For UDP, pass out all UDP connections. DNS connections are passed in.
pass out on $ext_if proto udp all
pass in on $ext_if proto udp from any to any port domain
For TCP, pass out all TCP connections and modulate state. SSH, SMTP, DNS, and IDENT connections are passed in. We do not allow Windows 9x SMTP connections since they are typically a viral worm.
pass out on $ext_if proto tcp all modulate state
pass in on $ext_if proto tcp from any to any \
    port { ssh, smtp, domain, auth }
block in on $ext_if proto tcp from any \
    os { "Windows 95", "Windows 98" } to any port smtp
Here we pass in/out all IPv6 traffic: note that we have to enable this in two different ways, on both our physical interface and our tunnel.
pass quick on gif0 inet6
pass quick on $ext_if proto ipv6
This example illustrates packet tagging. There are three interfaces: $int_if, $ext_if, and $wifi_if (wireless). NAT is being done on $ext_if for all outgoing packets. Packets in on $int_if are tagged and passed out on $ext_if. All other outgoing packets (i.e. packets from the wireless network) are only permitted to access port 80.
pass in on $int_if from any to any tag INTNET
pass in on $wifi_if from any to any
block out on $ext_if from any to any
pass out quick on $ext_if tagged INTNET
pass out on $ext_if proto tcp from any to any port 80
In this example, we tag incoming packets as they are redirected to spamd(8). The tag is used to pass those packets through the packet filter.
match in on $ext_if inet proto tcp from <spammers> to port smtp \
    tag SPAMD rdr-to 127.0.0.1 port spamd
block in on $ext_if
pass in on $ext_if inet proto tcp tagged SPAMD
This example maps incoming requests on port 80 to port 8080, on which a daemon is running (because, for example, it is not run as root, and therefore lacks permission to bind to port 80).
match in on $ext_if proto tcp from any to any port 80 \
    rdr-to 127.0.0.1 port 8080
If a pass rule is used with the quick modifier, packets matching the translation rule are passed without inspecting subsequent filter rules.
pass in quick on $ext_if proto tcp from any to any port 80 \
    rdr-to 127.0.0.1 port 8080
In the example below, vlan12 is configured as 192.168.168.1; the machine translates all packets coming from 192.168.168.0/24 to 204.92.77.111 when they are going out any interface except vlan12. This has the net effect of making traffic from the 192.168.168.0/24 network appear as though it is the Internet routable address 204.92.77.111 to nodes behind any interface on the router except for the nodes on vlan12. Thus, 192.168.168.1 can talk to the 192.168.168.0/24 nodes.
match out on ! vlan12 from 192.168.168.0/24 to any nat-to 204.92.77.111
In the example below, the machine sits between a fake internal 144.19.74.* network, and a routable external IP of 204.92.77.100. The last rule excludes protocol AH from being translated.
pass out on $ext_if from 144.19.74.0/24 nat-to 204.92.77.100
pass out on $ext_if proto ah from 144.19.74.0/24
In the example below, packets bound for one specific server, as well as those generated by the sysadmins are not proxied; all other connections are.
pass in on $int_if proto { tcp, udp } from any to any port 80 \
    rdr-to 127.0.0.1 port 80
pass in on $int_if proto { tcp, udp } from any to $server port 80
pass in on $int_if proto { tcp, udp } from $sysadmins to any port 80
This example maps outgoing packets' source port to an assigned proxy port instead of an arbitrary port. In this case, proxy outgoing isakmp with port 500 on the gateway.
match out on $ext_if inet proto udp from any port isakmp to any \
    nat-to ($ext_if) port 500
One more example uses rdr-to to redirect a TCP and UDP port to an internal machine.
match in on $ext_if inet proto tcp from any to ($ext_if) port 8080 \
    rdr-to 10.1.2.151 port 22
match in on $ext_if inet proto udp from any to ($ext_if) port 8080 \
    rdr-to 10.1.2.151 port 53
In this example, a NAT gateway is set up to translate internal addresses using a pool of public addresses (192.0.2.16/28). A given source address is always translated to the same pool address by using the source-hash keyword. The gateway also translates incoming web server connections to a group of web servers on the internal network.
match out on $ext_if inet from any to any nat-to 192.0.2.16/28 \
    source-hash
match in on $ext_if proto tcp from any to any port 80 \
    rdr-to { 10.1.2.155 weight 2, 10.1.2.160 weight 1, \
    10.1.2.161 weight 8 } round-robin
The bidirectional address translation example uses a single binat-to rule that expands to a nat-to and an rdr-to rule.
pass on $ext_if from 10.1.2.120 to any binat-to 192.0.2.17
The previous example is identical to the following set of rules:
pass out on $ext_if inet from 10.1.2.120 to any \
    nat-to 192.0.2.17 static-port
pass in on $ext_if inet from any to 192.0.2.17 rdr-to 10.1.2.120
In the example below, a router handling both address families translates an internal IPv4 subnet to IPv6 using the well-known 64:ff9b::/96 prefix:
pass in on $v4_if inet af-to inet6 from ($v6_if) to 64:ff9b::/96
Paired with the example above, the example below can be used on another router handling both address families to translate back to IPv4:
pass in on $v6_if inet6 to 64:ff9b::/96 af-to inet from ($v4_if)
GRAMMAR
Syntax for pf.conf in BNF:
line           = ( option | pf-rule | antispoof-rule | queue-rule |
                 anchor-rule | anchor-close | load-anchor | table-rule |
                 include )

option         = "set" ( [ "timeout" ( timeout | "{" timeout-list "}" ) ] |
                 [ "ruleset-optimization" [ "none" | "basic" |
                 "profile" ] ] |
                 [ "optimization" [ "default" | "normal" | "high-latency" |
                 "satellite" | "aggressive" | "conservative" ] ] |
                 [ "limit" ( limit-item | "{" limit-list "}" ) ] |
                 [ "loginterface" ( interface-name | "none" ) ] |
                 [ "block-policy" ( "drop" | "return" ) ] |
                 [ "state-policy" ( "if-bound" | "floating" ) ] |
                 [ "state-defaults" state-opts ] |
                 [ "fingerprints" filename ] |
                 [ "skip on" ifspec ] |
                 [ "debug" ( "none" | "urgent" | "misc" | "loud" ) ] |
                 [ "reassemble" ( "yes" | "no" ) [ "no-df" ] ] )

pf-rule        = action [ ( "in" | "out" ) ]
                 [ "log" [ "(" logopts ")"] ] [ "quick" ]
                 [ "on" ( ifspec | "rdomain" number ) ] [ af ]
                 [ protospec ] hosts [ filteropts ]

logopts        = logopt [ [ "," ] logopts ]
logopt         = "all" | "matches" | "user" | "to" interface-name

filteropts     = filteropt [ [ "," ] filteropts ]
filteropt      = user | group | flags | icmp-type | icmp6-type |
                 "tos" tos |
                 ( "no" | "keep" | "modulate" | "synproxy" ) "state"
                 [ "(" state-opts ")" ] | "scrub" "(" scrubopts ")" |
                 "fragment" | "allow-opts" | "once" |
                 "divert-packet" "port" port | "divert-reply" |
                 "divert-to" host "port" port |
                 "label" string | "tag" string | [ "!" ] "tagged" string |
                 "set prio" ( number | "(" number [ [ "," ] number ] ")" ) |
                 "set queue" ( string | "(" string [ [ "," ] string ] ")" ) |
                 "rtable" number | "probability" number"%" |
                 "af-to" af "from" ( redirhost | "{" redirhost-list "}" )
                 [ "to" ( redirhost | "{" redirhost-list "}" ) ] |
                 "binat-to" ( redirhost | "{" redirhost-list "}" )
                 [ portspec ] [ pooltype ] |
                 "rdr-to" ( redirhost | "{" redirhost-list "}" )
                 [ portspec ] [ pooltype ] |
                 "nat-to" ( redirhost | "{" redirhost-list "}" )
                 [ portspec ] [ pooltype ] [ "static-port" ] |
                 [ route ] | [ "set tos" tos ] |
                 [ [ "!" ] "received-on" ( interface-name | interface-group ) ]

scrubopts      = scrubopt [ [ "," ] scrubopts ]
scrubopt       = "no-df" | "min-ttl" number | "max-mss" number |
                 "reassemble tcp" | "random-id"

antispoof-rule = "antispoof" [ "log" ] [ "quick" ]
                 "for" ifspec [ af ] [ "label" string ]

table-rule     = "table" "<" string ">" [ tableopts ]
tableopts      = tableopt [ tableopts ]
tableopt       = "persist" | "const" | "counters" |
                 "file" string | "{" [ tableaddrs ] "}"
tableaddrs     = tableaddr-spec [ [ "," ] tableaddrs ]
tableaddr-spec = [ "!" ] tableaddr [ "/" mask-bits ]
tableaddr      = hostname | ifspec | "self" |
                 ipv4-dotted-quad | ipv6-coloned-hex

queue-rule     = "queue" string [ "on" interface-name ] queueopts-list

anchor-rule    = "anchor" [ string ] [ ( "in" | "out" ) ] [ "on" ifspec ]
                 [ af ] [ protospec ] [ hosts ] [ filteropt-list ] [ "{" ]

anchor-close   = "}"

load-anchor    = "load anchor" string "from" filename

queueopts-list = queueopts-list queueopts | queueopts
queueopts      = [ "bandwidth" bandwidth ] | [ "min" bandwidth ] |
                 [ "max" bandwidth ] | [ "parent" string ] |
                 [ "default" ] | [ "qlimit" number ]
bandwidth      = bandwidth-spec [ "burst" bandwidth-spec "for" number "ms" ]
bandwidth-spec = number ( "" | "K" | "M" | "G" )

action         = "pass" | "match" | "block" [ return ]
return         = "drop" | "return" |
                 "return-rst" [ "(" "ttl" number ")" ] |
                 "return-icmp" [ "(" icmpcode [ [ "," ] icmp6code ] ")" ] |
                 "return-icmp6" [ "(" icmp6code ")" ]
icmpcode       = ( icmp-code-name | icmp-code-number )
icmp6code      = ( icmp6-code-name | icmp6-code-number )

ifspec         = ( [ "!" ] ( interface-name | interface-group ) ) |
                 "{" interface-list "}"
interface-list = [ "!" ] ( interface-name | interface-group )
                 [ [ "," ] interface-list ]
route          = ( "route-to" | "reply-to" | "dup-to" )
                 ( routehost | "{" routehost-list "}" )
                 [ pooltype ]
af             = "inet" | "inet6"

protospec      = "proto" ( proto-name | proto-number |
                 "{" proto-list "}" )
proto-list     = ( proto-name | proto-number ) [ [ "," ] proto-list ]

hosts          = "all" |
                 "from" ( "any" | "no-route" | "urpf-failed" | "self" |
                 host | "{" host-list "}" | "route" string ) [ port ]
                 [ os ]
                 "to" ( "any" | "no-route" | "self" | host |
                 "{" host-list "}" | "route" string ) [ port ]

ipspec         = "any" | host | "{" host-list "}"
host           = [ "!" ] ( address [ "weight" number ] |
                 address [ "/" mask-bits ] [ "weight" number ] |
                 "<" string ">" )
redirhost      = address [ "/" mask-bits ]
routehost      = host | host "@" interface-name |
                 "(" interface-name [ address [ "/" mask-bits ] ] ")"
address        = ( interface-name | interface-group |
                 "(" ( interface-name | interface-group ) ")" |
                 hostname | ipv4-dotted-quad | ipv6-coloned-hex )
host-list      = host [ [ "," ] host-list ]
redirhost-list = redirhost [ [ "," ] redirhost-list ]
routehost-list = routehost [ [ "," ] routehost-list ]

port           = "port" ( unary-op | binary-op | "{" op-list "}" )
portspec       = "port" ( number | name ) [ ":" ( "*" | number | name ) ]
os             = "os" ( os-name | "{" os-list "}" )
user           = "user" ( unary-op | binary-op | "{" op-list "}" )
group          = "group" ( unary-op | binary-op | "{" op-list "}" )

unary-op       = [ "=" | "!=" | "<" | "<=" | ">" | ">=" ]
                 ( name | number )
binary-op      = number ( "<>" | "><" | ":" ) number
op-list        = ( unary-op | binary-op ) [ [ "," ] op-list ]

os-name        = operating-system-name
os-list        = os-name [ [ "," ] os-list ]

flags          = "flags" ( [ flag-set ] "/" flag-set | "any" )
flag-set       = [ "F" ] [ "S" ] [ "R" ] [ "P" ] [ "A" ] [ "U" ] [ "E" ]
                 [ "W" ]

icmp-type      = "icmp-type" ( icmp-type-code | "{" icmp-list "}" )
icmp6-type     = "icmp6-type" ( icmp-type-code | "{" icmp-list "}" )
icmp-type-code = ( icmp-type-name | icmp-type-number )
                 [ "code" ( icmp-code-name | icmp-code-number ) ]
icmp-list      = icmp-type-code [ [ "," ] icmp-list ]

tos            = ( "lowdelay" | "throughput" | "reliability" |
                 [ "0x" ] number )

state-opts     = state-opt [ [ "," ] state-opts ]
state-opt      = ( "max" number | "no-sync" | timeout | "sloppy" |
                 "pflow" | "source-track" [ ( "rule" | "global" ) ] |
                 "max-src-nodes" number | "max-src-states" number |
                 "max-src-conn" number |
                 "max-src-conn-rate" number "/" number |
                 "overload" "<" string ">" [ "flush" [ "global" ] ] |
                 "if-bound" | "floating" )

timeout-list   = timeout [ [ "," ] timeout-list ]
timeout        = ( "tcp.first" | "tcp.opening" | "tcp.established" |
                 "tcp.closing" | "tcp.finwait" | "tcp.closed" |
                 "udp.first" | "udp.single" | "udp.multiple" |
                 "icmp.first" | "icmp.error" |
                 "other.first" | "other.single" | "other.multiple" |
                 "frag" | "interval" | "src.track" |
                 "adaptive.start" | "adaptive.end" ) number

limit-list     = limit-item [ [ "," ] limit-list ]
limit-item     = ( "states" | "frags" | "src-nodes" | "tables" |
                 "table-entries" ) number

pooltype       = ( "bitmask" | "least-states" |
                 "random" | "round-robin" |
                 "source-hash" [ ( hex-key | string-key ) ] )
                 [ sticky-address ]

include        = "include" filename
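To relate the grammar to a concrete rule, the sketch below annotates one rule with the productions it exercises. The macro ext_if and the label text are assumptions made for illustration, and the rule is not a recommendation.

ext_if = "kue0"
# pf-rule    = action [dir] [log] [quick] [on] [af] [protospec] hosts [filteropts]
# action     : pass
# direction  : in
# log, quick : log quick
# on ifspec  : on $ext_if
# af         : inet
# protospec  : proto tcp
# hosts      : from any to ($ext_if) port ssh
# filteropts : keep state (max 100) label "ssh-in"
pass in log quick on $ext_if inet proto tcp \
    from any to ($ext_if) port ssh \
    keep state (max 100) label "ssh-in"

The comments are placed before the rule rather than between its continued lines, since a comment inside a multi-line rule extends to the end of the entire block.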
FILES
- /etc/hosts
- Host name database.
- /etc/pf.conf
- Default location of the ruleset file.
- /etc/pf.os
- Default location of OS fingerprints.
- /etc/protocols
- Protocol name database.
- /etc/services
- Service name database.
SEE ALSO
HISTORY
The pf.conf file format first appeared in OpenBSD 3.0.