A question about IP fragmentation and ipfw

cyberman · May 7, 2009

While reading through the ipfw code, I found that it checks packets before they reach the TCP/IP stack:

ether_input -> ether_demux -> ether_ipfw_chk -> ip_fw_chk_ptr [ipfw_chk]

For fragmented IP packets, it requires that the first fragment contain the entire L4 header:

Code:

        if (offset == 0) {
            switch (proto) {
            case IPPROTO_TCP:
                PULLUP_TO(hlen, ulp, struct tcphdr);
                dst_port = TCP(ulp)->th_dport;
                src_port = TCP(ulp)->th_sport;
                args->f_id.flags = TCP(ulp)->th_flags;
                break;

Code:

#define PULLUP_TO(len, p, T)                        \
do {                                    \
    int x = (len) + sizeof(T);                    \
    if ((m)->m_len < x) {                        \
        args->m = m = m_pullup(m, x);                \
        if (m == NULL)                        \
            goto pullup_failed;                \
    }                                \
    p = (mtod(m, char *) + (len));                    \
} while (0)
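
To make the effect of that macro concrete, here is a minimal user-space sketch of the same length comparison. The function name and parameters are hypothetical and not part of ipfw. It models only the TCP case shown in the excerpt and assumes a 20-byte TCP header without options, which is what sizeof(struct tcphdr) evaluates to.

```c
#include <stddef.h>

/* IANA protocol number for TCP, and the TCP header size without options. */
enum { PROTO_TCP = 6, TCP_MIN_HDR = 20 };

/*
 * Hypothetical helper (not part of ipfw): returns 1 if the fragment could be
 * inspected, 0 if the pullup would fail and the fragment would be rejected.
 *   avail  - total bytes present in the fragment (IP header + payload)
 *   hlen   - IP header length in bytes (20..60)
 *   offset - fragment offset; 0 means this is the first fragment
 *   proto  - IP protocol number
 */
int
first_fragment_inspectable(size_t avail, size_t hlen, unsigned offset, int proto)
{
    if (offset != 0)
        return 1;   /* later fragments carry no L4 header to parse */
    if (proto != PROTO_TCP)
        return 1;   /* only the TCP case from the excerpt is modeled */
    /* Same comparison as PULLUP_TO: need hlen + sizeof(struct tcphdr) bytes. */
    return avail >= hlen + TCP_MIN_HDR;
}
```

For example, a first fragment with a 20-byte IP header and only 8 bytes of TCP data (avail = 28) fails the check, while the same 28 bytes at a non-zero offset are not examined at all.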

Have I misunderstood the code, or is this a feature of ipfw? Is the first fragment actually required to contain the entire L4 header (I haven't seen that requirement in the protocol stack code), or is this just ipfw's own policy?

It seems pf in FreeBSD tries to handle IP fragmentation by itself:

Code:

    /* We do IP header normalization and packet reassembly here */
    if (pf_normalize_ip(m0, dir, kif, &reason, &pd) != PF_PASS) {
        action = PF_DROP;
        goto done;
    }
    m = *m0;
    h = mtod(m, struct ip *);

    off = h->ip_hl << 2;
    if (off < (int)sizeof(*h)) {
        action = PF_DROP;
        REASON_SET(&reason, PFRES_SHORT);
        log = 1;
        goto done;
    }

    pd.src = (struct pf_addr *)&h->ip_src;
    pd.dst = (struct pf_addr *)&h->ip_dst;
    PF_ACPY(&pd.baddr, dir == PF_OUT ? pd.src : pd.dst, AF_INET);
    pd.ip_sum = &h->ip_sum;
    pd.proto = h->ip_p;
    pd.af = AF_INET;
    pd.tos = h->ip_tos;
    pd.tot_len = ntohs(h->ip_len);
    pd.eh = eh;

    /* handle fragments that didn't get reassembled by normalization */
    if (h->ip_off & htons(IP_MF | IP_OFFMASK)) {
        action = pf_test_fragment(&r, dir, kif, m, h,
            &pd, &a, &ruleset);
        goto done;
    }

    switch (h->ip_p) {

    case IPPROTO_TCP: {
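
The fragment test in that excerpt relies on the layout of the 16-bit flags/offset field in the IPv4 header. Below is a small sketch of how that field decodes. The helper names are hypothetical, and they take the field in host byte order, whereas the pf code compares in network order by applying htons() to the mask instead.

```c
#include <stdint.h>

/* Bit layout of the 16-bit flags/offset field, as defined by RFC 791. */
#define IP_DF      0x4000   /* don't fragment */
#define IP_MF      0x2000   /* more fragments follow */
#define IP_OFFMASK 0x1fff   /* fragment offset, in units of 8 bytes */

/* Hypothetical helper: non-zero if the packet is any kind of fragment. */
int
is_fragment(uint16_t ip_off)
{
    /* A first fragment has MF set; later ones have a non-zero offset. */
    return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}

/* Hypothetical helper: byte position of this fragment's payload. */
unsigned
frag_offset_bytes(uint16_t ip_off)
{
    return (unsigned)(ip_off & IP_OFFMASK) * 8u;
}
```

A packet with only DF set (0x4000) is not a fragment, while 0x20b9 is a middle fragment whose payload starts at byte 1480 of the original datagram.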

In Linux, netfilter processes input packets after ip_defrag() and output packets before ip_fragment().

rwatson@ · May 16, 2009

IPv4's minimum packet MTU is 576, and the maximum IP header length is 60, which means the TCP header should always fit in the first fragment. As with a number of other firewalls, IPFW will drop first fragments that don't meet this requirement, on the basis that they should never occur. I'm not aware of cases where they do occur in practice, and I'd expect a tough time getting such packets through many firewalls/NATs/etc. on the above grounds. Are you running into a case where this is happening in a real-world system?
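
The arithmetic behind that claim can be sketched as follows. The function names are hypothetical. The sketch assumes the first fragment is filled up to the link MTU and that, as for every fragment except the last, its payload is a multiple of 8 bytes.

```c
#include <stddef.h>

/*
 * Hypothetical helper: payload bytes carried by a first fragment that is
 * filled up to the MTU. Non-final fragments carry a multiple of 8 bytes.
 */
size_t
first_fragment_payload(size_t mtu, size_t ip_hlen)
{
    if (mtu <= ip_hlen)
        return 0;
    return ((mtu - ip_hlen) / 8) * 8;
}

/* Hypothetical helper: does a 20-byte TCP header (no options) fit? */
int
tcp_header_fits(size_t mtu, size_t ip_hlen)
{
    return first_fragment_payload(mtu, ip_hlen) >= 20;
}
```

With a 576-byte MTU and the maximum 60-byte IP header, the first fragment carries 512 bytes, far more than a TCP header needs. With a 1500-byte MTU and a 20-byte IP header it carries 1480 bytes.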

cyberman · May 17, 2009

rwatson@ said:
IPv4's minimum packet MTU is 576, and the maximum IP header length is 60, which means the TCP header should always fit in the first fragment. As with a number of other firewalls, IPFW will drop first fragments that don't meet this requirement, on the basis that they should never occur. I'm not aware of cases where they do occur in practice, and I'd expect a tough time getting such packets through many firewalls/NATs/etc. on the above grounds. Are you running into a case where this is happening in a real-world system?

First, I should say that I haven't run into any such case in a real-world system. I'm thinking of porting a firewall, so I have been reading the source code of ipfw, pf and netfilter. netfilter does its LOCAL_IN check after IP defragmentation and its LOCAL_OUT check before IP fragmentation, and judging from the comments in the source code, pf tries to do the defragmentation by itself:

Code:

    /* We do IP header normalization and packet reassembly here */
    if (pf_normalize_ip(m0, dir, kif, &reason, &pd) != PF_PASS) {
        action = PF_DROP;
        goto done;
    }
    m = *m0;
    h = mtod(m, struct ip *);

    off = h->ip_hl << 2;
    if (off < (int)sizeof(*h)) {
        action = PF_DROP;
        REASON_SET(&reason, PFRES_SHORT);
        log = 1;
        goto done;
    }

But netfilter seems to be bound very tightly to the Linux protocol stack, and pf has few comments in its source code, so ipfw is the only one I have read all the way through.

Is there any document that states "IPv4's minimum packet MTU is 576"? According to "Table 7-1: Common MTUs in the Internet" in RFC 1191, the official minimum MTU from RFC 791 is 68, and the smallest MTU in practical use in that table is 296, for point-to-point links. If 296 is the minimum then that's fine, but the document was published in 1990, so there may be newer ones. Also, is an IP fragment required to use the MTU as its fragment size?

RFC 791 states:

Code:

    Every internet module must be able to forward a datagram of 68
    octets without further fragmentation.  This is because an internet
    header may be up to 60 octets, and the minimum fragment is 8 octets.

    Every internet destination must be able to receive a datagram of 576
    octets either in one piece or in fragments to be reassembled.
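
To see what the 68-octet case would mean for the ipfw code above, here is a rough sketch that checks which TCP header fields fall inside the 8 octets left after a 60-octet IP header. The helper name is hypothetical; the field offsets are those of the standard TCP header layout.

```c
#include <stddef.h>

/* Byte offsets of selected fields within the TCP header (RFC 793). */
enum {
    TCP_OFF_SPORT = 0,   /* source port, 2 bytes */
    TCP_OFF_DPORT = 2,   /* destination port, 2 bytes */
    TCP_OFF_SEQ   = 4,   /* sequence number, 4 bytes */
    TCP_OFF_FLAGS = 13   /* flags, 1 byte */
};

/*
 * Hypothetical helper: returns 1 if a header field lies entirely within
 * the first l4_bytes of transport data carried by the first fragment.
 */
int
field_in_first_fragment(size_t l4_bytes, size_t field_off, size_t field_len)
{
    return field_off + field_len <= l4_bytes;
}
```

With only 8 octets of transport data, both ports and the sequence number are present, but the flags byte that ipfw reads through th_flags is not. It only becomes available once the first fragment carries at least 14 octets of TCP header.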

rwatson@ · May 17, 2009

Two thoughts:

There's an interesting and perhaps rather extended thread on the topic on the IETF v6ops mailing list from 2007 that's worth reading:

http://www.ops.ietf.org/lists/v6ops/v6ops.2007/msg00792.html

My choice of phrasing was poor, in retrospect. However, the '576' size is built into many protocols in practice, including DNS, where it is used to avoid UDP fragmentation. The thrust of the above thread, however, relates to protocols encapsulated in IPsec, which effectively reduces the end-to-end PMTU due to the additional headers. It could be that ipfw's assumption is standards-unfriendly with respect to extremely small MTUs, but not reality-unfriendly.

Second, on normalization: these policies about short fragments, normalization, etc., are intended to address fragmentation-related attacks against firewalls and IDSes, in which prematurely fragmented segments or "overlapping" fragments are constructed to confuse policy enforcement.
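
As an illustration of the "overlapping" case, a normalizer has to notice when two fragments of the same datagram claim the same bytes, since the end host might then reassemble a different header than the one the firewall inspected. A minimal sketch of that interval test, with a hypothetical function name and byte offsets and lengths as inputs:

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the byte ranges [off_a, off_a + len_a)
 * and [off_b, off_b + len_b) of two fragments intersect, 0 otherwise.
 */
int
fragments_overlap(size_t off_a, size_t len_a, size_t off_b, size_t len_b)
{
    return off_a < off_b + len_b && off_b < off_a + len_a;
}
```

A 24-byte first fragment followed by a fragment at byte offset 8 overlaps; an 8-byte first fragment followed by one at offset 8 does not. What to do with an overlap (drop, trim, or prefer the first copy) is a policy choice that this sketch does not address.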
