Fixing a PMTUD black hole is a multistep process, and it starts with finding the correct MTU/MRU of your link.
Now as I’ve discussed, every path can have its own unique MTU/MRU value, but we are usually interested in the max value that is dictated by your ISP.
When you send a packet, it always routes through your ISP. Because of the different protocols in place and their overheads (mostly layer 2 ones), it is common for your ISP to force an MTU/MRU of less than 1500 bytes on your link.
If a packet exceeds these values, your ISP is required to send the appropriate ICMP messages either back to you (for the MTU), or to the server sending the data (for the MRU). These messages give the corresponding hosts a chance to adapt themselves to the link.
If your ISP decides not to send the required ICMP messages (or they get lost in transit for some reason), all sorts of issues can arise. To solve that, the first step is to manually determine your link’s MTU/MRU values.
The best way to find your link’s MTU/MRU is to send ICMP packets (or more precisely, pings) to the other host.
To be able to interpret the results, we first need to have an understanding of an ICMP packet’s structure.
Each layer 3 PDU consists of different parts. Let’s take a look at a typical IPv4 ICMP packet:
As you can see, we have 20 bytes of IPv4 header at the top, followed by 8 bytes of ICMP header, and finally the data or the payload.
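To make the arithmetic concrete, here is a minimal sketch of the size calculation, assuming the 20-byte IPv4 header (no options) and 8-byte ICMP header described above. The function names are my own and purely illustrative.

```shell
# Sizes in bytes for an IPv4 ICMP echo packet, as described above.
IPV4_HEADER=20   # IPv4 header, assuming no IP options are present
ICMP_HEADER=8    # ICMP echo header

# payload_for_mtu MTU -> prints the largest ping payload that fits in MTU
payload_for_mtu() {
    echo $(( $1 - IPV4_HEADER - ICMP_HEADER ))
}

# packet_size_for_payload PAYLOAD -> prints the resulting IPv4 packet size
packet_size_for_payload() {
    echo $(( $1 + IPV4_HEADER + ICMP_HEADER ))
}
```

For example, `payload_for_mtu 1500` gives the payload size to ask ping for when you want to test a full 1500-byte packet.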
If we send some packets to a remote host with the DF flag set, and those packets exceed the maximum packet size of our link, not only should they never reach the remote host, but we should also receive an ICMP message from a router along the way (likely our own ISP) informing us.
The easiest way to do so is by using ICMP ping. The ping command is available on pretty much every platform you can think of[1].
To summarize: we set the DF flag on our ICMP packet, send a ping payload big enough to exceed the maximum packet size of our link, and observe the results.
Constructing the ping command
We first try to send a 1500-byte ping packet to a remote server. I will explain why shortly.
On Linux, open the terminal and issue this command:
ping -c 4 -s 1472 -M do 184.108.40.206
The arguments are pretty simple:
|-c 4|Number of pings|
|-s 1472|Size of the payload. Remember that each ICMP payload has 28 bytes of overhead (1472 + 28 = 1500)|
|-M do|Path MTU Discovery strategy. The do value sets the DF flag and prohibits fragmentation|
|184.108.40.206|The remote host we’re sending the packets to (in this case, one of Google’s public DNS servers)|
On Windows, open cmd.exe and issue this command:
ping -l 1472 -f 220.127.116.11
Again, arguments are very simple:
|-l 1472|Size of the payload (just like above)|
|-f|Set the Don’t Fragment flag|
|220.127.116.11|Sorry Google! Your DNS servers are just too awesome|
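If you find yourself retyping these commands with different sizes, a small helper can assemble them for you. This is only a sketch: `df_ping_cmd` is a name I made up, it merely prints the command rather than running it, and `192.0.2.1` in the usage note is a documentation placeholder, not a real target.

```shell
# df_ping_cmd PLATFORM PAYLOAD HOST -> prints a ping command with DF set
# PLATFORM is "linux" (iputils ping) or "windows" (cmd.exe ping).
df_ping_cmd() {
    case "$1" in
        linux)   printf 'ping -c 4 -s %d -M do %s\n' "$2" "$3" ;;
        windows) printf 'ping -l %d -f %s\n' "$2" "$3" ;;
        *)       echo "unknown platform: $1" >&2; return 1 ;;
    esac
}
```

Calling `df_ping_cmd linux 1472 192.0.2.1` prints the Linux form shown earlier, and `df_ping_cmd windows 1472 192.0.2.1` prints the Windows one.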
Identifying the correct MTU
Now if you do get a ping reply with the above commands, it means that at least for the path between you and Google’s DNS servers, your MTU is 1500[2] and you should not have any MTU (and likely MRU) problems.
If you suspect you are having MTU issues on your link, the first step is to reproduce them. So try those commands again with another IP address (or ideally the IP address you have issues with) until you do not get a pong from the other end.
On a healthy link with MTU of less than 1500 bytes, you should see a response like this:
PING 18.104.22.168 (22.214.171.124) 1472(1500) bytes of data.
From 10.11.12.13 icmp_seq=1 Frag needed and DF set (mtu = 1492)
ping: local error: Message too long, mtu=1492
ping: local error: Message too long, mtu=1492
ping: local error: Message too long, mtu=1492
--- 126.96.36.199 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3051ms

Pinging 188.8.131.52 with 1472 bytes of data:
Reply from 10.11.12.13: Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Ping statistics for 184.108.40.206:
Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),
As you can see, we got a response from a hop on the path saying that it cannot pass our packet. We also get the hop’s IP address and, as a bonus, Linux ping also shows the MTU.
More likely than not, that hop is either the next immediate router on your path (i.e., your home router) or your ISP.
You can then adjust the payload size (in this case 1492 - 28 = 1464) and retry. You do that until you get a response from the other end.
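On Linux, the reported MTU can be pulled out of ping’s output instead of being read by eye. The sketch below assumes the output looks like the Linux sample above (an `mtu = 1492` or `mtu=1492` fragment); `extract_reported_mtu` is an illustrative name of mine, and other ping versions may word the message differently.

```shell
# extract_reported_mtu: reads ping output on stdin and prints the first
# MTU value found in an "mtu = N" or "mtu=N" fragment (nothing if absent).
extract_reported_mtu() {
    sed -n 's/.*mtu *= *\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

You would typically feed it both output streams, for example `ping -c 4 -s 1472 -M do HOST 2>&1 | extract_reported_mtu` (with HOST replaced by your target), and then subtract 28 from the result to get the next payload size to try.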
That, however, is how it should work, and if it did work like this, you wouldn’t need to find the MTU manually.
If you get a plain ping timeout every time instead, then you might indeed have a PMTUD black hole on your path. Finding the MTU at this point is as easy as adjusting ping’s payload.
To summarize: you reduce the ping’s payload until you start getting replies. You could start by cutting its size in half (at which point you would most likely get a reply) and then fine-tune it from there[3].
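The halve-then-fine-tune approach is essentially a bisection, and it can be scripted. What follows is a sketch under a few assumptions: `find_max_payload` and `ping_probe` are names I invented, `PING_HOST` is a variable you would set yourself, and the probe relies on the Linux iputils flags used earlier plus `-W` for a reply timeout. A single lost packet can mislead it, so treat the result as a starting point.

```shell
# find_max_payload LOW HIGH PROBE
#   LOW   a payload size known (or assumed) to get a reply
#   HIGH  a payload size known (or assumed) to fail
#   PROBE name of a command/function taking a payload size; it must
#         return 0 when a DF-flagged ping of that size is answered
# Prints the largest payload size in [LOW, HIGH) for which PROBE succeeded.
find_max_payload() (
    low=$1 high=$2 probe=$3
    while [ $(( high - low )) -gt 1 ]; do
        mid=$(( (low + high) / 2 ))
        if "$probe" "$mid"; then
            low=$mid      # reply received: the limit is at or above mid
        else
            high=$mid     # no reply: the limit is below mid
        fi
    done
    echo "$low"
)

# Example probe for Linux iputils ping: one DF-flagged ping, 2 s timeout.
# PING_HOST is a placeholder variable that must be set before use.
ping_probe() {
    ping -c 1 -W 2 -s "$1" -M do "$PING_HOST" >/dev/null 2>&1
}
```

With `PING_HOST` set, `find_max_payload 0 1473 ping_probe` prints the largest payload that got through; add 28 to it to get the path MTU.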
Identifying the correct MRU
You usually shouldn’t need to adjust this. It’s not your problem but the next hop’s[4].
There is a certain twist in finding your MRU:
You typically can’t force the other end to send you a specific packet size with the DF flag set.
Setting the DF flag on your ping packet does not automatically mean the reply will also have the DF flag set (in fact, my testing shows it doesn’t). And even if you could force that, you would still have trouble finding out whether PMTUD actually works for your link’s MRU. That is, unless you control the other end as well.
To summarize: your best bet for finding the correct MRU of your link is to ping your host from a remote location (your MRU is then simply the MTU as seen from that remote location). If you don’t have access to a remote host, search the web for online ping services and use those instead.
Caveats and pitfalls
You should be aware that there are some situations in which you might not get the expected results. Some of them are as follows:
Your ISP might silently remove the DF flags: This is really a bad practice but some ISPs opt for this as a way of solving MTU issues altogether. On such connections, once a packet reaches the ISP, the DF flag is removed.
Some remote hosts might send truncated replies: To protect their network, some remote hosts truncate the payload instead of replying to the ping with the same payload, which makes the reply somewhat invalid. While the client can usually handle this correctly for a single ping packet, the situation can get complicated once packets get fragmented. It’s best to use a host known not to do that (like Google’s DNS servers).
A firewall might be blocking your ping request/response: If that’s the case and you are sure it’s not being done on your end, you are pretty much out of luck using ping. One option is to construct a TCP packet to send to a remote host (this is somewhat more complex). Another is to just run Wireshark and observe your normal traffic for a couple of minutes to make an educated guess about your link’s MTU/MRU.
You might be behind a transparent proxy: That means you think you have a direct connection but you really don’t; most of your traffic goes through what’s called a transparent proxy. Even if you do get a ping response from a remote host, it doesn’t necessarily reflect your real MTU/MRU to the transparent proxy.
Your link MTU might change over time: This is rather unusual for home networks, but I’ve seen it on mobile networks. In such cases, the all-time low MTU of your link is your only reliable MTU.
A firewall might block fragmented ping requests/responses: Yes, no kidding, I’ve seen this too. Whether it was accidental and the result of a connection tracking issue, or on purpose to possibly discourage the use of ping payloads to transfer real data (e.g., to bypass firewalls), it effectively blocks pings as soon as they get fragmented. So you may not have any MTU issue at all, and yet ping results would suggest otherwise. This one is really unfair to troubleshoot!
The NIC on your host might be set to use an MTU value other than 1500: Especially on Windows, this may cause lots of weird results. Whether it’s set via the adapter’s settings or the netsh command, it can influence your results.
Some low-level network drivers might affect the result: Again, especially on Windows, this can easily happen with security software such as Kaspersky’s NDIS filter.
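Regarding the caveat about the NIC’s own MTU setting, it is worth checking that value before trusting any ping result. Here is a minimal Linux-only sketch that reads it from sysfs; the function names are mine, and the optional second argument exists only so the functions can be exercised against a fake directory.

```shell
# nic_mtu IFACE [SYSFS_DIR] -> prints the MTU configured on a Linux interface
# SYSFS_DIR defaults to /sys/class/net; override it only for testing.
nic_mtu() {
    cat "${2:-/sys/class/net}/$1/mtu"
}

# mtu_is_default IFACE [SYSFS_DIR] -> succeeds if the interface MTU is 1500
mtu_is_default() {
    [ "$(nic_mtu "$@")" -eq 1500 ]
}
```

For example, `nic_mtu eth0` prints the configured value, assuming your interface is actually called `eth0`. On Windows, `netsh interface ipv4 show subinterfaces` lists the MTU of each interface.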
[1] Note that some ping implementations, like the one in BusyBox, are not suitable for this since they lack the required parameters.
[2] Or possibly even more, but that is uncommon. It could also mean your ISP is being really naughty; more on that later.
[3] If you never get a reply and you are sure your internet connection is working, then a firewall in the path might be blocking it. I will briefly introduce another approach at the end of the article.
[4] Refer to the PMTUD discussion for more info.