Postfix - Problem Sending Semi-Large Attachments

I'm having a small problem with my mail server and I hope somebody will have an idea or two.

My mail server runs FreeBSD 7.1 with a fully updated port system. I use Postfix as the MTA; it requires SASL authentication to send via SMTP and supports TLS to encrypt messages. I use amavisd-new and clamav to scan e-mail as it arrives. (No spam filter, just viruses.) I also use postgrey to implement greylisting. The mail server is on its own dedicated 3.0 meg down/512k up DSL line. There are three network cards in the server - one goes to the DSL modem, another via cat5 to a laptop in my office for administration, and another to a wireless router. Customers can connect to the wifi router (AKA hotspot) and use our internet on ports 80 and 8080; pf simply passes the data on these ports out the DSL line and blocks all other connections from that network interface. Our business has a separate 6.0 meg down/768k up DSL line. The mail client is Mozilla Thunderbird.

The problem is that when somebody sends an e-mail with a large attachment (~3-8 meg), sometimes it will send and other times not. The bigger the attachment, the more likely sending is to fail. Here's the kicker - if I send mail using the administration laptop in my office connected directly to the server or using a computer connected to the wifi hotspot (also local to the server, though it has to be looping through the NICs because pf does not allow SMTP from the wifi card), messages can be sent with large attachments without a problem. To me, this indicates a problem with the DSL line connecting the server to the internet. However, tests run by my ISP show the line to be fine and there are no drops shown in /var/log/messages or /var/log/ppp.log, just the normal automatic re-authentication via PPPoE after 20 hours of being connected.

I just did a test and reproduced the trouble. Thunderbird was sending fine and stalled at 51%, then said that sending of message failed because connecting to the SMTP server failed. Looking on the server, there's nothing new in /var/log/messages or /var/log/ppp.log, and /var/log/maillog indicates simply 'postfix/smtpd[pid]: lost connection after DATA (nnnn bytes) from unknown[ip]' followed by 'disconnect from unknown[ip]'.
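To see how far each failed transfer got before dying, something like the following could pull the byte counts out of the mail log. This is only a rough sketch: the function name and the sample lines are made up for illustration (the IPs are documentation addresses), and the pattern assumes the log lines look like the 'lost connection after DATA' message quoted above.

```python
import re
from collections import defaultdict

# Matches lines of the form quoted above, e.g.
#   postfix/smtpd[4121]: lost connection after DATA (2752512 bytes) from unknown[192.0.2.10]
LOST_RE = re.compile(
    r"postfix/smtpd\[\d+\]: lost connection after DATA "
    r"\((\d+) bytes\) from ([^\s\[]+)\[([^\]]+)\]"
)

def lost_after_data(lines):
    """Return {client_ip: [bytes_received, ...]} for transfers that died during DATA."""
    drops = defaultdict(list)
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            drops[match.group(3)].append(int(match.group(1)))
    return dict(drops)

# Hypothetical sample lines, NOT taken from the real server's log.
SAMPLE = [
    "Mar  3 14:02:11 mail postfix/smtpd[4121]: connect from unknown[192.0.2.10]",
    "Mar  3 14:04:40 mail postfix/smtpd[4121]: lost connection after DATA (2752512 bytes) from unknown[192.0.2.10]",
    "Mar  3 14:04:40 mail postfix/smtpd[4121]: disconnect from unknown[192.0.2.10]",
    "Mar  3 15:30:02 mail postfix/smtpd[4310]: lost connection after DATA (4194304 bytes) from unknown[192.0.2.10]",
]

for ip, sizes in lost_after_data(SAMPLE).items():
    print(ip, sizes)
```

On the real server the same function could be fed the lines of /var/log/maillog instead of SAMPLE. If the byte counts cluster around one value, that would hint at a timeout or size limit somewhere; if they are scattered, random loss on the path looks more likely.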

I'm frankly at a loss of what to do next. Any ideas of what might be wrong or what to try to troubleshoot further?
 
Could you post a network diagram (in code tags), because I still don't understand the network after having read this 4 times. Like: the mail server is on a dedicated DSL line, and suddenly a different separate DSL connection appears; I don't see how they fit together, if at all.

How is mail going out? Is Postfix your LAN's smtp server (i.e.: is it spooling the mail locally before sending it to the DSL modem? Is that server in Thunderbird's smtp server settings?). 

Stuff like that ..
 
Sorry for the confusion. Yes, they are two completely separate internet lines. (Reading over it again, I can see where it could be unclear - sorry.) Hope this comes through without losing formatting...


Code:
3 meg DSL ---> DSL modem ------|
  /\                           |
  ||                           \/
  ||                         E-mail Mail Server
  ||                            |       |
  ||   Administration laptop <---       |
  ||                                    |
  ||                                    ---> Wi-Fi Hotspot Router
  ||
  \/
<<<INTERNET>>>>
   /\
   ||
   \/
6 meg DSL ---> DSL Modem ---> Corporate Router ---> PCs

Each PC has the mail server set as both its POP3 and SMTP server. When a message is sent, it goes out the 6 meg DSL line, through the internet, in the 3 meg DSL line to the mail server. (Both DSL lines are with the same ISP, though it really shouldn't matter IMO.)

Please let me know if things are still unclear. (I seem to have a way of making things seem more complex than they are... :( )
 
Why isn't the mailserver hooked up to the corp router instead of burdening two DSL connections with 'in-house' traffic? I can imagine that the 3 meg line comes in handy for MX purposes (receiving mail from the outside only), but the traffic streams below look very inefficient.

Code:
local delivery, 768K shared DSL bottleneck:
PC --> 768K DSL --> ISP router --> 3M DSL -> Postfix

remote delivery, 512K shared DSL bottleneck:
PC --> 768K DSL -> ISP router --> 3M DSL --> Postfix --> 512K DSL --> destination

In both cases, the limited upload bandwidth needs to be shared with more important traffic (tcp acks for the 3 and 6 meg downloads, which may amount to 100K+ and 300K+ bandwidth, leaving even less for email).
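To put rough numbers on that, here is a back-of-the-envelope sketch of how long one attachment occupies the 768K uplink. The assumptions are mine: 1 MB = 10^6 bytes, 1 kbit = 1000 bits, base64 encoding adds one third, TCP/IP and PPPoE overhead is ignored, and the 300 kbit/s reserved for ACKs is simply the "300K+" guess from above. The function names are made up for this example.

```python
import math

def base64_size(raw_bytes):
    """Size after base64 encoding: 4 output bytes for every 3 input bytes."""
    return 4 * math.ceil(raw_bytes / 3)

def transfer_seconds(attachment_bytes, uplink_kbit, reserved_kbit=0):
    """Seconds needed to push a base64-encoded attachment through the uplink.

    reserved_kbit is the share of the uplink already eaten by other traffic
    (for example TCP ACKs for running downloads).
    """
    usable_bits_per_s = (uplink_kbit - reserved_kbit) * 1000
    if usable_bits_per_s <= 0:
        raise ValueError("no upstream bandwidth left for the transfer")
    return base64_size(attachment_bytes) * 8 / usable_bits_per_s

for size_mb in (3, 5, 8):
    size = size_mb * 1_000_000
    idle = transfer_seconds(size, 768)
    busy = transfer_seconds(size, 768, reserved_kbit=300)
    print(f"{size_mb} MB: about {idle:.0f} s on an idle uplink, "
          f"about {busy:.0f} s with 300 kbit/s taken by ACKs")
```

Even in the best case a single large message holds the uplink for a minute or more, which leaves plenty of time for something along the path to time out or drop the session.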

Three possibilities:
1. make a shortcut from the corp network to the mailserver (keeping traffic entirely local, and only using the 3M DSL download for incoming email from external sites, and its 512K upload for mail to external sites)
2. use traffic shaping to separate tcp acks and email, giving them guaranteed bandwidth, prioritising them over all other traffic (which may end up being pushed down to 0 K)
3. use a separate SMTP server on the corp network, making it possible for all Thunderbird users to have their outgoing email 'absorbed' immediately instead of waiting endlessly for the smtp server two DSL lines away to signal them that it worked or failed. That's quite a tcp ack storm going on there with several people mailing at once, I can tell you.

Whichever way I look at this, I don't see this as a Postfix problem, but as a traffic congestion/contention problem.
 
Check your /var/log/maillog for errors. Do you see anything? If not, then Postfix is not your problem. You may need to update master.cf to increase the debug level. See the Postfix docs.
 
The problem I see with using tcpdump is that at any given time there are a few dozen machines checking for mail from the same IP I'd be sending from. Plus, if I send a 5 meg file, the output from tcpdump would be a lot to sift through (and I really don't know what I'm looking for).


DutchDaemon said:
Why isn't the mailserver hooked up to the corp router instead of burdening two DSL connections with 'in-house' traffic? I can imagine that the 3 meg line comes in handy for MX purposes (receiving mail from the outside only), but the traffic streams below look very inefficient.

I agree 100% - they are extremely inefficient. However, I'm prevented by a contract from connecting our LAN in any way to a system that has any other access to the internet. They also refuse to forward a single port through the corporate firewall; this is why I have to pay for an extra line to begin with.

I did find a mailing list discussion from 2006 that seems to imply that the problem may be caused by a firewall somewhere between the two dropping ICMP packets. There aren't more details in that discussion though. Does this seem like a logical conclusion?


I did have an idea while writing this. I'm going to try sending myself several large e-mail attachments tonight from one of my home computers. That should tell me whether the problem is with my mail server/3 meg DSL or is specific to our corporate firewall/6 meg DSL. I really should have thought of this and tried it earlier, because if it's the latter, I just wasted everybody's time. :(
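To make the test repeatable from different networks, a small script could stand in for Thunderbird. This is a minimal sketch using Python's standard smtplib and email modules; the host name, port 587, addresses, and credentials are placeholders I made up, and it assumes the server offers STARTTLS followed by SASL login as described in the first post.

```python
import os
import smtplib
from email.message import EmailMessage

def build_test_message(sender, recipient, size_bytes):
    """Build a message carrying a random binary attachment of size_bytes."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"attachment test, {size_bytes} bytes"
    msg.set_content("Test message for reproducing the large-attachment failure.")
    # Random bytes so nothing along the path can compress the payload away.
    msg.add_attachment(
        os.urandom(size_bytes),
        maintype="application",
        subtype="octet-stream",
        filename="test.bin",
    )
    return msg

def send_test_message(host, port, user, password, msg, timeout=120):
    """Send msg over an authenticated, TLS-protected SMTP session."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)

# Example call (placeholder values, fill in real ones before running):
# msg = build_test_message("me@example.com", "me@example.com", 5_000_000)
# send_test_message("mail.example.com", 587, "me", "secret", msg)
```

Running the same call from home, from the hotspot, and from a corporate PC, with a few different sizes, should show whether failures follow the network the client is on rather than the server.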
 
I am extremely sorry for wasting everybody's time. I just tried exactly what I described in the previous post and am able to send mail with 6-7 meg attachments from home without trouble. Therefore, the problem is specific to either our 6 meg DSL line or our corporate firewall. I wish I could say that I'm happy that it's not something wrong with my server, but I now have to call our ISP and firewall provider and try to get one of them to own up to the problem. (IMO, struggling with support people is far worse than struggling with technology...)

Thank you to everybody who posted on this. Again, my apologies for not isolating the problem before posting.
 
No problem, but don't forget to also look at network congestion. With dozens of people using POP3/IMAP and SMTP on a shared DSL line that's also used (I presume) for stuff like web browsing and other download activities, the 768K upload becomes a TCP ACK bottleneck very quickly, and without traffic shaping or prioritising, that could become a problem. This usually presents itself in stateful connections and long sessions (like POP3/IMAP/SMTP) timing out because acks/nacks get lost.
 
Try increasing the Postfix timeouts in main.cf, to 600s for example.

Code:
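# Postfix SMTP client (outgoing deliveries)
smtp_data_done_timeout = 600s
smtp_data_xfer_timeout = 600s
smtp_data_init_timeout = 600s
# Postfix SMTP server (smtpd, which logged the lost connection)
smtpd_timeout = 600s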

Hope it helps.
 