Mail Server - Funky Penguin's Geek Cookbook

Hmm, the pre-mix doesn’t actually reference the traefik network. Is it necessary?

I removed the reference to the traefik network and it seems to work fine.

I’m trying to sync my email over from my old server using mbsync. It’s super slow. Way slower than when I tried it before with a configured server rather than using docker. Have you seen performance problems using the cephfs? Any suggestions for how to pinpoint it?

CPU usage on all the nodes is at 3% or so. The nodes all have 16GB of RAM, with an NVMe drive for boot and a separate NVMe drive for storage. These things should be stupid fast.

top - 19:00:39 up 3 days, 22:02, 1 user, load average: 0.00, 0.02, 0.01
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16316016 total, 11687504 free, 1821032 used, 2807480 buff/cache
KiB Swap: 16662524 total, 16662524 free, 0 used. 14079380 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND    
23619 ceph      20   0 1147072 324496  28944 S   1.3  2.0  11:48.93 ceph-osd   
 1120 root      20   0 1622524  97136  33344 S   1.0  0.6  37:15.53 dockerd    
25133 ceph      20   0  520532 144456  17240 S   0.7  0.9   5:12.71 ceph-mds   
22597 ceph      20   0 1523088 1.086g  21560 S   0.3  7.0   8:34.26 ceph-mon   
26307 root      20   0       0      0      0 S   0.3  0.0   0:00.11 kworker/1:0
27062 root      20   0   41800   3628   3032 R   0.3  0.0   0:00.06 top        
    1 root      20   0   39424   7384   3892 S   0.0  0.0   0:06.88 systemd    
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.03 kthreadd   
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.14 ksoftirqd/0
    7 root      20   0       0      0      0 S   0.0  0.0   0:38.60 rcu_sched  
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh     
    9 root      rt   0       0      0      0 S   0.0  0.0   0:00.06 migration/0
   10 root      rt   0       0      0      0 S   0.0  0.0   0:01.44 watchdog/0

I agree it doesn’t seem to be ceph which is struggling. Maybe try benchmarking copying a 10GB file (or a size > your available RAM, to avoid caching) locally on one of the nodes, vs copying the same file into cephfs?

docker-mailserver does include fetchmail support, although I haven’t played with it. Maybe you could use setup.sh to suck the mail off your old host that way?

D

On the local filesystem:

sudo dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.833802 s, 1.3 GB/s

On the cephfs filesystem:

sudo dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 30.706 s, 35.0 MB/s

That’s pretty terrible.
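One caveat on the numbers above: a single 1 GiB direct write measures sequential throughput, while a maildir sync like mbsync writes lots of small files, so per-file latency probably matters more than raw bandwidth. Here is a rough sketch for timing small synced writes. The function name `small_file_bench` and the default count are made up for illustration, and it assumes GNU dd (for `conv=fsync`).

```shell
# Hypothetical helper (not part of the recipe): write COUNT files of 4 KiB each
# into DIR, flushing every file to storage, and print the elapsed whole seconds.
small_file_bench() {
    dir=$1
    count=${2:-1000}
    mkdir -p "$dir" || return 1
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        # conv=fsync makes dd flush the file data before it exits (GNU dd)
        dd if=/dev/zero of="$dir/bench.$i" bs=4096 count=1 conv=fsync 2>/dev/null || return 1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo "$count files in $((end - start)) s"
}
```

Run it once against a directory on the local disk and once against a directory on the cephfs mount (for example something under /var/data, assuming that is where cephfs is mounted), then compare the two times.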

Okay, I’m trying to get rainloop up and running. It looks like they support sieve now. This is my first time trying to get a service up in this context, so I’m looking for advice :slight_smile: Here’s my docker-compose.yml. I get an error that I can’t get a secure connection to rainloop.gerg.org, so I’m wondering what I’m doing wrong:

version: '3'

services:
  mail:
    image: tvial/docker-mailserver:latest
    ports:
      - "25:25"
      - "587:587"
      - "993:993"
    volumes:
      - /var/data/mailserver/maildata:/var/mail
      - /var/data/mailserver/mailstate:/var/mail-state
      - /var/data/mailserver/config:/tmp/docker-mailserver
      - /var/data/mailserver/letsencrypt:/etc/letsencrypt
    env_file: /var/data/mailserver/.env
    networks:
      - internal
    deploy:
      replicas: 1

  rainloop:
    image: hardware/rainloop
    networks:
      - internal
      - traefik_public
    deploy:
      labels:
        - traefik.frontend.rule=Host:rainloop.gerg.org
        - traefik.docker.network=traefik_public
        - traefik.port=80
    volumes:
      - /var/data/mailserver/rainloop:/rainloop/data

networks:
  traefik_public:
    external: true
  internal:
    driver: overlay
    ipam:
      config:
        - subnet: 172.16.2.0/24

Looks like traefik.port should be 8888 instead of 80. Seems to work! Though I don’t have sieve working quite yet…

Excellent :slight_smile: Yes, traefik.port should be whatever port the “app” container listens on. Most containers listen on port 80, but if they don’t (and it’s not mentioned in their docs), you can find out by either:

  1. Inspecting their Dockerfile
  2. Inspecting the container using “docker inspect [container-id]”
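For the second option, something like the following can narrow the output down to just the declared ports. This is only a sketch: `exposed_ports` is a made-up wrapper name, and it reports what the image declares via EXPOSE, which is usually (but not necessarily) the port the app actually listens on.

```shell
# Hypothetical wrapper: print the ports a local image or container declares
# via EXPOSE, as JSON. Takes an image name or a container ID.
exposed_ports() {
    docker inspect --format '{{json .Config.ExposedPorts}}' "$1"
}
```

For example, `exposed_ports hardware/rainloop` should work once the image has been pulled on that node.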

I’m keen to know how sieve goes - you’re one step ahead of my recipe now :wink:

D

I didn’t have starttls turned on for sieve in rainloop. Once I did that, it seems to be working.

I ran into an odd issue with sieve rules. They make a bogus directory show up in the mail apps. The workaround is to put an override into config/dovecot.cf

plugin {
  sieve = /var/mail/sieve/%d/%n/.dovecot.sieve
  sieve_dir = /var/mail/sieve/%d/%n/sieve
}

I actually like this fix better than the one that’s working its way through mailserver.

Here’s a reference to the issue: https://github.com/tomav/docker-mailserver/issues/508

I have a problem with my mailserver. It’s getting pounded (apparently by one of my devices?) I see this in the logs:

Oct 20 00:53:14 011c9f505ae6 dovecot: imap-login: Maximum number of connections from user+IP exceeded (mail_max_userip_connections=10): user=<ggilley@gerg.org>, method=PLAIN, rip=10.255.0.2, lip=10.255.0.14, TLS, session=<GgPL5e9bGfQK/wAC>

Top shows:

I can’t start up a shell on the docker container:

oci runtime error: exec failed: container_linux.go:265: starting container process caused "process_linux.go:84: executing setns process caused \"exit status 15\""

And I can’t find a way to restart the service. Am I at the docker stack rm, docker stack deploy stage?

Remember, you’re using swarm ingress, so any inbound traffic will appear to come from that address. Maybe you’re being brute-forced? You should be able to shell into the container though, using something like this:

[root@ds3 ~]# docker exec -it 5947 bash
root@5947b30c889b:/#

Can you get logs using setup.sh?
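Once you can get at the logs, a quick tally of the rejections per user and remote IP might show whether it is one account hammering the server or several. A minimal sketch, assuming the log lines look like the one quoted above; the function name `count_conn_limit_hits` is made up.

```shell
# Hypothetical helper: read dovecot log lines on stdin and count the
# "Maximum number of connections" rejections per user and remote IP.
count_conn_limit_hits() {
    grep 'Maximum number of connections from user+IP exceeded' |
        sed -n 's/.*user=<\([^>]*\)>.*rip=\([^,]*\),.*/\1 \2/p' |
        sort | uniq -c | sort -rn
}
```

Usage would be along the lines of `docker service logs mailserver_mail 2>&1 | count_conn_limit_hits`, assuming the dovecot lines reach the service output; otherwise feed it the mail log file instead. Behind swarm ingress the remote IP column will mostly show the ingress address, so the per-user count is the more useful part.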

Instead of deleting and redeploying the whole stack, you could just stop the specific container, and let swarm auto-recover it, but it will most likely auto-start it on a different node, which might break your inbound mail NAT.

I couldn’t stop the docker processes. Couldn’t even reboot. I ended up having to cycle power on the machine to get things back.

Then mail wasn’t working. Turns out keepalived on one of the machines was in a weird state (I must have missed starting ipvs, but it was in /etc/rc.local), so the firewall was pointing at the wrong machine. That’s the downside of having the mail process pinned to one machine. Guess I need to spend some more time investigating mail in a swarm (unless you’ve already figured it out :slight_smile:)

I wish I’d already figured it out, but sadly not :frowning:

I ended up in a bad state again with “maximum number of connections exceeded”. My “solution” is to add this to dovecot.cf:

protocol imap {
  # Space separated list of plugins to load (default is global mail_plugins).
  #mail_plugins = $mail_plugins

  # Maximum number of IMAP connections allowed for a user from each IP address.
  # NOTE: The username is compared case-sensitively.
  mail_max_userip_connections = 100
}

Then to restart the mailserver (I couldn’t find a way to restart a service from docker stack):

sudo docker service update --force mailserver_mail

I thought the LetsEncrypt certificates would automatically renew. They didn’t, and mail is not happy. Did I miss something, or did I misconfigure something?

Greg

Eeeew. I thought so too, but I’m in the same boat. I’ll check it out…

OK, so preliminary research says we have to renew our certs by doing something like this:

cd /var/data/mailserver
docker run -ti --rm -v "$(pwd)"/letsencrypt:/etc/letsencrypt certbot/certbot renew

Sadly, this doesn’t work for my certs, which were registered --dns --manual - as it turns out, I have to regenerate them every 90 days :frowning:

Let me know how it goes?
D

No luck here:

Processing /etc/letsencrypt/renewal/mail.gerg.org.conf
-------------------------------------------------------------------------------
Cert is due for renewal, auto-renewing...
Could not choose appropriate plugin: The manual plugin is not working; there may be problems with your existing configuration.
The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using the manual plugin non-interactively.',)
Attempting to renew cert (mail.gerg.org) from /etc/letsencrypt/renewal/mail.gerg.org.conf produced an unexpected error: The manual plugin is not working; there may be problems with your existing configuration.
The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using the manual plugin non-interactively.',). Skipping.
All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/mail.gerg.org/fullchain.pem (failure)

-------------------------------------------------------------------------------

All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/mail.gerg.org/fullchain.pem (failure)
-------------------------------------------------------------------------------

I followed your recipe using the domain challenge, so I guess I also have to do the manual updates. Since I don’t have to worry about it again for 3 months, I’ll figure something out closer to that time. :slight_smile:

Yeah, likewise, I just manually regenerated my certs. Some ideas here - we could add a “cron-type” container à la Nextcloud, which attempts the cert renewal daily (it should do nothing provided the cert is not due for expiry). I noticed that the DNS TXT entry for the verification didn’t change, so it may be possible to fully automate the “manual” regeneration :slight_smile:
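As a rough sketch of what the entrypoint of such a “cron-type” container could look like (untested against a real cert; `renew_loop` and its arguments are made up for illustration):

```shell
# Hypothetical entrypoint: run `certbot renew` every INTERVAL seconds.
# certbot itself skips certificates that are not yet due for renewal.
# MAX_RUNS exists only so the loop can be exercised briefly; 0 means run forever.
renew_loop() {
    interval=${1:-86400}
    max_runs=${2:-0}
    runs=0
    while :; do
        # Keep looping even if one attempt fails, but say so on stderr
        certbot renew || echo "renewal attempt failed" >&2
        runs=$((runs + 1))
        if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

This assumes a container where `certbot` is on the PATH and /var/data/mailserver/letsencrypt is mounted at /etc/letsencrypt, as in the renew command earlier in the thread. It does not solve the manual-plugin problem by itself: going by the error above, certs registered with the manual plugin would still need an authentication script passed via --manual-auth-hook before a non-interactive renewal can succeed.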