Mail Server - Funky Penguin's Geek Cookbook


#1

Many of the recipes that follow require email access of some kind. It's normally possible to use a hosted service such as SendGrid, or just a Gmail account. If (like me) you'd like to self-host email for your stacks, then the following recipe provides a full-stack mail server running on the Docker HA swarm.


This is a companion discussion topic for the original entry at https://geek-cookbook.funkypenguin.co.nz/recipies/mail/

#2

The instructions say to create a directory called /var/data/mailserver, but the docker-compose file references /var/data/mail, and the git repository references /var/data/docker-mailserver.


#3

The docker-compose file references a traefik network. The git repository does not. Which is correct? Also, when I do a list of networks, there is no traefik network.

NETWORK ID          NAME                  DRIVER              SCOPE
b97bd270f512        bridge                bridge              local
c0342e092ab5        docker_gwbridge       bridge              local
c01707ef700f        host                  host                local
9dffna7c85f1        ingress               overlay             swarm
qexvt668odwt        mailserver_internal   overlay             swarm
e21b75fea00f        none                  null                local
5z7k8jarn7av        traefik_public        overlay             swarm

#4

Right. So this is a discrepancy between the original recipe, and the improvements I made in the pre-mix repository. If you’re building off pre-mix, then your traefik stack would have created an overlay network called “traefik_public” (I felt it was more descriptive than just “traefik”). I’ll tidy this up in the original recipe (ho ho, I sound like Colonel Sanders)


#5

The /var/data/ path discrepancy is due to multiple iterations of the recipe. Choose whatever suits you. In the pre-mix repo (i.e., what I use myself), I chose to name it docker-mailserver, which is slightly more verbose but also more descriptive :wink:


#6

Recipe updated, it should be consistent with pre-mix now :slight_smile:


#7

Hmm, the pre-mix doesn’t actually reference the traefik network. Is it necessary?


#8

I removed the reference to the traefik network and it seems to work fine.

I’m trying to sync my email over from my old server using mbsync. It’s super slow. Way slower than when I tried it before with a configured server rather than using docker. Have you seen performance problems using the cephfs? Any suggestions for how to pinpoint it?

CPU usage on all the nodes is at 3% or so. The nodes all have 16GB of RAM, with an NVMe drive for boot and a separate NVMe drive for storage. These things should be stupid fast.

top - 19:00:39 up 3 days, 22:02, 1 user, load average: 0.00, 0.02, 0.01
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16316016 total, 11687504 free, 1821032 used, 2807480 buff/cache
KiB Swap: 16662524 total, 16662524 free, 0 used. 14079380 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND    
23619 ceph      20   0 1147072 324496  28944 S   1.3  2.0  11:48.93 ceph-osd   
 1120 root      20   0 1622524  97136  33344 S   1.0  0.6  37:15.53 dockerd    
25133 ceph      20   0  520532 144456  17240 S   0.7  0.9   5:12.71 ceph-mds   
22597 ceph      20   0 1523088 1.086g  21560 S   0.3  7.0   8:34.26 ceph-mon   
26307 root      20   0       0      0      0 S   0.3  0.0   0:00.11 kworker/1:0
27062 root      20   0   41800   3628   3032 R   0.3  0.0   0:00.06 top        
    1 root      20   0   39424   7384   3892 S   0.0  0.0   0:06.88 systemd    
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.03 kthreadd   
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.14 ksoftirqd/0
    7 root      20   0       0      0      0 S   0.0  0.0   0:38.60 rcu_sched  
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh     
    9 root      rt   0       0      0      0 S   0.0  0.0   0:00.06 migration/0
   10 root      rt   0       0      0      0 S   0.0  0.0   0:01.44 watchdog/0

#9

I agree it doesn’t seem to be ceph which is struggling. Maybe try benchmarking copying a 10GB file (or a size > your available RAM, to avoid caching) locally on one of the nodes, vs copying the same file into cephfs?

docker-mailserver does include fetchmail support, although I haven’t played with it. Maybe you could use setup.sh to suck the mail off your old host that way?

D


#10

On the local filesystem:

sudo dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.833802 s, 1.3 GB/s

On the cephfs filesystem:

sudo dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 30.706 s, 35.0 MB/s

That’s pretty terrible.
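
A single 1 GiB direct write mostly measures streaming throughput, whereas a mail sync writes one small file per message (assuming Maildir storage), so per-file latency may matter more here. The sketch below is one way to compare that on both filesystems. small_file_bench is a hypothetical helper, the 4 KiB file size and the example paths are assumptions, and conv=fsync assumes GNU dd.

```shell
# Write COUNT small files into DIR, flushing each one, and report elapsed time.
# small_file_bench is a hypothetical helper, not part of ceph or docker-mailserver.
small_file_bench() {
  bench_dir="$1/bench.$$"
  bench_count="${2:-1000}"
  mkdir -p "$bench_dir" || return 1
  bench_start=$(date +%s)
  i=0
  while [ "$i" -lt "$bench_count" ]; do
    # One 4 KiB file per iteration; conv=fsync flushes it before dd exits.
    dd if=/dev/zero of="$bench_dir/msg.$i" bs=4k count=1 conv=fsync 2>/dev/null || return 1
    i=$((i + 1))
  done
  bench_end=$(date +%s)
  rm -rf "$bench_dir"
  echo "$bench_count files in $((bench_end - bench_start)) s"
}

# Example (paths are assumptions: a local directory, then the cephfs mount):
#   small_file_bench /tmp 1000
#   small_file_bench /var/data 1000
```

If the cephfs run is far slower than the local one, that would point at per-file (metadata and sync) latency rather than raw bandwidth.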


#11

Okay, I’m trying to get rainloop up and running. It looks like they support sieve now. This is my first time trying to get a service up in this context, so I’m looking for advice :slight_smile: Here’s my docker-compose.yml. I get an error that I can’t get a secure connection to rainloop.gerg.org, so I’m wondering what I’m doing wrong:

version: '3'

services:
  mail:
    image: tvial/docker-mailserver:latest
    ports:
      - "25:25"
      - "587:587"
      - "993:993"
    volumes:
      - /var/data/mailserver/maildata:/var/mail
      - /var/data/mailserver/mailstate:/var/mail-state
      - /var/data/mailserver/config:/tmp/docker-mailserver
      - /var/data/mailserver/letsencrypt:/etc/letsencrypt
    env_file: /var/data/mailserver/.env
    networks:
      - internal
    deploy:
      replicas: 1

  rainloop:
    image: hardware/rainloop
    networks:
      - internal
      - traefik_public
    deploy:
      labels:
        - traefik.frontend.rule=Host:rainloop.gerg.org
        - traefik.docker.network=traefik_public
        - traefik.port=80
    volumes:
      - /var/data/mailserver/rainloop:/rainloop/data

networks:
  traefik_public:
    external: true
  internal:
    driver: overlay
    ipam:
      config:
        - subnet: 172.16.2.0/24

#12

Looks like traefik.port should be 8888 instead of 80. Seems to work! Though I don’t have sieve working quite yet…


#13

Excellent :slight_smile: Yes, traefik.port should be whatever port the “app” container listens on. Most containers listen on port 80, but if they don’t (and it’s not mentioned in their docs), you can find out by either:

  1. Inspecting their Dockerfile
  2. Inspecting the container using “docker inspect [container-id]”
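
As a rough illustration of option 2, the declared ports can be pulled straight out of the inspect output. This is only a sketch: exposed_ports is a hypothetical helper, and it can only show ports the image declares with EXPOSE, which an image is not required to do.

```shell
# Filter: read docker inspect JSON on stdin, print each declared port once.
# exposed_ports is a hypothetical helper name, not a docker command.
exposed_ports() {
  grep -oE '[0-9]+/(tcp|udp)' | sort -u
}

# Usage (replace [container-id] with a real ID from "docker ps"):
#   docker inspect --format '{{json .Config.ExposedPorts}}' [container-id] | exposed_ports
```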

I’m keen to know how sieve goes - you’re one step ahead of my recipe now :wink:

D


#14

I didn’t have starttls turned on for sieve in rainloop. Once I did that, it seems to be working.


#15

I ran into an odd issue with sieve rules: they make a bogus directory show up in the mail apps. The workaround is to put an override into config/dovecot.cf:

plugin {
  sieve = /var/mail/sieve/%d/%n/.dovecot.sieve
  sieve_dir = /var/mail/sieve/%d/%n/sieve
}

I actually like this fix better than the one that’s working its way through mailserver.

Here’s a reference to the issue: https://github.com/tomav/docker-mailserver/issues/508


#16

I have a problem with my mailserver. It’s getting pounded (apparently by one of my devices?) I see this in the logs:

Oct 20 00:53:14 011c9f505ae6 dovecot: imap-login: Maximum number of connections from user+IP exceeded (mail_max_userip_connections=10): user=<ggilley@gerg.org>, method=PLAIN, rip=10.255.0.2, lip=10.255.0.14, TLS, session=<GgPL5e9bGfQK/wAC>

I can’t start up a shell on the docker container:

oci runtime error: exec failed: container_linux.go:265: starting container process caused "process_linux.go:84: executing setns process caused \"exit status 15\""

And I can’t find a way to restart the service. Am I at the docker stack rm, docker stack deploy stage?


#17

Remember, you’re using swarm ingress, so any inbound traffic will appear to come from that address. Maybe you’re being brute-forced? You should be able to shell into the container though, using something like this:

[root@ds3 ~]# docker exec -it 5947 bash
root@5947b30c889b:/#

Can you get logs using setup.sh?

Instead of deleting and redeploying the whole stack, you could just stop the specific container, and let swarm auto-recover it, but it will most likely auto-start it on a different node, which might break your inbound mail NAT.
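
Because every client appears to arrive from the ingress address, dovecot's per user+IP limit ends up counting all of a user's devices together. To see which user and remote IP are hitting the limit, the log lines can be tallied with something like the sketch below. count_exceeded is a hypothetical helper, and the usage line assumes the mail log reaches the service's stdout and that the service is named mailserver_mail.

```shell
# Filter: read mail log lines on stdin, print "count user rip" for every
# user/remote-IP pair that hit the connection limit, busiest first.
# count_exceeded is a hypothetical helper name.
count_exceeded() {
  grep 'Maximum number of connections from user+IP exceeded' |
    sed -n 's/.*user=<\([^>]*\)>.*rip=\([^,]*\),.*/\1 \2/p' |
    sort | uniq -c | sort -rn
}

# Usage (assumes the log is available through the swarm service logs):
#   docker service logs mailserver_mail 2>&1 | count_exceeded
```

If every line shows the same rip, that is consistent with the ingress address masking the real clients, so the tally tells you which account is affected but not which device.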


#18

I couldn’t stop the docker processes. Couldn’t even reboot. I ended up having to cycle power on the machine to get things back.

Then mail wasn’t working. Turns out keepalived on one of the machines was in a weird state (I must have missed starting ipvs, but it was in /etc/rc.local), so the firewall was pointing at the wrong machine. That's the downside of having the mail process pinned to one machine. Guess I need to spend some more time investigating mail in a swarm (unless you’ve already figured it out :slight_smile:)


#19

I wish I’d already figured it out, but sadly not :frowning:


#20

I ended up in a bad state again with “maximum number of connections exceeded”. My “solution” is to add this to dovecot.cf:

protocol imap {
  # Space separated list of plugins to load (default is global mail_plugins).
  #mail_plugins = $mail_plugins

  # Maximum number of IMAP connections allowed for a user from each IP address.
  # NOTE: The username is compared case-sensitively.
  mail_max_userip_connections = 100
}

Then to restart the mailserver (I couldn’t find a way to restart a service from docker stack):

sudo docker service update --force mailserver_mail