It looks like an ISP DNS blockage caused one of our processes to fall behind, and around 5900 small sub-processes piled up behind it. Then the OOM killer triggered. The OOM killer is actually wonderful in this case, as it logged the state of everything to disk, including the per-zone free lists:
Jan 4 08:55:29 kernel: [5384435.328060] Node 0 DMA: 2*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15744kB
Jan 4 08:55:29 kernel: [5384435.329687] Node 0 DMA32: 14*4kB 43*8kB 62*16kB 35*32kB 7*64kB 16*128kB 67*256kB 21*512kB 2*1024kB 3*2048kB 20*4096kB = 123024kB
Jan 4 08:55:29 kernel: [5384435.331315] Node 0 Normal: 15448*4kB 66*8kB 7*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62432kB
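How to read these zone lines: each N*SIZEkB term is N free blocks of SIZE kB on that zone's buddy-allocator free list, and the number after = is their sum. The Python sketch here (summarize_zone is a hypothetical helper, not part of any kernel tool) totals one line and reports the largest size that still has a free block, plus the share of free memory stuck in the smallest blocks, as a rough fragmentation signal.

```python
import re

# An "N*SIZEkB" term means N free blocks of SIZE kB on one buddy free list.
TERM = re.compile(r"(\d+)\*(\d+)kB")
ZONE = re.compile(r"Node \d+ (\w+):")

def summarize_zone(line: str) -> dict:
    """Summarize one per-zone free-list line from an OOM-killer report."""
    match = ZONE.search(line)
    zone = match.group(1) if match else "?"
    counts = [(int(n), int(size)) for n, size in TERM.findall(line)]
    free_kb = sum(n * size for n, size in counts)
    # Largest block size that still has at least one free block.
    largest_kb = max((size for n, size in counts if n > 0), default=0)
    # Share of free memory held in the smallest block size (4 kB here).
    smallest = min((size for _, size in counts), default=0)
    small_kb = sum(n * size for n, size in counts if size == smallest)
    pct = round(100 * small_kb / free_kb, 1) if free_kb else 0.0
    return {"zone": zone, "free_kb": free_kb,
            "largest_free_block_kb": largest_kb, "pct_in_smallest": pct}

if __name__ == "__main__":
    line = ("Node 0 Normal: 15448*4kB 66*8kB 7*16kB 0*32kB 0*64kB 0*128kB "
            "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62432kB")
    print(summarize_zone(line))
```

For the Normal line it reports 62432 kB free, a largest free block of 16 kB and 99.0% of the free memory in 4 kB blocks, whereas DMA32 still has twenty free 4096 kB blocks.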
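In the Normal zone, 61792 kB of the 62432 kB free is in 4 kB blocks and nothing larger than 16 kB is free. So it is either fragmentation or simple RAM exhaustion, caused by the runaway small processes that piled up behind the blocked DNS. Time to rejig the app to handle DNS going down. :-)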
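The post does not say what language the app is in or how it spawns its sub-processes, so this is only a minimal Python sketch of one way to keep a DNS stall from turning into thousands of waiting workers: cap the number of in-flight jobs and shed the rest. Threads stand in for the sub-processes, and BoundedRunner and resolve are hypothetical names.

```python
import socket
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

def resolve(host: str) -> Optional[str]:
    """Return one address for host, or None if resolution fails."""
    # getaddrinfo has no timeout argument, so a stalled resolver can still
    # block the calling worker; the cap in BoundedRunner limits how many pile up.
    try:
        return socket.getaddrinfo(host, None)[0][4][0]
    except socket.gaierror:
        return None

class BoundedRunner:
    """Run jobs on a fixed pool, shedding new work once `limit` are in flight."""

    def __init__(self, workers: int, limit: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._slots = threading.BoundedSemaphore(limit)
        self.rejected = 0  # only accurate if submit() is called from one thread

    def submit(self, fn: Callable, *args) -> Optional[Future]:
        # Non-blocking acquire: when every slot is taken (e.g. all jobs are
        # stuck waiting on DNS), refuse the job instead of queueing it.
        if not self._slots.acquire(blocking=False):
            self.rejected += 1
            return None
        fut = self._pool.submit(fn, *args)
        # Free the slot when the job finishes, whether it succeeded or raised.
        fut.add_done_callback(lambda _f: self._slots.release())
        return fut

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

With something like BoundedRunner(workers=8, limit=64) and runner.submit(resolve, host), a hung resolver leaves at most 64 jobs outstanding and the rest are counted in runner.rejected rather than accumulating. Shedding loses work, so whether to drop, retry later, or fall back to cached addresses is an application decision.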