DRBD recovery from split brain

· by Artem Sidorenko · Read in about 2 min · (255 words)

If you get degraded DRBD with log messages like “Split-Brain detected but unresolved, dropping connection!”, you have to manually resolve split brain situation.

Possible reason for such situation

  • You switch all cluster nodes in standby via cluster suite (i.e. pacemaker) at the same time, so you don’t have any active drbd instance running.
  • You switch the node, which was in Secondary status in drbd last time, online.
  • You switch other nodes online

Result:

#/proc/drbd on the node1
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by gardner@, 2011-12-1
2 23:52:00
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:76

#log message on the node1
Mar  7 15:38:05 node1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!

#/proc/drbd on the node2
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by gardner@, 2011-12-1
2 23:52:00
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:144 dr:4205 al:5 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:100

Manually resolve this split brain situation

Start drbd manually on both nodes

Define one node as secondary and discard data on this

drbdadm secondary all
drbdadm disconnect all
drbdadm -- --discard-my-data connect all

Think about the right node to discard the data, otherwise you can lose some data

Define another node as primary and connect

drbdadm primary all
drbdadm disconnect all
drbdadm connect all

Configure drbd for auto split brain resolving

Place this configuration in drbd.conf on both nodes

net {
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
}

See too