
The ticket said “internet is down.” Then, four minutes later, “never mind, it’s back.” Then, twenty minutes after that, “it’s down again.”
That pattern is a special kind of hell. A clean outage you can chase. A flapping one that heals itself before you SSH in just laughs at you.
I got a client on the phone. Their whole site would blackhole — every desktop, every phone, the printer, all of it — for one to five minutes, then snap back like nothing happened. No schedule. No trigger anyone could name. The ISP swore the circuit was clean, and for once the ISP was right.
The investigation
I started where you start: at the edge. The gateway was up. WAN was up. The firewall logs were boring in the way you want them to be boring. No flapping interface, no CPU spike, no DHCP storm in the leases.
So I sat on a workstation and just hammered the gateway while I waited for the next outage.
ping -i 0.5 192.0.2.1
It ran fine for ten minutes. Then — dead air. Eight, nine, ten dropped replies. Total blackhole. Then it came back on its own.
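If you’d rather not stare at the terminal waiting for dead air, the gaps can be pulled out of a saved ping log afterwards. This is a minimal sketch, assuming Linux iputils-style output where each reply line carries `icmp_seq=N`; the function name `find_blackouts` and the three-reply threshold are made up for illustration.

```python
import re

SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def find_blackouts(ping_lines, interval=0.5, min_lost=3):
    """Return (first_lost_seq, lost_count, approx_seconds) per run of missing replies."""
    answered = []
    for line in ping_lines:
        match = SEQ_RE.search(line)
        # Only count real echo replies; "Destination Host Unreachable"
        # lines also carry an icmp_seq but are not answers.
        if match and "bytes from" in line:
            answered.append(int(match.group(1)))

    blackouts = []
    for prev, cur in zip(answered, answered[1:]):
        lost = cur - prev - 1
        if lost >= min_lost:
            # Duration is an estimate: lost replies times the send interval.
            blackouts.append((prev + 1, lost, lost * interval))
    return blackouts


sample = [
    "64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.41 ms",
    "64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.39 ms",
    "64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.44 ms",
    "64 bytes from 192.0.2.1: icmp_seq=14 ttl=64 time=0.40 ms",
]
print(find_blackouts(sample))  # [(4, 10, 5.0)]
```

The `interval` default matches the `-i 0.5` in the ping command. The sketch ignores sequence-number wraparound, so it is only good for runs shorter than 65,536 probes.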
The instant it died, I dumped the ARP table.
arp -a 192.0.2.1
And there it was. The gateway’s IP was mapped to a MAC address I did not recognize. Not the firewall’s MAC. Some OUI I had to look up by hand.
I watched it flap in real time:
┌─────────────────────────────────────────────────────────────┐
│ arp -a 192.0.2.1 (watched over ~30s) │
├─────────────────────────────────────────────────────────────┤
│ t+0s 192.0.2.1 ► aa:bb:cc:11:22:33 (firewall) ✓ │
│ t+6s 192.0.2.1 ► de:ad:be:ef:00:99 (???) ✗ │
│ t+9s 192.0.2.1 ► de:ad:be:ef:00:99 (???) ✗ │
│ t+14s 192.0.2.1 ► aa:bb:cc:11:22:33 (firewall) ✓ │
│ t+19s 192.0.2.1 ► de:ad:be:ef:00:99 (???) ✗ │
└─────────────────────────────────────────────────────────────┘
▼ when the IP pointed at de:ad:..:99, traffic vanished
When the gateway IP pointed at the firewall, the network worked. When it pointed at that mystery MAC, every packet bound for the internet got handed to a device that did absolutely nothing with it. Classic ARP poisoning. The only question was who.
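Watching that flap by hand works, but a small poller makes it harder to miss. The sketch below assumes a Linux host with iproute2, where `ip neigh show <ip>` prints a line containing `lladdr <mac>`. The names `parse_mac`, `classify`, `transitions`, and `watch` are hypothetical helpers, not part of any tool.

```python
import re
import subprocess
import time

MAC_RE = re.compile(r"lladdr\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.IGNORECASE)


def parse_mac(neigh_output):
    """Pull the link-layer address out of `ip neigh show <ip>` output, or None."""
    match = MAC_RE.search(neigh_output)
    return match.group(1).lower() if match else None


def classify(mac, expected_mac):
    """Label a sampled MAC relative to the gateway's known-good MAC."""
    if mac is None:
        return "no entry"
    return "ok" if mac == expected_mac.lower() else "ROGUE"


def transitions(samples, expected_mac):
    """Given (seconds, mac) samples, keep only the points where the MAC changed."""
    events, last = [], object()  # sentinel so the first sample is always recorded
    for seconds, mac in samples:
        if mac != last:
            events.append((seconds, mac, classify(mac, expected_mac)))
            last = mac
    return events


def watch(gateway_ip, expected_mac, period=1.0):
    """Poll the local neighbor table and print a line whenever the entry flips."""
    start, last = time.monotonic(), object()
    while True:
        result = subprocess.run(
            ["ip", "neigh", "show", gateway_ip],
            capture_output=True, text=True, check=False,
        )
        mac = parse_mac(result.stdout)
        if mac != last:
            elapsed = time.monotonic() - start
            print(f"t+{elapsed:.0f}s {gateway_ip} -> {mac} ({classify(mac, expected_mac)})")
            last = mac
        time.sleep(period)
```

Calling `watch("192.0.2.1", "aa:bb:cc:11:22:33")` on an affected workstation would log each flip until interrupted. It only sees what that one host’s cache believes, which is exactly the view that matters for that host’s traffic.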
The “aha”
I looked up the OUI. The mystery MAC belonged to a Tuya-based Wi-Fi module — the kind that ships inside cheap smart-home gadgets by the millions.
Then I cross-referenced the DHCP leases for that MAC and found a hostname that made me laugh out loud: a smart window blind controller. Somebody had put a $30 motorized blind on the corporate flat network.
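That cross-reference is easy to script. Here’s a rough sketch under two loud assumptions: the DHCP server writes dnsmasq-style lease lines (`<expiry> <mac> <ip> <hostname> <client-id>`), and the lease IP and hostname below are invented for the example. The story doesn’t say which DHCP server was in play, so adjust the parsing to whatever yours emits.

```python
def oui(mac):
    """Normalize a MAC and return its first three octets (the vendor OUI)."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a 6-octet MAC: {mac!r}")
    return ":".join(octets[:3])


def find_lease(lease_lines, mac):
    """Return (ip, hostname) for a MAC from dnsmasq-style lease lines, or None."""
    target = mac.lower()
    for line in lease_lines:
        fields = line.split()
        # fields: expiry, mac, ip, hostname, client-id
        if len(fields) >= 4 and fields[1].lower() == target:
            return fields[2], fields[3]
    return None


# Hypothetical lease data; only the rogue MAC comes from the investigation.
leases = [
    "1700000000 aa:bb:cc:44:55:66 192.0.2.23 desk-pc-07 *",
    "1700000300 de:ad:be:ef:00:99 192.0.2.57 smart-blind *",
]

rogue = "de:ad:be:ef:00:99"
print(oui(rogue))                # de:ad:be
print(find_lease(leases, rogue)) # ('192.0.2.57', 'smart-blind')
```

The OUI prefix is what you feed to a vendor lookup (the IEEE registry or a local copy of it); the lease match is what turns a vendor name into a specific box on a specific shelf.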
That little NIC was broadcasting unsolicited ARP replies claiming to be 192.0.2.1. Every client on the segment believed it, updated its ARP cache, and started shipping everything bound for its default gateway straight into a window covering.
The fix
First, stop the bleeding. I pinned the gateway’s real MAC on the affected machines so they’d stop trusting the lie while I worked.
# Linux
sudo ip neigh replace 192.0.2.1 lladdr aa:bb:cc:11:22:33 dev eth0 nud permanent
Then the real fix — get IoT off the flat network entirely and turn on the switch protections that should’ve been on from day one.
BEFORE                            AFTER
──────                            ─────
VLAN 1 (everything)               VLAN 10  desktops/servers
 ├─ desktops                      VLAN 20  voice
 ├─ servers                       VLAN 90  IoT  ◄── blind lives here,
 ├─ phones                                          firewalled, no L2
 └─ smart blind ◄── poisoner                        reach to clients
! Cisco-style switch hardening (concept)
! DHCP snooping: trust only the uplink for DHCP; drop rogue offers elsewhere
ip dhcp snooping
ip dhcp snooping vlan 90
! Dynamic ARP Inspection: validate ARP against the snooping bindings
ip arp inspection vlan 90
! Gi1/0/24 is the uplink: trusted for both features
interface Gi1/0/24
 ip dhcp snooping trust
 ip arp inspection trust
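To make the Dynamic ARP Inspection idea concrete, here’s a toy model of the decision the switch makes per ARP packet. It is a simplification, not Cisco’s implementation: real DAI has rate limits, ARP ACLs for static hosts, and extra validation options. The IoT subnet (198.51.100.0/24), the access port `Gi1/0/7`, and the firewall-side MAC are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """One row of the DHCP snooping table: who got which IP, and where."""
    mac: str
    ip: str
    vlan: int
    port: str


@dataclass(frozen=True)
class ArpPacket:
    sender_mac: str
    sender_ip: str
    vlan: int
    port: str


def dai_permits(pkt, bindings, trusted_ports):
    """Forward ARP from trusted ports; elsewhere require an exact binding match."""
    if pkt.port in trusted_ports:
        return True
    claimed = Binding(pkt.sender_mac, pkt.sender_ip, pkt.vlan, pkt.port)
    return claimed in bindings


# Hypothetical state after the blind gets a lease on the IoT VLAN.
bindings = {Binding("de:ad:be:ef:00:99", "198.51.100.57", 90, "Gi1/0/7")}
trusted = {"Gi1/0/24"}

honest = ArpPacket("de:ad:be:ef:00:99", "198.51.100.57", 90, "Gi1/0/7")
spoofed = ArpPacket("de:ad:be:ef:00:99", "198.51.100.1", 90, "Gi1/0/7")
gateway = ArpPacket("aa:bb:cc:11:22:33", "198.51.100.1", 90, "Gi1/0/24")

print(dai_permits(honest, bindings, trusted))   # True
print(dai_permits(spoofed, bindings, trusted))  # False
print(dai_permits(gateway, bindings, trusted))  # True
```

The spoofed packet is the blind claiming the gateway’s IP. Its MAC and port match a lease, but the IP does not, so the switch drops it before any client sees it. This is also why DAI depends on DHCP snooping: without the binding table there is nothing to validate against.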
Last step: I yanked the blind, pushed its vendor firmware update, and parked it on the isolated IoT VLAN where it can lie about being the gateway all it wants — and nobody who matters will hear it.
Why it happened
Cheap Tuya firmware is a grab bag. Some builds have buggy network stacks that emit malformed or unsolicited ARP — gratuitous ARP gone feral, sometimes triggered by a Wi-Fi reconnect, which explains the random timing. Could also be straight-up malicious; with no-name IoT you genuinely can’t tell, and it doesn’t change the remediation.
Either way, a flat L2 network trusts every device equally. ARP has no authentication. Whatever shouts loudest and last, wins. Put one mouthy gadget on that wire and it can pretend to be anything — including the door to the internet.
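The “loudest and last wins” behavior is simple enough to model in a few lines. This is a deliberately simplified cache, not any real OS’s neighbor table (real stacks add timers, states, and some rules about when unsolicited replies are accepted). The class and method names are illustrative. The `pin` method stands in for a static entry like the `nud permanent` one used as the stopgap.

```python
class ArpCache:
    """Toy ARP cache: unauthenticated updates, with optional static entries."""

    def __init__(self):
        self._entries = {}
        self._pinned = set()

    def pin(self, ip, mac):
        """Install a static mapping that later ARP traffic cannot overwrite."""
        self._entries[ip] = mac
        self._pinned.add(ip)

    def on_arp(self, ip, mac):
        """Handle an ARP claim. Nothing proves the sender owns the IP."""
        if ip in self._pinned:
            return  # static entry: ignore the claim
        self._entries[ip] = mac  # last writer wins

    def lookup(self, ip):
        return self._entries.get(ip)


GATEWAY = "192.0.2.1"
FIREWALL = "aa:bb:cc:11:22:33"
BLIND = "de:ad:be:ef:00:99"

victim = ArpCache()
victim.on_arp(GATEWAY, FIREWALL)  # legitimate reply
victim.on_arp(GATEWAY, BLIND)     # unsolicited claim arrives later
print(victim.lookup(GATEWAY))     # de:ad:be:ef:00:99

pinned = ArpCache()
pinned.pin(GATEWAY, FIREWALL)
pinned.on_arp(GATEWAY, BLIND)     # ignored
print(pinned.lookup(GATEWAY))     # aa:bb:cc:11:22:33
```

The unpinned cache ends up pointing at the blind simply because its claim arrived last. Pinning fixes one host at a time, which is why it is a stopgap and the switch-level protections are the real fix.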
Takeaways
- On weird, intermittent, self-healing outages, check the gateway’s ARP entry FIRST. arp -a <gateway-ip> flapping to an unknown MAC is a five-second smoking gun.
- Match a rogue MAC to its OUI and your DHCP leases. That’s how a “mystery device” became “the smart blind in the break room.”
- IoT does not belong on your flat network. Ever. Its own VLAN/SSID, firewalled off from clients, no L2 reach to anything that matters.
- Turn on DHCP Snooping and Dynamic ARP Inspection. They exist precisely to kill rogue gateways and ARP lies at the switch.
- Cheap IoT is an untrusted host by default — bug or malice, you treat it the same: isolate, update, and never let it speak for the gateway.