Beyond Polling: The Future of Real-Time Infrastructure

For years, polling has been the standard operating procedure. It’s time to admit it doesn’t work anymore.

When you’re managing a handful of servers, hitting an endpoint every few minutes feels harmless. But scale that to tens of thousands of devices and the math gets ugly fast. You’re burning CPU cycles, clogging network pipes, and still waiting minutes for commands to land. I’ve watched organizations throw hardware at this problem when the real issue was architectural.

There’s a better way. I’ve spent years building systems that command fleets of devices in near real-time without melting the infrastructure. The secret isn’t exotic technology.

The Polling Problem

Traditional device management works like this: every agent checks in on a schedule, asks “got anything for me?”, and the server responds. Multiply that by 100,000 devices polling every 5 minutes and you’re looking at over 300 requests per second, 24/7, even when nothing is happening.

Most of those requests return empty. The server says “no commands” and the agent goes back to sleep. You’ve paid the cost of a full HTTP round trip for zero value.

This scales linearly with your fleet size. Double your devices, double your baseline load. It’s predictable, which makes it feel safe. But predictable isn’t the same as efficient.
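The arithmetic above is worth making concrete. A few lines of Python (numbers are the illustrative ones from this section, not measurements) show both the baseline rate and the linear scaling:

```python
# Back-of-envelope polling load. Fleet size and interval are the
# illustrative figures discussed above, not real measurements.

def baseline_rps(devices: int, interval_seconds: int) -> float:
    """Steady-state requests per second generated by polling alone."""
    return devices / interval_seconds

# 100,000 devices checking in every 5 minutes:
print(baseline_rps(100_000, 5 * 60))   # ~333 requests/sec, around the clock

# Double the fleet, double the load -- the scaling is strictly linear:
print(baseline_rps(200_000, 5 * 60))   # ~667 requests/sec
```

Most of those requests do nothing, which is the whole problem: the load is constant whether or not any work exists.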

Push Architecture with MQTT

MQTT flips the model. Instead of devices asking for work, the server tells them when work exists. Devices maintain a persistent connection to a message broker and subscribe to topics relevant to them. When the server has a command, it publishes to that topic. The device gets notified instantly.

The protocol was designed for constrained environments (think IoT sensors on bad networks) so it’s lightweight by nature. A single broker can handle hundreds of thousands of concurrent connections. The overhead per connection is measured in bytes, not kilobytes.

For device management, this means your server only talks when it has something to say. No more constant “anything for me?” chatter.
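The inversion is easier to see in code. Here's a toy in-memory broker; a real deployment would use an actual MQTT broker such as Mosquitto, but the shape is the same: subscribers register interest once, and the publisher pays a cost only when there's something to say.

```python
# Toy in-memory sketch of the publish/subscribe model. This is an
# illustration of the pattern, not a real MQTT client or broker.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subs = defaultdict(list)  # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, payload):
        # Fan-out: every subscriber on this topic is notified immediately.
        for cb in self._subs[topic]:
            cb(topic, payload)

broker = Broker()
received = []
broker.subscribe("devices/abc123/commands", lambda t, p: received.append(p))

# Nothing happens until the server publishes -- no idle chatter at all.
broker.publish("devices/abc123/commands", b"")
print(received)  # [b'']
```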

The Doorbell Pattern: Decoupling Signal from Payload

Here’s where most implementations go wrong. They try to stuff command payloads into MQTT messages. This works until it doesn’t. Commands can be large. They might contain scripts, file paths, configuration blobs. MQTT wasn’t built for that.

The pattern I’ve found most reliable treats MQTT as a doorbell, not a mailbox. The broker sends a tiny “wake” signal (literally an empty message) to the device. The device then fetches the actual command over HTTPS.

This separation buys you several things:

Commands live in your database, not in transit. You can update or cancel them before the device picks them up. You get full HTTP semantics for retries, authentication, and logging. The MQTT layer stays simple and fast.

If MQTT fails entirely, the device falls back to polling on a longer interval. You lose the instant response but commands still land. Reliability without fragility.
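The doorbell handler itself stays tiny. A sketch of the device-side logic, with the HTTPS transport stubbed out (the function names and command shape here are hypothetical, not the actual agent's API):

```python
# Doorbell pattern sketch: the MQTT message carries no payload, it only
# triggers a fetch. fetch_command stands in for an authenticated HTTPS
# call to the command API; its name and shape are assumptions.

def on_wake(device_token, fetch_command, run_command):
    """Called when the empty MQTT 'doorbell' message arrives."""
    # The payload stays server-side until this moment, so commands can
    # be updated or cancelled right up until the device fetches them.
    command = fetch_command(device_token)  # HTTPS GET: retries, auth, logging
    if command is not None:
        run_command(command)

# Example wiring with stubbed transport:
executed = []
on_wake(
    "token-123",
    fetch_command=lambda tok: {"action": "restart-service"},
    run_command=executed.append,
)
print(executed)  # [{'action': 'restart-service'}]
```

Because the MQTT message is empty, the broker never needs to care about payload size, ordering, or retention; all of that stays in HTTP land where the tooling already exists.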

Topic Design at Scale

With thousands of devices across multiple customers, topic structure matters. A flat namespace turns into a mess quickly.

I use a three-tier hierarchy:

Device-specific topics for targeted commands. Each device subscribes to its own channel. To prevent enumeration attacks (someone guessing device IDs and subscribing), the topic name is a hash of the device token rather than the token itself.

Site-level topics for bulk operations. When you need to push a command to every device at a single customer location, you publish once. All devices at that site receive it simultaneously.

Broadcast topics for emergencies. Security patches, critical updates, anything that needs to hit the entire fleet immediately.

This structure means a single publish can reach one device, one thousand, or one hundred thousand depending on the topic. The broker handles fan-out efficiently.
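One way to sketch the three tiers in code. The topic prefixes and the choice of SHA-256 here are assumptions for illustration, not the actual scheme; the point is that the device-level topic is derived from a hash of the token, so topic names can't be enumerated by guessing device IDs.

```python
# Hypothetical three-tier topic scheme. Prefixes and hash algorithm
# (SHA-256) are illustrative choices, not the production values.
import hashlib

def device_topic(device_token: str) -> str:
    # Hashing the token prevents enumeration: knowing one device's ID
    # tells you nothing about any other device's topic name.
    digest = hashlib.sha256(device_token.encode()).hexdigest()
    return f"devices/{digest}/commands"

def site_topic(site_id: str) -> str:
    # One publish here fans out to every device at the site.
    return f"sites/{site_id}/commands"

# One publish here reaches the entire fleet.
BROADCAST_TOPIC = "fleet/broadcast"

print(device_topic("secret-token"))
print(site_topic("acme-hq"))
```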

Authentication Without Per-Device Credentials

Managing unique credentials for each device in a large fleet is a nightmare. Provisioning, rotation, revocation: the operational overhead grows with every device you add.

A simpler approach: all agents share the same broker credentials. Those credentials are encrypted locally on each device using a key derived from machine-specific attributes. The broker authenticates the connection, but device identity is established at the application layer through tokens exchanged over HTTPS.

This shifts complexity from the MQTT broker to your application server, where you already have robust identity management. The broker becomes a dumb pipe, which is exactly what you want at scale.
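A sketch of the key-derivation side. The attribute names, salt, and iteration count below are placeholders, and a real agent would feed this key into a proper cipher (AES-GCM, say) rather than stop here; the idea is just that the key is reproducible on the same machine and useless copied off it.

```python
# Sketch: derive a device-local encryption key from machine-specific
# attributes. Attribute names, salt, and iteration count are
# illustrative; a real agent would use this key with a real cipher.
import hashlib

def derive_local_key(machine_attrs: dict[str, str]) -> bytes:
    # Stable serialization of machine identity (hostname, serial, etc.).
    material = "|".join(f"{k}={machine_attrs[k]}" for k in sorted(machine_attrs))
    # PBKDF2 stretches the material into a 32-byte key. The same machine
    # always derives the same key; a copied credential blob is useless
    # elsewhere because the attributes won't match.
    return hashlib.pbkdf2_hmac("sha256", material.encode(), b"agent-salt", 100_000)

key = derive_local_key({"hostname": "dev-01", "board_serial": "SN-42"})
print(len(key))  # 32
```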

The Reliability Guarantee

Push systems have a reputation for being finicky. Connections drop. Brokers restart. Networks hiccup. If your architecture assumes perfect uptime, you’re going to have a bad time.

The solution is layered reliability. MQTT with auto-reconnect handles transient failures. Persistent sessions mean subscriptions survive brief disconnects. And underneath it all, a polling fallback (on a long interval, say 60 minutes) ensures that even if push is completely broken, devices will eventually sync.

This hybrid approach gives you the speed of push with the reliability of pull. Near-instant in the common case, guaranteed delivery in the worst case.
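The decision logic on each tick of the agent's main loop is small enough to sketch directly. The interval and the state names here are illustrative, not the agent's actual implementation:

```python
# Hybrid push/pull decision sketch. Interval and return values are
# illustrative; a real agent wires these into its connection loop.

FALLBACK_POLL_SECONDS = 60 * 60  # long-interval safety net

def next_action(push_connected: bool, seconds_since_poll: float) -> str:
    """Decide what the agent should do on this tick of its main loop."""
    if seconds_since_poll >= FALLBACK_POLL_SECONDS:
        return "poll"        # guaranteed-delivery path, even if push is healthy
    if not push_connected:
        return "reconnect"   # transient failure: let auto-reconnect retry
    return "wait"            # common case: stay idle until the doorbell rings

print(next_action(True, 10))      # wait
print(next_action(False, 10))     # reconnect
print(next_action(True, 7200))    # poll
```

Note the fallback poll fires even when push looks healthy: it's what turns "probably delivered" into "eventually delivered, guaranteed."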

What This Looks Like in Practice

I’ve applied these patterns to systems managing a six-figure endpoint fleet across more than a hundred separate customer environments. The before and after:

Before (polling every X minutes): Constant baseline load. Command latency measured in minutes. Servers sized for peak polling traffic that rarely did useful work.

After (MQTT push with polling fallback): Near-zero idle load. Commands land in seconds. Infrastructure costs dropped because we weren’t paying for empty requests.

The broker (running on modest hardware) handles the connection load without breaking a sweat. The real work happens at the application layer where it belongs.

Trade-offs Worth Knowing

MQTT isn’t free. You’re trading request-response simplicity for persistent connection management. Your operations team needs to monitor broker health. Firewalls and proxies that weren’t designed for long-lived connections can cause headaches.

Debugging gets harder too. With polling, every interaction is a discrete HTTP request you can trace. With push, you’re dealing with event streams and subscription state. Tooling matters more.

For small fleets (hundreds of devices), polling is probably fine. The operational simplicity outweighs the efficiency gains from push. But somewhere between a thousand and ten thousand devices, the balance tips. By the time you hit six figures, push isn’t optional.

The Larger Point

The shift from polling to push isn’t really about MQTT. It’s about designing systems that match reality. Devices don’t need constant attention. They need instant response when something happens and silence otherwise.

This principle applies beyond device management. Any system where you’re repeatedly asking “has anything changed?” is a candidate for inversion. Let the source of truth notify observers. Stop asking questions you already know the answer to.

Building infrastructure this way requires more upfront thought. You can’t just slap a REST endpoint on everything and call it done. But the payoff (lower costs, faster response, better scalability) compounds over time.

I’ve found that the organizations willing to make this shift end up with systems that feel qualitatively different to operate. Less firefighting, more building. That’s the goal.

These patterns aren't theoretical. They've held up in production across six-figure device fleets.

jaysonbrush

I lead technology teams and think a lot about what makes organizations actually work. I started writing here to help me work through ideas and share what I’ve learned along the way.