Architecting a High-Throughput & Fault-Tolerant IoT Pipeline

Date // MAR 23, 2026 · Read // 10 MIN READ

View the complete source code and Docker setup on GitHub ↗

Tracking a handful of vehicles is a simple CRUD operation. Tracking a fleet of 10,000+ moving vehicles, each transmitting GPS coordinates every second, is a distributed systems challenge.

In the logistics and heavy machinery industries, real-time visibility is the backbone of operations. However, processing raw telematics data at this scale introduces two major engineering bottlenecks: The Data Deluge (massive I/O load) and Spatial Jitter (noisy, jumping GPS signals).

Feeding dirty, high-frequency coordinates directly to a monitoring dashboard destroys ETA accuracy, triggers false alarms, and quickly crashes synchronous core databases.

The Decoupled Architecture

Instead of a monolithic approach that tries to handle connections, computations, and database writes all at once, I designed Argos: a polyglot, event-driven pipeline that assigns the right tool to the right job.

Fig 1. The Chaos Test: Node.js seamlessly absorbing telemetry data while the Golang worker is intentionally taken offline.

1. The Gateway: Non-Blocking Ingestion (Node.js)

The front door of the system needs to hold tens of thousands of open WebSocket connections without choking. Node.js, with its single-threaded Event Loop, is the undisputed champion of I/O-bound tasks.

It acts purely as a pass-through layer. It accepts the WebSocket payload and immediately hands it off to the message broker, refusing to do any heavy mathematical lifting.

// Node.js / Ingestion Gateway (Micro-Batching Implementation)
const { WebSocketServer } = require('ws');
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'ingestion-gateway', brokers: ['localhost:9092'] });
const producer = kafka.producer();
const wss = new WebSocketServer({ port: 8080 });

let messageBuffer = [];
const BATCH_SIZE = 5000;
const FLUSH_INTERVAL_MS = 1000;

producer.connect().catch(console.error); // Connect once at startup

async function flushBatch() {
  if (messageBuffer.length === 0) return;
  const batchToSend = messageBuffer;
  messageBuffer = [];

  await producer.send({
    topic: 'raw-telematics-data',
    messages: batchToSend,
  });

  console.log(
    `[Ingestion] 📦 Dispatched batch of ${batchToSend.length} spatial coordinates to Kafka.`
  );
}

wss.on('connection', (ws) => {
  ws.on('message', (message) => {
    let rawData;
    try {
      rawData = JSON.parse(message);
    } catch {
      return; // Drop malformed payloads instead of crashing the gateway
    }

    messageBuffer.push({
      key: String(rawData.truck_id), // Key by truck for per-vehicle ordering
      value: JSON.stringify(rawData),
    });

    // Fire and forget: offload a full batch to Kafka without blocking the Event Loop
    if (messageBuffer.length >= BATCH_SIZE) {
      flushBatch().catch(console.error);
    }
  });
});

// Time-based flush so partial batches never sit in memory indefinitely
setInterval(() => flushBatch().catch(console.error), FLUSH_INTERVAL_MS);

2. The Shock Absorber (Apache Kafka)

If the database goes down, or the workers are overwhelmed, we cannot afford to drop vehicle locations. Kafka sits in the middle as the system's shock absorber. It queues the massive influx of spatial data, protecting downstream databases from sudden traffic spikes (rush hours) and ensuring zero data loss during worker deployments.

3. The Worker: Concurrent Processing (Golang)

Applying spatial smoothing algorithms (like the Kalman Filter) to clean up GPS multipath errors across thousands of trucks is highly CPU-bound.

Golang picks up the queued data from Kafka. Utilizing lightweight Goroutines, it processes the mathematical algorithms concurrently. If the load becomes too heavy, we simply spin up another Golang instance; Kafka's Consumer Groups automatically rebalance the partitions.

// Golang / Stream Processor & Upsert Logic
func processTelemetry(ctx context.Context, payload []byte, coll *mongo.Collection, alpha float64) {
    var rawData TelematicsData
    if err := json.Unmarshal(payload, &rawData); err != nil {
        log.Printf("[Worker] ⚠️ Skipping malformed payload: %v", err)
        return
    }

    // Apply Mathematical Smoothing (Kalman Filter logic abstracted)
    cleanData := applySpatialSmoothing(rawData, alpha)

    // Real-Time State: Upsert instead of Append-Only
    filter := bson.M{"truck_id": cleanData.TruckID}
    update := bson.M{"$set": cleanData}
    opts := options.Update().SetUpsert(true)

    // Overwrite the existing document to maintain a lightning-fast state store
    if _, err := coll.UpdateOne(ctx, filter, update, opts); err != nil {
        log.Printf("[Worker] ⚠️ Upsert failed for %s: %v", cleanData.TruckID, err)
        return
    }

    // Thread-safe atomic counter for logging
    current := atomic.AddUint64(&processedCount, 1)
    if current%500 == 0 {
        log.Printf("[Worker] ✅ Sanitized & persisted %d total trajectories", current)
    }
}

The Real-Time State Strategy

Notice the SetUpsert(true) in the Golang snippet above. Instead of generating millions of append-only rows per day—which would bloat the database—the worker constantly overwrites the existing coordinates for each specific truck_id.

This creates a lightweight, highly optimized "Real-Time State" collection in MongoDB. The frontend map application can query this collection with sub-millisecond latency to get the absolute latest fleet positions without scanning historical clutter.


📊 Performance & Resilience Benchmarks

To prove this architecture is robust under extreme conditions, I subjected the pipeline to three distinct stress tests locally.

1. The Load Test: Scaling to 10,000 Req/Sec

Fig 2. Processing 10,000 requests per second with micro-batching.

2. The Chaos Test: Fault Tolerance in Action

Fig 3. Intentional worker outage simulated via Chaos Testing.

3. The Data Sanitization Test: Delivering Business Value

| Metric | Before (Raw Ingestion) | After (Golang + Kalman Filter) |
| --- | --- | --- |
| Data Quality | Erratic, bouncing coordinates | Smooth, continuous trajectory |
| Business Impact | Triggers false alarms constantly | High ETA accuracy for logistics planning |

Engineering Trade-offs

No architecture is a silver bullet. This system embraces the following trade-offs:

  1. Eventual Consistency over Strict Real-time: By introducing Kafka as a buffer, there is a sub-second latency from the time a truck emits a signal to when it appears in the database. This is a deliberate and acceptable trade-off to guarantee database stability under massive loads.
  2. Operational Complexity over Monolithic Simplicity: Managing a polyglot microservices stack (Node.js, Go, Kafka) increases deployment complexity. I mitigated this by fully containerizing the infrastructure using Docker Compose.

Dive Deeper

Interested in the nitty-gritty configuration of the Kafka brokers, the Micro-batching implementation, or the Go channel setup? The entire project, complete with a local RAM-diet Docker setup and a load simulator, is available for review.

Explore the Argos Data Pipeline repository on GitHub ↗