A modern connected vehicle generates between 25 and 40 gigabytes of data per hour of driving. Across a fleet of 100,000 vehicles, that translates to petabytes of raw telemetry every day. For automotive cybersecurity teams, this data is both the foundation of effective security monitoring and an engineering challenge of enormous scale. Without a well-architected telemetry ingestion pipeline, security-critical signals drown in noise, detection latency climbs from seconds to hours, and storage costs spiral out of control.

This guide provides a comprehensive architecture reference for building vehicle telemetry ingestion systems that support fleet-scale security monitoring. We cover the full data path from on-vehicle data sources through edge preprocessing, transport protocols, cloud ingestion, time-series storage, data lake architecture, retention policies, bandwidth optimization, edge-to-cloud security, and compliance with data sovereignty requirements across global markets.

[Figure: Fleet (N vehicles, 10 TB/day raw) → Edge Gateway (2 TB) → Message Broker (Kafka / MQTT, 500 GB) → Stream Processing → Data Lake (200 GB stored)]
Telemetry data pipeline: edge filtering, message brokering, and stream processing reduce raw fleet data from terabytes to manageable volumes.

Vehicle Telemetry Data Sources

Before designing an ingestion architecture, you must understand the diversity of data sources that feed into a vehicle security monitoring pipeline. Each source has distinct characteristics in terms of data rate, format, security relevance, and processing requirements:

CAN Bus Logs

The Controller Area Network (CAN) bus remains the primary communication backbone in most vehicles. CAN frames are small (8 bytes for classic CAN, up to 64 bytes for CAN FD) but arrive at extremely high frequency — a typical vehicle generates 2,000 to 5,000 CAN frames per second across multiple bus segments. For security monitoring, CAN bus data reveals unauthorized ECU communication, anomalous message frequencies, injected frames that do not match the vehicle’s DBC (CAN database) specification, and timing deviations that indicate bus-off attacks or ECU compromise. The challenge is that raw CAN logging at full rate generates approximately 10–20 MB per minute, making unfiltered transmission to the cloud infeasible for most cellular data budgets.

ECU Diagnostics and Logs

Electronic Control Units produce diagnostic trouble codes (DTCs), runtime logs, boot sequence records, and firmware version reports. Security-relevant ECU data includes unexpected reboots (potential indicator of exploitation), DTC patterns associated with tampering (such as immobilizer-related DTCs), diagnostic session activations outside of workshop contexts, and firmware version mismatches after unauthorized modification. ECU diagnostic data is typically lower frequency than CAN but higher value per event, making it well-suited for event-driven telemetry rather than continuous streaming.

Network Flows and Ethernet Traffic

Modern vehicles with Ethernet-based architectures (100BASE-T1, 1000BASE-T1) generate IP-level traffic between high-performance ECUs, domain controllers, and telematics control units. Network flow metadata — source and destination IP, port, protocol, packet count, byte count, and connection duration — is essential for detecting unauthorized network communication, lateral movement between vehicle domains, and data exfiltration attempts. Full packet capture is impractical at fleet scale, but flow summaries (similar to NetFlow or IPFIX records) provide rich security signals at manageable data volumes, typically 1–5 KB per flow record.

GPS and Location Telemetry

GPS position, heading, speed, and satellite constellation data support geofencing, route deviation detection, and correlation of security events with physical context. GPS data is also critical for detecting GPS spoofing attacks that target navigation and V2X systems. Typical GPS telemetry rates are 1–10 Hz, producing modest data volumes but requiring careful privacy handling, particularly under GDPR and Chinese data localization requirements where location data is classified as personal data requiring explicit legal basis for processing.

V2X Message Logs

Vehicles equipped with V2X capabilities generate logs of transmitted and received V2X messages (CAMs, BSMs, DENMs, SPaT, MAP), including misbehavior detection reports. V2X logs are high-frequency in dense traffic environments (potentially hundreds of received messages per second at busy intersections) but critical for fleet-level V2X threat correlation. V2X telemetry must capture the full signed message envelope, including certificate information, to support post-hoc forensic analysis and misbehavior reporting to PKI authorities.

Application and Infotainment Logs

The infotainment head unit and connected applications generate logs related to user interactions, app installations, Bluetooth and Wi-Fi connections, USB device events, and cellular modem activity. These logs provide visibility into attack vectors that target the vehicle through its human-facing interfaces — malicious USB devices, rogue Bluetooth pairings, compromised companion apps, and Wi-Fi-based attacks. Data volumes vary significantly by vehicle usage pattern, but a typical infotainment system generates 5–50 MB of logs per driving session.

Edge Preprocessing and Filtering

Transmitting raw telemetry from every vehicle in a fleet is neither economically feasible nor analytically useful. The edge preprocessing layer running on the vehicle’s telematics control unit (TCU) or a dedicated security gateway ECU is the single most important architectural component for achieving fleet scale. Effective edge preprocessing reduces transmitted data volume by 90–99% while preserving security-critical signals.

On-Vehicle Filtering Strategies

The first line of data reduction is filtering at the source. Not all CAN frames are security-relevant: periodic status messages from the climate control module or seat position actuators carry no security information and should be excluded from the security telemetry stream. A well-designed filter configuration selects only the CAN arbitration IDs that map to security-relevant ECU communications — powertrain, chassis, ADAS, gateway, and telematics domains. This filter alone typically reduces CAN telemetry volume by 60–80%.
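The arbitration-ID filter can be sketched in a few lines. The ID ranges below are hypothetical placeholders; a real deployment derives them from the vehicle's DBC and domain architecture.

```python
# Minimal sketch of a source-level CAN filter. The arbitration ID ranges are
# illustrative only -- in practice they come from the vehicle's DBC and the
# mapping of CAN IDs to security-relevant domains.
SECURITY_RELEVANT_IDS = [
    range(0x100, 0x200),  # powertrain (hypothetical range)
    range(0x300, 0x340),  # chassis / ADAS (hypothetical range)
    range(0x7D0, 0x7E0),  # diagnostics / gateway (hypothetical range)
]

def is_security_relevant(arbitration_id: int) -> bool:
    """Return True if the frame's CAN ID falls in a monitored range."""
    return any(arbitration_id in r for r in SECURITY_RELEVANT_IDS)

def filter_frames(frames):
    """Drop frames whose arbitration ID is not security-relevant."""
    return [f for f in frames if is_security_relevant(f["id"])]

frames = [{"id": 0x110, "data": b"\x01"}, {"id": 0x450, "data": b"\x02"},
          {"id": 0x7D1, "data": b"\x03"}]
kept = filter_frames(frames)
# 0x450 (e.g. a comfort-domain message) is dropped; two frames remain
```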

Edge Aggregation and Summarization

Rather than transmitting every individual CAN frame, the edge layer can aggregate data into statistical summaries over configurable time windows (typically 1–60 seconds). For each monitored CAN ID, the edge computes message frequency, mean and standard deviation of signal values, min/max bounds, and a delta indicator showing whether values have changed from the previous window. These summaries capture anomalies (unexpected frequency, out-of-range values, sudden state changes) without transmitting the underlying high-frequency stream. Network flow data benefits from similar aggregation: instead of individual flow records, the edge produces periodic flow tables summarizing active connections, new connections, and terminated connections per time window.
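A minimal sketch of the per-ID windowed summary described above, using stdlib Python; the sample values and IDs are invented for illustration.

```python
import statistics
from collections import defaultdict

def summarize_window(samples, previous_last=None):
    """Aggregate (can_id, signal_value) samples from one time window into
    per-ID statistical summaries. `previous_last` maps can_id -> last value
    seen in the prior window and drives the delta indicator."""
    previous_last = previous_last or {}
    by_id = defaultdict(list)
    for can_id, value in samples:
        by_id[can_id].append(value)
    summaries = {}
    for can_id, values in by_id.items():
        summaries[can_id] = {
            "count": len(values),              # message frequency in window
            "mean": statistics.fmean(values),
            "stdev": statistics.pstdev(values),
            "min": min(values),
            "max": max(values),
            # delta indicator: did the signal move since the last window?
            "changed": previous_last.get(can_id) != values[-1],
        }
    return summaries

window = [(0x110, 42.0), (0x110, 43.0), (0x110, 42.5), (0x300, 7.0)]
s = summarize_window(window, previous_last={0x110: 42.5, 0x300: 7.0})
# s[0x110]["count"] == 3; neither signal changed relative to the prior window
```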

Event-Driven Telemetry

The most bandwidth-efficient approach is event-driven telemetry, where the edge only transmits data when a security-relevant event is detected locally. On-vehicle IDS (intrusion detection systems) and anomaly detection engines process raw data in real time and generate structured security events — alert type, severity, affected ECU or network segment, timestamp, and a context snapshot of the surrounding telemetry data. Event-driven telemetry reduces baseline data transmission to near zero during normal operation, with bursts during security incidents. The trade-off is that fleet-level analytics that require continuous baseline data (such as population-based anomaly detection) may need a supplementary low-rate periodic reporting channel.
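The event-driven pattern can be illustrated with a toy frequency-deviation detector. The event field names, thresholds, and severity mapping are assumptions for the sketch, not a prescribed schema.

```python
import time

def check_frequency(can_id, observed_hz, expected_hz, tolerance=0.5):
    """Emit a structured security event only when the observed message rate
    deviates from the expected rate by more than the tolerance fraction.
    Returns None during normal operation, so nothing is transmitted."""
    deviation = abs(observed_hz - expected_hz) / expected_hz
    if deviation <= tolerance:
        return None  # normal operation: no telemetry sent
    return {
        "alert_type": "can_frequency_anomaly",
        "severity": "high" if deviation > 1.0 else "medium",
        "affected_id": hex(can_id),
        "observed_hz": observed_hz,
        "expected_hz": expected_hz,
        "timestamp": time.time(),  # a real agent would use a trusted clock
    }

assert check_frequency(0x110, observed_hz=100, expected_hz=100) is None
event = check_frequency(0x110, observed_hz=250, expected_hz=100)
# deviation 1.5 exceeds tolerance: a "high" severity event is generated
```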

Configurable Edge Policies

Edge preprocessing must be configurable from the cloud without requiring a firmware update to the TCU. A policy engine on the vehicle accepts JSON or protobuf-encoded filter configurations that specify which data sources to monitor, aggregation window sizes, event detection thresholds, and reporting frequencies. This enables the fleet security team to dynamically adjust telemetry collection in response to emerging threats — for example, increasing CAN bus logging granularity across all vehicles of a specific model when a new vulnerability is disclosed, or enabling full V2X message capture in a geographic region experiencing a misbehavior campaign.
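A cloud-pushed edge policy might look like the following JSON fragment. All field names and values here are illustrative assumptions, not a defined schema:

```json
{
  "policy_version": "2024-11-03.1",
  "can_monitoring": {
    "id_allowlist": ["0x100-0x1FF", "0x300-0x33F", "0x7D0-0x7DF"],
    "aggregation_window_s": 10
  },
  "event_thresholds": {
    "frequency_deviation": 0.5,
    "min_severity_to_report": "medium"
  },
  "reporting": {
    "periodic_interval_s": 30,
    "parked_interval_s": 3600
  }
}
```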

The edge is not just a data reduction layer — it is the first security sensor. The best telemetry architectures treat the vehicle as an active detection endpoint, not a passive data source.

Transport Protocols: MQTT, AMQP, and gRPC

The choice of transport protocol between the vehicle and cloud ingestion layer has profound implications for reliability, latency, bandwidth efficiency, and operational complexity. Three protocols dominate the automotive telemetry landscape:

MQTT (Message Queuing Telemetry Transport)

MQTT is the most widely adopted protocol for vehicle-to-cloud telemetry. Its lightweight binary protocol, publish/subscribe messaging model, and three quality-of-service (QoS) levels make it well-suited for the constrained and unreliable cellular connections that vehicles operate on. MQTT’s small packet overhead (as low as 2 bytes for the fixed header) minimizes bandwidth consumption. QoS 1 (at-least-once delivery) provides a good balance between reliability and performance for security telemetry — duplicate messages are tolerable and easily deduplicated, while message loss is not. MQTT 5.0 adds features particularly valuable for fleet telemetry: topic aliases reduce per-message overhead for vehicles publishing to the same topics repeatedly, shared subscriptions enable load-balanced consumption across multiple backend processors, and user properties allow attaching metadata (vehicle ID, edge version, policy version) to each message without inflating the payload.

AMQP (Advanced Message Queuing Protocol)

AMQP provides stronger message delivery guarantees than MQTT, including transactional messaging and fine-grained routing capabilities through exchanges and bindings. AMQP’s richer feature set comes at the cost of higher protocol overhead and connection establishment complexity, making it less ideal for direct vehicle-to-cloud communication over cellular. However, AMQP excels as the internal messaging backbone within the cloud ingestion layer, where its routing flexibility enables sophisticated message fan-out to multiple processing pipelines (real-time detection, batch analytics, compliance archival) based on message properties. Some architectures use MQTT for the vehicle-to-cloud leg and bridge to AMQP at the cloud edge for internal distribution.

gRPC (Google Remote Procedure Call)

gRPC uses HTTP/2 as its transport and Protocol Buffers for serialization, providing efficient binary encoding, bidirectional streaming, and strong typing through protobuf schemas. gRPC’s bidirectional streaming capability is particularly attractive for telemetry use cases where the cloud needs to push configuration updates or detection rules back to the vehicle over the same connection. The HTTP/2 foundation enables multiplexing multiple logical streams over a single TCP connection, reducing connection management overhead for vehicles that report multiple independent telemetry streams. The main drawback of gRPC for vehicle telemetry is its dependency on HTTP/2, which does not degrade as gracefully as MQTT on poor cellular connections and requires more sophisticated reconnection logic.

Transport Protocol Comparison

| Aspect | MQTT 5.0 | AMQP 1.0 | gRPC / HTTP/2 |
| --- | --- | --- | --- |
| **Protocol characteristics** | | | |
| Encoding | Binary (custom) | Binary (AMQP framing) | Binary (Protocol Buffers) |
| Minimum overhead per message | 2 bytes (fixed header) | 8 bytes (frame header) | 9 bytes (HTTP/2 frame + gRPC header) |
| Messaging model | Pub/sub | Queues, exchanges, bindings | Unary and streaming RPC |
| Delivery guarantees | QoS 0/1/2 (at most / at least / exactly once) | At most / at least / exactly once; transactional | At most once (application-level retry) |
| **Fleet telemetry suitability** | | | |
| Cellular-friendly | Excellent (low overhead, keep-alive) | Moderate (higher connection cost) | Good (HTTP/2 multiplexing) |
| Reconnection handling | Built-in session resumption | Link recovery with state | Application-level reconnect |
| Bidirectional communication | Pub/sub on separate topics | Full bidirectional links | Native bidirectional streaming |
| Schema enforcement | None (payload-agnostic) | Content-type header only | Strong (protobuf schemas) |
| **Operational** | | | |
| Broker/server options | EMQX, HiveMQ, Mosquitto, AWS IoT Core | RabbitMQ, Azure Service Bus, Qpid | Custom server (Envoy, gRPC server) |
| Fleet-scale maturity | Proven at millions of devices | Common in enterprise; less in IoT | Growing adoption in automotive |
| Best fit in architecture | Vehicle-to-cloud edge | Cloud-internal message routing | Vehicle-to-cloud or service-to-service |

Cloud Ingestion Patterns

Once telemetry reaches the cloud boundary, the ingestion layer must absorb bursts from hundreds of thousands of simultaneously connected vehicles, validate and enrich incoming data, and route it to the appropriate downstream processing pipelines. Two fundamental patterns dominate:

Streaming Ingestion

Streaming ingestion processes telemetry records individually or in micro-batches as they arrive, with end-to-end latency measured in seconds. This pattern is essential for real-time security detection — an anomalous CAN bus event detected on a vehicle must reach the fleet-level correlation engine within seconds to enable timely response. Streaming ingestion typically uses Apache Kafka or a managed equivalent (Amazon MSK, Azure Event Hubs, Confluent Cloud) as the central event backbone. Kafka’s partitioned log provides natural parallelism (partition by vehicle ID or fleet segment), configurable retention for replay capability, and exactly-once semantics for critical detection pipelines. Stream processing engines such as Apache Flink or Kafka Streams consume from Kafka topics to perform real-time aggregation, enrichment (joining telemetry with vehicle metadata, fleet configuration, and threat intelligence), and detection rule evaluation.
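Partitioning by vehicle ID can be sketched as below. Kafka's default partitioner hashes the record key with murmur2; this dependency-free sketch substitutes CRC32 purely for illustration, and the partition count is a hypothetical value.

```python
import zlib

NUM_PARTITIONS = 64  # hypothetical partition count for the telemetry topic

def partition_for(vehicle_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a vehicle ID to a stable partition so all telemetry from one
    vehicle lands on the same partition, preserving per-vehicle ordering.
    Kafka's default partitioner uses murmur2 on the record key; CRC32 is
    used here only to keep the sketch free of external dependencies."""
    return zlib.crc32(vehicle_id.encode("utf-8")) % num_partitions

p1 = partition_for("VIN-1HGBH41JXMN109186")
p2 = partition_for("VIN-1HGBH41JXMN109186")
assert p1 == p2  # the same vehicle always maps to the same partition
```

In practice this means setting the vehicle ID as the Kafka record key and letting the producer's partitioner do the hashing, rather than computing partitions by hand.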

Batch Ingestion

Batch ingestion accumulates telemetry data over larger time windows (minutes to hours) and processes it in bulk. Batch processing is appropriate for analytics that do not require real-time latency: population-level baseline computation, historical trend analysis, compliance reporting, and ML model training. Batch telemetry is typically landed in object storage (S3, Azure Blob, GCS) in columnar formats (Parquet, ORC) partitioned by date, fleet segment, and data type. Batch processing engines (Apache Spark, Trino, DuckDB) query these partitioned datasets efficiently for ad-hoc investigation and scheduled analytics workloads.
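The date/fleet/type partitioning mentioned above is typically expressed as a Hive-style key prefix in object storage. A minimal sketch (the `telemetry/` prefix and fleet ID are invented for the example):

```python
from datetime import datetime, timezone

def partition_path(ts: datetime, fleet_id: str, data_type: str) -> str:
    """Build a Hive-style partition prefix (date=/fleet_id=/data_type=)
    so engines like Spark or Trino can prune partitions using time-range
    and fleet-segment predicates."""
    d = ts.astimezone(timezone.utc).date()
    return (f"telemetry/date={d.isoformat()}"
            f"/fleet_id={fleet_id}/data_type={data_type}/")

path = partition_path(
    datetime(2024, 6, 1, 14, 30, tzinfo=timezone.utc),
    "eu-west-001", "can_summary")
# -> "telemetry/date=2024-06-01/fleet_id=eu-west-001/data_type=can_summary/"
```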

Lambda and Kappa Architectures

Many fleet telemetry platforms adopt a hybrid approach. The Lambda architecture maintains parallel streaming and batch layers with a serving layer that merges results. While conceptually clean, Lambda architectures suffer from the operational burden of maintaining two separate processing codebases that must produce consistent results. The Kappa architecture simplifies this by using the streaming layer as the single processing path, with Kafka’s long retention enabling replay for historical reprocessing. For security monitoring specifically, we recommend a Kappa-inspired architecture with Kafka as the central backbone, Flink for real-time detection, and scheduled Spark jobs that read directly from the data lake for batch analytics — avoiding the complexity of dual processing paths while maintaining the flexibility to reprocess historical data when new detection rules are deployed.

Time-Series Storage

Vehicle telemetry is inherently time-series data: every data point is associated with a timestamp and a vehicle identifier. Choosing the right time-series storage strategy directly impacts query performance, storage cost, and the types of analytics your security team can perform.

Dedicated Time-Series Databases

Purpose-built time-series databases (TimescaleDB, InfluxDB, QuestDB, Apache IoTDB) are optimized for high-rate ingestion and time-range queries. They provide automatic time-based partitioning, compression algorithms tuned for time-series patterns (delta encoding, run-length encoding, dictionary compression), and built-in downsampling and retention policies. For vehicle security monitoring, time-series databases excel at storing pre-aggregated metrics (CAN message rates, network flow volumes, ECU health indicators) that support dashboard visualization and threshold-based alerting. TimescaleDB on PostgreSQL is particularly attractive because it combines time-series performance with full SQL query capability, enabling security analysts to write complex correlation queries without learning a specialized query language.

Data Lake with Columnar Storage

For raw telemetry storage at fleet scale, a data lake built on object storage with columnar file formats provides the best balance of cost and query performance. Apache Parquet partitioned by date/fleet_id/data_type enables efficient time-range and fleet-segment queries while achieving 5–10x compression ratios compared to raw JSON or CSV. Table formats such as Apache Iceberg or Delta Lake add ACID transactions, schema evolution, time travel (querying historical snapshots), and efficient upserts on top of Parquet files, which is essential for maintaining a consistent and queryable telemetry archive. Iceberg in particular supports partition evolution — changing the partition scheme without rewriting existing data — which is valuable as fleet segmentation strategies evolve over time.

Tiered Storage Strategy

A practical fleet telemetry architecture uses tiered storage to balance performance, cost, and retention requirements:

  • Hot tier (0–7 days): Recent telemetry in a time-series database or Kafka with long retention, supporting real-time dashboards, active investigation, and streaming detection. Stored on SSD-backed infrastructure with sub-second query latency.
  • Warm tier (7–90 days): Intermediate-term telemetry in Parquet/Iceberg on standard object storage, supporting retrospective investigation and trend analysis. Query latency in seconds to low minutes using Spark or Trino.
  • Cold tier (90 days–7 years): Long-term archival in compressed Parquet on infrequent-access storage classes (S3 Glacier, Azure Cool), supporting compliance retention requirements and rare forensic investigations. Query latency in minutes to hours with restore-before-query semantics.

Data Lake Architecture for Security Analytics

A well-designed data lake for vehicle security monitoring goes beyond simple telemetry storage. It must support the diverse analytical workloads that a fleet security team depends on: real-time threat detection, historical investigation, population baseline computation, ML model training, compliance reporting, and forensic evidence packaging.

Medallion Architecture

The medallion (bronze/silver/gold) architecture provides a proven data lake organizational pattern for fleet telemetry:

  • Bronze layer: Raw telemetry as received from vehicles, stored in its original format with minimal transformation. The bronze layer is the system of record and must be immutable — raw data is never modified or deleted except by retention policy. This layer supports forensic investigation where the analyst needs to see exactly what the vehicle reported.
  • Silver layer: Cleaned, validated, and enriched telemetry. Schema enforcement, deduplication, timestamp normalization, and enrichment with vehicle metadata (model, ECU inventory, software version, fleet assignment) are applied. The silver layer is the primary query target for security analysts and detection rule evaluation.
  • Gold layer: Pre-aggregated metrics, materialized views, and analysis-ready datasets. Population baselines (normal CAN message rates by vehicle model and driving context), fleet health scorecards, and compliance report datasets live in the gold layer. ML feature stores that power anomaly detection models are also maintained at this level.
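A bronze-to-silver pass can be sketched as follows. The metadata table, field names, and sequence-based dedup key are assumptions for illustration; a real pipeline would run this logic in Flink or Spark against the schema registry.

```python
def to_silver(bronze_records, seen):
    """Sketch of a silver-layer pass: drop duplicates (at-least-once
    transport means redelivery happens), normalize timestamps to epoch
    milliseconds, and enrich with vehicle metadata. `seen` holds
    (vehicle_id, seq) keys already processed; VEHICLE_META stands in for
    a real metadata lookup."""
    VEHICLE_META = {"V1": {"model": "X200", "sw_version": "3.4.1"}}  # illustrative
    silver = []
    for rec in bronze_records:
        key = (rec["vehicle_id"], rec["seq"])
        if key in seen:
            continue  # duplicate delivery: skip
        seen.add(key)
        silver.append({
            **rec,
            "ts_ms": int(rec["ts"] * 1000),  # normalize seconds -> epoch ms
            "meta": VEHICLE_META.get(rec["vehicle_id"], {}),
        })
    return silver

bronze = [{"vehicle_id": "V1", "seq": 7, "ts": 1718000000.25},
          {"vehicle_id": "V1", "seq": 7, "ts": 1718000000.25}]  # duplicate
out = to_silver(bronze, seen=set())
# one record survives, timestamp-normalized and enriched with model metadata
```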

Schema Management

Vehicle telemetry schemas are not static. New ECUs are added with software updates, CAN matrix definitions change across model years, and new telemetry sources (V2X, Ethernet) are introduced over time. The data lake must handle schema evolution gracefully. Apache Iceberg’s schema evolution support — adding, renaming, and reordering columns without rewriting existing data — is essential. A centralized schema registry (such as Confluent Schema Registry or AWS Glue Schema Registry) enforces compatibility between producers (vehicles) and consumers (processing pipelines), preventing breaking changes from silently corrupting downstream analytics.

Retention Policies and Compliance

Data retention for vehicle telemetry must balance security investigation needs, regulatory requirements, storage cost, and privacy obligations. These requirements often conflict, and the retention policy must navigate them explicitly.

Security Investigation Requirements

Effective incident investigation requires access to telemetry data from weeks or months before the incident was detected. Advanced persistent threats in vehicle fleets may operate for extended periods before triggering a detection. A minimum of 90 days of full-fidelity silver-layer telemetry is recommended for investigation support, with 1–2 years of aggregated gold-layer metrics for trend analysis and baseline comparison.

Regulatory Retention Requirements

UNECE R155 requires OEMs to monitor and respond to cybersecurity threats “throughout the lifetime of the vehicle type.” While R155 does not specify a minimum data retention period, the practical implication is that OEMs must be able to demonstrate that their monitoring was operational and effective at any point during the vehicle’s production lifecycle. Chinese GB/T 40861 (vehicle data security) specifies that vehicle operation data must be retained for at least two years within China. The EU General Data Protection Regulation (GDPR) imposes the opposite pressure: personal data (which includes location telemetry and any data linked to a specific vehicle owner) must not be retained longer than necessary for the stated purpose. The tension between UNECE R155’s indefinite monitoring obligation and GDPR’s data minimization principle requires careful architectural design, typically involving pseudonymization of telemetry data at the silver layer so that security analytics can operate on pseudonymized datasets while personal data mappings are stored separately with shorter retention and stronger access controls.

Bandwidth Optimization Techniques

Cellular data cost is often the single largest recurring expense in a fleet telemetry architecture. At scale, every kilobyte saved per vehicle per day translates to meaningful annual cost reduction. The following techniques are listed in order of typical impact:

  1. Edge filtering and aggregation: As discussed in the edge preprocessing section, reducing the data before it leaves the vehicle is the highest-impact optimization. A well-tuned edge policy achieves 90–99% data reduction compared to raw telemetry streaming.
  2. Binary serialization: Replacing JSON telemetry payloads with Protocol Buffers, FlatBuffers, or CBOR reduces payload size by 50–80%. Protobuf is the most common choice due to its strong tooling ecosystem and backward-compatible schema evolution. For extremely bandwidth-constrained environments, custom binary encodings tailored to the specific telemetry schema can achieve even higher compression ratios.
  3. Transport-layer compression: MQTT 5.0 does not natively support payload compression, but application-level compression using zstd or LZ4 on serialized payloads before MQTT publication typically achieves 40–60% additional size reduction. gRPC supports built-in gzip compression on HTTP/2 streams. The CPU cost of compression on the vehicle TCU must be evaluated against the bandwidth savings — zstd at low compression levels (1–3) provides excellent compression ratios with minimal CPU overhead.
  4. Delta encoding: For slowly-changing telemetry signals (ECU software versions, configuration states, network topology), transmitting only the delta from the last reported state eliminates redundant data. Delta encoding is most effective for structured data where the majority of fields remain unchanged between reports.
  5. Adaptive reporting frequency: Instead of fixed reporting intervals, the edge dynamically adjusts reporting frequency based on the vehicle’s current state. A parked vehicle with no ignition reports once per hour; a driving vehicle in normal conditions reports every 30 seconds; a vehicle experiencing a security event reports in real time. This adaptive approach matches data volume to actual security need.
  6. Opportunistic offload: Non-urgent telemetry data (batch diagnostics, historical logs, low-priority metrics) is queued on the vehicle and transmitted opportunistically when the vehicle connects to Wi-Fi (at home, at a dealer, or at a charging station) rather than over cellular. This offload strategy can reduce cellular data consumption by 30–50% for data categories that tolerate multi-hour latency.
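Delta encoding (item 4 above) reduces to transmitting only changed fields. A minimal sketch, with invented field names; note it does not handle field deletions, which a real protocol would need to signal explicitly:

```python
def delta_encode(current: dict, last_reported: dict) -> dict:
    """Return only the fields whose values differ from the last report.
    Unchanged fields are omitted; the receiver merges the delta into its
    last known state to reconstruct the full report."""
    return {k: v for k, v in current.items() if last_reported.get(k) != v}

last = {"ecu_fw": "3.4.1", "apn": "fleet.example", "wifi": "off"}
now  = {"ecu_fw": "3.4.2", "apn": "fleet.example", "wifi": "off"}
delta = delta_encode(now, last)
# only the firmware version changed, so only that field is transmitted
```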

Edge-to-Cloud Security

The telemetry pipeline itself is a high-value target. An attacker who can inject, modify, or suppress telemetry data can blind the fleet security operations center. Securing the edge-to-cloud data path is essential for maintaining the integrity and trustworthiness of security monitoring.

Transport Layer Security

All vehicle-to-cloud communication must be encrypted using TLS 1.3. The vehicle TCU authenticates the cloud ingestion endpoint using pinned server certificates or a dedicated PKI trust chain separate from the public web CA ecosystem. Mutual TLS (mTLS) provides the strongest authentication model: each vehicle presents a unique client certificate during the TLS handshake, enabling the cloud endpoint to verify vehicle identity before accepting any telemetry data. Vehicle client certificates should be provisioned during manufacturing and stored in a hardware security module (HSM) or trusted platform module (TPM) on the TCU.

Payload Integrity and Authenticity

Beyond transport encryption, each telemetry payload should be signed by the vehicle using a key stored in the HSM/TPM. This provides end-to-end integrity verification independent of the transport layer — if the TLS connection terminates at a load balancer or MQTT broker, the payload signature ensures that the data has not been modified in transit or by intermediate infrastructure. The signature also provides non-repudiation for forensic purposes: a signed telemetry record can be presented as evidence that a specific vehicle reported specific data at a specific time.
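The sign-then-verify flow can be sketched with stdlib primitives. This sketch uses HMAC-SHA256 with a shared key purely to keep it runnable; a production vehicle would use an asymmetric signature (for example ECDSA) with the private key held in the TCU's HSM, so the cloud can verify without holding any signing secret (and HMAC would not provide the non-repudiation discussed above).

```python
import hashlib
import hmac
import json

def sign_payload(payload: dict, key: bytes) -> dict:
    """Attach an integrity tag over a canonical serialization of the
    payload. Canonical JSON (sorted keys, no whitespace) ensures signer
    and verifier hash identical bytes."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": payload, "sig": tag}

def verify_payload(envelope: dict, key: bytes) -> bool:
    body = json.dumps(envelope["body"], sort_keys=True,
                      separators=(",", ":")).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["sig"])

key = b"per-vehicle-key-from-hsm"  # placeholder for an HSM-held key
env = sign_payload({"vehicle_id": "V1", "seq": 42, "alert": "can_anomaly"}, key)
assert verify_payload(env, key)
env["body"]["seq"] = 43              # tamper with the payload...
assert not verify_payload(env, key)  # ...and verification fails
```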

Anti-Replay and Freshness

Telemetry messages must include monotonically increasing sequence numbers and cryptographically validated timestamps to prevent replay attacks where previously captured telemetry is reinjected to mask ongoing malicious activity. The cloud ingestion layer maintains per-vehicle sequence state and rejects messages with duplicate or out-of-order sequence numbers, logging any replay attempts as security events in their own right.
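The per-vehicle sequence check can be sketched as follows; timestamp freshness validation is omitted here for brevity, and a production implementation would persist the sequence state across ingestion-node restarts.

```python
class ReplayGuard:
    """Per-vehicle sequence tracking at the ingestion layer: accept only
    strictly increasing sequence numbers and surface rejected messages as
    security events in their own right."""
    def __init__(self):
        self._last_seq = {}     # vehicle_id -> highest accepted sequence
        self.replay_events = []

    def accept(self, vehicle_id: str, seq: int) -> bool:
        last = self._last_seq.get(vehicle_id, -1)
        if seq <= last:
            # duplicate or out-of-order: reject and log as a security event
            self.replay_events.append({"vehicle_id": vehicle_id, "seq": seq})
            return False
        self._last_seq[vehicle_id] = seq
        return True

guard = ReplayGuard()
assert guard.accept("V1", 1)
assert guard.accept("V1", 2)
assert not guard.accept("V1", 2)   # replayed message is rejected
assert len(guard.replay_events) == 1
```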

Secure Key Lifecycle

Vehicle telemetry keys must support rotation without service interruption. A key rotation protocol pre-provisions the next key before the current key expires, with a configurable overlap period during which both keys are accepted. Key revocation must be possible for compromised vehicles, with the revocation taking effect within a bounded time window (typically under one hour for security-critical fleets). The key management backend must be geographically distributed to ensure that vehicles in any market can complete key operations without cross-region latency dependencies.

Reference Architecture

The following table illustrates the end-to-end telemetry ingestion architecture from vehicle to data lake. Each layer is described with its primary components, function, and recommended technologies:

| Layer | Components | Function | Key Technologies |
| --- | --- | --- | --- |
| **Vehicle edge** | | | |
| Data collection | CAN gateway, Ethernet tap, ECU log agents, GPS receiver | Capture raw telemetry from all vehicle data sources | AUTOSAR Adaptive, Linux IPC |
| Edge processing | IDS engine, aggregator, filter, buffer | Detect local anomalies, reduce data volume, buffer during disconnection | Custom C/Rust, SQLite buffer |
| Transport client | MQTT client, TLS stack, credential store | Establish secure connection, transmit telemetry, receive policy updates | MQTT 5.0, TLS 1.3, HSM |
| **Cloud edge** | | | |
| Ingestion gateway | MQTT broker cluster, load balancer, auth service | Terminate vehicle connections, authenticate, validate, route messages | EMQX/HiveMQ, mTLS, OAuth |
| Event backbone | Kafka cluster, schema registry | Buffer, partition, and distribute telemetry to consumers | Apache Kafka, Confluent Schema Registry |
| **Processing** | | | |
| Stream processing | Flink jobs, detection rules, enrichment | Real-time anomaly detection, fleet correlation, alert generation | Apache Flink, custom rules |
| Batch processing | Spark jobs, ML training, report generation | Baseline computation, model training, compliance reporting | Apache Spark, MLflow |
| **Storage** | | | |
| Hot storage | TimescaleDB, Redis cache | Real-time dashboards, active investigations, recent alerts | TimescaleDB, Redis |
| Data lake | Iceberg tables on object storage | Bronze/silver/gold layers, full telemetry archive | Apache Iceberg, S3/Blob/GCS |
| Cold archive | Compressed Parquet on archive storage | Long-term compliance retention, rare forensic access | S3 Glacier, Azure Cool |

Data Sovereignty and Regional Compliance

Vehicle fleets operate globally, but data sovereignty laws require that certain categories of telemetry data remain within specific geographic boundaries. Ignoring these requirements exposes the OEM to regulatory penalties and market access restrictions.

GDPR (European Union)

GDPR classifies vehicle telemetry linked to an identifiable person (via VIN, license plate, or account association) as personal data. Processing requires a lawful basis (typically legitimate interest for security monitoring or consent for telemetry analytics). Data minimization requires that only telemetry necessary for the stated security purpose is collected. The right to erasure must be implementable — the architecture must support deleting a specific vehicle’s telemetry from all storage tiers upon owner request, which is challenging in immutable data lake architectures and requires techniques such as crypto-shredding (encrypting per-vehicle data with a per-vehicle key that can be deleted to render the data unrecoverable).
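Crypto-shredding can be sketched as follows. The SHA-256 counter keystream below is illustrative only and not a vetted cipher (among other omissions, it lacks per-record nonces and authentication); a production system would encrypt with AES-GCM under an HSM-backed key management service. The point of the sketch is the erasure mechanics: delete the per-vehicle key, and the immutable lake files become unrecoverable.

```python
import hashlib
import secrets

def _keystream(key: bytes, n: int) -> bytes:
    """Illustrative SHA-256 counter-mode keystream. Not for production:
    a real deployment uses AES-GCM via an HSM-backed KMS."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(plaintext,
                                       _keystream(key, len(plaintext))))

decrypt = encrypt  # XOR stream cipher is its own inverse

# One key per vehicle, held in a separate keystore with its own
# retention and access controls, apart from the immutable data lake.
keystore = {"V1": secrets.token_bytes(32)}
plaintext = b'{"vehicle_id": "V1", "lat": 48.13}'
record = encrypt(keystore["V1"], plaintext)
assert decrypt(keystore["V1"], record) == plaintext

# Erasure request: delete the key, not the immutable lake files.
del keystore["V1"]
# The ciphertext still exists in the lake but is now unrecoverable.
```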

China Data Localization

China’s Cybersecurity Law, Data Security Law, Personal Information Protection Law (PIPL), and the specific automotive regulation GB/T 40861 collectively require that vehicle telemetry generated in China is stored and processed within China. Cross-border transfer of “important data” (which includes precise vehicle location data, road surface data, and data about critical infrastructure) requires a government security assessment that is difficult to obtain in practice. This means fleet telemetry architectures must include a China-local deployment — a complete ingestion, processing, and storage stack operating within Chinese cloud infrastructure (Alibaba Cloud, Tencent Cloud, Huawei Cloud) with no telemetry replication to non-Chinese regions. Security monitoring dashboards and alert management can be accessed from outside China if the dashboard only displays aggregated, non-personal analytics rather than raw telemetry.

Multi-Region Architecture Pattern

For global OEMs, the recommended approach is a regional hub architecture: each major regulatory region (EU, China, North America, and optionally South Korea and Japan) operates an independent telemetry ingestion and storage stack. A global analytics layer receives only anonymized, aggregated metrics from each regional hub — fleet health scores, alert counts by category, detection rule performance metrics — that do not constitute personal data or important data under any regional regulation. This enables global fleet security management while respecting data sovereignty. The architectural trade-off is increased operational complexity: each regional hub must be independently deployed, monitored, upgraded, and capacity-planned.
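
The regional boundary can be made explicit in code: each hub computes aggregates locally, and only the aggregate object ever crosses the border. The sketch below is illustrative (the `RegionalSummary` shape and field names are our assumptions, not a standard schema); per-vehicle identifiers appear only inside the function and never leave the region.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class RegionalSummary:
    """The only object a regional hub exports to the global analytics layer:
    aggregated, non-personal metrics with no VINs or raw telemetry."""
    region: str
    fleet_health_score: float
    alerts_by_category: Counter = field(default_factory=Counter)

def summarize_region(region: str,
                     alerts: list[dict],
                     vehicle_health: dict[str, float]) -> RegionalSummary:
    # vehicle_health is keyed by per-vehicle ID, but only the mean leaves
    # this function; alert records are reduced to counts per category.
    mean_health = sum(vehicle_health.values()) / max(len(vehicle_health), 1)
    return RegionalSummary(
        region=region,
        fleet_health_score=mean_health,
        alerts_by_category=Counter(a["category"] for a in alerts),
    )
```

Because the export type contains no identifiers, adding a new exported metric forces a deliberate schema change that compliance review can gate.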

How SentraX Handles Fleet-Scale Telemetry

SentraX FleetConnect implements the architecture patterns described in this guide as an integrated, managed platform for fleet security telemetry:

Configurable Edge Agent

The SentraX edge agent deploys on the vehicle TCU or security gateway and provides configurable CAN bus monitoring, Ethernet flow summarization, ECU log collection, and on-vehicle IDS with cloud-managed detection rules. Edge policies are pushed via the bidirectional MQTT 5.0 channel, enabling the fleet security team to adjust telemetry collection in real time without OTA firmware updates. The edge agent includes a persistent store-and-forward buffer that ensures no security events are lost during cellular connectivity gaps.
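
The store-and-forward pattern referenced above can be sketched generically (this is not SentraX code; the class and table names are illustrative). The essential property is that an event is persisted locally before any upload attempt, and is deleted only after the upload succeeds, so a cellular connectivity gap never drops security events.

```python
import json
import sqlite3
import time

class StoreAndForwardBuffer:
    """Generic store-and-forward sketch: durable local queue backed by
    SQLite, drained oldest-first whenever connectivity is available."""

    def __init__(self, path: str = ":memory:"):
        # On a real TCU this would be a file on flash, not :memory:.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS q"
            " (id INTEGER PRIMARY KEY, ts REAL, event TEXT)")

    def enqueue(self, event: dict) -> None:
        # Persist before any upload attempt so the event survives a crash.
        self.db.execute("INSERT INTO q (ts, event) VALUES (?, ?)",
                        (time.time(), json.dumps(event)))
        self.db.commit()

    def flush(self, upload) -> int:
        """Drain the queue oldest-first via the caller-supplied upload()
        callback; a row is deleted only after its upload returns True."""
        sent = 0
        rows = self.db.execute("SELECT id, event FROM q ORDER BY id").fetchall()
        for row_id, payload in rows:
            if not upload(json.loads(payload)):
                break   # connectivity gap: keep this and all later rows
            self.db.execute("DELETE FROM q WHERE id = ?", (row_id,))
            sent += 1
        self.db.commit()
        return sent
```

On reconnect, the agent simply calls `flush()` again; because rows are only removed after acknowledged upload, the worst case is a duplicate delivery, which the cloud side deduplicates by event ID.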

Managed Ingestion Pipeline

SentraX operates a multi-region ingestion infrastructure with MQTT broker clusters in the EU, North America, and China, each connected to a regional Kafka backbone. Schema validation, telemetry enrichment, and stream processing are handled by managed Flink jobs that evaluate SentraX detection rules and customer-defined custom rules in real time. The platform automatically scales ingestion capacity based on fleet size and telemetry volume, eliminating capacity planning burden from the customer.
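
Stripped of the Flink machinery, real-time rule evaluation reduces to testing each enriched event against a set of predicates and emitting matches as alerts. The sketch below is a generic illustration of that shape, not the SentraX or Flink API; the `Rule` alias and event fields are assumptions for the example.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical rule shape: a name plus a predicate over an enriched event.
Rule = tuple[str, Callable[[dict], bool]]

def evaluate_stream(events: Iterable[dict], rules: list[Rule]) -> Iterator[dict]:
    """Test every enriched telemetry event against every rule; each match
    is emitted downstream as an alert record."""
    for event in events:
        for name, predicate in rules:
            if predicate(event):
                yield {"rule": name,
                       "vehicle": event.get("vin"),
                       "event": event}
```

In a production stream processor the same predicates run over keyed, windowed state (e.g. per-vehicle message-frequency windows), but the rule-to-alert contract is the same.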

Integrated Data Lake

SentraX maintains a medallion-architecture data lake per customer per region, with configurable retention policies that balance security investigation needs, compliance requirements, and storage cost. The platform provides SQL-based query access to all storage tiers, enabling security analysts to investigate incidents, compute baselines, and generate compliance reports without managing infrastructure. Crypto-shredding support enables GDPR-compliant per-vehicle data erasure across all storage tiers.

Key Takeaways

  • Vehicle telemetry volume requires aggressive edge preprocessing to achieve fleet scale — raw streaming is neither economically feasible nor analytically useful.
  • MQTT 5.0 is the strongest protocol choice for vehicle-to-cloud telemetry due to its low overhead, QoS flexibility, and proven scale in IoT deployments.
  • A Kappa-inspired architecture with Kafka as the central backbone provides the best balance of real-time detection capability and historical reprocessing flexibility.
  • Tiered storage (hot/warm/cold) with a medallion data lake organization enables cost-effective retention from real-time dashboards to multi-year compliance archives.
  • Edge-to-cloud security must include mTLS, payload signing, and anti-replay mechanisms to ensure telemetry integrity and trustworthiness.
  • Data sovereignty requirements (GDPR, China data localization) mandate regional telemetry infrastructure with no cross-border raw data replication.
  • The edge is not just a data reduction layer — it is the first security sensor in the fleet monitoring architecture.
  • Bandwidth optimization through binary serialization, compression, delta encoding, and adaptive reporting can reduce cellular data costs by 80–95% compared to naive approaches.

Ingest Fleet Telemetry at Scale with SentraX

SentraX FleetConnect provides a managed, multi-region telemetry ingestion pipeline with configurable edge agents, real-time stream processing, and integrated data lake storage.

Explore SentraX