
Why Data Engineers Use NDJSON — and How to Visualize It Without Writing Code

If you've worked with event streaming, log aggregation, or API exports recently, you've encountered NDJSON. It appears in Kafka consumer outputs, Elasticsearch dump files, Segment event logs, and a dozen other data pipeline outputs that data engineers deal with every day.

NDJSON is an excellent format for machines. It's terrible for humans who need to actually understand what the data says.

This article explains why NDJSON is so prevalent in modern data infrastructure, what makes it difficult to analyze manually, and how data engineers and analysts are using browser-native tools to go from raw NDJSON files to interactive dashboards — without writing a single Python script.

What is NDJSON, and why do pipelines produce it?

NDJSON — Newline Delimited JSON, also known as JSON Lines or JSONL — is a text format where each line is a self-contained, valid JSON object:

{"event":"page_view","user_id":"u_104","page":"/pricing","ts":"2025-05-14T09:21:00Z","session":"s_88"}
{"event":"button_click","user_id":"u_104","element":"cta_hero","ts":"2025-05-14T09:21:15Z","session":"s_88"}
{"event":"form_submit","user_id":"u_104","form":"waitlist","ts":"2025-05-14T09:21:44Z","session":"s_88"}

The format is used heavily in streaming contexts because it is appendable: new events can be written to the end of a file by multiple producers simultaneously without breaking the structure of the document. Standard JSON arrays cannot be appended to without rewriting the closing bracket.
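That append-only property is easy to see in code: adding an event is a plain file append, with no need to parse or rewrite what is already there. A minimal Python sketch (the file name and event values are illustrative):

```python
import json

# Appending an event is a single write to the end of the file --
# nothing already in the file needs to be parsed or rewritten.
event = {"event": "page_view", "user_id": "u_105", "ts": "2025-05-14T09:22:00Z"}

with open("events.ndjson", "a") as f:
    f.write(json.dumps(event) + "\n")

# Reading it back is just as simple: one json.loads per line.
with open("events.ndjson") as f:
    events = [json.loads(line) for line in f if line.strip()]
```

Doing the same with a JSON array would mean loading the whole document, inserting into it, and serializing it again, which is exactly why concurrent producers favor NDJSON.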

NDJSON is the default output format for:

  • Kafka consumers writing event streams to object storage
  • Elasticsearch bulk export APIs
  • Segment and Mixpanel data exports
  • Cloud logging services (AWS CloudWatch, GCP Cloud Logging)
  • Web analytics platforms exporting raw event tables

The result is that virtually every modern data stack produces NDJSON files at some point — and someone has to make sense of them.

The problem: NDJSON is opaque without tooling

Inspecting a small NDJSON file with a text editor is manageable. But production NDJSON files are typically large (millions of lines), schema-inconsistent (not every event type has the same fields), and deeply nested.

The traditional toolkit for exploring NDJSON looks like this:

# Count events by type
jq -r '.event' events.ndjson | sort | uniq -c

# Extract a subset (-c keeps the output compact, one JSON object per line)
jq -c 'select(.event == "form_submit")' events.ndjson

# Aggregate with Python
import json
from collections import Counter

with open('events.ndjson') as f:
    # One streaming pass; skipping blank lines keeps a trailing
    # newline from breaking json.loads
    counts = Counter(json.loads(line)['event'] for line in f if line.strip())

print(counts)

This works. But it requires:

  1. jq or Python in the data engineer's environment
  2. Writing and running code for every query
  3. No visual output — just terminal output or a notebook cell
  4. No way to share the results with a non-technical stakeholder

For exploration and validation tasks — "what events are in this file?", "how are sessions distributed over time?", "are there any anomalous spikes in this log?" — writing code is often overkill.

Using Datastripes to explore NDJSON visually

Datastripes parses NDJSON files natively in the browser. Drop a .ndjson or .jsonl file into the interface and the platform:

  1. Auto-detects the schema across all lines, including inconsistent or partially populated fields
  2. Flattens nested objects into a typed column grid (dot notation for nested keys)
  3. Groups and aggregates automatically using formula functions like =GROUP_BY() and =COUNT_BY()
  4. Generates chart suggestions based on detected data types (timestamp → time series, categorical → bar chart)
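The flattening in step 2 can be sketched in a few lines of Python. This is an illustration of the general dot-notation technique, not Datastripes internals:

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single level, joining keys with dots."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix
            row.update(flatten(value, prefix=name + "."))
        else:
            row[name] = value
    return row

record = {"event": "page_view", "context": {"page": {"path": "/pricing"}}}
print(flatten(record))
# {'event': 'page_view', 'context.page.path': '/pricing'}
```

Applied across every line of a file, this yields the set of columns the grid displays, even when different event types contribute different keys.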

The immediate output is a navigable, filterable grid where you can see every event in your file — plus automatic charts for the most analytically relevant dimensions.

What this changes for data engineers

The value of browser-native NDJSON exploration is not that it replaces Python for complex analysis — it doesn't and shouldn't. The value is that it makes the 80% of exploration tasks that don't require code dramatically faster.

Common workflows that benefit:

Validating a new pipeline output: "Does this Kafka consumer output look right?" is a question that takes 30 seconds to answer in Datastripes and 5 minutes with jq.

Debugging schema drift: When a field appears in some records but not others, the Datastripes grid makes it immediately visible without writing a custom schema inference script.
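For comparison, here is roughly what that check looks like as a script, using a few inline sample records (the third record is deliberately missing the "session" field):

```python
import json
from collections import Counter

# Inline sample events; in practice these would be lines read from a file.
lines = [
    '{"event":"page_view","user_id":"u_104","session":"s_88"}',
    '{"event":"button_click","user_id":"u_104","session":"s_88"}',
    '{"event":"form_submit","user_id":"u_104"}',
]

# Count how often each field appears; anything below 100% coverage
# is a candidate for schema drift.
field_counts = Counter()
for line in lines:
    field_counts.update(json.loads(line).keys())

total = len(lines)
for field, n in field_counts.most_common():
    print(f"{field}: present in {n}/{total} records ({n / total:.0%})")
```

Even this toy version is more effort than glancing at a grid where sparse columns are visibly sparse.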

Handoff to non-technical teammates: A product manager who needs to understand which events fire during the onboarding flow cannot read jq output. They can read a bar chart.

Quick anomaly detection: Time series charts from a log file make spikes, drops, and gaps visible in seconds — without writing a custom aggregation query.
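The aggregation behind such a chart is simple: bucket ISO 8601 timestamps to the minute and count. A sketch over a few inline sample events:

```python
import json
from collections import Counter

# Inline sample events; in practice these would be lines read from a log file.
lines = [
    '{"event":"page_view","ts":"2025-05-14T09:21:00Z"}',
    '{"event":"button_click","ts":"2025-05-14T09:21:15Z"}',
    '{"event":"form_submit","ts":"2025-05-14T09:22:44Z"}',
]

# The first 16 characters of an ISO 8601 timestamp ("YYYY-MM-DDTHH:MM")
# form a per-minute bucket key.
per_minute = Counter(json.loads(line)["ts"][:16] for line in lines)
print(per_minute)
```

Plotted over time, empty buckets show up as gaps and oversized buckets as spikes, which is what makes the visual form so much faster to scan than raw counts.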

The deeper insight: visualization as a first-class step in data engineering

The traditional data engineering mindset treats visualization as the last step — something the "BI team" handles after the pipeline is built and the data is clean.

The reality is that visualization is most valuable earlier in the pipeline lifecycle, during validation and exploration. Catching a schema issue, a missing event type, or an unexpected data distribution is far cheaper at the exploration stage than after a dashboard has been built on top of a faulty pipeline.

Tools that make NDJSON visually explorable at the raw file level close this gap — and let data engineers work faster without adding infrastructure.


Explore your NDJSON files in Datastripes — no installation, no code, complete data privacy.
