A test plan for quantitatively measuring the DIP platform's batch ELT, real-time CDC, and analytical query performance using the TPC-DS standard benchmark. It compares and validates Spark, StarRocks, OLake, and Kafka CDC pipelines.
This plan quantitatively measures the data processing performance of the DIP platform using the TPC-DS standard benchmark. Tests are divided into three areas — batch loading, real-time CDC, and analytical queries — with the following key metrics defined for each.
| Test Area | Measurement Target | DIP Components |
|---|---|---|
| ELT — Batch | Bulk data loading throughput | Spark → Iceberg → StarRocks (External Catalog) |
| ELT — Real-time (CDC) | Change data propagation latency and initial load throughput | |
The diagram below illustrates the data flow across the entire benchmark pipeline. TPC-DS data generated by DuckDB is loaded into PostgreSQL, then written to Iceberg via batch (Spark) and real-time (OLake/Kafka) paths. Analytical queries are executed through the StarRocks External Catalog.
1. Create schema and load data into PostgreSQL (DuckDB postgres extension)
2. Execute Spark job — start timing
3. Verify load completion and validate source-target row counts
4. Stop timing, collect resource metrics
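The row-count validation step above can be sketched as a per-table comparison. The `validate_row_counts` helper and the example counts are illustrative assumptions, not part of the benchmark scripts; in practice the source counts come from PostgreSQL and the target counts from Spark reading the Iceberg tables.

```python
# Illustrative sketch of source-target row count validation.
# Counts would be gathered from PostgreSQL (SELECT COUNT(*)) and from
# Spark reading the Iceberg tables; here they are supplied as dicts.

def validate_row_counts(source_counts: dict[str, int],
                        target_counts: dict[str, int]) -> list[str]:
    """Return human-readable mismatch descriptions; an empty list means
    100% source-target row count match (the 8.1 pass criterion)."""
    mismatches = []
    for table, src in sorted(source_counts.items()):
        tgt = target_counts.get(table)
        if tgt != src:
            mismatches.append(f"{table}: source={src} target={tgt}")
    return mismatches

# Example with two SF-1 tables (standard TPC-DS SF-1 row counts):
src = {"store_sales": 2_880_404, "date_dim": 73_049}
tgt = {"store_sales": 2_880_404, "date_dim": 73_049}
assert validate_row_counts(src, tgt) == []  # all tables match, so pass
```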
3.2 ELT Performance — Real-time CDC
Objective: Measure end-to-end latency for real-time change propagation from the source DB, and compare initial load throughput between the two CDC paths (OLake vs Kafka).
| ID | Test Name | Description | Path | Data |
|---|---|---|---|---|
|  |  | Incremental load increase until maximum throughput is reached | A, B | 600K rows |
| R-09 | Schema Evolution | Verify CDC continues to function correctly after column addition | A, B | — |
| R-10 | Recovery Test | Validate data consistency after CDC process failure and recovery | A, B | — |
| R-11 | Large Transaction | Process 100K changes within a single transaction | A, B | 100K rows |
3.2.4 Latency Measurement Method
A timestamp column (_benchmark_ts) is added to the source table, and latency is calculated as the difference between the source timestamp and the time at which the change appears in the Iceberg table.
```sql
-- PostgreSQL source: INSERT with timestamp
INSERT INTO store_sales_cdc (ss_sold_date_sk, ..., _benchmark_ts)
VALUES (..., clock_timestamp());
```

```sql
-- Iceberg target: check reflection time (Spark SQL)
-- unix_millis() preserves millisecond precision; unix_timestamp()
-- would truncate both timestamps to whole seconds
SELECT _benchmark_ts,
       current_timestamp() AS received_ts,
       unix_millis(current_timestamp()) - unix_millis(_benchmark_ts) AS latency_ms
FROM lakekeeper.<namespace>.store_sales_cdc
WHERE _benchmark_ts > '2026-03-10 10:00:00';
```
Path A (OLake): The Iceberg namespace is automatically created by the OLake job configuration. Path B (Kafka): The Iceberg namespace is determined by the Sink connector's dynamic routing (e.g., debezium_tpcds_sf1).
⚠️ Note: NTP time synchronization between source and target systems is essential for accurate latency measurement. Measuring from the same server is recommended whenever possible.
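The per-row latency_ms values collected by this query roll up into the P50/P95/P99/Max columns reported in section 7.3. A minimal sketch of that roll-up, assuming a nearest-rank percentile definition (the actual scripts may interpolate differently):

```python
import math

def percentile(sorted_ms: list[float], p: float) -> float:
    """Nearest-rank percentile over an ascending-sorted sample of latencies (ms)."""
    if not sorted_ms:
        raise ValueError("no latency samples")
    rank = math.ceil(p / 100 * len(sorted_ms))  # 1-based nearest rank
    return sorted_ms[max(rank, 1) - 1]

def summarize_latencies(latencies_ms: list[float]) -> dict[str, float]:
    """Collapse raw per-row latencies into the report's summary columns."""
    s = sorted(latencies_ms)
    return {"p50": percentile(s, 50), "p95": percentile(s, 95),
            "p99": percentile(s, 99), "max": s[-1]}

# 100 evenly spread samples: P50=50 ms, P95=95 ms, P99=99 ms, Max=100 ms
assert summarize_latencies(list(range(1, 101))) == \
    {"p50": 50, "p95": 95, "p99": 99, "max": 100}
```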
3.2.5 Execution Steps
Path A (OLake):
Register source in OLake UI — PostgreSQL, tpcds_sf1
Query count: 99 TPC-DS queries (converted to StarRocks-compatible syntax)
Data: Iceberg tables loaded via Batch ELT or CDC
3.3.2 Query Classification
| Category | Example TPC-DS Queries | Characteristics |
|---|---|---|
| Simple Scan/Filter | Q3, Q7, Q19, Q42, Q52, Q55 | Single fact table, simple joins |
| Multi-way Join | Q13, Q17, Q25, Q29, Q34 | Joins across 3+ tables |
| Aggregation/Grouping | Q1, Q6, Q11, Q27, Q46 | GROUP BY, HAVING, window functions |
| Subquery/CTE | Q4, Q14, Q23, Q31, Q39 | Complex subqueries, WITH clauses |
| Large-scale Join | Q64, Q72, Q78, Q80, Q82 | Joins between large fact tables |
3.3.3 Metrics
| Metric | Unit | Description |
|---|---|---|
| Query Response Time | seconds (s) | Execution time per query |
| Geometric Mean | seconds (s) | TPC-DS standard geometric mean performance metric |
| Cold / Warm Run | seconds (s) | Without cache (Cold) vs with cache (Warm) |
| Throughput | queries/hr | Queries processed per hour (Power + Throughput test) |
| Concurrency | — | Performance impact as concurrent users increase |
| Query Profile | — | Execution plan, rows scanned, memory usage |
3.3.4 Test Cases
| ID | Test Name | Description | SF |
|---|---|---|---|
| Q-01 | Power Test (Cold Run) | Sequential execution of 99 queries, 1st run after cache flush | 100 |
| Q-02 | Warm Run | Sequential execution of 99 queries, 2nd run (cache utilized) | 100 |
| Q-03 | Category Drill-down | Detailed profiling of representative queries per category | 100 |
| Q-04 | Concurrency — 2 users | Query execution with 2 concurrent sessions | 100 |
| Q-05 | Concurrency — 4 users | Query execution with 4 concurrent sessions | 100 |
| Q-06 | Concurrency — 8 users | Query execution with 8 concurrent sessions | 100 |
| Q-07 | Mixed Workload | Measure impact of concurrent batch loading during query execution | 100 |
| Q-08 | Top-10 Slowest | Before/after comparison of tuning the 10 slowest queries | 100 |
3.3.5 Execution Steps
1. Extract standard queries with DuckDB tpcds_queries() → convert to StarRocks syntax
2. Validate StarRocks compatibility and fix incompatible queries
3. Flush cache (for Cold Run)
4. Execute queries sequentially and measure individual execution times
5. Collect query profiles (EXPLAIN ANALYZE)
6. Calculate Geometric Mean
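The geometric mean in the final step can be computed in log space so that the product of 99 execution times cannot overflow or underflow. A minimal sketch (the helper name is illustrative, not part of the benchmark scripts):

```python
import math

def geometric_mean(times_s: list[float]) -> float:
    """Geometric mean of per-query execution times, computed in log space.
    All times must be positive; a zero would make the metric undefined."""
    if not times_s or any(t <= 0 for t in times_s):
        raise ValueError("all query times must be positive")
    return math.exp(sum(math.log(t) for t in times_s) / len(times_s))

# Example: queries taking 1 s and 4 s have a geometric mean of 2 s,
# whereas the arithmetic mean (2.5 s) would overweight the slow query.
assert abs(geometric_mean([1.0, 4.0]) - 2.0) < 1e-9
```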
4. TPC-DS Data Generation and Loading
4.1 Data Generation
TPC-DS data is generated using the DuckDB TPC-DS extension (dsdgen) and exported in Parquet format with ZSTD compression.
```shell
# Generate TPC-DS data (SF-1)
python benchmark/data/generate_tpcds.py 1

# Generate TPC-DS data (SF-100)
python benchmark/data/generate_tpcds.py 100
```
4.2 PostgreSQL Loading
Parquet files are loaded into PostgreSQL at high speed using the DuckDB postgres extension. The script is idempotent, safely replacing existing data on repeated runs.
```shell
# Load into PostgreSQL (idempotent)
python benchmark/data/load_to_postgres.py 1
```
4.3 StarRocks External Catalog Setup
StarRocks queries Iceberg tables directly through an External Catalog without additional data loading.
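For reference, a minimal sketch of the catalog DDL, assuming Lakekeeper is exposed as an Iceberg REST catalog. The URI and warehouse values are placeholders for this environment, not confirmed settings:

```sql
-- Sketch only: URI and warehouse are placeholders, not confirmed settings.
CREATE EXTERNAL CATALOG lakekeeper
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "iceberg.catalog.uri" = "http://lakekeeper:8181/catalog",
    "iceberg.catalog.warehouse" = "tpcds"
);

-- Queries then address Iceberg tables as <catalog>.<namespace>.<table>,
-- e.g. SELECT COUNT(*) FROM lakekeeper.<namespace>.store_sales;
```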
| Task | Tool | Role |
|---|---|---|
| Query conversion | DuckDB | TPC-DS standard query extraction and StarRocks syntax conversion |
| Batch ELT | Spark + Lakekeeper | JDBC Read → Iceberg Write |
| CDC (Path A) | OLake | Direct PostgreSQL → Iceberg loading |
| CDC (Path B) | Debezium + Kafka + Iceberg Sink | Kafka-based CDC pipeline |
| CDC load generation | Python scripts / pgbench | INSERT / UPDATE / DELETE workloads |
| Query benchmark | Python + pymysql | Automated StarRocks query execution and timing |
| Kafka monitoring | Kafka UI (Provectus) | Consumer Lag, Topic, and Connector status monitoring |
| Resource collection | sar, iostat, vmstat | OS-level CPU / Memory / Disk I/O monitoring |
6. Test Execution Schedule
| Phase | Activity | Key Tasks |
|---|---|---|
| Step 1 | Environment Setup | Infrastructure provisioning, TPC-DS data generation, PostgreSQL loading, component configuration |
| Step 2 | Batch ELT | Execute and measure Spark → Iceberg loading performance |
| Step 3 | Query Performance | Execute Q-01 through Q-08 (Cold/Warm Run), fix query compatibility issues |
| Step 4 | Real-time CDC — Initial Load | Execute R-01 through R-03, compare OLake vs Kafka initial load throughput |
| Step 5 | Real-time CDC — Change Propagation | Execute R-04 through R-11, measure latency and data consistency |
| Step 6 | Analysis and Reporting | Consolidate results, analyze bottlenecks, produce final report |
7. Report Template
The tables below serve as report templates. Actual values will be populated after test execution.
7.1 Batch ELT Results Summary
| SF | Tables | Total Rows | Total Size | Load Time | Throughput |
|---|---|---|---|---|---|
| 1 | 24 | 19,557,579 | ~264 MB | 324s | 57K rows/s |
| 100 | 24 | — | ~100 GB | — | — |
7.2 CDC Initial Load Comparison
| Path | Method | Data | Elapsed Time | Throughput | Notes |
|---|---|---|---|---|---|
| A | OLake | 10,550,966 rows (3 tbl) | 125s | 84K rows/s | Direct loading without Kafka |
| B | Kafka CDC (optimized) | 19,557,579 rows (24 tbl) | ~600s | ~33K rows/s | Debezium + Kafka |
| B | Kafka CDC (default) | 19,557,579 rows (24 tbl) | ~1,740s | ~11K rows/s | Default settings |
⚠️ Path A (OLake) and Path B (Kafka CDC) target different numbers of tables (3 vs 24), so comparison should be based on per-row throughput (rows/s). For a fair comparison, the SF-100 test will target the same set of tables across both paths.
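The per-row normalization is simply total rows divided by elapsed seconds; applied to the figures in table 7.2 it reproduces the reported throughput values (the Path B elapsed times are approximate):

```python
def rows_per_second(total_rows: int, elapsed_s: float) -> float:
    """Normalize an initial-load run to per-row throughput (rows/s)."""
    return total_rows / elapsed_s

# Figures from table 7.2:
assert round(rows_per_second(10_550_966, 125)) == 84_408     # Path A (OLake), ~84K rows/s
assert round(rows_per_second(19_557_579, 600)) == 32_596     # Path B optimized, ~33K rows/s
assert round(rows_per_second(19_557_579, 1_740)) == 11_240   # Path B default, ~11K rows/s
```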
7.3 CDC Real-time Results Summary
| Test | Row Count | P50 (ms) | P95 (ms) | P99 (ms) | Max (ms) | Path |
|---|---|---|---|---|---|---|
| Single Row | 1 | — | — | — | — | A, B |
| Burst 1M | 1,000,000 | — | — | — | — | A, B |
| Sustained | 60,000 | — | — | — | — | A, B |
| Mixed DML | 10,000 | — | — | — | — | A, B |
| Peak Load | 600,000 | — | — | — | — | A, B |
7.4 Query Performance Results Summary
| SF | Queries | Success Rate | Geometric Mean | Min | Max | Cold / Warm |
|---|---|---|---|---|---|---|
| 1 | 99 | 100% | 1.044s | 0.150s | 21.793s | 1.044s / — |
| 100 | 99 | — | — | — | — | — / — |
7.5 Concurrency Test Results
| Concurrent Users | Geometric Mean | Throughput (queries/hr) | Performance Degradation |
|---|---|---|---|
| 1 | — | — | baseline |
| 2 | — | — | — |
| 4 | — | — | — |
| 8 | — | — | — |
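One way to read the Performance Degradation column is as the relative increase in geometric mean over the single-user baseline. A sketch assuming that definition (the report may define degradation differently):

```python
def degradation_pct(baseline_gmean_s: float, measured_gmean_s: float) -> float:
    """Relative slowdown versus the single-user baseline, in percent.
    0.0 means no degradation; 50.0 means the geometric mean grew by half."""
    return (measured_gmean_s / baseline_gmean_s - 1.0) * 100.0

# Under this definition, the 8.3 criterion "within 50% of single-user
# baseline" for 4 users means this value must stay at or below 50.
assert degradation_pct(1.0, 1.5) == 50.0
assert degradation_pct(2.0, 2.0) == 0.0
```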
8. Pass/Fail Criteria
8.1 Batch ELT
| Criteria | Threshold |
|---|---|
| Data consistency | 100% source-target row count match |
| SF-100 load time | Under 3 hours |
| Throughput | Minimum 10K rows/s |
8.2 Real-time CDC
| Criteria | Threshold | Applicable Path |
|---|---|---|
| Initial load data consistency | 100% source-target row count match | A, B |
| Initial load throughput | Confirm Path A (OLake) outperforms Path B (Kafka CDC) | A, B |
| Single-row E2E latency (P95) | Under 3 seconds | A, B |
| Sustained load latency (P95) | Under 10 seconds | A, B |
| CDC data consistency | 100% source-target match for INSERT / UPDATE / DELETE | A, B |
| Post-recovery consistency | Zero data loss | A, B |
8.3 Query Performance
| Criteria | Threshold |
|---|---|
| SF-1 Geometric Mean | Under 5 seconds |
| SF-100 Geometric Mean | Under 60 seconds |
| 4-user concurrency degradation | Within 50% of single-user baseline |
Pass/fail thresholds may be adjusted based on actual hardware specifications and cluster resource availability.