Understanding StarRocks Architecture
StarRocks is an analytical database with a simple architecture of Frontend (FE) nodes plus Backend (BE) or Compute Node (CN) workers, offered in two modes: Shared-nothing (local storage, high performance) and Shared-data (object storage, cost-efficient). It delivers strong MPP performance and scales horizontally without external dependencies.
 
1. Introduction to StarRocks
Why StarRocks?
StarRocks has been gaining attention in the data analytics market recently. What's the reason behind this?
StarRocks is an analytical database that aims for simplicity and performance at the same time. Its most distinctive features are that it delivers high performance without complex external dependencies and scales horizontally without operational downtime.
Core Features and Advantages
Simple Architecture
- Only two kinds of nodes per cluster: FE (Frontend) plus either BE (Backend) or CN (Compute Node) workers
- Self-contained system with no dependencies on external components
Exceptional Performance
- Large-scale data processing with an MPP (Massively Parallel Processing) architecture
- Optimized query performance through direct access to local (or locally cached) data
Flexible Scalability
- Horizontal scaling without operational downtime
- Two deployment modes to match different environments
2. Basic Architecture
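StarRocks' architecture is surprisingly simple. It consists of only three types of components: FE, BE, and CN. A cluster pairs FE nodes with BE nodes in Shared-nothing mode, or with CN nodes in Shared-data mode.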
FE (Frontend) - The Brain
Main Roles:
- Metadata Management: Keep metadata in memory and persist it with BDB JE (Berkeley DB Java Edition)
- Client Connection: Receive and process user requests
- Query Planning: Parse and analyze SQL and generate distributed execution plans
- Scheduling: Distribute plan fragments to BE/CN nodes for execution
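As a rough illustration of the planning and scheduling roles, the sketch below assigns each tablet of a table scan to one worker that holds a replica of it, choosing the least-loaded candidate. The tablet and node names and the least-loaded rule are hypothetical simplifications for this example; StarRocks' actual scheduler weighs more factors.

```python
from collections import defaultdict


def assign_scan_ranges(tablet_replicas: dict[str, list[str]]) -> dict[str, list[str]]:
    """Toy scheduler (hypothetical): map each tablet to one node holding a replica.

    Picks the replica holder with the fewest tablets assigned so far,
    breaking ties by node name so the result is deterministic.
    """
    load: dict[str, int] = defaultdict(int)
    plan: dict[str, list[str]] = defaultdict(list)
    for tablet in sorted(tablet_replicas):
        node = min(tablet_replicas[tablet], key=lambda n: (load[n], n))
        load[node] += 1
        plan[node].append(tablet)
    return dict(plan)


if __name__ == "__main__":
    replicas = {
        "t1": ["be1", "be2"],
        "t2": ["be1", "be3"],
        "t3": ["be2", "be3"],
        "t4": ["be1", "be2"],
    }
    print(assign_scan_ranges(replicas))
```

Every tablet is scanned exactly once, and the work spreads across the nodes that already have the data, so no table data has to be shipped just to start the scan.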
High Availability Structure:
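- Leader: The only FE that modifies metadata; it records each change in a metadata log
- Follower: Replays the Leader's log to stay in sync, votes in Leader elections, and can take over if the Leader fails
- Observer: Replays the log like a Follower but does not vote; used to scale out query handling
FE nodes elect the Leader from among the Followers and replicate metadata using a majority-based, Raft-like consensus protocol built on BDB JE.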
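The majority rule behind Leader election can be worked through in a few lines. This sketch assumes the electing members are the Follower FEs (the current Leader counts as one of them) and that a strict majority must agree; the function names are hypothetical.

```python
def election_quorum(followers: int) -> int:
    """Votes required to elect a Leader: a strict majority of Follower FEs.

    Observers are not counted because they do not take part in elections.
    """
    if followers < 1:
        raise ValueError("a cluster needs at least one Follower FE")
    return followers // 2 + 1


def tolerated_failures(followers: int) -> int:
    """Follower FEs that can fail while the survivors still form a majority."""
    return followers - election_quorum(followers)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} followers: quorum={election_quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With three Followers the quorum is two votes and one Follower can fail; a fourth Follower raises the quorum to three without tolerating any extra failure, which is why odd Follower counts are the usual choice. Adding Observers changes neither number.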
BE/CN - The Actual Workers
BE (Backend) - Shared-nothing Mode
- Data Storage: Store data in local storage
- SQL Execution: Execute queries according to the FE's plans
- Data Replication: Keep multiple replicas of each tablet on different BE nodes for high availability
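Replica-based availability can be sketched as follows: each tablet gets three copies on three different BEs (three is the common default, treated as a parameter here), so losing any single BE still leaves every tablet readable. The rotating placement rule and all names are hypothetical; StarRocks' real placement logic considers more factors.

```python
def place_replicas(tablets: list[str], nodes: list[str], replicas: int = 3) -> dict[str, list[str]]:
    """Place `replicas` copies of each tablet on distinct nodes.

    Hypothetical rotating placement: tablet i starts at node i and takes the
    next `replicas` nodes, which spreads copies evenly and never repeats a node.
    """
    if replicas > len(nodes):
        raise ValueError("each replica must live on a different node")
    return {
        tablet: [nodes[(i + k) % len(nodes)] for k in range(replicas)]
        for i, tablet in enumerate(tablets)
    }


def readable_after_failure(placement: dict[str, list[str]], failed: set[str]) -> bool:
    """True if every tablet still has at least one replica on a live node."""
    return all(any(node not in failed for node in holders)
               for holders in placement.values())


if __name__ == "__main__":
    layout = place_replicas([f"t{i}" for i in range(6)], ["be1", "be2", "be3", "be4"])
    print(layout["t0"])                                           # ['be1', 'be2', 'be3']
    print(readable_after_failure(layout, {"be1"}))                # True
    print(readable_after_failure(layout, {"be1", "be2", "be3"}))  # False: t0 lost every copy
```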
CN (Compute Nodes) - Shared-data Mode
- Computation Only: Execute queries without persistently storing data
- Cache Management: Cache frequently used data locally
- Fast Scaling: Nodes can be added or removed quickly because they are stateless
3. Complete Comparison: Shared-nothing vs Shared-data
One of StarRocks' key features is that you can choose the architecture that best fits your environment.
Shared-nothing: The Power of Local Storage

Structure:
- BE nodes store data in local storage
- Each BE scans the data it stores locally, and BEs process their parts of a query in parallel (MPP)
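The MPP idea behind this structure can be simulated in one process: every BE aggregates only the rows it stores locally, and a final step merges the small partial results. This two-phase SUM ... GROUP BY shape is a standard MPP pattern; the node names, the data, and the use of threads to stand in for separate BE machines are assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(rows: list[tuple[str, int]]) -> Counter:
    """Phase 1, run on each BE: SUM(amount) GROUP BY region over its local rows only."""
    partial: Counter = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial


def merge_partials(partials) -> dict[str, int]:
    """Phase 2: combine the small per-node partial results into the final answer."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values key by key
    return dict(total)


def mpp_sum_by_region(data_by_node: dict[str, list[tuple[str, int]]]) -> dict[str, int]:
    # Threads stand in for BE machines scanning their own data in parallel;
    # only the partial sums, not the raw rows, reach the merge step.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, data_by_node.values()))
    return merge_partials(partials)


if __name__ == "__main__":
    cluster = {
        "be1": [("east", 10), ("west", 5)],
        "be2": [("east", 7)],
        "be3": [("west", 1), ("north", 4)],
    }
    print(mpp_sum_by_region(cluster))  # {'east': 17, 'west': 6, 'north': 4}
```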
Advantages:
- Ultra-fast Queries: Highest performance through direct access to local data
- Stability: High availability through multiple tablet replicas across BE nodes
- Real-time Processing: Optimized for low-latency queries
Suitable Situations:
- When real-time dashboards or OLAP queries are critical
- Environments requiring minimal network latency
- Predictable workload patterns
Shared-data: Cloud-Native Flexibility

Structure:
- Uses object storage (AWS S3, GCS, Azure Blob, MinIO, etc.) or HDFS as storage
- CN nodes handle only caching and computation
Core Technology - Multi-tier Cache System:
Memory Cache → Local Disk Cache → Remote Object Storage
    ↓                 ↓                    ↓
  Hot data        Warm data            Cold data
- Hot data: Direct access from memory (maximum performance)
- Warm data: Access from local disk cache (fast performance)
- Cold data: Read from object storage on first access, then cached so later reads are faster
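The read path through the three tiers can be sketched as a small class. LRU eviction, the tier sizes, and promoting data into memory on a disk hit are illustrative assumptions, not a description of StarRocks' actual cache policies; a plain dict stands in for object storage.

```python
from collections import OrderedDict


class TieredCache:
    """Toy read path: memory LRU -> local-disk LRU -> remote object storage."""

    def __init__(self, remote: dict, mem_capacity: int = 2, disk_capacity: int = 4):
        self.remote = remote                      # stands in for S3, GCS, etc.
        self.mem: OrderedDict = OrderedDict()
        self.disk: OrderedDict = OrderedDict()
        self.mem_capacity = mem_capacity
        self.disk_capacity = disk_capacity

    @staticmethod
    def _put(tier: OrderedDict, capacity: int, key, value) -> None:
        tier[key] = value
        tier.move_to_end(key)                     # mark as most recently used
        if len(tier) > capacity:
            tier.popitem(last=False)              # evict least recently used

    def get(self, key):
        """Return (value, tier_that_served_it)."""
        if key in self.mem:                       # hot: served from memory
            self.mem.move_to_end(key)
            return self.mem[key], "memory"
        if key in self.disk:                      # warm: served from local disk
            self.disk.move_to_end(key)
            value = self.disk[key]
            self._put(self.mem, self.mem_capacity, key, value)
            return value, "disk"
        value = self.remote[key]                  # cold: fetch remotely, then cache
        self._put(self.disk, self.disk_capacity, key, value)
        self._put(self.mem, self.mem_capacity, key, value)
        return value, "remote"


if __name__ == "__main__":
    cache = TieredCache({"a": 1, "b": 2, "c": 3}, mem_capacity=1, disk_capacity=2)
    for key in ["a", "a", "b", "a", "c", "b"]:
        print(key, cache.get(key)[1])
    # remote, memory, remote, disk, remote, remote
```

The second read of `a` hits memory, and after `b` pushes it out of memory it is still served from disk rather than from remote storage, which is the progression the hot, warm, and cold tiers describe.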
Advantages:
- Cost Reduction: Use inexpensive object storage
- Decoupled Resources: Scale compute and storage independently
- Fast Scaling: Add or remove CN nodes very quickly
- Cloud-friendly: Optimized for cloud-native environments
Suitable Situations:
- When cost efficiency is important in cloud environments
- Environments with unpredictable workloads
- When storage and computing need to be scaled independently
When to Choose Which Mode?
| Comparison Item | Shared-nothing | Shared-data |
|---|---|---|
| Performance | ⭐⭐⭐⭐⭐ (Highest) | ⭐⭐⭐⭐ (Excellent on cache hits) |
| Cost Efficiency | ⭐⭐⭐ (Medium) | ⭐⭐⭐⭐⭐ (Very high) |
| Scalability | ⭐⭐⭐ (Tablet rebalancing required) | ⭐⭐⭐⭐⭐ (Fast scaling, no data migration) |
| Operational Simplicity | ⭐⭐⭐⭐ (Simple) | ⭐⭐⭐ (Cache management required) |
| Cloud Friendliness | ⭐⭐⭐ (Average) | ⭐⭐⭐⭐⭐ (Optimal) |
More stars is better in every row.
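The Scalability row can be made concrete with back-of-the-envelope arithmetic. This sketch assumes data is spread evenly across nodes before and after scaling; the function name and the 3 TB figure are hypothetical.

```python
def scale_out_cost(total_bytes: int, old_nodes: int, new_nodes: int) -> dict[str, int]:
    """Rough bytes involved when growing a cluster from old_nodes to new_nodes.

    Assumes an even spread, so the added nodes become responsible for
    (new_nodes - old_nodes) / new_nodes of the data. For Shared-nothing,
    total_bytes means everything stored on the BEs, including replicas.
    """
    if not 0 < old_nodes < new_nodes:
        raise ValueError("expected 0 < old_nodes < new_nodes")
    share = total_bytes * (new_nodes - old_nodes) // new_nodes
    return {
        # Shared-nothing: tablets are copied from existing BEs to the new ones.
        "shared_nothing_bytes_migrated": share,
        # Shared-data: persistent data stays in object storage; nothing is copied.
        "shared_data_bytes_migrated": 0,
        # New CNs start with empty caches; at most this much is re-read, and only on demand.
        "shared_data_max_cache_to_warm": share,
    }


if __name__ == "__main__":
    TB = 10**12
    print(scale_out_cost(3 * TB, old_nodes=3, new_nodes=4))
```

Going from three to four nodes with 3 TB stored means roughly 750 GB of tablets to migrate in Shared-nothing mode, versus no persistent data movement in Shared-data mode, where the cost shows up instead as temporarily colder caches.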
Selection Guide:
Choose Shared-nothing when:
- Real-time dashboards or low-latency queries are core requirements
- Workload patterns are predictable
- On-premises environments or dedicated infrastructure
Choose Shared-data when:
- Cost efficiency is important in cloud environments
- Workloads are rapidly changing or unpredictable
- Storage and computing need to be scaled independently
Conclusion
StarRocks' architecture shows how much power a simple design can deliver. With two deployment modes, it can be matched to a wide range of environments and requirements.
In Part 2, we'll walk step by step through deploying StarRocks in a Kubernetes environment. Let's connect theory to practice and experience the true value of StarRocks!