QuantDB Architecture Deep Dive: How We Achieved 98% Performance Improvement¶

Published: January 11, 2025 | Author: QuantDB Team | Category: Technical Deep Dive

🎯 Core Problem¶

In quantitative trading and financial data analysis, data retrieval is often the performance bottleneck. While AKShare is powerful, it has obvious performance issues:

Long Response Times: Single requests typically take 1-2 seconds
Duplicate Requests: Repeated retrieval of the same data wastes resources
Network Dependency: Every request requires network access, offline environments don't work
Slow Batch Processing: Large numbers of API calls lead to poor overall efficiency

QuantDB achieves 98.1% performance improvement through intelligent caching architecture while maintaining 100% AKShare API compatibility.

🏗️ Overall Architecture Design¶

Layered Architecture¶

┌─────────────────────────────────────────────────────────────┐
│                    User Interface Layer (qdb.*)             │
├─────────────────────────────────────────────────────────────┤
│                    Service Layer                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │ Stock Service│  │Cache Service│  │ Real Service│         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
├─────────────────────────────────────────────────────────────┤
│                    Cache Layer                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │ SQLite Cache│  │ Memory Cache│  │ TTL Manager │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
├─────────────────────────────────────────────────────────────┤
│                    Data Layer                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │   AKShare   │  │Trading Cal. │  │ Data Models │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
└─────────────────────────────────────────────────────────────┘

Core Components¶

User Interface Layer: Provides fully AKShare-compatible APIs
Service Layer: Business logic processing and cache strategy management
Cache Layer: Multi-level caching mechanism including SQLite persistence and memory cache
Data Layer: AKShare data source and trading calendar integration

🧠 Smart Caching Strategy¶

1. Trading Calendar-Aware Caching¶

QuantDB integrates real trading calendars to implement intelligent cache invalidation:

def should_update_cache(symbol: str, date: str) -> bool:
    """Determine if cache needs updating based on trading calendar"""
    if not is_trading_day(date):
        return False  # No updates on non-trading days

    if date == get_current_trading_day():
        return True   # Current trading day data needs updates

    return not cache_exists(symbol, date)

Advantages: - Avoids invalid data requests on non-trading days - Long-term caching for historical data, smart updates for current day data - Reduces 90%+ invalid API calls

2. Incremental Data Retrieval¶

Smart detection of missing data segments, retrieving only necessary data:

def get_missing_date_ranges(symbol: str, start_date: str, end_date: str):
    """Detect missing data segments"""
    cached_dates = get_cached_dates(symbol, start_date, end_date)
    trading_days = get_trading_days(start_date, end_date)

    missing_dates = set(trading_days) - set(cached_dates)
    return optimize_date_ranges(missing_dates)

Performance Optimization: - Merges consecutive missing dates into single API calls - Avoids re-fetching already cached data - Supports partial data updates without full refresh

3. Multi-Level Caching Mechanism¶

class CacheManager:
    def __init__(self):
        self.memory_cache = {}      # L1: Memory cache (millisecond level)
        self.sqlite_cache = None    # L2: SQLite cache (10ms level)
        self.akshare_source = None  # L3: AKShare source (1000ms level)

    def get_data(self, key):
        # L1: Check memory cache
        if key in self.memory_cache:
            return self.memory_cache[key]

        # L2: Check SQLite cache
        data = self.sqlite_cache.get(key)
        if data:
            self.memory_cache[key] = data  # Backfill L1
            return data

        # L3: Fetch from AKShare
        data = self.akshare_source.get(key)
        self.sqlite_cache.set(key, data)   # Store to L2
        self.memory_cache[key] = data      # Store to L1
        return data

📊 Performance Optimization Techniques¶

1. SQLite Optimization Configuration¶

-- Performance optimization settings
PRAGMA journal_mode = WAL;          -- Write-Ahead Logging mode
PRAGMA synchronous = NORMAL;        -- Balance safety and performance
PRAGMA cache_size = 10000;          -- Increase cache size
PRAGMA temp_store = memory;         -- Store temp tables in memory

2. Database Index Strategy¶

-- Core index design
CREATE INDEX idx_stock_data_symbol_date ON stock_data(symbol, date);
CREATE INDEX idx_stock_data_date ON stock_data(date);
CREATE INDEX idx_cache_metadata_key ON cache_metadata(cache_key);

3. Batch Operation Optimization¶

def batch_insert_stock_data(data_list):
    """Optimized batch insert"""
    with sqlite3.connect(db_path) as conn:
        conn.execute("BEGIN TRANSACTION")
        try:
            conn.executemany(INSERT_SQL, data_list)
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise

📈 Performance Test Results¶

Test Environment¶

Hardware: Intel i7-10700K, 16GB RAM, SSD
Network: 100Mbps broadband
Python: 3.9.7

Detailed Performance Data¶

Operation Type	AKShare Direct	QuantDB Cache Hit	Improvement	Cache Size
Single stock 30 days	1,247ms	18ms	98.6%	~50KB
Batch 10 stocks	12,340ms	156ms	98.7%	~500KB
Repeated requests	1,180ms	15ms	98.7%	Cache hit
Incremental update	1,090ms	45ms	95.9%	Incremental data

Memory Usage Optimization¶

Typical memory usage: - Base memory: ~20MB - Cache 1000 stocks: ~50MB - Memory cache: ~10MB

🛠️ Implementation Details¶

Key Code Example¶

class StockDataService:
    def get_stock_data(self, symbol: str, **kwargs) -> pd.DataFrame:
        # 1. Parameter normalization
        params = self._normalize_params(symbol, **kwargs)
        cache_key = self._generate_cache_key(params)

        # 2. Check cache
        cached_data = self.cache_manager.get(cache_key)
        if cached_data and self._is_cache_valid(cached_data):
            return cached_data['data']

        # 3. Detect missing data segments
        missing_ranges = self._get_missing_ranges(params)
        if not missing_ranges:
            return cached_data['data']

        # 4. Fetch missing data
        new_data = self._fetch_from_akshare(missing_ranges)

        # 5. Merge and cache
        merged_data = self._merge_data(cached_data, new_data)
        self.cache_manager.set(cache_key, merged_data)

        return merged_data

🔮 Future Optimization Directions¶

Distributed Caching: Support Redis clusters for multi-instance cache sharing
Predictive Caching: Pre-cache likely needed data based on user behavior patterns
Compression Optimization: Implement data compression to reduce storage space
Async Processing: Support asynchronous data retrieval for improved concurrency

💡 Summary¶

QuantDB achieves 98%+ performance improvement while maintaining 100% AKShare compatibility through carefully designed multi-layer caching architecture. Core technologies include:

Smart Caching Strategy: Trading calendar-based cache invalidation mechanism
Incremental Data Retrieval: Fetch only missing data segments
Multi-Level Caching: Efficient memory + SQLite caching system
Performance Optimization: Database optimization, batch operations, indexing strategies

These combined technologies make quantitative trading and financial data analysis much more efficient, providing developers with production-grade data processing solutions.

Technical Support: - GitHub Issues - Project Documentation