Skip to content

QuantDB Architecture Deep Dive: How We Achieved 98% Performance Improvement

Published: January 11, 2025 | Author: QuantDB Team | Category: Technical Deep Dive

๐ŸŽฏ Core Problem

In quantitative trading and financial data analysis, data retrieval is often the performance bottleneck. While AKShare is powerful, it has obvious performance issues:

  • Long Response Times: Single requests typically take 1-2 seconds
  • Duplicate Requests: Repeated retrieval of the same data wastes resources
  • Network Dependency: Every request requires network access, offline environments don't work
  • Slow Batch Processing: Large numbers of API calls lead to poor overall efficiency

QuantDB achieves 98.1% performance improvement through intelligent caching architecture while maintaining 100% AKShare API compatibility.

๐Ÿ—๏ธ Overall Architecture Design

Layered Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    User Interface Layer (qdb.*)             โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                    Service Layer                            โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”         โ”‚
โ”‚  โ”‚ Stock Serviceโ”‚  โ”‚Cache Serviceโ”‚  โ”‚ Real Serviceโ”‚         โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                    Cache Layer                              โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”         โ”‚
โ”‚  โ”‚ SQLite Cacheโ”‚  โ”‚ Memory Cacheโ”‚  โ”‚ TTL Manager โ”‚         โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                    Data Layer                               โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”         โ”‚
โ”‚  โ”‚   AKShare   โ”‚  โ”‚Trading Cal. โ”‚  โ”‚ Data Models โ”‚         โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Core Components

  1. User Interface Layer: Provides fully AKShare-compatible APIs
  2. Service Layer: Business logic processing and cache strategy management
  3. Cache Layer: Multi-level caching mechanism including SQLite persistence and memory cache
  4. Data Layer: AKShare data source and trading calendar integration

๐Ÿง  Smart Caching Strategy

1. Trading Calendar-Aware Caching

QuantDB integrates real trading calendars to implement intelligent cache invalidation:

def should_update_cache(symbol: str, date: str) -> bool:
    """Determine if cache needs updating based on trading calendar"""
    if not is_trading_day(date):
        return False  # No updates on non-trading days

    if date == get_current_trading_day():
        return True   # Current trading day data needs updates

    return not cache_exists(symbol, date)

Advantages: - Avoids invalid data requests on non-trading days - Long-term caching for historical data, smart updates for current day data - Reduces 90%+ invalid API calls

2. Incremental Data Retrieval

Smart detection of missing data segments, retrieving only necessary data:

def get_missing_date_ranges(symbol: str, start_date: str, end_date: str):
    """Detect missing data segments"""
    cached_dates = get_cached_dates(symbol, start_date, end_date)
    trading_days = get_trading_days(start_date, end_date)

    missing_dates = set(trading_days) - set(cached_dates)
    return optimize_date_ranges(missing_dates)

Performance Optimization: - Merges consecutive missing dates into single API calls - Avoids re-fetching already cached data - Supports partial data updates without full refresh

3. Multi-Level Caching Mechanism

class CacheManager:
    def __init__(self):
        self.memory_cache = {}      # L1: Memory cache (millisecond level)
        self.sqlite_cache = None    # L2: SQLite cache (10ms level)
        self.akshare_source = None  # L3: AKShare source (1000ms level)

    def get_data(self, key):
        # L1: Check memory cache
        if key in self.memory_cache:
            return self.memory_cache[key]

        # L2: Check SQLite cache
        data = self.sqlite_cache.get(key)
        if data:
            self.memory_cache[key] = data  # Backfill L1
            return data

        # L3: Fetch from AKShare
        data = self.akshare_source.get(key)
        self.sqlite_cache.set(key, data)   # Store to L2
        self.memory_cache[key] = data      # Store to L1
        return data

๐Ÿ“Š Performance Optimization Techniques

1. SQLite Optimization Configuration

-- Performance optimization settings
PRAGMA journal_mode = WAL;          -- Write-Ahead Logging mode
PRAGMA synchronous = NORMAL;        -- Balance safety and performance
PRAGMA cache_size = 10000;          -- Increase cache size
PRAGMA temp_store = memory;         -- Store temp tables in memory

2. Database Index Strategy

-- Core index design
CREATE INDEX idx_stock_data_symbol_date ON stock_data(symbol, date);
CREATE INDEX idx_stock_data_date ON stock_data(date);
CREATE INDEX idx_cache_metadata_key ON cache_metadata(cache_key);

3. Batch Operation Optimization

def batch_insert_stock_data(data_list):
    """Optimized batch insert"""
    with sqlite3.connect(db_path) as conn:
        conn.execute("BEGIN TRANSACTION")
        try:
            conn.executemany(INSERT_SQL, data_list)
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise

๐Ÿ“ˆ Performance Test Results

Test Environment

  • Hardware: Intel i7-10700K, 16GB RAM, SSD
  • Network: 100Mbps broadband
  • Python: 3.9.7

Detailed Performance Data

Operation Type AKShare Direct QuantDB Cache Hit Improvement Cache Size
Single stock 30 days 1,247ms 18ms 98.6% ~50KB
Batch 10 stocks 12,340ms 156ms 98.7% ~500KB
Repeated requests 1,180ms 15ms 98.7% Cache hit
Incremental update 1,090ms 45ms 95.9% Incremental data

Memory Usage Optimization

Typical memory usage: - Base memory: ~20MB - Cache 1000 stocks: ~50MB - Memory cache: ~10MB

๐Ÿ› ๏ธ Implementation Details

Key Code Example

class StockDataService:
    def get_stock_data(self, symbol: str, **kwargs) -> pd.DataFrame:
        # 1. Parameter normalization
        params = self._normalize_params(symbol, **kwargs)
        cache_key = self._generate_cache_key(params)

        # 2. Check cache
        cached_data = self.cache_manager.get(cache_key)
        if cached_data and self._is_cache_valid(cached_data):
            return cached_data['data']

        # 3. Detect missing data segments
        missing_ranges = self._get_missing_ranges(params)
        if not missing_ranges:
            return cached_data['data']

        # 4. Fetch missing data
        new_data = self._fetch_from_akshare(missing_ranges)

        # 5. Merge and cache
        merged_data = self._merge_data(cached_data, new_data)
        self.cache_manager.set(cache_key, merged_data)

        return merged_data

๐Ÿ”ฎ Future Optimization Directions

  1. Distributed Caching: Support Redis clusters for multi-instance cache sharing
  2. Predictive Caching: Pre-cache likely needed data based on user behavior patterns
  3. Compression Optimization: Implement data compression to reduce storage space
  4. Async Processing: Support asynchronous data retrieval for improved concurrency

๐Ÿ’ก Summary

QuantDB achieves 98%+ performance improvement while maintaining 100% AKShare compatibility through carefully designed multi-layer caching architecture. Core technologies include:

  • Smart Caching Strategy: Trading calendar-based cache invalidation mechanism
  • Incremental Data Retrieval: Fetch only missing data segments
  • Multi-Level Caching: Efficient memory + SQLite caching system
  • Performance Optimization: Database optimization, batch operations, indexing strategies

These combined technologies make quantitative trading and financial data analysis much more efficient, providing developers with production-grade data processing solutions.


Related Articles: - From AKShare to QuantDB: Complete Migration Guide - Performance Comparison Study

Technical Support: - GitHub Issues - Project Documentation