# QuantDB Architecture Deep Dive: How We Achieved 98% Performance Improvement
Published: January 11, 2025 | Author: QuantDB Team | Category: Technical Deep Dive
## 🎯 Core Problem
In quantitative trading and financial data analysis, data retrieval is often the performance bottleneck. AKShare is powerful, but it has clear performance limitations:
- Long Response Times: Single requests typically take 1-2 seconds
- Duplicate Requests: Repeated retrieval of the same data wastes resources
- Network Dependency: Every request requires network access, so nothing works offline
- Slow Batch Processing: Large numbers of API calls lead to poor overall efficiency
QuantDB achieves 98.1% performance improvement through intelligent caching architecture while maintaining 100% AKShare API compatibility.
## 🏗️ Overall Architecture Design

### Layered Architecture
```text
┌───────────────────────────────────────────────────────┐
│             User Interface Layer (qdb.*)              │
├───────────────────────────────────────────────────────┤
│                     Service Layer                     │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Stock Service │ │ Cache Service │ │ Real Service  │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
├───────────────────────────────────────────────────────┤
│                      Cache Layer                      │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ SQLite Cache  │ │ Memory Cache  │ │ TTL Manager   │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
├───────────────────────────────────────────────────────┤
│                      Data Layer                       │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │    AKShare    │ │ Trading Cal.  │ │ Data Models   │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
└───────────────────────────────────────────────────────┘
```
### Core Components
- User Interface Layer: Provides fully AKShare-compatible APIs
- Service Layer: Business logic processing and cache strategy management
- Cache Layer: Multi-level caching mechanism including SQLite persistence and memory cache
- Data Layer: AKShare data source and trading calendar integration
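The compatibility claim can be sketched in miniature (the decorator, cache shape, and stand-in function below are illustrative, not QuantDB's actual internals): a read-through wrapper that preserves the upstream function's signature, so switching to the cached version is only an import change for callers.

```python
from typing import Callable, Dict, Tuple

# Hypothetical sketch: a read-through cache that keeps the wrapped
# function's signature intact, mirroring the "drop-in" design goal.
_cache: Dict[Tuple, object] = {}

def cached(fetch: Callable) -> Callable:
    """Wrap an AKShare-style fetch function with a read-through cache."""
    def wrapper(*args, **kwargs):
        key = (fetch.__name__, args, tuple(sorted(kwargs.items())))
        if key not in _cache:              # miss: hit the real source once
            _cache[key] = fetch(*args, **kwargs)
        return _cache[key]                 # hit: served from memory
    return wrapper

calls = 0

@cached
def stock_zh_a_hist(symbol: str, period: str = "daily"):
    """Stand-in for the real AKShare call (normally a network request)."""
    global calls
    calls += 1
    return {"symbol": symbol, "rows": 30}

stock_zh_a_hist("000001")
stock_zh_a_hist("000001")   # second call never reaches the source
```

Because the wrapper forwards `*args, **kwargs` unchanged, existing AKShare call sites need no modification.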
## 🧠 Smart Caching Strategy

### 1. Trading Calendar-Aware Caching
QuantDB integrates real trading calendars to implement intelligent cache invalidation:
```python
def should_update_cache(symbol: str, date: str) -> bool:
    """Determine if the cache needs updating based on the trading calendar."""
    if not is_trading_day(date):
        return False  # No updates on non-trading days
    if date == get_current_trading_day():
        return True   # Current trading day data may still change
    return not cache_exists(symbol, date)  # Historical data: fetch only if missing
```
Advantages:

- Avoids invalid data requests on non-trading days
- Caches historical data long-term while smart-updating current-day data
- Cuts invalid API calls by 90%+
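A minimal runnable version of this decision logic, with a stubbed calendar standing in for the real exchange calendar (all dates, symbols, and names here are illustrative). Note that 2025-01-11 is a Saturday, so a request for it triggers no API call at all:

```python
from datetime import date

# Stub calendar for illustration; QuantDB integrates real exchange calendars.
TRADING_DAYS = {date(2025, 1, 9), date(2025, 1, 10), date(2025, 1, 13)}
TODAY = date(2025, 1, 13)                   # assume this is the current session
CACHED = {("600519", date(2025, 1, 9))}     # (symbol, day) pairs already stored

def should_update_cache(symbol: str, d: date) -> bool:
    if d not in TRADING_DAYS:               # weekend/holiday: nothing to fetch
        return False
    if d == TODAY:                          # current session may still change
        return True
    return (symbol, d) not in CACHED        # history: fetch only if missing
```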
### 2. Incremental Data Retrieval
Smart detection of missing data segments, retrieving only necessary data:
```python
def get_missing_date_ranges(symbol: str, start_date: str, end_date: str):
    """Detect missing data segments within the requested window."""
    cached_dates = get_cached_dates(symbol, start_date, end_date)
    trading_days = get_trading_days(start_date, end_date)
    missing_dates = set(trading_days) - set(cached_dates)
    return optimize_date_ranges(missing_dates)
```
Performance optimizations:

- Merges consecutive missing dates into single API calls
- Avoids re-fetching already cached data
- Supports partial updates without a full refresh
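The range-merging step can be sketched as follows: sort the missing dates and extend the current run while dates stay adjacent, so each resulting range becomes one API call. For brevity this sketch merges on calendar-day adjacency; the real logic would merge on trading-day adjacency (so Friday and the following Monday also coalesce):

```python
from datetime import date, timedelta

def optimize_date_ranges(missing: set) -> list:
    """Merge consecutive missing days into (start, end) ranges,
    so each range can be fetched with a single upstream call."""
    ranges = []
    for d in sorted(missing):
        if ranges and (d - ranges[-1][1]) == timedelta(days=1):
            ranges[-1] = (ranges[-1][0], d)   # extend the current run
        else:
            ranges.append((d, d))             # start a new run
    return ranges

missing = {date(2025, 1, 2), date(2025, 1, 3), date(2025, 1, 6)}
# Jan 2-3 merge into one range; Jan 6 starts a second range.
```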
### 3. Multi-Level Caching Mechanism
```python
class CacheManager:
    def __init__(self, sqlite_cache, akshare_source):
        self.memory_cache = {}                 # L1: memory cache (sub-millisecond)
        self.sqlite_cache = sqlite_cache       # L2: SQLite cache (~10 ms)
        self.akshare_source = akshare_source   # L3: AKShare source (~1000 ms)

    def get_data(self, key):
        # L1: check memory cache
        if key in self.memory_cache:
            return self.memory_cache[key]
        # L2: check SQLite cache
        data = self.sqlite_cache.get(key)
        if data is not None:                   # explicit check: an empty DataFrame is falsy
            self.memory_cache[key] = data      # backfill L1
            return data
        # L3: fetch from AKShare
        data = self.akshare_source.get(key)
        self.sqlite_cache.set(key, data)       # store to L2
        self.memory_cache[key] = data          # store to L1
        return data
```
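The same read path can be exercised end to end with a self-contained demo: a dict for L1, an in-memory SQLite table for L2, and a stub function standing in for AKShare (all names and keys illustrative). The source is hit exactly once per key, no matter how often the key is requested:

```python
import sqlite3

memory = {}                                   # L1 cache
db = sqlite3.connect(":memory:")              # L2 cache
db.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)")
source_calls = 0

def slow_source(key: str) -> str:             # L3 stand-in for AKShare
    global source_calls
    source_calls += 1
    return f"data-for-{key}"

def get_data(key: str) -> str:
    if key in memory:                         # L1 hit
        return memory[key]
    row = db.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
    if row:                                   # L2 hit: backfill L1
        memory[key] = row[0]
        return row[0]
    value = slow_source(key)                  # miss: go to L3
    db.execute("INSERT INTO cache VALUES (?, ?)", (key, value))
    memory[key] = value                       # populate both cache tiers
    return value

get_data("600519:2025-01-10")
get_data("600519:2025-01-10")                 # served from L1
```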
## 🚀 Performance Optimization Techniques

### 1. SQLite Optimization Configuration
```sql
-- Performance optimization settings
PRAGMA journal_mode = WAL;     -- Write-Ahead Logging mode
PRAGMA synchronous = NORMAL;   -- Balance safety and performance
PRAGMA cache_size = 10000;     -- Enlarge the page cache
PRAGMA temp_store = MEMORY;    -- Keep temp tables in memory
```
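These pragmas can be applied from Python when the connection is opened; `PRAGMA journal_mode` echoes back the mode that is actually active, which makes a handy sanity check (WAL requires a file-backed database, so an `:memory:` connection would report `memory` instead):

```python
import os
import sqlite3
import tempfile

# A throwaway file-backed database, since WAL does not apply to :memory:.
path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = sqlite3.connect(path)

mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
conn.execute("PRAGMA synchronous = NORMAL")   # fewer fsyncs, still safe with WAL
conn.execute("PRAGMA cache_size = 10000")     # more pages kept in memory
conn.execute("PRAGMA temp_store = MEMORY")    # temp tables/indices in RAM
```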
### 2. Database Index Strategy
```sql
-- Core index design
CREATE INDEX idx_stock_data_symbol_date ON stock_data(symbol, date);
CREATE INDEX idx_stock_data_date ON stock_data(date);
CREATE INDEX idx_cache_metadata_key ON cache_metadata(cache_key);
```
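Whether the typical symbol-plus-date lookup actually uses the composite index can be verified with `EXPLAIN QUERY PLAN` (the table schema below is abbreviated for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_data (symbol TEXT, date TEXT, close REAL)")
conn.execute("CREATE INDEX idx_stock_data_symbol_date ON stock_data(symbol, date)")

# The last column of each plan row is a human-readable description,
# e.g. "SEARCH stock_data USING INDEX idx_stock_data_symbol_date ...".
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT close FROM stock_data "
    "WHERE symbol = '600519' AND date >= '2025-01-01'"
).fetchall()
detail = plan[0][-1]
```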
### 3. Batch Operation Optimization
```python
import sqlite3

# Statement and path shown for illustration; the real schema differs.
INSERT_SQL = "INSERT INTO stock_data (symbol, date, close) VALUES (?, ?, ?)"

def batch_insert_stock_data(data_list, db_path="stock_cache.db"):
    """Optimized batch insert: one transaction, one executemany round trip."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("BEGIN TRANSACTION")
        try:
            conn.executemany(INSERT_SQL, data_list)
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
```
## 📊 Performance Test Results

### Test Environment
- Hardware: Intel i7-10700K, 16GB RAM, SSD
- Network: 100Mbps broadband
- Python: 3.9.7
### Detailed Performance Data
| Operation Type | AKShare Direct | QuantDB Cache Hit | Improvement | Cache Size |
|---|---|---|---|---|
| Single stock, 30 days | 1,247 ms | 18 ms | 98.6% | ~50 KB |
| Batch of 10 stocks | 12,340 ms | 156 ms | 98.7% | ~500 KB |
| Repeated request | 1,180 ms | 15 ms | 98.7% | Cache hit |
| Incremental update | 1,090 ms | 45 ms | 95.9% | Incremental data |
### Memory Usage Optimization

Typical memory usage:

- Base memory: ~20 MB
- Cache of 1,000 stocks: ~50 MB
- Memory cache: ~10 MB
## 🛠️ Implementation Details

### Key Code Example
```python
class StockDataService:
    def get_stock_data(self, symbol: str, **kwargs) -> pd.DataFrame:
        # 1. Parameter normalization
        params = self._normalize_params(symbol, **kwargs)
        cache_key = self._generate_cache_key(params)

        # 2. Check cache: a fresh, complete hit returns immediately
        cached_data = self.cache_manager.get(cache_key)
        if cached_data and self._is_cache_valid(cached_data):
            return cached_data['data']

        # 3. Detect missing data segments
        missing_ranges = self._get_missing_ranges(params)
        if cached_data and not missing_ranges:
            return cached_data['data']   # marked stale, but nothing to fetch

        # 4. Fetch only the missing data
        new_data = self._fetch_from_akshare(missing_ranges)

        # 5. Merge with cached data and write back
        merged_data = self._merge_data(cached_data, new_data)
        self.cache_manager.set(cache_key, merged_data)
        return merged_data
```
## 🔮 Future Optimization Directions
- Distributed Caching: Support Redis clusters for multi-instance cache sharing
- Predictive Caching: Pre-cache likely needed data based on user behavior patterns
- Compression Optimization: Implement data compression to reduce storage space
- Async Processing: Support asynchronous data retrieval for improved concurrency
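The async direction could take roughly this shape: concurrent fetches via `asyncio.gather` instead of a blocking loop over symbols (a sketch only, not QuantDB code; `asyncio.sleep` stands in for network latency):

```python
import asyncio

async def fetch_symbol(symbol: str) -> dict:
    """Stand-in for an async data fetch (normally a network request)."""
    await asyncio.sleep(0.01)          # simulated network latency
    return {"symbol": symbol, "rows": 30}

async def fetch_many(symbols):
    # All fetches run concurrently; total wall time approaches the
    # slowest single request rather than the sum of all requests.
    return await asyncio.gather(*(fetch_symbol(s) for s in symbols))

results = asyncio.run(fetch_many(["600519", "000001", "300750"]))
```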
## 💡 Summary
QuantDB achieves 98%+ performance improvement while maintaining 100% AKShare compatibility through carefully designed multi-layer caching architecture. Core technologies include:
- Smart Caching Strategy: Trading calendar-based cache invalidation mechanism
- Incremental Data Retrieval: Fetch only missing data segments
- Multi-Level Caching: Efficient memory + SQLite caching system
- Performance Optimization: Database optimization, batch operations, indexing strategies
These combined technologies make quantitative trading and financial data analysis much more efficient, providing developers with production-grade data processing solutions.
Related Articles:

- From AKShare to QuantDB: Complete Migration Guide
- Performance Comparison Study

Technical Support:

- GitHub Issues
- Project Documentation