Migration Guide
This guide helps you migrate existing NumPy and Pandas code to HPCSeries Core for better performance.
From NumPy
Basic Reductions
NumPy → HPCSeries (2-5x faster, drop-in replacement):
# NumPy
np.sum(x)
np.mean(x)
np.var(x)
np.std(x)
np.min(x)
np.max(x)
# HPCSeries (same signature)
hpcs.sum(x)
hpcs.mean(x)
hpcs.var(x)
hpcs.std(x)
hpcs.min(x)
hpcs.max(x)
Quantiles & Median
# NumPy
np.median(x)
np.percentile(x, 25)
# HPCSeries
hpcs.median(x)
hpcs.quantile(x, 0.25) # Note: 0-1 scale, not 0-100
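The scale change is easy to miss when porting many calls. The sketch below uses only NumPy, which offers both scales, to show the conversion; percentile_to_quantile is a hypothetical helper name, not part of HPCSeries. Whether hpcs.quantile uses the same interpolation as NumPy's default linear method is not stated here, so compare results on your own data.

```python
import numpy as np

def percentile_to_quantile(p):
    """Convert a 0-100 percentile rank to the 0-1 quantile scale."""
    if not 0 <= p <= 100:
        raise ValueError("percentile must be between 0 and 100")
    return p / 100.0

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
q = percentile_to_quantile(25)  # 0.25

# NumPy ships both scales, so the conversion can be checked without HPCSeries.
same = bool(np.isclose(np.percentile(x, 25), np.quantile(x, q)))

# After migrating, the equivalent call would be hpcs.quantile(x, q).
```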
Z-Score Normalization
# NumPy (multi-pass)
z_scores = (x - x.mean()) / x.std()
# HPCSeries (single-pass, faster)
z_scores = hpcs.zscore(x)
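This guide does not say whether hpcs.zscore divides by the population or the sample standard deviation. The NumPy expression above uses the population form (ddof=0), while Pandas' Series.std() defaults to ddof=1. A small NumPy reference such as the following sketch (zscore_reference is a hypothetical name) can be used to check which convention the migrated code matches.

```python
import numpy as np

def zscore_reference(x, ddof=0):
    """Plain NumPy z-score; ddof=0 is NumPy's default for std()."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / x.std(ddof=ddof)

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

z_pop = zscore_reference(x, ddof=0)     # same as (x - x.mean()) / x.std()
z_sample = zscore_reference(x, ddof=1)  # convention implied by pandas' std()

# The two differ only by a constant factor sqrt((n - 1) / n).
factor = np.sqrt((x.size - 1) / x.size)
```

Comparing hpcs.zscore(x) against both arrays with np.allclose shows which convention applies.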
Axis Operations
# NumPy
col_means = np.mean(data, axis=0)
col_medians = np.median(data, axis=0)
# HPCSeries (faster)
col_means = hpcs.axis_mean(data)
col_medians = hpcs.axis_median(data)
Masked Arrays
# NumPy masked arrays
import numpy.ma as ma
masked = ma.masked_invalid(x)
mean_np = ma.mean(masked)
# HPCSeries
mask = np.isfinite(x)  # True for valid entries; masked_invalid masks inf as well as NaN
mask_int = mask.astype(np.int32)
mean_hpcs = hpcs.mean_masked(x, mask_int)
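Before switching to the masked HPCSeries call, it can help to compute the expected value with NumPy alone. The sketch below assumes the mask convention shown above (1 = keep, 0 = skip). Note that ma.masked_invalid masks infinities as well as NaN, so a mask built from np.isnan alone can give a different answer when the data contains inf.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1.0, np.nan, 3.0, np.inf, 5.0])

valid = np.isfinite(x)             # True where the value is usable
mask_int = valid.astype(np.int32)  # 1 = keep, 0 = skip (assumed convention)

# Two NumPy-only references for the masked mean.
expected = float(ma.mean(ma.masked_invalid(x)))
plain = float(x[valid].mean())

# After migrating: mean_hpcs = hpcs.mean_masked(x, mask_int)
```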
From Pandas
Rolling Operations
Pandas → HPCSeries (50-100x faster):
# Pandas (slow)
df['value'].rolling(window=50).mean()
df['value'].rolling(window=50).std()
df['value'].rolling(window=100).median()
# HPCSeries (50-100x faster)
hpcs.rolling_mean(data, window=50)
hpcs.rolling_std(data, window=50)
hpcs.rolling_median(data, window=100)
Important differences:
- Return type: Pandas returns Series (with index), HPCSeries returns ndarray
- Window alignment: Both use right-aligned windows by default
- min_periods: HPCSeries always uses full window (returns NaN for first window-1 elements)
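Because the result comes back as a bare ndarray, the index has to be re-attached by hand if later code relies on it. The sketch below uses Pandas' own rolling mean as a stand-in for hpcs.rolling_mean (so it runs without HPCSeries) and shows both the index round trip and what min_periods would have changed.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)
window = 3

# Stand-in for hpcs.rolling_mean(s.values, window=window): a plain ndarray
# with NaN in the first window-1 positions.
result = s.rolling(window=window).mean().to_numpy()

# Re-attach the original index so downstream Pandas code keeps working.
restored = pd.Series(result, index=s.index, name="rolling_mean")

# For comparison: Pandas with min_periods=1 fills the warm-up positions,
# which the full-window behaviour described above does not.
partial = s.rolling(window=window, min_periods=1).mean()
```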
GroupBy Operations
# Pandas groupby
df.groupby('group')['value'].mean()
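# HPCSeries: use axis operations on a reshaped array.
# Assumes rows are sorted by group and every group has exactly group_size rows.
# axis_mean reduces along axis 0, so each group must occupy one column.
data_2d = np.ascontiguousarray(df['value'].values.reshape(n_groups, group_size).T)
group_means = hpcs.axis_mean(data_2d)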
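The reshape approach only applies when every group has the same number of rows and the rows are stored group by group. For uneven groups or string keys, one fallback that stays in NumPy (it is not an HPCSeries API) is to map keys to integer codes and accumulate with np.bincount, as in this sketch.

```python
import numpy as np

keys = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 10.0, 3.0, 7.0, 20.0, 5.0])

# Map each key to an integer code 0..n_groups-1; labels holds the sorted keys.
labels, codes = np.unique(keys, return_inverse=True)

sums = np.bincount(codes, weights=values)  # per-group sum
counts = np.bincount(codes)                # per-group row count
group_means = sums / counts                # aligned with labels
```

The result can be checked against df.groupby('group')['value'].mean() before the Pandas call is removed.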
Hybrid Pandas + HPCSeries
Best practice: Use Pandas for data manipulation, HPCSeries for heavy computation:
# Use Pandas for I/O and data prep
df = pd.read_csv('data.csv')
df = df.dropna()
# Use HPCSeries for computation
df['rolling_mean'] = hpcs.rolling_mean(df['price'].values, window=50)
df['anomaly'] = hpcs.detect_anomalies_robust(df['price'].values)
# Use Pandas for output
df.to_csv('results.csv')
When to Use Each
- Use Pandas when you need:
  - Time-aware operations (resample, date offsets)
  - GroupBy with string keys
  - Join/merge operations
  - DataFrame manipulation (pivot, melt)
  - Label alignment
- Use HPCSeries when you need:
  - Maximum performance for numerical operations
  - Large-scale rolling operations (> 10K elements)
  - Real-time/streaming computation
  - Low-latency requirements (< 1ms)
  - Memory-efficient computation
API Compatibility Matrix
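| Operation | NumPy/Pandas | HPCSeries | Notes |
|---|---|---|---|
| sum | np.sum(x) | hpcs.sum(x) | ✓ Drop-in |
| mean | np.mean(x) | hpcs.mean(x) | ✓ Drop-in |
| var | np.var(x) | hpcs.var(x) | ✓ Drop-in |
| std | np.std(x) | hpcs.std(x) | ✓ Drop-in |
| min | np.min(x) | hpcs.min(x) | ✓ Drop-in |
| max | np.max(x) | hpcs.max(x) | ✓ Drop-in |
| median | np.median(x) | hpcs.median(x) | ✓ Drop-in |
| percentile | np.percentile(x, 25) | hpcs.quantile(x, 0.25) | ⚠ 0-1 scale |
| rolling mean | df['value'].rolling(window=50).mean() | hpcs.rolling_mean(data, window=50) | ⚠ Returns array |
| rolling std | df['value'].rolling(window=50).std() | hpcs.rolling_std(data, window=50) | ⚠ Returns array |
| rolling median | df['value'].rolling(window=100).median() | hpcs.rolling_median(data, window=100) | ⚠ Returns array |
| axis mean | np.mean(data, axis=0) | hpcs.axis_mean(data) | ✓ axis=0 default |
| | | | ⚠ Different API |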
Common Migration Patterns
Pattern 1: Time Series Analysis
import pandas as pd
import hpcs
# Load time series
df = pd.read_csv('stock_prices.csv', parse_dates=['date'])
prices = df['close'].values
# Compute features with HPCSeries (fast)
df['sma_20'] = hpcs.rolling_mean(prices, window=20)
df['sma_50'] = hpcs.rolling_mean(prices, window=50)
df['volatility'] = hpcs.rolling_std(prices, window=20)
df['anomaly'] = hpcs.detect_anomalies_robust(prices, threshold=3.0)
Pattern 2: Feature Engineering
import pandas as pd
import hpcs
df = pd.read_csv('data.csv')
values = df['value'].values
# Multiple rolling features (fast)
df['rm_10'] = hpcs.rolling_mean(values, window=10)
df['rm_50'] = hpcs.rolling_mean(values, window=50)
df['rs_10'] = hpcs.rolling_std(values, window=10)
df['rmed_20'] = hpcs.rolling_median(values, window=20)
df['zscore'] = hpcs.zscore(values)
df['robust_zscore'] = hpcs.robust_zscore(values)
Pattern 3: Multi-Sensor Processing
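import numpy as np
import pandas as pd
import hpcs
# Load multi-sensor data
df = pd.read_csv('sensors.csv')
sensor_cols = ['sensor1', 'sensor2', 'sensor3', 'sensor4']
# Shape: (n_samples, 4). DataFrame.values is often column-major, so force float64, C-contiguous
data_2d = np.ascontiguousarray(df[sensor_cols].values, dtype=np.float64)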
# Per-sensor statistics (vectorized)
sensor_means = hpcs.axis_mean(data_2d)
sensor_medians = hpcs.axis_median(data_2d)
sensor_mad = hpcs.axis_mad(data_2d)
# Detect anomalies per sensor
anomalies_2d = hpcs.anomaly_robust_axis(data_2d, threshold=3.0)
Migration Checklist
When migrating NumPy/Pandas code:
- Check array dtype
  - HPCSeries uses float64
  - Convert if needed: x.astype(np.float64)
- Ensure C-contiguous layout
  - Check: x.flags['C_CONTIGUOUS']
  - Fix: np.ascontiguousarray(x)
- Extract values from Pandas
  - Use .values to get NumPy array from Series/DataFrame
  - Example: df['column'].values
- Handle return types
  - HPCSeries returns NumPy arrays, not Pandas Series
  - Assign back to DataFrame if needed: df['new_col'] = result
- Adjust quantile scale
  - NumPy: 0-100 scale (np.percentile(x, 25))
  - HPCSeries: 0-1 scale (hpcs.quantile(x, 0.25))
- Handle NaN differences
  - Pandas min_periods not supported
  - Use masked operations for missing data
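The first three checklist items (dtype, memory layout, extracting values) can be folded into one small helper. The following is a minimal sketch using only NumPy; to_hpcs_input is a hypothetical name, not an HPCSeries function.

```python
import numpy as np

def to_hpcs_input(values):
    """Return a float64, C-contiguous ndarray from array-like input.

    Accepts ndarrays, lists, or Pandas Series/DataFrames (via np.asarray).
    A copy is made only when the dtype or layout actually needs to change.
    """
    arr = np.asarray(values)
    return np.ascontiguousarray(arr, dtype=np.float64)

x = np.arange(12, dtype=np.int32).reshape(3, 4)
col = x[:, 1]               # strided view: wrong dtype, not C-contiguous
ready = to_hpcs_input(col)  # safe to pass on, e.g. hpcs.mean(ready)
```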
Key Differences Summary
Return Types
- Pandas: Returns Series or DataFrame (preserves index/labels)
- HPCSeries: Returns ndarray (raw arrays for performance)
NaN Handling
- Pandas: Configurable with min_periods parameter
- HPCSeries: Always uses full window; first (window-1) elements are NaN
- For missing data: Use *_masked() functions
Quantile Scale
- NumPy: percentile(x, 25) uses 0-100 scale
- HPCSeries: quantile(x, 0.25) uses 0-1 scale
Window Alignment
- Pandas: Supports center=True/False
- HPCSeries: Right-aligned by default (same as Pandas default)
- For centered windows: Manually shift results
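One way to do the manual shift is sketched below. rolling_mean_right is a NumPy stand-in for a right-aligned, full-window rolling mean (standing in for hpcs.rolling_mean so the example runs without HPCSeries), and center_from_right is a hypothetical helper that moves each value back by (window - 1) // 2 positions. For odd windows this is expected to line up with Pandas' center=True; even windows are worth checking against Pandas directly.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_mean_right(x, window):
    """NumPy stand-in for a right-aligned, full-window rolling mean."""
    out = np.full(x.shape[0], np.nan)
    out[window - 1:] = sliding_window_view(x, window).mean(axis=1)
    return out

def center_from_right(right, window):
    """Shift a right-aligned result so each value sits at its window centre."""
    offset = (window - 1) // 2
    centered = np.full(right.shape[0], np.nan)
    if offset == 0:
        centered[:] = right
    else:
        # The value at position i + offset summarises the window centred on i.
        centered[:-offset] = right[offset:]
    return centered

x = np.arange(10, dtype=np.float64)
right = rolling_mean_right(x, window=5)
centered = center_from_right(right, window=5)
```

The shifted array has NaN at both ends: the warm-up positions at the start and the positions at the end whose window would run past the data.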
Performance Expectations
Typical speedups when migrating:
| Operation | Array Size | Typical Speedup |
|---|---|---|
| Basic reductions (sum, mean, std) | 1M elements | 2-5x |
| Rolling mean/std | 100K elements | 50-100x |
| Rolling median | 100K elements | 100-200x |
| Axis operations | 1000×100 | 3-8x |
| Robust statistics (median, MAD) | 1M elements | 1.5-3x |
Best speedups: Rolling operations on large datasets (> 10K elements)
Migration Strategy
Incremental Approach
You don’t need to replace everything at once:
- Profile: Identify performance bottlenecks using cProfile or timing
- Replace hot paths: Migrate the slowest operations first
- Validate: Compare results between Pandas and HPCSeries (np.allclose())
- Benchmark: Measure speedup on realistic data sizes
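The validate and benchmark steps can share a small harness. The sketch below uses only NumPy and the standard library; validate and best_time are hypothetical helper names, and the candidate function is a placeholder to be replaced with the HPCSeries call being tested.

```python
import timeit
import numpy as np

def validate(reference, candidate, rtol=1e-9, atol=1e-12):
    """True when both results agree, treating NaN positions as equal."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    return reference.shape == candidate.shape and bool(
        np.allclose(reference, candidate, rtol=rtol, atol=atol, equal_nan=True)
    )

def best_time(func, repeat=5, number=10):
    """Best per-call wall time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# Placeholders: swap `candidate` for the migrated call, e.g. lambda: hpcs.mean(x).
reference = lambda: np.mean(x)
candidate = lambda: x.sum() / x.size

ok = validate(reference(), candidate())
speedup = best_time(reference) / best_time(candidate)
```

Passing equal_nan=True matters for rolling results, where both sides are expected to contain NaN in the warm-up positions.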
Gradual Migration
Combine Pandas and HPCSeries strengths:
import pandas as pd
import hpcs
# Keep Pandas for what it does best
# ('timestamp' is an assumed column name; resample() needs a DatetimeIndex)
df = pd.read_csv('data.csv', parse_dates=['timestamp'], index_col='timestamp')
df = df.dropna()
# Use HPCSeries for performance-critical operations
df['rolling_mean'] = hpcs.rolling_mean(df['value'].values, window=50)
# Keep Pandas for output and time-based operations
df_hourly = df.resample('1h').last()
df_hourly.to_csv('output.csv')
See Also
- Examples & Tutorials: Notebook 08 has detailed migration examples
- API Reference: Complete function signatures
- Performance Guide: Optimization tips