FOXAI Litepaper

A market forecasting app for major cryptocurrencies.

📋 1. Executive Summary

FOXAI is a self-developed financial AI foundation model focused on trend prediction and signal generation for cryptocurrency and traditional financial markets. Unlike existing time series models, FOXAI is fully self-developed from the data layer through the model layer to the application layer, creating a truly financial-scenario-oriented AI infrastructure.

Core Features

  • Multi-Asset Prediction: Supports trend modeling for mainstream tokens such as BTC, ETH, SOL, and other assets.

  • BTC Top Indicator: Self-developed top analysis algorithm to assist in identifying potential market tops and enhance risk control capabilities.

  • CEX Data Support: Integrated with mainstream centralized exchanges (CEX), covering multiple trading pairs and real-time market data.

  • Self-Developed Technology Stack: From the Tokenizer to the model architecture, every component is independently developed and deeply optimized for financial markets.

  • Multi-Scale Adaptation: From lightweight research versions to enterprise-grade prediction versions, meeting different business and computational requirements.

Performance Highlights

  • In BTC/USDT 24h trend prediction, FOXAI's directional accuracy significantly outperforms a traditional LSTM baseline.

  • In cryptocurrency market backtesting, signal generators fine-tuned from FOXAI achieve higher Sharpe ratios.

🎯 2. Background and Pain Points

Financial market data differs from general time series and possesses the following characteristics:

  • High Noise and Non-Stationarity: Price movements are easily affected by unexpected events and macroeconomic factors, making direct modeling difficult.

  • Multi-Scale Complexity: Complex dependencies exist between minute, hourly, and daily data.

  • Unstructured Semantics: Candlestick charts carry market behavior and sentiment, but are difficult to express directly in digital form.

Existing time series models (such as LSTM, Transformer) or general large models often face the following challenges when applied to financial markets:

  • Insufficient generalization capability, prone to overfitting historical data

  • Predictions fail during high volatility periods

  • Lack of a unified modeling language

Therefore, the financial domain requires a "Financial Language Model" that transforms complex market behavior into structured, predictable "language."

🦊 3. FOXAI Introduction

FOXAI is a self-developed candlestick language foundation model, focusing on modeling and prediction for cryptocurrencies. Through an innovative candlestick Tokenizer and a deeply optimized decoder-only Transformer architecture, FOXAI transforms complex market volatility into interpretable, predictable financial signals.

Vision

  • Unify time series modeling approaches across different markets (stocks, cryptocurrencies, futures, etc.)

  • Provide high-performance prediction and signal generation capabilities

  • Promote the popularization and standardization of "AI + Finance"

Core Highlights

  1. Financial Tokenizer: An innovative quantization method that converts continuous market data into discrete tokens, endowing them with financial semantics.

  2. Multi-Scale Models: From millions to hundreds of millions of parameters, covering research to production scenarios.

  3. Easy Fine-Tuning: Supports rapid fine-tuning based on local data, adapting to different markets.

⚙️ 4. Technical Architecture and Methodology

4.1. Core Design: Hierarchical Tokens

Rather than mapping each candlestick data point to a single flat token, FOXAI quantizes it into a structured token with two components:

  • Coarse-grained Subtoken ($b^c$): This subtoken captures the main, macro structure of the candlestick data.

  • Fine-grained Subtoken ($b^f$): This subtoken encodes the residual information or details used to refine the coarse estimate.

4.2. Implementation: Specialized Tokenizer

This is achieved through a Transformer-based autoencoder and a technique called "Binary Spherical Quantization" (BSQ).

  • Why Hierarchical? Directly quantizing 6 continuous variables (OHLCVA) into a single high-precision token would cause the vocabulary size to explode exponentially (e.g., $2^{20}$), which is computationally intractable for subsequent models. By decomposing a $k$-bit encoding into two $k/2$-bit subtokens ($b_t = [b_t^c, b_t^f]$), the vocabulary size is dramatically reduced (from $2^k$ to $2 \times 2^{k/2}$), making computation far more feasible (see the sketch below).
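To make the vocabulary argument concrete, the following is a minimal, illustrative sketch (not FOXAI's actual implementation) of splitting a $k$-bit binary quantization code into two subtoken IDs; the value $k = 20$ and the even bit split are assumptions chosen only to mirror the numbers above.

```python
import numpy as np

def split_code(bits: np.ndarray) -> tuple[int, int]:
    """Split a k-bit binary code into (coarse, fine) subtoken IDs.

    The first k/2 bits form the coarse subtoken b^c, the last k/2 bits
    form the fine subtoken b^f (k is assumed even).
    """
    half = len(bits) // 2
    to_id = lambda b: int("".join(str(x) for x in b), 2)  # binary digits -> integer ID
    return to_id(bits[:half]), to_id(bits[half:])

k = 20
print(f"single codebook:   {2**k:,} entries")             # 1,048,576
print(f"two sub-codebooks: {2 * 2**(k // 2):,} entries")   # 2,048
print(split_code(np.random.randint(0, 2, size=k)))         # e.g. (613, 842)
```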

The tokenizer's goal is to achieve a mapping:

Tokenizer: (open,high,low,close,volume)→[token1,token2,...]

Core steps:

  1. Numerical Discretization (Quantization) Map continuous price changes or ratios to a finite number of intervals (bins), where each interval corresponds to a token. For example, divide price changes into:

    [-∞, -5%), [-5%, -3%), [-3%, -1%), [-1%, 0%), [0%, +1%), [+1%, +3%), [+3%, +5%), [+5%, +∞)

    Each interval then has a token ID.

  2. Relative Change Encoding (Relative Encoding) Instead of using absolute prices, use relative changes (e.g., log-return or percentage change). Since financial markets have vastly different price scales across assets (e.g., BTC in tens of thousands of dollars, A-shares in tens of dollars), relative changes ensure more stable data distributions.

  3. Multi-Dimensional Concatenation (Multi-feature Tokenization) Each candlestick not only has price changes but may also include features such as volume, amplitude, and body ratio. Concatenate tokens from these dimensions into composite token sequences:

    [price_token, volume_token, volatility_token, ...]

    Or input them through multiple heads into the model's embedding layer.

  4. Temporal Flattening Expand a period of candlesticks (e.g., the past 100 candles) into a token sequence:

    [t1_open_token, t1_close_token, ..., t2_open_token, ...]

    The model inputs these tokens sequentially, learning temporal dependencies (a minimal sketch of steps 1-4 follows below).
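The following is a minimal sketch of steps 1-4 under simplifying assumptions: only log-returns and relative volume changes are tokenized, the illustrative bin edges from step 1 are reused, and per-feature tokens are interleaved into one flat sequence. The bin edges, feature set, and vocabulary offsets are assumptions for illustration, not FOXAI's production tokenizer.

```python
import numpy as np

# Step 1: illustrative return bins from the example above (as fractions, not %)
RETURN_BINS = [-0.05, -0.03, -0.01, 0.0, 0.01, 0.03, 0.05]  # 8 intervals -> IDs 0..7
VOLUME_BINS = [-0.50, -0.20, 0.0, 0.20, 0.50]               # 6 intervals -> IDs 0..5

def to_token(x: float, bins: list) -> int:
    """Map a continuous value to the ID of the interval it falls into."""
    return int(np.searchsorted(bins, x, side="right"))

def tokenize_window(close: np.ndarray, volume: np.ndarray) -> list:
    """Steps 2-4: relative encoding, per-feature tokens, temporal flattening."""
    tokens = []
    for t in range(1, len(close)):
        ret = np.log(close[t] / close[t - 1])        # step 2: log-return
        dvol = volume[t] / volume[t - 1] - 1.0       # relative volume change
        price_tok = to_token(ret, RETURN_BINS)
        # step 3: offset the volume token into a shared vocabulary
        vol_tok = (len(RETURN_BINS) + 1) + to_token(dvol, VOLUME_BINS)
        tokens += [price_tok, vol_tok]               # step 4: flatten in time order
    return tokens

close = np.array([100.0, 101.2, 99.8, 102.5])
volume = np.array([500.0, 650.0, 400.0, 700.0])
print(tokenize_window(close, volume))  # -> [5, 12, 2, 9, 5, 13]
```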

4.3. Key Mechanism: Hierarchical Reconstruction Loss

To ensure that these two subtokens truly learn information at different levels, the tokenizer uses a composite objective function during training:

  • $\mathcal{L}_{coarse}$ (Coarse-grained Loss): This loss uses only the coarse-grained subtoken ($b^c$) to attempt to reconstruct the original candlestick data. This forces $b^c$ to learn how to capture the core structure of the data in a low-fidelity manner.

  • $\mathcal{L}_{fine}$ (Fine-grained Loss): This loss uses the complete token ($b^c$ and $b^f$) to perform high-fidelity reconstruction. Since $b^c$ already provides a rough framework, $b^f$ is forced to learn the "residual information" necessary for refined reconstruction.

In this way, the "coarse-fine" hierarchical structure is intentionally embedded into the discrete tokens themselves during the tokenization stage.
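As a rough illustration of this composite objective, here is a minimal PyTorch-style sketch. It assumes a shared decoder that reconstructs a candlestick from subtoken embeddings, mean-squared reconstruction error, additive combination of the two embeddings, and equal weighting of the two terms; all of these are assumptions, not FOXAI's actual training code.

```python
import torch.nn.functional as F

def hierarchical_recon_loss(decoder, x, emb_coarse, emb_fine):
    """Composite tokenizer objective: L = L_coarse + L_fine.

    x          : original candlestick features, shape (batch, seq, 6)  # OHLCVA
    emb_coarse : embedding of the coarse subtoken b^c
    emb_fine   : embedding of the fine subtoken b^f
    """
    # Coarse loss: reconstruct from b^c alone -> forces b^c to carry the macro structure.
    loss_coarse = F.mse_loss(decoder(emb_coarse), x)

    # Fine loss: reconstruct from [b^c, b^f] -> forces b^f to carry the residual detail.
    loss_fine = F.mse_loss(decoder(emb_coarse + emb_fine), x)

    return loss_coarse + loss_fine
```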


4.4. Hierarchical Autoregressive Pre-training

Financial time series are naturally "sequential data" with the following characteristics:

  • Future depends on the past (with certain correlations)

  • Each step's output can serve as the next step's input

  • The model can continuously predict the state at the next moment

Autoregressive models satisfy these properties naturally, and Transformers (especially decoder-only architectures like GPT) are essentially autoregressive neural networks. Therefore, after creating "financial words" (hierarchical tokens) in the first stage, the second stage uses a large decoder-only Transformer (similar to GPT) to learn how these words combine over time into "sentences" and "paragraphs": the grammar of market dynamics.

1. Core Design: Coarse-to-Fine Prediction

The autoregressive model leverages the hierarchical structure of tokens. Instead of predicting the next complete token at once, it predicts the two parts of the next token sequentially in two steps.

2. Implementation: Chain Probability Decomposition

The model's goal is to predict $p(b_t \mid b_{<t})$ (i.e., given the history $b_{<t}$, predict the next token $b_t$). Utilizing its hierarchical tokens, this probability is decomposed as:

$$p(b_t \mid b_{<t}) = p(b_t^c \mid b_{<t}) \cdot p(b_t^f \mid b_{<t}, b_t^c)$$

This means the prediction process is as follows:

  1. Predict Coarse-grained Token: The model first uses the historical information ($b_{<t}$) to predict the coarse-grained subtoken for the next time step, $p(b_t^c \mid b_{<t})$. This is equivalent to deciding "what the next word is roughly about" before writing a sentence.

  2. Predict Fine-grained Token: Then, the model uses this predicted coarse-grained token ($\hat{b}_t^c$) as a new condition, combined with the historical information, to predict the fine-grained subtoken, $p(b_t^f \mid b_{<t}, \hat{b}_t^c)$. This is equivalent to filling in specific word details after determining the general meaning.

3. Key Mechanism: Eliminating Teacher-Forcing

When training the model to predict fine-grained tokens, a crucial detail is that the condition $b_t^c$ it relies on is not taken from the ground-truth labels in the training data, but from the model's own predicted and sampled $\hat{b}_t^c$ from the previous step.

  • Why do this? To mitigate exposure bias. At inference time the model has no ground-truth answers available, so by using its own (potentially imperfect) predictions as inputs for the next step during training, it becomes more robust to its own errors and better aligns the training distribution with the inference distribution.
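A minimal PyTorch-style sketch of this coarse-to-fine decoding is given below; the module names (`backbone`, `coarse_head`, `fine_head`, `coarse_emb`), feeding the sampled coarse token back through an additive embedding, and multinomial sampling are illustrative assumptions rather than FOXAI's actual architecture.

```python
import torch

def predict_next_token(backbone, coarse_head, fine_head, coarse_emb, history_ids):
    """Two-step prediction of the next hierarchical token (b_t^c, b_t^f).

    history_ids: (batch, seq) IDs of the past tokens b_{<t}
    """
    h = backbone(history_ids)[:, -1]                       # hidden summary of b_{<t}

    # Step 1: p(b_t^c | b_{<t}) -> sample the coarse subtoken
    b_c = torch.multinomial(coarse_head(h).softmax(-1), 1).squeeze(-1)

    # Step 2: p(b_t^f | b_{<t}, b_t^c), conditioned on the *sampled* coarse token.
    # Using the sampled (not ground-truth) b_t^c during training is what
    # mitigates exposure bias.
    b_f = torch.multinomial(fine_head(h + coarse_emb(b_c)).softmax(-1), 1).squeeze(-1)
    return b_c, b_f
```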

📊 5. Model Versions and Performance

| Version | Parameter Scale | Context Length | Application Scenarios |
| --- | --- | --- | --- |
| Mini | ~4M | 2048 | Education / Quick Experiments |
| Small | ~25M | 512 | Lightweight Research |
| Base | ~100M | 512 | Mainstream Prediction Tasks |
| Large | ~500M | 512 | High-Performance, Enterprise-Grade |

🔧 6. Fine-Tuning and Application Scenarios

Fine-Tuning Process

  1. Prepare data (historical candlestick sequences)

  2. Preprocess the data with the self-developed tokenizer

  3. Select appropriate model scale (Small/Base/Large)

  4. Launch fine-tuning training

  5. Deploy to trading/research systems (see the sketch below)
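The sketch below illustrates what such a workflow could look like end to end; the `foxai` package, its class and method names, and all parameters are hypothetical placeholders, since the litepaper does not define a public API.

```python
# Hypothetical workflow; the `foxai` package and its API are illustrative only.
from foxai import CandlestickTokenizer, FoxAIModel, FineTuner

# 1. Prepare data: historical candlestick sequences (user-supplied loader)
candles = load_ohlcv("BTCUSDT_1h.csv")

# 2. Preprocess with the self-developed tokenizer
token_seqs = CandlestickTokenizer().encode(candles)

# 3. Select an appropriate model scale (Small / Base / Large)
model = FoxAIModel.from_pretrained("foxai-base")    # ~100M parameters

# 4. Launch fine-tuning on local data
FineTuner(model).train(token_seqs, epochs=3, learning_rate=1e-4)

# 5. Deploy the fine-tuned model to the trading / research system
model.save("foxai-base-btcusdt")
```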

Application Scenarios

  • Short-term Price Prediction: Capture future trends to assist trading

  • Signal Generation: Provide buy/sell point references

  • Risk Control: Detect extreme market conditions and provide early warnings

  • Portfolio Prediction: Support multi-asset, multi-market prediction

⚠️ 7. Risks and Limitations

  • Market Unpredictability: Black swan events cannot be modeled

  • Data Bias: Significant quality differences in data across different exchanges/markets

  • Model Overfitting: Over-reliance on historical patterns may fail

  • Real Trading Challenges: Trading costs, slippage, and execution delays all affect performance

🏆 8. Competitive Landscape and Differentiation

Compared to general time series models:

  • FOXAI uses a financial-specific Tokenizer with stronger semantics

  • Decoder-only architecture optimized specifically for prediction tasks

Compared to other financial AI projects:

  • Provides multiple scale versions, adapting to different computational resources and needs

🗺️ 9. Development Roadmap

Near-term (6 months)

  • Optimize model performance and improve prediction accuracy

  • Add more exchange and asset category data

Medium-term (1-2 years)

  • Support cross-market, multi-frequency mixed inputs

  • Add interpretability modules (attention visualization, factor interpretation)

Long-term (3-5 years)

  • Become the standard infrastructure for financial AI

  • Support automated research, intelligent investment research, and intelligent trading systems

💼 10. User Value and Business Scenarios

10.1 Professional Traders

User Characteristics

  • Typically employed at brokerages, funds, or proprietary trading desks;

  • Highly focused on trading rhythm, market structure, and risk control;

  • Have their own market intuition and strategies, but decisions still rely heavily on manual analysis and gut feel.

Key Pain Points

  • Severe market volatility makes short-term prediction difficult;

  • Manual signal analysis is lagging and easily affected by emotions;

  • Lack of high-dimensional market sentiment and volatility early warnings.

Value

  • Provide "next period trend probability" through price prediction tasks;

  • Provide "severe market volatility warnings" through volatility prediction tasks;

  • Simulate future market conditions through generation tasks to help validate market intuition.


10.2 Quantitative Trading Teams

User Characteristics

  • Have algorithm engineers and quantitative researchers;

  • Develop quantitative systems using Python / Rust / C++ etc.;

  • Rely on model-driven signal generation and backtesting systems.

Key Pain Points

  • High strategy model training costs and poor generalization;

  • Inconsistent data distributions across different markets;

  • Lack of a unified time series foundation model framework;

  • New market deployment requires retraining.

Value

  • Unified time series modeling foundation: One model can be transferred across multiple markets;

  • Multi-task capabilities: Price signals + volatility + anomaly detection;

  • Generative models: Can create simulated markets for backtesting and risk testing;

  • Open architecture: Can be integrated with proprietary strategy engines.

10.3 Retail / Discretionary Traders

User Characteristics

  • Individual investors or small teams;

  • Rely on trading apps, market charts, and indicator analysis;

  • Primarily use technical analysis or subjective decision-making;

  • Limited understanding of AI models but hope to gain auxiliary judgment.

Key Pain Points

  • High market information noise, judgments often emotional;

  • Lack of systematic risk warnings;

  • Unable to establish high-quality strategy systems;

  • Rely on inefficient indicators (MACD, RSI, etc.).

Value

  • Provide "AI-driven trend/risk signals" to let users "see future tendencies";

  • Help understand market structure, such as "ranging or trending markets";

  • Generate simulated market conditions to accelerate learning and the accumulation of trading experience;

  • Integrate into trading terminals through simplified interfaces/APIs.

10.4 Brokerage / API Trading Platforms

User Characteristics

  • Provide trading infrastructure, signal services, and API interfaces;

  • Serve customers (B2B / B2C);

  • Require stable signal sources and data augmentation solutions.

Key Pain Points

  • Existing signal models are highly homogeneous;

  • Cannot quickly adapt to different markets;

  • Lack high-quality generative market data;

  • Need new differentiated AI services.

Value

  • Can serve as an AI signal engine embedded in proprietary platforms;

  • Provide cross-market unified prediction APIs;

  • Can utilize generation modules to provide "simulated market/teaching modes";

  • Provide AI capabilities as a service (AIaaS) to downstream customers (brokerages, quantitative companies, trading academies).

👥 11. Team and Partnerships

  • Team: Composed of blockchain and financial AI researchers and full-stack engineers

  • Partners: Academic institutions, trading platforms, quantitative teams

  • Open Collaboration: Welcome researchers, developers, and investors to participate together
