Every number below comes from trained models evaluated on standard benchmarks. Same evaluation pipeline across all comparisons. No cherry-picking.
A 7B-parameter model at 30x compression, evaluated on standard NLP benchmarks.
| Benchmark | Our 7B (compressed) | Standard 1.5B | Our 7B + SFT |
|---|---|---|---|
| WikiText PPL (lower is better) | 21.41 | 21.04 | 26.24 |
| HellaSwag | 37.4% | 38.0% | 38.8% |
| ARC-Easy | 42.8% | 43.0% | 46.0% |
| ARC-Challenge | 22.4% | 20.4% | 23.6% |
| MMLU | 25.1% | 25.3% | 26.4% |
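Perplexity in the first row is the exponential of the mean per-token negative log-likelihood; a minimal sketch of the metric (the token NLLs below are illustrative, not from our models):

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Illustrative values only: a mean NLL near 3.07 nats gives a PPL near 21.4,
# the scale of the WikiText numbers above.
print(round(perplexity([3.0, 3.1, 3.1]), 2))  # → 21.47
```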
Same architecture at different compression levels. All trained from scratch on identical data.
| Configuration | Compression | Stored Params | ARC-C | MMLU |
|---|---|---|---|---|
| Standard | 1x | 159M | 20.4% | 25.3% |
| Level 3 (r=128) | 8.3x | 19.2M | 20.4% | 25.0% |
| Level 4 (HyperNet) | 24.8x | 6.4M | 20.4% | 22.9% |
| Level 5 (Multi-Scale) | 57.3x | 2.8M | 19.2% | 25.1% |
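The Level 3 row replaces dense weight matrices with rank-128 factors; the parameter accounting behind "Stored Params" can be sketched as follows. The 768x768 layer is hypothetical, and the table's overall ratios also reflect embeddings and other unfactorized parameters:

```python
def low_rank_params(d_out, d_in, r):
    # A dense d_out x d_in matrix stores d_out * d_in values; its rank-r
    # factorization W ≈ U @ V stores only r * (d_out + d_in).
    return r * (d_out + d_in)

def compression(d_out, d_in, r):
    return (d_out * d_in) / low_rank_params(d_out, d_in, r)

# Hypothetical 768x768 layer at r=128:
print(round(compression(768, 768, 128), 1))  # → 3.0
```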
| Model | Compression | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|
| Standard | 1x | 73.2% | ~91% |
| Compressed (5x) | 5.0x | 72.5% | ~91% |
| Compressed (8.7x) | 8.7x | 71.7% | ~91% |
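Top-1 and Top-5 accuracy are computed identically for every row; a minimal sketch of the metric on toy logits:

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of examples whose true label is among the k highest logits."""
    topk = np.argsort(logits, axis=1)[:, -k:]      # indices of k largest per row
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6]])
labels = np.array([2, 0, 2])
print(round(topk_accuracy(logits, labels, 1), 3))  # → 0.667
print(topk_accuracy(logits, labels, 2))            # → 1.0
```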
Linear probe on the InstaDeep Nucleotide Transformer benchmark; all scores are MCC. Same evaluation pipeline for all models.
| Task Category | Our Model (1.2M params) | Standard (19M) | NT-v2-50M (56M) |
|---|---|---|---|
| Promoter detection (3 tasks) | 0.82-0.84 | 0.82-0.84 | 0.83-0.87 |
| Enhancer classification (2 tasks) | 0.48-0.50 | 0.46-0.48 | 0.44-0.49 |
| Splice sites (3 tasks) | 0.31-0.51 | 0.33-0.53 | 0.51-0.61 |
| Histone marks (10 tasks) | 0.18-0.72 | 0.18-0.73 | 0.20-0.73 |
| Average (18 tasks) | 0.476 | 0.480 | 0.528 |
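A linear probe trains only a linear classifier on top of frozen embeddings and scores it with MCC; a minimal numpy-only sketch on synthetic stand-in features, where a least-squares fit stands in for the probe:

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 32))       # stand-in for frozen embeddings
y_train = (X_train[:, 0] > 0).astype(int)  # toy binary task
X_test = rng.normal(size=(100, 32))
y_test = (X_test[:, 0] > 0).astype(int)

# The probe: a single linear map fit by least squares on {-1, +1} targets
# (a logistic regression head works the same way).
w, *_ = np.linalg.lstsq(X_train, 2 * y_train - 1, rcond=None)
y_pred = (X_test @ w > 0).astype(int)
print(round(mcc(y_test, y_pred), 2))
```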
| Metric | Our Model | ESM-2 650M |
|---|---|---|
| Parameters (stored) | 4.2M | 651M |
| Compression vs ESM-2 | 153x | 1x |
| Amino acid diversity | 20/20 | 20/20 |
| Validation perplexity | 15.1 | - |
| Training data | UniRef90 (130M seqs) | UniRef50 |
Real trained weights with simulated crossbar effects (quantization + conductance noise).
| Precision | LLM (PPL) | ViT (Accuracy) | Quality |
|---|---|---|---|
| 16-bit (digital) | 40.2 | 73.7% | Baseline |
| 8-bit crossbar | 40.1 | 73.7% | Identical |
| 6-bit crossbar | 41.1 | 73.4% | Excellent |
| 4-bit crossbar | 47.8 | 73.3% | Good |
| 3-bit crossbar | 159.2 | 71.7% | Degraded |
| 2-bit crossbar | 27,663 | 56.6% | Broken |
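The two crossbar effects named above can be sketched as uniform weight quantization plus Gaussian conductance noise. The bit-widths mirror the table, but the noise magnitude and the per-tensor symmetric quantization scheme here are illustrative assumptions, not our exact simulation:

```python
import numpy as np

def crossbar_forward(w, x, bits=4, noise_frac=0.02, seed=0):
    """Simulate a crossbar matvec: quantize weights to 2^bits - 1 conductance
    levels, then perturb each stored value with Gaussian conductance noise."""
    rng = np.random.default_rng(seed)
    levels = 2 ** bits - 1
    scale = np.abs(w).max() / (levels / 2)   # symmetric per-tensor step size
    w_q = np.round(w / scale) * scale        # uniform quantization
    w_noisy = w_q + rng.normal(0.0, noise_frac * scale, w.shape)
    return w_noisy @ x

rng = np.random.default_rng(1)
w = rng.normal(size=(16, 16))
x = rng.normal(size=16)
exact = w @ x
for bits in (8, 4, 2):
    # Mean absolute error vs. the exact digital matvec grows as bits shrink.
    err = np.abs(crossbar_forward(w, x, bits) - exact).mean()
    print(bits, round(err, 3))
```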
| Metric | Our Architecture | Standard (LoRA) |
|---|---|---|
| Adapter size | ~10 MB | ~100 MB |
| Domain switch time | 2 ms | 50-200 ms |
| 100 specialists in memory | 1.3 GB | 1,400 GB |
| Medical specialist (MedMCQA) | +8.4 pp | - |
| Math specialist (GSM8K) | 8x improvement | - |
| Model | Compression | FID (lower is better) |
|---|---|---|
| Standard | 1x | 13.00 |
| Compressed (5x) | 5.0x | 16.30 |
| Compressed (8.7x) | 8.7x | 16.61 |
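FID fits a Gaussian to the feature statistics of each sample set and takes the Fréchet distance between the two Gaussians; a minimal sketch on toy features (real FID uses Inception-v3 activations, not random vectors):

```python
import numpy as np

def fid(feat_a, feat_b):
    """Frechet distance between Gaussians fit to two feature sets:
    ||mu_a - mu_b||^2 + Tr(Sa + Sb - 2 * (Sa @ Sb)^(1/2))."""
    mu_a, mu_b = feat_a.mean(0), feat_b.mean(0)
    sa = np.cov(feat_a, rowvar=False)
    sb = np.cov(feat_b, rowvar=False)
    # Tr((Sa @ Sb)^(1/2)) equals the sum of sqrt of eigenvalues of Sa @ Sb.
    eig = np.linalg.eigvals(sa @ sb)
    trace_sqrt = np.sqrt(np.clip(eig.real, 0, None)).sum()
    diff = mu_a - mu_b
    return diff @ diff + np.trace(sa) + np.trace(sb) - 2 * trace_sqrt

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
close = rng.normal(loc=0.1, size=(500, 8))  # slight shift -> small FID
far = rng.normal(loc=2.0, size=(500, 8))    # large shift -> large FID
print(round(fid(real, close), 2), round(fid(real, far), 2))
```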
All results from models trained and evaluated by our team. Evaluation code and methodology available on request.
Request technical details → gaurav@nonlinear.technology