Every number below comes from trained models evaluated on standard benchmarks. Same evaluation pipeline across all comparisons. No cherry-picking.
A 7B-parameter model at 30x compression, evaluated on standard NLP benchmarks.
| Benchmark | Our 7B (compressed) | Standard 1.5B | Our 7B + SFT |
|---|---|---|---|
| WikiText PPL (lower is better) | 21.41 | 21.04 | 26.24 |
| HellaSwag | 37.4% | 38.0% | 38.8% |
| ARC-Easy | 42.8% | 43.0% | 46.0% |
| ARC-Challenge | 22.4% | 20.4% | 23.6% |
| MMLU | 25.1% | 25.3% | 26.4% |
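Perplexity in the first row is the exponential of the mean per-token negative log-likelihood; a minimal sketch of the metric (the token NLLs below are illustrative, not from our models):

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Illustrative values only: a mean NLL near 3.07 nats gives a PPL near 21.4,
# the scale of the WikiText numbers above.
print(round(perplexity([3.0, 3.1, 3.1]), 2))  # → 21.47
```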
Same architecture at different compression levels. All trained from scratch on identical data.
| Configuration | Compression | Stored Params | ARC-C | MMLU |
|---|---|---|---|---|
| Standard | 1x | 159M | 20.4% | 25.3% |
| Level 3 (r=128) | 8.3x | 19.2M | 20.4% | 25.0% |
| Level 4 (HyperNet) | 24.8x | 6.4M | 20.4% | 22.9% |
| Level 5 (Multi-Scale) | 57.3x | 2.8M | 19.2% | 25.1% |
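The Level 3 row replaces dense weight matrices with rank-128 factors; the parameter accounting behind "Stored Params" can be sketched as follows. The 768x768 layer is hypothetical, and the table's overall ratios also reflect embeddings and other unfactorized parameters:

```python
def low_rank_params(d_out, d_in, r):
    # A dense d_out x d_in matrix stores d_out * d_in values; its rank-r
    # factorization W ≈ U @ V stores only r * (d_out + d_in).
    return r * (d_out + d_in)

def compression(d_out, d_in, r):
    return (d_out * d_in) / low_rank_params(d_out, d_in, r)

# Hypothetical 768x768 layer at r=128:
print(round(compression(768, 768, 128), 1))  # → 3.0
```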
| Model | Compression | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|
| Standard | 1x | 73.2% | ~91% |
| Compressed (5x) | 5.0x | 72.5% | ~91% |
| Compressed (8.7x) | 8.7x | 71.7% | ~91% |
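Top-1 and Top-5 accuracy are computed identically for every row; a minimal sketch of the metric on toy logits:

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of examples whose true label is among the k highest logits."""
    topk = np.argsort(logits, axis=1)[:, -k:]      # indices of k largest per row
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6]])
labels = np.array([2, 0, 2])
print(round(topk_accuracy(logits, labels, 1), 3))  # → 0.667
print(topk_accuracy(logits, labels, 2))            # → 1.0
```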
Linear probe on the InstaDeep Nucleotide Transformer benchmark; all scores are MCC. Same evaluation pipeline for all models.
| Task Category | Our Model (1.2M params) | Standard (19M) | NT-v2-50M (56M) |
|---|---|---|---|
| Promoter detection (3 tasks) | 0.82-0.84 | 0.82-0.84 | 0.83-0.87 |
| Enhancer classification (2 tasks) | 0.48-0.50 | 0.46-0.48 | 0.44-0.49 |
| Splice sites (3 tasks) | 0.31-0.51 | 0.33-0.53 | 0.51-0.61 |
| Histone marks (10 tasks) | 0.18-0.72 | 0.18-0.73 | 0.20-0.73 |
| Average (18 tasks) | 0.476 | 0.480 | 0.528 |
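A linear probe trains only a linear classifier on top of frozen embeddings and scores it with MCC; a minimal numpy-only sketch on synthetic stand-in features, where a least-squares fit stands in for the probe:

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 32))       # stand-in for frozen embeddings
y_train = (X_train[:, 0] > 0).astype(int)  # toy binary task
X_test = rng.normal(size=(100, 32))
y_test = (X_test[:, 0] > 0).astype(int)

# The probe: a single linear map fit by least squares on {-1, +1} targets
# (a logistic regression head works the same way).
w, *_ = np.linalg.lstsq(X_train, 2 * y_train - 1, rcond=None)
y_pred = (X_test @ w > 0).astype(int)
print(round(mcc(y_test, y_pred), 2))
```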
| Metric | Our Model | ESM-2 650M |
|---|---|---|
| Parameters (stored) | 4.2M | 651M |
| Compression vs ESM-2 | 153x | 1x |
| Amino acid diversity | 20/20 | 20/20 |
| Validation perplexity | 15.1 | - |
| Training data | UniRef90 (130M seqs) | UniRef50 |
Real trained weights with simulated crossbar effects (quantization + conductance noise).
| Precision | LLM (PPL) | ViT (Accuracy) | Quality |
|---|---|---|---|
| 16-bit (digital) | 40.2 | 73.7% | Baseline |
| 8-bit crossbar | 40.1 | 73.7% | Identical |
| 6-bit crossbar | 41.1 | 73.4% | Excellent |
| 4-bit crossbar | 47.8 | 73.3% | Good |
| 3-bit crossbar | 159.2 | 71.7% | Degraded |
| 2-bit crossbar | 27,663 | 56.6% | Broken |
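The two crossbar effects named above can be sketched as uniform weight quantization plus Gaussian conductance noise. The bit-widths mirror the table, but the noise magnitude and the per-tensor symmetric quantization scheme here are illustrative assumptions, not our exact simulation:

```python
import numpy as np

def crossbar_forward(w, x, bits=4, noise_frac=0.02, seed=0):
    """Simulate a crossbar matvec: quantize weights to 2^bits - 1 conductance
    levels, then perturb each stored value with Gaussian conductance noise."""
    rng = np.random.default_rng(seed)
    levels = 2 ** bits - 1
    scale = np.abs(w).max() / (levels / 2)   # symmetric per-tensor step size
    w_q = np.round(w / scale) * scale        # uniform quantization
    w_noisy = w_q + rng.normal(0.0, noise_frac * scale, w.shape)
    return w_noisy @ x

rng = np.random.default_rng(1)
w = rng.normal(size=(16, 16))
x = rng.normal(size=16)
exact = w @ x
for bits in (8, 4, 2):
    # Mean absolute error vs. the exact digital matvec grows as bits shrink.
    err = np.abs(crossbar_forward(w, x, bits) - exact).mean()
    print(bits, round(err, 3))
```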
| Metric | Our Architecture | Standard (LoRA) |
|---|---|---|
| Adapter size | ~10 MB | ~100 MB |
| Domain switch time | 2 ms | 50-200 ms |
| 100 specialists in memory | 1.3 GB | 1,400 GB |
| Medical specialist (MedMCQA) | +8.4 pp | - |
| Math specialist (GSM8K) | 8x improvement | - |
| Model | Compression | FID (lower is better) |
|---|---|---|
| Standard | 1x | 13.00 |
| Compressed (5x) | 5.0x | 16.30 |
| Compressed (8.7x) | 8.7x | 16.61 |
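FID fits a Gaussian to the feature statistics of each sample set and takes the Fréchet distance between the two Gaussians; a minimal sketch on toy features (real FID uses Inception-v3 activations, not random vectors):

```python
import numpy as np

def fid(feat_a, feat_b):
    """Frechet distance between Gaussians fit to two feature sets:
    ||mu_a - mu_b||^2 + Tr(Sa + Sb - 2 * (Sa @ Sb)^(1/2))."""
    mu_a, mu_b = feat_a.mean(0), feat_b.mean(0)
    sa = np.cov(feat_a, rowvar=False)
    sb = np.cov(feat_b, rowvar=False)
    # Tr((Sa @ Sb)^(1/2)) equals the sum of sqrt of eigenvalues of Sa @ Sb.
    eig = np.linalg.eigvals(sa @ sb)
    trace_sqrt = np.sqrt(np.clip(eig.real, 0, None)).sum()
    diff = mu_a - mu_b
    return diff @ diff + np.trace(sa) + np.trace(sb) - 2 * trace_sqrt

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
close = rng.normal(loc=0.1, size=(500, 8))  # slight shift -> small FID
far = rng.normal(loc=2.0, size=(500, 8))    # large shift -> large FID
print(round(fid(real, close), 2), round(fid(real, far), 2))
```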
All results from models trained and evaluated by our team. Evaluation code and methodology available on request.
Request technical details → gaurav@nonlinear.technology