https://neurobrix.es/, https://nordsms.es/, https://wizworks.io/, https://artune.ai/

WizWorks, Harju maakond, Kesklinna linnaosa, Tornimäe tn 5, Tallinn (2026)

12/03/2026

1-million-token context windows in AI models.

While technically impressive, larger context sizes do not necessarily translate into better reasoning.

Computational costs and practical limits matter too: the future of AI systems may depend more on smarter context management (retrieval, memory, agents) than on simply increasing token limits.

05/03/2026

"The Universal Runtime Vision: Why We're Not Targeting Mobile (And What We're Building Instead)"

NeuroBrix has a long-term goal: become the standard runtime for neural network inference. Any model. Any GPU architecture. One engine.

Here's our Phase 3 vision for 2027 — and one deliberately honest decision.

𝟔+ 𝐆𝐏𝐔 𝐚𝐫𝐜𝐡𝐢𝐭𝐞𝐜𝐭𝐮𝐫𝐞𝐬. NVIDIA, AMD, Intel, Apple Silicon, ARM (Jetson, Snapdragon), and RISC-V. The Prism solver already abstracts hardware into YAML profiles, so adding a new GPU family means writing a hardware profile and validating dtype support. The execution engine stays the same.

𝟗𝟓% 𝐓𝐫𝐢𝐭𝐨𝐧 𝐤𝐞𝐫𝐧𝐞𝐥 𝐜𝐨𝐯𝐞𝐫𝐚𝐠𝐞. Today, NeuroBrix uses PyTorch ATen as the default dispatch layer, with optional Triton kernels. By 2027, the vast majority of operations will have custom Triton implementations — reducing PyTorch to a weight-loading utility, not a runtime dependency.

𝐆𝐫𝐚𝐩𝐡 𝐝𝐞𝐛𝐮𝐠𝐠𝐞𝐫. Set breakpoints inside the computation graph. Inspect intermediate tensors at any point. Step through execution op-by-op. Today, we have NBX_TRACE_ZEROS, NBX_TRACE_NAN, and NBX_NAN_GUARD as environment variables for debugging. The graph debugger turns this into a proper interactive tool.

𝐒𝐃𝐊 𝐟𝐨𝐫 𝐢𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐢𝐨𝐧𝐬. A stable Python API for embedding NeuroBrix in other applications: web services, batch pipelines, and orchestration platforms. Import, load, execute — three calls.

𝟏𝟓𝟎+ 𝐦𝐨𝐝𝐞𝐥𝐬. Comprehensive coverage across diffusion, LLM, multimodal, audio, and video. Every model uses the same .nbx format, runtime, and CLI.

Now — the honest part.

𝐖𝐞 𝐚𝐫𝐞 𝐧𝐨𝐭 𝐭𝐚𝐫𝐠𝐞𝐭𝐢𝐧𝐠 𝐦𝐨𝐛𝐢𝐥𝐞.

NeuroBrix is built on Python and Triton. These don't run on phones. We will not compile to WASM, ship an iOS framework, or pretend that mobile inference is around the corner for us.

Server-side and edge GPUs (Jetson, ARM servers) are real targets. Apple Silicon Macs are a real target. Phones and browsers are not.

If you need on-device mobile inference, Core ML, TensorFlow Lite, and ONNX Runtime Mobile are better tools. We'd rather point you to the right solution than ship a bad experience.

This is a deliberate technical decision. We believe doing fewer things exceptionally well is more valuable than doing everything poorly.

If Triton gains mobile support or WebGPU matures for real inference — we'll revisit. But we don't chase hype.

Follow: github.com/NeuroBrix/neurobrix

Open source. Apache 2.0. pip install neurobrix

Universal AI Runtime — Execute any model on any hardware - NeuroBrix/neurobrix

05/03/2026

What is NeuroBrix?
NeuroBrix is a universal deep learning inference engine that allows you to run any model, in any modality, on any hardware, all through a single engine without requiring model-specific code.

Why is it a game-changer?

Unified execution
No more switching between tools like Ollama, ComfyUI, or vLLM. With NeuroBrix, you can run LLMs, image generation models (such as FLUX), audio, video, and multimodal models from a single interface via the CLI.

Universal format (.nbx)
Models are packaged into .nbx containers that include everything required for deterministic execution.

Hardware intelligence (Prism)
The engine analyses your available hardware (NVIDIA, AMD, or Intel GPUs, or Apple Silicon) and automatically selects the optimal execution strategy, including multi-GPU distribution, parallelism, or CPU offloading.

Zero assumptions
Unlike other engines, NeuroBrix does not assume whether it is processing text, images, or audio. It only operates on tensors and computational graphs, which makes the system truly universal and highly robust.

pip install neurobrix

https://github.com/NeuroBrix

04/03/2026

"NeuroBrix Roadmap: What's Coming in 2026"

NeuroBrix launched with support for diffusion models, MoE LLMs, and multimodal architectures — all running on a single universal runtime with automatic hardware allocation.

Here's what's coming next.

𝐏𝐡𝐚𝐬𝐞 𝟏 — 𝐅𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧 (𝐐𝟏-𝐐𝟐 𝟐𝟎𝟐𝟔)

LoRA support is our top priority. Loading and applying LoRA adapters at runtime is critical for the community — custom styles, fine-tuned behaviors, and domain-specific models all through the same unified engine.

Multi-hardware validation: we're testing on AMD ROCm (MI100/MI250), Apple Silicon (MPS), and Intel Arc GPUs. Today, NeuroBrix runs on NVIDIA. By mid-2026, it should run on every major GPU vendor.

Audio models: full Whisper support across all model sizes. Same engine, same .nbx container format, same CLI. Import, serve, transcribe.

Built-in profiler: a --profile flag that measures time and memory per operation. See exactly where your model spends its compute.

Community hardware profiles program: submit your GPU config as YAML, help NeuroBrix run on more hardware. We especially need AMD Instinct, Apple M-series, and consumer NVIDIA (RTX 3090/4090) profiles.

𝐏𝐡𝐚𝐬𝐞 𝟐 — 𝐏𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞 (𝐐𝟑-𝐐𝟒 𝟐𝟎𝟐𝟔)

Quantization: INT4 and FP8, with support for community formats (AWQ, GPTQ). Smaller models, faster inference, same correctness guarantees.

Fused Triton kernels: LayerNorm+Linear, GELU+MatMul fused into single GPU calls. Less memory bandwidth, more compute throughput.

Video models: CogVideoX and other text-to-video architectures. The iterative execution flow already supports temporal denoising loops, so the plumbing is there.

KV cache quantization: INT8/FP8 KV cache for longer context windows on limited memory. Critical for running 262K-context models like Qwen3-30B on smaller GPU setups.

Graph visualizer: an interactive web tool that shows what's actually happening inside your model at each step. Every operation, every tensor, every data flow.

Target: 50+ models in the registry by the end of Phase 1, 100+ by the end of Phase 2.

Follow the progress: https://github.com/NeuroBrix/neurobrix

Open source. Apache 2.0.

pip install neurobrix

📦 https://pypi.org/project/neurobrix/
💻 https://github.com/NeuroBrix/neurobrix

03/03/2026

"Prism: How We Automatically Distribute a 105GB Model Across 4 GPUs"

You have 4x V100-32G GPUs connected via NVLink. You want to run FLUX.2-dev — a 32B parameter, 105GB model. How do you split it?

Most people spend hours writing custom sharding configs. Prism does it in seconds.

Prism is our automatic hardware solver. It reads the model's memory footprint directly from safetensors headers, without loading a single weight into memory, and scores 11 execution strategies to find the optimal one.
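To make the header-only read concrete, here is a minimal sketch that relies only on the public safetensors file layout: an 8-byte little-endian header length, followed by a JSON header that lists each tensor's dtype, shape, and data_offsets. The function names read_header and footprint_bytes are hypothetical and are not taken from the NeuroBrix codebase.

```python
import json
import struct


def read_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def footprint_bytes(header):
    """Sum the byte size of every tensor listed in a safetensors header."""
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        begin, end = info["data_offsets"]
        total += end - begin
    return total
```

Calling footprint_bytes(read_header(path)) on each shard and adding the results gives the weight footprint of a model while touching only a few kilobytes of each file.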

Here's the scoring cascade:

single_gpu (1000) → single_gpu_lifecycle (900) → pp_nvlink (800) → tp (780) → fgp_nvlink (750) → pp_pcie (700) → fgp_pcie (650) → pp_lazy_nvlink (500) → pp_lazy_pcie (400) → lazy_sequential (300) → zero3 (100)

If the best strategy doesn't fit your hardware, Prism automatically falls through to the next viable option—no manual intervention.
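The fall-through behaviour can be sketched as a ranked lookup. The strategy names and scores below are copied from the cascade above; the fits predicate and the pick_strategy helper are hypothetical stand-ins for whatever feasibility checks Prism actually runs.

```python
# Strategy names and scores as listed in the cascade above.
STRATEGY_SCORES = {
    "single_gpu": 1000,
    "single_gpu_lifecycle": 900,
    "pp_nvlink": 800,
    "tp": 780,
    "fgp_nvlink": 750,
    "pp_pcie": 700,
    "fgp_pcie": 650,
    "pp_lazy_nvlink": 500,
    "pp_lazy_pcie": 400,
    "lazy_sequential": 300,
    "zero3": 100,
}


def pick_strategy(fits):
    """Return the highest-scoring strategy for which fits(name) is True, else None.

    `fits` is an assumed callable that answers "does this strategy work on
    the current hardware for the current model?".
    """
    ranked = sorted(STRATEGY_SCORES, key=STRATEGY_SCORES.get, reverse=True)
    for name in ranked:
        if fits(name):
            return name
    return None


# Usage with a made-up feasibility set: the best viable entry wins.
viable = {"pp_nvlink", "fgp_nvlink", "zero3"}
print(pick_strategy(lambda name: name in viable))  # pp_nvlink
```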

What makes this different from vLLM's parallelism or DeepSpeed's sharding?

𝟏. 𝐇𝐞𝐭𝐞𝐫𝐨𝐠𝐞𝐧𝐞𝐨𝐮𝐬 𝐆𝐏𝐔 𝐬𝐮𝐩𝐩𝐨𝐫𝐭. Mixed GPU configs — 2x V100-16GB + 2x V100-32GB — work out of the box. Block-level sharding is weighted by each device's available VRAM. Nobody else does this automatically.

𝟐. 𝐈𝐧𝐭𝐞𝐫𝐜𝐨𝐧𝐧𝐞𝐜𝐭-𝐚𝐰𝐚𝐫𝐞 𝐬𝐭𝐫𝐚𝐭𝐞𝐠𝐲 𝐬𝐞𝐥𝐞𝐜𝐭𝐢𝐨𝐧. NVLink at 300 GB/s behaves fundamentally differently from PCIe. Prism maintains separate strategies: pp_nvlink vs pp_pcie, fgp_nvlink vs fgp_pcie. Tensor parallelism (tp) is only selected when NVLink is available.

𝟑. 𝐅𝐢𝐧𝐞-𝐠𝐫𝐚𝐢𝐧𝐞𝐝 𝐩𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐬𝐦 𝐟𝐨𝐫 𝐌𝐨𝐄. Mixture-of-Experts models (DeepSeek-MoE-16B, Qwen3-30B-A3B) need expert-level distribution, not just layer-level. The fgp strategies distribute experts across GPUs based on memory constraints. This is purpose-built for MoE routing.

𝟒. 𝐋𝐢𝐟𝐞𝐜𝐲𝐜𝐥𝐞-𝐚𝐰𝐚𝐫𝐞 𝐦𝐞𝐦𝐨𝐫𝐲. Components are classified as persistent (always in VRAM) or transient (loaded on demand). For diffusion: text encoder runs → unloads → denoiser loads → runs → unloads → VAE loads → decodes. Prism calculates the peak of this sequence, not the sum.

𝟓. 𝐊𝐕 𝐜𝐚𝐜𝐡𝐞 𝐦𝐞𝐦𝐨𝐫𝐲 𝐞𝐬𝐭𝐢𝐦𝐚𝐭𝐢𝐨𝐧. For LLMs, Prism computes: max_tokens × num_layers × 2 × num_kv_heads × head_dim × dtype_bytes. All from metadata. No trial-and-error OOM testing.
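Points 1, 4, and 5 come down to small pieces of arithmetic, sketched below. This is an illustration under stated assumptions, not Prism's implementation: the function names are hypothetical, the proportional split uses a simple largest-remainder rule, and the example sizes are invented. Only the KV cache formula is taken directly from the text.

```python
def split_blocks_by_vram(num_blocks, free_vram):
    """Split num_blocks across devices in proportion to free VRAM.

    free_vram is a list of integers in any one unit (for example GiB).
    Leftover blocks go to the devices with the largest fractional share.
    """
    total = sum(free_vram)
    counts = [num_blocks * v // total for v in free_vram]
    leftover = num_blocks - sum(counts)
    by_remainder = sorted(
        range(len(free_vram)),
        key=lambda i: (num_blocks * free_vram[i]) % total,
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def lifecycle_peak(persistent, transient_stages):
    """Peak memory when transient components are loaded one at a time."""
    return persistent + max(transient_stages, default=0)


def kv_cache_bytes(max_tokens, num_layers, num_kv_heads, head_dim, dtype_bytes):
    """KV cache size: the factor 2 covers keys and values."""
    return max_tokens * num_layers * 2 * num_kv_heads * head_dim * dtype_bytes


# 2x 16 GB + 2x 32 GB devices, 48 blocks: larger cards get twice the blocks.
print(split_blocks_by_vram(48, [16, 16, 32, 32]))  # [8, 8, 16, 16]

# Invented sizes: text encoder 9, denoiser 24, VAE 1. Peak is 24, not 34.
print(lifecycle_peak(0, [9, 24, 1]))  # 24

# Invented config: 4096 tokens, 32 layers, 8 KV heads, head_dim 128, fp16.
print(kv_cache_bytes(4096, 32, 8, 128, 2) // 1024**2, "MiB")  # 512 MiB
```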

One flag. That's all:

neurobrix serve --model flux2-dev --hardware c4140-4xv100-custom-nvlink

Open source. Apache 2.0.

pip install neurobrix
💻 https://github.com/NeuroBrix/neurobrix
🌐 https://neurobrix.es/models

Hocine Benkelaya

02/03/2026

"The Numerical Stability Bugs Nobody Tells You About (And How We Fixed Them)"

When we built the NeuroBrix DtypeEngine, we discovered stability issues that most inference engines silently ignore. Here are three that cost us weeks to track down.

𝟏. 𝐛𝐦𝐦 𝐦𝐮𝐬𝐭 𝐫𝐮𝐧 𝐢𝐧 𝐅𝐏𝟑𝟐 (𝐧𝐨𝐭 𝐅𝐏𝟏𝟔)

PyTorch's AMP classifies batched matrix multiplication (bmm) as an FP16 operation — same category as standard matmul and conv2d. Makes sense for most architectures.

Except T5-XXL. T5's cross-attention produces intermediate values that overflow the fp16 range when batched. The result: subtle quality degradation that's invisible unless you diff against the fp32 reference output.

We intentionally deviate from PyTorch's AMP rules and classify bmm as FP32. This one decision fixed all T5-family text encoders — which power FLUX, PixArt-Sigma, and every model using T5 conditioning.

𝟐. 𝐑𝐨𝐏𝐄 𝐧𝐞𝐞𝐝𝐬 𝐅𝐏𝟑𝟐 𝐜𝐨𝐦𝐩𝐥𝐞𝐱 𝐚𝐫𝐢𝐭𝐡𝐦𝐞𝐭𝐢𝐜

Rotary Position Embeddings (RoPE) use polar() and view_as_complex() to compute rotations. In fp16, the complex number precision collapses at long sequences — positions beyond ~4K tokens start blurring together.

We force polar and view_as_complex to FP32. This is why DeepSeek-MoE-16B and Qwen3-30B-A3B run correctly through NeuroBrix on 262K context — the position encoding stays precise end-to-end.
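To see why half precision runs out of resolution at long sequences, the sketch below rounds a few position values to IEEE 754 half precision using only the standard library. It illustrates fp16 spacing in general; it is not a reproduction of the NeuroBrix RoPE path, and the to_fp16 helper is hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Below 2048 every integer is exact in fp16. Between 4096 and 8192 the
# representable values are 4 apart, so neighbouring positions collapse.
for pos in (1023, 4097, 4099):
    print(pos, "->", to_fp16(float(pos)))
# 1023 -> 1023.0
# 4097 -> 4096.0
# 4099 -> 4100.0
```

Any rotation angle derived from a position that has already been rounded this way is shared by several tokens, which is consistent with the blurring described above.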

𝟑. 𝐑𝐞𝐬𝐢𝐝𝐮𝐚𝐥 𝐨𝐯𝐞𝐫𝐟𝐥𝐨𝐰 𝐢𝐧 𝐝𝐞𝐞𝐩 𝐧𝐞𝐭𝐰𝐨𝐫𝐤𝐬

In a 32-block transformer, the residual stream accumulates through repeated add operations. In fp16, values can silently exceed ±65504 (the fp16 max) and become Inf, which turns into NaN as soon as a later operation computes something like Inf - Inf or Inf × 0. One NaN then propagates through the rest of the forward pass.

Our DtypeEngine applies post-computation overflow clamping on add and sub operations in fp16. Values are clamped to the representable range before they can produce Inf. This stops the NaN cascade before it starts.
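A minimal sketch of the clamping idea, written with plain Python floats rather than real fp16 tensors. The names FP16_MAX, clamp_fp16, and residual_add are hypothetical; the DtypeEngine's actual code is not shown in this post.

```python
FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def clamp_fp16(x):
    """Clamp a value into the finite fp16 range; +/-Inf maps to +/-FP16_MAX."""
    return max(-FP16_MAX, min(FP16_MAX, x))


def residual_add(a, b):
    """Add, then clamp the result so it can never leave the fp16 range."""
    return clamp_fp16(a + b)


print(residual_add(60000.0, 60000.0))   # 65504.0 instead of an fp16 Inf
print(clamp_fp16(float("inf")))         # 65504.0
print(float("inf") - float("inf"))      # nan: what an unclamped Inf can lead to
```

The trade-off is that a clamped value is no longer numerically exact, but it stays finite, so later operations cannot turn it into NaN.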

These aren't theoretical problems. They're bugs we hit running real models on V100 hardware. Every inference engine that runs in mixed precision will encounter them — most just don't surface them.

We built NeuroBrix's DtypeEngine to encode these rules explicitly, per-operation, with no silent defaults. The full AMP rule table: FP32 for pow/rsqrt/softmax/layer_norm/bmm/polar, FP16 for mm/conv2d, promote-to-highest for add/mul/cat.
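The rule table above can be written as an explicit lookup. The op groupings are taken from the text; the function name compute_dtype, the dtype strings, and the choice to raise on unlisted ops (one way to honour "no silent defaults") are assumptions for illustration.

```python
FP32_OPS = {"pow", "rsqrt", "softmax", "layer_norm", "bmm", "polar"}
FP16_OPS = {"mm", "conv2d"}
PROMOTE_OPS = {"add", "mul", "cat"}

# Higher rank means higher precision.
PRECISION_RANK = {"fp16": 0, "fp32": 1}


def compute_dtype(op, input_dtypes):
    """Return the dtype an op should compute in, given its input dtypes."""
    if op in FP32_OPS:
        return "fp32"
    if op in FP16_OPS:
        return "fp16"
    if op in PROMOTE_OPS:
        # Promote to the highest-precision input.
        return max(input_dtypes, key=PRECISION_RANK.__getitem__)
    # No silent default: an op without a rule is an error.
    raise KeyError(f"no dtype rule for op {op!r}")


print(compute_dtype("bmm", ["fp16", "fp16"]))  # fp32
print(compute_dtype("mm", ["fp16", "fp16"]))   # fp16
print(compute_dtype("add", ["fp16", "fp32"]))  # fp32
```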

Open source. Apache 2.0.

pip install neurobrix
💻 https://github.com/NeuroBrix/neurobrix

Hocine Benkelaya
Vladimir WizWorks


27/02/2026

The future of AI is already here!

Introducing NeuroBrix, the native AI execution engine developed by WizWorks OÜ.

Designed to be lightweight, modular, and ultra-efficient, NeuroBrix lets you orchestrate complex language models with full data sovereignty.

The system is optimized for building the next generation of intelligent agents.

Visit us at: https://neurobrix.es

https://github.com/NeuroBrix/neurobrix


14/07/2025

Log in and launch your first campaign today

09/07/2025

https://www.f6s.com/wizworks-ou

WizWorks OÜ - Marketing & Sales

07/07/2025

Is your message being read?

With open rates of up to 98%, SMS remains one of the most effective channels for reaching your customers. While emails get lost in spam and social networks fight for attention, SMS lands directly in the pocket and is read in under 3 minutes.

Our platform lets you harness this power with easy-to-launch campaigns, real-time statistics, and support in French.
Connect, build loyalty, and get results. It's as simple as that.

07/07/2025

Is your message getting read?

With open rates of up to 98%, SMS remains one of the most effective ways to reach your customers. While emails get lost in spam folders and social media fights for attention, SMS goes straight to the pocket — and is read in under 3 minutes.

Our platform helps you tap into that power with easy-to-launch campaigns, real-time analytics, and support in Spanish.
Connect, build loyalty, and drive results. It’s that simple.

WizWorks
