AI Infrastructure

Engineering notes, architectural deep-dives, and practical playbooks from the Devforth team.

Latest

Self-hosted GPT: real response time, token throughput, and cost on L4, L40S and H100 for GPT-OSS-20B
AI Infrastructure

We benchmarked modern open-source LLMs across several popular GPUs to measure real-world context limits, throughput, latency, and cost efficiency under varying levels of concurrency — as close as possible to real production conditions. Here we share the results.

LLM Terminology Guide: Weights, Inference, Effective sequence length, and Self-Hosting Explained
AI Infrastructure

A clear guide to generative AI and LLM terminology. Learn what model weights, quantization, inference, context length, batching, and sampling mean, plus how to evaluate vendor APIs and self-host models like GPT-OSS-20B.

GPT-J is a self-hosted open-source analog of GPT-3: how to run in Docker
AI Infrastructure

Learn how to set up the open-source GPT-J model on inexpensive custom GPU servers. Run this text-generation AI model in Docker and talk to it right now!