Engineering notes, architectural deep-dives, and practical playbooks from the Devforth team.
Latest
AI Infrastructure
Self-hosted GPT: real response time, token throughput, and cost on L4, L40S and H100 for GPT-OSS-20B
We benchmarked modern open-source LLMs across several popular GPUs to measure real-world context limits, throughput, latency, and cost efficiency under varying levels of concurrency — as close as possible to real production conditions. Here we share the results.
A clear guide to generative AI and LLM terminology. Learn how model weights, quantization, inference, context length, batching, sampling, and more work, including how to evaluate vendor APIs and self-host models like GPT-OSS-20B.
AI Infrastructure
GPT-J is a self-hosted open-source analog of GPT-3: how to run it in Docker
Learn how to set up the open-source GPT-J model on affordable custom GPU servers. Try running this text-generation AI model and talk to it right now!