<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLM Inference Optimization</title><link>/theory/inference-optimization/</link><description>Recent content in LLM Inference Optimization</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>rakhmankulovbulat@gmail.com</managingEditor><webMaster>rakhmankulovbulat@gmail.com</webMaster><copyright>© 2026</copyright><lastBuildDate>Sat, 28 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="/theory/inference-optimization/index.xml" rel="self" type="application/rss+xml"/>
<item><title>Key LLM Inference Metrics</title><link>/theory/inference-optimization/llm-inference-metrics/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/llm-inference-metrics/</guid><description/></item>
<item><title>LLM Benchmarks</title><link>/theory/inference-optimization/llm-performance-benchmarks/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/llm-performance-benchmarks/</guid><description/></item>
<item><title>Static, Dynamic, and Continuous Batching</title><link>/theory/inference-optimization/static-dynamic-continuous-batching/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/static-dynamic-continuous-batching/</guid><description/></item>
<item><title>FlashAttention</title><link>/theory/inference-optimization/flashattention/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/flashattention/</guid><description/></item>
<item><title>PagedAttention</title><link>/theory/inference-optimization/pagedattention/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/pagedattention/</guid><description/></item>
<item><title>Speculative Decoding</title><link>/theory/inference-optimization/speculative-decoding/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/speculative-decoding/</guid><description/></item>
<item><title>Prefill and Decode Disaggregation</title><link>/theory/inference-optimization/prefill-decode-disaggregation/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/prefill-decode-disaggregation/</guid><description/></item>
<item><title>Prefix Caching</title><link>/theory/inference-optimization/prefix-caching/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/prefix-caching/</guid><description/></item>
<item><title>Prefix-Aware Routing</title><link>/theory/inference-optimization/prefix-aware-routing/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/prefix-aware-routing/</guid><description/></item>
<item><title>KV-Cache-Aware Load Balancing</title><link>/theory/inference-optimization/kv-cache-utilization-aware-load-balancing/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/kv-cache-utilization-aware-load-balancing/</guid><description/></item>
<item><title>KV Cache Offloading</title><link>/theory/inference-optimization/kv-cache-offloading/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/kv-cache-offloading/</guid><description/></item>
<item><title>Data, Tensor, Pipeline, Expert, and Hybrid Parallelism</title><link>/theory/inference-optimization/data-tensor-pipeline-expert-hybrid-parallelism/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/data-tensor-pipeline-expert-hybrid-parallelism/</guid><description/></item>
<item><title>Offline Batch Inference</title><link>/theory/inference-optimization/offline-batch-inference/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><author>rakhmankulovbulat@gmail.com</author><guid>/theory/inference-optimization/offline-batch-inference/</guid><description/></item></channel></rss>