Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale daft.ai 5 points by ykev 7 hours ago
Part of the Daft team here! Happy to answer any questions.