Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale daft.ai 5 points by ykev 7 hours ago
Part of the Daft team here! Happy to answer any questions.