Together AI Launches Cost-Efficient Batch API for LLM Requests
James Ding Jun 11, 2025 19:34
Together AI introduces a Batch API that reduces costs by 50% for processing large language model requests. The service offers scalable, asynchronous processing for non-urgent workloads.

Together AI has unveiled its new Batch API, a service designed to process large volumes of large language model (LLM) requests at significantly reduced costs. According to Together AI, the Batch API promises to deliver enterprise-grade performance at half the cost of real-time inference, making it an attractive option for businesses and developers.
Why Batch Processing?
Batch processing suits AI workloads that do not require immediate responses, such as synthetic data generation and offline summarization. Because these requests are processed asynchronously during off-peak times, users benefit from reduced costs while maintaining reliable output. Most batches complete within a few hours, with a maximum processing window of 24 hours.
Key Benefits
50% Cost Savings
The Batch API offers a 50% cost reduction on non-urgent workloads compared to real-time API calls, enabling users to scale AI inference without increasing their budgets.
Large Scale Processing
Users can submit up to 50,000 requests in a single batch file, and batch operations have their own rate limits, separate from real-time usage. The service includes real-time progress tracking through each stage, from validation to completion.
Simple Integration
Requests are uploaded as JSONL files, with progress monitored through the Batch API. Results can be downloaded once processing is complete.
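To make the input format concrete, the following is a minimal sketch of preparing such a file in Python. The custom_id/body layout is an assumption based on common batch-request conventions, and the model name is only an example; the exact schema is described in the Batch API documentation.

```python
import json

# Hypothetical prompts to process offline; each request needs a unique identifier.
prompts = ["Summarize document A ...", "Summarize document B ..."]

requests = [
    {
        "custom_id": f"request-{i}",  # unique per line (assumed field name)
        "body": {
            "model": "deepseek-ai/DeepSeek-V3",  # example only; use any batch-supported model
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
    }
    for i, prompt in enumerate(prompts)
]

# Write one JSON object per line (JSONL).
with open("batch_input.jsonl", "w") as f:
    for request in requests:
        f.write(json.dumps(request) + "\n")
```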
Supported Models
The Batch API supports 15 advanced models, including the deepseek-ai and meta-llama series, which are suited to a variety of complex tasks.
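For reference, the together Python client can enumerate the models available to an account; which of them accept batch jobs is listed in the Batch API documentation. The snippet below assumes the client's models.list() method, present in recent versions of the SDK.

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# List models visible to the account; cross-check against the Batch API docs
# to see which ones support batch processing.
for model in client.models.list():
    print(model.id)
```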
How It Works
- Prepare Your Requests: Format requests in a JSONL file, each with a unique identifier.
- Upload & Submit: Use the Files API to upload the batch and create the job.
- Monitor Progress: Track the job through various processing stages.
- Download Results: Retrieve structured results, with any errors documented separately; an end-to-end sketch of these steps follows below.
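Assuming the workflow mirrors the steps above, a sketch of the full round trip might look like the following. The method names (files.upload, batches.create_batch, batches.get_batch, files.retrieve_content), the purpose value, and the status strings are assumptions about the together client's surface rather than confirmed API details; the Batch API documentation and cookbooks show the exact calls.

```python
import time
from together import Together  # pip install --upgrade together

client = Together()  # reads TOGETHER_API_KEY from the environment

# 1. Upload the JSONL file prepared earlier (method and purpose are assumed names).
uploaded = client.files.upload(file="batch_input.jsonl", purpose="batch-api")

# 2. Create the batch job against the chat completions endpoint (assumed call).
batch = client.batches.create_batch(file_id=uploaded.id, endpoint="/v1/chat/completions")

# 3. Poll the job; batches are asynchronous, so checking every 30-60 seconds is plenty.
while True:
    job = client.batches.get_batch(batch.id)
    print("status:", job.status)
    if str(job.status).upper() in ("COMPLETED", "FAILED", "EXPIRED", "CANCELLED"):
        break
    time.sleep(60)

# 4. Download the structured results; errors, if any, land in a separate error file.
if str(job.status).upper() == "COMPLETED":
    client.files.retrieve_content(job.output_file_id, output="batch_results.jsonl")
```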
Rate Limits & Scale
The Batch API operates under dedicated rate limits, allowing up to 10 million tokens per model and 50,000 requests per batch file, with a maximum size of 100MB per input file.
Pricing and Best Practices
Users benefit from an introductory 50% discount, with no upfront commitments. Optimal batch sizes range from 1,000 to 10,000 requests, and model selection should be based on task complexity. Polling job status every 30-60 seconds is recommended.
Getting Started
To begin using the Batch API, users should upgrade to the latest together Python client, review the Batch API documentation, and explore example cookbooks available online. The service is now available for all users, offering significant cost savings for bulk processing of LLM requests.