diff --git a/api/data_types/index.html b/api/data_types/index.html
index 578a0e549..850b140db 100644
--- a/api/data_types/index.html
+++ b/api/data_types/index.html
@@ -1275,21 +1275,20 @@ <h1 id="python-client-data-type-reference">🐍 Python Client Data Type Referenc
 
 
 <h2 id="llmengine.CompletionOutput" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">CompletionOutput</span>
+          <span class="doc doc-object-name doc-class-name">CompletionOutput</span>
 
 
 <a href="#llmengine.CompletionOutput" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Represents the output of a completion request to a model.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -1305,7 +1304,7 @@ <h2 id="llmengine.CompletionOutput" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.CompletionOutput.text" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">text</span>
+          <span class="doc doc-object-name doc-attribute-name">text</span>
 
   
   <span class="doc doc-labels">
@@ -1328,7 +1327,7 @@ <h3 id="llmengine.data_types.CompletionOutput.text" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.CompletionOutput.num_completion_tokens" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">num_completion_tokens</span>
+          <span class="doc doc-object-name doc-attribute-name">num_completion_tokens</span>
 
   
   <span class="doc doc-labels">
@@ -1361,16 +1360,15 @@ <h3 id="llmengine.data_types.CompletionOutput.num_completion_tokens" class="doc
 
 
 <h2 id="llmengine.CompletionStreamOutput" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">CompletionStreamOutput</span>
+          <span class="doc doc-object-name doc-class-name">CompletionStreamOutput</span>
 
 
 <a href="#llmengine.CompletionStreamOutput" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
-
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
 
 
@@ -1389,7 +1387,7 @@ <h2 id="llmengine.CompletionStreamOutput" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.CompletionStreamOutput.text" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">text</span>
+          <span class="doc doc-object-name doc-attribute-name">text</span>
 
   
   <span class="doc doc-labels">
@@ -1412,7 +1410,7 @@ <h3 id="llmengine.data_types.CompletionStreamOutput.text" class="doc doc-heading
 
 
 <h3 id="llmengine.data_types.CompletionStreamOutput.finished" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">finished</span>
+          <span class="doc doc-object-name doc-attribute-name">finished</span>
 
   
   <span class="doc doc-labels">
@@ -1435,12 +1433,12 @@ <h3 id="llmengine.data_types.CompletionStreamOutput.finished" class="doc doc-hea
 
 
 <h3 id="llmengine.data_types.CompletionStreamOutput.num_completion_tokens" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">num_completion_tokens</span>
+          <span class="doc doc-object-name doc-attribute-name">num_completion_tokens</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.CompletionStreamOutput.num_completion_tokens" class="headerlink" title="Permanent link">&para;</a></h3>
@@ -1469,21 +1467,20 @@ <h3 id="llmengine.data_types.CompletionStreamOutput.num_completion_tokens" class
 
 
 <h2 id="llmengine.CompletionSyncResponse" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">CompletionSyncResponse</span>
+          <span class="doc doc-object-name doc-class-name">CompletionSyncResponse</span>
 
 
 <a href="#llmengine.CompletionSyncResponse" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Response object for a synchronous prompt completion.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -1499,7 +1496,7 @@ <h2 id="llmengine.CompletionSyncResponse" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.CompletionSyncResponse.output" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">output</span>
+          <span class="doc doc-object-name doc-attribute-name">output</span>
 
   
   <span class="doc doc-labels">
@@ -1522,7 +1519,7 @@ <h3 id="llmengine.data_types.CompletionSyncResponse.output" class="doc doc-headi
 
 
 <h3 id="llmengine.data_types.CompletionSyncResponse.request_id" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">request_id</span>
+          <span class="doc doc-object-name doc-attribute-name">request_id</span>
 
   
   <span class="doc doc-labels">
@@ -1555,21 +1552,20 @@ <h3 id="llmengine.data_types.CompletionSyncResponse.request_id" class="doc doc-h
 
 
 <h2 id="llmengine.CompletionStreamResponse" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">CompletionStreamResponse</span>
+          <span class="doc doc-object-name doc-class-name">CompletionStreamResponse</span>
 
 
 <a href="#llmengine.CompletionStreamResponse" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Response object for a stream prompt completion task.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -1585,12 +1581,12 @@ <h2 id="llmengine.CompletionStreamResponse" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.CompletionStreamResponse.output" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">output</span>
+          <span class="doc doc-object-name doc-attribute-name">output</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.CompletionStreamResponse.output" class="headerlink" title="Permanent link">&para;</a></h3>
@@ -1609,7 +1605,7 @@ <h3 id="llmengine.data_types.CompletionStreamResponse.output" class="doc doc-hea
 
 
 <h3 id="llmengine.data_types.CompletionStreamResponse.request_id" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">request_id</span>
+          <span class="doc doc-object-name doc-attribute-name">request_id</span>
 
   
   <span class="doc doc-labels">
@@ -1642,21 +1638,20 @@ <h3 id="llmengine.data_types.CompletionStreamResponse.request_id" class="doc doc
 
 
 <h2 id="llmengine.CreateFineTuneRequest" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">CreateFineTuneRequest</span>
+          <span class="doc doc-object-name doc-class-name">CreateFineTuneRequest</span>
 
 
 <a href="#llmengine.CreateFineTuneRequest" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Request object for creating a FineTune.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -1672,16 +1667,19 @@ <h2 id="llmengine.CreateFineTuneRequest" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.CreateFineTuneRequest.model" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">model</span>
+          <span class="doc doc-object-name doc-attribute-name">model</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.CreateFineTuneRequest.model" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>model: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(<span style="color: #008000">Ellipsis</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;Identifier of base model to train from.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>model: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    <span style="color: #666666">...</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;Identifier of base model to train from.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1696,16 +1694,19 @@ <h3 id="llmengine.data_types.CreateFineTuneRequest.model" class="doc doc-heading
 
 
 <h3 id="llmengine.data_types.CreateFineTuneRequest.training_file" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">training_file</span>
+          <span class="doc doc-object-name doc-attribute-name">training_file</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.CreateFineTuneRequest.training_file" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>training_file: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(<span style="color: #008000">Ellipsis</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;Path to file of training dataset. Dataset must be a csv with columns &#39;prompt&#39; and &#39;response&#39;.&quot;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>training_file: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    <span style="color: #666666">...</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;Path to file of training dataset. Dataset must be a csv with columns &#39;prompt&#39; and &#39;response&#39;.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1720,16 +1721,19 @@ <h3 id="llmengine.data_types.CreateFineTuneRequest.training_file" class="doc doc
 
 
 <h3 id="llmengine.data_types.CreateFineTuneRequest.validation_file" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">validation_file</span>
+          <span class="doc doc-object-name doc-attribute-name">validation_file</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.CreateFineTuneRequest.validation_file" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>validation_file: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;Path to file of validation dataset. Has the same format as training_file. If not provided, we will generate a split from the training dataset.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>validation_file: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;Path to file of validation dataset. Has the same format as training_file. If not provided, we will generate a split from the training dataset.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1744,16 +1748,19 @@ <h3 id="llmengine.data_types.CreateFineTuneRequest.validation_file" class="doc d
 
 
 <h3 id="llmengine.data_types.CreateFineTuneRequest.hyperparameters" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">hyperparameters</span>
+          <span class="doc doc-object-name doc-attribute-name">hyperparameters</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.CreateFineTuneRequest.hyperparameters" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>hyperparameters: Optional[Dict[<span style="color: #008000">str</span>, Any]] <span style="color: #666666">=</span> Field(default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;Hyperparameters to pass in to training job.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>hyperparameters: Optional[Dict[<span style="color: #008000">str</span>, Any]] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;Hyperparameters to pass in to training job.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1768,16 +1775,19 @@ <h3 id="llmengine.data_types.CreateFineTuneRequest.hyperparameters" class="doc d
 
 
 <h3 id="llmengine.data_types.CreateFineTuneRequest.suffix" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">suffix</span>
+          <span class="doc doc-object-name doc-attribute-name">suffix</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.CreateFineTuneRequest.suffix" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>suffix: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;Optional user-provided identifier suffix for the fine-tuned model.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>suffix: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;Optional user-provided identifier suffix for the fine-tuned model.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1802,21 +1812,20 @@ <h3 id="llmengine.data_types.CreateFineTuneRequest.suffix" class="doc doc-headin
 
 
 <h2 id="llmengine.CreateFineTuneResponse" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">CreateFineTuneResponse</span>
+          <span class="doc doc-object-name doc-class-name">CreateFineTuneResponse</span>
 
 
 <a href="#llmengine.CreateFineTuneResponse" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Response object for creating a FineTune.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -1832,16 +1841,18 @@ <h2 id="llmengine.CreateFineTuneResponse" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.CreateFineTuneResponse.fine_tune_id" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">fine_tune_id</span>
+          <span class="doc doc-object-name doc-attribute-name">fine_tune_id</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.CreateFineTuneResponse.fine_tune_id" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>fine_tune_id: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(<span style="color: #008000">Ellipsis</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;ID of the created fine-tuning job.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>fine_tune_id: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    <span style="color: #666666">...</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;ID of the created fine-tuning job.&quot;</span>
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1866,21 +1877,20 @@ <h3 id="llmengine.data_types.CreateFineTuneResponse.fine_tune_id" class="doc doc
 
 
 <h2 id="llmengine.GetFineTuneResponse" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">GetFineTuneResponse</span>
+          <span class="doc doc-object-name doc-class-name">GetFineTuneResponse</span>
 
 
 <a href="#llmengine.GetFineTuneResponse" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Response object for retrieving a FineTune.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -1896,16 +1906,18 @@ <h2 id="llmengine.GetFineTuneResponse" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.GetFineTuneResponse.fine_tune_id" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">fine_tune_id</span>
+          <span class="doc doc-object-name doc-attribute-name">fine_tune_id</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetFineTuneResponse.fine_tune_id" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>fine_tune_id: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(<span style="color: #008000">Ellipsis</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;ID of the requested job.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>fine_tune_id: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    <span style="color: #666666">...</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;ID of the requested job.&quot;</span>
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1920,16 +1932,18 @@ <h3 id="llmengine.data_types.GetFineTuneResponse.fine_tune_id" class="doc doc-he
 
 
 <h3 id="llmengine.data_types.GetFineTuneResponse.status" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">status</span>
+          <span class="doc doc-object-name doc-attribute-name">status</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetFineTuneResponse.status" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>status: BatchJobStatus <span style="color: #666666">=</span> Field(<span style="color: #008000">Ellipsis</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;Status of the requested job.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>status: BatchJobStatus <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    <span style="color: #666666">...</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;Status of the requested job.&quot;</span>
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1954,21 +1968,20 @@ <h3 id="llmengine.data_types.GetFineTuneResponse.status" class="doc doc-heading"
 
 
 <h2 id="llmengine.ListFineTunesResponse" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">ListFineTunesResponse</span>
+          <span class="doc doc-object-name doc-class-name">ListFineTunesResponse</span>
 
 
 <a href="#llmengine.ListFineTunesResponse" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Response object for listing FineTunes.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -1984,16 +1997,19 @@ <h2 id="llmengine.ListFineTunesResponse" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.ListFineTunesResponse.jobs" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">jobs</span>
+          <span class="doc doc-object-name doc-attribute-name">jobs</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.ListFineTunesResponse.jobs" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>jobs: List[GetFineTuneResponse] <span style="color: #666666">=</span> Field(<span style="color: #008000">Ellipsis</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;List of fine-tuning jobs and their statuses.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>jobs: List[GetFineTuneResponse] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    <span style="color: #666666">...</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;List of fine-tuning jobs and their statuses.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2018,21 +2034,20 @@ <h3 id="llmengine.data_types.ListFineTunesResponse.jobs" class="doc doc-heading"
 
 
 <h2 id="llmengine.CancelFineTuneResponse" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">CancelFineTuneResponse</span>
+          <span class="doc doc-object-name doc-class-name">CancelFineTuneResponse</span>
 
 
 <a href="#llmengine.CancelFineTuneResponse" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Response object for cancelling a FineTune.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -2048,16 +2063,18 @@ <h2 id="llmengine.CancelFineTuneResponse" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.CancelFineTuneResponse.success" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">success</span>
+          <span class="doc doc-object-name doc-attribute-name">success</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.CancelFineTuneResponse.success" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>success: <span style="color: #008000">bool</span> <span style="color: #666666">=</span> Field(<span style="color: #008000">Ellipsis</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;Whether cancellation was successful.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>success: <span style="color: #008000">bool</span> <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    <span style="color: #666666">...</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;Whether cancellation was successful.&quot;</span>
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2082,21 +2099,20 @@ <h3 id="llmengine.data_types.CancelFineTuneResponse.success" class="doc doc-head
 
 
 <h2 id="llmengine.GetLLMEndpointResponse" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">GetLLMEndpointResponse</span>
+          <span class="doc doc-object-name doc-class-name">GetLLMEndpointResponse</span>
 
 
 <a href="#llmengine.GetLLMEndpointResponse" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Response object for retrieving a Model.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -2112,16 +2128,18 @@ <h2 id="llmengine.GetLLMEndpointResponse" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.GetLLMEndpointResponse.name" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">name</span>
+          <span class="doc doc-object-name doc-attribute-name">name</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetLLMEndpointResponse.name" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>name: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;The name of the model. Use this for making inference requests to the model.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>name: <span style="color: #008000">str</span> <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;The name of the model. Use this for making inference requests to the model.&quot;</span>
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2136,16 +2154,18 @@ <h3 id="llmengine.data_types.GetLLMEndpointResponse.name" class="doc doc-heading
 
 
 <h3 id="llmengine.data_types.GetLLMEndpointResponse.source" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">source</span>
+          <span class="doc doc-object-name doc-attribute-name">source</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetLLMEndpointResponse.source" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>source: LLMSource <span style="color: #666666">=</span> Field(description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;The source of the model, e.g. Hugging Face.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>source: LLMSource <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;The source of the model, e.g. Hugging Face.&quot;</span>
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2160,16 +2180,18 @@ <h3 id="llmengine.data_types.GetLLMEndpointResponse.source" class="doc doc-headi
 
 
 <h3 id="llmengine.data_types.GetLLMEndpointResponse.inference_framework" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">inference_framework</span>
+          <span class="doc doc-object-name doc-attribute-name">inference_framework</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetLLMEndpointResponse.inference_framework" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>inference_framework: LLMInferenceFramework <span style="color: #666666">=</span> Field(description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;The inference framework used by the model.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>inference_framework: LLMInferenceFramework <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;The inference framework used by the model.&quot;</span>
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2184,16 +2206,19 @@ <h3 id="llmengine.data_types.GetLLMEndpointResponse.inference_framework" class="
 
 
 <h3 id="llmengine.data_types.GetLLMEndpointResponse.id" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">id</span>
+          <span class="doc doc-object-name doc-attribute-name">id</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetLLMEndpointResponse.id" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a><span style="color: #008000">id</span>: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;(For self-hosted users) The autogenerated ID of the model.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a><span style="color: #008000">id</span>: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;(For self-hosted users) The autogenerated ID of the model.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2208,16 +2233,19 @@ <h3 id="llmengine.data_types.GetLLMEndpointResponse.id" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.GetLLMEndpointResponse.model_name" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">model_name</span>
+          <span class="doc doc-object-name doc-attribute-name">model_name</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetLLMEndpointResponse.model_name" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>model_name: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as `name`.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>model_name: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as `name`.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2232,16 +2260,19 @@ <h3 id="llmengine.data_types.GetLLMEndpointResponse.model_name" class="doc doc-h
 
 
 <h3 id="llmengine.data_types.GetLLMEndpointResponse.inference_framework_tag" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">inference_framework_tag</span>
+          <span class="doc doc-object-name doc-attribute-name">inference_framework_tag</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetLLMEndpointResponse.inference_framework_tag" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>inference_framework_tag: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;(For self-hosted users) The Docker image tag used to run the model.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>inference_framework_tag: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;(For self-hosted users) The Docker image tag used to run the model.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2256,16 +2287,19 @@ <h3 id="llmengine.data_types.GetLLMEndpointResponse.inference_framework_tag" cla
 
 
 <h3 id="llmengine.data_types.GetLLMEndpointResponse.num_shards" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">num_shards</span>
+          <span class="doc doc-object-name doc-attribute-name">num_shards</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetLLMEndpointResponse.num_shards" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>num_shards: Optional[<span style="color: #008000">int</span>] <span style="color: #666666">=</span> Field(default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;(For self-hosted users) The number of shards.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>num_shards: Optional[<span style="color: #008000">int</span>] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;(For self-hosted users) The number of shards.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2280,16 +2314,19 @@ <h3 id="llmengine.data_types.GetLLMEndpointResponse.num_shards" class="doc doc-h
 
 
 <h3 id="llmengine.data_types.GetLLMEndpointResponse.quantize" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">quantize</span>
+          <span class="doc doc-object-name doc-attribute-name">quantize</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetLLMEndpointResponse.quantize" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>quantize: Optional[Quantization] <span style="color: #666666">=</span> Field(default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;(For self-hosted users) The quantization method.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>quantize: Optional[Quantization] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;(For self-hosted users) The quantization method.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2304,16 +2341,19 @@ <h3 id="llmengine.data_types.GetLLMEndpointResponse.quantize" class="doc doc-hea
 
 
 <h3 id="llmengine.data_types.GetLLMEndpointResponse.spec" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">spec</span>
+          <span class="doc doc-object-name doc-attribute-name">spec</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.GetLLMEndpointResponse.spec" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>spec: Optional[GetModelEndpointResponse] <span style="color: #666666">=</span> Field(default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;(For self-hosted users) Model endpoint details.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>spec: Optional[GetModelEndpointResponse] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    default<span style="color: #666666">=</span><span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;(For self-hosted users) Model endpoint details.&quot;</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2338,21 +2378,20 @@ <h3 id="llmengine.data_types.GetLLMEndpointResponse.spec" class="doc doc-heading
 
 
 <h2 id="llmengine.ListLLMEndpointsResponse" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">ListLLMEndpointsResponse</span>
+          <span class="doc doc-object-name doc-class-name">ListLLMEndpointsResponse</span>
 
 
 <a href="#llmengine.ListLLMEndpointsResponse" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Response object for listing Models.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -2368,16 +2407,18 @@ <h2 id="llmengine.ListLLMEndpointsResponse" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.ListLLMEndpointsResponse.model_endpoints" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">model_endpoints</span>
+          <span class="doc doc-object-name doc-attribute-name">model_endpoints</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.ListLLMEndpointsResponse.model_endpoints" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>model_endpoints: List[GetLLMEndpointResponse] <span style="color: #666666">=</span> Field(<span style="color: #008000">Ellipsis</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;The list of models.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>model_endpoints: List[GetLLMEndpointResponse] <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    <span style="color: #666666">...</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;The list of models.&quot;</span>
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -2402,21 +2443,20 @@ <h3 id="llmengine.data_types.ListLLMEndpointsResponse.model_endpoints" class="do
 
 
 <h2 id="llmengine.DeleteLLMEndpointResponse" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">DeleteLLMEndpointResponse</span>
+          <span class="doc doc-object-name doc-class-name">DeleteLLMEndpointResponse</span>
 
 
 <a href="#llmengine.DeleteLLMEndpointResponse" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="pydantic.BaseModel">BaseModel</span></code></p>
 
   
       <p>Response object for deleting a Model.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -2432,16 +2472,18 @@ <h2 id="llmengine.DeleteLLMEndpointResponse" class="doc doc-heading">
 
 
 <h3 id="llmengine.data_types.DeleteLLMEndpointResponse.deleted" class="doc doc-heading">
-        <span class="doc doc-object-name doc-attribute-name">deleted</span>
+          <span class="doc doc-object-name doc-attribute-name">deleted</span>
 
   
   <span class="doc doc-labels">
-      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
       <small class="doc doc-label doc-label-class-attribute"><code>class-attribute</code></small>
+      <small class="doc doc-label doc-label-instance-attribute"><code>instance-attribute</code></small>
   </span>
 
 <a href="#llmengine.data_types.DeleteLLMEndpointResponse.deleted" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>deleted: <span style="color: #008000">bool</span> <span style="color: #666666">=</span> Field(<span style="color: #008000">Ellipsis</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&#39;Whether deletion was successful.&#39;</span>)
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>deleted: <span style="color: #008000">bool</span> <span style="color: #666666">=</span> Field(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    <span style="color: #666666">...</span>, description<span style="color: #666666">=</span><span style="color: #BA2121">&quot;Whether deletion was successful.&quot;</span>
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>)
 </code></pre></div>
 
   <div class="doc doc-contents ">
diff --git a/api/error_handling/index.html b/api/error_handling/index.html
index dcce53c52..e9e745289 100644
--- a/api/error_handling/index.html
+++ b/api/error_handling/index.html
@@ -619,7 +619,7 @@ <h1 id="error-handling">Error handling<a class="headerlink" href="#error-handlin
 
 
 <h2 id="llmengine.errors.BadRequestError" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">BadRequestError</span>
+          <span class="doc doc-object-name doc-class-name">BadRequestError</span>
 
 
 <a href="#llmengine.errors.BadRequestError" class="headerlink" title="Permanent link">&para;</a></h2>
@@ -627,15 +627,14 @@ <h2 id="llmengine.errors.BadRequestError" class="doc doc-heading">
 </code></pre></div>
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code>Exception</code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code>Exception</code></p>
 
   
       <p>Corresponds to HTTP 400. Indicates that the request had inputs that were invalid. The user should not
 attempt to retry the request without changing the inputs.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -661,7 +660,7 @@ <h2 id="llmengine.errors.BadRequestError" class="doc doc-heading">
 
 
 <h2 id="llmengine.errors.UnauthorizedError" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">UnauthorizedError</span>
+          <span class="doc doc-object-name doc-class-name">UnauthorizedError</span>
 
 
 <a href="#llmengine.errors.UnauthorizedError" class="headerlink" title="Permanent link">&para;</a></h2>
@@ -669,14 +668,13 @@ <h2 id="llmengine.errors.UnauthorizedError" class="doc doc-heading">
 </code></pre></div>
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code>Exception</code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code>Exception</code></p>
 
   
       <p>Corresponds to HTTP 401. This means that no valid API key was provided.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -702,7 +700,7 @@ <h2 id="llmengine.errors.UnauthorizedError" class="doc doc-heading">
 
 
 <h2 id="llmengine.errors.NotFoundError" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">NotFoundError</span>
+          <span class="doc doc-object-name doc-class-name">NotFoundError</span>
 
 
 <a href="#llmengine.errors.NotFoundError" class="headerlink" title="Permanent link">&para;</a></h2>
@@ -710,8 +708,8 @@ <h2 id="llmengine.errors.NotFoundError" class="doc doc-heading">
 </code></pre></div>
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code>Exception</code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code>Exception</code></p>
 
   
       <p>Corresponds to HTTP 404. This means that the resource (e.g. a Model, FineTune, etc.) could not be found.
@@ -720,7 +718,6 @@ <h2 id="llmengine.errors.NotFoundError" class="doc doc-heading">
 the user does not have access to.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -746,7 +743,7 @@ <h2 id="llmengine.errors.NotFoundError" class="doc doc-heading">
 
 
 <h2 id="llmengine.errors.RateLimitExceededError" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">RateLimitExceededError</span>
+          <span class="doc doc-object-name doc-class-name">RateLimitExceededError</span>
 
 
 <a href="#llmengine.errors.RateLimitExceededError" class="headerlink" title="Permanent link">&para;</a></h2>
@@ -754,14 +751,13 @@ <h2 id="llmengine.errors.RateLimitExceededError" class="doc doc-heading">
 </code></pre></div>
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code>Exception</code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code>Exception</code></p>
 
   
       <p>Corresponds to HTTP 429. Too many requests hit the API too quickly. We recommend an exponential backoff for retries.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -787,7 +783,7 @@ <h2 id="llmengine.errors.RateLimitExceededError" class="doc doc-heading">
 
 
 <h2 id="llmengine.errors.ServerError" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">ServerError</span>
+          <span class="doc doc-object-name doc-class-name">ServerError</span>
 
 
 <a href="#llmengine.errors.ServerError" class="headerlink" title="Permanent link">&para;</a></h2>
@@ -795,14 +791,13 @@ <h2 id="llmengine.errors.ServerError" class="doc doc-heading">
 </code></pre></div>
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code>Exception</code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code>Exception</code></p>
 
   
       <p>Corresponds to HTTP 5xx errors on the server.</p>
 
 
-
   
 
   <div class="doc doc-children">
diff --git a/api/python_client/index.html b/api/python_client/index.html
index d291398d2..0efcb2293 100644
--- a/api/python_client/index.html
+++ b/api/python_client/index.html
@@ -747,15 +747,15 @@ <h1 id="python-client-api-reference">🐍 Python Client API Reference<a class="h
 
 
 <h2 id="llmengine.Completion" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">Completion</span>
+          <span class="doc doc-object-name doc-class-name">Completion</span>
 
 
 <a href="#llmengine.Completion" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="llmengine.api_engine.APIEngine">APIEngine</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="llmengine.api_engine.APIEngine">APIEngine</span></code></p>
 
   
       <p>Completion API. This API is used to generate text completions.</p>
@@ -767,7 +767,6 @@ <h2 id="llmengine.Completion" class="doc doc-heading">
 For each of these modes, you can also choose whether to stream token responses or not.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -780,12 +779,13 @@ <h2 id="llmengine.Completion" class="doc doc-heading">
 
 
 
+
 <div class="doc doc-object doc-function">
 
 
 
 <h3 id="llmengine.completion.Completion.create" class="doc doc-heading">
-        <span class="doc doc-object-name doc-function-name">create</span>
+          <span class="doc doc-object-name doc-function-name">create</span>
 
   
   <span class="doc doc-labels">
@@ -793,7 +793,17 @@ <h3 id="llmengine.completion.Completion.create" class="doc doc-heading">
   </span>
 
 <a href="#llmengine.completion.Completion.create" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>create(model: <span style="color: #008000">str</span>, prompt: <span style="color: #008000">str</span>, max_new_tokens: <span style="color: #008000">int</span> <span style="color: #666666">=</span> <span style="color: #666666">20</span>, temperature: <span style="color: #008000">float</span> <span style="color: #666666">=</span> <span style="color: #666666">0.2</span>, timeout: <span style="color: #008000">int</span> <span style="color: #666666">=</span> <span style="color: #666666">10</span>, stream: <span style="color: #008000">bool</span> <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">False</span>) <span style="color: #666666">-&gt;</span> Union[CompletionSyncResponse, Iterator[CompletionStreamResponse]]
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>create(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    model: <span style="color: #008000">str</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    prompt: <span style="color: #008000">str</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>    max_new_tokens: <span style="color: #008000">int</span> <span style="color: #666666">=</span> <span style="color: #666666">20</span>,
+<a id="__codelineno-0-5" name="__codelineno-0-5" href="#__codelineno-0-5"></a>    temperature: <span style="color: #008000">float</span> <span style="color: #666666">=</span> <span style="color: #666666">0.2</span>,
+<a id="__codelineno-0-6" name="__codelineno-0-6" href="#__codelineno-0-6"></a>    timeout: <span style="color: #008000">int</span> <span style="color: #666666">=</span> <span style="color: #666666">10</span>,
+<a id="__codelineno-0-7" name="__codelineno-0-7" href="#__codelineno-0-7"></a>    stream: <span style="color: #008000">bool</span> <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">False</span>,
+<a id="__codelineno-0-8" name="__codelineno-0-8" href="#__codelineno-0-8"></a>) <span style="color: #666666">-&gt;</span> Union[
+<a id="__codelineno-0-9" name="__codelineno-0-9" href="#__codelineno-0-9"></a>    CompletionSyncResponse,
+<a id="__codelineno-0-10" name="__codelineno-0-10" href="#__codelineno-0-10"></a>    Iterator[CompletionStreamResponse],
+<a id="__codelineno-0-11" name="__codelineno-0-11" href="#__codelineno-0-11"></a>]
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -808,6 +818,8 @@ <h3 id="llmengine.completion.Completion.create" class="doc doc-heading">
 <a href="../../api/data_types/#llmengine.CompletionStreamV1Response">CompletionStreamV1Response</a>
 with <code>request_id</code> and <code>outputs</code> fields.</p>
 
+
+
   <p><strong>Parameters:</strong></p>
   <table>
     <thead>
@@ -911,6 +923,8 @@ <h3 id="llmengine.completion.Completion.create" class="doc doc-heading">
     </tbody>
   </table>
 
+
+
   <p><strong>Returns:</strong></p>
   <table>
     <thead>
@@ -997,12 +1011,13 @@ <h3 id="llmengine.completion.Completion.create" class="doc doc-heading">
 
 </div>
 
+
 <div class="doc doc-object doc-function">
 
 
 
 <h3 id="llmengine.completion.Completion.acreate" class="doc doc-heading">
-        <span class="doc doc-object-name doc-function-name">acreate</span>
+          <span class="doc doc-object-name doc-function-name">acreate</span>
 
   
   <span class="doc doc-labels">
@@ -1011,7 +1026,17 @@ <h3 id="llmengine.completion.Completion.acreate" class="doc doc-heading">
   </span>
 
 <a href="#llmengine.completion.Completion.acreate" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>acreate(model: <span style="color: #008000">str</span>, prompt: <span style="color: #008000">str</span>, max_new_tokens: <span style="color: #008000">int</span> <span style="color: #666666">=</span> <span style="color: #666666">20</span>, temperature: <span style="color: #008000">float</span> <span style="color: #666666">=</span> <span style="color: #666666">0.2</span>, timeout: <span style="color: #008000">int</span> <span style="color: #666666">=</span> <span style="color: #666666">10</span>, stream: <span style="color: #008000">bool</span> <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">False</span>) <span style="color: #666666">-&gt;</span> Union[CompletionSyncResponse, AsyncIterable[CompletionStreamResponse]]
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>acreate(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    model: <span style="color: #008000">str</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    prompt: <span style="color: #008000">str</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>    max_new_tokens: <span style="color: #008000">int</span> <span style="color: #666666">=</span> <span style="color: #666666">20</span>,
+<a id="__codelineno-0-5" name="__codelineno-0-5" href="#__codelineno-0-5"></a>    temperature: <span style="color: #008000">float</span> <span style="color: #666666">=</span> <span style="color: #666666">0.2</span>,
+<a id="__codelineno-0-6" name="__codelineno-0-6" href="#__codelineno-0-6"></a>    timeout: <span style="color: #008000">int</span> <span style="color: #666666">=</span> <span style="color: #666666">10</span>,
+<a id="__codelineno-0-7" name="__codelineno-0-7" href="#__codelineno-0-7"></a>    stream: <span style="color: #008000">bool</span> <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">False</span>,
+<a id="__codelineno-0-8" name="__codelineno-0-8" href="#__codelineno-0-8"></a>) <span style="color: #666666">-&gt;</span> Union[
+<a id="__codelineno-0-9" name="__codelineno-0-9" href="#__codelineno-0-9"></a>    CompletionSyncResponse,
+<a id="__codelineno-0-10" name="__codelineno-0-10" href="#__codelineno-0-10"></a>    AsyncIterable[CompletionStreamResponse],
+<a id="__codelineno-0-11" name="__codelineno-0-11" href="#__codelineno-0-11"></a>]
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1026,6 +1051,8 @@ <h3 id="llmengine.completion.Completion.acreate" class="doc doc-heading">
 <a href="../../api/data_types/#llmengine.CompletionStreamV1Response">CompletionStreamV1Response</a>
 with <code>request_id</code> and <code>outputs</code> fields.</p>
 
+
+
   <p><strong>Parameters:</strong></p>
   <table>
     <thead>
@@ -1129,6 +1156,8 @@ <h3 id="llmengine.completion.Completion.acreate" class="doc doc-heading">
     </tbody>
   </table>
 
+
+
   <p><strong>Returns:</strong></p>
   <table>
     <thead>
@@ -1235,15 +1264,15 @@ <h3 id="llmengine.completion.Completion.acreate" class="doc doc-heading">
 
 
 <h2 id="llmengine.FineTune" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">FineTune</span>
+          <span class="doc doc-object-name doc-class-name">FineTune</span>
 
 
 <a href="#llmengine.FineTune" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="llmengine.api_engine.APIEngine">APIEngine</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="llmengine.api_engine.APIEngine">APIEngine</span></code></p>
 
   
       <p>FineTune API. This API is used to fine-tune models.</p>
@@ -1252,7 +1281,6 @@ <h2 id="llmengine.FineTune" class="doc doc-heading">
 <p>Creating a fine-tune will end with the creation of a Model, which you can view using <code>Model.get(model_name)</code> or delete using <code>Model.delete(model_name)</code>.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -1265,12 +1293,13 @@ <h2 id="llmengine.FineTune" class="doc doc-heading">
 
 
 
+
 <div class="doc doc-object doc-function">
 
 
 
 <h3 id="llmengine.fine_tuning.FineTune.create" class="doc doc-heading">
-        <span class="doc doc-object-name doc-function-name">create</span>
+          <span class="doc doc-object-name doc-function-name">create</span>
 
   
   <span class="doc doc-labels">
@@ -1278,7 +1307,13 @@ <h3 id="llmengine.fine_tuning.FineTune.create" class="doc doc-heading">
   </span>
 
 <a href="#llmengine.fine_tuning.FineTune.create" class="headerlink" title="Permanent link">&para;</a></h3>
-<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>create(model: <span style="color: #008000">str</span>, training_file: <span style="color: #008000">str</span>, validation_file: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">None</span>, hyperparameters: Optional[Dict[<span style="color: #008000">str</span>, <span style="color: #008000">str</span>]] <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">None</span>, suffix: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">None</span>) <span style="color: #666666">-&gt;</span> CreateFineTuneResponse
+<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>create(
+<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>    model: <span style="color: #008000">str</span>,
+<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a>    training_file: <span style="color: #008000">str</span>,
+<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>    validation_file: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-5" name="__codelineno-0-5" href="#__codelineno-0-5"></a>    hyperparameters: Optional[Dict[<span style="color: #008000">str</span>, <span style="color: #008000">str</span>]] <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-6" name="__codelineno-0-6" href="#__codelineno-0-6"></a>    suffix: Optional[<span style="color: #008000">str</span>] <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">None</span>,
+<a id="__codelineno-0-7" name="__codelineno-0-7" href="#__codelineno-0-7"></a>) <span style="color: #666666">-&gt;</span> CreateFineTuneResponse
 </code></pre></div>
 
   <div class="doc doc-contents ">
@@ -1291,6 +1326,8 @@ <h3 id="llmengine.fine_tuning.FineTune.create" class="doc doc-heading">
 currently supported. At least 200 rows of data is recommended to start to see benefits from
 fine-tuning.</p>
 
+
+
   <p><strong>Parameters:</strong></p>
   <table>
     <thead>
@@ -1383,6 +1420,8 @@ <h3 id="llmengine.fine_tuning.FineTune.create" class="doc doc-heading">
     </tbody>
   </table>
 
+
+
   <p><strong>Returns:</strong></p>
   <table>
     <thead>
@@ -1455,12 +1494,13 @@ <h3 id="llmengine.fine_tuning.FineTune.create" class="doc doc-heading">
 
 </div>
 
+
 <div class="doc doc-object doc-function">
 
 
 
 <h3 id="llmengine.fine_tuning.FineTune.get" class="doc doc-heading">
-        <span class="doc doc-object-name doc-function-name">get</span>
+          <span class="doc doc-object-name doc-function-name">get</span>
 
   
   <span class="doc doc-labels">
@@ -1481,6 +1521,8 @@ <h3 id="llmengine.fine_tuning.FineTune.get" class="doc doc-heading">
 object with the id and status (<code>PENDING</code>, <code>STARTED</code>,
 <code>UNDEFINED</code>, <code>FAILURE</code> or <code>SUCCESS</code>).</p>
 
+
+
   <p><strong>Parameters:</strong></p>
   <table>
     <thead>
@@ -1509,6 +1551,8 @@ <h3 id="llmengine.fine_tuning.FineTune.get" class="doc doc-heading">
     </tbody>
   </table>
 
+
+
   <p><strong>Returns:</strong></p>
   <table>
     <thead>
@@ -1556,12 +1600,13 @@ <h3 id="llmengine.fine_tuning.FineTune.get" class="doc doc-heading">
 
 </div>
 
+
 <div class="doc doc-object doc-function">
 
 
 
 <h3 id="llmengine.fine_tuning.FineTune.list" class="doc doc-heading">
-        <span class="doc doc-object-name doc-function-name">list</span>
+          <span class="doc doc-object-name doc-function-name">list</span>
 
   
   <span class="doc doc-labels">
@@ -1579,6 +1624,8 @@ <h3 id="llmengine.fine_tuning.FineTune.list" class="doc doc-heading">
 It returns a list of pairs of <code>fine_tune_id</code> and <code>status</code> for
 all existing jobs.</p>
 
+
+
   <p><strong>Returns:</strong></p>
   <table>
     <thead>
@@ -1631,12 +1678,13 @@ <h3 id="llmengine.fine_tuning.FineTune.list" class="doc doc-heading">
 
 </div>
 
+
 <div class="doc doc-object doc-function">
 
 
 
 <h3 id="llmengine.fine_tuning.FineTune.cancel" class="doc doc-heading">
-        <span class="doc doc-object-name doc-function-name">cancel</span>
+          <span class="doc doc-object-name doc-function-name">cancel</span>
 
   
   <span class="doc doc-labels">
@@ -1655,6 +1703,8 @@ <h3 id="llmengine.fine_tuning.FineTune.cancel" class="doc doc-heading">
 and returns a response object which has a <code>success</code> field
 confirming if the cancellation was successful.</p>
 
+
+
   <p><strong>Parameters:</strong></p>
   <table>
     <thead>
@@ -1683,6 +1733,8 @@ <h3 id="llmengine.fine_tuning.FineTune.cancel" class="doc doc-heading">
     </tbody>
   </table>
 
+
+
   <p><strong>Returns:</strong></p>
   <table>
     <thead>
@@ -1739,15 +1791,15 @@ <h3 id="llmengine.fine_tuning.FineTune.cancel" class="doc doc-heading">
 
 
 <h2 id="llmengine.Model" class="doc doc-heading">
-        <span class="doc doc-object-name doc-class-name">Model</span>
+          <span class="doc doc-object-name doc-class-name">Model</span>
 
 
 <a href="#llmengine.Model" class="headerlink" title="Permanent link">&para;</a></h2>
 
 
   <div class="doc doc-contents first">
-      <p class="doc doc-class-bases">
-        Bases: <code><span title="llmengine.api_engine.APIEngine">APIEngine</span></code></p>
+          <p class="doc doc-class-bases">
+            Bases: <code><span title="llmengine.api_engine.APIEngine">APIEngine</span></code></p>
 
   
       <p>Model API. This API is used to get, list, and delete models. Models include both base
@@ -1756,7 +1808,6 @@ <h2 id="llmengine.Model" class="doc doc-heading">
 <p>See <a href="../../model_zoo">Model Zoo</a> for the list of publicly available base models.</p>
 
 
-
   
 
   <div class="doc doc-children">
@@ -1769,12 +1820,13 @@ <h2 id="llmengine.Model" class="doc doc-heading">
 
 
 
+
 <div class="doc doc-object doc-function">
 
 
 
 <h3 id="llmengine.model.Model.get" class="doc doc-heading">
-        <span class="doc doc-object-name doc-function-name">get</span>
+          <span class="doc doc-object-name doc-function-name">get</span>
 
   
   <span class="doc doc-labels">
@@ -1795,6 +1847,8 @@ <h3 id="llmengine.model.Model.get" class="doc doc-heading">
 <a href="../../api/data_types/#llmengine.GetLLMEndpointResponse">GetLLMEndpointResponse</a>
 object.</p>
 
+
+
   <p><strong>Parameters:</strong></p>
   <table>
     <thead>
@@ -1823,6 +1877,8 @@ <h3 id="llmengine.model.Model.get" class="doc doc-heading">
     </tbody>
   </table>
 
+
+
   <p><strong>Returns:</strong></p>
   <table>
     <thead>
@@ -1875,12 +1931,13 @@ <h3 id="llmengine.model.Model.get" class="doc doc-heading">
 
 </div>
 
+
 <div class="doc doc-object doc-function">
 
 
 
 <h3 id="llmengine.model.Model.list" class="doc doc-heading">
-        <span class="doc doc-object-name doc-function-name">list</span>
+          <span class="doc doc-object-name doc-function-name">list</span>
 
   
   <span class="doc doc-labels">
@@ -1900,6 +1957,8 @@ <h3 id="llmengine.model.Model.list" class="doc doc-heading">
 <a href="../../api/data_types/#llmengine.GetLLMEndpointResponse">GetLLMEndpointResponse</a>
 objects for all models. The most important field is the model <code>name</code>.</p>
 
+
+
   <p><strong>Returns:</strong></p>
   <table>
     <thead>
@@ -1988,12 +2047,13 @@ <h3 id="llmengine.model.Model.list" class="doc doc-heading">
 
 </div>
 
+
 <div class="doc doc-object doc-function">
 
 
 
 <h3 id="llmengine.model.Model.delete" class="doc doc-heading">
-        <span class="doc doc-object-name doc-function-name">delete</span>
+          <span class="doc doc-object-name doc-function-name">delete</span>
 
   
   <span class="doc doc-labels">
@@ -2013,6 +2073,8 @@ <h3 id="llmengine.model.Model.delete" class="doc doc-heading">
 was successful. If called on a base model included with LLM
 Engine, an error will be thrown.</p>
 
+
+
   <p><strong>Parameters:</strong></p>
   <table>
     <thead>
@@ -2041,6 +2103,8 @@ <h3 id="llmengine.model.Model.delete" class="doc doc-heading">
     </tbody>
   </table>
 
+
+
   <p><strong>Returns:</strong></p>
   <table>
     <thead>
diff --git a/index.html b/index.html
index aff738222..38519954e 100644
--- a/index.html
+++ b/index.html
@@ -607,6 +607,11 @@ <h1 id="llm-engine">LLM Engine<a class="headerlink" href="#llm-engine" title="Pe
 <p>LLM Engine is the easiest way to customize and serve LLMs.</p>
 <p>LLMs can be accessed via Scale's hosted version or by using the helm charts in this repository to run model inference and fine-tuning in your own infrastructure.</p>
 <h2 id="quick-install">Quick Install<a class="headerlink" href="#quick-install" title="Permanent link">&para;</a></h2>
+<div class="tabbed-set tabbed-alternate" data-tabs="1:1"><input checked="checked" id="__tabbed_1_1" name="__tabbed_1" type="radio" /><div class="tabbed-labels"><label for="__tabbed_1_1">Install the python package</label></div>
+<div class="tabbed-content">
+<div class="tabbed-block"></div>
+</div>
+</div>
 <div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%;"><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>pip install scale-llm-engine
 </code></pre></div>
 <h2 id="about">About<a class="headerlink" href="#about" title="Permanent link">&para;</a></h2>
@@ -620,9 +625,11 @@ <h2 id="about">About<a class="headerlink" href="#about" title="Permanent link">&
 Scale's hosted infrastructure or do it in your own cloud infrastructure using
 Kubernetes.</p>
 <h3 id="key-features">Key Features<a class="headerlink" href="#key-features" title="Permanent link">&para;</a></h3>
-<p><strong>Ready-to-use APIs for your favorite models</strong>: Serve and fine-tune
-open-source foundation models, including MPT, Falcon,
-and LLaMA. Use Scale-hosted models or deploy to your own infrastructure.</p>
+<p><strong>Ready-to-use APIs for your favorite models</strong>: Deploy and serve
+open-source foundation models, including LLaMA, MPT, and Falcon.
+Use Scale-hosted models or deploy to your own infrastructure.</p>
+<p><strong>Fine-tune your favorite models</strong>: Fine-tune open-source foundation
+models such as LLaMA and MPT on your own data for optimized performance.</p>
 <p><strong>Optimized Inference</strong>: LLM Engine provides inference APIs
 for streaming responses and dynamically batching inputs for higher throughput
 and lower latency.</p>
diff --git a/search/search_index.json b/search/search_index.json
index 1ed2056f6..9b5425a85 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Introduction","text":""},{"location":"#llm-engine","title":"LLM Engine","text":"<p>The open source engine for inference and fine-tuning of Large Language Models.</p> <p>LLM Engine is the easiest way to customize and serve LLMs.</p> <p>LLMs can be accessed via Scale's hosted version or by using the helm charts in this repository to run model inference and fine-tuning in your own infrastructure.</p>"},{"location":"#quick-install","title":"Quick Install","text":"<pre><code>pip install scale-llm-engine\n</code></pre>"},{"location":"#about","title":"About","text":"<p>Foundation models are emerging as the building blocks of AI. However, deploying these models to the cloud and fine-tuning them are expensive operations that require infrastructure and ML expertise. It is also difficult to maintain over time as new models are released and new techniques for both inference and fine-tuning are made available.</p> <p>LLM Engine is a Python library, CLI, and Helm chart that provides everything you need to serve and fine-tune foundation models, whether you use Scale's hosted infrastructure or do it in your own cloud infrastructure using Kubernetes.</p>"},{"location":"#key-features","title":"Key Features","text":"<p>Ready-to-use APIs for your favorite models: Serve and fine-tune open-source foundation models, including MPT, Falcon, and LLaMA. 
Use Scale-hosted models or deploy to your own infrastructure.</p> <p>Optimized Inference: LLM Engine provides inference APIs for streaming responses and dynamically batching inputs for higher throughput and lower latency.</p> <p>Open-Source Integrations: Deploy any Hugging Face model with a single command.</p> <p>Deploying from any docker image: Turn any Docker image into an auto-scaling deployment with simple APIs.</p>"},{"location":"#features-coming-soon","title":"Features Coming Soon","text":"<p>k8s Installation Documentation: We are working hard to document installation and maintenance of inference and fine-tuning functionality on your own infrastructure. For now, our documentation covers using our client libraries to access Scale's hosted infrastructure.</p> <p>Fast Cold-Start Times: To prevent GPUs from idling, LLM Engine automatically scales your model to zero when it's not in use and scales up within seconds, even for large foundation models.</p> <p>Cost Optimization: Deploy AI models cheaper than commercial ones, including cold-start and warm-down times.</p>"},{"location":"contributing/","title":"Contributing to LLM Engine","text":""},{"location":"contributing/#updating-llm-engine-documentation","title":"Updating LLM Engine Documentation","text":"<p>LLM Engine leverages mkdocs to create beautiful, community-oriented documentation.</p>"},{"location":"contributing/#step-1-clone-the-repository","title":"Step 1: Clone the Repository","text":"<p>Clone/Fork the LLM Engine Repository. 
Our documentation lives in the <code>docs</code> folder.</p>"},{"location":"contributing/#step-2-install-the-dependencies","title":"Step 2: Install the Dependencies","text":"<p>Dependencies are located in <code>requirements-docs.txt</code>, go ahead and pip install those with </p> <pre><code>pip install -r requirements-docs.txt\n</code></pre>"},{"location":"contributing/#step-3-run-locally","title":"Step 3: Run Locally","text":"<p>To run the documentation service locally, execute the following command:</p> <pre><code>mkdocs serve\n</code></pre> <p>This should kick off a locally running instance on http://127.0.0.1:8000/.</p> <p>As you edit the content in the <code>docs</code> folder, the site will be automatically reloaded on each file save.</p>"},{"location":"contributing/#step-4-editing-navigation-and-settings","title":"Step 4: Editing Navigation and Settings","text":"<p>If you are less familair with <code>mkdocs</code>, in addition to the markdown content in the <code>docs</code> folder, there is a top-level <code>mkdocs.yml</code> file as well that defines the navigation pane and other website settings. If you don't see your page where you think it should be, double-check the .yml file.</p>"},{"location":"contributing/#step-5-building-and-deploying","title":"Step 5: Building and Deploying","text":"<p>CircleCI (via <code>.circleci/config.yml</code>) handles the building and deployment of our documentation service for us.</p>"},{"location":"faq/","title":"Frequently Asked Questions","text":""},{"location":"getting_started/","title":"Getting Started","text":"<p>The fastest way to get started with LLM Engine is to use the python client in this repository to run inference and fine-tuning on Scale's infrastructure. 
This path does not require you to install anything on your infrastructure, and Scale's free tier gives you access to experimentation using open source LLMs.</p> <p>To start with, install LLM Engine via pip or conda:</p> pipconda <pre><code>pip install scale-llm-engine\n</code></pre> <pre><code>conda install scale-llm-engine -c conda-forge\n</code></pre>"},{"location":"getting_started/#scale-api-keys","title":"Scale API Keys","text":"<p>Next, you need a Scale Spellbook API key.</p>"},{"location":"getting_started/#retrieving-your-api-key","title":"Retrieving your API Key","text":"<p>To retrieve your API key, head to Scale Spellbook where you will get an API key on the settings page.</p> <p>Different API Keys for different Scale Products</p> <p>If you have leveraged Scale's platform for annotation work in the past, please note that your Spellbook API key will be different than the Scale Annotation API key. You will want to create a Spellbook API key before getting started.</p>"},{"location":"getting_started/#set-your-api-key","title":"Set your API Key","text":"<p>LLM Engine uses environment variables to access your API key.</p> <p>Set this API key as the <code>SCALE_API_KEY</code> environment variable by running the following command in your terminal before you run your python application.</p> <pre><code>export SCALE_API_KEY=\"[Your API key]\"\n</code></pre> <p>You can also add in the line above to your <code>.zshrc</code> or <code>.bash_profile</code> so it's automatically set for future sessions.</p>"},{"location":"getting_started/#example-code","title":"Example Code","text":""},{"location":"getting_started/#sample-completion","title":"Sample Completion","text":"<p>With your API key set, you can now send LLM Engine requests using the Python client:</p> <pre><code>from llmengine import Completion\n\nresponse = Completion.create(\n    model=\"falcon-7b-instruct\",\n    prompt=\"I'm opening a pancake restaurant that specializes in unique pancake shapes, colors, and 
flavors. List 3 quirky names I could name my restaurant.\",\n    max_new_tokens=100,\n    temperature=0.2,\n)\n\nprint(response.outputs[0].text)\n</code></pre>"},{"location":"getting_started/#with-streaming","title":"With Streaming","text":"<pre><code>from llmengine import Completion\nimport sys\nstream = Completion.create(\n    model=\"falcon-7b-instruct\",\n    prompt=\"Give me a 200 word summary on the current economic events in the US.\",\n    max_new_tokens=1000,\n    temperature=0.2,\n    stream=True\n)\n\nfor response in stream:\n    if response.output:\n        print(response.output.text, end=\"\")\n        sys.stdout.flush()\n</code></pre>"},{"location":"model_zoo/","title":"Public Model Zoo","text":"<p>Scale hosts the following models in a model zoo:</p> Model Name Inference APIs Available Fine-tuning APIs Available <code>llama-7b</code> \u2705 \u2705 <code>falcon-7b</code> \u2705 <code>falcon-7b-instruct</code> \u2705 <code>falcon-40b</code> \u2705 <code>falcon-40b-instruct</code> \u2705 <code>mpt-7b</code> \u2705 <code>mpt-7b-instruct</code> \u2705 \u2705 <code>flan-t5-xxl</code> \u2705"},{"location":"model_zoo/#usage","title":"Usage","text":"<p>Each of these models can be used with the Completion API.</p> <p>The specified models can be fine-tuned with the FineTune API.</p> <p>More information about the models can be found using the Model API.</p>"},{"location":"api/data_types/","title":"\ud83d\udc0d Python Client Data Type Reference","text":""},{"location":"api/data_types/#llmengine.CompletionOutput","title":"CompletionOutput","text":"<p>         Bases: <code>BaseModel</code></p> <p>Represents the output of a completion request to a model.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionOutput.text","title":"text  <code>instance-attribute</code>","text":"<pre><code>text: str\n</code></pre> <p>The text of the 
completion.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionOutput.num_completion_tokens","title":"num_completion_tokens  <code>instance-attribute</code>","text":"<pre><code>num_completion_tokens: int\n</code></pre> <p>Number of tokens in the completion.</p>"},{"location":"api/data_types/#llmengine.CompletionStreamOutput","title":"CompletionStreamOutput","text":"<p>         Bases: <code>BaseModel</code></p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamOutput.text","title":"text  <code>instance-attribute</code>","text":"<pre><code>text: str\n</code></pre> <p>The text of the completion.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamOutput.finished","title":"finished  <code>instance-attribute</code>","text":"<pre><code>finished: bool\n</code></pre> <p>Whether the completion is finished.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamOutput.num_completion_tokens","title":"num_completion_tokens  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>num_completion_tokens: Optional[int] = None\n</code></pre> <p>Number of tokens in the completion.</p>"},{"location":"api/data_types/#llmengine.CompletionSyncResponse","title":"CompletionSyncResponse","text":"<p>         Bases: <code>BaseModel</code></p> <p>Response object for a synchronous prompt completion.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionSyncResponse.output","title":"output  <code>instance-attribute</code>","text":"<pre><code>output: CompletionOutput\n</code></pre> <p>Completion output.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionSyncResponse.request_id","title":"request_id  <code>instance-attribute</code>","text":"<pre><code>request_id: str\n</code></pre> <p>Unique ID of request.</p>"},{"location":"api/data_types/#llmengine.CompletionStreamResponse","title":"CompletionStreamResponse","text":"<p>         Bases: <code>BaseModel</code></p> <p>Response 
object for a stream prompt completion task.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamResponse.output","title":"output  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>output: Optional[CompletionStreamOutput] = None\n</code></pre> <p>Completion output.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamResponse.request_id","title":"request_id  <code>instance-attribute</code>","text":"<pre><code>request_id: str\n</code></pre> <p>Unique ID of request.</p>"},{"location":"api/data_types/#llmengine.CreateFineTuneRequest","title":"CreateFineTuneRequest","text":"<p>         Bases: <code>BaseModel</code></p> <p>Request object for creating a FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.model","title":"model  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>model: str = Field(Ellipsis, description='Identifier of base model to train from.')\n</code></pre> <p>Identifier of base model to train from.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.training_file","title":"training_file  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>training_file: str = Field(Ellipsis, description=\"Path to file of training dataset. Dataset must be a csv with columns 'prompt' and 'response'.\")\n</code></pre> <p>Path to file of training dataset. Dataset must be a csv with columns 'prompt' and 'response'.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.validation_file","title":"validation_file  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>validation_file: Optional[str] = Field(default=None, description='Path to file of validation dataset. Has the same format as training_file. If not provided, we will generate a split from the training dataset.')\n</code></pre> <p>Path to file of validation dataset. 
Has the same format as training_file. If not provided, we will generate a split from the training dataset.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.hyperparameters","title":"hyperparameters  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>hyperparameters: Optional[Dict[str, Any]] = Field(default=None, description='Hyperparameters to pass in to training job.')\n</code></pre> <p>Hyperparameters to pass in to training job.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.suffix","title":"suffix  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>suffix: Optional[str] = Field(default=None, description='Optional user-provided identifier suffix for the fine-tuned model.')\n</code></pre> <p>Optional user-provided identifier suffix for the fine-tuned model.</p>"},{"location":"api/data_types/#llmengine.CreateFineTuneResponse","title":"CreateFineTuneResponse","text":"<p>         Bases: <code>BaseModel</code></p> <p>Response object for creating a FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneResponse.fine_tune_id","title":"fine_tune_id  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>fine_tune_id: str = Field(Ellipsis, description='ID of the created fine-tuning job.')\n</code></pre> <p>The ID of the FineTune.</p>"},{"location":"api/data_types/#llmengine.GetFineTuneResponse","title":"GetFineTuneResponse","text":"<p>         Bases: <code>BaseModel</code></p> <p>Response object for retrieving a FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.GetFineTuneResponse.fine_tune_id","title":"fine_tune_id  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>fine_tune_id: str = Field(Ellipsis, description='ID of the requested job.')\n</code></pre> <p>The ID of the 
FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.GetFineTuneResponse.status","title":"status  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>status: BatchJobStatus = Field(Ellipsis, description='Status of the requested job.')\n</code></pre> <p>The status of the FineTune job.</p>"},{"location":"api/data_types/#llmengine.ListFineTunesResponse","title":"ListFineTunesResponse","text":"<p>         Bases: <code>BaseModel</code></p> <p>Response object for listing FineTunes.</p>"},{"location":"api/data_types/#llmengine.data_types.ListFineTunesResponse.jobs","title":"jobs  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>jobs: List[GetFineTuneResponse] = Field(Ellipsis, description='List of fine-tuning jobs and their statuses.')\n</code></pre> <p>A list of FineTunes, represented as <code>GetFineTuneResponse</code>s.</p>"},{"location":"api/data_types/#llmengine.CancelFineTuneResponse","title":"CancelFineTuneResponse","text":"<p>         Bases: <code>BaseModel</code></p> <p>Response object for cancelling a FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.CancelFineTuneResponse.success","title":"success  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>success: bool = Field(Ellipsis, description='Whether cancellation was successful.')\n</code></pre> <p>Whether the cancellation succeeded.</p>"},{"location":"api/data_types/#llmengine.GetLLMEndpointResponse","title":"GetLLMEndpointResponse","text":"<p>         Bases: <code>BaseModel</code></p> <p>Response object for retrieving a Model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.name","title":"name  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>name: str = Field(description='The name of the model. Use this for making inference requests to the model.')\n</code></pre> <p>The name of the model. 
Use this for making inference requests to the model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.source","title":"source  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>source: LLMSource = Field(description='The source of the model, e.g. Hugging Face.')\n</code></pre> <p>The source of the model, e.g. Hugging Face.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.inference_framework","title":"inference_framework  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>inference_framework: LLMInferenceFramework = Field(description='The inference framework used by the model.')\n</code></pre> <p>The inference framework used by the model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.id","title":"id  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>id: Optional[str] = Field(default=None, description='(For self-hosted users) The autogenerated ID of the model.')\n</code></pre> <p>(For self-hosted users) The autogenerated ID of the model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.model_name","title":"model_name  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>model_name: Optional[str] = Field(default=None, description='(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as `name`.')\n</code></pre> <p>(For self-hosted users) For fine-tuned models, the base model. 
For base models, this will be the same as <code>name</code>.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.inference_framework_tag","title":"inference_framework_tag  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>inference_framework_tag: Optional[str] = Field(default=None, description='(For self-hosted users) The Docker image tag used to run the model.')\n</code></pre> <p>(For self-hosted users) The Docker image tag used to run the model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.num_shards","title":"num_shards  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>num_shards: Optional[int] = Field(default=None, description='(For self-hosted users) The number of shards.')\n</code></pre> <p>(For self-hosted users) The number of shards.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.quantize","title":"quantize  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>quantize: Optional[Quantization] = Field(default=None, description='(For self-hosted users) The quantization method.')\n</code></pre> <p>(For self-hosted users) The quantization method.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.spec","title":"spec  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>spec: Optional[GetModelEndpointResponse] = Field(default=None, description='(For self-hosted users) Model endpoint details.')\n</code></pre> <p>(For self-hosted users) Model endpoint details.</p>"},{"location":"api/data_types/#llmengine.ListLLMEndpointsResponse","title":"ListLLMEndpointsResponse","text":"<p>         Bases: <code>BaseModel</code></p> <p>Response object for listing Models.</p>"},{"location":"api/data_types/#llmengine.data_types.ListLLMEndpointsResponse.model_endpoints","title":"model_endpoints  <code>instance-attribute</code> 
<code>class-attribute</code>","text":"<pre><code>model_endpoints: List[GetLLMEndpointResponse] = Field(Ellipsis, description='The list of models.')\n</code></pre> <p>A list of Models, represented as <code>GetLLMEndpointResponse</code>s.</p>"},{"location":"api/data_types/#llmengine.DeleteLLMEndpointResponse","title":"DeleteLLMEndpointResponse","text":"<p>         Bases: <code>BaseModel</code></p> <p>Response object for deleting a Model.</p>"},{"location":"api/data_types/#llmengine.data_types.DeleteLLMEndpointResponse.deleted","title":"deleted  <code>instance-attribute</code> <code>class-attribute</code>","text":"<pre><code>deleted: bool = Field(Ellipsis, description='Whether deletion was successful.')\n</code></pre> <p>Whether the deletion succeeded.</p>"},{"location":"api/error_handling/","title":"Error handling","text":"<p>LLM Engine uses conventional HTTP response codes to indicate the success or failure of an API request. In general, codes in the <code>2xx</code> range indicate success. Codes in the <code>4xx</code> range indicate a request that failed given the information provided (e.g. a given Model was not found, or an invalid temperature was specified). Codes in the <code>5xx</code> range indicate an error with the LLM Engine servers.</p> <p>In the Python client, errors are presented via a set of corresponding Exception classes, which should be caught and handled by the user accordingly.</p>"},{"location":"api/error_handling/#llmengine.errors.BadRequestError","title":"BadRequestError","text":"<pre><code>BadRequestError(message: str)\n</code></pre> <p>         Bases: <code>Exception</code></p> <p>Corresponds to HTTP 400. Indicates that the request had inputs that were invalid. 
The user should not attempt to retry the request without changing the inputs.</p>"},{"location":"api/error_handling/#llmengine.errors.UnauthorizedError","title":"UnauthorizedError","text":"<pre><code>UnauthorizedError(message: str)\n</code></pre> <p>         Bases: <code>Exception</code></p> <p>Corresponds to HTTP 401. This means that no valid API key was provided.</p>"},{"location":"api/error_handling/#llmengine.errors.NotFoundError","title":"NotFoundError","text":"<pre><code>NotFoundError(message: str)\n</code></pre> <p>         Bases: <code>Exception</code></p> <p>Corresponds to HTTP 404. This means that the resource (e.g. a Model, FineTune, etc.) could not be found. Note that this can also be returned in some cases where the object might exist, but the user does not have access to the object. This is done to avoid leaking information about the existence or nonexistence of said object that the user does not have access to.</p>"},{"location":"api/error_handling/#llmengine.errors.RateLimitExceededError","title":"RateLimitExceededError","text":"<pre><code>RateLimitExceededError(message: str)\n</code></pre> <p>         Bases: <code>Exception</code></p> <p>Corresponds to HTTP 429. Too many requests hit the API too quickly. We recommend an exponential backoff for retries.</p>"},{"location":"api/error_handling/#llmengine.errors.ServerError","title":"ServerError","text":"<pre><code>ServerError(status_code: int, message: str)\n</code></pre> <p>         Bases: <code>Exception</code></p> <p>Corresponds to HTTP 5xx errors on the server.</p>"},{"location":"api/langchain/","title":"\ud83e\udd9c Langchain","text":"<p>Coming soon!</p>"},{"location":"api/python_client/","title":"\ud83d\udc0d Python Client API Reference","text":""},{"location":"api/python_client/#llmengine.Completion","title":"Completion","text":"<p>         Bases: <code>APIEngine</code></p> <p>Completion API. 
This API is used to generate text completions.</p> <p>Language models are trained to understand natural language and predict text outputs as a response to their inputs. The inputs are called prompts and the outputs are referred to as completions. LLMs take the input prompts and chunk them into smaller units called tokens to process and generate language. Tokens may include trailing spaces and even sub-words; this process is language dependent.</p> <p>The Completion API can be run either synchronously or asynchronously (via Python <code>asyncio</code>). For each of these modes, you can also choose whether to stream token responses or not.</p>"},{"location":"api/python_client/#llmengine.completion.Completion.create","title":"create  <code>classmethod</code>","text":"<pre><code>create(model: str, prompt: str, max_new_tokens: int = 20, temperature: float = 0.2, timeout: int = 10, stream: bool = False) -&gt; Union[CompletionSyncResponse, Iterator[CompletionStreamResponse]]\n</code></pre> <p>Creates a completion for the provided prompt and parameters synchronously.</p> <p>This API can be used to get the LLM to generate a completion synchronously. It takes as parameters the <code>model</code> (see Model Zoo) and the <code>prompt</code>. Optionally it takes <code>max_new_tokens</code>, <code>temperature</code>, <code>timeout</code> and <code>stream</code>. It returns a CompletionSyncResponse if <code>stream=False</code> or an iterator of CompletionStreamResponse with <code>request_id</code> and <code>outputs</code> fields.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>str</code> <p>Name of the model to use. 
See Model Zoo for a list of Models that are supported.</p> required <code>prompt</code> <code>str</code> <p>The prompt to generate completions for, encoded as a string.</p> required <code>max_new_tokens</code> <code>int</code> <p>The maximum number of tokens to generate in the completion.</p> <p>The token count of your prompt plus <code>max_new_tokens</code> cannot exceed the model's context length. See Model Zoo for information on each supported model's context length.</p> <code>20</code> <code>temperature</code> <code>float</code> <p>What sampling temperature to use, in the range <code>(0, 1]</code>. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.</p> <code>0.2</code> <code>timeout</code> <code>int</code> <p>Timeout in seconds. This is the maximum amount of time you are willing to wait for a response.</p> <code>10</code> <code>stream</code> <code>bool</code> <p>Whether to stream the response. If true, the return type is an <code>Iterator[CompletionStreamResponse]</code>. Otherwise, the return type is a <code>CompletionSyncResponse</code>. 
When streaming, tokens will be sent as data-only server-sent events.</p> <code>False</code> <p>Returns:</p> Name Type Description <code>response</code> <code>Union[CompletionSyncResponse, Iterator[CompletionStreamResponse]]</code> <p>The generated response (if <code>stream=False</code>) or iterator of response chunks (if <code>stream=True</code>)</p> <p>Token streaming can be used to reduce perceived latency for applications:</p> Synchronous completion without token streaming in pythonResponse in json <pre><code>from llmengine import Completion\n\nresponse = Completion.create(\n    model=\"llama-7b\",\n    prompt=\"Hello, my name is\",\n    max_new_tokens=10,\n    temperature=0.2,\n)\nprint(response.json())\n</code></pre> <pre><code>{\n    \"request_id\": \"0123456789\",\n    \"outputs\":\n    [\n        {\n            \"text\": \"_______ and I am a _______\",\n            \"num_completion_tokens\": 10\n}\n    ],\n    \"traceback\": null\n}\n</code></pre> <p>Here is how applications can use streaming:</p> Synchronous completion with token streaming in pythonResponse in json <pre><code>from llmengine import Completion\n\nstream = Completion.create(\n    model=\"llama-7b\",\n    prompt=\"why is the sky blue?\",\n    max_new_tokens=5,\n    temperature=0.2,\n    stream=True,\n)\n\nfor response in stream:\n    if response.output:\n        print(response.json())\n</code></pre> <pre><code>{\"request_id\": \"0123456789\", \"output\": {\"text\": \"\\n\", \"finished\": false, \"num_completion_tokens\": 1 } }\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \"I\", \"finished\": false, \"num_completion_tokens\": 2 } }\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \" don\", \"finished\": false, \"num_completion_tokens\": 3 } }\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \"\u2019\", \"finished\": false, \"num_completion_tokens\": 4 } }\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \"t\", \"finished\": true, 
\"num_completion_tokens\": 5 } }\n</code></pre>"},{"location":"api/python_client/#llmengine.completion.Completion.acreate","title":"acreate  <code>async</code> <code>classmethod</code>","text":"<pre><code>acreate(model: str, prompt: str, max_new_tokens: int = 20, temperature: float = 0.2, timeout: int = 10, stream: bool = False) -&gt; Union[CompletionSyncResponse, AsyncIterable[CompletionStreamResponse]]\n</code></pre> <p>Creates a completion for the provided prompt and parameters asynchronously (with <code>asyncio</code>).</p> <p>This API can be used to get the LLM to generate a completion asynchronously. It takes as parameters the <code>model</code>see Model Zoo and the <code>prompt</code>. Optionally it takes <code>max_new_tokens</code>, <code>temperature</code>, <code>timeout</code> and <code>stream</code>. It returns CompletionSyncV1Response if <code>stream=False</code> or an async iterator of CompletionStreamV1Response with <code>request_id</code> and <code>outputs</code> fields.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>str</code> <p>Name of the model to use. See Model Zoo for a list of Models that are supported.</p> required <code>prompt</code> <code>str</code> <p>The prompt to generate completions for, encoded as a string.</p> required <code>max_new_tokens</code> <code>int</code> <p>The maximum number of tokens to generate in the completion.</p> <p>The token count of your prompt plus <code>max_new_tokens</code> cannot exceed the model's context length. See Model Zoo for information on each supported model's context length.</p> <code>20</code> <code>temperature</code> <code>float</code> <p>What sampling temperature to use, in the range <code>(0, 1]</code>. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.</p> <code>0.2</code> <code>timeout</code> <code>int</code> <p>Timeout in seconds. 
This is the maximum amount of time you are willing to wait for a response.</p> <code>10</code> <code>stream</code> <code>bool</code> <p>Whether to stream the response. If true, the return type is an <code>AsyncIterable[CompletionStreamResponse]</code>. Otherwise, the return type is a <code>CompletionSyncResponse</code>. When streaming, tokens will be sent as data-only server-sent events.</p> <code>False</code> <p>Returns:</p> Name Type Description <code>response</code> <code>Union[CompletionSyncResponse, AsyncIterable[CompletionStreamResponse]]</code> <p>The generated response (if <code>stream=False</code>) or iterator of response chunks (if <code>stream=True</code>)</p> <p>Token streaming can be used to reduce perceived latency for applications:</p> Asynchronous completion without token streaming in pythonResponse in json <pre><code>import asyncio\nfrom llmengine import Completion\n\nasync def main():\n    response = await Completion.acreate(\n        model=\"llama-7b\",\n        prompt=\"Hello, my name is\",\n        max_new_tokens=10,\n        temperature=0.2,\n    )\n    print(response.json())\n\nasyncio.run(main())\n</code></pre> <pre><code>{\n    \"request_id\": \"b1b2c3d4e5f6g7h8i9j0\",\n    \"outputs\":\n    [\n        {\n            \"text\": \"_______, and I am a _____\",\n            \"num_completion_tokens\": 10\n}\n    ]\n}\n</code></pre> <p>Here is how applications can use streaming:</p> Asynchronous completion with token streaming in pythonResponse in json <pre><code>import asyncio\nfrom llmengine import Completion\n\nasync def main():\n    stream = await Completion.acreate(\n        model=\"llama-7b\",\n        prompt=\"why is the sky blue?\",\n        max_new_tokens=5,\n        temperature=0.2,\n        stream=True,\n    )\n\n    async for response in stream:\n        if response.output:\n            print(response.json())\n\nasyncio.run(main())\n</code></pre> <pre><code>{\"request_id\": \"0123456789\", \"output\": {\"text\": \"\\n\", \"finished\": false, 
\"num_completion_tokens\": 1}}\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \"I\", \"finished\": false, \"num_completion_tokens\": 2}}\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \" think\", \"finished\": false, \"num_completion_tokens\": 3}}\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \" the\", \"finished\": false, \"num_completion_tokens\": 4}}\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \" sky\", \"finished\": true, \"num_completion_tokens\": 5}}\n</code></pre>"},{"location":"api/python_client/#llmengine.FineTune","title":"FineTune","text":"<p>         Bases: <code>APIEngine</code></p> <p>FineTune API. This API is used to fine-tune models.</p> <p>Fine-tuning is a process where the LLM is further trained on a task-specific dataset, allowing the model to adjust its parameters to better align with the task at hand. Fine-tuning is a supervised training phase, where prompt/response pairs are provided to optimize the performance of the LLM.</p> <p>LLM Engine provides APIs to create fine-tunes on a base model with training &amp; validation datasets. APIs are also provided to list, cancel and retrieve fine-tuning jobs.</p> <p>Creating a fine-tune will end with the creation of a Model, which you can view using <code>Model.get(model_name)</code> or delete using <code>Model.delete(model_name)</code>.</p>"},{"location":"api/python_client/#llmengine.fine_tuning.FineTune.create","title":"create  <code>classmethod</code>","text":"<pre><code>create(model: str, training_file: str, validation_file: Optional[str] = None, hyperparameters: Optional[Dict[str, str]] = None, suffix: Optional[str] = None) -&gt; CreateFineTuneResponse\n</code></pre> <p>Creates a job that fine-tunes a specified model from a given dataset.</p> <p>This API can be used to fine-tune a model. The model is the name of base model (Model Zoo for available models) to fine-tune. The training file should consist of prompt and response pairs. 
Your data must be formatted as a CSV file that includes two columns: <code>prompt</code> and <code>response</code>. A maximum of 100,000 rows of data is currently supported. At least 200 rows of data is recommended to start to see benefits from fine-tuning.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>`str`</code> <p>The name of the base model to fine-tune. See Model Zoo for the list of available models to fine-tune.</p> required <code>training_file</code> <code>`str`</code> <p>Publicly accessible URL to a CSV file for training.</p> required <code>validation_file</code> <code>`Optional[str]`</code> <p>Publicly accessible URL to a CSV file for validation.</p> <code>None</code> <code>hyperparameters</code> <code>`Optional[Dict[str, str]]`</code> <p>A dict of hyperparameters to customize fine-tuning behavior.</p> <p>Currently supported hyperparameters:</p> <ul> <li><code>lr</code>: Peak learning rate used during fine-tuning. It decays with a cosine schedule afterward. (Default: 2e-5)</li> <li><code>warmup_ratio</code>: Ratio of training steps used for learning rate warmup. (Default: 0.03)</li> <li><code>epochs</code>: Number of fine-tuning epochs. (Default: 5)</li> <li><code>weight_decay</code>: Regularization penalty applied to learned weights. (Default: 0.001)</li> <li><code>max_seq_length</code>: Maximum number of tokens per sequence in the dataset. 
(Default: 1024)</li> </ul> <code>None</code> <code>suffix</code> <code>`Optional[str]`</code> <p>A string that will be added to your fine-tuned model name.</p> <code>None</code> <p>Returns:</p> Name Type Description <code>CreateFineTuneResponse</code> <code>CreateFineTuneResponse</code> <p>an object that contains the ID of the created fine-tuning job</p> <p>Here is an example script to create a 5-row CSV of properly formatted data for fine-tuning an airline question answering bot:</p> Formatting data in python <pre><code>import csv\n# Define data\ndata = [\n  (\"What is your policy on carry-on luggage?\", \"Our policy allows each passenger to bring one piece of carry-on luggage and one personal item such as a purse or briefcase. The maximum size for carry-on luggage is 22 x 14 x 9 inches.\"),\n  (\"How can I change my flight?\", \"You can change your flight through our website or mobile app. Go to 'Manage my booking' section, enter your booking reference and last name, then follow the prompts to change your flight.\"),\n  (\"What meals are available on my flight?\", \"We offer a variety of meals depending on the flight's duration and route. These can range from snacks and light refreshments to full-course meals on long-haul flights. 
Specific meal options can be viewed during the booking process.\"),\n  (\"How early should I arrive at the airport before my flight?\", \"We recommend arriving at least two hours before domestic flights and three hours before international flights.\"),\n  (\"Can I select my seat in advance?\", \"Yes, you can select your seat during the booking process or afterwards via the 'Manage my booking' section on our website or mobile app.\"),\n  ]\n\n# Write data to a CSV file\nwith open('customer_service_data.csv', 'w', newline='') as file:\n    writer = csv.writer(file)\n    writer.writerow([\"prompt\", \"response\"])\n    writer.writerows(data)\n</code></pre> <p>Example code for fine-tuning:</p> Fine-tuning in pythonResponse in json <pre><code>from llmengine import FineTune\n\nresponse = FineTune.create(\n    model=\"llama-7b\",\n    training_file=\"https://my-bucket.s3.us-west-2.amazonaws.com/path/to/training-file.csv\",\n)\n\nprint(response.json())\n</code></pre> <pre><code>{\n    \"fine_tune_id\": \"ft_abc123\"\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.fine_tuning.FineTune.get","title":"get  <code>classmethod</code>","text":"<pre><code>get(fine_tune_id: str) -&gt; GetFineTuneResponse\n</code></pre> <p>Get status of a fine-tuning job.</p> <p>This API can be used to get the status of an already running fine-tuning job. 
It takes as a single parameter the <code>fine_tune_id</code> and returns a GetFineTuneResponse object with the id and status (<code>PENDING</code>, <code>STARTED</code>, <code>UNDEFINED</code>, <code>FAILURE</code> or <code>SUCCESS</code>).</p> <p>Parameters:</p> Name Type Description Default <code>fine_tune_id</code> <code>`str`</code> <p>ID of the fine-tuning job</p> required <p>Returns:</p> Name Type Description <code>GetFineTuneResponse</code> <code>GetFineTuneResponse</code> <p>an object that contains the ID and status of the requested job</p> Getting status of fine-tuning in pythonResponse in json <pre><code>from llmengine import FineTune\n\nresponse = FineTune.get(\n    fine_tune_id=\"ft_abc123\",\n)\n\nprint(response.json())\n</code></pre> <pre><code>{\n    \"fine_tune_id\": \"ft_abc123\",\n    \"status\": \"STARTED\"\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.fine_tuning.FineTune.list","title":"list  <code>classmethod</code>","text":"<pre><code>list() -&gt; ListFineTunesResponse\n</code></pre> <p>List fine-tuning jobs.</p> <p>This API can be used to list all the fine-tuning jobs. 
It returns a list of pairs of <code>fine_tune_id</code> and <code>status</code> for all existing jobs.</p> <p>Returns:</p> Name Type Description <code>ListFineTunesResponse</code> <code>ListFineTunesResponse</code> <p>an object that contains a list of all fine-tuning jobs and their statuses</p> Listing fine-tuning jobs in pythonResponse in json <pre><code>from llmengine import FineTune\n\nresponse = FineTune.list()\nprint(response.json())\n</code></pre> <pre><code>{\n    \"jobs\": [\n        {\n            \"fine_tune_id\": \"ft_abc123\",\n            \"status\": \"STARTED\"\n},\n        {\n            \"fine_tune_id\": \"ft_def456\",\n            \"status\": \"SUCCESS\"\n}\n    ]\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.fine_tuning.FineTune.cancel","title":"cancel  <code>classmethod</code>","text":"<pre><code>cancel(fine_tune_id: str) -&gt; CancelFineTuneResponse\n</code></pre> <p>Cancel a fine-tuning job.</p> <p>This API can be used to cancel an existing fine-tuning job if it's no longer required. It takes as parameter the <code>fine_tune_id</code> and returns a response object which has a <code>success</code> field confirming if the cancellation was successful.</p> <p>Parameters:</p> Name Type Description Default <code>fine_tune_id</code> <code>`str`</code> <p>ID of the fine-tuning job</p> required <p>Returns:</p> Name Type Description <code>CancelFineTuneResponse</code> <code>CancelFineTuneResponse</code> <p>an object that contains whether the cancellation was successful</p> Cancelling fine-tuning job in pythonResponse in json <pre><code>from llmengine import FineTune\n\nresponse = FineTune.cancel(fine_tune_id=\"ft_abc123\")\nprint(response.json())\n</code></pre> <pre><code>{\n    \"success\": true\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.Model","title":"Model","text":"<p>         Bases: <code>APIEngine</code></p> <p>Model API. This API is used to get, list, and delete models. 
Models include both base models built into LLM Engine, and fine-tuned models that you create through the FineTune.create() API.</p> <p>See Model Zoo for the list of publicly available base models.</p>"},{"location":"api/python_client/#llmengine.model.Model.get","title":"get  <code>classmethod</code>","text":"<pre><code>get(model: str) -&gt; GetLLMEndpointResponse\n</code></pre> <p>Get information about an LLM model.</p> <p>This API can be used to get information about a Model's source and inference framework. For self-hosted users, it returns additional information about number of shards, quantization, infra settings, etc. The function takes as a single parameter the name <code>model</code> and returns a GetLLMEndpointResponse object.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>`str`</code> <p>Name of the model</p> required <p>Returns:</p> Name Type Description <code>GetLLMEndpointResponse</code> <code>GetLLMEndpointResponse</code> <p>object representing the LLM and configurations</p> Accessing model in pythonResponse in json <pre><code>from llmengine import Model\n\nresponse = Model.get(\"llama-7b.suffix.2023-07-18-12-00-00\")\n\nprint(response.json())\n</code></pre> <pre><code>{\n    \"id\": null,\n    \"name\": \"llama-7b.suffix.2023-07-18-12-00-00\",\n    \"model_name\": null,\n    \"source\": \"hugging_face\",\n    \"inference_framework\": \"text_generation_inference\",\n    \"inference_framework_tag\": null,\n    \"num_shards\": null,\n    \"quantize\": null,\n    \"spec\": null\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.model.Model.list","title":"list  <code>classmethod</code>","text":"<pre><code>list() -&gt; ListLLMEndpointsResponse\n</code></pre> <p>List LLM models available to call inference on.</p> <p>This API can be used to list all available models, including both publicly available models and user-created fine-tuned models. It returns a list of GetLLMEndpointResponse objects for all models. 
The most important field is the model <code>name</code>.</p> <p>Returns:</p> Name Type Description <code>ListLLMEndpointsResponse</code> <code>ListLLMEndpointsResponse</code> <p>list of models</p> Listing available models in pythonResponse in json <pre><code>from llmengine import Model\n\nresponse = Model.list()\nprint(response.json())\n</code></pre> <pre><code>{\n    \"model_endpoints\": [\n        {\n            \"id\": null,\n            \"name\": \"llama-7b.suffix.2023-07-18-12-00-00\",\n            \"model_name\": null,\n            \"source\": \"hugging_face\",\n            \"inference_framework\": \"text_generation_inference\",\n            \"inference_framework_tag\": null,\n            \"num_shards\": null,\n            \"quantize\": null,\n            \"spec\": null\n},\n        {\n            \"id\": null,\n            \"name\": \"llama-7b\",\n            \"model_name\": null,\n            \"source\": \"hugging_face\",\n            \"inference_framework\": \"text_generation_inference\",\n            \"inference_framework_tag\": null,\n            \"num_shards\": null,\n            \"quantize\": null,\n            \"spec\": null\n},\n        {\n            \"id\": null,\n            \"name\": \"llama-13b-deepspeed-sync\",\n            \"model_name\": null,\n            \"source\": \"hugging_face\",\n            \"inference_framework\": \"deepspeed\",\n            \"inference_framework_tag\": null,\n            \"num_shards\": null,\n            \"quantize\": null,\n            \"spec\": null\n},\n        {\n            \"id\": null,\n            \"name\": \"falcon-40b\",\n            \"model_name\": null,\n            \"source\": \"hugging_face\",\n            \"inference_framework\": \"text_generation_inference\",\n            \"inference_framework_tag\": null,\n            \"num_shards\": null,\n            \"quantize\": null,\n            \"spec\": null\n}\n    
]\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.model.Model.delete","title":"delete  <code>classmethod</code>","text":"<pre><code>delete(model: str) -&gt; DeleteLLMEndpointResponse\n</code></pre> <p>Deletes an LLM model.</p> <p>This API can be used to delete a fine-tuned model. It takes as parameter the name of the <code>model</code> and returns a response object which has a <code>deleted</code> field confirming if the deletion was successful. If called on a base model included with LLM Engine, an error will be thrown.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>`str`</code> <p>Name of the model</p> required <p>Returns:</p> Name Type Description <code>response</code> <code>DeleteLLMEndpointResponse</code> <p>whether the model was successfully deleted</p> Deleting model in pythonResponse in json <pre><code>from llmengine import Model\n\nresponse = Model.delete(\"llama-7b.suffix.2023-07-18-12-00-00\")\nprint(response.json())\n</code></pre> <pre><code>{\n    \"deleted\": true\n}\n</code></pre>"},{"location":"guides/completions/","title":"Completions","text":"<p>Language Models are trained to understand natural language and provide text outputs as a response to their inputs. The inputs are called prompts and outputs are referred to as completions. LLMs take the input prompts and chunk them into smaller units called tokens to process and generate language. 
Tokens may include trailing spaces and even sub-words; this process is language dependent.</p> <p>Scale LLM Engine provides access to open source language models (see Model Zoo) that can be used for producing completions to prompts.</p>"},{"location":"guides/completions/#completion-api-call","title":"Completion API call","text":"<p>An example API call looks as follows:</p> <pre><code>from llmengine import Completion\n\nresponse = Completion.create(\n    model=\"llama-7b\",\n    prompt=\"Hello, my name is\",\n    max_new_tokens=10,\n    temperature=0.2,\n)\n</code></pre> <p>The model parameter is the LLM to be used (see Model Zoo). The prompt is the main input for the LLM to respond to. The max_new_tokens parameter is the maximum number of tokens to generate in the completion. The temperature is the sampling temperature to use. Higher values make the output more random, while lower values will make it more focused and deterministic.</p> <p>See the full API reference documentation to learn more.</p>"},{"location":"guides/completions/#completion-api-response","title":"Completion API response","text":"<p>An example Completion API response looks as follows:</p> Response in jsonResponse in python <pre><code>{\n  \"outputs\": [\n    {\n      \"text\": \"_______ and I am a _______\",\n      \"num_completion_tokens\": 10\n}\n  ]\n}\n</code></pre> <pre><code>&gt;&gt;&gt; print( response.outputs[0].text )\n________. Hello, my name is _\n</code></pre>"},{"location":"guides/completions/#token-streaming","title":"Token streaming","text":"<p>The Completions API supports token streaming to reduce perceived latency for certain applications. 
When streaming, tokens will be sent as data-only server-sent events.</p> <p>To enable token streaming, pass <code>stream=True</code> to either <code>Completion.create</code> or <code>Completion.acreate</code>.</p> <p>An example of token streaming using the synchronous Completions API looks as follows:</p> Token streaming with synchronous API in python <pre><code>from llmengine import Completion\n\nstream = Completion.create(\n    model=\"llama-7b\",\n    prompt=\"why is the sky blue?\",\n    max_new_tokens=5,\n    temperature=0.2,\n    stream=True,\n)\n\nfor response in stream:\n    if response.output:\n        print(response.json())\n</code></pre>"},{"location":"guides/completions/#async-requests","title":"Async requests","text":"<p>The Python client supports <code>asyncio</code> for creating Completions. Use <code>Completion.acreate</code> instead of <code>Completion.create</code> to utilize async processing. The function signatures are otherwise identical.</p> <p>An example of async Completions looks as follows:</p> Completions with asynchronous API in python <pre><code>import asyncio\nfrom llmengine import Completion\n\nasync def main():\n    response = await Completion.acreate(\n        model=\"llama-7b\",\n        prompt=\"Hello, my name is\",\n        max_new_tokens=10,\n        temperature=0.2,\n    )\n    print(response.json())\n\nasyncio.run(main())\n</code></pre>"},{"location":"guides/completions/#which-model-should-i-use","title":"Which model should I use?","text":"<p>See the Model Zoo for more information on best practices for which model to use for Completions.</p>"},{"location":"guides/fine_tuning/","title":"Fine-tuning","text":"<p>Learn how to customize your models on your data with fine-tuning.</p>"},{"location":"guides/fine_tuning/#introduction","title":"Introduction","text":"<p>Fine-tuning helps improve model performance by training on specific examples of prompts and desired responses. 
LLMs are initially trained on data collected from the entire internet. With fine-tuning, LLMs can be optimized to perform better in a specific domain by learning from examples for that domain. Smaller LLMs that have been fine-tuned on a specific use case often outperform larger ones that were trained more generally.</p> <p>Fine-tuning allows for:</p> <ol> <li>Higher quality results than prompt engineering alone</li> <li>Cost savings through shorter prompts</li> <li>The ability to reach equivalent accuracy with a smaller model</li> <li>Lower latency at inference time</li> <li>The chance to show an LLM more examples than can fit in a single context window</li> </ol> <p>LLM Engine's fine-tuning API lets you fine-tune various open source LLMs on your own data and then make inference calls to the resulting LLM. For more specific details, see the fine-tuning API reference.</p>"},{"location":"guides/fine_tuning/#producing-high-quality-data-for-fine-tuning","title":"Producing high quality data for fine-tuning","text":"<p>The training data for fine-tuning should consist of prompt and response pairs.</p> <p>As a rule of thumb, you should expect to see linear improvements in your fine-tuned model's quality with each doubling of the dataset size. Having high-quality data is also essential to improving performance. For every linear increase in the error rate in your training data, you may encounter a roughly quadratic increase in your fine-tuned model's error rate.</p> <p>High quality data is critical to achieve improved model performance, and in several cases will require experts to  generate and prepare data - the breadth and diversity of the data is highly critical. Scale's Data Engine can help  prepare such high quality, diverse data sets - more information here.</p>"},{"location":"guides/fine_tuning/#preparing-data","title":"Preparing data","text":"<p>Your data must be formatted as a CSV file that includes two columns: <code>prompt</code> and <code>response</code>. 
A maximum of 100,000 rows of data is currently supported. At least 200 rows of data is recommended to start to see benefits from fine-tuning.</p> <p>Here is an example script to create a 50-row CSV of properly formatted data for fine-tuning an airline question answering bot</p> Creating a sample dataset <pre><code>import csv\n# Define data\ndata = [\n    (\"What is your policy on carry-on luggage?\", \"Our policy allows each passenger to bring one piece of carry-on luggage and one personal item such as a purse or briefcase. The maximum size for carry-on luggage is 22 x 14 x 9 inches.\"),\n    (\"How can I change my flight?\", \"You can change your flight through our website or mobile app. Go to 'Manage my booking' section, enter your booking reference and last name, then follow the prompts to change your flight.\"),\n    (\"What meals are available on my flight?\", \"We offer a variety of meals depending on the flight's duration and route. These can range from snacks and light refreshments to full-course meals on long-haul flights. Specific meal options can be viewed during the booking process.\"),\n    (\"How early should I arrive at the airport before my flight?\", \"We recommend arriving at least two hours before domestic flights and three hours before international flights.\"),\n    (\"Can I select my seat in advance?\", \"Yes, you can select your seat during the booking process or afterwards via the 'Manage my booking' section on our website or mobile app.\"),\n    (\"What should I do if my luggage is lost?\", \"If your luggage is lost, please report this immediately at our 'Lost and Found' counter at the airport. We will assist you in tracking your luggage.\"),\n    (\"Do you offer special assistance for passengers with disabilities?\", \"Yes, we offer special assistance for passengers with disabilities. 
Please notify us of your needs at least 48 hours prior to your flight.\"),\n    (\"Can I bring my pet on the flight?\", \"Yes, we allow small pets in the cabin, and larger pets in the cargo hold. Please check our pet policy for more details.\"),\n    (\"What is your policy on flight cancellations?\", \"In case of flight cancellations, we aim to notify passengers as early as possible and offer either a refund or a rebooking on the next available flight.\"),\n    (\"Can I get a refund if I cancel my flight?\", \"Refunds depend on the type of ticket purchased. Please check our cancellation policy for details. Non-refundable tickets, however, are typically not eligible for refunds unless due to extraordinary circumstances.\"),\n    (\"How can I check-in for my flight?\", \"You can check-in for your flight either online, through our mobile app, or at the airport. Online and mobile app check-in opens 24 hours before departure and closes 90 minutes before.\"),\n    (\"Do you offer free meals on your flights?\", \"Yes, we serve free meals on all long-haul flights. For short-haul flights, we offer a complimentary drink and snack. Special meal requests should be made at least 48 hours before departure.\"),\n    (\"Can I use my electronic devices during the flight?\", \"Small electronic devices can be used throughout the flight in flight mode. Larger devices like laptops may be used above 10,000 feet.\"),\n    (\"How much baggage can I check-in?\", \"The checked baggage allowance depends on the class of travel and route. The details would be mentioned on your ticket, or you can check on our website.\"),\n    (\"How can I request for a wheelchair?\", \"To request a wheelchair or any other special assistance, please call our customer service at least 48 hours before your flight.\"),\n    (\"Do I get a discount for group bookings?\", \"Yes, we offer discounts on group bookings of 10 or more passengers. 
Please contact our group bookings team for more information.\"),\n    (\"Do you offer Wi-fi on your flights?\", \"Yes, we offer complimentary Wi-fi on select flights. You can check the availability during the booking process.\"),\n    (\"What is the minimum connecting time between flights?\", \"The minimum connecting time varies depending on the airport and whether your flight is international or domestic. Generally, it's recommended to allow at least 45-60 minutes for domestic connections and 60-120 minutes for international.\"),\n    (\"Do you offer duty-free shopping on international flights?\", \"Yes, we have a selection of duty-free items that you can pre-order on our website or purchase onboard on international flights.\"),\n    (\"Can I upgrade my ticket to business class?\", \"Yes, you can upgrade your ticket through the 'Manage my booking' section on our website or by contacting our customer service. The availability and costs depend on the specific flight.\"),\n    (\"Can unaccompanied minors travel on your flights?\", \"Yes, we do accommodate unaccompanied minors on our flights, with special services to ensure their safety and comfort. Please contact our customer service for more details.\"),\n    (\"What amenities do you provide in business class?\", \"In business class, you will enjoy additional legroom, reclining seats, premium meals, priority boarding and disembarkation, access to our business lounge, extra baggage allowance, and personalized service.\"),\n    (\"How much does extra baggage cost?\", \"Extra baggage costs vary based on flight route and the weight of the baggage. 
Please refer to our 'Extra Baggage' section on the website for specific rates.\"),\n    (\"Are there any specific rules for carrying liquids in carry-on?\", \"Yes, liquids carried in your hand luggage must be in containers of 100 ml or less and they should all fit into a single, transparent, resealable plastic bag of 20 cm x 20 cm.\"),\n    (\"What if I have a medical condition that requires special assistance during the flight?\", \"We aim to make the flight comfortable for all passengers. If you have a medical condition that may require special assistance, please contact our \u2018special services\u2019 team 48 hours before your flight.\"),\n    (\"What in-flight entertainment options are available?\", \"We offer a range of in-flight entertainment options including a selection of movies, TV shows, music, and games, available on your personal seat-back screen.\"),\n    (\"What types of payment methods do you accept?\", \"We accept credit/debit cards, PayPal, bank transfers, and various other forms of payment. The available options may vary depending on the country of departure.\"),\n    (\"How can I earn and redeem frequent flyer miles?\", \"You can earn miles for every journey you take with us or our partner airlines. These miles can be redeemed for flight tickets, upgrades, or various other benefits. To earn and redeem miles, you need to join our frequent flyer program.\"),\n    (\"Can I bring a stroller for my baby?\", \"Yes, you can bring a stroller for your baby. It can be checked in for free and will normally be given back to you at the aircraft door upon arrival.\"),\n    (\"What age does my child have to be to qualify as an unaccompanied minor?\", \"Children aged between 5 and 12 years who are traveling alone are considered unaccompanied minors. 
Our team provides special care for these children from departure to arrival.\"),\n    (\"What documents do I need to travel internationally?\", \"For international travel, you need a valid passport and may also require visas, depending on your destination and your country of residence. It's important to check the specific requirements before you travel.\"),\n    (\"What happens if I miss my flight?\", \"If you miss your flight, please contact our customer service immediately. Depending on the circumstances, you may be able to rebook on a later flight, but additional fees may apply.\"),\n    (\"Can I travel with my musical instrument?\", \"Yes, small musical instruments can be brought on board as your one carry-on item. Larger instruments must be transported in the cargo, or if small enough, a seat may be purchased for them.\"),\n    (\"Do you offer discounts for children or infants?\", \"Yes, children aged 2-11 traveling with an adult usually receive a discount on the fare. Infants under the age of 2 who do not occupy a seat can travel for a reduced fare or sometimes for free.\"),\n    (\"Is smoking allowed on your flights?\", \"No, all our flights are non-smoking for the comfort and safety of all passengers.\"),\n    (\"Do you have family seating?\", \"Yes, we offer the option to seat families together. You can select seats during booking or afterwards through the 'Manage my booking' section on the website.\"),\n    (\"Is there any discount for senior citizens?\", \"Some flights may offer a discount for senior citizens. Please check our website or contact customer service for accurate information.\"),\n    (\"What items are prohibited on your flights?\", \"Prohibited items include, but are not limited to, sharp objects, firearms, explosive materials, and certain chemicals. 
You can find a comprehensive list on our website under the 'Security Regulations' section.\"),\n    (\"Can I purchase a ticket for someone else?\", \"Yes, you can purchase a ticket for someone else. You'll need their correct name as it appears on their government-issued ID, and their correct travel dates.\"),\n    (\"What is the process for lost and found items on the plane?\", \"If you realize you forgot an item on the plane, report it as soon as possible to our lost and found counter. We will make every effort to locate and return your item.\"),\n    (\"Can I request a special meal?\", \"Yes, we offer a variety of special meals to accommodate dietary restrictions. Please request your preferred meal at least 48 hours prior to your flight.\"),\n    (\"Is there a weight limit for checked baggage?\", \"Yes, luggage weight limits depend on your ticket class and route. You can find the details on your ticket or by visiting our website.\"),\n    (\"Can I bring my sports equipment?\", \"Yes, certain types of sports equipment can be carried either as or in addition to your permitted baggage. Some equipment may require additional fees. It's best to check our policy on our website or contact us directly.\"),\n    (\"Do I need a visa to travel to certain countries?\", \"Yes, visa requirements depend on the country you are visiting and your nationality. We advise checking with the relevant embassy or consulate prior to travel.\"),\n    (\"How can I add extra baggage to my booking?\", \"You can add extra baggage to your booking through the 'Manage my booking' section on our website or by contacting our customer services.\"),\n    (\"Can I check-in at the airport?\", \"Yes, you can choose to check-in at the airport. 
However, we also offer online and mobile check-in, which may save you time.\"),\n    (\"How do I know if my flight is delayed or cancelled?\", \"In case of any changes to your flight, we will attempt to notify all passengers using the contact information given at the time of booking. You can also check your flight status on our website.\"),\n    (\"What is your policy on pregnant passengers?\", \"Pregnant passengers can travel up to the end of the 36th week for single pregnancies, and the end of the 32nd week for multiple pregnancies. We recommend consulting your doctor before any air travel.\"),\n    (\"Can children travel alone?\", \"Yes, children age 5 to 12 can travel alone as unaccompanied minors. We provide special care for these seats. Please contact our customer service for more information.\"),\n    (\"How can I pay for my booking?\", \"You can pay for your booking using a variety of methods including credit and debit cards, PayPal, or bank transfers. The options may vary depending on the country of departure.\"),\n]\n\n# Write data to a CSV file\nwith open('customer_service_data.csv', 'w', newline='') as file:\n    writer = csv.writer(file)\n    writer.writerow([\"prompt\", \"response\"])\n    writer.writerows(data)\n</code></pre>"},{"location":"guides/fine_tuning/#making-your-data-accessible-to-llm-engine","title":"Making your data accessible to LLM Engine","text":"<p>Currently, data needs to be uploaded to a publicly accessible web URL so that it can be read for fine-tuning. Publicly accessible HTTP, HTTPS, and S3 URLs are currently supported. Support for privately sharing data with the LLM Engine API is coming shortly. For quick iteration, you can look into tools like Pastebin or Github Gists to quickly host your CSV files in a public manner. We created an example Github Gist you can see here. 
To use the gist, you can just use the URL given when you click the \u201cRaw\u201d button (URL).</p>"},{"location":"guides/fine_tuning/#launching-the-fine-tune","title":"Launching the fine-tune","text":"<p>Once you have uploaded your data, you can use the LLM Engine API to launch a fine-tune. You will  need to specify which base model to fine-tune, the locations of the training file and optional  validation data file, an optional set of hyperparameters to customize the fine-tuning behavior,  and an optional suffix to append to the name of the fine-tune.</p> <p>If you specify a suffix, the fine-tune will be named <code>model:suffix:&lt;timestamp&gt;</code>. If you do not,  the fine-tune will be named <code>model:&lt;timestamp&gt;</code>. The timestamp will be the time the fine-tune was  launched.</p> Hyper-parameters for fine-tune  * `lr`: Peak learning rate used during fine-tuning. It decays with a cosine schedule afterward. (Default: 2e-5) * `warmup_ratio`: Ratio of training steps used for learning rate warmup. (Default: 0.03) * `epochs`: Number of fine-tuning epochs. (Default: 5) * `weight_decay`: Regularization penalty applied to learned weights. (Default: 0.001) * `max_seq_length`: Maximum number of tokens per sequence in the dataset. (Default: 1024)  Create a fine-tune in python <pre><code>from llmengine import FineTune\n\nresponse = FineTune.create(\n    model=\"llama-7b\",\n    training_file=\"s3://my-bucket/path/to/training-file.csv\",\n)\n\nprint(response.json())\n</code></pre> <p>See the Model Zoo to see which models have fine-tuning support.</p> <p>Once the fine-tune is launched, you can also get the status of your fine-tune.</p>"},{"location":"guides/fine_tuning/#making-inference-calls-to-your-fine-tune","title":"Making inference calls to your fine-tune","text":"<p>Once your fine-tune is finished, you will be able to start making inference requests to the  model. 
You can use the <code>fine_tuned_model</code> returned from your  FineTune.get API call to reference your fine-tuned model in the Completions API. Alternatively, you can list  available LLMs with <code>Model.list</code> in order to find the name of your fine-tuned model. See the  Completion API for more details. You can then  use that name to direct your completion requests.  You must wait until your fine-tune is complete  before you can plug it into the Completions API. You can check the status of your fine-tune with  FineTune.get.</p> Inference with a fine-tuned model in python <pre><code>from llmengine import Completion\n\nresponse = Completion.create(\n    model=\"llama-7b:airlines:2023-07-17-08-30-45\",\n    prompt=\"Do you offer in-flight Wi-fi?\",\n    max_new_tokens=100,\n    temperature=0.2,\n)\nprint(response.json())\n</code></pre>"},{"location":"guides/rate_limits/","title":"Overview","text":""},{"location":"guides/rate_limits/#what-are-rate-limits","title":"What are rate limits?","text":"<p>A rate limit is a restriction that an API imposes on the number of times a user or client can access the server within a specified period of time.</p>"},{"location":"guides/rate_limits/#how-do-i-know-if-i-am-rate-limited","title":"How do I know if I am rate limited?","text":"<p>Per standard HTTP practices, your request will receive a response with HTTP status code of <code>429</code>, <code>Too Many Requests</code>.</p>"},{"location":"guides/rate_limits/#what-are-the-rate-limits-for-our-api","title":"What are the rate limits for our API?","text":"<p>The LLM Engine API is currently in a preview mode, and therefore we currently do not have any advertised rate limits. As the API moves towards a production release, we will update this section with specific rate limits. 
For now, the API will return HTTP 429 on an as-needed basis.</p>"},{"location":"guides/rate_limits/#error-mitigation","title":"Error mitigation","text":""},{"location":"guides/rate_limits/#retrying-with-exponential-backoff","title":"Retrying with exponential backoff","text":"<p>One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff.  Retrying with exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the  unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated.  This continues until the request is successful or until a maximum number of retries is reached. This approach has many benefits:</p> <ul> <li>Automatic retries mean you can recover from rate limit errors without crashes or missing data</li> <li>Exponential backoff means that your first retries can be tried quickly, while still benefiting from longer delays if your first few retries fail</li> <li>Adding random jitter to the delay helps prevent retries from all hitting at the same time.</li> </ul> <p>Below are a few example solutions for Python that use exponential backoff.</p>"},{"location":"guides/rate_limits/#example-1-using-the-tenacity-library","title":"Example #1: Using the <code>tenacity</code> library","text":"<p>Tenacity is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding  retry behavior to just about anything. To add exponential backoff to your requests, you can use the tenacity.retry  decorator. 
The below example uses the tenacity.wait_random_exponential function to add random exponential backoff to a  request.</p> Exponential backoff in python <pre><code>import llmengine\nfrom tenacity import (\n    retry,\n    stop_after_attempt,\n    wait_random_exponential,\n)  # for exponential backoff\n@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))\ndef completion_with_backoff(**kwargs):\n    return llmengine.Completion.create(**kwargs)\n\ncompletion_with_backoff(model=\"llama-7b\", prompt=\"Why is the sky blue?\")\n</code></pre>"},{"location":"guides/rate_limits/#example-2-using-the-backoff-library","title":"Example #2: Using the <code>backoff</code> library","text":"<p>Another python library that provides function decorators for backoff and retry is backoff:</p> Decorators for backoff and retry in python <pre><code>import llmengine\nimport backoff\n@backoff.on_exception(backoff.expo, llmengine.error.RateLimitError)\ndef completions_with_backoff(**kwargs):\n    return llmengine.Completion.create(**kwargs)\n\ncompletions_with_backoff(model=\"llama-7b\", prompt=\"Why is the sky blue?\")\n</code></pre>"},{"location":"guides/token_streaming/","title":"Token streaming","text":"<p>The Completions APIs support a <code>stream</code> boolean parameter that, when <code>True</code>, will return a streamed response of token-by-token server-sent events (SSEs) rather than waiting to receive the full response when model generation has finished. This reduces the time before you start receiving a response.</p> <p>The response will consist of SSEs of the form <code>{\"token\": dict, \"generated_text\": str | null, \"details\": dict | null}</code>, where the dictionary for each token will contain log probability information in addition to the generated string; the <code>generated_text</code> field will be <code>null</code> for all but the last SSE, for which it will contain the full generated response.</p>"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Introduction","text":""},{"location":"#llm-engine","title":"LLM Engine","text":"<p>The open source engine for inference and fine-tuning of Large Language Models.</p> <p>LLM Engine is the easiest way to customize and serve LLMs.</p> <p>LLMs can be accessed via Scale's hosted version or by using the helm charts in this repository to run model inference and fine-tuning in your own infrastructure.</p>"},{"location":"#quick-install","title":"Quick Install","text":"Install the python package <pre><code>pip install scale-llm-engine\n</code></pre>"},{"location":"#about","title":"About","text":"<p>Foundation models are emerging as the building blocks of AI. However, deploying these models to the cloud and fine-tuning them are expensive operations that require infrastructure and ML expertise. It is also difficult to maintain over time as new models are released and new techniques for both inference and fine-tuning are made available.</p> <p>LLM Engine is a Python library, CLI, and Helm chart that provides everything you need to serve and fine-tune foundation models, whether you use Scale's hosted infrastructure or do it in your own cloud infrastructure using Kubernetes.</p>"},{"location":"#key-features","title":"Key Features","text":"<p>Ready-to-use APIs for your favorite models: Deploy and serve open-source foundation models - including LLaMA, MPT and Falcon. Use Scale-hosted models or deploy to your own infrastructure.</p> <p>Fine-tune your favorite models: Fine-tune open-source foundation models like LLaMA, MPT etc. 
with your own data for optimized performance.</p> <p>Optimized Inference: LLM Engine provides inference APIs for streaming responses and dynamically batching inputs for higher throughput and lower latency.</p> <p>Open-Source Integrations: Deploy any Hugging Face model with a single command.</p> <p>Deploying from any docker image: Turn any Docker image into an auto-scaling deployment with simple APIs.</p>"},{"location":"#features-coming-soon","title":"Features Coming Soon","text":"<p>k8s Installation Documentation: We are working hard to document installation and maintenance of inference and fine-tuning functionality on your own infrastructure. For now, our documentation covers using our client libraries to access Scale's hosted infrastructure.</p> <p>Fast Cold-Start Times: To prevent GPUs from idling, LLM Engine automatically scales your model to zero when it's not in use and scales up within seconds, even for large foundation models.</p> <p>Cost Optimization: Deploy AI models cheaper than commercial ones, including cold-start and warm-down times.</p>"},{"location":"contributing/","title":"Contributing to LLM Engine","text":""},{"location":"contributing/#updating-llm-engine-documentation","title":"Updating LLM Engine Documentation","text":"<p>LLM Engine leverages mkdocs to create beautiful, community-oriented documentation.</p>"},{"location":"contributing/#step-1-clone-the-repository","title":"Step 1: Clone the Repository","text":"<p>Clone/Fork the LLM Engine Repository. 
Our documentation lives in the <code>docs</code> folder.</p>"},{"location":"contributing/#step-2-install-the-dependencies","title":"Step 2: Install the Dependencies","text":"<p>Dependencies are located in <code>requirements-docs.txt</code>; go ahead and pip install those with </p> <pre><code>pip install -r requirements-docs.txt\n</code></pre>"},{"location":"contributing/#step-3-run-locally","title":"Step 3: Run Locally","text":"<p>To run the documentation service locally, execute the following command:</p> <pre><code>mkdocs serve\n</code></pre> <p>This should kick off a locally running instance on http://127.0.0.1:8000/.</p> <p>As you edit the content in the <code>docs</code> folder, the site will be automatically reloaded on each file save.</p>"},{"location":"contributing/#step-4-editing-navigation-and-settings","title":"Step 4: Editing Navigation and Settings","text":"<p>If you are less familiar with <code>mkdocs</code>, in addition to the markdown content in the <code>docs</code> folder, there is a top-level <code>mkdocs.yml</code> file as well that defines the navigation pane and other website settings. If you don't see your page where you think it should be, double-check the .yml file.</p>"},{"location":"contributing/#step-5-building-and-deploying","title":"Step 5: Building and Deploying","text":"<p>CircleCI (via <code>.circleci/config.yml</code>) handles the building and deployment of our documentation service for us.</p>"},{"location":"faq/","title":"Frequently Asked Questions","text":""},{"location":"getting_started/","title":"Getting Started","text":"<p>The fastest way to get started with LLM Engine is to use the python client in this repository to run inference and fine-tuning on Scale's infrastructure. 
This path does not require you to install anything on your infrastructure, and Scale's free tier gives you access to experimentation using open source LLMs.</p> <p>To start with, install LLM Engine via pip or conda:</p> pipconda <pre><code>pip install scale-llm-engine\n</code></pre> <pre><code>conda install scale-llm-engine -c conda-forge\n</code></pre>"},{"location":"getting_started/#scale-api-keys","title":"Scale API Keys","text":"<p>Next, you need a Scale Spellbook API key.</p>"},{"location":"getting_started/#retrieving-your-api-key","title":"Retrieving your API Key","text":"<p>To retrieve your API key, head to Scale Spellbook where you will get an API key on the settings page.</p> <p>Different API Keys for different Scale Products</p> <p>If you have leveraged Scale's platform for annotation work in the past, please note that your Spellbook API key will be different than the Scale Annotation API key. You will want to create a Spellbook API key before getting started.</p>"},{"location":"getting_started/#set-your-api-key","title":"Set your API Key","text":"<p>LLM Engine uses environment variables to access your API key.</p> <p>Set this API key as the <code>SCALE_API_KEY</code> environment variable by running the following command in your terminal before you run your python application.</p> <pre><code>export SCALE_API_KEY=\"[Your API key]\"\n</code></pre> <p>You can also add in the line above to your <code>.zshrc</code> or <code>.bash_profile</code> so it's automatically set for future sessions.</p>"},{"location":"getting_started/#example-code","title":"Example Code","text":""},{"location":"getting_started/#sample-completion","title":"Sample Completion","text":"<p>With your API key set, you can now send LLM Engine requests using the Python client:</p> <pre><code>from llmengine import Completion\n\nresponse = Completion.create(\n    model=\"falcon-7b-instruct\",\n    prompt=\"I'm opening a pancake restaurant that specializes in unique pancake shapes, colors, and 
flavors. List 3 quirky names I could name my restaurant.\",\n    max_new_tokens=100,\n    temperature=0.2,\n)\n\nprint(response.outputs[0].text)\n</code></pre>"},{"location":"getting_started/#with-streaming","title":"With Streaming","text":"<pre><code>from llmengine import Completion\nimport sys\nstream = Completion.create(\n    model=\"falcon-7b-instruct\",\n    prompt=\"Give me a 200 word summary on the current economic events in the US.\",\n    max_new_tokens=1000,\n    temperature=0.2,\n    stream=True\n)\n\nfor response in stream:\n    if response.output:\n        print(response.output.text, end=\"\")\n        sys.stdout.flush()\n</code></pre>"},{"location":"model_zoo/","title":"Public Model Zoo","text":"<p>Scale hosts the following models in a model zoo:</p> Model Name Inference APIs Available Fine-tuning APIs Available <code>llama-7b</code> \u2705 \u2705 <code>falcon-7b</code> \u2705 <code>falcon-7b-instruct</code> \u2705 <code>falcon-40b</code> \u2705 <code>falcon-40b-instruct</code> \u2705 <code>mpt-7b</code> \u2705 <code>mpt-7b-instruct</code> \u2705 \u2705 <code>flan-t5-xxl</code> \u2705"},{"location":"model_zoo/#usage","title":"Usage","text":"<p>Each of these models can be used with the Completion API.</p> <p>The specified models can be fine-tuned with the FineTune API.</p> <p>More information about the models can be found using the Model API.</p>"},{"location":"api/data_types/","title":"\ud83d\udc0d Python Client Data Type Reference","text":""},{"location":"api/data_types/#llmengine.CompletionOutput","title":"CompletionOutput","text":"<p>             Bases: <code>BaseModel</code></p> <p>Represents the output of a completion request to a model.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionOutput.text","title":"text  <code>instance-attribute</code>","text":"<pre><code>text: str\n</code></pre> <p>The text of the 
completion.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionOutput.num_completion_tokens","title":"num_completion_tokens  <code>instance-attribute</code>","text":"<pre><code>num_completion_tokens: int\n</code></pre> <p>Number of tokens in the completion.</p>"},{"location":"api/data_types/#llmengine.CompletionStreamOutput","title":"CompletionStreamOutput","text":"<p>             Bases: <code>BaseModel</code></p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamOutput.text","title":"text  <code>instance-attribute</code>","text":"<pre><code>text: str\n</code></pre> <p>The text of the completion.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamOutput.finished","title":"finished  <code>instance-attribute</code>","text":"<pre><code>finished: bool\n</code></pre> <p>Whether the completion is finished.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamOutput.num_completion_tokens","title":"num_completion_tokens  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>num_completion_tokens: Optional[int] = None\n</code></pre> <p>Number of tokens in the completion.</p>"},{"location":"api/data_types/#llmengine.CompletionSyncResponse","title":"CompletionSyncResponse","text":"<p>             Bases: <code>BaseModel</code></p> <p>Response object for a synchronous prompt completion.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionSyncResponse.output","title":"output  <code>instance-attribute</code>","text":"<pre><code>output: CompletionOutput\n</code></pre> <p>Completion output.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionSyncResponse.request_id","title":"request_id  <code>instance-attribute</code>","text":"<pre><code>request_id: str\n</code></pre> <p>Unique ID of request.</p>"},{"location":"api/data_types/#llmengine.CompletionStreamResponse","title":"CompletionStreamResponse","text":"<p>             Bases: <code>BaseModel</code></p> 
<p>Response object for a stream prompt completion task.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamResponse.output","title":"output  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>output: Optional[CompletionStreamOutput] = None\n</code></pre> <p>Completion output.</p>"},{"location":"api/data_types/#llmengine.data_types.CompletionStreamResponse.request_id","title":"request_id  <code>instance-attribute</code>","text":"<pre><code>request_id: str\n</code></pre> <p>Unique ID of request.</p>"},{"location":"api/data_types/#llmengine.CreateFineTuneRequest","title":"CreateFineTuneRequest","text":"<p>             Bases: <code>BaseModel</code></p> <p>Request object for creating a FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.model","title":"model  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>model: str = Field(\n    ...,\n    description=\"Identifier of base model to train from.\",\n)\n</code></pre> <p>Identifier of base model to train from.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.training_file","title":"training_file  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>training_file: str = Field(\n    ...,\n    description=\"Path to file of training dataset. Dataset must be a csv with columns 'prompt' and 'response'.\",\n)\n</code></pre> <p>Path to file of training dataset. Dataset must be a csv with columns 'prompt' and 'response'.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.validation_file","title":"validation_file  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>validation_file: Optional[str] = Field(\n    default=None,\n    description=\"Path to file of validation dataset. Has the same format as training_file. 
If not provided, we will generate a split from the training dataset.\",\n)\n</code></pre> <p>Path to file of validation dataset. Has the same format as training_file. If not provided, we will generate a split from the training dataset.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.hyperparameters","title":"hyperparameters  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>hyperparameters: Optional[Dict[str, Any]] = Field(\n    default=None,\n    description=\"Hyperparameters to pass in to training job.\",\n)\n</code></pre> <p>Hyperparameters to pass in to training job.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneRequest.suffix","title":"suffix  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>suffix: Optional[str] = Field(\n    default=None,\n    description=\"Optional user-provided identifier suffix for the fine-tuned model.\",\n)\n</code></pre> <p>Optional user-provided identifier suffix for the fine-tuned model.</p>"},{"location":"api/data_types/#llmengine.CreateFineTuneResponse","title":"CreateFineTuneResponse","text":"<p>             Bases: <code>BaseModel</code></p> <p>Response object for creating a FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.CreateFineTuneResponse.fine_tune_id","title":"fine_tune_id  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>fine_tune_id: str = Field(\n    ..., description=\"ID of the created fine-tuning job.\"\n)\n</code></pre> <p>The ID of the FineTune.</p>"},{"location":"api/data_types/#llmengine.GetFineTuneResponse","title":"GetFineTuneResponse","text":"<p>             Bases: <code>BaseModel</code></p> <p>Response object for retrieving a FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.GetFineTuneResponse.fine_tune_id","title":"fine_tune_id  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>fine_tune_id: str = 
Field(\n    ..., description=\"ID of the requested job.\"\n)\n</code></pre> <p>The ID of the FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.GetFineTuneResponse.status","title":"status  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>status: BatchJobStatus = Field(\n    ..., description=\"Status of the requested job.\"\n)\n</code></pre> <p>The status of the FineTune job.</p>"},{"location":"api/data_types/#llmengine.ListFineTunesResponse","title":"ListFineTunesResponse","text":"<p>             Bases: <code>BaseModel</code></p> <p>Response object for listing FineTunes.</p>"},{"location":"api/data_types/#llmengine.data_types.ListFineTunesResponse.jobs","title":"jobs  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>jobs: List[GetFineTuneResponse] = Field(\n    ...,\n    description=\"List of fine-tuning jobs and their statuses.\",\n)\n</code></pre> <p>A list of FineTunes, represented as <code>GetFineTuneResponse</code>s.</p>"},{"location":"api/data_types/#llmengine.CancelFineTuneResponse","title":"CancelFineTuneResponse","text":"<p>             Bases: <code>BaseModel</code></p> <p>Response object for cancelling a FineTune.</p>"},{"location":"api/data_types/#llmengine.data_types.CancelFineTuneResponse.success","title":"success  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>success: bool = Field(\n    ..., description=\"Whether cancellation was successful.\"\n)\n</code></pre> <p>Whether the cancellation succeeded.</p>"},{"location":"api/data_types/#llmengine.GetLLMEndpointResponse","title":"GetLLMEndpointResponse","text":"<p>             Bases: <code>BaseModel</code></p> <p>Response object for retrieving a Model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.name","title":"name  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>name: str = Field(\n    description=\"The name of the model. 
Use this for making inference requests to the model.\"\n)\n</code></pre> <p>The name of the model. Use this for making inference requests to the model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.source","title":"source  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>source: LLMSource = Field(\n    description=\"The source of the model, e.g. Hugging Face.\"\n)\n</code></pre> <p>The source of the model, e.g. Hugging Face.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.inference_framework","title":"inference_framework  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>inference_framework: LLMInferenceFramework = Field(\n    description=\"The inference framework used by the model.\"\n)\n</code></pre> <p>The inference framework used by the model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.id","title":"id  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>id: Optional[str] = Field(\n    default=None,\n    description=\"(For self-hosted users) The autogenerated ID of the model.\",\n)\n</code></pre> <p>(For self-hosted users) The autogenerated ID of the model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.model_name","title":"model_name  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>model_name: Optional[str] = Field(\n    default=None,\n    description=\"(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as `name`.\",\n)\n</code></pre> <p>(For self-hosted users) For fine-tuned models, the base model. 
For base models, this will be the same as <code>name</code>.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.inference_framework_tag","title":"inference_framework_tag  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>inference_framework_tag: Optional[str] = Field(\n    default=None,\n    description=\"(For self-hosted users) The Docker image tag used to run the model.\",\n)\n</code></pre> <p>(For self-hosted users) The Docker image tag used to run the model.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.num_shards","title":"num_shards  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>num_shards: Optional[int] = Field(\n    default=None,\n    description=\"(For self-hosted users) The number of shards.\",\n)\n</code></pre> <p>(For self-hosted users) The number of shards.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.quantize","title":"quantize  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>quantize: Optional[Quantization] = Field(\n    default=None,\n    description=\"(For self-hosted users) The quantization method.\",\n)\n</code></pre> <p>(For self-hosted users) The quantization method.</p>"},{"location":"api/data_types/#llmengine.data_types.GetLLMEndpointResponse.spec","title":"spec  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>spec: Optional[GetModelEndpointResponse] = Field(\n    default=None,\n    description=\"(For self-hosted users) Model endpoint details.\",\n)\n</code></pre> <p>(For self-hosted users) Model endpoint details.</p>"},{"location":"api/data_types/#llmengine.ListLLMEndpointsResponse","title":"ListLLMEndpointsResponse","text":"<p>             Bases: <code>BaseModel</code></p> <p>Response object for listing 
Models.</p>"},{"location":"api/data_types/#llmengine.data_types.ListLLMEndpointsResponse.model_endpoints","title":"model_endpoints  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>model_endpoints: List[GetLLMEndpointResponse] = Field(\n    ..., description=\"The list of models.\"\n)\n</code></pre> <p>A list of Models, represented as <code>GetLLMEndpointResponse</code>s.</p>"},{"location":"api/data_types/#llmengine.DeleteLLMEndpointResponse","title":"DeleteLLMEndpointResponse","text":"<p>             Bases: <code>BaseModel</code></p> <p>Response object for deleting a Model.</p>"},{"location":"api/data_types/#llmengine.data_types.DeleteLLMEndpointResponse.deleted","title":"deleted  <code>class-attribute</code> <code>instance-attribute</code>","text":"<pre><code>deleted: bool = Field(\n    ..., description=\"Whether deletion was successful.\"\n)\n</code></pre> <p>Whether the deletion succeeded.</p>"},{"location":"api/error_handling/","title":"Error handling","text":"<p>LLM Engine uses conventional HTTP response codes to indicate the success or failure of an API request. In general, codes in the <code>2xx</code> range indicate success. Codes in the <code>4xx</code> range indicate a request that failed given the information provided (e.g. a given Model was not found, or an invalid temperature was specified). Codes in the <code>5xx</code> range indicate an error with the LLM Engine servers.</p> <p>In the Python client, errors are presented via a set of corresponding Exception classes, which should be caught and handled by the user accordingly.</p>"},{"location":"api/error_handling/#llmengine.errors.BadRequestError","title":"BadRequestError","text":"<pre><code>BadRequestError(message: str)\n</code></pre> <p>             Bases: <code>Exception</code></p> <p>Corresponds to HTTP 400. Indicates that the request had inputs that were invalid. 
The user should not attempt to retry the request without changing the inputs.</p>"},{"location":"api/error_handling/#llmengine.errors.UnauthorizedError","title":"UnauthorizedError","text":"<pre><code>UnauthorizedError(message: str)\n</code></pre> <p>             Bases: <code>Exception</code></p> <p>Corresponds to HTTP 401. This means that no valid API key was provided.</p>"},{"location":"api/error_handling/#llmengine.errors.NotFoundError","title":"NotFoundError","text":"<pre><code>NotFoundError(message: str)\n</code></pre> <p>             Bases: <code>Exception</code></p> <p>Corresponds to HTTP 404. This means that the resource (e.g. a Model, FineTune, etc.) could not be found. Note that this can also be returned in some cases where the object might exist, but the user does not have access to the object. This is done to avoid leaking information about the existence or nonexistence of said object that the user does not have access to.</p>"},{"location":"api/error_handling/#llmengine.errors.RateLimitExceededError","title":"RateLimitExceededError","text":"<pre><code>RateLimitExceededError(message: str)\n</code></pre> <p>             Bases: <code>Exception</code></p> <p>Corresponds to HTTP 429. Too many requests hit the API too quickly. We recommend an exponential backoff for retries.</p>"},{"location":"api/error_handling/#llmengine.errors.ServerError","title":"ServerError","text":"<pre><code>ServerError(status_code: int, message: str)\n</code></pre> <p>             Bases: <code>Exception</code></p> <p>Corresponds to HTTP 5xx errors on the server.</p>"},{"location":"api/langchain/","title":"\ud83e\udd9c Langchain","text":"<p>Coming soon!</p>"},{"location":"api/python_client/","title":"\ud83d\udc0d Python Client API Reference","text":""},{"location":"api/python_client/#llmengine.Completion","title":"Completion","text":"<p>             Bases: <code>APIEngine</code></p> <p>Completion API. 
This API is used to generate text completions.</p> <p>Language models are trained to understand natural language and predict text outputs as a response to their inputs. The inputs are called prompts and the outputs are referred to as completions. LLMs take the input prompts and chunk them into smaller units called tokens to process and generate language. Tokens may include trailing spaces and even sub-words; this process is language dependent.</p> <p>The Completion API can be run either synchronously or asynchronously (via Python <code>asyncio</code>). For each of these modes, you can also choose whether to stream token responses or not.</p>"},{"location":"api/python_client/#llmengine.completion.Completion.create","title":"create  <code>classmethod</code>","text":"<pre><code>create(\n    model: str,\n    prompt: str,\n    max_new_tokens: int = 20,\n    temperature: float = 0.2,\n    timeout: int = 10,\n    stream: bool = False,\n) -&gt; Union[\n    CompletionSyncResponse,\n    Iterator[CompletionStreamResponse],\n]\n</code></pre> <p>Creates a completion for the provided prompt and parameters synchronously.</p> <p>This API can be used to get the LLM to generate a completion synchronously. It takes as parameters the <code>model</code> (see Model Zoo) and the <code>prompt</code>. Optionally it takes <code>max_new_tokens</code>, <code>temperature</code>, <code>timeout</code> and <code>stream</code>. It returns a <code>CompletionSyncResponse</code> if <code>stream=False</code>, or an iterator of <code>CompletionStreamResponse</code> objects, each with <code>request_id</code> and <code>output</code> fields, if <code>stream=True</code>.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>str</code> <p>Name of the model to use. 
See Model Zoo for a list of Models that are supported.</p> required <code>prompt</code> <code>str</code> <p>The prompt to generate completions for, encoded as a string.</p> required <code>max_new_tokens</code> <code>int</code> <p>The maximum number of tokens to generate in the completion.</p> <p>The token count of your prompt plus <code>max_new_tokens</code> cannot exceed the model's context length. See Model Zoo for information on each supported model's context length.</p> <code>20</code> <code>temperature</code> <code>float</code> <p>What sampling temperature to use, in the range <code>(0, 1]</code>. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.</p> <code>0.2</code> <code>timeout</code> <code>int</code> <p>Timeout in seconds. This is the maximum amount of time you are willing to wait for a response.</p> <code>10</code> <code>stream</code> <code>bool</code> <p>Whether to stream the response. If true, the return type is an <code>Iterator[CompletionStreamResponse]</code>. Otherwise, the return type is a <code>CompletionSyncResponse</code>. 
When streaming, tokens will be sent as data-only server-sent events.</p> <code>False</code> <p>Returns:</p> Name Type Description <code>response</code> <code>Union[CompletionSyncResponse, Iterator[CompletionStreamResponse]]</code> <p>The generated response (if <code>stream=False</code>) or iterator of response chunks (if <code>stream=True</code>)</p> <p>Token streaming can be used to reduce perceived latency for applications:</p> Synchronous completion without token streaming in pythonResponse in json <pre><code>from llmengine import Completion\n\nresponse = Completion.create(\n    model=\"llama-7b\",\n    prompt=\"Hello, my name is\",\n    max_new_tokens=10,\n    temperature=0.2,\n)\nprint(response.json())\n</code></pre> <pre><code>{\n    \"request_id\": \"0123456789\",\n    \"outputs\":\n    [\n        {\n            \"text\": \"_______ and I am a _______\",\n            \"num_completion_tokens\": 10\n        }\n    ],\n    \"traceback\": null\n}\n</code></pre> <p>Here is how applications can use streaming:</p> Synchronous completion with token streaming in pythonResponse in json <pre><code>from llmengine import Completion\n\nstream = Completion.create(\n    model=\"llama-7b\",\n    prompt=\"why is the sky blue?\",\n    max_new_tokens=5,\n    temperature=0.2,\n    stream=True,\n)\n\nfor response in stream:\n    if response.output:\n        print(response.json())\n</code></pre> <pre><code>{\"request_id\": \"0123456789\", \"output\": {\"text\": \"\\n\", \"finished\": false, \"num_completion_tokens\": 1 } }\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \"I\", \"finished\": false, \"num_completion_tokens\": 2 } }\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \" don\", \"finished\": false, \"num_completion_tokens\": 3 } }\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \"\u2019\", \"finished\": false, \"num_completion_tokens\": 4 } }\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \"t\", \"finished\": true, 
\"num_completion_tokens\": 5 } }\n</code></pre>"},{"location":"api/python_client/#llmengine.completion.Completion.acreate","title":"acreate  <code>async</code> <code>classmethod</code>","text":"<pre><code>acreate(\n    model: str,\n    prompt: str,\n    max_new_tokens: int = 20,\n    temperature: float = 0.2,\n    timeout: int = 10,\n    stream: bool = False,\n) -&gt; Union[\n    CompletionSyncResponse,\n    AsyncIterable[CompletionStreamResponse],\n]\n</code></pre> <p>Creates a completion for the provided prompt and parameters asynchronously (with <code>asyncio</code>).</p> <p>This API can be used to get the LLM to generate a completion asynchronously. It takes as parameters the <code>model</code>see Model Zoo and the <code>prompt</code>. Optionally it takes <code>max_new_tokens</code>, <code>temperature</code>, <code>timeout</code> and <code>stream</code>. It returns CompletionSyncV1Response if <code>stream=False</code> or an async iterator of CompletionStreamV1Response with <code>request_id</code> and <code>outputs</code> fields.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>str</code> <p>Name of the model to use. See Model Zoo for a list of Models that are supported.</p> required <code>prompt</code> <code>str</code> <p>The prompt to generate completions for, encoded as a string.</p> required <code>max_new_tokens</code> <code>int</code> <p>The maximum number of tokens to generate in the completion.</p> <p>The token count of your prompt plus <code>max_new_tokens</code> cannot exceed the model's context length. See Model Zoo for information on each supported model's context length.</p> <code>20</code> <code>temperature</code> <code>float</code> <p>What sampling temperature to use, in the range <code>(0, 1]</code>. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.</p> <code>0.2</code> <code>timeout</code> <code>int</code> <p>Timeout in seconds. 
This is the maximum amount of time you are willing to wait for a response.</p> <code>10</code> <code>stream</code> <code>bool</code> <p>Whether to stream the response. If true, the return type is an <code>AsyncIterable[CompletionStreamResponse]</code>. Otherwise, the return type is a <code>CompletionSyncResponse</code>. When streaming, tokens will be sent as data-only server-sent events.</p> <code>False</code> <p>Returns:</p> Name Type Description <code>response</code> <code>Union[CompletionSyncResponse, AsyncIterable[CompletionStreamResponse]]</code> <p>The generated response (if <code>stream=False</code>) or async iterator of response chunks (if <code>stream=True</code>)</p> <p>Token streaming can be used to reduce perceived latency for applications:</p> Asynchronous completion without token streaming in pythonResponse in json <pre><code>import asyncio\nfrom llmengine import Completion\n\nasync def main():\n    response = await Completion.acreate(\n        model=\"llama-7b\",\n        prompt=\"Hello, my name is\",\n        max_new_tokens=10,\n        temperature=0.2,\n    )\n    print(response.json())\n\nasyncio.run(main())\n</code></pre> <pre><code>{\n    \"request_id\": \"b1b2c3d4e5f6g7h8i9j0\",\n    \"outputs\":\n    [\n        {\n            \"text\": \"_______, and I am a _____\",\n            \"num_completion_tokens\": 10\n        }\n    ]\n}\n</code></pre> <p>Here is how applications can use streaming:</p> Asynchronous completion with token streaming in pythonResponse in json <pre><code>import asyncio\nfrom llmengine import Completion\n\nasync def main():\n    stream = await Completion.acreate(\n        model=\"llama-7b\",\n        prompt=\"why is the sky blue?\",\n        max_new_tokens=5,\n        temperature=0.2,\n        stream=True,\n    )\n\n    async for response in stream:\n        if response.output:\n            print(response.json())\n\nasyncio.run(main())\n</code></pre> <pre><code>{\"request_id\": \"0123456789\", \"output\": {\"text\": \"\\n\", \"finished\": false, 
\"num_completion_tokens\": 1}}\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \"I\", \"finished\": false, \"num_completion_tokens\": 2}}\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \" think\", \"finished\": false, \"num_completion_tokens\": 3}}\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \" the\", \"finished\": false, \"num_completion_tokens\": 4}}\n{\"request_id\": \"0123456789\", \"output\": {\"text\": \" sky\", \"finished\": true, \"num_completion_tokens\": 5}}\n</code></pre>"},{"location":"api/python_client/#llmengine.FineTune","title":"FineTune","text":"<p>             Bases: <code>APIEngine</code></p> <p>FineTune API. This API is used to fine-tune models.</p> <p>Fine-tuning is a process where the LLM is further trained on a task-specific dataset, allowing the model to adjust its parameters to better align with the task at hand. Fine-tuning is a supervised training phase, where prompt/response pairs are provided to optimize the performance of the LLM.</p> <p>LLM Engine provides APIs to create fine-tunes on a base model with training &amp; validation datasets. APIs are also provided to list, cancel and retrieve fine-tuning jobs.</p> <p>Creating a fine-tune will end with the creation of a Model, which you can view using <code>Model.get(model_name)</code> or delete using <code>Model.delete(model_name)</code>.</p>"},{"location":"api/python_client/#llmengine.fine_tuning.FineTune.create","title":"create  <code>classmethod</code>","text":"<pre><code>create(\n    model: str,\n    training_file: str,\n    validation_file: Optional[str] = None,\n    hyperparameters: Optional[Dict[str, str]] = None,\n    suffix: Optional[str] = None,\n) -&gt; CreateFineTuneResponse\n</code></pre> <p>Creates a job that fine-tunes a specified model from a given dataset.</p> <p>This API can be used to fine-tune a model. The model is the name of base model (Model Zoo for available models) to fine-tune. 
The training file should consist of prompt and response pairs. Your data must be formatted as a CSV file that includes two columns: <code>prompt</code> and <code>response</code>. A maximum of 100,000 rows of data is currently supported. At least 200 rows of data is recommended to start to see benefits from fine-tuning.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>`str`</code> <p>The name of the base model to fine-tune. See Model Zoo for the list of available models to fine-tune.</p> required <code>training_file</code> <code>`str`</code> <p>Publicly accessible URL to a CSV file for training.</p> required <code>validation_file</code> <code>`Optional[str]`</code> <p>Publicly accessible URL to a CSV file for validation.</p> <code>None</code> <code>hyperparameters</code> <code>`Optional[Dict[str, str]]`</code> <p>A dict of hyperparameters to customize fine-tuning behavior.</p> <p>Currently supported hyperparameters:</p> <ul> <li><code>lr</code>: Peak learning rate used during fine-tuning. It decays with a cosine schedule afterward. (Default: 2e-5)</li> <li><code>warmup_ratio</code>: Ratio of training steps used for learning rate warmup. (Default: 0.03)</li> <li><code>epochs</code>: Number of fine-tuning epochs. (Default: 5)</li> <li><code>weight_decay</code>: Regularization penalty applied to learned weights. (Default: 0.001)</li> <li><code>max_seq_length</code>: Maximum number of tokens per sequence in the dataset. 
(Default: 1024)</li> </ul> <code>None</code> <code>suffix</code> <code>`Optional[str]`</code> <p>A string that will be added to your fine-tuned model name.</p> <code>None</code> <p>Returns:</p> Name Type Description <code>CreateFineTuneResponse</code> <code>CreateFineTuneResponse</code> <p>an object that contains the ID of the created fine-tuning job</p> <p>Here is an example script to create a 5-row CSV of properly formatted data for fine-tuning an airline question answering bot:</p> Formatting data in python <pre><code>import csv\n# Define data\ndata = [\n  (\"What is your policy on carry-on luggage?\", \"Our policy allows each passenger to bring one piece of carry-on luggage and one personal item such as a purse or briefcase. The maximum size for carry-on luggage is 22 x 14 x 9 inches.\"),\n  (\"How can I change my flight?\", \"You can change your flight through our website or mobile app. Go to 'Manage my booking' section, enter your booking reference and last name, then follow the prompts to change your flight.\"),\n  (\"What meals are available on my flight?\", \"We offer a variety of meals depending on the flight's duration and route. These can range from snacks and light refreshments to full-course meals on long-haul flights. 
Specific meal options can be viewed during the booking process.\"),\n  (\"How early should I arrive at the airport before my flight?\", \"We recommend arriving at least two hours before domestic flights and three hours before international flights.\"),\n  (\"Can I select my seat in advance?\", \"Yes, you can select your seat during the booking process or afterwards via the 'Manage my booking' section on our website or mobile app.\"),\n  ]\n\n# Write data to a CSV file\nwith open('customer_service_data.csv', 'w', newline='') as file:\n    writer = csv.writer(file)\n    writer.writerow([\"prompt\", \"response\"])\n    writer.writerows(data)\n</code></pre> <p>Example code for fine-tuning:</p> Fine-tuning in pythonResponse in json <pre><code>from llmengine import FineTune\n\nresponse = FineTune.create(\n    model=\"llama-7b\",\n    training_file=\"https://my-bucket.s3.us-west-2.amazonaws.com/path/to/training-file.csv\",\n)\n\nprint(response.json())\n</code></pre> <pre><code>{\n    \"fine_tune_id\": \"ft_abc123\"\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.fine_tuning.FineTune.get","title":"get  <code>classmethod</code>","text":"<pre><code>get(fine_tune_id: str) -&gt; GetFineTuneResponse\n</code></pre> <p>Get status of a fine-tuning job.</p> <p>This API can be used to get the status of an already running fine-tuning job. 
It takes as a single parameter the <code>fine_tune_id</code> and returns a GetFineTuneResponse object with the id and status (<code>PENDING</code>, <code>STARTED</code>, <code>UNDEFINED</code>, <code>FAILURE</code> or <code>SUCCESS</code>).</p> <p>Parameters:</p> Name Type Description Default <code>fine_tune_id</code> <code>`str`</code> <p>ID of the fine-tuning job</p> required <p>Returns:</p> Name Type Description <code>GetFineTuneResponse</code> <code>GetFineTuneResponse</code> <p>an object that contains the ID and status of the requested job</p> Getting status of fine-tuning in pythonResponse in json <pre><code>from llmengine import FineTune\n\nresponse = FineTune.get(\n    fine_tune_id=\"ft_abc123\",\n)\n\nprint(response.json())\n</code></pre> <pre><code>{\n    \"fine_tune_id\": \"ft_abc123\",\n    \"status\": \"STARTED\"\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.fine_tuning.FineTune.list","title":"list  <code>classmethod</code>","text":"<pre><code>list() -&gt; ListFineTunesResponse\n</code></pre> <p>List fine-tuning jobs.</p> <p>This API can be used to list all the fine-tuning jobs. 
It returns a list of pairs of <code>fine_tune_id</code> and <code>status</code> for all existing jobs.</p> <p>Returns:</p> Name Type Description <code>ListFineTunesResponse</code> <code>ListFineTunesResponse</code> <p>an object that contains a list of all fine-tuning jobs and their statuses</p> Listing fine-tuning jobs in pythonResponse in json <pre><code>from llmengine import FineTune\n\nresponse = FineTune.list()\nprint(response.json())\n</code></pre> <pre><code>{\n    \"jobs\": [\n        {\n            \"fine_tune_id\": \"ft_abc123\",\n            \"status\": \"STARTED\"\n},\n        {\n            \"fine_tune_id\": \"ft_def456\",\n            \"status\": \"SUCCESS\"\n}\n    ]\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.fine_tuning.FineTune.cancel","title":"cancel  <code>classmethod</code>","text":"<pre><code>cancel(fine_tune_id: str) -&gt; CancelFineTuneResponse\n</code></pre> <p>Cancel a fine-tuning job.</p> <p>This API can be used to cancel an existing fine-tuning job if it's no longer required. It takes as parameter the <code>fine_tune_id</code> and returns a response object which has a <code>success</code> field confirming if the cancellation was successful.</p> <p>Parameters:</p> Name Type Description Default <code>fine_tune_id</code> <code>`str`</code> <p>ID of the fine-tuning job</p> required <p>Returns:</p> Name Type Description <code>CancelFineTuneResponse</code> <code>CancelFineTuneResponse</code> <p>an object that contains whether the cancellation was successful</p> Cancelling fine-tuning job in pythonResponse in json <pre><code>from llmengine import FineTune\n\nresponse = FineTune.cancel(fine_tune_id=\"ft_abc123\")\nprint(response.json())\n</code></pre> <pre><code>{\n    \"success\": true\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.Model","title":"Model","text":"<p>             Bases: <code>APIEngine</code></p> <p>Model API. This API is used to get, list, and delete models. 
Models include both base models built into LLM Engine, and fine-tuned models that you create through the FineTune.create() API.</p> <p>See Model Zoo for the list of publicly available base models.</p>"},{"location":"api/python_client/#llmengine.model.Model.get","title":"get  <code>classmethod</code>","text":"<pre><code>get(model: str) -&gt; GetLLMEndpointResponse\n</code></pre> <p>Get information about an LLM model.</p> <p>This API can be used to get information about a Model's source and inference framework. For self-hosted users, it returns additional information about number of shards, quantization, infra settings, etc. The function takes as a single parameter the name <code>model</code> and returns a GetLLMEndpointResponse object.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>`str`</code> <p>Name of the model</p> required <p>Returns:</p> Name Type Description <code>GetLLMEndpointResponse</code> <code>GetLLMEndpointResponse</code> <p>object representing the LLM and configurations</p> Accessing model in pythonResponse in json <pre><code>from llmengine import Model\n\nresponse = Model.get(\"llama-7b.suffix.2023-07-18-12-00-00\")\n\nprint(response.json())\n</code></pre> <pre><code>{\n    \"id\": null,\n    \"name\": \"llama-7b.suffix.2023-07-18-12-00-00\",\n    \"model_name\": null,\n    \"source\": \"hugging_face\",\n    \"inference_framework\": \"text_generation_inference\",\n    \"inference_framework_tag\": null,\n    \"num_shards\": null,\n    \"quantize\": null,\n    \"spec\": null\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.model.Model.list","title":"list  <code>classmethod</code>","text":"<pre><code>list() -&gt; ListLLMEndpointsResponse\n</code></pre> <p>List LLM models available to call inference on.</p> <p>This API can be used to list all available models, including both publicly available models and user-created fine-tuned models. It returns a list of GetLLMEndpointResponse objects for all models. 
The most important field is the model <code>name</code>.</p> <p>Returns:</p> Name Type Description <code>ListLLMEndpointsResponse</code> <code>ListLLMEndpointsResponse</code> <p>list of models</p> Listing available models in pythonResponse in json <pre><code>from llmengine import Model\n\nresponse = Model.list()\nprint(response.json())\n</code></pre> <pre><code>{\n    \"model_endpoints\": [\n        {\n            \"id\": null,\n            \"name\": \"llama-7b.suffix.2023-07-18-12-00-00\",\n            \"model_name\": null,\n            \"source\": \"hugging_face\",\n            \"inference_framework\": \"text_generation_inference\",\n            \"inference_framework_tag\": null,\n            \"num_shards\": null,\n            \"quantize\": null,\n            \"spec\": null\n},\n        {\n            \"id\": null,\n            \"name\": \"llama-7b\",\n            \"model_name\": null,\n            \"source\": \"hugging_face\",\n            \"inference_framework\": \"text_generation_inference\",\n            \"inference_framework_tag\": null,\n            \"num_shards\": null,\n            \"quantize\": null,\n            \"spec\": null\n},\n        {\n            \"id\": null,\n            \"name\": \"llama-13b-deepspeed-sync\",\n            \"model_name\": null,\n            \"source\": \"hugging_face\",\n            \"inference_framework\": \"deepspeed\",\n            \"inference_framework_tag\": null,\n            \"num_shards\": null,\n            \"quantize\": null,\n            \"spec\": null\n},\n        {\n            \"id\": null,\n            \"name\": \"falcon-40b\",\n            \"model_name\": null,\n            \"source\": \"hugging_face\",\n            \"inference_framework\": \"text_generation_inference\",\n            \"inference_framework_tag\": null,\n            \"num_shards\": null,\n            \"quantize\": null,\n            \"spec\": null\n}\n    
]\n}\n</code></pre>"},{"location":"api/python_client/#llmengine.model.Model.delete","title":"delete  <code>classmethod</code>","text":"<pre><code>delete(model: str) -&gt; DeleteLLMEndpointResponse\n</code></pre> <p>Deletes an LLM model.</p> <p>This API can be used to delete a fine-tuned model. It takes as parameter the name of the <code>model</code> and returns a response object which has a <code>deleted</code> field confirming if the deletion was successful. If called on a base model included with LLM Engine, an error will be thrown.</p> <p>Parameters:</p> Name Type Description Default <code>model</code> <code>`str`</code> <p>Name of the model</p> required <p>Returns:</p> Name Type Description <code>response</code> <code>DeleteLLMEndpointResponse</code> <p>whether the model was successfully deleted</p> Deleting model in pythonResponse in json <pre><code>from llmengine import Model\n\nresponse = Model.delete(\"llama-7b.suffix.2023-07-18-12-00-00\")\nprint(response.json())\n</code></pre> <pre><code>{\n    \"deleted\": true\n}\n</code></pre>"},{"location":"guides/completions/","title":"Completions","text":"<p>Language Models are trained to understand natural language and provide text outputs as a response to their inputs. The inputs are called prompts and outputs are referred to as completions. LLMs take the input prompts and chunk them into smaller units called tokens to process and generate language. 
Tokens may include trailing spaces and even sub-words; this process is language dependent.</p> <p>Scale LLM Engine provides access to open source language models (see Model Zoo) that can be used for producing completions to prompts.</p>"},{"location":"guides/completions/#completion-api-call","title":"Completion API call","text":"<p>An example API call looks as follows:</p> <pre><code>from llmengine import Completion\n\nresponse = Completion.create(\n    model=\"llama-7b\",\n    prompt=\"Hello, my name is\",\n    max_new_tokens=10,\n    temperature=0.2,\n)\n</code></pre> <p>The model parameter is the name of the LLM to be used (see Model Zoo). The prompt is the main input for the LLM to respond to. The max_new_tokens parameter is the maximum number of tokens to generate in the completion. The temperature is the sampling temperature to use. Higher values make the output more random, while lower values will make it more focused and deterministic.</p> <p>See the full API reference documentation to learn more.</p>"},{"location":"guides/completions/#completion-api-response","title":"Completion API response","text":"<p>An example Completion API response looks as follows:</p> Response in jsonResponse in python <pre><code>{\n  \"outputs\": [\n    {\n      \"text\": \"_______ and I am a _______\",\n      \"num_completion_tokens\": 10\n}\n  ]\n}\n</code></pre> <pre><code>&gt;&gt;&gt; print( response.outputs[0].text )\n________. Hello, my name is _\n</code></pre>"},{"location":"guides/completions/#token-streaming","title":"Token streaming","text":"<p>The Completions API supports token streaming to reduce perceived latency for certain applications. 
When streaming, tokens will be sent as data-only server-sent events.</p> <p>To enable token streaming, pass <code>stream=True</code> to either <code>Completion.create</code> or <code>Completion.acreate</code>.</p> <p>An example of token streaming using the synchronous Completions API looks as follows:</p> Token streaming with synchronous API in python <pre><code>from llmengine import Completion\n\nstream = Completion.create(\n    model=\"llama-7b\",\n    prompt=\"why is the sky blue?\",\n    max_new_tokens=5,\n    temperature=0.2,\n    stream=True,\n)\n\nfor response in stream:\n    if response.output:\n        print(response.json())\n</code></pre>"},{"location":"guides/completions/#async-requests","title":"Async requests","text":"<p>The Python client supports <code>asyncio</code> for creating Completions. Use <code>Completion.acreate</code> instead of <code>Completion.create</code> to utilize async processing. The function signatures are otherwise identical.</p> <p>An example of async Completions looks as follows:</p> Completions with asynchronous API in python <pre><code>import asyncio\nfrom llmengine import Completion\n\nasync def main():\n    response = await Completion.acreate(\n        model=\"llama-7b\",\n        prompt=\"Hello, my name is\",\n        max_new_tokens=10,\n        temperature=0.2,\n    )\n    print(response.json())\n\nasyncio.run(main())\n</code></pre>"},{"location":"guides/completions/#which-model-should-i-use","title":"Which model should I use?","text":"<p>See the Model Zoo for more information on best practices for which model to use for Completions.</p>"},{"location":"guides/fine_tuning/","title":"Fine-tuning","text":"<p>Learn how to customize your models on your data with fine-tuning.</p>"},{"location":"guides/fine_tuning/#introduction","title":"Introduction","text":"<p>Fine-tuning helps improve model performance by training on specific examples of prompts and desired responses. 
LLMs are initially trained on data collected from the entire internet. With fine-tuning, LLMs can be optimized to perform better in a specific domain by learning from examples for that domain. Smaller LLMs that have been fine-tuned on a specific use case often outperform larger ones that were trained more generally.</p> <p>Fine-tuning allows for:</p> <ol> <li>Higher quality results than prompt engineering alone</li> <li>Cost savings through shorter prompts</li> <li>The ability to reach equivalent accuracy with a smaller model</li> <li>Lower latency at inference time</li> <li>The chance to show an LLM more examples than can fit in a single context window</li> </ol> <p>LLM Engine's fine-tuning API lets you fine-tune various open source LLMs on your own data and then make inference calls to the resulting LLM. For more specific details, see the fine-tuning API reference.</p>"},{"location":"guides/fine_tuning/#producing-high-quality-data-for-fine-tuning","title":"Producing high quality data for fine-tuning","text":"<p>The training data for fine-tuning should consist of prompt and response pairs.</p> <p>As a rule of thumb, you should expect to see linear improvements in your fine-tuned model's quality with each doubling of the dataset size. Having high-quality data is also essential to improving performance. For every linear increase in the error rate in your training data, you may encounter a roughly quadratic increase in your fine-tuned model's error rate.</p> <p>High quality data is critical to achieve improved model performance, and in several cases will require experts to  generate and prepare data - the breadth and diversity of the data is highly critical. Scale's Data Engine can help  prepare such high quality, diverse data sets - more information here.</p>"},{"location":"guides/fine_tuning/#preparing-data","title":"Preparing data","text":"<p>Your data must be formatted as a CSV file that includes two columns: <code>prompt</code> and <code>response</code>. 
A maximum of 100,000 rows of data is currently supported. At least 200 rows of data is recommended to start to see benefits from fine-tuning.</p> <p>Here is an example script to create a 50-row CSV of properly formatted data for fine-tuning an airline question answering bot</p> Creating a sample dataset <pre><code>import csv\n# Define data\ndata = [\n    (\"What is your policy on carry-on luggage?\", \"Our policy allows each passenger to bring one piece of carry-on luggage and one personal item such as a purse or briefcase. The maximum size for carry-on luggage is 22 x 14 x 9 inches.\"),\n    (\"How can I change my flight?\", \"You can change your flight through our website or mobile app. Go to 'Manage my booking' section, enter your booking reference and last name, then follow the prompts to change your flight.\"),\n    (\"What meals are available on my flight?\", \"We offer a variety of meals depending on the flight's duration and route. These can range from snacks and light refreshments to full-course meals on long-haul flights. Specific meal options can be viewed during the booking process.\"),\n    (\"How early should I arrive at the airport before my flight?\", \"We recommend arriving at least two hours before domestic flights and three hours before international flights.\"),\n    (\"Can I select my seat in advance?\", \"Yes, you can select your seat during the booking process or afterwards via the 'Manage my booking' section on our website or mobile app.\"),\n    (\"What should I do if my luggage is lost?\", \"If your luggage is lost, please report this immediately at our 'Lost and Found' counter at the airport. We will assist you in tracking your luggage.\"),\n    (\"Do you offer special assistance for passengers with disabilities?\", \"Yes, we offer special assistance for passengers with disabilities. 
Please notify us of your needs at least 48 hours prior to your flight.\"),\n    (\"Can I bring my pet on the flight?\", \"Yes, we allow small pets in the cabin, and larger pets in the cargo hold. Please check our pet policy for more details.\"),\n    (\"What is your policy on flight cancellations?\", \"In case of flight cancellations, we aim to notify passengers as early as possible and offer either a refund or a rebooking on the next available flight.\"),\n    (\"Can I get a refund if I cancel my flight?\", \"Refunds depend on the type of ticket purchased. Please check our cancellation policy for details. Non-refundable tickets, however, are typically not eligible for refunds unless due to extraordinary circumstances.\"),\n    (\"How can I check-in for my flight?\", \"You can check-in for your flight either online, through our mobile app, or at the airport. Online and mobile app check-in opens 24 hours before departure and closes 90 minutes before.\"),\n    (\"Do you offer free meals on your flights?\", \"Yes, we serve free meals on all long-haul flights. For short-haul flights, we offer a complimentary drink and snack. Special meal requests should be made at least 48 hours before departure.\"),\n    (\"Can I use my electronic devices during the flight?\", \"Small electronic devices can be used throughout the flight in flight mode. Larger devices like laptops may be used above 10,000 feet.\"),\n    (\"How much baggage can I check-in?\", \"The checked baggage allowance depends on the class of travel and route. The details would be mentioned on your ticket, or you can check on our website.\"),\n    (\"How can I request for a wheelchair?\", \"To request a wheelchair or any other special assistance, please call our customer service at least 48 hours before your flight.\"),\n    (\"Do I get a discount for group bookings?\", \"Yes, we offer discounts on group bookings of 10 or more passengers. 
Please contact our group bookings team for more information.\"),\n    (\"Do you offer Wi-fi on your flights?\", \"Yes, we offer complimentary Wi-fi on select flights. You can check the availability during the booking process.\"),\n    (\"What is the minimum connecting time between flights?\", \"The minimum connecting time varies depending on the airport and whether your flight is international or domestic. Generally, it's recommended to allow at least 45-60 minutes for domestic connections and 60-120 minutes for international.\"),\n    (\"Do you offer duty-free shopping on international flights?\", \"Yes, we have a selection of duty-free items that you can pre-order on our website or purchase onboard on international flights.\"),\n    (\"Can I upgrade my ticket to business class?\", \"Yes, you can upgrade your ticket through the 'Manage my booking' section on our website or by contacting our customer service. The availability and costs depend on the specific flight.\"),\n    (\"Can unaccompanied minors travel on your flights?\", \"Yes, we do accommodate unaccompanied minors on our flights, with special services to ensure their safety and comfort. Please contact our customer service for more details.\"),\n    (\"What amenities do you provide in business class?\", \"In business class, you will enjoy additional legroom, reclining seats, premium meals, priority boarding and disembarkation, access to our business lounge, extra baggage allowance, and personalized service.\"),\n    (\"How much does extra baggage cost?\", \"Extra baggage costs vary based on flight route and the weight of the baggage. 
Please refer to our 'Extra Baggage' section on the website for specific rates.\"),\n    (\"Are there any specific rules for carrying liquids in carry-on?\", \"Yes, liquids carried in your hand luggage must be in containers of 100 ml or less and they should all fit into a single, transparent, resealable plastic bag of 20 cm x 20 cm.\"),\n    (\"What if I have a medical condition that requires special assistance during the flight?\", \"We aim to make the flight comfortable for all passengers. If you have a medical condition that may require special assistance, please contact our \u2018special services\u2019 team 48 hours before your flight.\"),\n    (\"What in-flight entertainment options are available?\", \"We offer a range of in-flight entertainment options including a selection of movies, TV shows, music, and games, available on your personal seat-back screen.\"),\n    (\"What types of payment methods do you accept?\", \"We accept credit/debit cards, PayPal, bank transfers, and various other forms of payment. The available options may vary depending on the country of departure.\"),\n    (\"How can I earn and redeem frequent flyer miles?\", \"You can earn miles for every journey you take with us or our partner airlines. These miles can be redeemed for flight tickets, upgrades, or various other benefits. To earn and redeem miles, you need to join our frequent flyer program.\"),\n    (\"Can I bring a stroller for my baby?\", \"Yes, you can bring a stroller for your baby. It can be checked in for free and will normally be given back to you at the aircraft door upon arrival.\"),\n    (\"What age does my child have to be to qualify as an unaccompanied minor?\", \"Children aged between 5 and 12 years who are traveling alone are considered unaccompanied minors. 
Our team provides special care for these children from departure to arrival.\"),\n    (\"What documents do I need to travel internationally?\", \"For international travel, you need a valid passport and may also require visas, depending on your destination and your country of residence. It's important to check the specific requirements before you travel.\"),\n    (\"What happens if I miss my flight?\", \"If you miss your flight, please contact our customer service immediately. Depending on the circumstances, you may be able to rebook on a later flight, but additional fees may apply.\"),\n    (\"Can I travel with my musical instrument?\", \"Yes, small musical instruments can be brought on board as your one carry-on item. Larger instruments must be transported in the cargo, or if small enough, a seat may be purchased for them.\"),\n    (\"Do you offer discounts for children or infants?\", \"Yes, children aged 2-11 traveling with an adult usually receive a discount on the fare. Infants under the age of 2 who do not occupy a seat can travel for a reduced fare or sometimes for free.\"),\n    (\"Is smoking allowed on your flights?\", \"No, all our flights are non-smoking for the comfort and safety of all passengers.\"),\n    (\"Do you have family seating?\", \"Yes, we offer the option to seat families together. You can select seats during booking or afterwards through the 'Manage my booking' section on the website.\"),\n    (\"Is there any discount for senior citizens?\", \"Some flights may offer a discount for senior citizens. Please check our website or contact customer service for accurate information.\"),\n    (\"What items are prohibited on your flights?\", \"Prohibited items include, but are not limited to, sharp objects, firearms, explosive materials, and certain chemicals. 
You can find a comprehensive list on our website under the 'Security Regulations' section.\"),\n    (\"Can I purchase a ticket for someone else?\", \"Yes, you can purchase a ticket for someone else. You'll need their correct name as it appears on their government-issued ID, and their correct travel dates.\"),\n    (\"What is the process for lost and found items on the plane?\", \"If you realize you forgot an item on the plane, report it as soon as possible to our lost and found counter. We will make every effort to locate and return your item.\"),\n    (\"Can I request a special meal?\", \"Yes, we offer a variety of special meals to accommodate dietary restrictions. Please request your preferred meal at least 48 hours prior to your flight.\"),\n    (\"Is there a weight limit for checked baggage?\", \"Yes, luggage weight limits depend on your ticket class and route. You can find the details on your ticket or by visiting our website.\"),\n    (\"Can I bring my sports equipment?\", \"Yes, certain types of sports equipment can be carried either as or in addition to your permitted baggage. Some equipment may require additional fees. It's best to check our policy on our website or contact us directly.\"),\n    (\"Do I need a visa to travel to certain countries?\", \"Yes, visa requirements depend on the country you are visiting and your nationality. We advise checking with the relevant embassy or consulate prior to travel.\"),\n    (\"How can I add extra baggage to my booking?\", \"You can add extra baggage to your booking through the 'Manage my booking' section on our website or by contacting our customer services.\"),\n    (\"Can I check-in at the airport?\", \"Yes, you can choose to check-in at the airport. 
However, we also offer online and mobile check-in, which may save you time.\"),\n    (\"How do I know if my flight is delayed or cancelled?\", \"In case of any changes to your flight, we will attempt to notify all passengers using the contact information given at the time of booking. You can also check your flight status on our website.\"),\n    (\"What is your policy on pregnant passengers?\", \"Pregnant passengers can travel up to the end of the 36th week for single pregnancies, and the end of the 32nd week for multiple pregnancies. We recommend consulting your doctor before any air travel.\"),\n    (\"Can children travel alone?\", \"Yes, children age 5 to 12 can travel alone as unaccompanied minors. We provide special care for these seats. Please contact our customer service for more information.\"),\n    (\"How can I pay for my booking?\", \"You can pay for your booking using a variety of methods including credit and debit cards, PayPal, or bank transfers. The options may vary depending on the country of departure.\"),\n]\n\n# Write data to a CSV file\nwith open('customer_service_data.csv', 'w', newline='') as file:\n    writer = csv.writer(file)\n    writer.writerow([\"prompt\", \"response\"])\n    writer.writerows(data)\n</code></pre>"},{"location":"guides/fine_tuning/#making-your-data-accessible-to-llm-engine","title":"Making your data accessible to LLM Engine","text":"<p>Currently, data needs to be uploaded to a publicly accessible web URL so that it can be read for fine-tuning. Publicly accessible HTTP, HTTPS, and S3 URLs are currently supported. Support for privately sharing data with the LLM Engine API is coming shortly. For quick iteration, you can look into tools like Pastebin or Github Gists to quickly host your CSV files in a public manner. We created an example Github Gist you can see here. 
To use the gist, you can just use the URL given when you click the \u201cRaw\u201d button (URL).</p>"},{"location":"guides/fine_tuning/#launching-the-fine-tune","title":"Launching the fine-tune","text":"<p>Once you have uploaded your data, you can use the LLM Engine API to launch a fine-tune. You will  need to specify which base model to fine-tune, the locations of the training file and optional  validation data file, an optional set of hyperparameters to customize the fine-tuning behavior,  and an optional suffix to append to the name of the fine-tune.</p> <p>If you specify a suffix, the fine-tune will be named <code>model:suffix:&lt;timestamp&gt;</code>. If you do not,  the fine-tune will be named <code>model:&lt;timestamp&gt;</code>. The timestamp will be the time the fine-tune was  launched.</p> Hyper-parameters for fine-tune  * `lr`: Peak learning rate used during fine-tuning. It decays with a cosine schedule afterward. (Default: 2e-5) * `warmup_ratio`: Ratio of training steps used for learning rate warmup. (Default: 0.03) * `epochs`: Number of fine-tuning epochs. (Default: 5) * `weight_decay`: Regularization penalty applied to learned weights. (Default: 0.001) * `max_seq_length`: Maximum number of tokens per sequence in the dataset. (Default: 1024)  Create a fine-tune in python <pre><code>from llmengine import FineTune\n\nresponse = FineTune.create(\n    model=\"llama-7b\",\n    training_file=\"s3://my-bucket/path/to/training-file.csv\",\n)\n\nprint(response.json())\n</code></pre> <p>See the Model Zoo to see which models have fine-tuning support.</p> <p>Once the fine-tune is launched, you can also get the status of your fine-tune.</p>"},{"location":"guides/fine_tuning/#making-inference-calls-to-your-fine-tune","title":"Making inference calls to your fine-tune","text":"<p>Once your fine-tune is finished, you will be able to start making inference requests to the  model. 
You can use the <code>fine_tuned_model</code> returned from your  FineTune.get API call to reference your fine-tuned model in the Completions API. Alternatively, you can list  available LLMs with <code>Model.list</code> in order to find the name of your fine-tuned model. See the  Completion API for more details. You can then  use that name to direct your completion requests.  You must wait until your fine-tune is complete  before you can plug it into the Completions API. You can check the status of your fine-tune with  FineTune.get.</p> Inference with a fine-tuned model in python <pre><code>from llmengine import Completion\n\nresponse = Completion.create(\n    model=\"llama-7b:airlines:2023-07-17-08-30-45\",\n    prompt=\"Do you offer in-flight Wi-fi?\",\n    max_new_tokens=100,\n    temperature=0.2,\n)\nprint(response.json())\n</code></pre>"},{"location":"guides/rate_limits/","title":"Overview","text":""},{"location":"guides/rate_limits/#what-are-rate-limits","title":"What are rate limits?","text":"<p>A rate limit is a restriction that an API imposes on the number of times a user or client can access the server within a specified period of time.</p>"},{"location":"guides/rate_limits/#how-do-i-know-if-i-am-rate-limited","title":"How do I know if I am rate limited?","text":"<p>Per standard HTTP practices, your request will receive a response with HTTP status code of <code>429</code>, <code>Too Many Requests</code>.</p>"},{"location":"guides/rate_limits/#what-are-the-rate-limits-for-our-api","title":"What are the rate limits for our API?","text":"<p>The LLM Engine API is currently in a preview mode, and therefore we currently do not have any advertised rate limits. As the API moves towards a production release, we will update this section with specific rate limits. 
For now, the API will return HTTP 429 on an as-needed basis.</p>"},{"location":"guides/rate_limits/#error-mitigation","title":"Error mitigation","text":""},{"location":"guides/rate_limits/#retrying-with-exponential-backoff","title":"Retrying with exponential backoff","text":"<p>One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff.  Retrying with exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the  unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated.  This continues until the request is successful or until a maximum number of retries is reached. This approach has many benefits:</p> <ul> <li>Automatic retries mean you can recover from rate limit errors without crashes or missing data</li> <li>Exponential backoff means that your first retries can be tried quickly, while still benefiting from longer delays if your first few retries fail</li> <li>Adding random jitter to the delay helps prevent retries from all hitting at the same time.</li> </ul> <p>Below are a few example solutions for Python that use exponential backoff.</p>"},{"location":"guides/rate_limits/#example-1-using-the-tenacity-library","title":"Example #1: Using the <code>tenacity</code> library","text":"<p>Tenacity is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding  retry behavior to just about anything. To add exponential backoff to your requests, you can use the tenacity.retry  decorator. 
The below example uses the tenacity.wait_random_exponential function to add random exponential backoff to a  request.</p> Exponential backoff in python <pre><code>import llmengine\nfrom tenacity import (\n    retry,\n    stop_after_attempt,\n    wait_random_exponential,\n)  # for exponential backoff\n@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))\ndef completion_with_backoff(**kwargs):\n    return llmengine.Completion.create(**kwargs)\n\ncompletion_with_backoff(model=\"llama-7b\", prompt=\"Why is the sky blue?\")\n</code></pre>"},{"location":"guides/rate_limits/#example-2-using-the-backoff-library","title":"Example #2: Using the <code>backoff</code> library","text":"<p>Another python library that provides function decorators for backoff and retry is backoff:</p> Decorators for backoff and retry in python <pre><code>import llmengine\nimport backoff\n@backoff.on_exception(backoff.expo, llmengine.error.RateLimitError)\ndef completions_with_backoff(**kwargs):\n    return llmengine.Completion.create(**kwargs)\n\ncompletions_with_backoff(model=\"llama-7b\", prompt=\"Why is the sky blue?\")\n</code></pre>"},{"location":"guides/token_streaming/","title":"Token streaming","text":"<p>The Completions APIs support a <code>stream</code> boolean parameter that, when <code>True</code>, will return a streamed response of token-by-token server-sent events (SSEs) rather than waiting to receive the full response when model generation has finished. This decreases latency of when you start getting a response.</p> <p>The response will consist of SSEs of the form <code>{\"token\": dict, \"generated_text\": str | null, \"details\": dict | null}</code>, where the dictionary for each token will contain log probability information in addition to the generated string; the <code>generated_text</code> field will be <code>null</code> for all but the last SSE, for which it will contain the full generated response.</p>"}]}
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 3b09734ab..e73c22d16 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ