Error Handling & Retrigger

Durably doesn't auto-retry failures. This is intentional — you decide what to do when something goes wrong.

How Failures Work

When a step throws an error, the run is marked failed immediately. Completed steps keep their cached results.

run: async (step) => {
  await step.run('step-1', () => 'ok') // Saved
  await step.run('step-2', () => {
    throw new Error('boom')
  }) // Run fails here
  await step.run('step-3', () => 'never') // Never reached
}

Retriggering creates a fresh run with the same input. The original run stays as-is, and its completed steps are not reused by the new run.
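To make these semantics concrete, here is a toy in-memory model of step checkpointing. It is a sketch for illustration only, not Durably's implementation: `ToyRun`, `ToyStep`, and `execute` are hypothetical names, and the real library persists results rather than holding them in a `Map`.

```typescript
// Toy model only. ToyRun, ToyStep, and execute are hypothetical names,
// not part of Durably's API.
type ToyRun = {
  status: 'completed' | 'failed'
  steps: Map<string, unknown> // results of steps that completed
  error?: string
}

type ToyStep = { run<T>(name: string, fn: () => T): T }

function execute(job: (step: ToyStep) => void): ToyRun {
  // Every call starts with an empty map, mirroring a fresh run
  // that does not reuse steps from an earlier run.
  const steps = new Map<string, unknown>()
  const step: ToyStep = {
    run<T>(name: string, fn: () => T): T {
      const result = fn() // a throw here propagates and fails the run
      steps.set(name, result) // only reached when the step succeeds
      return result
    },
  }
  try {
    job(step)
    return { status: 'completed', steps }
  } catch (e) {
    return { status: 'failed', steps, error: String(e) }
  }
}

const failedRun = execute((step) => {
  step.run('step-1', () => 'ok')
  step.run('step-2', () => {
    throw new Error('boom')
  })
  step.run('step-3', () => 'never')
})
```

In this model `failedRun` keeps the result of `step-1`, records nothing for `step-2` or `step-3`, and a second call to `execute` would run `step-1` again from scratch.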

Retrigger Patterns

Server-Side Retrigger

// Check and retrigger a failed run
const run = await durably.getRun(runId)
if (run?.status === 'failed') {
  const newRun = await durably.retrigger(runId) // Validates input against current schema, creates fresh run
  console.log(`New run: ${newRun.id}`)
}
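Since Durably leaves the retry policy to you, one option is to cap how often a run gets retriggered. The sketch below assumes only the two calls shown above (`getRun` and `retrigger`); `RunClient`, `RunInfo`, `retriggerIfFailed`, and the caller-tracked `attemptsSoFar` counter are hypothetical names introduced for this example, and the real Durably types may differ.

```typescript
// Structural types covering only the calls used above. These are
// assumptions for the sketch, not Durably's published types.
type RunInfo = { id: string; status: string }

interface RunClient {
  getRun(runId: string): Promise<RunInfo | null | undefined>
  retrigger(runId: string): Promise<RunInfo>
}

// Retrigger a run only if it failed and the attempt budget allows it.
// Returns the new run id, or null when nothing was retriggered.
async function retriggerIfFailed(
  client: RunClient,
  runId: string,
  attemptsSoFar: number, // tracked by the caller (hypothetical)
  maxAttempts = 3,
): Promise<string | null> {
  if (attemptsSoFar >= maxAttempts) return null // budget exhausted
  const run = await client.getRun(runId)
  if (run?.status !== 'failed') return null // missing, or not failed
  const newRun = await client.retrigger(runId)
  return newRun.id
}
```

Passing the client as a parameter keeps the policy easy to test with a fake; in an application you would pass your `durably` instance, assuming it satisfies this shape.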

Fullstack Retrigger (React)

import { durablyClient } from '~/lib/durably'

function FailedRunActions({ runId }: { runId: string }) {
  const { retrigger, cancel } = durablyClient.useRunActions()
  const { status, error } = durablyClient.importCsv.useRun(runId)

  if (status === 'failed') {
    return (
      <div>
        <p>Failed: {error}</p>
        <button
          onClick={async () => {
            try {
              const newRunId = await retrigger(runId)
              console.log(`New run: ${newRunId}`)
            } catch (e) {
              console.error('Retrigger failed:', e)
            }
          }}
        >
          Retrigger
        </button>
      </div>
    )
  }

  if (status === 'leased') {
    return (
      <button
        onClick={async () => {
          try {
            await cancel(runId)
          } catch (e) {
            console.error('Cancel failed:', e)
          }
        }}
      >
        Cancel
      </button>
    )
  }

  return null
}

SPA Retrigger

In SPA mode, trigger the same job again — Durably doesn't expose a direct retrigger() in the browser hooks.

import { useJob } from '@coji/durably-react/spa'

function RetryableJob() {
  const { trigger, isFailed, error, reset } = useJob(myJob)

  if (isFailed) {
    return (
      <div>
        <p>Failed: {error}</p>
        <button onClick={() => { reset(); trigger({ ... }) }}>
          Try Again
        </button>
      </div>
    )
  }

  return <button onClick={() => trigger({ ... })}>Run</button>
}

Handling ConflictError

When using concurrencyKey, at most one pending run per key is allowed. A second trigger throws ConflictError:

import { ConflictError } from '@coji/durably'

try {
  await job.trigger({ orgId: 'org_123' }, { concurrencyKey: 'org_123' })
} catch (err) {
  if (err instanceof ConflictError) {
    // A pending run already exists for this key.
    // Use coalesce: 'skip' to return the existing run instead:
    const run = await job.trigger(
      { orgId: 'org_123' },
      { concurrencyKey: 'org_123', coalesce: 'skip' },
    )
    // run.disposition === 'coalesced' — the existing pending run was returned
  }
}

Designing Resilient Steps

Make Steps Idempotent

Steps may re-execute after a crash. Use upserts and idempotency keys:

// Good: upsert instead of insert
await step.run('save-user', () => db.upsert(user))

// Good: idempotency key with external APIs
// (stripe-node takes the key as a request option, not as a charge parameter)
await step.run('charge', () =>
  stripe.charges.create(
    { amount: 1000 },
    { idempotencyKey: `order_${orderId}` },
  ),
)

// Bad: duplicate insert on re-execution
await step.run('save-user', () => db.insert(user))
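The difference shows up as soon as the same step body runs twice. The sketch below uses a toy in-memory table to simulate a re-execution; `ToyTable`, `User`, and `runTwice` are hypothetical names for this example and stand in for your real database client.

```typescript
type User = { id: string; name: string }

// Toy in-memory table (hypothetical), standing in for a real database.
class ToyTable {
  rows: User[] = []

  // Always appends, so a repeated call duplicates the row.
  insert(user: User): void {
    this.rows.push(user)
  }

  // Replaces the row with the same id, or appends if none exists.
  upsert(user: User): void {
    const i = this.rows.findIndex((r) => r.id === user.id)
    if (i >= 0) this.rows[i] = user
    else this.rows.push(user)
  }
}

// Simulates a step body that executes again after a crash.
function runTwice(write: () => void): void {
  write()
  write()
}

const user: User = { id: 'u1', name: 'Ada' }

const withInsert = new ToyTable()
runTwice(() => withInsert.insert(user)) // ends with two rows for u1

const withUpsert = new ToyTable()
runTwice(() => withUpsert.upsert(user)) // ends with one row for u1
```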

Keep Steps Small

Smaller steps mean less work to redo after a failure:

// Bad: one step for everything
await step.run('import-all', async () => {
  for (const row of rows) await db.insert(row)
})

// Good: batch checkpoints
for (let i = 0; i < rows.length; i += 100) {
  await step.run(`batch-${i}`, async () => {
    for (const row of rows.slice(i, i + 100)) {
      await db.insert(row)
    }
  })
}
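The batching loop can be factored into a small helper so that batch boundaries and step names come from one place. This is a sketch: `toBatches` is a hypothetical helper, not part of Durably, and it assumes that step names should be stable for the same input, which is why each name is derived from the batch's starting offset.

```typescript
// Split rows into fixed-size batches with deterministic names.
// toBatches is a hypothetical helper for this example.
function toBatches<T>(
  rows: readonly T[],
  size: number,
): { name: string; rows: T[] }[] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer')
  }
  const batches: { name: string; rows: T[] }[] = []
  for (let i = 0; i < rows.length; i += size) {
    // The name depends only on the offset, so the same input
    // always yields the same sequence of step names.
    batches.push({ name: `batch-${i}`, rows: rows.slice(i, i + size) })
  }
  return batches
}

// Usage inside a job (sketch):
//   for (const batch of toBatches(rows, 100)) {
//     await step.run(batch.name, async () => {
//       for (const row of batch.rows) await db.insert(row)
//     })
//   }
```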

Handle Partial Failures

Use step results to track what succeeded:

run: async (step, input) => {
  const results = []

  for (const item of input.items) {
    const result = await step.run(`process-${item.id}`, async () => {
      try {
        await processItem(item)
        return { id: item.id, ok: true }
      } catch (e) {
        step.log.warn(`Failed to process ${item.id}: ${e}`)
        return { id: item.id, ok: false, error: String(e) }
      }
    })
    results.push(result)
  }

  const succeeded = results.filter((r) => r.ok).length
  const failed = results.filter((r) => !r.ok).length
  return { succeeded, failed }
}
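If you also want to know which items failed, for example to trigger a follow-up run with only those, the summary can carry their ids. A minimal sketch, assuming item ids are strings; `ItemResult` and `summarize` are hypothetical names that mirror the result shape returned by the steps above.

```typescript
// Mirrors the per-item objects returned by the steps above.
// Assumes string ids; adjust to your own item type.
type ItemResult =
  | { id: string; ok: true }
  | { id: string; ok: false; error: string }

// Hypothetical helper: counts outcomes and keeps the ids that failed.
function summarize(results: readonly ItemResult[]) {
  const failedIds = results.filter((r) => !r.ok).map((r) => r.id)
  return {
    succeeded: results.length - failedIds.length,
    failed: failedIds.length,
    failedIds,
  }
}

const summary = summarize([
  { id: 'a', ok: true },
  { id: 'b', ok: false, error: 'Error: timeout' },
  { id: 'c', ok: true },
])
```

Returning `failedIds` from the job makes the run's output enough to decide what to reprocess, without having to read the logs.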

Preventing Duplicates

Use idempotency keys to ensure a job runs at most once for a given operation:

// Same key = same run (returns existing if already triggered)
await durably.jobs.importCsv.trigger(
  { filename: 'data.csv' },
  { idempotencyKey: `import-${fileHash}` },
)

This is useful for:

Form double-submit protection
Webhook deduplication
Scheduled job deduplication (one per day)
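How the key is built decides what counts as "the same operation". The sketch below shows two possible key builders, assuming a Node.js runtime: one derives `fileHash` from the file contents with SHA-256, the other produces one key per UTC day for scheduled jobs. `importKey` and `dailyKey` are hypothetical helpers, not part of Durably.

```typescript
import { createHash } from 'node:crypto'

// Same file contents => same key, regardless of filename.
// importKey is a hypothetical helper.
function importKey(fileContents: string | Uint8Array): string {
  const fileHash = createHash('sha256').update(fileContents).digest('hex')
  return `import-${fileHash}`
}

// One key per job per UTC calendar day, e.g. for a daily schedule.
// dailyKey is a hypothetical helper.
function dailyKey(jobName: string, now: Date): string {
  const day = now.toISOString().slice(0, 10) // YYYY-MM-DD in UTC
  return `${jobName}-${day}`
}

// Usage (sketch):
//   await durably.jobs.importCsv.trigger(
//     { filename: 'data.csv' },
//     { idempotencyKey: importKey(contents) },
//   )
```

Note that `dailyKey` uses UTC, so a schedule defined in local time may need a different day boundary.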

Cancellation

Cancel a pending or leased run. If the run is leased, the current step finishes before the run stops.

// Server-side
await durably.cancel(runId)

// Fullstack (React) — useRunActions rejects on failure; handle errors in the caller (try/catch or .catch).
const { cancel } = durablyClient.useRunActions()
await cancel(runId)

Monitoring Failures

Use events to detect and alert on failures:

durably.on('run:fail', ({ runId, jobName, error }) => {
  console.error(`Job ${jobName} failed (${runId}): ${error}`)
  // Send to your alerting system
})

durably.on('step:fail', ({ runId, stepName, error }) => {
  console.error(`Step ${stepName} failed in ${runId}: ${error}`)
})
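Alerting on every single failure can get noisy. One possible approach is to count failures per job and alert only once a threshold is reached. This is a sketch: `FailureTracker` is a hypothetical class, the event shape is taken from the `run:fail` handler above with `error` typed loosely, and counts live in memory, so they reset when the process restarts.

```typescript
// Event fields as destructured in the run:fail handler above;
// the exact type of `error` is an assumption.
type RunFailEvent = { runId: string; jobName: string; error: unknown }

// Hypothetical helper: alerts once when a job reaches the threshold.
class FailureTracker {
  private counts = new Map<string, number>()
  private threshold: number
  private alert: (jobName: string, count: number) => void

  constructor(
    threshold: number,
    alert: (jobName: string, count: number) => void,
  ) {
    this.threshold = threshold
    this.alert = alert
  }

  record(event: RunFailEvent): void {
    const count = (this.counts.get(event.jobName) ?? 0) + 1
    this.counts.set(event.jobName, count)
    // Fire exactly once, at the moment the threshold is reached.
    if (count === this.threshold) this.alert(event.jobName, count)
  }
}

// Usage (sketch):
//   const tracker = new FailureTracker(3, (job, n) => notifyOnCall(job, n))
//   durably.on('run:fail', (event) => tracker.record(event))
// notifyOnCall is a placeholder for your own alerting function.
```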

Next Steps

Authentication — Protect your endpoints
Deployment Guide — Choose the right mode for your app
Events Reference — All event types
