Parser
Current ISL/MSL parser split, queue semantics, and review boundary.
Parser modules
- src/services/ingest.ts (source upload + queue dispatch)
- src/queues/parse-queue.ts (queue entrypoint)
- src/services/parse.ts (orchestrator, claim, attempt lifecycle)
- src/services/document-parser.ts (DOCX text extraction + single-pass classifier + prompt plumbing)
- src/services/isl-classifier.ts (one-call classifier wrapper + error normalization)
- src/services/isl-parse-context.ts (LP/fund context and coverage math)
- src/services/parse-persistence.ts (draft writes + parse-state transition)
- src/services/review.ts (human confirmation and live promotion)
Important boundary
The parser output is draft-only. It does not create operative side-letter applicability. Review confirmation is the boundary that writes live clauses and updates commitment_clause_assignments.
Current pipeline
| # | Stage | What happens |
|---|---|---|
| 1 | Upload | POST /funds/:fundSlug/upload accepts a docx via upload-handler and normalizes file metadata. |
| 2 | Ingest | ingestUpload validates hash/duplicates, writes a source row with parse_state='queued', puts bytes to R2, and sends a message to PARSE_QUEUE. |
| 3 | Queue | handleParseQueue consumes the message, runs parseDocumentJob, and retries transient failures. |
| 4 | Attempt claim | The parser opens a row in document_parse_attempts (append-like log), claims the source row by swapping parse_state from queued to parsing, and stamps parse_started_at. |
| 5 | Cleanup | Previous draft artifacts for the same source doc are deleted idempotently before running the classifier. |
| 6 | Extract + classify | R2 payload is extracted and passed through the single-pass classifier in document-parser.ts. |
| 7 | Context | resolveIslParsedContext resolves LP/fund hints, computes match fields, and computes uncovered paragraph count. |
| 8 | Persist draft | Draft clauses are written in clause_intake_drafts with source spans in clause_intake_sources and rows in document_extracted_paragraphs. |
| 9 | Attempt close | Attempt row is closed as success, permanent_failure, or transient_failure. Source row transitions to parsed only on success. |
| 10 | Review | Review routes expose draft rows and call confirmDraftClauses only on submit, then mark the rows as reviewed and promote them to live. |
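
The attempt claim in stage 4 behaves like a compare-and-swap on parse_state. The sketch below models that idea in memory only; the SourceRow shape and the claimSource helper are hypothetical and not taken from src/services/parse.ts. In a database the same guard would presumably be a conditional update that only matches rows still in the queued state.

```typescript
// Hypothetical in-memory model; the real schema and storage layer are not shown here.
type ParseState = "queued" | "parsing" | "parsed" | "parse_failed" | "dead_lettered";

interface SourceRow {
  id: string;
  parse_state: ParseState;
  parse_started_at: number | null; // epoch milliseconds
}

// Claims the row only if it is still 'queued'. A second consumer that
// receives the same message sees 'parsing' and backs off.
function claimSource(row: SourceRow, now: number): boolean {
  if (row.parse_state !== "queued") {
    return false; // already claimed, parsed, failed, or dead-lettered
  }
  row.parse_state = "parsing";
  row.parse_started_at = now;
  return true;
}
```

The point of the guard is that a duplicate queue delivery cannot start a second parse of the same source row.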
Failure model
| Failure class | Examples | Result |
|---|---|---|
| Permanent | Missing R2 object, malformed DOCX, classifier syntax violations, FK violation | source_row.parse_state='parse_failed', attempt marked permanent_failure; queue returns success (no further automatic retry). |
| Transient | AI timeout/network, runtime interruption, non-FK parser exceptions | source_row.parse_state='queued', attempt marked transient_failure; queue retries or DLQ depending on retry policy. |
| DLQ | Repeated transient failures | Dead-letter messages set source state to dead_lettered and close open attempts. |
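
The permanent/transient split above can be read as a small decision function. This is a minimal sketch under stated assumptions: the FailureKind labels, the FailureResolution shape, and resolveFailure are illustrative names, not the error types used by the parser. Anything not listed as permanent is treated as transient, which matches the "non-FK parser exceptions" row.

```typescript
// Hypothetical failure labels; the real error classes are not documented here.
type FailureKind =
  | "missing_r2_object"
  | "malformed_docx"
  | "classifier_syntax_violation"
  | "fk_violation"
  | "ai_timeout"
  | "runtime_interruption"
  | "other_parser_exception";

interface FailureResolution {
  attemptStatus: "permanent_failure" | "transient_failure";
  sourceState: "parse_failed" | "queued";
  ackMessage: boolean; // true: report success to the queue, so no automatic retry
}

const PERMANENT_KINDS: ReadonlySet<FailureKind> = new Set<FailureKind>([
  "missing_r2_object",
  "malformed_docx",
  "classifier_syntax_violation",
  "fk_violation",
]);

function resolveFailure(kind: FailureKind): FailureResolution {
  if (PERMANENT_KINDS.has(kind)) {
    // Retrying cannot help, so the message is acknowledged and the row is parked.
    return { attemptStatus: "permanent_failure", sourceState: "parse_failed", ackMessage: true };
  }
  // Everything else goes back to 'queued' and is left to the queue's retry/DLQ policy.
  return { attemptStatus: "transient_failure", sourceState: "queued", ackMessage: false };
}
```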
Scheduled recovery thresholds
- stuck queued rows older than 15 minutes are re-enqueued (src/queues/scheduled-sweep.ts).
- stuck parsing rows with parse_started_at older than 30 minutes are moved back to queued and re-enqueued.
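
The two thresholds can be expressed as a pure selection step, sketched below. This is not the code in src/queues/scheduled-sweep.ts: the SweepRow shape, the planSweep helper, and the updated_at field used to age queued rows are assumptions for illustration (the document does not say which timestamp the 15-minute check uses).

```typescript
const QUEUED_STALE_MS = 15 * 60 * 1000;  // 15 minutes
const PARSING_STALE_MS = 30 * 60 * 1000; // 30 minutes

// Hypothetical row shape; updated_at is an assumed column.
interface SweepRow {
  id: string;
  parse_state: "queued" | "parsing" | "parsed" | "parse_failed" | "dead_lettered";
  updated_at: number;              // epoch milliseconds
  parse_started_at: number | null; // epoch milliseconds
}

interface SweepPlan {
  reenqueue: string[];     // ids to send to the parse queue again
  resetToQueued: string[]; // ids whose state moves from 'parsing' back to 'queued'
}

function planSweep(rows: readonly SweepRow[], now: number): SweepPlan {
  const plan: SweepPlan = { reenqueue: [], resetToQueued: [] };
  for (const row of rows) {
    if (row.parse_state === "queued" && now - row.updated_at > QUEUED_STALE_MS) {
      plan.reenqueue.push(row.id);
    } else if (
      row.parse_state === "parsing" &&
      row.parse_started_at !== null &&
      now - row.parse_started_at > PARSING_STALE_MS
    ) {
      // A stuck parse is first reset, then treated like a fresh queued row.
      plan.resetToQueued.push(row.id);
      plan.reenqueue.push(row.id);
    }
  }
  return plan;
}
```

Keeping the selection separate from the writes makes the thresholds easy to unit test without a database or queue binding.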
Classifier prompt shape
The parser still uses a one-pass classification flow built in code via buildIslSinglePassSystemPrompt / buildMslSinglePassSystemPrompt in document-parser.ts. The generated prompt, plus catalogue context, is applied to each queued source upload.