Clause Export API
Read-only JSON / NDJSON / CSV export of Hugo's clause data, designed for piping into jq, DuckDB, spreadsheets, and other tools.
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /funds/:fundId/export/clauses | Per-fund export. Gated by the existing fund-access middleware. |
| GET | /export/clauses | Cross-fund export. Requires at least one fund_id query param. The handler intersects user-supplied fund_id[] with the caller's accessible fund set and silently drops fund IDs the caller cannot see (no 403, since that would leak existence). |
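Because inaccessible fund IDs are dropped without an error, a client that needs to know which requested funds produced no rows has to check for itself. The following is a minimal sketch, assuming bash, a library-view NDJSON export saved to a file, and jq to pull out fund_id; the helper name `missing_funds` and the file names are illustrative only. A fund with no matching clauses is also absent from the output, so a missing ID is a hint, not proof of missing access.

```shell
# missing_funds REQUESTED SEEN
# Print fund IDs listed in REQUESTED (one per line) that never appear in
# SEEN (one per line). Hypothetical helper; requires bash for <( ).
missing_funds() {
  comm -23 <(sort -u "$1") <(sort -u "$2")
}

# Usage sketch (file names are placeholders):
#   printf '%s\n' "$F1" "$F2" > requested.txt
#   jq -r '.fund_id' export.ndjson > seen.txt
#   missing_funds requested.txt seen.txt
```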
Authentication
Same auth as the rest of Hugo. From a browser session the cookie is enough. From a shell, use the CF Access service token already configured for *.nordiclawfirm.com:
source ~/.env.cloudflare
curl -sS \
-H "CF-Access-Client-Id: $CF_ACCESS_CLIENT_ID" \
-H "CF-Access-Client-Secret: $CF_ACCESS_CLIENT_SECRET" \
"https://hugo.nordiclawfirm.com/funds/$FUND/export/clauses?view=library&format=ndjson"
Views
Each view returns a different shape and carries a distinct schema discriminator in the schema field of the JSON envelope and in the X-Hugo-Schema header on NDJSON / CSV responses.
clauses_library_v1
Flat clause list with assignments, lineage, and provenance. Default view.
Fields: id, fund_id, clause_name_id, clause_name, clause_group_id, clause_group_name, text, lifecycle_status, version, is_latest, source, ai_confidence, prev_clause_id, parent_id, change_note, created_at, updated_at, created_by_email, updated_by_email, assignments[], ops_tags[], hugo_url.
clauses_msl_v1
Master Side Letter view: clause-centric, with elected_by, eligible_lps, and excluded_lps arrays per clause. Builds on the cached loadMslData(), so subsequent calls within 60 s are cheap.
Fields: clause_id, clause_name_id, clause_name, clause_group_id, clause_group_name, text, lifecycle_status, source_lp_id, source_lp_name, elected_by[], eligible_lps[], excluded_lps[] (each excluded LP carries an entries[] with the rationale source), sort_order, hugo_url.
clauses_isl_v1
Per-LP Individual Side Letter view. Returns clauses owned by the specified LP plus clauses they are MFN-eligible for.
Required: commitment_id (must belong to the fund — validated server-side).
Fields: clause_id, clause_name_id, clause_name, clause_group_id, text, lifecycle_status, version, is_latest, source, ai_confidence, parent_id, prev_clause_id, origin (own | mfn_eligible), source_lp_id, source_lp_name, created_at, updated_at, hugo_url.
Filters
Unknown filter parameters are rejected with a 400 response of the form {error, code: "invalid_filter", parameter, allowed[]}. Nothing is silently ignored: if a filter key is mistyped, the request fails loudly.
Multi-value filters accept either repeated params (?source=manual&source=ai) or comma-separated lists (?source=manual,ai). Both work.
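Both spellings can be generated from the same list of values. The following is a small sketch, assuming bash; `repeated_param` and `csv_param` are hypothetical helper names, and values are not URL-encoded (the enum values listed in this document are URL-safe).

```shell
# repeated_param NAME VALUE...
# Emit the repeated-parameter form, e.g. source=manual&source=ai
repeated_param() {
  local name="$1"; shift
  local out="" v
  for v in "$@"; do
    out="${out:+$out&}${name}=${v}"
  done
  printf '%s\n' "$out"
}

# csv_param NAME VALUE...
# Emit the comma-separated form, e.g. source=manual,ai
csv_param() {
  local name="$1"; shift
  local IFS=,          # "$*" joins arguments with the first character of IFS
  printf '%s=%s\n' "$name" "$*"
}
```

Either output can be appended to the query string, for example `"$H/funds/$F/export/clauses?$(csv_param source manual ai)"`.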
Common (all views)
| Param | Type | Notes |
|---|---|---|
| view | enum | library (default) / msl / isl |
| format | enum | json (default) / ndjson / csv. Also honored via Accept header. |
| limit | int | Default 5,000, hard cap 50,000. Truncation reported via truncated: true in the envelope and X-Hugo-Truncated header. |
| fields | csv | Projection: only emit these top-level fields per clause. Default: all. |
| clause_name_id | multi | IN |
| clause_group_id | multi | IN |
| lifecycle_status | multi | Default live. Values: draft, live, election, completed, archived, superseded. |
| source | multi | manual, upload, ai, bpmsl_upload, msl_upload |
| is_latest | enum | true (default) / any |
| q | string | Full-text search via clauses_fts |
| min_confidence | float | gte; rows with NULL ai_confidence are excluded when this filter is set |
| updated_since | RFC3339 | gte |
| ops_tag | multi | Filter by ops tag |
| investor_category_id | multi | Filter by investor category |
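The updated_since filter lends itself to incremental pulls: record when the last export ran and request only rows changed since then. The following is a minimal sketch, assuming bash and that a UTC timestamp with a Z suffix is accepted as RFC 3339 input; both helper names are hypothetical. Since lifecycle_status defaults to live, rows that have left the live state since the last run will not appear unless that filter is widened.

```shell
# rfc3339_now: current UTC time in RFC 3339 form, e.g. 2026-04-12T21:28:46Z
rfc3339_now() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# incremental_query LAST_SYNC
# Query string for library rows updated at or after LAST_SYNC.
incremental_query() {
  printf 'view=library&format=ndjson&updated_since=%s\n' "$1"
}

# Usage sketch (last_sync.txt is a placeholder state file):
#   started=$(rfc3339_now)
#   curl -sS "${AUTH[@]}" "$H/funds/$F/export/clauses?$(incremental_query "$(cat last_sync.txt)")"
#   printf '%s\n' "$started" > last_sync.txt
```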
Library-only
| Param | Notes |
|---|---|
| source_lp_id | multi. Filter by originating LP. |
| commitment_id | multi. Alternative to source_lp_id. |
| shared | bool. Clauses shared across multiple LPs. |
| unassigned | bool. Clauses with no assignments. |
| bpmsl | bool. Base-paper MSL clauses. |
Cross-fund-only (/export/clauses)
| Param | Notes |
|---|---|
| fund_id | multi, required (≥1). Capped at 10 (MAX_CROSS_FUND_FUND_IDS). For view=msl, capped at 5 (MSL_CROSS_FUND_FUND_LIMIT). User-supplied IDs are intersected with the caller's accessible fund set; inaccessible IDs are silently dropped. |
| fund_category | multi |
| vintage_from | int year, gte |
| vintage_to | int year, lte |
MSL-only
| Param | Notes |
|---|---|
| mfn_state | enum: elected / eligible / excluded |
| target_lp_id | string. Required when mfn_state is set; validated against the fund. |
ISL-only
| Param | Notes |
|---|---|
| commitment_id | required; validated against the fund |
| unique | bool. Only this LP's own clauses. |
| elected | bool. Only clauses elected via MFN. |
| forked | bool. Only clauses with a parent_id. |
| low_confidence | bool. ai_confidence < 0.6. |
Response shapes
JSON envelope (default)
{
"schema": "clauses_library_v1",
"schema_version": "1.0.0",
"generated_at": "2026-04-12T21:28:46.037Z",
"fund": { "id": "fund_northstar8", "name": "Northstar Equity Partners VIII" },
"filters_applied": { "view": "library", "format": "json", "limit": 3, "lifecycle_status": ["live"] },
"fields": ["id", "fund_id", "clause_name", "..."],
"count": 3,
"truncated": true,
"clauses": [ /* ... */ ]
}
fund is null for cross-fund responses.
NDJSON
One JSON object per line, no envelope. Metadata moves to response headers so single-line tools like jq -c and DuckDB's read_json_auto('/dev/stdin') can consume it directly.
| Header | Value |
|---|---|
| Content-Type | application/x-ndjson |
| Cache-Control | private, no-store |
| X-Hugo-Schema | e.g. clauses_library_v1 |
| X-Hugo-Schema-Version | e.g. 1.0.0 |
| X-Hugo-Count | row count |
| X-Hugo-Truncated | true / false |
| X-Hugo-Filters-Applied | base64-encoded JSON of applied filters |
True row-by-row streaming would require restructuring loadMslData / loadMfnData and is deferred.
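Since NDJSON responses carry their metadata in headers, a shell pipeline has to capture the headers separately, for example with curl -D. The sketch below assumes bash, awk, and base64; `header_value` and `decode_filters` are hypothetical helpers, and it assumes X-Hugo-Filters-Applied uses standard (not URL-safe) base64, which this document does not specify.

```shell
# header_value FILE NAME
# Print the value of response header NAME from a dump written by
# `curl -D FILE`. Matching is case-insensitive because HTTP/2 header
# names arrive lowercased.
header_value() {
  awk -v name="$2" '
    BEGIN { name = tolower(name); FS = ":" }
    tolower($1) == name {
      sub(/^[^:]*:[ \t]*/, "")   # drop the "Name: " prefix
      sub(/\r$/, "")             # header lines end in CRLF
      print
      exit
    }' "$1"
}

# decode_filters VALUE: decode the base64 header value back to JSON.
# (On older macOS the flag is -D instead of -d.)
decode_filters() {
  printf '%s' "$1" | base64 -d
}

# Usage sketch (file names are placeholders):
#   curl -sS -D headers.txt "${AUTH[@]}" \
#     "$H/funds/$F/export/clauses?format=ndjson" > clauses.ndjson
#   [ "$(header_value headers.txt X-Hugo-Truncated)" = "true" ] && echo "truncated" >&2
#   decode_filters "$(header_value headers.txt X-Hugo-Filters-Applied)"
```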
CSV
Content-Type: text/csv; charset=utf-8. Header row from the active projection (fields= or the view's default). Multi-value cells (e.g. elected_by) are joined with |. RFC 4180 quoting (" escaped as ""). Same X-Hugo-* metadata headers as NDJSON.
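To make the cell rules concrete, here is an illustrative re-implementation of how a multi-value cell would be joined and quoted under those rules. It is a sketch of the described format, not Hugo's actual code; `csv_cell` is a hypothetical name and bash is assumed.

```shell
# csv_cell VALUE...
# Join values with "|" and quote the result per RFC 4180 when it contains
# a comma, a double quote, or a line break.
csv_cell() {
  local IFS='|'
  local cell="$*"                    # "$*" joins arguments with "|"
  case "$cell" in
    *[,\"]*|*$'\n'*|*$'\r'*)
      cell=${cell//\"/\"\"}          # each " becomes ""
      printf '"%s"\n' "$cell" ;;
    *)
      printf '%s\n' "$cell" ;;
  esac
}
```

Because quoted cells such as text can contain commas and line breaks, the CSV is best read with a real CSV parser rather than cut or awk -F,.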
Errors
400 responses use the apiBadRequest helper:
{
"error": "Unknown filter parameter \"bogus\"",
"code": "invalid_filter",
"parameter": "bogus",
"allowed": ["bpmsl", "clause_group_id", "clause_name_id", "..."]
}
Code values: invalid_filter, invalid_value, missing_required, limit_exceeded, not_found (e.g. wrong-fund target_lp_id or commitment_id).
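In a script it helps to surface these fields instead of passing an error body down the pipeline. The following is a minimal sketch, assuming bash and jq; `report_error` is a hypothetical helper, and the status code is captured with curl's -w '%{http_code}' option.

```shell
# report_error STATUS BODY_FILE
# Return 0 for 2xx responses. Otherwise print code, message, and parameter
# from the error body to stderr and return 1. Requires jq.
report_error() {
  case "$1" in
    2??) return 0 ;;
  esac
  jq -r '"\(.code // "unknown"): \(.error // "no message") (parameter: \(.parameter // "n/a"))"' "$2" >&2
  return 1
}

# Usage sketch (body.json is a placeholder file name):
#   status=$(curl -sS -o body.json -w '%{http_code}' "${AUTH[@]}" \
#     "$H/funds/$F/export/clauses?bogus=1")
#   report_error "$status" body.json || exit 1
```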
Schema stability
- Additive changes are non-breaking under 1.x: new top-level fields, new filter params, new enum values for source / lifecycle_status / etc.
- Breaking changes bump major: field removals, renames, type changes, meaning changes.
- Consumers MUST ignore unknown fields and unknown enum values.
- Field ordering in JSON objects is not guaranteed. Use object keys, not position.
Version is exposed both in the JSON envelope (schema_version) and as the X-Hugo-Schema-Version response header.
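A consumer can guard against breaking changes by checking only the major component of the reported version. The following is a small sketch, assuming bash; `schema_compatible` is a hypothetical helper, and the version string would come from schema_version or the X-Hugo-Schema-Version header.

```shell
# schema_compatible VERSION [MAJOR]
# Succeed when VERSION (e.g. "1.0.0") has the major version this consumer
# was written against (default 1). Minor and patch bumps are accepted,
# since additive changes are non-breaking.
schema_compatible() {
  local major="${1%%.*}"     # everything before the first dot
  [ "$major" = "${2:-1}" ]
}

# Usage sketch:
#   schema_compatible "$version" || { echo "unsupported schema $version" >&2; exit 1; }
```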
Examples
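The examples in this section use "${AUTH[@]}", $H, and $F without defining them. One plausible setup, assuming bash (for the array) and the environment variable names from the Authentication section; the fallback fund ID is simply the sample ID shown in the JSON envelope above.

```shell
# Assumed setup for the examples; run after `source ~/.env.cloudflare`.
AUTH=(
  -H "CF-Access-Client-Id: ${CF_ACCESS_CLIENT_ID:-}"
  -H "CF-Access-Client-Secret: ${CF_ACCESS_CLIENT_SECRET:-}"
)
H="https://hugo.nordiclawfirm.com"   # base URL, as in the Authentication example
F="${FUND:-fund_northstar8}"         # fund ID; the fallback is a sample value
```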
1. Library view, NDJSON, piped through jq
curl -sS "${AUTH[@]}" \
"$H/funds/$F/export/clauses?view=library&format=ndjson&lifecycle_status=live" \
| jq -c '{id, clause_name, lp_count: (.assignments|length)}'
2. MSL view as JSON, projected to a few fields
curl -sS "${AUTH[@]}" \
"$H/funds/$F/export/clauses?view=msl&fields=clause_id,clause_name,elected_by,eligible_lps" \
| jq '.clauses[] | select((.elected_by | length) > 0)'
3. Per-LP ISL view as CSV
curl -sS "${AUTH[@]}" \
"$H/funds/$F/export/clauses?view=isl&commitment_id=$COMMITMENT_ID&format=csv&fields=clause_id,clause_name,origin,source_lp_name,text" \
> clauses.csv
4. Cross-fund library across two funds
curl -sS "${AUTH[@]}" \
"$H/export/clauses?fund_id=$F1&fund_id=$F2&format=ndjson&lifecycle_status=live"
5. DuckDB ingest
curl -sS "${AUTH[@]}" \
"$H/funds/$F/export/clauses?view=library&format=ndjson" \
| duckdb -c "SELECT clause_name, count(*) FROM read_json_auto('/dev/stdin') GROUP BY 1 ORDER BY 2 DESC"
Operational notes
- loadMslData() is cached for 60 s per fund (SCOPE_MFN_BUNDLE). MSL exports off a warm cache are cheap; cold exports do ~11 D1 round trips.
- The cross-fund variant loops loadMslData per fund, so the MSL_CROSS_FUND_FUND_LIMIT = 5 cap exists to bound latency.
- Hugo has only one Worker environment (hugo.nordiclawfirm.com); there is no preview env. Smoke testing is done by deploying, then curling.
Roadmap
Deferred from v1, considered for v1.1+:
- resolve_definitions=true: inline-substitute [Defined Term] placeholders using the existing applyDefinitions helper. Reviewer flagged this as Hugo's potential killer differentiator.
- True row-by-row NDJSON streaming: needs loadMslData / loadMfnData restructuring to yield incrementally.
- Cursor pagination: for now, large result sets get truncated: true at the limit cap.
- Saved exports: name a filter set, get a stable URL that re-runs on each fetch. Power users would use this as a webhook target.
- OpenAPI doc generated from the same allowlist the handler validates against, served at /funds/:fundId/export/openapi.json.
- ETag + If-None-Match so polling pipelines are cheap.