Installation
pip install --upgrade morphik
Breaking Changes
1. list_documents() Return Type
What changed: The method now returns a ListDocsResponse object instead of List[Document].
Before (v0.x)
docs = db.list_documents()
for doc in docs:
    process(doc)
After (v1.0)
response = db.list_documents()
for doc in response.documents:  # Access via .documents
    process(doc)
Why? The new structure provides:
- Pagination metadata (has_more, next_skip, total_count)
- Aggregates (status counts, folder counts)
- Sorting capabilities
- Better support for large datasets
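To make the shape concrete, the fields named in this guide could be pictured roughly as follows. This is an illustrative stand-in built only from those names, not the SDK's actual class definition; the class names, types, and defaults are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FolderCountSketch:
    """Hypothetical stand-in for one folder_counts entry."""
    folder: Optional[str]
    count: int


@dataclass
class ListDocsResponseSketch:
    """Hypothetical stand-in for ListDocsResponse; types and defaults are assumed."""
    documents: List[Any] = field(default_factory=list)
    has_more: bool = False
    next_skip: Optional[int] = None
    total_count: Optional[int] = None  # assumed to be unset unless requested
    status_counts: Optional[Dict[str, int]] = None
    folder_counts: Optional[List[FolderCountSketch]] = None


# Iterating works as in v1.0: go through .documents, not the response itself.
page = ListDocsResponseSketch(documents=["doc-a", "doc-b"], has_more=True, next_skip=2)
for doc in page.documents:
    print(doc)
```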
For example, when paginating:
Before (v0.x)
page1 = db.list_documents(skip=0, limit=10)
page2 = db.list_documents(skip=10, limit=10)
# No way to know if more pages exist
After (v1.0)
page1 = db.list_documents(skip=0, limit=10)
if page1.has_more:
    page2 = db.list_documents(skip=page1.next_skip, limit=10)
Or with total count:
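response = db.list_documents(limit=10, include_total_count=True)
# Ceiling division, so an exact multiple of 10 does not report an extra page
total_pages = max(1, (response.total_count + 9) // 10)
print(f"Page 1 of {total_pages}")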
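For pages other than the first, the same arithmetic can be wrapped in a small helper. The sketch below is not part of the SDK; `page_position` is a hypothetical name, and it assumes `skip` always advances in multiples of `limit`.

```python
from typing import Tuple


def page_position(skip: int, limit: int, total_count: int) -> Tuple[int, int]:
    """Return (current_page, total_pages), both 1-based."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    current = skip // limit + 1
    # Ceiling division; an empty result set is still reported as one page.
    total = max(1, (total_count + limit - 1) // limit)
    return current, total


current, total = page_position(skip=20, limit=10, total_count=25)
print(f"Page {current} of {total}")  # Page 3 of 3
```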
New Features
Sorting
Sort documents by any of the supported fields:
# Most recently updated first
response = db.list_documents(sort_by="updated_at", sort_direction="desc")
# Alphabetically by filename
response = db.list_documents(sort_by="filename", sort_direction="asc")
Available sort fields:
created_at - Creation timestamp
updated_at - Last modification timestamp
filename - Document filename
external_id - Document ID
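If sort options come from user input, it may help to check them before making the call. The helper below is a hypothetical client-side sketch, not an SDK function; the allowed sets simply mirror the fields and directions listed in this guide, and the server may accept or reject values differently.

```python
ALLOWED_SORT_FIELDS = {"created_at", "updated_at", "filename", "external_id"}
ALLOWED_SORT_DIRECTIONS = {"asc", "desc"}


def sort_kwargs(sort_by: str, sort_direction: str) -> dict:
    """Validate sort options and return them as keyword arguments."""
    if sort_by not in ALLOWED_SORT_FIELDS:
        raise ValueError(f"unsupported sort field: {sort_by!r}")
    if sort_direction not in ALLOWED_SORT_DIRECTIONS:
        raise ValueError(f"unsupported sort direction: {sort_direction!r}")
    return {"sort_by": sort_by, "sort_direction": sort_direction}


# Intended use (db is your existing client):
#   response = db.list_documents(**sort_kwargs("filename", "asc"))
print(sort_kwargs("filename", "asc"))
```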
Aggregates
Get document counts without retrieving all documents:
# Status breakdown
response = db.list_documents(
    limit=0,  # Don't need documents
    include_status_counts=True
)
print(response.status_counts)
# {"completed": 100, "processing": 5, "failed": 2}
# Folder distribution
response = db.list_documents(include_folder_counts=True)
for folder in response.folder_counts:
    print(f"{folder.folder}: {folder.count} docs")
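As one way to use these aggregates, the status counts can be reduced to a one-line progress summary. This sketch assumes `status_counts` behaves like a plain dict of status name to count, as in the sample output above; `completion_summary` is a hypothetical helper, not part of the SDK.

```python
from typing import Dict


def completion_summary(status_counts: Dict[str, int]) -> str:
    """Summarize how many documents have finished processing."""
    total = sum(status_counts.values())
    if total == 0:
        return "no documents"
    done = status_counts.get("completed", 0)
    return f"{done}/{total} completed ({done / total:.0%})"


print(completion_summary({"completed": 100, "processing": 5, "failed": 2}))
# 100/107 completed (93%)
```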
Completed-Only Filter
Filter to only completed documents:
response = db.list_documents(completed_only=True)
# Only returns successfully processed documents
Total Count
Get total matching documents for pagination:
response = db.list_documents(
    filters={"department": "sales"},
    include_total_count=True
)
print(f"Found {response.total_count} sales documents")
Migration Checklist
Common Migration Patterns
Pattern 1: Simple Iteration
# Before
for doc in db.list_documents():
    process(doc)
# After
for doc in db.list_documents().documents:
    process(doc)
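If some code has to run against both SDK versions during the transition, a small shim can hide the difference in return shape. This is a sketch with a hypothetical helper name; it assumes v0.x returns a plain list and v1.0 returns an object with a `.documents` attribute, as described in this guide.

```python
from typing import Any, List


def documents_of(result: Any) -> List[Any]:
    """Return the document list from either the v0.x or the v1.0 return shape."""
    if isinstance(result, list):
        return result  # v0.x: list_documents() returned the list directly
    return list(result.documents)  # v1.0: documents live on the response object


# Intended use (db is your existing client):
#   for doc in documents_of(db.list_documents()):
#       process(doc)
print(documents_of(["doc-a", "doc-b"]))
```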
Pattern 2: Pagination
# Before
skip = 0
while True:
    docs = db.list_documents(skip=skip, limit=100)
    if not docs:
        break
    for doc in docs:
        process(doc)
    skip += 100
# After
skip = 0
while True:
    response = db.list_documents(skip=skip, limit=100)
    if not response.documents:
        break
    for doc in response.documents:
        process(doc)
    if not response.has_more:
        break
    skip = response.next_skip
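The v1.0 loop can also be wrapped in a generator so that callers never handle `skip` themselves. This is a minimal sketch, not an SDK feature: `iter_documents` is a hypothetical name, and it relies only on the `skip`, `limit`, `has_more`, and `next_skip` behavior described in this guide.

```python
from typing import Any, Iterator


def iter_documents(db: Any, page_size: int = 100, **kwargs: Any) -> Iterator[Any]:
    """Yield every document, fetching one page at a time."""
    skip = 0
    while True:
        response = db.list_documents(skip=skip, limit=page_size, **kwargs)
        yield from response.documents
        # Stop on the last page, or if a page unexpectedly comes back empty.
        if not response.has_more or not response.documents:
            break
        skip = response.next_skip


# Intended use (db is your existing client):
#   for doc in iter_documents(db, page_size=100, completed_only=True):
#       process(doc)
```

Extra keyword arguments such as filters or sort options are passed through unchanged to each `list_documents` call.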
Pattern 3: Count Documents
# Before (had to fetch all)
all_docs = db.list_documents()
count = len(all_docs)
# After (much more efficient)
response = db.list_documents(limit=1, include_total_count=True)
count = response.total_count
Getting Help
Rollback
If you need to roll back to v0.x:
pip install morphik==0.2.15