Palantir Foundry Design Patterns
I’ve spent years working on Palantir Foundry across different domains. These are the patterns I reach for repeatedly — the design shapes that keep solving the same classes of problems regardless of the specific use case. Nothing here is proprietary or secret; it’s the kind of architectural intuition you develop from building real systems on the platform.
Ontology Design
Flatten When the Entity is the Unit of Analysis
The most common ontology mistake is over-normalizing. If every query your users will run starts and ends with the same object type, that object should carry its context as properties, not as links to other objects.
The test: Ask “will anyone ever query this sub-entity independently?” If the answer is no — if a location only matters in the context of its parent transaction, if a category only matters as an attribute of an event — it’s a property, not a linked object.
When to link instead:
- The related entity has an independent lifecycle (accounts open and close, people join and leave)
- The related entity is shared across many parent objects (a vendor supplies many products)
- You need to track changes to the related entity over time independently
- Users will want to search for and view the related entity on its own
The cost of linking: Every link is a join. In Quiver, Workshop, and OSDK queries, links add latency and cognitive overhead. Five linked object types means every query is mentally a five-way join, even if the platform handles it efficiently. For analytical workloads where you’re filtering, grouping, and aggregating — properties win.
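To make the distinction concrete, here is a minimal modeling sketch (plain Python dataclasses standing in for ontology object types; the Transaction and Account entities and their fields are hypothetical): the flattened shape carries merchant context as properties, while a link is reserved for an entity like an account that has its own lifecycle.

```python
from dataclasses import dataclass

# Flattened shape: the transaction carries its context as plain properties.
# One object type, no joins at query time.
@dataclass
class Transaction:
    transaction_id: str
    amount: float
    merchant_name: str        # property, not a link: nobody queries merchants alone
    merchant_category: str    # property, not a link: only meaningful on the transaction
    posted_date: str

# Linked shape: justified because an account has its own lifecycle and consumers.
@dataclass
class Account:
    account_id: str
    opened_date: str
    status: str               # accounts open and close independently of transactions

@dataclass
class LinkedTransaction:
    transaction_id: str
    amount: float
    account_id: str           # link: one account is shared across many transactions
```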
Model Actions Around Business Verbs, Not CRUD
Foundry Actions aren’t database operations — they’re business operations. “Approve Request” is an Action. “Update status field to ‘approved’” is not.
Design Actions around what users do, not what the data layer stores:
- Good: Submit Expense Report, Approve Purchase Order, Flag Transaction for Review
- Bad: Update Object, Set Status, Modify Property
Each Action should encapsulate validation logic, state transitions, and side effects. If approving a purchase order should also notify the requester and update a budget tracker, that’s one Action — not three.
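As a rough sketch of that bundling (plain Python, not Foundry’s Action authoring API; the purchase order fields and the notify_requester / update_budget_tracker helpers are hypothetical):

```python
# Illustrative sketch only, not Foundry's Action authoring API. The point is the shape:
# one business verb bundles validation, the state transition, and the side effects.

def approve_purchase_order(purchase_order, approver):
    # Validation: business rules enforced before anything is written.
    if purchase_order.status != "pending_approval":
        raise ValueError("Only pending purchase orders can be approved")
    if approver.id == purchase_order.requester_id:
        raise ValueError("Requesters cannot approve their own purchase orders")

    # State transition: the single field edit a CRUD-style action would stop at.
    purchase_order.status = "approved"
    purchase_order.approved_by = approver.id

    # Side effects: part of the same business operation, not separate actions.
    notify_requester(purchase_order)
    update_budget_tracker(purchase_order.cost_center, purchase_order.amount)
```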
Use Computed Properties for Derived State
If a property’s value can be calculated from other properties on the same object (or from linked objects), make it a computed property rather than precomputing it in the pipeline.
- days_since_last_transaction — computed from last_transaction_date and the current date
- total_spend_this_month — computed by aggregating linked transactions
- risk_score — computed from multiple object properties
Computed properties stay current without pipeline reruns. They’re evaluated at query time, which means they reflect the latest state of their inputs.
When to precompute instead: If the computation is expensive (complex aggregation over millions of objects) or if you need the historical value (what was the risk score on January 15th, not what is it now), precompute in the pipeline and store as a regular property.
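For the precomputed case, a minimal sketch of a Python transform that materializes a monthly spend aggregate as stored data, assuming hypothetical dataset paths and column names:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical dataset paths and columns. Materializes monthly spend per account as a
# regular stored dataset so the historical value is preserved across pipeline runs.
@transform_df(
    Output("/Project/derived/account_monthly_spend"),
    transactions=Input("/Project/clean/transactions"),
)
def monthly_spend(transactions):
    return (
        transactions
        .withColumn("month", F.date_trunc("month", F.col("transaction_date")))
        .groupBy("account_id", "month")
        .agg(F.sum("amount").alias("total_spend"))
    )
```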
Pipeline Architecture
Snapshot Sources, Incremental Transforms
Many external APIs don’t provide change feeds — they return the current state of the world. Don’t fight this. Accept snapshots at the source and let Foundry’s incremental transform framework handle change detection downstream.
```mermaid
flowchart LR
api["External API\n(Full snapshot)"] --> raw["Raw Dataset\n(Snapshot)"]
raw --> dedup["Dedup Transform\n(Incremental)"]
dedup --> clean["Clean Transform\n(Incremental)"]
clean --> enrich["Enrich Transform\n(Incremental)"]
enrich --> ontology["Ontology\n(Backing Dataset)"]
```
The pattern:
- Raw dataset — snapshot sync, stores exactly what the API returned. No transformation. This is your audit trail.
- First transform — deduplication using natural keys. Outputs incrementally: only new or changed records propagate.
- Subsequent transforms — cleaning, enrichment, joins. All incremental, all building on the dedup layer.
Why this works: The snapshot-to-incremental boundary is clean and debuggable. If something looks wrong in the cleaned data, you can inspect the raw snapshot. If the API changes its response format, only the first transform breaks — everything downstream is insulated.
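A sketch of what the snapshot-to-incremental boundary might look like in a Python transform, assuming a natural key of transaction_id and hypothetical dataset paths (the exact incremental API options depend on your Foundry environment):

```python
from transforms.api import transform, incremental, Input, Output

# Hypothetical paths; natural key is transaction_id. The raw input is re-snapshotted
# on every sync, but the output only ever appends rows it has not emitted before.
@incremental(snapshot_inputs=["raw"])
@transform(
    out=Output("/Project/clean/transactions_deduped"),
    raw=Input("/Project/raw/transactions_snapshot"),
)
def dedupe(raw, out):
    snapshot = raw.dataframe()                             # full current snapshot
    previous = out.dataframe("previous", snapshot.schema)  # rows already emitted

    # Keep only records whose natural key has not been seen before.
    new_rows = snapshot.join(
        previous.select("transaction_id"), on="transaction_id", how="left_anti"
    )

    out.set_mode("modify")   # append: downstream incrementals see only the new rows
    out.write_dataframe(new_rows)
```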
Separate Ingestion from Logic
Keep your Data Connection sources and your transform logic in separate repositories (or at minimum, separate folders within a repo). Source configuration changes (new API keys, endpoint URLs, schedule changes) should never require touching transform code, and vice versa.
This also enables different change cadences. Sources might need emergency reconfiguration (API endpoint migrated). Transforms change when business logic changes. Coupling them means every source change requires regression testing transform logic.
Use JDBC for Databases, REST for APIs, Native Connectors for Everything Else
Foundry’s connector ecosystem has three tiers of maturity:
| Tier | Connector Type | When to Use |
|---|---|---|
| Native | Gmail, S3, JDBC, Snowflake | First choice when available. Best integrated, most reliable. |
| REST | Generic REST API source | When no native connector exists. Configure auth, pagination, response parsing. |
| External Transform | Python/Java code in Foundry | When the source requires custom logic (IMAP, FTP, websocket, complex auth flows). |
Don’t build external transforms when a REST connector will do. The REST connector handles retries, pagination, credential management, and scheduling out of the box. External transforms are powerful but require you to handle all of that yourself.
Integration Patterns
External API → Foundry (Ingestion)
The standard pattern for pulling data from an external API:
- Data Connection Source — configure the API as a REST source with auth credentials stored as Foundry secrets
- Sync Schedule — set appropriate cadence (real-time via streaming, or batch via scheduled sync)
- Raw Landing Dataset — JSON responses land unmodified
- Transform Chain — parse, clean, deduplicate, enrich
- Ontology Backing — cleaned dataset backs an object type
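To illustrate step 4, a sketch of the first parsing transform, assuming the raw landing dataset stores each response body as a JSON string in a response column (schema, paths, and field names are illustrative):

```python
from pyspark.sql import functions as F, types as T
from transforms.api import transform_df, Input, Output

# Hypothetical schema and paths: the raw landing dataset keeps each API response body
# unmodified in a `response` string column; this step parses it into typed columns.
RESPONSE_SCHEMA = T.StructType([
    T.StructField("id", T.StringType()),
    T.StructField("amount", T.DoubleType()),
    T.StructField("merchant", T.StringType()),
    T.StructField("posted_at", T.StringType()),
])

@transform_df(
    Output("/Project/clean/transactions_parsed"),
    raw=Input("/Project/raw/api_responses"),
)
def parse_responses(raw):
    parsed = raw.withColumn("record", F.from_json(F.col("response"), RESPONSE_SCHEMA))
    return parsed.select(
        F.col("record.id").alias("transaction_id"),
        F.col("record.amount").alias("amount"),
        F.col("record.merchant").alias("merchant"),
        F.to_timestamp(F.col("record.posted_at")).alias("posted_at"),
    )
```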
Auth patterns:
- API Key — simplest, store as source secret
- OAuth2 Client Credentials — for service-to-service, Foundry handles token refresh
- OAuth2 Authorization Code — for user-delegated access, requires interactive setup
Foundry → External (OSDK)
The Ontology SDK enables external applications to query Foundry’s ontology and call AIP Logic functions. The key architectural decisions:
Service principal vs. user-delegated auth. For automated systems (backends, agents, scheduled jobs), use OAuth2 confidential client (service principal). The application has its own identity and permissions. For interactive applications where actions should be attributed to specific users, use authorization code flow.
Read-only by default. Start with read-only OSDK access. Add write capabilities (Actions) only when you have a clear need and have validated the authorization model. An OSDK client that can modify ontology objects is a write path that bypasses Foundry’s UI-based review workflows.
Cache ontology queries, not Logic results. Ontology object queries against static or slowly-changing data can be cached aggressively. AIP Logic function results should generally not be cached — they may incorporate real-time data or model state that changes between calls.
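One way to encode that caching rule on the client side; fetch_vendor_objects and ask_spending_question are hypothetical stand-ins for whatever your generated OSDK client and Logic bindings expose:

```python
import time
from functools import wraps

# The cache policy is the point; the two data-access functions are placeholders.

def ttl_cache(seconds):
    """Tiny time-based cache for slowly-changing ontology reads."""
    def decorator(fn):
        cache = {}
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            if args in cache and now - cache[args][0] < seconds:
                return cache[args][1]
            result = fn(*args)
            cache[args] = (now, result)
            return result
        return wrapper
    return decorator

@ttl_cache(seconds=300)
def fetch_vendor_objects(category: str):
    ...  # ontology object query: static-ish reference data, safe to cache

def ask_spending_question(question: str):
    ...  # AIP Logic call: may use real-time data or model state, do not cache
```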
Bidirectional Sync Patterns
When you need data flowing both directions (external system ↔ Foundry), resist the urge to build a single bidirectional connector. Instead:
- Ingestion path: External → Foundry via Data Connection (source of truth for external data)
- Export path: Foundry → External via OSDK or data export (source of truth for Foundry-enriched data)
- Conflict resolution: Pick one system as authoritative per field. Don’t merge.
AIP Logic Patterns
When to Use Logic Functions vs. Direct API
Use AIP Logic when:
- The query requires natural language interpretation (“what’s unusual about this month’s spending?”)
- The analysis involves pattern detection that’s hard to express as structured filters
- You need the function to compose multiple ontology queries dynamically
- The output benefits from natural language explanation alongside structured data
Use direct ontology queries (OSDK) when:
- The query parameters are known and structured (“get transactions between date X and Y for account Z”)
- You need deterministic, reproducible results
- Performance matters — Logic functions have higher latency than direct queries
- You’re building a UI that maps user inputs directly to query parameters
The hybrid approach: Expose both. Let structured tools handle known queries efficiently, and route open-ended questions to Logic functions. This gives you the best of both worlds — speed for common operations, flexibility for ad hoc analysis.
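A compact sketch of that routing, with query_transactions and answer_question as hypothetical stand-ins for a direct OSDK query and a Logic function call:

```python
# Hypothetical router: structured requests hit the ontology directly,
# open-ended questions fall through to an AIP Logic function.

def handle_request(request: dict):
    if request.get("type") == "transactions_by_date":
        # Known, structured parameters: deterministic, fast, reproducible.
        return query_transactions(
            account_id=request["account_id"],
            start=request["start_date"],
            end=request["end_date"],
        )
    # Open-ended question: route to AIP Logic for interpretation.
    return answer_question(request["question"])
```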
Design Logic Functions Around Capabilities, Not Screens
Don’t create a Logic function for each UI view or dashboard panel. Create functions around analytical capabilities:
- analyze_trends(entity_type, time_range) — not get_dashboard_chart_1_data()
- detect_anomalies(dataset, sensitivity) — not get_alerts_page_data()
- answer_question(question, context) — not search_bar_handler()
Capability-oriented functions compose well. A single analyze_trends function serves the dashboard, the mobile app, the agent, and the API. A screen-specific function serves one consumer and needs a sibling for every new consumer.
Logic Function Error Handling
AIP Logic functions can fail in ways that structured queries can’t — the underlying model might misinterpret the question, produce a hallucinated answer, or time out on a complex query. Design your Logic function consumers to handle:
- Timeout — Logic functions have execution time limits. Complex questions over large datasets may hit them.
- Low-confidence results — if the function returns a confidence score, set a threshold below which you surface the uncertainty to the user
- Fallback to structured query — if a Logic function fails, can you decompose the question into structured queries that approximate the answer?
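A consumer-side sketch covering the failure modes above; call_logic_function, its timeout_seconds parameter, and summarize_transactions_structured are hypothetical:

```python
# Illustrative consumer-side handling; all three named callables are placeholders.

def analyze_spending(question: str, account_id: str) -> dict:
    try:
        result = call_logic_function(question, account_id, timeout_seconds=30)
    except TimeoutError:
        # Fallback: approximate the answer with deterministic structured queries.
        return summarize_transactions_structured(account_id)

    if result.get("confidence", 1.0) < 0.5:
        # Surface the uncertainty rather than presenting the answer as fact.
        result["warning"] = "Low-confidence answer; verify against the source data."
    return result
```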
Data Quality Patterns
Validation Transforms
Insert a validation transform between raw ingestion and the rest of your pipeline. This transform:
- Schema validation — expected fields present, correct types
- Value validation — amounts are positive, dates are parseable, required fields non-null
- Referential integrity — foreign keys (account IDs, category codes) resolve to known values
- Anomaly flagging — statistical outliers flagged but not rejected (let downstream logic decide)
Output two datasets: validated (clean records) and quarantine (records that failed validation with failure reasons). Never silently drop records — quarantine them and alert.
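A sketch of such a validation transform with the two outputs, using hypothetical paths and rules:

```python
from pyspark.sql import functions as F
from transforms.api import transform, Input, Output

# Hypothetical paths and rules. One transform, two outputs: clean records continue
# down the pipeline, failures are quarantined with a reason instead of dropped.
@transform(
    validated=Output("/Project/validated/transactions"),
    quarantine=Output("/Project/quarantine/transactions"),
    raw=Input("/Project/clean/transactions_parsed"),
)
def validate(raw, validated, quarantine):
    df = raw.dataframe().withColumn(
        "failure_reason",
        F.when(F.col("transaction_id").isNull(), "missing transaction_id")
         .when(F.col("amount") <= 0, "non-positive amount")
         .when(F.to_date("posted_at").isNull(), "unparseable date")
         .otherwise(F.lit(None)),
    )
    validated.write_dataframe(
        df.filter(F.col("failure_reason").isNull()).drop("failure_reason")
    )
    quarantine.write_dataframe(df.filter(F.col("failure_reason").isNotNull()))
```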
Reconciliation Transforms
When you have overlapping data sources (e.g., bank transactions from an API and email receipts from a connector), build explicit reconciliation:
- Match — join on natural keys (transaction ID, amount + date + merchant)
- Merge — combine properties from both sources, with explicit precedence rules
- Unmatched tracking — surface records that exist in one source but not the other
The reconciliation transform is where you encode your trust hierarchy: which source wins when they disagree on a property value? Document these decisions as comments in the transform code — they’re business rules, not implementation details.
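A reconciliation sketch with explicit precedence, assuming hypothetical datasets where the bank feed wins on amount and date and the receipt wins on merchant description:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical reconciliation of bank API transactions with email receipts.
# Trust hierarchy is encoded explicitly in the coalesce order below.
@transform_df(
    Output("/Project/reconciled/transactions"),
    bank=Input("/Project/clean/bank_transactions"),
    receipts=Input("/Project/clean/email_receipts"),
)
def reconcile(bank, receipts):
    joined = bank.alias("b").join(
        receipts.alias("r"), on="transaction_id", how="full_outer"
    )
    return joined.select(
        F.col("transaction_id"),
        F.coalesce(F.col("b.amount"), F.col("r.amount")).alias("amount"),            # bank wins
        F.coalesce(F.col("b.posted_date"), F.col("r.posted_date")).alias("posted_date"),
        F.coalesce(F.col("r.merchant_description"), F.col("b.merchant")).alias("merchant"),  # receipt wins
        # Unmatched tracking: which source(s) contributed this record.
        F.col("b.amount").isNotNull().alias("in_bank_feed"),
        F.col("r.amount").isNotNull().alias("in_receipts"),
    )
```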
Freshness Monitoring
Don’t assume pipelines run. Build freshness checks:
- Expected sync cadence — if a source syncs daily, alert when the latest record is >36 hours old
- Row count monitoring — if a daily sync typically produces 50-200 records and today’s produced 0, that’s a signal
- Schema drift detection — if the source API adds or removes fields, catch it early
These checks can be simple transforms that output a health status dataset, consumed by an alerting mechanism. The key insight: pipeline monitoring is itself a pipeline.
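A sketch of such a health-status transform, with hypothetical paths and a 36-hour threshold; to catch the case where nothing ran at all, schedule it on its own cadence (for example a forced daily build) rather than only on input updates:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical health-check transform: the monitoring pipeline is itself a pipeline.
# Emits one row per run with freshness and volume signals for an alerting consumer.
@transform_df(
    Output("/Project/monitoring/transactions_health"),
    transactions=Input("/Project/clean/transactions"),
)
def health_check(transactions):
    return transactions.agg(
        F.max("posted_at").alias("latest_record"),
        F.count(F.lit(1)).alias("row_count"),
    ).select(
        F.current_timestamp().alias("checked_at"),
        F.col("latest_record"),
        F.col("row_count"),
        # Stale if the newest record is more than 36 hours old (daily sync cadence).
        (F.col("latest_record") < F.current_timestamp() - F.expr("INTERVAL 36 HOURS"))
            .alias("is_stale"),
    )
```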
How to Think About Foundry
It’s an Ontology Platform, Not a Database
The most important mental shift: Foundry’s primary abstraction is the ontology, not the dataset. Datasets are the storage layer. The ontology is the semantic layer — it defines what things mean, how they relate, and what you can do with them.
When designing a Foundry solution, start with the ontology: What are the entities? What are their properties? How do they relate? What actions can users take? Then work backward to what datasets need to exist to back those objects.
If you start with datasets and try to stretch an ontology over them after the fact, you’ll end up with object types that mirror table schemas rather than representing business concepts.
Pipelines Are Data Products
Each transform chain should produce a data product — a dataset that has a clear consumer, a defined schema, a quality contract, and an owner. If you can’t name who consumes a dataset and what they use it for, question whether it should exist.
This mindset prevents the common failure mode of “transform sprawl” — dozens of intermediate datasets that nobody owns, nobody monitors, and nobody is sure whether they’re still needed.
Ontology Actions Are Your Write API
If external systems need to write to Foundry, route writes through ontology Actions, not direct dataset modifications. Actions provide:
- Validation — business rules enforced before the write lands
- Audit trail — who did what, when, through which Action
- Permissions — fine-grained control over who can execute which Actions
- Side effects — notifications, state transitions, dependent updates
Direct dataset writes bypass all of this. They’re appropriate for pipeline-managed datasets (where the pipeline is the write path), but external systems should always go through the Action layer.