Palantir Foundry Design Patterns
I’ve spent years working on Palantir Foundry across different domains. These are the patterns I reach for repeatedly — the design shapes that keep solving the same classes of problems regardless of the specific use case. Nothing here is proprietary or secret; it’s the kind of architectural intuition you develop from building real systems on the platform.
Ontology Design
Flatten When the Entity is the Unit of Analysis
The most common ontology mistake is over-normalizing. If every query your users will run starts and ends with the same object type, that object should carry its context as properties, not as links to other objects.
The test: Ask “will anyone ever query this sub-entity independently?” If the answer is no — if a location only matters in the context of its parent transaction, if a category only matters as an attribute of an event — it’s a property, not a linked object.
When to link instead:
- The related entity has an independent lifecycle (accounts open and close, people join and leave)
- The related entity is shared across many parent objects (a vendor supplies many products)
- You need to track changes to the related entity over time independently
- Users will want to search for and view the related entity on its own
The cost of linking: Every link is a join. In Quiver, Workshop, and OSDK queries, links add latency and cognitive overhead. Five linked object types means every query is mentally a five-way join, even if the platform handles it efficiently. For analytical workloads where you’re filtering, grouping, and aggregating — properties win.
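To make the distinction concrete, here is a minimal modeling sketch (plain Python dataclasses standing in for ontology object types; the Transaction and Account entities and their fields are hypothetical): the flattened shape carries merchant context as properties, while a link is reserved for an entity like an account that has its own lifecycle.

```python
from dataclasses import dataclass

# Flattened shape: the transaction carries its context as plain properties.
# One object type, no joins at query time.
@dataclass
class Transaction:
    transaction_id: str
    amount: float
    merchant_name: str        # property, not a link: nobody queries merchants alone
    merchant_category: str    # property, not a link: only meaningful on the transaction
    posted_date: str

# Linked shape: justified because an account has its own lifecycle and consumers.
@dataclass
class Account:
    account_id: str
    opened_date: str
    status: str               # accounts open and close independently of transactions

@dataclass
class LinkedTransaction:
    transaction_id: str
    amount: float
    account_id: str           # link: one account is shared across many transactions
```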
Model Actions Around Business Verbs, Not CRUD
Foundry Actions aren’t database operations — they’re business operations. “Approve Request” is an Action. “Update status field to ‘approved’” is not.
Design Actions around what users do, not what the data layer stores:
- Good: Submit Expense Report, Approve Purchase Order, Flag Transaction for Review
- Bad: Update Object, Set Status, Modify Property
Each Action should encapsulate validation logic, state transitions, and side effects. If approving a purchase order should also notify the requester and update a budget tracker, that’s one Action — not three.
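As a rough sketch of that bundling (plain Python, not Foundry’s Action authoring API; the purchase order fields and the notify_requester / update_budget_tracker helpers are hypothetical):

```python
# Illustrative sketch only, not Foundry's Action authoring API. The point is the shape:
# one business verb bundles validation, the state transition, and the side effects.

def approve_purchase_order(purchase_order, approver):
    # Validation: business rules enforced before anything is written.
    if purchase_order.status != "pending_approval":
        raise ValueError("Only pending purchase orders can be approved")
    if approver.id == purchase_order.requester_id:
        raise ValueError("Requesters cannot approve their own purchase orders")

    # State transition: the single field edit a CRUD-style action would stop at.
    purchase_order.status = "approved"
    purchase_order.approved_by = approver.id

    # Side effects: part of the same business operation, not separate actions.
    notify_requester(purchase_order)
    update_budget_tracker(purchase_order.cost_center, purchase_order.amount)
```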
Use Computed Properties for Derived State
If a property’s value can be calculated from other properties on the same object (or from linked objects), make it a computed property rather than precomputing it in the pipeline.
- days_since_last_transaction — computed from last_transaction_date and the current date
- total_spend_this_month — computed by aggregating linked transactions
- risk_score — computed from multiple object properties
Computed properties stay current without pipeline reruns. They’re evaluated at query time, which means they reflect the latest state of their inputs.
When to precompute instead: If the computation is expensive (complex aggregation over millions of objects) or if you need the historical value (what was the risk score on January 15th, not what is it now), precompute in the pipeline and store as a regular property.
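For the precomputed case, a minimal sketch of a Python transform that materializes a monthly spend aggregate as stored data, assuming hypothetical dataset paths and column names:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical dataset paths and columns. Materializes monthly spend per account as a
# regular stored dataset so the historical value is preserved across pipeline runs.
@transform_df(
    Output("/Project/derived/account_monthly_spend"),
    transactions=Input("/Project/clean/transactions"),
)
def monthly_spend(transactions):
    return (
        transactions
        .withColumn("month", F.date_trunc("month", F.col("transaction_date")))
        .groupBy("account_id", "month")
        .agg(F.sum("amount").alias("total_spend"))
    )
```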
Pipeline Architecture
Snapshot Sources, Incremental Transforms
Many external APIs don’t provide change feeds — they return the current state of the world. Don’t fight this. Accept snapshots at the source and let Foundry’s incremental transform framework handle change detection downstream.
```mermaid
flowchart LR
api["External API\n(Full snapshot)"] --> raw["Raw Dataset\n(Snapshot)"]
raw --> dedup["Dedup Transform\n(Incremental)"]
dedup --> clean["Clean Transform\n(Incremental)"]
clean --> enrich["Enrich Transform\n(Incremental)"]
enrich --> ontology["Ontology\n(Backing Dataset)"]
```
The pattern:
- Raw dataset — snapshot sync, stores exactly what the API returned. No transformation. This is your audit trail.
- First transform — deduplication using natural keys. Outputs incrementally: only new or changed records propagate.
- Subsequent transforms — cleaning, enrichment, joins. All incremental, all building on the dedup layer.
Why this works: The snapshot-to-incremental boundary is clean and debuggable. If something looks wrong in the cleaned data, you can inspect the raw snapshot. If the API changes its response format, only the first transform breaks — everything downstream is insulated.
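A sketch of what the snapshot-to-incremental boundary might look like in a Python transform, assuming a natural key of transaction_id and hypothetical dataset paths (the exact incremental API options depend on your Foundry environment):

```python
from transforms.api import transform, incremental, Input, Output

# Hypothetical paths; natural key is transaction_id. The raw input is re-snapshotted
# on every sync, but the output only ever appends rows it has not emitted before.
@incremental(snapshot_inputs=["raw"])
@transform(
    out=Output("/Project/clean/transactions_deduped"),
    raw=Input("/Project/raw/transactions_snapshot"),
)
def dedupe(raw, out):
    snapshot = raw.dataframe()                             # full current snapshot
    previous = out.dataframe("previous", snapshot.schema)  # rows already emitted

    # Keep only records whose natural key has not been seen before.
    new_rows = snapshot.join(
        previous.select("transaction_id"), on="transaction_id", how="left_anti"
    )

    out.set_mode("modify")   # append: downstream incrementals see only the new rows
    out.write_dataframe(new_rows)
```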
Separate Ingestion from Logic
Keep your Data Connection sources and your transform logic in separate repositories (or at minimum, separate folders within a repo). Source configuration changes (new API keys, endpoint URLs, schedule changes) should never require touching transform code, and vice versa.
This also enables different change cadences. Sources might need emergency reconfiguration (API endpoint migrated). Transforms change when business logic changes. Coupling them means every source change requires regression testing transform logic.
Use JDBC for Databases, REST for APIs, Native Connectors for Everything Else
Foundry’s connector ecosystem has three tiers of maturity:
| Tier | Connector Type | When to Use |
|---|---|---|
| Native | Gmail, S3, JDBC, Snowflake | First choice when available. Best integrated, most reliable. |
| REST | Generic REST API source | When no native connector exists. Configure auth, pagination, response parsing. |
| External Transform | Python/Java code in Foundry | When the source requires custom logic (IMAP, FTP, websocket, complex auth flows). |
Don’t build external transforms when a REST connector will do. The REST connector handles retries, pagination, credential management, and scheduling out of the box. External transforms are powerful but require you to handle all of that yourself.
Integration Patterns
External API → Foundry (Ingestion)
The standard pattern for pulling data from an external API:
- Data Connection Source — configure the API as a REST source with auth credentials stored as Foundry secrets
- Sync Schedule — set appropriate cadence (real-time via streaming, or batch via scheduled sync)
- Raw Landing Dataset — JSON responses land unmodified
- Transform Chain — parse, clean, deduplicate, enrich
- Ontology Backing — cleaned dataset backs an object type
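To illustrate step 4, a sketch of the first parsing transform, assuming the raw landing dataset stores each response body as a JSON string in a response column (schema, paths, and field names are illustrative):

```python
from pyspark.sql import functions as F, types as T
from transforms.api import transform_df, Input, Output

# Hypothetical schema and paths: the raw landing dataset keeps each API response body
# unmodified in a `response` string column; this step parses it into typed columns.
RESPONSE_SCHEMA = T.StructType([
    T.StructField("id", T.StringType()),
    T.StructField("amount", T.DoubleType()),
    T.StructField("merchant", T.StringType()),
    T.StructField("posted_at", T.StringType()),
])

@transform_df(
    Output("/Project/clean/transactions_parsed"),
    raw=Input("/Project/raw/api_responses"),
)
def parse_responses(raw):
    parsed = raw.withColumn("record", F.from_json(F.col("response"), RESPONSE_SCHEMA))
    return parsed.select(
        F.col("record.id").alias("transaction_id"),
        F.col("record.amount").alias("amount"),
        F.col("record.merchant").alias("merchant"),
        F.to_timestamp(F.col("record.posted_at")).alias("posted_at"),
    )
```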
Auth patterns:
- API Key — simplest, store as source secret
- OAuth2 Client Credentials — for service-to-service, Foundry handles token refresh
- OAuth2 Authorization Code — for user-delegated access, requires interactive setup
Foundry → External (OSDK)
The Ontology SDK enables external applications to query Foundry’s ontology and call AIP Logic functions. The key architectural decisions:
Service principal vs. user-delegated auth. For automated systems (backends, agents, scheduled jobs), use OAuth2 confidential client (service principal). The application has its own identity and permissions. For interactive applications where actions should be attributed to specific users, use authorization code flow.
Read-only by default. Start with read-only OSDK access. Add write capabilities (Actions) only when you have a clear need and have validated the authorization model. An OSDK client that can modify ontology objects is a write path that bypasses Foundry’s UI-based review workflows.
Cache ontology queries, not Logic results. Ontology object queries against static or slowly-changing data can be cached aggressively. AIP Logic function results should generally not be cached — they may incorporate real-time data or model state that changes between calls.
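One way to encode that caching rule on the client side; fetch_vendor_objects and ask_spending_question are hypothetical stand-ins for whatever your generated OSDK client and Logic bindings expose:

```python
import time
from functools import wraps

# The cache policy is the point; the two data-access functions are placeholders.

def ttl_cache(seconds):
    """Tiny time-based cache for slowly-changing ontology reads."""
    def decorator(fn):
        cache = {}
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            if args in cache and now - cache[args][0] < seconds:
                return cache[args][1]
            result = fn(*args)
            cache[args] = (now, result)
            return result
        return wrapper
    return decorator

@ttl_cache(seconds=300)
def fetch_vendor_objects(category: str):
    ...  # ontology object query: static-ish reference data, safe to cache

def ask_spending_question(question: str):
    ...  # AIP Logic call: may use real-time data or model state, do not cache
```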
Bidirectional Sync Patterns
When you need data flowing both directions (external system ↔ Foundry), resist the urge to build a single bidirectional connector. Instead:
- Ingestion path: External → Foundry via Data Connection (source of truth for external data)
- Export path: Foundry → External via OSDK or data export (source of truth for Foundry-enriched data)
- Conflict resolution: Pick one system as authoritative per field. Don’t merge.
AIP Logic Patterns
When to Use Logic Functions vs. Direct API
Use AIP Logic when:
- The query requires natural language interpretation (“what’s unusual about this month’s spending?”)
- The analysis involves pattern detection that’s hard to express as structured filters
- You need the function to compose multiple ontology queries dynamically
- The output benefits from natural language explanation alongside structured data
Use direct ontology queries (OSDK) when:
- The query parameters are known and structured (“get transactions between date X and Y for account Z”)
- You need deterministic, reproducible results
- Performance matters — Logic functions have higher latency than direct queries
- You’re building a UI that maps user inputs directly to query parameters
The hybrid approach: Expose both. Let structured tools handle known queries efficiently, and route open-ended questions to Logic functions. This gives you the best of both worlds — speed for common operations, flexibility for ad hoc analysis.
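A compact sketch of that routing, with query_transactions and answer_question as hypothetical stand-ins for a direct OSDK query and a Logic function call:

```python
# Hypothetical router: structured requests hit the ontology directly,
# open-ended questions fall through to an AIP Logic function.

def handle_request(request: dict):
    if request.get("type") == "transactions_by_date":
        # Known, structured parameters: deterministic, fast, reproducible.
        return query_transactions(
            account_id=request["account_id"],
            start=request["start_date"],
            end=request["end_date"],
        )
    # Open-ended question: route to AIP Logic for interpretation.
    return answer_question(request["question"])
```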
Design Logic Functions Around Capabilities, Not Screens
Don’t create a Logic function for each UI view or dashboard panel. Create functions around analytical capabilities:
- analyze_trends(entity_type, time_range) — not get_dashboard_chart_1_data()
- detect_anomalies(dataset, sensitivity) — not get_alerts_page_data()
- answer_question(question, context) — not search_bar_handler()
Capability-oriented functions compose well. A single analyze_trends function serves the dashboard, the mobile app, the agent, and the API. A screen-specific function serves one consumer and needs a sibling for every new consumer.
Logic Function Error Handling
AIP Logic functions can fail in ways that structured queries can’t — the underlying model might misinterpret the question, produce a hallucinated answer, or time out on a complex query. Design your Logic function consumers to handle:
- Timeout — Logic functions have execution time limits. Complex questions over large datasets may hit them.
- Low-confidence results — if the function returns a confidence score, set a threshold below which you surface the uncertainty to the user
- Fallback to structured query — if a Logic function fails, can you decompose the question into structured queries that approximate the answer?
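A consumer-side sketch covering the failure modes above; call_logic_function, its timeout_seconds parameter, and summarize_transactions_structured are hypothetical:

```python
# Illustrative consumer-side handling; all three named callables are placeholders.

def analyze_spending(question: str, account_id: str) -> dict:
    try:
        result = call_logic_function(question, account_id, timeout_seconds=30)
    except TimeoutError:
        # Fallback: approximate the answer with deterministic structured queries.
        return summarize_transactions_structured(account_id)

    if result.get("confidence", 1.0) < 0.5:
        # Surface the uncertainty rather than presenting the answer as fact.
        result["warning"] = "Low-confidence answer; verify against the source data."
    return result
```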
Data Quality Patterns
Validation Transforms
Insert a validation transform between raw ingestion and the rest of your pipeline. This transform:
- Schema validation — expected fields present, correct types
- Value validation — amounts are positive, dates are parseable, required fields non-null
- Referential integrity — foreign keys (account IDs, category codes) resolve to known values
- Anomaly flagging — statistical outliers flagged but not rejected (let downstream logic decide)
Output two datasets: validated (clean records) and quarantine (records that failed validation with failure reasons). Never silently drop records — quarantine them and alert.
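A sketch of such a validation transform with the two outputs, using hypothetical paths and rules:

```python
from pyspark.sql import functions as F
from transforms.api import transform, Input, Output

# Hypothetical paths and rules. One transform, two outputs: clean records continue
# down the pipeline, failures are quarantined with a reason instead of dropped.
@transform(
    validated=Output("/Project/validated/transactions"),
    quarantine=Output("/Project/quarantine/transactions"),
    raw=Input("/Project/clean/transactions_parsed"),
)
def validate(raw, validated, quarantine):
    df = raw.dataframe().withColumn(
        "failure_reason",
        F.when(F.col("transaction_id").isNull(), "missing transaction_id")
         .when(F.col("amount") <= 0, "non-positive amount")
         .when(F.to_date("posted_at").isNull(), "unparseable date")
         .otherwise(F.lit(None)),
    )
    validated.write_dataframe(
        df.filter(F.col("failure_reason").isNull()).drop("failure_reason")
    )
    quarantine.write_dataframe(df.filter(F.col("failure_reason").isNotNull()))
```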
Reconciliation Transforms
When you have overlapping data sources (e.g., bank transactions from an API and email receipts from a connector), build explicit reconciliation:
- Match — join on natural keys (transaction ID, amount + date + merchant)
- Merge — combine properties from both sources, with explicit precedence rules
- Unmatched tracking — surface records that exist in one source but not the other
The reconciliation transform is where you encode your trust hierarchy: which source wins when they disagree on a property value? Document these decisions as comments in the transform code — they’re business rules, not implementation details.
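A reconciliation sketch with explicit precedence, assuming hypothetical datasets where the bank feed wins on amount and date and the receipt wins on merchant description:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical reconciliation of bank API transactions with email receipts.
# Trust hierarchy is encoded explicitly in the coalesce order below.
@transform_df(
    Output("/Project/reconciled/transactions"),
    bank=Input("/Project/clean/bank_transactions"),
    receipts=Input("/Project/clean/email_receipts"),
)
def reconcile(bank, receipts):
    joined = bank.alias("b").join(
        receipts.alias("r"), on="transaction_id", how="full_outer"
    )
    return joined.select(
        F.col("transaction_id"),
        F.coalesce(F.col("b.amount"), F.col("r.amount")).alias("amount"),            # bank wins
        F.coalesce(F.col("b.posted_date"), F.col("r.posted_date")).alias("posted_date"),
        F.coalesce(F.col("r.merchant_description"), F.col("b.merchant")).alias("merchant"),  # receipt wins
        # Unmatched tracking: which source(s) contributed this record.
        F.col("b.amount").isNotNull().alias("in_bank_feed"),
        F.col("r.amount").isNotNull().alias("in_receipts"),
    )
```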
Freshness Monitoring
Don’t assume pipelines run. Build freshness checks:
- Expected sync cadence — if a source syncs daily, alert when the latest record is >36 hours old
- Row count monitoring — if a daily sync typically produces 50-200 records and today’s produced 0, that’s a signal
- Schema drift detection — if the source API adds or removes fields, catch it early
These checks can be simple transforms that output a health status dataset, consumed by an alerting mechanism. The key insight: pipeline monitoring is itself a pipeline.
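A sketch of such a health-status transform, with hypothetical paths and a 36-hour threshold; to catch the case where nothing ran at all, schedule it on its own cadence (for example a forced daily build) rather than only on input updates:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical health-check transform: the monitoring pipeline is itself a pipeline.
# Emits one row per run with freshness and volume signals for an alerting consumer.
@transform_df(
    Output("/Project/monitoring/transactions_health"),
    transactions=Input("/Project/clean/transactions"),
)
def health_check(transactions):
    return transactions.agg(
        F.max("posted_at").alias("latest_record"),
        F.count(F.lit(1)).alias("row_count"),
    ).select(
        F.current_timestamp().alias("checked_at"),
        F.col("latest_record"),
        F.col("row_count"),
        # Stale if the newest record is more than 36 hours old (daily sync cadence).
        (F.col("latest_record") < F.current_timestamp() - F.expr("INTERVAL 36 HOURS"))
            .alias("is_stale"),
    )
```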
How to Think About Foundry
It’s an Ontology Platform, Not a Database
The most important mental shift: Foundry’s primary abstraction is the ontology, not the dataset. Datasets are the storage layer. The ontology is the semantic layer — it defines what things mean, how they relate, and what you can do with them.
When designing a Foundry solution, start with the ontology: What are the entities? What are their properties? How do they relate? What actions can users take? Then work backward to what datasets need to exist to back those objects.
If you start with datasets and try to stretch an ontology over them after the fact, you’ll end up with object types that mirror table schemas rather than representing business concepts.
Pipelines Are Data Products
Each transform chain should produce a data product — a dataset that has a clear consumer, a defined schema, a quality contract, and an owner. If you can’t name who consumes a dataset and what they use it for, question whether it should exist.
This mindset prevents the common failure mode of “transform sprawl” — dozens of intermediate datasets that nobody owns, nobody monitors, and nobody is sure whether they’re still needed.
Ontology Actions Are Your Write API
If external systems need to write to Foundry, route writes through ontology Actions, not direct dataset modifications. Actions provide:
- Validation — business rules enforced before the write lands
- Audit trail — who did what, when, through which Action
- Permissions — fine-grained control over who can execute which Actions
- Side effects — notifications, state transitions, dependent updates
Direct dataset writes bypass all of this. They’re appropriate for pipeline-managed datasets (where the pipeline is the write path), but external systems should always go through the Action layer.