Python SDK Introduction
Integrate and control your data transformation workflows with the Lume Python SDK
⚠️ Private SDK Documentation - This documentation is for customers with private SDK access. Some features and capabilities may vary based on your agreement.
The Lume Python SDK is your control plane for orchestrating data transformation workflows. It provides the essential tools to trigger, monitor, and manage your pipelines directly from your Python applications, delegating the heavy lifting of data processing to the managed Lume platform.
A New Paradigm for Data Transformation
Lume separates the ‘what’ from the ‘how’. You define the logic of your transformation—the schema, mapping rules, and validation—in a Flow Version within the Lume UI. The SDK is then used to orchestrate the end-to-end pipeline for a specific batch of data.
This approach allows you to:
- Centralize Transformation Logic: Maintain a single source of truth for your business logic in the Lume UI.
- Decouple Data Sources: Use Lume Connectors to securely sync data from sources like S3 or PostgreSQL without changing your application code.
- Simplify Operations: Your Python code remains clean and focused on workflow orchestration, not complex data syncs or transformations.
Core Capabilities
- Pipeline Orchestration: Trigger a full sync-transform-sync pipeline with a single function call.
- Asynchronous Monitoring: Track run status via polling or secure webhooks for event-driven workflows.
- Rich Metadata: Programmatically access detailed metrics, validation results, and performance data.
- Connector-Based Architecture: Works seamlessly with a growing library of connectors for object storage and databases.
- Secure & Scalable: Built on a secure, multi-tenant architecture that scales automatically to meet your processing demands.
Example: Triggering a Database Workflow
This example shows how to execute a pre-configured flow that processes a batch of data in a database. The flow_version already knows its source and target connections.
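Below is a minimal sketch of what triggering and waiting on a run can look like. The lume module name, the run() entry point, the flow version identifier, and the result fields are illustrative assumptions; consult your SDK reference for the exact API.

```python
# A minimal sketch of triggering a pre-configured flow version.
# The client import, run() call, flow version id, and result fields
# are assumptions for illustration, not the documented API surface.
import lume

# Trigger the pipeline. The flow version already knows its source
# and target connections, so only its identifier is needed here.
run = lume.run(flow_version="invoice_normalization:v3")  # hypothetical id

# Block until the pipeline finishes (polling under the hood).
result = run.wait()

print(result.status)   # e.g. SUCCEEDED / FAILED (assumed field)
print(result.metrics)  # validation and performance metadata (assumed field)
```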
Pro Tip: Use Webhooks for Production
While run.wait() is great for simple scripts and getting started, we strongly recommend using Webhooks for production applications. They are more scalable and efficient than continuous polling.
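As a rough illustration of the event-driven pattern, the sketch below receives a run-completion event over HTTP. The endpoint path, event type, and payload fields are assumptions, not the documented Lume webhook contract; verify them against your webhook configuration.

```python
# A minimal webhook receiver sketch using Flask.
# The route, event type, and payload fields are illustrative assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/lume/webhooks", methods=["POST"])
def handle_lume_event():
    event = request.get_json(force=True)

    # Assumed payload shape: an event type plus run identifiers.
    if event.get("type") == "run.completed":
        run_id = event.get("run_id")
        status = event.get("status")
        # Kick off downstream work here (load results, notify, etc.).
        print(f"Run {run_id} finished with status {status}")

    return jsonify({"received": True}), 200
```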
Common Use Cases
- Automated Data Pipelines: Trigger transformations as new data arrives in your data lake or warehouse.
- Orchestration Tool Integration: Embed Lume into workflows managed by tools like Airflow, Prefect, or Dagster (see the sketch after this list).
- Event-Driven Processing: Launch transformations in response to events from microservices or message queues.
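For the orchestration-tool use case, here is a minimal Airflow sketch. It reuses the same hypothetical lume.run() call shown above; adapt the call and the flow version identifier to the actual SDK API and your own flow.

```python
# A minimal Airflow DAG sketch that triggers a Lume run as one task.
# The lume.run() call and flow version id are illustrative assumptions.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator


def run_lume_flow():
    import lume

    run = lume.run(flow_version="invoice_normalization:v3")  # hypothetical id
    result = run.wait()
    if result.status != "SUCCEEDED":  # assumed status value
        raise RuntimeError(f"Lume run failed: {result.status}")


with DAG(
    dag_id="lume_invoice_normalization",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="run_lume_flow", python_callable=run_lume_flow)
```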
Ready to build your first workflow? Head to the Quickstart guide.