Production Best Practices
Guidance for building robust, production-grade data pipelines with Lume
This guide provides best practices for integrating the Lume Python SDK into production environments, focusing on security, reliability, and monitoring.
Security: A Secure-by-Design Architecture
Lume’s architecture is fundamentally designed to minimize access to your sensitive systems. The Sync-Transform-Sync model ensures that the core transformation engine never has direct, standing access to your production data stores.
The Principle of Least Privilege
- Transactional Access: Lume’s access to your systems is transactional and short-lived. The Connectors only require temporary permissions to perform a specific data synchronization task. They connect, ingest the necessary data, and disconnect.
- Isolated Transformation: The actual data transformation process runs in a completely isolated Lume environment that has no network path to your infrastructure.
- Reduced Attack Surface: This model dramatically reduces the attack surface and simplifies security audits. Instead of granting broad permissions to a third-party service, you only need to authorize narrowly scoped, temporary data sync operations.
Managing API Keys
Never hardcode your Lume API key in your source code. Use a secure mechanism to manage it.
- Environment Variables: The SDK automatically detects the LUME_API_KEY environment variable. This is the recommended approach for most environments.
- Secrets Management Systems: For production systems, use a secrets manager like AWS Secrets Manager, Google Secret Manager, or HashiCorp Vault. Your application should fetch the key at runtime.
Orchestration: Integrating with Airflow
The Lume SDK is a lightweight client, making it perfect for orchestration tools like Airflow. The role of your DAG is to prepare the data batch and trigger the run.
This example shows an Airflow DAG that processes daily sales data.
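A minimal sketch of such a DAG follows. The flow version string, the parameters passed to lume.run(), and the returned run object's attributes are assumptions based on the names used elsewhere in this guide (lume.run(), lume.run_status(), source_path, run.id), not a verified API reference.

```python
# Hypothetical Airflow DAG: flow name and lume.run() parameters are
# assumptions, not verified Lume SDK signatures.
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    schedule="@daily",
    start_date=datetime(2024, 7, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)
def daily_sales_pipeline():
    @task
    def trigger_lume_run(ds=None) -> str:
        import lume  # reads LUME_API_KEY from the environment

        # A deterministic source_path keeps reruns idempotent (see the
        # Idempotency and Reliability section).
        run = lume.run(
            flow_version="sales_cleanup:v3",  # hypothetical flow version
            source_path=f"daily_sales/{ds}",
        )
        return run.id

    @task
    def check_status(run_id: str) -> None:
        import lume

        status = lume.run_status(run_id).status
        if status != "SUCCEEDED":
            raise RuntimeError(f"Lume run {run_id} finished as {status}")

    check_status(trigger_lume_run())


daily_sales_pipeline()
```

Raising inside check_status lets Airflow's normal retry and on_failure_callback machinery handle failed runs.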
Asynchronous Processing with Webhooks
For high-throughput, event-driven workflows, polling for run status is inefficient. Lume supports webhooks to notify your application as soon as a run completes.
You can configure a webhook endpoint for your Flow in the Lume UI. When a run finishes, Lume will send a POST request to your URL with a payload summarizing the outcome.
For a complete, runnable example of a webhook-driven application, see the Webhook-Driven Application Example.
Webhook Payload Schema
The webhook payload contains the essential information about the completed run.
You can use the run_id to call lume.run_status() to retrieve the full, detailed metadata object if needed.
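As an illustration, a handler might parse the payload like this. Only run_id and the status values are named in this guide; the other fields and the overall shape of the payload are hypothetical.

```python
import json

# Hypothetical webhook payload: only run_id and status are referenced
# elsewhere in this guide -- the remaining fields are illustrative.
raw_body = json.dumps({
    "run_id": "run_abc123",
    "status": "SUCCEEDED",  # or PARTIAL_FAILED / FAILED / CRASHED
    "flow_version": "sales_cleanup:v3",
    "source_path": "daily_invoices/2024-07-30",
})

payload = json.loads(raw_body)
if payload["status"] != "SUCCEEDED":
    # Fetch full metadata for triage, e.g. lume.run_status(payload["run_id"])
    print(f"Run {payload['run_id']} needs attention: {payload['status']}")
```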
Securing Your Webhook Endpoint
To ensure that incoming webhook requests are genuinely from Lume, you must verify their signature.
- Secret Key: When you configure a webhook in the Lume UI, a unique secret key is generated. You must store this key securely in your application’s environment.
- Signature Header: Every webhook request from Lume includes an X-Lume-Signature-256 header. This signature is an HMAC-SHA256 hash of the request body, created using your secret key.
- Verification: In your application, compute the same HMAC-SHA256 hash of the received request body using your stored secret key. If your computed signature matches the one in the header, the request is authentic.
Rejecting requests with invalid signatures is critical to protect your system from forged payloads.
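The verification steps above can be sketched with the standard library alone. The header name X-Lume-Signature-256 comes from this guide; the assumption that the header carries a hex-encoded digest is ours, not confirmed Lume behavior.

```python
import hashlib
import hmac


def verify_lume_signature(body: bytes, signature_header: str, secret: str) -> bool:
    """Return True if signature_header matches our own HMAC of the body.

    Assumes the X-Lume-Signature-256 header carries the hex-encoded
    HMAC-SHA256 digest of the raw request body (an assumption about
    the encoding, not verified against Lume's docs).
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature_header)


# Framework-agnostic usage inside a webhook handler:
#   secret = os.environ["LUME_WEBHOOK_SECRET"]  # hypothetical variable name
#   if not verify_lume_signature(request_body, headers["X-Lume-Signature-256"], secret):
#       respond with 401 and do not process the payload
```

Always hash the raw request bytes, not a re-serialized JSON object, since any re-serialization can change key order or whitespace and break the comparison.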
Idempotency and Reliability
To prevent processing the same data twice, ensure your source_path is unique and deterministic for each batch of data.
- For time-based batches, include the timestamp in the source_path. Example: daily_invoices/2024-07-30.
- For event-driven workflows, use a unique identifier from the triggering event (e.g., a message ID or a transaction ID).
If you attempt to create a run with a source_path that has already been successfully processed for a given Flow Version, the Lume platform will reject the request with an InvalidRequestError, preventing duplicate pipeline executions.
To bypass this for a legitimate reprocessing, use the force_rerun=True parameter in your lume.run() call.
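A sketch of this pattern: derive the source_path deterministically from the batch date, and treat InvalidRequestError as "already processed." The names lume.run(), source_path, force_rerun, and InvalidRequestError come from this guide; the exception's import path and run()'s exact signature are assumptions.

```python
from datetime import date


def build_source_path(prefix: str, batch_date: date) -> str:
    """Deterministic source_path: the same batch date always yields the same path."""
    return f"{prefix}/{batch_date.isoformat()}"


def run_batch(batch_date: date, reprocess: bool = False) -> None:
    # Imported here so the pure helper above stays dependency-free; the
    # exception's import path is an assumption about the SDK's layout.
    import lume
    from lume import InvalidRequestError

    source_path = build_source_path("daily_invoices", batch_date)
    try:
        lume.run(source_path=source_path, force_rerun=reprocess)
    except InvalidRequestError:
        # The platform already processed this batch for this Flow Version;
        # a retried trigger can safely treat this as success.
        pass
```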
Monitoring and Alerting
Integrate Lume monitoring into your existing observability stack.
- Check the Final Status: After a run completes, always check the terminal status (SUCCEEDED, PARTIAL_FAILED, FAILED).
- Alert on Failures: Configure alerts in your orchestration tool (like Airflow’s on_failure_callback) or monitoring system to trigger when a run enters a FAILED or CRASHED state.
- Log Key Information: In your application logs, always include the run.id, flow_version, and source_path. This makes debugging much easier.
SDK Initialization
The SDK requires your Lume API key to authenticate. The recommended approach is to set the LUME_API_KEY environment variable.
The SDK will automatically detect this key. If you need to manage keys for multiple environments (e.g., development, staging, production), we recommend using a secret management tool like HashiCorp Vault, AWS Secrets Manager, or Doppler, and injecting the appropriate key as an environment variable at runtime.
For cases where environment variables are not feasible, you can initialize the SDK programmatically. See lume.init() for details.
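A small sketch of this setup: resolve the key from the environment and fail loudly when it is missing. LUME_API_KEY and lume.init() come from this guide; the api_key parameter name and the vault-fetching helper in the comment are hypothetical.

```python
import os


def resolve_api_key() -> str:
    """Prefer the LUME_API_KEY environment variable; fail loudly otherwise."""
    key = os.environ.get("LUME_API_KEY")
    if not key:
        raise RuntimeError(
            "LUME_API_KEY is not set; inject it from your secrets manager "
            "(Vault, AWS Secrets Manager, Doppler) at deploy time."
        )
    return key


# When environment variables are not feasible, pass the key explicitly.
# The api_key parameter name is an assumption about lume.init()'s signature:
#   import lume
#   lume.init(api_key=fetch_key_from_vault())  # fetch_key_from_vault is hypothetical
```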