Advanced Topics
Techniques for building complex, production-grade workflows with Lume
This guide covers advanced patterns and techniques for users building sophisticated, high-reliability data pipelines with the Lume Python SDK.
Controlling Run Behavior at Runtime
While most configuration lives in the Flow Version, you can override certain settings at runtime for specific, one-off tasks.
Forcing a Rerun (Idempotency Bypass)
By default, Lume prevents re-processing the same source_path for a given Flow Version to ensure idempotency. For disaster recovery or reprocessing corrected data, you can bypass this check. See the Idempotency and Reliability section for more details on how Lume prevents duplicate runs.
Use with caution: Forcing a rerun can lead to duplicate data in your target system if not managed carefully.
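The semantics of the check are easy to reason about in isolation. The sketch below reimplements them locally for illustration; the real SDK enforces deduplication on the server, and the start_run function, the force keyword, and the identifier formats used here are all assumed names, not confirmed SDK API.

```python
# Illustrative sketch of the idempotency check with a `force` bypass.
# `start_run`, `force`, and the identifier formats are assumptions;
# the real Lume SDK enforces this server-side.

_seen: set[tuple[str, str]] = set()

def start_run(flow_version: str, source_path: str, force: bool = False) -> str:
    """Return 'started' if a run is launched, 'skipped' if deduplicated."""
    key = (flow_version, source_path)
    if key in _seen and not force:
        return "skipped"   # same input already processed for this Flow Version
    _seen.add(key)
    return "started"

print(start_run("invoices:v3", "s3://bucket/a.csv"))              # started
print(start_run("invoices:v3", "s3://bucket/a.csv"))              # skipped
print(start_run("invoices:v3", "s3://bucket/a.csv", force=True))  # started (rerun)
```

The forced rerun succeeds even though the (flow_version, source_path) pair was seen before, which is exactly why downstream deduplication matters when you use it.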
Advanced Run Monitoring
For production systems, a simple time.sleep() loop is not ideal. A better approach is to poll with an exponential backoff strategy, which reduces network load and handles transient API issues gracefully.
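A minimal sketch of such a loop, assuming the terminal status names (SUCCEEDED, FAILED, PARTIAL_FAILED) and taking the status check as a callable so the loop stays agnostic to the exact SDK call:

```python
import random
import time

def wait_for_run(poll_status, timeout_s: float = 3600.0,
                 base_s: float = 1.0, max_s: float = 60.0) -> str:
    """Poll `poll_status()` with exponential backoff plus jitter until a
    terminal status is returned or `timeout_s` elapses.

    `poll_status` stands in for however the SDK refreshes a run's status;
    the terminal status names below are assumptions, not confirmed API.
    """
    terminal = {"SUCCEEDED", "FAILED", "PARTIAL_FAILED"}
    deadline = time.monotonic() + timeout_s
    delay = base_s
    while time.monotonic() < deadline:
        status = poll_status()
        if status in terminal:
            return status
        # Sleep, then double the delay up to max_s; the small random
        # jitter prevents many workers from polling in lockstep.
        time.sleep(delay + random.uniform(0, delay * 0.1))
        delay = min(delay * 2, max_s)
    raise TimeoutError("run did not reach a terminal status in time")
```

Doubling the delay with a cap bounds the worst-case request rate, while the jitter spreads polls from concurrent workers so they do not hammer the API at the same instant.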
Understanding Run Metadata
The run.metadata attribute contains a rich, structured object detailing the outcome of a completed run. It is essential for building robust monitoring, alerting, and automated downstream workflows.
For the complete, detailed schema of this object, see the LumeRun Class Reference.
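To show the kind of monitoring summary you might derive from it, here is a small sketch; the field names in the sample dict are assumptions for illustration only, so consult the LumeRun Class Reference for the real schema:

```python
# The keys in this sample are illustrative assumptions, not the
# documented run.metadata schema.
sample_metadata = {
    "records_total": 1000,
    "records_succeeded": 990,
    "records_failed": 10,
    "duration_seconds": 42.5,
}

def summarize(metadata: dict) -> str:
    """Produce a one-line failure summary suitable for an alert message."""
    failed = metadata["records_failed"]
    total = metadata["records_total"]
    rate = failed / total if total else 0.0
    return f"{failed}/{total} records failed ({rate:.1%})"

print(summarize(sample_metadata))  # 10/1000 records failed (1.0%)
```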
Handling Partial Failures
A PARTIAL_FAILED status is not necessarily an error condition. It means some data was processed successfully while some was rejected, which is a common outcome in production pipelines dealing with messy real-world data.
Your workflow should be designed to handle this state gracefully.
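One hedged sketch of such a branch, where the status strings and the rejected_records_path metadata field are assumed names rather than confirmed SDK API:

```python
# Status strings and the `rejected_records_path` field are assumptions
# for illustration; check the LumeRun Class Reference for real names.

def quarantine(path) -> None:
    """Placeholder: copy rejected records somewhere for investigation."""
    print(f"quarantined rejects at {path}")

def handle_run(status: str, metadata: dict) -> str:
    """Decide the downstream action for a completed run."""
    if status == "SUCCEEDED":
        return "load"          # ship all output downstream
    if status == "PARTIAL_FAILED":
        # Load the good records, set the rejects aside for review.
        quarantine(metadata.get("rejected_records_path"))
        return "load-partial"
    return "alert"             # hard failure: page someone
```

The key design choice is that PARTIAL_FAILED takes the "load what succeeded" path rather than the alerting path, so a handful of bad records never blocks the rest of the batch.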
This approach maximizes the value of successfully processed data while isolating problematic records for further investigation.