Imitation learning offers an efficient framework for robotic skill acquisition. However, current methods struggle to associate actions with the correct target object, generalize poorly, and have difficulty with long-horizon tasks that impose sequential constraints. To address these limitations, we propose Task-DP3, a goal-focused framework for sequential multi-object manipulation. It comprises a goal-conditioned point cloud sampler, which extracts target-centric point clouds from segmentation masks, and a perception-driven skill scheduler, which dynamically determines the task state and plans the skill sequence. Together, these components enable adaptive trajectory generation in response to environmental changes. In real-robot experiments, Task-DP3 achieves a 90% success rate on multi-object tasks with only 30 demonstrations, outperforming state-of-the-art methods. It also generalizes well to unseen clutter and backgrounds and excels at long-horizon tasks with strict order constraints, making it well suited to real-world diffusion-based imitation learning.
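To make the idea of a target-centric point cloud concrete, the following is a minimal sketch of how masked depth pixels could be back-projected and reduced to a fixed-size cloud. It is an illustration under stated assumptions, not the Task-DP3 implementation: the function name `sample_target_points`, the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the default of 1024 points, and the use of uniform random sampling are all hypothetical choices made for this example.

```python
import numpy as np


def sample_target_points(depth, mask, fx, fy, cx, cy, num_points=1024, seed=0):
    """Back-project masked depth pixels and sample a fixed-size point cloud.

    Hypothetical sketch: assumes a pinhole camera model and uniform random
    sampling; the sampler described in the abstract may differ.
    """
    # Keep only pixels inside the segmentation mask that have positive depth.
    valid = mask.astype(bool) & (depth > 0)
    v, u = np.nonzero(valid)  # v: pixel rows, u: pixel columns
    if v.size == 0:
        raise ValueError("mask selects no pixels with valid depth")

    # Pinhole back-projection into the camera frame.
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=1)

    # Sample with replacement only when the target has too few points.
    rng = np.random.default_rng(seed)
    replace = points.shape[0] < num_points
    idx = rng.choice(points.shape[0], size=num_points, replace=replace)
    return points[idx]
```

Because only pixels inside the mask are back-projected, distractor objects and background never enter the returned cloud, which is the property the abstract attributes to goal conditioning.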
When both a wooden cup and a plastic cup are present, the wooden cup is precisely placed onto the wooden saucer.
[Video grid: Task-DP3 (Ours), iDP3, and DP, each shown under Seen Background, Unseen Background, and Unseen Distractors.]
When both a toy duck and a toy chick are present, the yellow toy duck is grasped and immersed into water.
[Video grid: Task-DP3 (Ours), iDP3, and DP, each shown under Seen Background, Unseen Background, and Unseen Distractors.]
When three differently colored basketballs and a soccer ball are present, the soccer ball is placed into a cup.
[Video grid: Task-DP3 (Ours), iDP3, and DP, each shown under Seen Background, Unseen Background, and Unseen Distractors.]
When both a marker pen and a watercolor pen are present, the marker pen is inserted into a pen holder.
[Video grid: Task-DP3 (Ours), iDP3, and DP, each shown under Seen Background, Unseen Background, and Unseen Distractors.]
Three cups of different sizes are placed into a square box in descending order of size, followed by covering the box with its lid and fastening the leather buckle.
[Video grid: Task-DP3 (Ours), iDP3, and DP, each shown under Seen Background, Unseen Background, and Unseen Distractors.]
The cup holder is first placed on a tea tray, then a cup is positioned on the holder. A kettle is subsequently lifted to pour water into the cup, after which the kettle is placed back on the tray. Finally, the tea tray is pushed to the serving area.
[Video grid: Task-DP3 (Ours), iDP3, and DP, each shown under Seen Background, Unseen Background, and Unseen Distractors.]
Tea leaves are first poured into a teapot, followed by adding purified water into the teapot. Next, the lid is placed on the teapot, and finally the teapot is positioned on the tea stove.
[Video grid: Task-DP3 (Ours), iDP3, and DP, each shown under Seen Background, Unseen Background, and Unseen Distractors.]
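The strict step order in long-horizon tasks such as the tea-brewing sequence above can be illustrated with a small scheduling sketch. This is a simplified example under stated assumptions, not the scheduler used in Task-DP3: the `Skill` class, the `next_skill` function, and the boolean scene flags (for example `leaves_in_teapot`) are hypothetical names, and perception is reduced here to a dictionary of flags that some upstream module is assumed to provide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical scene description: perception is assumed to yield boolean flags.
SceneState = Dict[str, bool]


@dataclass
class Skill:
    name: str
    is_done: Callable[[SceneState], bool]  # completion check on the perceived scene


def next_skill(skills: List[Skill], state: SceneState) -> Optional[str]:
    """Return the first skill, in task order, whose completion check fails."""
    for skill in skills:
        if not skill.is_done(state):
            return skill.name
    return None  # every step is complete


# Ordered steps of the tea-brewing task; flag names are illustrative only.
TEA_BREWING = [
    Skill("pour_tea_leaves", lambda s: s.get("leaves_in_teapot", False)),
    Skill("add_water", lambda s: s.get("water_in_teapot", False)),
    Skill("place_lid", lambda s: s.get("lid_on_teapot", False)),
    Skill("move_teapot_to_stove", lambda s: s.get("teapot_on_stove", False)),
]
```

Re-evaluating `next_skill` on every new observation means that if an earlier step is undone, for instance the lid is knocked off after the teapot is already on the stove, the sketch returns to `place_lid` rather than continuing blindly.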