We focus on the task of goal-oriented grasping, in which a robot is supposed to grasp a pre-assigned goal object in clutter and needs some pre-grasp actions, such as pushes, to enable stable grasps. However, in this task, the robot gets positive rewards from the environment only when it successfully grasps the goal object. Besides, joint pushing and grasping elongates the action sequence, compounding the problem of reward delay. Thus, sample inefficiency...