Thought Preference Optimization