Open Source

PSA: Please stop using nohurry/Opus-4.6-Reasoning-3000x-filtered

A popular filtered copy of a 3K-prompt reasoning dataset on Hugging Face is obsolete and should no longer be used, its maintainer says.

Deep Dive

A viral post on the r/LocalLLaMA subreddit has drawn attention to a dataset that became popular by accident. Hugging Face user 'nohurry' has issued a public service announcement asking developers and researchers to stop using his filtered version of the 'Opus-4.6-Reasoning-3000x' dataset. The dataset, which contains 3,000 high-quality reasoning prompts, was originally created by user 'Crownelius' to help train and fine-tune language models. nohurry's version was a quick, temporary filter applied to Crownelius's initial release to remove model refusals, but it gained widespread traction on the Hugging Face platform.
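For readers unfamiliar with this kind of cleanup, the following is a minimal sketch of what a refusal filter over a reasoning dataset could look like. It is not nohurry's actual method, which the post does not describe. The field names (`prompt`, `response`), the marker phrases, and the sample rows are all hypothetical and chosen only for illustration.

```python
# Hypothetical phrases that often open a model refusal (assumption, not from the post).
REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot help with",
    "i'm sorry, but",
    "i am unable to",
)


def is_refusal(response: str, window: int = 200) -> bool:
    """Return True if a refusal marker appears near the start of the response."""
    head = response.strip().lower()[:window]
    return any(marker in head for marker in REFUSAL_MARKERS)


def filter_refusals(rows: list[dict]) -> list[dict]:
    """Keep only rows whose 'response' field does not look like a refusal."""
    return [row for row in rows if not is_refusal(row["response"])]


# Hypothetical sample rows; a real dataset may use different column names.
sample_rows = [
    {"prompt": "Prove that 2 + 2 = 4.", "response": "Step 1: Define addition on the naturals."},
    {"prompt": "Do something disallowed.", "response": "I'm sorry, but I can't help with that."},
]

kept_rows = filter_refusals(sample_rows)
print(len(kept_rows))
```

A keyword filter like this is deliberately crude: it can miss refusals that are phrased differently and can drop legitimate answers that happen to contain a marker phrase, which is one reason a quick filter of this kind is best treated as a stopgap.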

nohurry explains that Crownelius has since updated and filtered the original dataset himself, making the intermediary version obsolete. Despite this, nohurry's filtered dataset continues to be downloaded and used, diverting attention and potential support from the original creator. In his post, nohurry directs users to the canonical source and emphasizes the significant expense Crownelius incurred in creating the dataset, providing a Ko-fi link for donations. He is keeping his version online to avoid breaking existing project links but has updated the README to redirect users to the original. The request highlights the collaborative and ethical considerations involved in open-source AI development.

Key Points
  • nohurry asks users to stop using his filtered 'Opus-4.6-Reasoning-3000x' dataset on Hugging Face.
  • The filtered copy was a temporary fix; original creator 'Crownelius' has since released an updated, filtered version.
  • nohurry encourages users to switch sources and donate to Crownelius for his costly original work.

Why It Matters

The episode highlights the importance of proper attribution and of supporting original creators in the fast-moving open-source AI ecosystem.