Skywork-OR1 Models: Fine-tuned Distillations Rather Than Original Architecture, Community Points Out

BigGo Editorial Team

The recent release of Skywork-OR1 (Open Reasoner 1) models has sparked significant discussion in the AI community, particularly regarding how these models are being presented versus their actual development approach. While the models show impressive performance on mathematical and coding benchmarks, community members have raised concerns about transparency in how the models are described and marketed.

Fine-tuning vs. Original Architecture

The Skywork-OR1 series, which includes Skywork-OR1-Math-7B, Skywork-OR1-32B-Preview, and Skywork-OR1-7B-Preview, has been promoted for its strong performance on benchmarks like AIME24, AIME25, and LiveCodeBench. However, community members have highlighted that these models are fine-tuned versions of existing models rather than entirely new architectures - a fact mentioned only at the bottom of Skywork's announcement.

"Not to take away from their work but this shouldn't be buried at the bottom of the page - there's a gulf between completely new models and fine-tuning."

The models are built on top of DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Qwen-32B, which themselves are distilled versions of other models. This layered approach to model development has prompted discussions about naming conventions and transparency in the AI research community. Some commenters noted that other companies like Meta explicitly require derivative works to include the original model name (such as Llama) at the beginning of new model names.

The GitHub repository for the Skywork-OR1 models, showcasing their code and structure, relevant to the fine-tuning versus original architecture discussion

Benchmark Relevance Questioned

Another point of contention in the community discussion centers on the benchmarks used to evaluate the models. Some users questioned the relevance of the AIME24 scores, since the model was likely trained on that same dataset. One commenter pointed out that this is a universal problem in AI model evaluation, as most benchmark datasets eventually make their way into training data.

The significant drop in performance between AIME24 and AIME25 scores (for example, Skywork-OR1-Math-7B scoring 69.8 on AIME24 but only 52.3 on AIME25) seems to validate this concern, suggesting that the model performs better on data it has likely seen during training.

A line graph depicting the performance of Skywork-OR1-Math-7B on the AIME24 dataset, illustrating the concerns about benchmark relevance raised in the discussion

Local Model Performance Trade-offs

The discussion also touched on the broader topic of running AI models locally versus using cloud-based services. Community members shared their experiences with various local models, noting that while they can be faster for certain tasks, there's often a trade-off between speed, accuracy, and versatility.

For users with specific hardware constraints, such as limited GPU memory, choosing the right model becomes crucial. Several commenters observed that no local model "does everything kinda well" the way cloud-based services such as ChatGPT or Gemini do, but specialized models can excel at specific tasks like coding (with models like Qwen2.5-Coder-32B being recommended).

Open Source Commitment

Despite the concerns raised, the community has responded positively to Skywork's commitment to open-sourcing their work. The company has promised to release not only the model weights but also their training data and code, though as of the announcement, some of these resources were still listed as "Coming Soon."

This open approach could potentially address some of the transparency concerns raised by the community, allowing others to better understand how these models were developed and potentially build upon them further.

The Skywork-OR1 models represent an interesting case study in the evolving landscape of AI model development, where the lines between original research, distillation, and fine-tuning continue to blur. As these practices become more common, the AI community seems to be calling for clearer standards around how such work is presented and credited.

Reference: Skywork-OR1 (Open Reasoner 1)