Sensitive content

This media may contain sensitive content.

正在加载视频...

视频加载失败

Helpfulness ...

13,097 次观看 • 1 年前 •via X (Twitter)

5 条评论

Mr.கணேஷ் 的头像
Mr.கணேஷ்1 年前

அந்த மனசு தான் சார் கடவுள்

Solar Heavy 的头像
Solar Heavy1 年前

take the journey

Samynathan Sa 的头像
Samynathan Sa1 年前

Me & my friend

Sethuraman G 的头像
Sethuraman G1 年前

சமயத்லநானும்ங்க

RAMYA PANDIAN 的头像
RAMYA PANDIAN1 年前

இது என்ன படம் ரொம்ப நாளா கண்டுபிடிக்க முடியல....

相关视频

🔥 Battle for the top reasoning LLM intensifies! The QwQ-32B-Preview is a very good reasoning LLM. Full video of my tests here: Summary of my findings and thoughts: It was able to solve a couple of hard math problems so it looks very promising for maths. It didn’t do so well on my coding task (generating bash script). By the results reported on the LiveCodeBench it has room for improvement. One thing that’s become very clear to me is that the reasoning capabilities of these LLMs are significantly closing the gap between the open and closed-sourced models. The competition is now going to be on a different level and it's going to be focused on which model produces the most efficient, optimized, accurate, and fastest reasoning steps beyond just accurate responses. That's what developers will care about. Traditional benchmarks are not going to be good enough for this. On that note, it's getting harder to assess these models, especially the consistency, efficiency, and quality of reasoning steps. After experimenting with this model, I realized that the reasoning paths are not fully optimized and there is a lot more optimization that needs to happen before these models are used in production settings. There might be a need to build some type of native and efficient self-assessment or self-reflection capability that prevents these reasoning LLMs to go in loops or produce unnecessary lengthy sequences. I also noticed that this model, at least from the HF demo, doesn’t separate the reasoning from the response. I think that actually hurts the performance of the model. On the other hand, o1 and R1 do that really well. In addition to that, I believe the training on reasoning is hurting the performance of the LLM in other areas such as helpfulness (check the code example in the video). Something that’s necessary at the moment is validating or evaluating the quality of the reasoning chains and figuring out a better strategy to optimize them. Current methods are probably not sufficient to solve this problem but that's where innovation will comes next. I recognize that this is a first effort so kudos to the Qwen team on this release. These issues highlight the importance of transparency with reasoning LLMs. We need to know how it was trained and with exact data or optimization strategy. Understanding that will enable researchers and developers to build better intuition and improve the reasoning capabilities and components at a faster rate. There is an opportunity for someone or a company to build a truly open-reasoning LLM. The race is on! I will continue to track the state-of-the-art in reasoning LLMs and report my takes and observations here. Stay tuned for more.

elvis

14,740 次观看 • 1 年前