That model was trained in part using their unreleased R1 "thinking" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
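The command was presumably along these lines; the Q8_0 quantization tag is my assumption, picked because an 8B model at 8-bit quantization works out to roughly the 8.5GB mentioned below:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```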
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
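If you want to dig into those logs yourself, the llm CLI can locate the SQLite database for you; opening it in Datasette is my own suggestion here, not part of the original setup:

```bash
llm logs path                 # print the location of the logs.db SQLite file
datasette "$(llm logs path)"  # browse the logged prompts and responses
```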
I installed the plugin and ran it like this:
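Roughly like this, reusing the same Unsloth model reference (and assumed Q8_0 tag) from the Ollama step above:

```bash
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```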
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
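Presumably something like this; the Q3_K_M tag is my guess, chosen because a 70B model at that quantization lands in the right ballpark for a 34GB download:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```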
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
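For reference, running that prompt through LLM looks like this (I'm assuming the 70B distill from the update above as the target model):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```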
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try out DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
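Here's a sketch of that configuration; the model_id is just the name you'll type at the command line, while the api_base reflects DeepSeek's documented endpoint:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```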
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
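Putting those together:

```bash
llm keys set deepseek
# paste your DeepSeek API key when prompted
llm -m deepseek-reasoner 'prompt goes here'
```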
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.