Yi-Coder: A Small but Mighty LLM for Code
https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md
Claude 3.5 Sonnet still holds the LLM crown for code, which I'll use when I want to check the output of the best LLM. However, my Continue Dev, Aider and Claude Dev plugins are currently configured to use DeepSeek Coder V2 236B (and local Ollama DeepSeek Coder V2 for tab completions), as it offers the best value at $0.14/M input and $0.28/M output tokens, sitting just below Claude 3.5 Sonnet on Aider's leaderboard [1] whilst being 43x cheaper.
[1] https://aider.chat/docs/leaderboards/
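For anyone reproducing this setup, here's a minimal sketch of pointing Aider at DeepSeek (the LiteLLM-style model id follows Aider's docs at the time; treat it as a sketch and check the current model list):

```sh
# Aider reads the DeepSeek API key from the environment
export DEEPSEEK_API_KEY=your-key-here

# Select DeepSeek Coder as the main model (LiteLLM-style model id)
aider --model deepseek/deepseek-coder
```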
-- mythz Reply - DeepSeek sounds really good, but the terms/privacy policy look a bit sketchy (e.g. they grant themselves a full licence to use/reproduce inputs and outputs). Is there anywhere feasible to spin up the 236B model for a similarly cheap price in private?
The following quotes are from a reddit comment here: https://www.reddit.com/r/LocalLLaMA/comments/1dkgjqg/comment...
> under International Data Transfers (in the Privacy Policy):
""" The personal information we collect from you may be stored on a server located outside of the country where you live. We store the information we collect in secure servers located in the People's Republic of China . """> under How We Share Your Information > Our Corporate Group (in the Privacy Policy):
""" The Services are supported by certain entities within our corporate group. These entities process Information You Provide, and Automatically Collected Information for us, as necessary to provide certain functions, such as storage, content delivery, security, research and development, analytics, customer and technical support, and content moderation. """> under How We Use Your Information (in the Privacy Policy):
""" Carry out data analysis, research and investigations, and test the Services to ensure its stability and security; """> under 4.Intellectual Property (in the Terms):
""" 4.3 By using our Services, you hereby grant us an unconditional, irrevocable, non-exclusive, royalty-free, sublicensable, transferable, perpetual and worldwide licence, to the extent permitted by local law, to reproduce, use, modify your Inputs and Outputs in connection with the provision of the Services. """
-- dsp_person Reply - It's a 236B MoE model with only 21B active parameters, for which Ollama reports 258k downloads [1] (16B and 236B combined) whilst Hugging Face says it was downloaded 37k times last month [2]. It can run at 25 tok/s on a single M2 Ultra [3].
At $0.14/M input and $0.28/M output tokens it's a no-brainer to use their APIs. I understand some people have privacy concerns and would want to avoid their APIs, but I personally spend all my time contributing to publicly available OSS code bases, so I'm happy for any OSS LLM to train on any of our code bases to improve itself, hopefully also improving the generated code for anyone using our libraries.
Since many LLM orgs are looking to build proprietary moats around their LLMs to maintain their artificially high prices, I'll personally make an effort to use the best OSS LLMs available first (i.e. from DeepSeek, Meta, Qwen or Mistral AI) since they're bringing down the cost of LLMs and aiming to render the technology a commodity.
[1] https://ollama.com/library/deepseek-coder-v2
[2] https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-In...
[3] https://x.com/awnihannun/status/1814045712512090281
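As a quick sanity check on the 43x figure, assuming Claude 3.5 Sonnet's published $3/M input and $15/M output pricing (those Anthropic numbers are the only assumption here):

```python
# Per-million-token prices in USD
claude_in, claude_out = 3.00, 15.00      # Claude 3.5 Sonnet (assumed published pricing)
deepseek_in, deepseek_out = 0.14, 0.28   # DeepSeek Coder V2 API

# Naive blended ratio, weighting input and output equally
ratio = (claude_in + claude_out) / (deepseek_in + deepseek_out)
print(f"{ratio:.0f}x cheaper")  # -> 43x cheaper
```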
-- mythz Reply - There’s no company info on DeepSeek’s website. Looking at the above, and considering that, it seems very sketchy indeed.
Maybe OK for trying out stuff, but a big no-no for real work.
-- yumraj Reply - > There’s no company info on DeepSeek’s website.
It's backed solely by a hedge fund that doesn't want to draw attention to its business. So yeah, about as sketchy as DESRES.
-- rfoo Reply - Might be good for contributing to open source projects. But not for clients' projects.
-- dotancohen Reply - I'm making a small calendar renderer for e-ink screens (https://github.com/skorokithakis/calumny) which Claude basically wrote all of, so I figured I'd try DeepSeek. I had it add a small circle to the left of the "current day" line, which it added fine, but it couldn't solve the problem of the circle not being shown over another element. It tried and tried, to no avail, until I switched to Claude, which fixed the problem immediately.
43x cheaper is good, but my time is also worth money, and it unfortunately doesn't bode well for me that it's stumped by the first problem I throw at it.
-- stavros Reply - > Continue pretrained on 2.4 Trillion high-quality tokens over 52 major programming languages.
I'm still waiting for a model that's highly specialised for a single language only, and either a lot smaller than these jack-of-all-trades ones or VERY good at that specific language's nuances and libraries.
-- theshrike79 Reply - An unfortunate fact is that, much like a human with infinite time, LLMs usually perform better on your specific language when they are not limited to learning, or over-sampling, a single language. Not unlike the common saying that learning to code in Haskell makes you a better C++ programmer.
Of course, this is far from trivial: you don't just add more data and expect the model to automatically get better at everything. The same goes for time management for us mere mortals.
-- rfoo Reply - Unclear how much of their coding knowledge is in the space of syntax/semantics of a given language and how much is in the latent space that generalizes across languages and logic in general. If I were to guess, I'd say 80% is in the latter for the larger capable models. Even very small models (like in Karpathy's famous RNN blog post) will get syntax right, but that is superficial knowledge.
-- imjonse Reply - Yep, been waiting for the same thing. Maybe at some point it’ll be possible to use a large multilingual model to translate the dataset into one programming language, then train a new smaller model on just that language?
-- karagenit Reply - Isn't microsoft phi specifically trained for Python? I recall that Phi 1 was advertised as a Python coding helper.
It's a small model trained only by quality sources (ie textbooks).
-- terminalcommand Reply - I'd be interested to know if that trade-off ends up better. There's probably a lot of useful training that transfers well between languages, so I wouldn't be that surprised if the extra tokens helped across all languages. I would guess a top-quality single-language model would need to be very well supported, e.g. Python or JavaScript. Not, say, Clojure.
-- richardw Reply - If the LLM's training makes it generalize across languages, then it's better to leave it as it is...
-- wiz21c Reply - I wonder what those 52 languages are.
-- kamphey Reply - According to the repo README:
'java', 'markdown', 'python', 'php', 'javascript', 'c++', 'c#', 'c', 'typescript', 'html', 'go', 'java_server_pages', 'dart', 'objective-c', 'kotlin', 'tex', 'swift', 'ruby', 'sql', 'rust', 'css', 'yaml', 'matlab', 'lua', 'json', 'shell', 'visual_basic', 'scala', 'rmarkdown', 'pascal', 'fortran', 'haskell', 'assembly', 'perl', 'julia', 'cmake', 'groovy', 'ocaml', 'powershell', 'elixir', 'clojure', 'makefile', 'coffeescript', 'erlang', 'lisp', 'toml', 'batchfile', 'cobol', 'dockerfile', 'r', 'prolog', 'verilog'
https://github.com/01-ai/Yi-Coder
-- richardw Reply - They're playing a dangerous game if they assume that a single language or even family of similar languages is referred to by e.g. "assembly", "shell", "lisp".
(I also note that several of these are markup or config languages which are explicitly not for programming.)
-- Y_Y Reply - The difference between (A) software engineers reacting to AI models and systems for programming and (B) artists (whether it's painters, musicians or otherwise) reacting to AI models for generating images, music, etc. is very interesting.
I wonder what's the reason.
-- Palmik Reply - Is it really? I know people who love using LLMs, people who are allergic to the idea of even taking about AI usability and lots of others in between. Same with artists hating the idea, artists who spend hours crafting very specific things with SD, and many in between.
I'm not sure I can really point out a big difference here. Maybe the artists are more skewed towards not liking AI since they work with medium that's not digital in the first place, but the range of responses really feels close.
-- viraptor Reply - Coding assistants are not good enough (yet). Inline suggestions and chats are incredibly helpful and boost productivity (though only for those who know how to use them well), but that's as far as they go today.
If they can take a Jira ticket, debug the code, create a patch for a large codebase and understand and respect all the workarounds in a legacy codebase, I would have a problem with it.
-- rty32 Reply - But that's not far off. Sure, it currently isn't, but "reading a ticket with a description, finding the relevant code, understanding the code (often better than a human), testing it, and returning the result" is totally doable with a few more iterations.
It’s already doable for smaller projects, see GitHub workspaces etc.
-- mrklol Reply - Except they can't do the equivalent for art yet either, and I am fairly familiar with the state of image diffusion today.
I've commissioned tens of thousands of dollars in art, and spent thousands of hours working with Stable Diffusion, Midjourney, and Flux. What all the generators are missing is intentionality in art.
They can generate something that looks great at surface level, but doesn't make sense when you look at the details. Why is a particular character wearing a certain bracelet? Why do the windows on that cottage look a certain way? What does the engraving mean in this elaborate knife design I am generating?
The diffusers do not understand what they are generating, so they just generate what "looks right." Often this results in art that looks pretty but has no deeper logic, world-building, meaning, etc.
And of course, image generators cannot handle the client-artist relationship either (even LLMs cannot), because it requires an understanding of what the customer wants and what emotion they want to convey with the piece they're commissioning.
So - I rely on artists for art I care about, and image generators for throwaway work (like weekly D&D campaign images.)
-- xvector Reply - Because code either works or it doesn't. Nobody is replacing our entire income stream with an LLM.
You also need knowledge of code to instruct an LLM to generate decent code, and even then it's not always perfect.
Meanwhile plenty of people are using free/cheap image generation and calling it "good enough". Now they don't need to pay a graphic artist or buy a stock photo licence.
Any layperson can describe what they want a picture to look like, so the barrier to entry, and to a successful result, is a lot lower for LLM image generation than for LLM code generation.
-- suprjami Reply - It would be good if LLMs were somehow packaged in an easy way/format for us "novice" (ok I mean lazy) users to try them out.
I'm not so much interested in the response time (anyone have a couple of spare A100s?), but it would be good to be able to try out different LLMs locally.
-- NKosmatos Reply - With Mozilla's llamafile you can run LLMs locally without installing anything: https://github.com/Mozilla-Ocho/llamafile
-- PhilippGille Reply - One Docker command if you don't mind waiting minutes for CPU-bound replies:
You can also use several GPU options, but they are not as easy to get working.
-- suprjami Reply - LM Studio is pretty good: https://lmstudio.ai/
-- senko Reply - This is already possible. There are various tools online you can find and use.
-- nusl Reply - You should try GPT4all. It seems to be exactly what you’re asking for.
-- hosteur Reply - I'm new to this whole area and feeling a bit lost. How are people setting up these small LLMs like Yi-Coder locally for tab completion? Does it work natively in VS Code?
Also for the cloud models apart from GitHub Copilot, what tools or steps are you all using to get them working on your projects? Any tips or resources would be super helpful!
-- mtrovo Reply - You can run this LLM on Ollama [0] and then use Continue [1] on VS Code.
The setup is pretty simple:
* Install Ollama (instructions for your OS on their website - for macOS, `brew install ollama`)
* Download the model: `ollama pull yi-coder`
* Install and configure Continue on VS Code (https://docs.continue.dev/walkthroughs/llama3.1 <- this is for Llama 3.1 but it should work by replacing the relevant bits; see the config sketch below)
[0] https://ollama.com/
[1] https://www.continue.dev/
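A minimal sketch of the relevant `~/.continue/config.json` entries for an Ollama-served model (field names per Continue's docs at the time; verify against the current schema):

```json
{
  "models": [
    { "title": "Yi-Coder 9B", "provider": "ollama", "model": "yi-coder" }
  ],
  "tabAutocompleteModel": {
    "title": "Yi-Coder 9B",
    "provider": "ollama",
    "model": "yi-coder"
  }
}
```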
-- cassianoleal Reply - If you have a project which supports OpenAI API keys, you can point it at a LocalAI instance:
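For example, a minimal sketch with the official `openai` Python client, assuming a LocalAI instance listening on its default port 8080 and a model named `yi-coder` configured in it (both assumptions):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local instance instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:8080/v1",  # LocalAI's OpenAI-compatible endpoint
    api_key="sk-local",  # LocalAI doesn't validate keys by default
)

resp = client.chat.completions.create(
    model="yi-coder",  # must match a model name configured in LocalAI
    messages=[{"role": "user", "content": "Write a Python hello world."}],
)
print(resp.choices[0].message.content)
```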
This is easy to get "working" but difficult to configure for specific tasks, as the docs are lacking or contradictory.
-- suprjami Reply - Is there an LLM that's useful for Terraform? Something that understands HCL and has been trained on the providers, I imagine.
-- cassianoleal Reply - Weird that they're comparing it to the really old DeepSeek v1 models; even v2 has been out for a long time now.
-- smcleod Reply - My barely-informed guess is that they don't have the resources to run it (it's a 200b+ model).
-- bubblyworld Reply - [dead]
Beats DeepSeek 33B. That's impressive.
-- Havoc Reply - They used DeepSeek-Coder-33B-Instruct in comparisons, while DeepSeek-Coder-v2-Instruct (236B) and -Lite-Instruct (16B) are available since a while: https://github.com/deepseek-ai/DeepSeek-Coder-v2
EDIT: Granted, Yi-Coder 9B is still smaller than any of these.
-- tuukkah Reply - Are coding LLMs trained with the help of interpreters?
-- ziofill Reply - Google's Gemini does.
I can't find the post, but I remember Google publishing one just after all the ChatGPT SQL generation hype. It felt like they were trying to counter that hype by explaining that most complex LLM-generated code snippets won't actually run or work, and that they were putting a code-evaluation step after the LLM for Bard.
(A bit like: why did they never put an old-fashioned rules-based grammar checker stage after Google Translate results?)
Fast forward to today and it seems it's a normal step for Gemini etc https://ai.google.dev/gemini-api/docs/code-execution?lang=py...
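A minimal sketch of enabling that step via the linked API, using the `google-generativeai` Python SDK as documented at the time (SDK details may have changed since):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Enable the code-execution tool so the model can run Python on its own
# generated snippets and iterate on the results before answering
model = genai.GenerativeModel("gemini-1.5-flash", tools="code_execution")

resp = model.generate_content(
    "What is the sum of the first 50 prime numbers? "
    "Generate and run code for the calculation."
)
print(resp.text)
```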
-- willvarfar Reply - That's interesting! Where it says that it will "learn iteratively from the results until it arrives at a final output", I assume it's trying multiple LLM generations until it finds one that works, which I didn't know about before.
However, AFAIK that only ever happens at inference time; an interpreter isn't included during LLM training. I wonder if it would be possible to fine-tune a model for coding with an interpreter in the loop. Though if no one has done it yet, there is presumably a good reason why not.
-- redeyedtreefrog Reply - > Though if noone has done it yet there is presumably a good reason why not.
The field is vast and moving quickly, and there are more directions to explore than there are researchers at the top AI labs. There are lots of open doors that haven't been walked through yet, but that doesn't mean they're not worth it; it just hasn't been done yet.
-- littlestymaar Reply