Why do you use Llama models, and what support do you need to build on Llama?
I work on Meta's AI Partnerships team, and we're trying to understand why startups use Llama models. I was wondering if I could pick the community's brain on the following topics:
- What criteria led you to choose Llama for your use case?
- How important is fine-tuning small models for your strategy?
- What are the major pain points of using Llama models that Meta can streamline?
- What documentation would be most useful to help you build an open source AI stack with Llama?
I'm not a startup, but I want to use Llama and related models to transcribe historical handwritten documents and, if possible, to extract structured data that isn't directly visible in a word-for-word transcription (many of the documents are forms).
I've tried many different models, but vision models are overwhelmingly oriented towards pictures rather than writing, and results aren't good.
I was pleasantly surprised by the "OCR" results MiniCPM-V 2.6 gives on any kind of text, including handwriting, given an image and a trivial prompt. I'll be sure to keep an eye on this family of models.
It's no replacement for OCR of printed text, of course, since it sometimes generates random text, but it looked very useful for handwritten text and all kinds of decorative fonts (e.g. "inspirational posters"). I imagine something like the sketch below could work.
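For illustration, here's one way this could be wired up, using Ollama's Python client; treat it as a rough sketch rather than a tested pipeline, and note the model tag and image path are placeholders:

```python
# Rough sketch (untested assumption of one workable setup): prompting a local
# MiniCPM-V build through Ollama's Python client to transcribe a scanned page.
# The model tag and image path are placeholders.
import ollama

response = ollama.chat(
    model="minicpm-v",  # placeholder tag for a local MiniCPM-V model
    messages=[{
        "role": "user",
        "content": "Transcribe all text in this image verbatim, preserving line breaks.",
        "images": ["handwritten_form.png"],  # placeholder path to the scanned page
    }],
)
print(response["message"]["content"])
```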
Keep in mind, though, that MiniCPM-V can't identify pixel positions in the image the way Gemini Pro does here: https://simonwillison.net/2024/Aug/26/gemini-bounding-box-vi...

We use Llama models because, at the time of the decision (9 months ago?), the cost was significantly cheaper than closed-source models and comparable to other models available through Bedrock (we use them via AWS Bedrock).
A strong reason at the time was also the option to, at some point, take this on-prem/self-hosted for privacy. It's more a comfort blanket for some of our customers/partners and a future requirement than a right-now thing.
From a capability perspective it's everything we need, though we are not taxing it or pushing any boundaries... Our use cases are mainly processing/structuring/summarising incoming text, and we run a few agents doing a variety of stuff. We have a bit of technical jargon (motorsports engineering related), and we saw good performance with Llama over our previous use of Mistral/Claude/OpenAI.
We develop tooling for low-resource languages for millions of underserved people, and Llama and other resources from Meta have been vital for us.
We picked Llama among the models we use for its ubiquity in the ecosystem and because many practitioners are quite familiar with it and its performance profile, strengths, and limitations.
The major challenge we have is that, given the speculative nature of our work (some of our research is at the frontier), we have to conduct multiple fine-tuning experiments in parallel.
That has proven prohibitive cost- and compute-wise, which is bottlenecking our progress. An official fine-tuning platform from Meta, or some other form of support that alleviates the compute requirements, would help us fly and deliver concrete results much faster.
Are you considering opening a program for projects downstream of Llama?
Strengths of open-weight models:
* Privacy: Can use sensitive data, can be used in companies/industries that have strict regulations
* Static model: Can "pin" the model version, so you don't need to worry about the underlying model being changed while keeping the same name (a sketch of what pinning looks like follows below)
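A minimal sketch of pinning in practice, assuming Hugging Face transformers (the revision hash below is a placeholder, not a real commit):

```python
# Pinning an open-weight model to an exact repo snapshot, so the weights can't
# silently change under the same name. Assumes Hugging Face transformers; the
# revision hash is a placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    revision="0123abcd...",  # placeholder: pin to a specific repo commit hash
)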
Why I use Llama:
- Ability to self-host. This unlocks a few things: (1) a customized serving stack with various logit processors, etc. (see the sketch after this list), and (2) more cost-efficient inference.
- Ability to fine tune. Most stock instruct models are quite lame at AI story-writing and role-play and produce slop.
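To illustrate the logit-processor point, here's a minimal sketch assuming vLLM's SamplingParams hook; the model name and banned word list are placeholders, and the exact hook has moved around between vLLM versions:

```python
# Sketch: self-hosted Llama behind vLLM with a custom logits processor that
# bans a placeholder list of tokens. Assumes vLLM's (version-dependent)
# logits_processors hook on SamplingParams.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
tokenizer = llm.get_tokenizer()

# Token ids we never want sampled (placeholder word list).
banned_ids = [
    tid
    for word in ["slop"]
    for tid in tokenizer.encode(" " + word, add_special_tokens=False)
]

def ban_tokens(token_ids, logits):
    # Called at each decoding step: set banned token logits to -inf.
    logits[banned_ids] = float("-inf")
    return logits

params = SamplingParams(max_tokens=128, logits_processors=[ban_tokens])
out = llm.generate(["Write one sentence about motorsports."], params)
print(out[0].outputs[0].text)
```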
There aren't really any pain points specific to Llama, but if we are creating a wish list:
- Keep the pre-training data diverse. There is a worrying trend where some companies apply heavy-handed filtering to the pre-training data that's not just based on quality, but also on content. Quality-based filtering is understandable and desirable, but please, keep the pre-training dataset diverse :)
- Efficient inference. Open source is way behind closed source here. TensorRT-LLM is probably the most efficient from what's out there, but it's mostly closed source. Maybe Meta could contribute to some of the open source projects like vLLM (or maybe something lower level...).
- A lot of the recent improvements came from post-training, post-SFT methods. And it's not just the datasets (which clearly you can't just release), but also the algorithms -- most labs are quite secretive about the details here. The open-source community relies heavily on DPO (and, more recently, KTO) since it's easy, but empirically it's not that great. (A minimal sketch of the DPO recipe follows this list.)
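For reference, "relying on DPO" in the open-source stack looks roughly like this; a minimal sketch assuming Hugging Face TRL, with placeholder model, data, and hyperparameters (DPOTrainer's exact arguments vary by TRL version):

```python
# Sketch of the common open-source DPO recipe using Hugging Face TRL.
# Model name, preference triples, and hyperparameters are placeholders;
# TRL's DPOTrainer signature has shifted across versions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO trains on (prompt, chosen, rejected) preference triples.
train_dataset = Dataset.from_dict({
    "prompt": ["Write an opening line for a mystery story."],
    "chosen": ["The lighthouse had been dark for three nights when the boat appeared."],
    "rejected": ["Once upon a time there was a mystery. It was very mysterious."],
})

trainer = DPOTrainer(
    model=model,                 # a frozen reference model is created internally if omitted
    args=DPOConfig(output_dir="llama-dpo", beta=0.1, per_device_train_batch_size=1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # 'tokenizer=' in older TRL versions
)
trainer.train()
```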
Love to see the use cases the community has been sharing. Meta wants Llama to address some of the world's most pressing challenges, including improving education, environmental, and health care outcomes. This link: https://www.llama.com/llama-impact-grants/ includes a calendar of events (hackathons, workshops, trainings) around the world and links to register, and it would be great to see you apply!
On my side, the not-really-free open-weight license is what prevents me from using Llama for prototypes and small projects for companies.
The ability to run it within our own environment, so data doesn't transit to third parties in direct violation of various client agreements/desires.
The flexibility of Llama models is crucial for startups, in my opinion! Have you thought about taking a product-focused strategy for assistance?
What do you mean by this?
I'm the CTO of Ameelio, a non-profit tech startup that is taking on the very broken for-profit model for incarcerated communications. Incarcerated people are routinely price-gouged for call time, and the current model eliminates any competition and allows the sole provider to charge whatever it wants. It's not unusual to see someone pay a 25-cent connection fee and then 15 cents a minute after that. We have built basically a Zoom or Google Meet for the corrections space, and we give it freely to inmates and their family members. It's only a matter of time before the for-profit providers are forced to compete in a free(er) market or go out of business, and either way we'll be growing and expanding our service as much as possible.
We are using Llama for some small things now, but we have plans to increase its role in our stack, particularly as its capabilities increase. Here are some of the things we're doing:
1. A better "keyword" analysis: Many facilities use and want the ability to check for certain types of communications. For example, if someone is talking about committing suicide or other self-harm, or is discussing harming another person, some intervention could literally save their life. Historically, keyword analysis was about the best that could be done, but thanks to Llama we're able to get deeper insight. This is better for accuracy, reducing false positives, and it's also better for privacy because the improved accuracy results in less need for intrusion. (A sketch of this kind of classification follows after this list.)
2. Privacy Preserving analysis: The incarcerated space comes with constant invasions of a person's privacy, and while that is not going away, things can be improved. At this point I've met a lot of DoC people, and the majority of them have no interest in invading a person's privacy. Their concerns are about security and safety. This is an area where AI can be of great assistance, but only if the AI itself is privacy respecting. This is where Llama shines very bright for us. Because we can self-host it, we can be very confident about where the data is going and what it is used for.
3. Using it for organizational productivity: We have Open WebUI and Ollama deployed internally for our team, who are able to use Llama and other models to accomplish things, whether it be asking ChatGPT-style questions, getting coding help, or anything else. The ability to run different models with different strengths depending on the application is huge.
4. Using it for document analysis: For example, we frequently deal with massive documents containing regulations (such as FCC documents), contracts, and other long but sensitive things that Llama can greatly assist us in understanding. A great RAG setup is very helpful (though still somewhat challenging; that could be an area where Meta could help!). A minimal sketch of that kind of setup also follows below.
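To make item 1 concrete, here's roughly the shape of that kind of check; a simplified sketch rather than our production code, where the endpoint, model name, and JSON schema are placeholders for any OpenAI-compatible self-hosted server (e.g. vLLM or Ollama):

```python
# Sketch: LLM-based safety classification instead of brittle keyword matching.
# Talks to a self-hosted Llama through an OpenAI-compatible endpoint; the URL,
# model name, and schema are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def classify(message: str) -> dict:
    prompt = (
        "Classify the message for safety review. Respond with JSON only:\n"
        '{"self_harm_risk": true|false, "threat_to_others": true|false, "reason": "..."}\n\n'
        f"Message: {message}"
    )
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder self-hosted model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Assumes the model followed the JSON-only instruction; add retries in practice.
    return json.loads(resp.choices[0].message.content)

print(classify("I don't think I can keep going like this."))
```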
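And for item 4, a minimal RAG sketch under the same assumptions (placeholder endpoint, embedding model, and document chunks):

```python
# Sketch: retrieve the most relevant regulation chunks by cosine similarity,
# then ask a self-hosted Llama to answer from that context. Chunks, models,
# and the endpoint are placeholders.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

chunks = [  # in practice: split the FCC document into overlapping chunks
    "Providers must file tariff changes 30 days before they take effect.",
    "Connection fees for interstate audio calls are capped by rule.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def answer(question: str, k: int = 2) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(chunk_vecs @ q_vec)[::-1][:k]  # rank chunks by cosine similarity
    context = "\n".join(chunks[i] for i in top)
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder self-hosted model
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content

print(answer("How far in advance must tariff changes be filed?"))
```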
I know this can kick up a firestorm, but less "safe" models would be very helpful. Some of the things we need to understand better (like suicide in the example above) frequently trigger "safety" refusals, which ironically make everybody a lot less safe IRL. We've gotten better at working around these, but having to work around them in the first place feels immensely silly at best and deeply frustrating at worst. I understand the significant challenge you're facing here, and I know it's very difficult (impossible, IMHO) to find a perfect balance, but at least in my experience we're too far toward the "safe" side of things, and it would be great to move back toward a reasonable center. For what it's worth, Llama is in good company: compared to OpenAI, Llama's safety causes more friction, but compared to Gemini's ridiculous over-safety, Llama is a breath of fresh air :-)
In closing, thank you so much for Llama and for all the work you've done and continue to do! I believe open source is extremely important, and without Meta's generosity, I believe we would only have toys to play with. Meta's overall commitment to open source and its understanding and support of open systems are becoming increasingly vital, in my opinion. Being frank, for many years I had some pretty negative feelings about Meta/Facebook. However, that has really turned around, and I am very grateful that Meta is around!
Thank you for writing up the details of your impactful work. I hope Meta picks this up and gives you all tons of exposure.
Thank you so much for sharing this detailed writeup! I'll try to find you on LinkedIn!
This is the most interesting comment in the thread, with concrete technical details of how Llama is being used in a startup. I found it informative. Not sure why the downvotes, maybe they think the comment itself was written by Llama. :o
I don't use Llama, because we have no hardware to run it on.
We use Gemini Flash and recommend this for 90% of tasks.
Dirt cheap, blazingly fast.
1) Quality 2) COST 3) Speed
privacy, 'open', free?