Why do you use Llama models, and what support do you need to build on Llama?
I work on Meta's AI Partnerships team, and we're trying to understand why startups use Llama models. I was wondering if I could pick the community's brain on the following topics:
- What criteria led you to choose Llama for your use case?
- How important is fine-tuning small models for your strategy?
- What are the major pain points of using Llama models that Meta can streamline?
- What documentation would be most useful to help you build an open source AI stack with Llama?
I'm not a startup, but I want to use Llama and related models to transcribe historical handwritten documents and, if possible, to extract structured data that isn't directly visible in a word-for-word transcription (many of the documents are forms).
I've tried many different models, but vision models are overwhelmingly oriented towards pictures rather than writing, and results aren't good.
I was pleasantly surprised by the "OCR" results MiniCPM-V 2.6 gives on any kind of text, including handwriting, given an image and a trivial prompt. I'll be sure to keep an eye on this family of models.
It's no replacement for OCR of printed text, of course, since it sometimes generates random text, but it looked very useful for handwritten text and all kinds of decorative fonts (e.g. "inspirational posters"). I imagine something like the sketch below could work.
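For illustration, here's one way this could be wired up, using Ollama's Python client; treat it as a rough sketch rather than a tested pipeline, and note the model tag and image path are placeholders:

```python
# Rough sketch (untested assumption of one workable setup): prompting a local
# MiniCPM-V build through Ollama's Python client to transcribe a scanned page.
# The model tag and image path are placeholders.
import ollama

response = ollama.chat(
    model="minicpm-v",  # placeholder tag for a local MiniCPM-V model
    messages=[{
        "role": "user",
        "content": "Transcribe all text in this image verbatim, preserving line breaks.",
        "images": ["handwritten_form.png"],  # placeholder path to the scanned page
    }],
)
print(response["message"]["content"])
```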
Keep in mind, though, that MiniCPM-V can't identify pixel positions in the image the way Gemini Pro does here: https://simonwillison.net/2024/Aug/26/gemini-bounding-box-vi...

We use Llama models because, at the time of the decision (9 months ago?), the cost was significantly cheaper than closed-source models and comparable to other models available through Bedrock (we use them via AWS Bedrock).
A strong reason at the time was also the option to, at some point, take this on-prem/self-hosted for privacy. It's more a comfort blanket for some of our customers/partners and a future requirement than a right-now thing.
From a capability perspective it's everything we need, though we are not taxing it or pushing any boundaries... Our use cases are mainly processing/structuring/summarising incoming text, and we run a few agents doing a variety of stuff. We have a bit of technical jargon (motorsports engineering related), and we saw good performance with Llama over our previous use of Mistral/Claude/OpenAI.
We develop tooling for low-resource languages for millions of underserved people, and Llama and other resources from Meta have been vital for us.
We picked Llama among the models we use for its ubiquity in the ecosystem and because many practitioners are quite familiar with it and its performance profile, strengths, and limitations.
The major challenge we have is that, given the speculative nature of our work (some of our research is at the frontier), we have to conduct multiple fine-tuning experiments in parallel.
That has proven prohibitive cost- and compute-wise, which is bottlenecking our progress. An official fine-tuning platform from Meta, or some other form of support that alleviates the compute requirements, would help us fly and deliver concrete results much faster.
Are you considering opening a program for projects downstream of Llama?
Strengths of open-weight models:
* Privacy: Can use sensitive data, can be used in companies/industries that have strict regulations
* Static model: Can "pin" the model version, so you don't need to worry about the underlying model being changed while keeping the same name (a sketch of what pinning looks like follows below)
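A minimal sketch of pinning in practice, assuming Hugging Face transformers (the revision hash below is a placeholder, not a real commit):

```python
# Pinning an open-weight model to an exact repo snapshot, so the weights can't
# silently change under the same name. Assumes Hugging Face transformers; the
# revision hash is a placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    revision="0123abcd...",  # placeholder: pin to a specific repo commit hash
)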
Why I use Llama:
- Ability to self-host. This unlocks a few things: (1) a customized serving stack with various logit processors, etc. (see the sketch after this list), and (2) more cost-efficient inference.
- Ability to fine tune. Most stock instruct models are quite lame at AI story-writing and role-play and produce slop.
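To illustrate the logit-processor point, here's a minimal sketch assuming vLLM's SamplingParams hook; the model name and banned word list are placeholders, and the exact hook has moved around between vLLM versions:

```python
# Sketch: self-hosted Llama behind vLLM with a custom logits processor that
# bans a placeholder list of tokens. Assumes vLLM's (version-dependent)
# logits_processors hook on SamplingParams.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
tokenizer = llm.get_tokenizer()

# Token ids we never want sampled (placeholder word list).
banned_ids = [
    tid
    for word in ["slop"]
    for tid in tokenizer.encode(" " + word, add_special_tokens=False)
]

def ban_tokens(token_ids, logits):
    # Called at each decoding step: set banned token logits to -inf.
    logits[banned_ids] = float("-inf")
    return logits

params = SamplingParams(max_tokens=128, logits_processors=[ban_tokens])
out = llm.generate(["Write one sentence about motorsports."], params)
print(out[0].outputs[0].text)
```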
There aren't really any pain points specific to Llama, but if we are creating a wish list:
- Keep the pre-training data diverse. There is a worrying trend where some companies apply heavy-handed filtering to the pre-training data that's not just based on quality, but also on content. Quality-based filtering is understandable and desirable, but please, keep the pre-training dataset diverse :)
- Efficient inference. Open source is way behind closed source here. TensorRT-LLM is probably the most efficient from what's out there, but it's mostly closed source. Maybe Meta could contribute to some of the open source projects like vLLM (or maybe something lower level...).
- A lot of the recent improvements came from post-training, post-SFT methods. And it's not just the datasets (which clearly you can't just release), but also the algorithms -- most labs are quite secretive about the details here. The open-source community relies heavily on DPO (and, more recently, KTO) since it's easy, but empirically it's not that great. (A minimal sketch of the DPO recipe follows this list.)
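For reference, "relying on DPO" in the open-source stack looks roughly like this; a minimal sketch assuming Hugging Face TRL, with placeholder model, data, and hyperparameters (DPOTrainer's exact arguments vary by TRL version):

```python
# Sketch of the common open-source DPO recipe using Hugging Face TRL.
# Model name, preference triples, and hyperparameters are placeholders;
# TRL's DPOTrainer signature has shifted across versions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO trains on (prompt, chosen, rejected) preference triples.
train_dataset = Dataset.from_dict({
    "prompt": ["Write an opening line for a mystery story."],
    "chosen": ["The lighthouse had been dark for three nights when the boat appeared."],
    "rejected": ["Once upon a time there was a mystery. It was very mysterious."],
})

trainer = DPOTrainer(
    model=model,                 # a frozen reference model is created internally if omitted
    args=DPOConfig(output_dir="llama-dpo", beta=0.1, per_device_train_batch_size=1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # 'tokenizer=' in older TRL versions
)
trainer.train()
```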
Love to see the use cases the community has been sharing. Meta wants Llama to address some of the world's most pressing challenges, including improving education, environmental, and health care outcomes. This link: https://www.llama.com/llama-impact-grants/ includes a calendar of events (hackathons, workshops, trainings) around the world and links to register, and it would be great to see you apply!
On my side, the not-really-free open-weight license is what prevents me from using Llama for prototypes and small projects for companies.
The ability to run it within our own environment, so data doesn't transit to third parties in direct violation of various client agreements/desires.
The flexibility of Llama models is crucial for startups, in my opinion! Have you thought about taking a product-focused strategy for assistance?
What do you mean by this?
I'm the CTO of Ameelio, a non-profit tech startup that is taking on the very broken for-profit model for incarcerated communications. Incarcerated people are routinely price-gouged for call time, and the current model eliminates any competition and allows the sole provider to charge whatever it wants. It's not unusual to see someone pay a 25-cent connection fee and then 15 cents a minute after that. We have built basically a Zoom or Google Meet for the corrections space, and we give it freely to inmates and their family members. It's only a matter of time before the for-profit providers are forced to compete in a free(er) market or go out of business, and either way we'll be growing and expanding our service as much as possible.
We are using Llama for some small things now, but we have plans to increase its role in our stack, particularly as its capabilities increase. Here are some of the things we're doing:
1. A better "keyword" analysis: Many facilities use and want the ability to check for certain types of communications. For example, if someone is talking about committing suicide or other self-harm, or is discussing harming another person, some intervention could literally save their life. Historically, keyword analysis was about the best that could be done, but thanks to Llama we're able to get deeper insight. This is better for accuracy, reducing false positives, and it's also better for privacy because the improved accuracy results in less need for intrusion. (A sketch of this kind of classification follows after this list.)
2. Privacy Preserving analysis: The incarcerated space comes with constant invasions of a person's privacy, and while that is not going away, things can be improved. At this point I've met a lot of DoC people, and the majority of them have no interest in invading a person's privacy. Their concerns are about security and safety. This is an area where AI can be of great assistance, but only if the AI itself is privacy respecting. This is where Llama shines very bright for us. Because we can self-host it, we can be very confident about where the data is going and what it is used for.
3. Using it for organizational productivity: We have Open WebUI and Ollama deployed internally for our team, who are able to use Llama and other models to accomplish things, whether it be asking ChatGPT-style questions, getting coding help, or anything else. The ability to run different models with different strengths depending on the application is huge.
4. Using it for document analysis: For example, we frequently deal with massive documents containing regulations (such as FCC documents), contracts, and other long but sensitive things that Llama can greatly assist us in understanding. A great RAG setup is very helpful (though still somewhat challenging; that could be an area where Meta could help!). A minimal sketch of that kind of setup also follows below.
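To make item 1 concrete, here's roughly the shape of that kind of check; a simplified sketch rather than our production code, where the endpoint, model name, and JSON schema are placeholders for any OpenAI-compatible self-hosted server (e.g. vLLM or Ollama):

```python
# Sketch: LLM-based safety classification instead of brittle keyword matching.
# Talks to a self-hosted Llama through an OpenAI-compatible endpoint; the URL,
# model name, and schema are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def classify(message: str) -> dict:
    prompt = (
        "Classify the message for safety review. Respond with JSON only:\n"
        '{"self_harm_risk": true|false, "threat_to_others": true|false, "reason": "..."}\n\n'
        f"Message: {message}"
    )
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder self-hosted model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Assumes the model followed the JSON-only instruction; add retries in practice.
    return json.loads(resp.choices[0].message.content)

print(classify("I don't think I can keep going like this."))
```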
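And for item 4, a minimal RAG sketch under the same assumptions (placeholder endpoint, embedding model, and document chunks):

```python
# Sketch: retrieve the most relevant regulation chunks by cosine similarity,
# then ask a self-hosted Llama to answer from that context. Chunks, models,
# and the endpoint are placeholders.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

chunks = [  # in practice: split the FCC document into overlapping chunks
    "Providers must file tariff changes 30 days before they take effect.",
    "Connection fees for interstate audio calls are capped by rule.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def answer(question: str, k: int = 2) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(chunk_vecs @ q_vec)[::-1][:k]  # rank chunks by cosine similarity
    context = "\n".join(chunks[i] for i in top)
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder self-hosted model
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content

print(answer("How far in advance must tariff changes be filed?"))
```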
I know this can kick up a firestorm, but less "safe" models would be very helpful. Some of the things we need to understand better (like suicide in the example above) frequently trigger "safety" refusals, which ironically make everybody a lot less safe IRL. We've gotten better at working around these, but having to work around them in the first place feels immensely silly at best and deeply frustrating at worst. I understand the significant challenge you're facing here, and I know it's very difficult (impossible, IMHO) to find a perfect balance, but at least in my experience we're too far toward the "safe" side of things, and it would be great to move back toward a reasonable center. For what it's worth, Llama is in good company: compared to OpenAI, Llama's safety causes more friction, but compared to Gemini's ridiculous over-safety, Llama is a breath of fresh air :-)
In closing, thank you so much for Llama and for all the work you've done and continue to do! I believe open source is extremely important, and without Meta's generosity, I believe we would only have toys to play with. Meta's overall commitment to open source and its understanding and support of open systems are becoming increasingly vital, in my opinion. Being frank, for many years I had some pretty negative feelings about Meta/Facebook. However, that has really turned around, and I am very grateful that Meta is around!
Thank you for writing up the details of your impactful work. I hope Meta picks this up and gives you all tons of exposure.
Thank you so much for sharing this detailed writeup! I'll try to find you on LinkedIn!
This is the most interesting comment in the thread, with concrete technical details of how Llama is being used in a startup. I found it informative. Not sure why the downvotes, maybe they think the comment itself was written by Llama. :o
I don't use Llama, because we have no hardware to run it on.
We use Gemini Flash and recommend this for 90% of tasks.
Dirt cheap, blazingly fast.
1) Quality 2) COST 3) Speed
privacy, 'open', free?