I've heard that in the infra serving ChatGPT to the public (so not counting what's used to train the models), roughly one graphics card dies every 90 seconds on average...
@fasterthanlime my issue with LLMs is not really whether they are relevant or not, but how they so obviously included data they don't own to train their models.
I remember reading that AI companies were currently undercharging for models and would pull the rug later. Experimenting with local models, which are much smaller but still useful, has changed my mind about that.
I guess we’ll see over time, but we are carrying in our pockets an amount of computing power that was completely unfathomable at some point in the past, so… we’ll see.
One thing that hasn’t changed is how polarized people are about LLMs. Everyone is entitled to their opinions, but it is changing things even if you don’t personally use them, so I would recommend getting some first-hand experience to learn more about what they can and cannot do.
I feel like a lot of people keep seeing pathological cases with badly written prompts, or tasks that are a poor fit for LLMs, and end up dismissing them as "completely useless". That's not true!
I feel getting what you want out of an LLM is a good exercise for software engineering types in general.
Coming up with a prompt that is unambiguous and contains all the relevant context is an incredibly valuable skill that obviously translates to human collaboration.
I’ve gotten good results by carefully drafting prompts in the Notes app before submitting them to the model.
If you know where you’re going, use technology with safeguards (e.g. Rust), and the task fits in the context window, then you get to operate one level of abstraction higher.
It’s gotten me excited about working on some systems again. It used to be complete drudgery, but large language models are excellent at taking care of boilerplate for you.
They end up plugging the developer-experience holes in a lot of technologies.
I think LLMs make “loose” languages less appealing than before. Nowadays, I would rather instruct a model on which Rust code to write for me, and end up with a very fast solution, than hack some Python/JS myself and pay the performance tax (+ maintenance burden) forever.
Even hallucination in the context of software development is not necessarily a bad thing? When an LLM tries to use an API that doesn’t exist, it’s often a sign that it should exist!
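A hypothetical illustration of that point: a model might suggest something like `v.remove_item(&3)`, which doesn't exist on stable `Vec`; that's a hint the operation is common enough that a small helper (sketched here under that assumption) is worth writing yourself.

```rust
// remove_item: the kind of helper an LLM "hallucinates" into existence.
// Removes the first element equal to `item`, returning it if found.
fn remove_item<T: PartialEq>(v: &mut Vec<T>, item: &T) -> Option<T> {
    let pos = v.iter().position(|x| x == item)?;
    Some(v.remove(pos))
}

fn main() {
    let mut v = vec![1, 2, 3, 4];
    assert_eq!(remove_item(&mut v, &3), Some(3));
    assert_eq!(v, vec![1, 2, 4]);
    assert_eq!(remove_item(&mut v, &99), None); // nothing to remove
}
```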
Think “they didn’t know it was impossible, so they just did it”: the junior-dev perspective.
I’m just tired of reading that they are either completely useless or going to replace us all.
They’ve become a really useful tool; in my opinion, there’s never been a better time for small teams to compete with larger companies. And it’ll drive the rebirth of bespoke software!
I think we’re still not collectively over the “well if we don’t need to go pump water out of the well ourselves anymore, then what does it even mean to be human??” moment, but we’ll get there.
@fasterthanlime yes, however, we're also very much at end-stage capitalism, where investors call absolutely all of the shots, even if at first you don't think they do, or are shielded enough to think they don't.
@fasterthanlime the deterministic-seed part is missing from the prompt, I think? And I don't understand how this prompt led to testing with the wrong number of ranges, clearing the cache, and retesting...
Either way, it's impressive indeed, but also underwhelming at the same time, at least to me. I think LLMs can be good for exploratory work, but I'd hesitate to commit anything generated.
@fasterthanlime any advice on phrasing prompts? I’ve had good luck with boilerplate code, but that’s the kind of stuff a good IDE does anyway; when writing Jest tests, it happily hallucinates nonsense and I spend more time fixing than if I’d just written it myself. Is the issue that I’m just using GitHub Copilot and not a better local LLM?
@fasterthanlime I'd love for them to be a useful tool; but I struggle, particularly when it comes to an LLM producing code, to overcome my innate fear that they are a licence-violating nightmare box.
Is there a model which has been exclusively trained on content with known licences and which is capable of telling you if it has just regurgitated, wholesale, someone else's GPL code into your MIT/Apache codebase?
@fasterthanlime now I can see how a smart autocomplete is useful in this scenario - and for all kinds of cases where a human needs some help producing new words. As long as the human knows enough to know what is right and what is wrong.
But I am tired of people telling me it’s going to revolutionise their business, I’m tired of companies using LLMs instead of paying humans, and I’m tired of companies stealing work to build the models they intend to make massive profits with.
@fasterthanlime I've become less of a fan of LLMs by the day. The fact that I often have to be as specific in my descriptions as I would have been just writing code (without the guarantee that things won't backfire in an unforeseen manner) just makes me see the point less and less. The only thing I still really use them for is to trudge through the mess that other LLMs have made of search results. Oh, and finding mistakes like typos! They're really good at that. How does your experience compare?
@fasterthanlime This is a great thread and you're spot-on with it. Thanks. In particular, more people should get more hands-on with LLMs before opining.
The comment that threw me for a loop was that you mentioned your own name to GPT-4o to get it to do something? Like your prompt was "port code from X to Y like Amos would"? And that worked better than not mentioning your name? This is a new twist for me!
@fasterthanlime I'm finding this a really interesting point, one that I've also explored a bit in the past. To make LLMs be more effective assistants when doing programming work, it is really useful to have 1) thorough typing and 2) great test coverage in your codebase. At that point you can just blindly accept suggestions and trust that something will complain if it breaks.
So maybe enabling the LLMs to be effective also pushes up the maintainability of your code?
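A minimal sketch of that guardrail idea, using a hypothetical `parse_port` function: with a precise type signature and a test already in place, an LLM-suggested rewrite either compiles and passes, or gets caught.

```rust
// Strong typing narrows what a suggestion can even be: the signature
// promises Option<u16>, so nonsense return values won't compile.
fn parse_port(s: &str) -> Option<u16> {
    s.trim().parse().ok().filter(|&p| p != 0)
}

#[cfg(test)]
mod tests {
    use super::*;

    // The test coverage is what lets you accept suggestions with confidence:
    // any rewrite that changes behavior fails here.
    #[test]
    fn rejects_garbage_and_edge_cases() {
        assert_eq!(parse_port("8080"), Some(8080));
        assert_eq!(parse_port(" 443 "), Some(443));
        assert_eq!(parse_port("0"), None);
        assert_eq!(parse_port("not a port"), None);
        assert_eq!(parse_port("70000"), None); // out of u16 range
    }
}
```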
@fasterthanlime this is the frustrating thing - the “right thing to do” is to go and fix the technologies to not have those gaps, not paper over them in an increasingly tall tower of workarounds-built-on-workarounds.
@__head__ Like another answer mentioned, it's delicate when you lack the requisite skills to validate whether a solution makes sense or not, but I would recommend using chat interfaces to even know what to search for.
@fasterthanlime I consider talking with an LLM for debugging like "Rubber Duck Debugging," but with a duck that answers back. Because sometimes I realize the problem while writing the prompt. 😆
@fasterthanlime "a prompt that is unambiguous and contains all the relevant context" we used to just feed those to compilers and interpreters and call them "code"
@fasterthanlime I mean, C can hardly be called unambiguous since no two compilers can seem to agree on how it is supposed to be interpreted. Amusingly enough, that exact problem is even worse with LLMs since they have to deal with natural languages being even more ambiguous by, well, nature. And if your solution is to come up with a subset of English that is not ambiguous... congratulations, you have basically invented Ada
@fasterthanlime I'm not sure about this. I think most uses of language modeling in production (e.g. sentiment analysis, zero-shot classification, machine translation, summarization, question answering) usually come from interfaces other than text generation, while most LLM services and in fact the entire notion of "prompt engineering" is predicated on treating the LLM as a text-generating black box
@fasterthanlime peering into the black box, even just seeing how certain semantic ideas map onto embeddings, is also very important for language model consumers trying to build worthwhile applications with the technology
@fasterthanlime I see a lot more of the, accurate, "LLMs are both convincing and prone to making 'mistakes' that are hard to spot because they're convincing, which is a problem if you're not an expert and are relying on them". In some cases this does make them actively bad, relative to other means to getting information.
@fasterthanlime I think the problem is that because investors all have a stake in AI, they do all sorts of marketing to make people believe that it's much better than it really is, or that it will be really good in the future. Since there is already optimism pushed by monetary incentives, I think some people need to be critical to balance that out. Critical people get much less (or no) media support to raise their voices. That's why I am leaning on the critical side.
@fasterthanlime How thoroughly have you tested them? I've tried maybe half a dozen smaller (sub-9 GB) models, and my conclusion is that for general knowledge they're the worst of all worlds — they still sound plausible, but their odds of getting anything correct are abysmal. I suppose for writing tools or code autocomplete they can be decent, but for a “conversational assistant” my hopes hit rock bottom.
The next step up seems to be ~30 GB, but I don't have the resources to run that locally atm.
@fasterthanlime I'm hopeful that Apple's work on models that don't have to fit entirely in RAM will spill over to others, and give us large-ish models that I can run within the next several months.