Depends on what you want to do. But my 2 cents are that like all new technology, LLMs will become a commodity. Which means that everybody uses them but few people are able to develop them from scratch. It's no different from other things like databases, GPU drivers, 3D engines for games, etc. All of that involves a lot of hardcore computer science and math, but lots of people use these things without needing those skills.
It probably helps a little to understand some of the internals and math. Just to get a feel for what the limitations are.
But your job as a software engineer is probably to stick things together and bang on them until they work. I sometimes describe what I do as being a glorified plumber. It requires skills but surprisingly few skills related to math and algorithms. That stuff comes in library form mostly.
So, get good at using LLMs and integrating what they do into agentic systems. Figure out APIs, limitations, and learn about different use cases. Because we'll all be doing a lot of work related to that in the next few years.
1. Learn basic NNs at a simple level: build from scratch (no frameworks) a feed-forward neural network with backpropagation and train it against MNIST or something similarly simple. Understand every part of it. Just use your favorite programming language (a rough sketch follows this list).
2. Learn (without having to implement the code, or to understand the finer parts of the implementations) how the NN architectures work and why they work. What is an encoder-decoder? Why does the first part produce an embedding? How does a transformer work? What are the logits in the output of an LLM, and how does sampling work (see the sampling snippet after this list)? Why is attention quadratic? What is Reinforcement Learning? What are ResNets, and how do they work? Basically: you need a solid qualitative understanding of all of that.
3. Learn the higher-level layer, both from the POV of the open source models (how to interface with llama.cpp / ollama / ..., how to set the context window, what quantization is and how it affects performance and output quality; a sample call is sketched after this list) and from the POV of popular provider APIs like DeepSeek, OpenAI, Anthropic, ..., and which model is good for what.
4. Learn prompt engineering techniques that influence the quality of the output when using LLMs programmatically (as a bag of algorithms). This takes patience and practice.
5. Learn how to use AI effectively for coding. This is absolutely non-trivial, and a lot of good programmers are terrible LLM users (and end up believing LLMs are not useful for coding).
6. Don't get trapped into the idea that the news of the day (RAG, MCP, ...) is where you should spend all your energy. This is just some useful technology surrounded by a lot of hype from people who want to get rich with AI and understand they can't compete with the LLMs themselves, so they pump up the part that can be kinda "productized". Never forget that the product is the neural network itself, for the most part.
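For point 1, here's roughly the scale of thing I mean. A minimal sketch (mine, not gospel) of a one-hidden-layer network trained with backprop; it uses numpy only for the matrix math, and the data is a random stand-in, so swap in real MNIST arrays yourself:

    # One-hidden-layer network with manual backprop, trained on toy data.
    # Plug in real MNIST arrays (784-dim inputs, 10 classes) to make it real.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 784))         # stand-in "images"
    y = rng.integers(0, 10, size=256)
    Y = np.eye(10)[y]                           # one-hot targets

    W1 = rng.standard_normal((784, 64)) * 0.01  # layers: 784 -> 64 -> 10
    b1 = np.zeros(64)
    W2 = rng.standard_normal((64, 10)) * 0.01
    b2 = np.zeros(10)
    lr = 0.5

    for epoch in range(50):
        # Forward pass
        h_pre = X @ W1 + b1
        h = np.maximum(h_pre, 0.0)                        # ReLU
        logits = h @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)       # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        loss = -np.mean(np.sum(Y * np.log(probs + 1e-9), axis=1))

        # Backward pass (softmax + cross-entropy has a clean gradient)
        d_logits = (probs - Y) / len(X)
        dW2 = h.T @ d_logits
        db2 = d_logits.sum(axis=0)
        d_h_pre = (d_logits @ W2.T) * (h_pre > 0)         # ReLU derivative
        dW1 = X.T @ d_h_pre
        db1 = d_h_pre.sum(axis=0)

        # Plain gradient descent
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print("final loss:", loss)

If you can explain every line of that, you have the core of point 1.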
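For point 2, the logits/sampling question in particular is easier to grasp from a few lines of code than from prose. This is only an illustrative sketch, not how any particular inference stack is written:

    # The model emits one logit per vocabulary token. Temperature rescales
    # them before softmax, then the next token is drawn from the distribution.
    import numpy as np

    def sample_next_token(logits, temperature=0.8, rng=np.random.default_rng()):
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
        scaled -= scaled.max()                      # numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return rng.choice(len(probs), p=probs)      # index of the chosen token

    # Pretend vocabulary of 5 tokens; a higher logit means "more likely".
    print(sample_next_token([2.0, 1.0, 0.1, -1.0, -3.0]))

Greedy decoding, top-k, top-p and so on are just different ways of turning those same logits into a choice.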
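For point 3, the local-model side is mostly plumbing. Here is a hedged example against ollama's REST API (the endpoint and the num_ctx option are as documented at the time of writing; check the current docs), showing where the context window setting actually lives:

    # Minimal call to a locally running ollama server (default port 11434).
    # Assumes you've already pulled a model, e.g. `ollama pull llama3`.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",             # any model you have pulled locally
            "prompt": "Explain quantization in one sentence.",
            "stream": False,               # return one JSON object, not a stream
            "options": {"num_ctx": 8192},  # context window, in tokens
        },
        timeout=120,
    )
    print(resp.json()["response"])

The provider APIs (OpenAI, Anthropic, DeepSeek, ...) are the same idea with different request shapes, auth, and pricing.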
Agreed with most of this except the last point. You are never going to make a foundational model, although you may contribute to one. Those foundational models are the product, yes, but if I could use an analogy: foundational models are like the state of the art 3D renderers in games. You still need to build the game. Some 3D renderers are used/licensed for many games.
Even the basic chat UI is a structure built around a foundational model; the model itself has no capability to maintain a chat thread. The model takes context and outputs a response, every time.
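A toy version makes this concrete. The loop below is essentially the whole trick behind a "chat": the application appends each message to a list and re-sends the entire list on every call. (Shown with the OpenAI Python SDK purely as an example; the model name is arbitrary.)

    # The model is stateless; the thread exists only in this list.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    while True:
        history.append({"role": "user", "content": input("> ")})
        resp = client.chat.completions.create(
            model="gpt-4o-mini",   # example model name
            messages=history,      # the full conversation, every single time
        )
        answer = resp.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        print(answer)

Everything we call "memory" or "context management" is some variation on deciding what goes into that list.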
For more complex processes, you need to carefully curate what context to give the model and when. There are many applications where you can say "oh, chatgpt can analyze your business data and tell you how to optimize different processes", but good luck actually doing that. That requires complex prompts and sequences of LLM calls (or other ML models), mixed with well-defined tools that enable the AI to return a useful result.
This forms the basis of AI engineering - which is different from developing AI models - and this is what most software engineers will be doing in the next 5-10 years. This isn't some kind of hype that will die down as soon as the money gets spent, a la crypto. People will create agents that automate many processes, even within software development itself. This kind of utility is a no-brainer for anyone running a business, and hits deeply in consumer markets as well. Much of what OpenAI is currently working on is building agents around their own models to break into consumer markets.
I agree that instrumenting the model is useful in many contexts, but I don't believe it is something so unique that it justifies Cursor's valuation, or all the attention it gets. If people say LLMs are going to become commodities (we will see), imagine the layer on top: RAG, tool usage, memory...
I see this a lot, but I think it's irrelevant. Even if this is a bubble, and even if (when?) it bursts, the underlying tech is not going anywhere. Just like the last dotcom bubble gave us FAANG+, so will this give us the next letters. Sure, agentsdotcom or flowsdotcom or ragdotcom might fail (likely IMO), but the stack is here to stay, and it's only gonna get better, cheaper, more integrated.
What is becoming increasingly clear, IMO, is that you have to spend some time with this. Prompting an LLM is like the old google-fu. You need to gain experience with it, to make the most out of it. Same with coding stacks. There are plenty of ways to use what's available now, as "tools". Play around, see what they can do for you now, see where it might lead. You don't need to buy into the hype, and some skepticism is warranted, but you shouldn't ignore the entire field either.
I come from a more traditional (PhD) ML/DL background. I wouldn't recommend getting into (1) because the field is incredibly saturated. We have hundreds of new, mostly low quality, papers each day. If you want to get into AI/ML on a more fundamental level now is probably the worst time in terms of competition. There are probably 100x more people in this field than there are jobs, and most of them have a stronger background than you if you are just starting out.
Looks like OP's curiosity isn't just about deep-diving LLMs; he's probably itching to dig into adjacent topics like RAG, AI pipelines, and all the other LLM rabbit holes.
I just wanted to second the previous comment, and this is even for adjacent fields. Also a PhD AI/ML grad, and so many of us are out of work at the moment that we'll happily settle for prompt engineering roles, let alone RAG etc., just to maintain appearances on CVs/eligibility for possible future roles.
I’d recommend you simply follow your curiosity and not take this choice too seriously. If you’re simply doing this for career purposes, then the honest answer is that absolutely no one knows where these fields will go in the next couple years so I wouldn’t take anyone’s advice too seriously.
But as for my 2 cents, knowing machine learning has been valuable to me, but not anywhere near as valuable as knowing software dev. Machine learning problems are much more rare and often don’t have a high return on investment.
As an MLE I get a decent amount of LinkedIn messages. I think I got on someone’s list or something. I would bucket the companies into two groups:
1) Established companies (meta/google/uber) with lots of data and who want MLEs to make 0.1% improvements because each of those is worth millions.
2) Startups mostly proxying OpenAI calls.
The first group is definitely not hype. Their core business relies on ML and they don’t need hype for that to be true.
For the second group, it depends on the business model. The fact that you can make an API call doesn’t mean anything. What matters is solving a customer problem.
I also (selfishly) believe a lot of the second group will hire folks to train faster and more personalized models once their business models are proven.
Building AIs has always been an option - it's a (fuzzy, continuous with its complement) way of engineering things. Now we have a boom around the development of some of these technologies (some next-layer NN implementations).
If you are asking whether the future will boost the demand to build AIs (i.e. for clients), we could say: probably so, given the renewed awareness. It may not be about LLMs - and at this stage it should not be, since they can hardly be made reliable and that can hurt your reputation.
Follow the Classical Artificial Intelligence course, MIT 6.034, from Prof. Patrick Winston - as a first step.
If you're good at what you're doing right now and you enjoy it — why change? Some might argue that AI will eventually take your job, but I strongly doubt that.
If you're looking for something new because you are bored, go for it. I tried to wrap my head around the basics of LLMs and how they work under the hood. It’s not that complicated — I managed to understand it, wrote about it, shared it with others, and felt ready to go further in that direction. But the field moves fast. While I grasped the fundamentals, keeping up took a lot of effort. And as a self-taught “expert,” I’d never quite match an experienced data scientist.
So here I am — extensively using AI. It helps me work faster and has broadened my field of operation.
From my perspective it's a bubble, very similar to the dot com bubble. All businesses are integrating it into everything, often where it's unnecessary or just confusing.
But I believe the value will come after the bubble bursts, and the companies that truly create value will survive, the same as with webpages after the dot com bubble.
It's your choice, but it's definitely not "just another tool".
Most LLMs I tried made lots of mistakes, but Codex with the $200 subscription changed my workflow totally, and now I'm getting 40 pull requests a day merged.
Treat LLMs as interns, increase your test coverage with them to the point that they can't ruin your codebase and get really good at reviewing code and splitting tasks up to smaller digestible ones, and promote yourself as team leader.
What kind of tasks do you give Codex?
I gave it an honest chance, but couldn’t get a single PR out of it. It would just continue to make mistakes. And even when it got close, I asked it for a minor tweak and it made things worse. I iterated 7 times on the same small problem.
You can review and approve 40 PRs a day of intern-quality work?
My recommendation would be to use them as a tool to build applications. There's much more potential there, and it will be easier to get started as an engineer.
If you want to switch fields and work on LLM internals/fundamentals in a meaningful way, you'd probably want to become a research scientist at one of the big companies. This is pretty tough because that's almost always gated by a PhD requirement.
When I was in my postdoc (applied human genetics), my advisor's rule was that you needed to understand the tools you were using at a layer of abstraction below your interface with them.
For example, if we wanted to conduct an analysis with a new piece of software, it wasn't enough to run the software: we needed to be able to explain the theory behind it (basically, to be able to rewrite the tool).
From that standpoint, I think that even if you keep with #2, you might benefit from taking steps to gain the understanding from #1. It will help you understand the models' real advantages and disadvantages to help you decide how to incorporate them in #2.
> my advisor's rule was that you needed to understand the tools you were using at a layer of abstraction below your interface with them.
Very wise advice! And the more complex the systems, the more this is needed.
1/ There aren't many jobs in this space. There are still far more companies (and roles) that need 'full-stack development' than those focused on 'AI/LLM internals.' With low demand for AI internals and a high supply of talent—many people have earned data science certificates in AI hoping to land lucrative jobs at OpenAI, Anthropic, etc.—the bar for accessing these few roles is very high.
2/ The risk here is that AI makes everyone good at full-stack. This means more competition for roles and less demand for them (one inexperienced engineer with AI can now output 1.5x the code an experienced senior engineer could in 2020).
In the short/medium term, 2/ has the best risk/reward function. But 1/ is more future proof.
Another important question: where are you in your career? If you're 45 years old, I'd encourage you to switch into leadership roles for 2/. That won't be replaced by AI. If you're early in your career, it could make more sense to switch.
I posted a recent Show HN[1] detailing why I felt the need to understand the basics of what LLMs do, and how they do it. Even though I've no interest in building or directly training LLMs, I've learned the critical importance of preparing documentation for LLM training to try and stop AI models from generating garbage code when working with my canvas library.
IMO, you're a woodworker, a craftsman who builds solid products. You've been using a hacksaw and hammer all these years; now someone has invented the circular saw and drill, and people can move a lot faster. And now even people who were previously relatively inept are able to do woodwork.
Do you need to understand how the circular saw and drill are made?
To continue with your analogy: maybe they don't need to understand every detail, but they should know how the tools function, what safety precautions to take, and when one is a better/more useful tool than what they're currently using.
That doesn't mean knowing every single bit there is to know about it, but a basic understanding will go a long way in correctly using it.
Well it's easy: dive into 1 and you will see if you like it and persist. I don't think it's a bubble - the benefits are obvious and immediate, and I don't think there's a single developer on the planet doing 2 and not using AI tools.
I believe you should do what you genuinely find interesting. Go for 1, dig into internals, read some papers, and see how it goes. Even if you decide not to get into ML/AI, learning how stuff works is always rewarding.
Option 2 for sure. Make use of them if you find them useful, or don't if you don't. Personally I find LLMs to be pretty much useless as a tool so I don't use them, but if you get use out of them then more power to you (just be careful that their inherent unreliability isn't costing you more effort than they save). I think you should in no way consider option 1 - this is very much a hype bubble that is going to burst sooner or later. How much later I can't say, but I don't see any way it doesn't happen. I certainly wouldn't advise anyone to hitch their career to a bubble like that.
To piggyback on this discussion, what do you all think about option 3:
Work for companies (as a consultant?) to help them implement LLMs/AI into their traditional processes?
I don’t think we should ever put “implement LLMs/AI” as the goal. Process transformation should be defined in terms of user or business goals (reduce turnaround time, reduce costs, improve customer experience, …). In the course of doing that the places where LLMs have a use will be apparent, but more often something a lot less clever will be the better solution.
You'll find LLMs' need for precise prompts at odds with fuzzy business concepts and requirements. You'll struggle to unravel decades of process, little understood in its entirety, in order to build a workflow for it. This is the current state of Enterprise AI/ML.
Focussing on the inner workings of them may well end up being a type of programming you don’t enjoy: endless tweaking of parameters and running experiments.
Learning to work with the outputs of them (which is what I do) can be much more rewarding. Building apps based around generative outputs, working with latency and token costs and rate limits as constraints, writing evals as much as you write tests, RAG systems and embeddings etc.
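To make that last part concrete, here's a toy sketch of the embeddings/retrieval piece: rank documents by cosine similarity to the query and paste the winners into the prompt. The vectors are hand-written fakes; in practice they come from whatever embedding model your stack uses:

    # Rank documents by cosine similarity to a query embedding.
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def retrieve(query_vec, doc_vecs, docs, k=2):
        scores = [cosine(query_vec, d) for d in doc_vecs]
        ranked = sorted(zip(scores, docs), reverse=True)
        return ranked[:k]   # the top-k docs get pasted into the LLM prompt

    docs = ["refund policy", "shipping times", "api rate limits"]
    doc_vecs = [np.array([0.9, 0.1, 0.0, 0.0]),
                np.array([0.1, 0.8, 0.1, 0.0]),
                np.array([0.0, 0.1, 0.9, 0.2])]
    query_vec = np.array([0.0, 0.2, 0.8, 0.1])   # pretend query embedding
    print(retrieve(query_vec, doc_vecs, docs))

Most of the real work is in chunking, eval design, and deciding when retrieval is even the right tool, not in this part.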