Prompt Engineering 101: Tasks & The Hierarchy of Focus
This is the first lesson in a series built from my learnings as a prompt engineer, covering how to understand tasks and draw an LLM’s attention to specifics.
Welcome to Hasmo’s Point, students. I am Soham, also a student.
Firstly, I won’t assume the stature of an all-knowing teacher because I am also new to the field, and every day I learn something new. So, if anything, think of me as an AI Teaching Assistant, and hopefully my intelligence isn’t artificial. I have experience as a data scientist and analyst, and I have recently become a full-time prompt engineer at a health-tech startup, where I develop, test, deploy, and manage multiple enterprise prompt-based systems.
Secondly, the topics will be reduced to their simplest form, so even if you aren’t from a tech background, you can grasp everything we cover (I hope!).
So, whether you are just getting started working with LLMs for personal tasks or are building production-level flows with multiple GenAI nodes, this first lesson will help you get your bearings on how to approach problem-solving with prompts.
This will be a set of lessons covering both practical examples of prompting and the theory of problem-solving approaches to building professionally and accurately with LLMs.
Introduction
I’ll get to the juicy stuff, but first, let me tell you a small story. I promise it’s related. Many years ago, I was in the 11th grade, and as part of my school curriculum, I was required to complete 120 hours of “community service” per year, for a total of 240 across 11th and 12th. So, begrudgingly, I took on tutoring a student in math for free. I wasn’t amazing at math, but I was decent, and my student had just begun the 6th grade and was struggling with the sudden demands of introductory algebra when no one had spent time brushing up her multiplication and division. So in each of our twice-weekly hour-long sessions, I spent almost 30 minutes going over basics before even getting to algebra.
Now, she wasn’t gifted in mathematics; in fact, I think she was owed a little more than she was given. So week after week, I did my best to teach, pulling up my old rigorous sheets from Kumon (a salute to my fellow Kumon veterans) to help strengthen her basics. But it was to no avail. Then she stopped coming regularly, often leaving me waiting around at school for her. I had been trying my best, but honestly, I lacked the maturity to deal with someone who couldn’t yet grasp mathematics where the instruction was highly conditional, and there wasn’t enough time to explain to her why each step in solving algebra behaved as it did. I will admit I was curt with her at times because I saw zero effort from her, so I was comfortable saying, “She’s just a bad student.” Now, years later, I see how delusional and inept I was as a teacher. My techniques were not the issue; my attitude was. I refused to understand her limitations and iterate on my teaching style so she could understand it better.
The reason I am sharing this story before our first lesson is that working with LLMs to get them to follow your instructions reminds me vividly of those days. The frustration is still the same, but my approach to it has shifted. So if there is any takeaway from this, it is that no matter the issues with the LLMs, if you are not seeing the results you want, it is time to step back and assess your methods of instruction, or reassess your architectural flow. The buck begins and stops with you.
With that fruitful deviation sorted, let's begin Lesson 1:
Lesson 1: Understanding LLM Capability in Tasks
LLMs aren’t intelligent, so I dislike the term “Artificial Intelligence”. They are, at best, 10²⁰ World Series of Poker players with an impeccable ability to guess, packed into a multidimensional neural net. There is a joke among ML engineers and researchers that if you lift the hood, everything is just regression, and it essentially is.
Pre-lesson Assignment
I am assigning this video as an introduction to LLMs so that we can all get on the same page about how these systems function. If you want to understand how Convolutional Neural Networks, Transformers, and Vector Encoders process and generate text, this video does an excellent job of breaking those concepts down into digestible language for all backgrounds. I will provide definitions for the tough terms I use, but make sure you have watched it beforehand so you don’t miss anything.
Defining Instruction
Whether it's ChatGPT, Gemini, Claude, MetaAI, or even Grok, these systems are first and foremost generation machines, and so the fundamental axiom of LLMs is that they cannot say no. What that means is that if your instruction looks like this, it is unreliable:
INPUT DATA
ID: {}
USER NAME: {}
…
USER PROMPT
[x] = "What is the meaning of life?"
SYSTEM PROMPT
IF [x] is (a simple question):
    [respond plainly]
ELSE:
    determine if more information is needed to answer [x] by [asking specific questions]
This is because the model will be highly unpredictable in executing the second part of this instruction. After all, it doesn’t understand that it needs more information to answer [x] reliably. It's designed to generate an answer regardless of whether that answer is worth giving. It may work most of the time, but when you are building prompts for large-scale production, the same principles of error rates and accuracy matter as in any other engineering. Hence, we need to define a task as cleanly as possible for LLMs to execute reliably. To do that, we have to take a step back and understand which types of tasks LLMs are good at and which they are not.
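To preview where we are headed, here is a minimal sketch of one way to make that instruction more reliable: pull the IF/ELSE decision out of the prompt, make it a separate classification call with a constrained output, and let ordinary code do the branching. It is written in Python with a hypothetical call_llm helper standing in for whichever chat API you use, and the prompt text and labels are invented for illustration.

# Hypothetical helper: wrap your provider's chat API here (ChatGPT, Gemini, Claude, etc.).
def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

CLASSIFY_SYSTEM = (
    "You are a router. Reply with exactly one word: "
    "SIMPLE if the question can be answered directly, "
    "NEEDS_INFO if more information is required to answer it."
)
ANSWER_SYSTEM = "Answer the user's question plainly and concisely."
CLARIFY_SYSTEM = "Ask the user 2-3 specific questions you need answered before you can help."

def handle_question(question: str) -> str:
    # Step 1: a pure Classification task with a constrained output.
    label = call_llm(CLASSIFY_SYSTEM, question).strip().upper()
    # Step 2: the branching lives in code, not inside the prompt.
    if label == "SIMPLE":
        return call_llm(ANSWER_SYSTEM, question)
    return call_llm(CLARIFY_SYSTEM, question)

The model is never asked to decide and answer in the same breath, which is exactly the kind of separation the rest of this lesson builds toward.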
Tasks
There are mainly three types of tasks that LLMs excel at and should be used for: Generation, Parsing, and Classification. You may think there are other tasks, but almost any task an LLM is capable of falls under these three. Most overarching tasks will involve more than one, and likely all three, but learning to separate them is essential, so let's dive deep into each.
Generation
This is the bread and butter of LLMs and what most people use them for, from college essays and emails to grifters passing off generated stories as their own (as a writer myself, this hurts to see).
Although most believe this is their strongest capability, it is often their weakest when it comes to quality. Your one email or essay paragraph may not stick out like a sore thumb, but when you generate 100+ outputs from the same prompt, you start to notice patterns that, once seen, become impossible to ignore. So how do you solve staleness in novel text generation? We’ll get to that in a future lesson.
Parsing
This is the second strongest ability of LLMs because this is how you narrow down the “hierarchy of focus”. Yes, I said the phrase from the title, but let’s understand all the tasks before we get to that. Parsing is the concept of getting LLMs to extract salient details from a passage of text or an image, allowing them to gain access to more data to help them execute the prompt. Coupling Generation with Parsing gives you composite tasks like translation, document review, and email replies.
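As a rough illustration of a Parsing step, here is a sketch, again using the hypothetical call_llm placeholder and made-up field names, where the model's only job is to return the extracted details as JSON that downstream prompts (or plain code) can consume.

import json

# Hypothetical helper: wrap your provider's chat API here.
def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

PARSE_SYSTEM = (
    "Extract the following fields from the customer's message and return ONLY valid JSON: "
    '{"product": string, "issue_summary": string, "order_id": string or null}. '
    "Do not include any other text."
)

def parse_ticket(message: str) -> dict:
    raw = call_llm(PARSE_SYSTEM, message)
    # The reply should be a JSON object; parse it so the rest of the flow
    # works with structured data instead of free text.
    return json.loads(raw)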
Classification
This. This is the magnum opus of LLM functionality. You may be thinking that classifiers existed for years before transformers and LLMs, and you would be right, but none are as adept at NLP (Natural Language Processing) classification as LLMs.
What LLMs are uniquely good at is reading a passage of text and mathematically sorting that data at a multidimensional scale. You no longer have to rely on the “bag of words” approach or simplistic sentiment classifiers. You can now provide a clear set of sorting instructions and a passage of text, and have the model classify the data based on contextual importance, sentiment in both intention and outcome, and customized thresholds. In simple words, it can reliably classify at a more complex level than ever before, albeit with a few caveats.
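Here is one way such an instruction-driven classifier might look, a sketch with invented labels and the same hypothetical call_llm placeholder rather than a reference implementation; the main trick is constraining the output to a fixed label set and guarding against drift in code.

# Hypothetical helper: wrap your provider's chat API here.
def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

LABELS = {"BILLING", "BUG", "FEATURE_REQUEST", "OTHER"}

CLASSIFY_SYSTEM = (
    "Classify the customer's message into exactly one of: "
    "BILLING, BUG, FEATURE_REQUEST, OTHER. "
    "Judge by the customer's intent and the likely outcome, not just keywords. "
    "Reply with the label only."
)

def classify_ticket(message: str) -> str:
    label = call_llm(CLASSIFY_SYSTEM, message).strip().upper()
    # Guard against the model wandering outside the allowed label set.
    return label if label in LABELS else "OTHER"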
Now let’s put this learning together with an example:
Hypothetical Scenario 1.0
You are building an AI agent that is supposed to help solve customer issues. You need this agent to not just respond to the user’s question based on available solutions but to account for that specific user: whether they are a paid user or a free one, a high-activity user or a dormant one, and whether they are repeatedly facing issues or this is a new instance for them. So you take the learnings from your testing on ChatGPT, and you decide to write a long and detailed prompt, telling the LLM everything you know about how to be a good customer service agent.
You push this new prompt to production, with all user data ready to be piped into the prompt, confident that this will solve all the company's customer service problems at a fraction of the cost.
A week in, you are called into the CEO's office because your costs are down but your resolution rate has plummeted. Customers are posting 1-star reviews on the app store and you are flummoxed as to why your automated agent isn’t working even though you gave it all the data and instructions to do its job correctly. The single most critical mistake most people make when creating LLM agents at scale is not accounting for…
The Hierarchy of Focus
The issue is that requiring this agent to handle so many diverging tasks at once causes it to suffer from one of the most common problems with large prompts, the “needle in a haystack” problem (for a detailed understanding, review this great piece on the topic). In short, the “needle in a haystack” test was used to understand how older NLP models, and now LLMs, behave when provided with large contexts and how adept they are at picking out or responding to a needle (a specific question or vital piece of information) from a haystack of context.
This is where what I’ve coined “The Hierarchy of Focus” (HoF) becomes important. At the basic level, The Hierarchy of Focus refers to breaking down any task into its constituent parts, composed of the three fundamental tasks listed above, while clearly defining the input data for each part to draw focus only to the context that part needs to do its job. This enables you to do two things:
ONE. Utilize existing prompt engineering techniques like Chain-of-Thought, Zero-Shot, etc. for specific sections of the task.
TWO. Control each part of the task flow better and more accurately since you can change and test specific sections to ensure you are getting the expected output.
It's vital to fragment the larger task into smaller prompts, each handling no more than two of the three fundamental tasks, to ensure control and scalability. But focusing on the Hierarchy of Focus doesn’t come without its caveats, mainly latency and juggling.
Latency
Splitting the task into its constituents requires more robust data flow and engineering because you have to define and refine what data flows into each node of the system. LLMs, like most older ML models, struggle with a structural issue akin to overfitting: giving a prompt more data than it needs to complete its task leads to unpredictable outputs and poor quality, as the model tries to fit all of its context into completing the task. So when you have five prompt nodes in your task flow, each prompt is its own API call, and if you don’t define and refine the input data going into each, the Time to First Token (TTFT) will be significantly higher than with a single prompt.
This tug-of-war between quicker answers and better answers will continue even as LLMs get significantly better, because the underlying architecture of LLMs is inherently one-directional and linear: a model can only take in an input and produce an output. It doesn’t yet have the capability to do fully robust logical computation, and that doesn’t seem to be changing any time soon. So how do we solve this? Well, here is where the hierarchy part of The Hierarchy of Focus comes into play. You must establish a hierarchy, as in any other software flow, of which tasks will be your bottleneck and which can be simplified for quicker outputs. In addition, within this hierarchy, we can define which tasks can be performed in parallel to further mitigate latency. I will get to exactly how to design this in a future lesson, but for now, just make sure you understand the Hierarchy of Focus.
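As a small taste of the parallelism point, here is a sketch using asyncio and a hypothetical call_llm_async helper (the prompts and node split are invented): two nodes that don't depend on each other run concurrently, and only the final generation node waits on both.

import asyncio

# Hypothetical async wrapper around your provider's chat API.
async def call_llm_async(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("wire this to your provider's async client")

async def handle_ticket(message: str, user_history: str) -> str:
    # These two nodes don't depend on each other, so run them in parallel
    # instead of paying for two sequential round trips.
    category, history_summary = await asyncio.gather(
        call_llm_async("Classify this message as BILLING, BUG, or OTHER. Reply with the label only.", message),
        call_llm_async("Summarize only the details relevant to the user's current issue.", user_history),
    )
    # The final Generation node depends on both, so it remains the bottleneck.
    return await call_llm_async(
        f"Write a reply to a {category.strip()} issue. Relevant history: {history_summary}",
        message,
    )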
Juggling
The other caveat of HoF is that any time you change a prompt or a piece of code logic in your flow, you have to ensure that the rest of the flow still works adequately. I refer to this as juggling: imagine suddenly changing the size of only one ball while a juggler is performing; they have to quickly adjust to the new dimensions of that ball while keeping all the balls aloft.
With LLMs, because these are non-deterministic systems, any change in one step requires re-testing and validation of every subsequent step in the flow. This isn’t so much an unavoidable problem as something you have to account for when building such systems. Having robust testing criteria and a test set is crucial when fragmenting tasks into individual nodes.
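One lightweight way to keep the juggling honest is a small regression set that you re-run against each node whenever any upstream prompt changes. The cases and pass criteria below are invented, and classify_fn stands in for whichever node you are testing.

# Tiny regression set for a classification node; re-run it whenever an
# upstream prompt changes, since outputs are not deterministic.
TEST_CASES = [
    ("I was charged twice this month", "BILLING"),
    ("The app crashes when I open settings", "BUG"),
]

def run_regression(classify_fn, cases=TEST_CASES, trials=3) -> float:
    passed = 0
    for message, expected in cases:
        # Sample each case a few times to catch flaky behaviour, not just one lucky run.
        if all(classify_fn(message) == expected for _ in range(trials)):
            passed += 1
    # Return the pass rate so you can set a threshold before promoting a prompt change.
    return passed / len(cases)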
Hypothetical Scenario 2.0
Now that you are more adept at building LLM-based systems, you sit down and redesign your flow. You decide to use PROMPT 1 to read all of the user’s interaction data with your platform and extract only the information necessary for that specific issue, and then have PROMPT 2 take the data from P1 and generate the customer-facing output that answers the issue with a personalized solution.
In the example above, you are breaking down the task of responding to the user with a personalized solution into two separate stages. P1 is a composite task of Parsing and Classification because it has to read previous interactions and then classify by selecting only the necessary data. P2, therefore, is only a Generation task. P2 would not be considered a Parsing task because the extraction of necessary data is already done by P1.
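Sketched in code (the same hypothetical call_llm placeholder, with invented prompt text), the two-node flow might look something like this:

# Hypothetical helper: wrap your provider's chat API here.
def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

P1_SYSTEM = (
    "You will receive a user's interaction history and their current issue. "
    "Extract ONLY the details relevant to the current issue and classify the user as "
    "paid or free, active or dormant, and repeat or first-time issue. Return valid JSON."
)

P2_SYSTEM = (
    "You are a customer support agent. Using the structured context provided, "
    "write a personalized resolution to the user's issue."
)

def resolve_issue(user_history: str, issue: str) -> str:
    # P1: Parsing + Classification, shrinking the haystack down to the needles.
    context = call_llm(P1_SYSTEM, f"HISTORY:\n{user_history}\n\nISSUE:\n{issue}")
    # P2: pure Generation; it never sees the raw history, only P1's output.
    return call_llm(P2_SYSTEM, f"CONTEXT:\n{context}\n\nISSUE:\n{issue}")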
Suddenly, your system is performing better. Your boss asks you what you did differently, and you respond confidently with “I broke the tasks into manageable chunks” and feel that such an answer is enough for him. But your engineers need to know that you defined a Hierarchy of Focus, fragmented the process based on fundamental tasks, defined and refined the input data into each node, and leveraged chain-of-thought prompting to achieve better results.
In this new design, not only are you keeping the individual nodes of the flow to two or fewer fundamental tasks, but you are also giving yourself the freedom to interrupt the flow and assess the outcomes of P1 and P2 separately, which would not be possible within the black box of a single prompt.
Conclusion
Well, we are at the end of the first lesson. Though this was a theory-heavy lesson, I felt getting these fundamentals down was necessary before we delve into prompting techniques, evaluations, and recursive loop prompting. All in due time.
Overall, it is crucial to understand how to define any given process in terms of the three fundamental tasks: Generation, Parsing, and Classification. Once that is defined, utilize The Hierarchy of Focus to group them into composite tasks and do the following: define and refine the input data for each composite task or node, manage latency and juggling, and make your process more manageable.
In the next lesson, we will begin learning prompting techniques and writing our first production-level prompt.
If you made it this far, thank you, and I hope you found this helpful! Please feel free to comment, provide feedback, or ask any questions. See you next time.