Code generation with AI is transforming how developers write software: with just a prompt, AI models can generate code snippets in many languages. In this blog, we will explore how to create our own code generation agent in Python, using the “Salesforce/codegen-350M-mono” model via the Hugging Face transformers library. The agent offers a beginner-friendly, hands-on approach to AI-assisted coding, whether you are a newcomer or a hobbyist.
To create this agent, we use Salesforce’s codegen-350M-mono model. Prompts are given in natural language, so the agent can generate code from plain-English instructions. The agent integrates the model through the Hugging Face transformers library, allows control over creativity in the output (via temperature), sets max_new_tokens to manage output length, and uses pad_token_id to avoid tokenizer warnings. A simple command-line interface is used to enter prompts.
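If you just want the quickest possible start, the same model can also be driven through the high-level text-generation pipeline. The snippet below is a minimal sketch under that assumption; the rest of this post loads the tokenizer and model explicitly instead.

from transformers import pipeline

# Minimal sketch: the pipeline wraps tokenization, generation, and decoding in one object
generator = pipeline("text-generation", model="Salesforce/codegen-350M-mono")

prompt = "# Python function that reverses a string\n"
result = generator(
    prompt,
    max_new_tokens=64,      # limit output length
    do_sample=True,         # enable sampling so temperature has an effect
    temperature=0.7,        # control creativity
    pad_token_id=generator.tokenizer.eos_token_id,  # silence the padding warning
)
print(result[0]["generated_text"])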
The table below summarizes the advantages and limitations of this agent:

| Pros | Cons |
| --- | --- |
| Lightweight model that is fast and easy to load | The 350M model is small, so output quality varies |
| Straightforward Python implementation | Generated code may require manual clean-up |
| Real-time prompt-response loop | Limited control over code structure |
| Hugging Face simplifies the model integration | Some prompts return irrelevant output |
First, install the required libraries as shown below.
pip install transformers accelerate torch
The transformers library from Hugging Face, including its AutoTokenizer and CodeGenForCausalLM classes, simplifies working with pre-trained models. The torch library is required because transformers depends on it to run the model.
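To confirm the installation, you can run a quick, optional sanity check. The exact version numbers will differ on your machine, and a GPU is not required for a model this small.

import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # the 350M model also runs fine on CPU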
model_name = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
model = CodeGenForCausalLM.from_pretrained(model_name, trust_remote_code=True)
These lines load the pretrained code generation model "Salesforce/codegen-350M-mono" from Hugging Face. First, the tokenizer is loaded; it converts text into token IDs and back. Since the model lacks a padding token, we set pad_token to the end-of-sequence token (eos_token) to avoid errors during tokenization. Finally, the CodeGenForCausalLM model itself is loaded to generate code from text prompts.
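To see what the tokenizer actually does, you can encode a short snippet and decode it back. This is just an optional illustration; the exact token IDs depend on the model’s vocabulary.

sample = "def add(a, b):"
ids = tokenizer(sample, return_tensors="pt").input_ids
print(ids)                       # a tensor of token IDs with shape (1, n)
print(tokenizer.decode(ids[0]))  # decodes back to the original text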
def generate_code(prompt, max_tokens=512, temperature=0.7):
    # Tokenize the prompt into model-ready tensors (input_ids, attention_mask)
    inputs = tokenizer(prompt, return_tensors="pt", padding=True)
    # Sample a continuation from the model
    outputs = model.generate(**inputs, max_new_tokens=max_tokens, do_sample=True,
                             temperature=temperature, pad_token_id=tokenizer.eos_token_id)
    # Decode the tokens back to text and strip the echoed prompt
    code = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return code[len(prompt):].strip()
A function called generate_code is defined to handle code generation from user prompts. It takes a prompt string and tokenizes it into a model-readable format using the tokenizer. Padding is enabled so the input is correctly shaped, which works because of the earlier pad_token fix. The result is a set of PyTorch tensors (input_ids and attention_mask) that can be fed into the model. The model then samples up to max_tokens new tokens at the chosen temperature, and the decoded text is returned with the original prompt stripped off.
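Before wiring up the interactive loop, you can call the function directly to check that everything works. Here is a quick sketch; the prompt text is just an example, and the output will differ on every run because sampling is enabled.

snippet = generate_code("# Python function that checks if a number is prime\n", max_tokens=128)
print(snippet)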
print("Welcome to the Code Generative AI")

while True:
    input1 = input("\nEnter your prompt or type exit to quit: ")
    if input1.lower() == "exit":
        print("GoodBye!")
        break
    print("\nGenerated Code:\n", generate_code(input1))
In the main loop, the agent welcomes the user and then asks for a prompt (or exit to quit). If the user types exit, the program stops. Otherwise, the entered prompt is passed to generate_code and the generated code is printed.
After installing the libraries and running the script, the command-line interface looks as shown below:
As you can see, the agent asks the user to enter a prompt or type exit to quit. Here, the user enters a prompt asking for code that checks whether a given input is even or odd. You can generate many different programs with this agent.
The user enters the prompt and presses Enter, and the agent automatically generates code for it. In this example, the agent creates a function called evenodd() and then generates the main section that calls it.
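The exact output varies from run to run and is not always perfectly clean. The following is a hand-written sketch of the kind of code the agent tends to produce for this prompt, not the model’s verbatim output.

def evenodd(n):
    # return "Even" if n is divisible by 2, otherwise "Odd"
    if n % 2 == 0:
        return "Even"
    return "Odd"

if __name__ == "__main__":
    number = int(input("Enter a number: "))
    print(evenodd(number))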
In conclusion, this agent offers a simple and interactive way to generate Python code using natural language prompts. Powered by a pretrained AI model, it’s ideal for learning, quick prototyping, and exploring AI-assisted coding. While it handles basic tasks well, users should review the output, as it may need refinement. Overall, it's a practical and accessible example of code generation with AI.
[1] https://huggingface.co/Salesforce/codegen-350M-mono
[2] https://www.salesforce.com/blog/codegen/