Using AI tools for writing unit tests has shifted from a futuristic concept to a practical reality. While AI won’t replace the critical thinking and domain knowledge required for solid testing, it can significantly accelerate the mundane aspects of test creation, reduce boilerplate, and help maintain high test coverage. This guide will walk through using AI to generate and refine unit tests, focusing on practical workflows and honest limitations. By the end, we will be able to integrate AI assistance into our testing process, freeing up valuable time to focus on complex test scenarios and business logic.
## Prerequisites
Before diving into AI-assisted test generation, ensure we have the following:
- Basic understanding of unit testing: Familiarity with concepts like assertions, test fixtures, and mocking.
- An Integrated Development Environment (IDE) with AI integration:
  - GitHub Copilot (or Copilot Chat): Excellent for inline suggestions and chat-based interactions within VS Code, JetBrains IDEs, etc.
  - Cursor IDE: Built with AI capabilities deeply integrated.
- General LLM access: ChatGPT, Claude, Gemini, or similar models accessed via a web interface can also be used by copying and pasting code.
- A project with existing code: We will use Python with `pytest` for our examples, but the principles apply broadly to other languages and frameworks.
- `pytest` installed (for Python examples):

  ```shell
  pip install pytest
  ```
## Step-by-step sections
Let's explore several practical scenarios for using AI to write unit tests.
### Scenario 1: Generating Tests for an Existing Function
This is the most common starting point: we have a function, and we need tests for it.
#### Step 1: Identify a function to test
Consider a simple Python function that performs an arithmetic operation.
```python
# calculator.py
def add(a: int, b: int) -> int:
    """Adds two integers and returns the sum."""
    return a + b
```

#### Step 2: Invoke the AI assistant to generate tests
**Using GitHub Copilot (or similar IDE integration):**
1. Open `calculator.py` in your IDE.
2. Create a new file, `test_calculator.py`, in the same directory or a `tests/` subdirectory.
3. Add a comment or docstring to prompt the AI. For example:

   ```python
   # test_calculator.py
   import pytest
   from calculator import add

   # Write unit tests for the 'add' function using pytest
   ```
4. Wait a moment. Copilot should start suggesting test cases. Alternatively, if using Copilot Chat, select the `add` function, right-click, and choose "Copilot" -> "Generate Tests" or use a chat prompt like `/test`.
**Using a general LLM (e.g., ChatGPT):**
1. Copy the `add` function's code.
2. Paste it into the LLM's chat interface with a prompt:

   ```
   Given the following Python function, write unit tests for it using pytest:

   def add(a: int, b: int) -> int:
       """Adds two integers and returns the sum."""
       return a + b
   ```
The AI will generate test code similar to this:
```python
# test_calculator.py (AI-generated)
import pytest
from calculator import add


def test_add_positive_numbers():
    assert add(1, 2) == 3


def test_add_negative_numbers():
    assert add(-1, -2) == -3


def test_add_positive_and_negative_numbers():
    assert add(1, -2) == -1


def test_add_zero():
    assert add(0, 5) == 5
    assert add(5, 0) == 5
    assert add(0, 0) == 0
```
#### Step 3: Review and refine the generated tests
AI-generated tests are a starting point. It’s crucial to review them for correctness, completeness, and adherence to project standards.
- Correctness: Do the assertions make sense? Are the expected values accurate?
- Completeness: Does the AI cover sufficient edge cases? For `add`, it might miss large numbers or floating-point considerations (if the function were designed for them).
- Redundancy: Are there duplicate tests or tests that provide little additional value?
For our `add` function, the AI did a decent job. However, we might want to explicitly test the limits of integer types if this were a low-level language, or consider `sys.maxsize` in Python. For simplicity, we’ll assume standard integer behavior.
Let’s add a test case that the AI might miss if not explicitly prompted, for instance, testing with maximum and minimum values if relevant, or simply ensuring type hints are respected if the function could theoretically accept other types.
```python
# test_calculator.py (Refined)
import pytest
from calculator import add


def test_add_positive_numbers():
    assert add(1, 2) == 3


def test_add_negative_numbers():
    assert add(-1, -2) == -3


def test_add_positive_and_negative_numbers():
    assert add(1, -2) == -1


def test_add_zero():
    assert add(0, 5) == 5
    assert add(5, 0) == 5
    assert add(0, 0) == 0


# Manually added edge case (or prompted AI for it)
def test_add_large_numbers():
    assert add(1_000_000_000, 2_000_000_000) == 3_000_000_000
```
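A further refinement, whether prompted from the AI or written by hand, is consolidating these near-identical tests with `pytest.mark.parametrize`, which keeps the suite concise as cases accumulate. A minimal sketch (the inline `add` is a stand-in so the snippet runs on its own; in the project you would use `from calculator import add`):

```python
import pytest


# Stand-in for calculator.add so this sketch is self-contained;
# in the real project, use `from calculator import add`.
def add(a: int, b: int) -> int:
    """Adds two integers and returns the sum."""
    return a + b


@pytest.mark.parametrize(
    ("a", "b", "expected"),
    [
        (1, 2, 3),                  # positive numbers
        (-1, -2, -3),               # negative numbers
        (1, -2, -1),                # mixed signs
        (0, 0, 0),                  # zeros
        (1_000_000_000, 2_000_000_000, 3_000_000_000),  # large values
    ],
)
def test_add(a: int, b: int, expected: int) -> None:
    assert add(a, b) == expected
```

Each tuple becomes its own test case in pytest’s report, so one failing input doesn’t hide the others.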
#### Step 4: Execute the tests
Navigate to your project’s root directory in the terminal and run `pytest`:

```shell
pytest
```
All tests should pass. If not, debug the generated test or the original function.
#### Step 5: Iterate
As the add function evolves or new functions are added, repeat the process. AI can also help update tests when function signatures change, though this requires more careful prompting and review.
### Scenario 2: Test-Driven Development (TDD) Assistance
AI can also assist in TDD by generating tests before the implementation.
#### Step 1: Define the function’s contract
Start by writing the function signature and a clear docstring or comments outlining its behavior, but leave the implementation empty.
```python
# calculator.py
def divide(a: int, b: int) -> float:
    """
    Divides a by b and returns the result.
    Raises ValueError if b is zero.
    """
    pass  # Implementation yet to be written
```
#### Step 2: Ask AI to generate tests based on the contract
**Using Copilot Chat:** Select the `divide` function, then use the chat window to prompt: “Generate pytest tests for the selected function.”
**Using a general LLM:**

```
Given the following Python function signature and docstring, write unit tests for it using pytest:

def divide(a: int, b: int) -> float:
    """
    Divides a by b and returns the result.
    Raises ValueError if b is zero.
    """
    pass
```
The AI should generate tests covering normal division and the specified error condition:
```python
# test_calculator.py (AI-generated for divide)
import pytest
from calculator import divide


def test_divide_positive_numbers():
    assert divide(10, 2) == 5.0


def test_divide_negative_numbers():
    assert divide(-10, 2) == -5.0
    assert divide(10, -2) == -5.0
    assert divide(-10, -2) == 5.0


def test_divide_by_one():
    assert divide(7, 1) == 7.0


def test_divide_zero_by_number():
    assert divide(0, 5) == 0.0


def test_divide_by_zero_raises_error():
    with pytest.raises(ValueError, match="Cannot divide by zero"):
        divide(10, 0)
```
#### Step 3: Implement the function to pass the tests
Now, write the function implementation, using the generated tests as a guide.
```python
# calculator.py (Implementation added)
def divide(a: int, b: int) -> float:
    """
    Divides a by b and returns the result.
    Raises ValueError if b is zero.
    """
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
```
Run `pytest`. All tests should now pass. This workflow ensures that the implementation directly addresses the specified behavior and error conditions.
### Scenario 3: Generating Tests for Components with Dependencies (Mocking)
Testing functions with external dependencies (e.g., database calls, API services) requires mocking. AI can often help set up basic mocks.
#### Step 1: Present a function with a dependency
Consider a `UserProcessor` class that uses a `UserService` to retrieve user information.
```python
# user_processor.py
class UserService:
    def get_user_email(self, user_id: int) -> str:
        # Imagine this calls a database or external API
        if user_id == 1:
            return "alice@example.com"
        elif user_id == 2:
            return "bob@example.com"
        else:
            raise ValueError("User not found")


class UserProcessor:
    def __init__(self, user_service: UserService):
        self.user_service = user_service

    def process_user_data(self, user_id: int, data: str) -> str:
        """
        Retrieves user email and processes data.
        Returns a formatted string including the user's email and processed data.
        """
        try:
            email = self.user_service.get_user_email(user_id)
            processed_data = data.upper()  # Simple processing
            return f"User {email} processed data: {processed_data}"
        except ValueError as e:
            return f"Error processing user {user_id}: {e}"
```
#### Step 2: Prompt for tests, specifically mentioning mocking
**Using Copilot Chat:** Select the `UserProcessor` class and its `process_user_data` method. Prompt: “Generate pytest tests for this UserProcessor class, ensuring to mock the UserService dependency.”
**Using a general LLM:**

```
Given the following Python classes, write pytest unit tests for the `UserProcessor` class, specifically focusing on the `process_user_data` method. Ensure that the `UserService` dependency is mocked.

# (Paste both UserService and UserProcessor classes here)
```
The AI will likely generate tests using `unittest.mock.MagicMock` or similar techniques.
```python
# test_user_processor.py (AI-generated)
import pytest
from unittest.mock import MagicMock
from user_processor import UserProcessor, UserService


def test_process_user_data_success():
    mock_user_service = MagicMock(spec=UserService)
    mock_user_service.get_user_email.return_value = "test@example.com"
    processor = UserProcessor(mock_user_service)
    result = processor.process_user_data(1, "hello world")
    assert result == "User test@example.com processed data: HELLO WORLD"
    mock_user_service.get_user_email.assert_called_once_with(1)


def test_process_user_data_user_not_found():
    mock_user_service = MagicMock(spec=UserService)
    mock_user_service.get_user_email.side_effect = ValueError("User not found")
    processor = UserProcessor(mock_user_service)
    result = processor.process_user_data(99, "some data")
    assert result == "Error processing user 99: User not found"
    mock_user_service.get_user_email.assert_called_once_with(99)


def test_process_user_data_empty_data():
    mock_user_service = MagicMock(spec=UserService)
    mock_user_service.get_user_email.return_value = "empty@example.com"
    processor = UserProcessor(mock_user_service)
    result = processor.process_user_data(2, "")
    assert result == "User empty@example.com processed data: "
    mock_user_service.get_user_email.assert_called_once_with(2)
```
#### Step 3: Evaluate and adjust mocking strategies
The AI usually provides a solid foundation for mocking. However, complex mocking scenarios (e.g., mocking multiple methods, specific return values based on arguments, or using `patch` decorators) might require manual refinement or more precise prompting. Always verify that the mocks accurately simulate the dependency’s behavior relevant to the test.
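As an illustration of the `patch`-decorator route, the success-path test can be rewritten to swap out `get_user_email` at the class level instead of hand-building a mock. This is a sketch with minimal stand-in classes (error handling omitted) so it runs on its own; in the real project you would import `UserProcessor` and `UserService` from `user_processor.py`:

```python
from unittest.mock import patch


# Minimal stand-ins so this sketch is self-contained; in the real
# project, use `from user_processor import UserProcessor, UserService`.
class UserService:
    def get_user_email(self, user_id: int) -> str:
        raise RuntimeError("would hit a real backend")


class UserProcessor:
    def __init__(self, user_service: UserService):
        self.user_service = user_service

    def process_user_data(self, user_id: int, data: str) -> str:
        # Error handling omitted for brevity.
        email = self.user_service.get_user_email(user_id)
        return f"User {email} processed data: {data.upper()}"


# The decorator replaces UserService.get_user_email for the duration
# of the test and passes the mock in as an argument.
@patch.object(UserService, "get_user_email", return_value="test@example.com")
def test_process_user_data_success(mock_get_email):
    processor = UserProcessor(UserService())
    result = processor.process_user_data(1, "hello world")
    assert result == "User test@example.com processed data: HELLO WORLD"
    mock_get_email.assert_called_once_with(1)
```

The decorator form is handy when the dependency is constructed inside the code under test and cannot simply be injected.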
## Common Issues
While AI is a powerful assistant, it’s not infallible. Here are common issues and how to address them:
- AI generates incorrect assertions or expected values: The AI predicts, it doesn’t truly understand the business logic. Always manually verify the expected outcomes in `assert` statements.
  - Fix: Correct the expected values in the test code.
- Tests lack coverage for specific edge cases: AI might miss obscure edge cases, performance considerations, or highly domain-specific scenarios.
  - Fix: Explicitly prompt the AI for specific edge cases (e.g., “add tests for empty list input,” “test with maximum integer value”) or add these tests manually.
- Tests are too simplistic or redundant: AI might generate many basic tests that don’t add significant coverage or are overly verbose.
  - Fix: Refine prompts to ask for diverse test cases or more complex scenarios. Manually prune redundant tests to keep the test suite concise and meaningful.
- AI struggles with complex mocking setups: While AI can handle basic `MagicMock` usage, intricate mocking of multiple object layers or specific side effects can be challenging for it.
  - Fix: Provide more context about the dependency structure in your prompt, or manually implement the more complex mocking logic.
- Generated tests don’t follow project style or conventions: AI might use a generic style, which might conflict with your team’s established patterns (e.g., test function naming, fixture usage).
  - Fix: Include style guidelines in your prompts (e.g., “use `pytest` fixtures for setup,” “name test functions `test_feature_scenario`”). Integrate linters and formatters (like Black or Prettier) into your workflow to automatically clean up AI-generated code.
- AI “hallucinates” non-existent functions or modules: Occasionally, the AI might invent methods or classes that aren’t part of your codebase.
  - Fix: This is a clear sign that manual review is essential. Discard or correct any hallucinated code.
## Next Steps
After mastering the basics of AI-assisted unit test generation, consider exploring these advanced areas:
- Advanced Prompt Engineering: Experiment with more detailed and structured prompts (e.g., few-shot prompting where you provide examples, or chain-of-thought prompting asking the AI to explain its reasoning) to get higher-quality and more specific test cases.
- AI-assisted Test Refactoring: Use AI to help refactor existing tests, make them more readable, or convert them to use fixtures more effectively.
- Integration with CI/CD: Explore how AI tools can be integrated into your continuous integration pipeline to automatically suggest or generate tests for new code changes, subject to human review.
- Experiment with Different AI Models: Different AI models (e.g., specialized code models vs. general-purpose LLMs) may excel at different aspects of test generation. Try various tools to find what works best for your specific language and project type.
- Beyond Unit Tests: Apply similar AI techniques to generate integration tests, API tests, or even basic end-to-end test scenarios, recognizing that the complexity and need for human oversight will increase.
- Understand When Not to Use AI: Recognize scenarios where AI is less effective. Highly complex business logic, tests requiring deep domain expertise, or tests for non-deterministic behavior might still be faster and more accurate to write manually.
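As a small taste of the fixture refactoring mentioned above, the mock setup repeated across the Scenario 3 tests can be hoisted into `pytest` fixtures. The stand-in classes below are included only so the sketch is self-contained; in the real project you would import them from `user_processor.py`:

```python
import pytest
from unittest.mock import MagicMock


# Minimal stand-ins so this sketch runs on its own; in the real
# project, use `from user_processor import UserProcessor, UserService`.
class UserService:
    def get_user_email(self, user_id: int) -> str:
        raise NotImplementedError


class UserProcessor:
    def __init__(self, user_service: UserService):
        self.user_service = user_service

    def process_user_data(self, user_id: int, data: str) -> str:
        email = self.user_service.get_user_email(user_id)
        return f"User {email} processed data: {data.upper()}"


@pytest.fixture
def mock_user_service():
    service = MagicMock(spec=UserService)
    service.get_user_email.return_value = "test@example.com"
    return service


@pytest.fixture
def processor(mock_user_service):
    # Fixtures can depend on other fixtures, so each test gets a
    # fresh processor wired to a fresh mock.
    return UserProcessor(mock_user_service)


def test_process_user_data_success(processor):
    result = processor.process_user_data(1, "hi")
    assert result == "User test@example.com processed data: HI"
```

Each test now declares `processor` as a parameter instead of repeating three lines of mock setup, which is exactly the kind of mechanical rewrite AI handles well under review.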
By thoughtfully integrating AI into our test-writing workflow, we can significantly boost productivity, improve code quality, and maintain a solid safety net of unit tests. Remember, the AI is a co-pilot, not an autopilot; human judgment and expertise remain essential.
## Recommended Reading
Deepen your skills with these highly-rated books. Links go to Amazon — as an affiliate, we may earn a small commission at no extra cost to you.
- Test Driven Development: By Example by Kent Beck
- The Art of Unit Testing by Roy Osherove
- Clean Code by Robert C. Martin