
Can AI write code? A developer experiments in two languages

AI tools are becoming a realistic option to help with coding tasks. This developer put his own coding abilities to the test.

In recent years, AI has been increasingly used to write code. A developer might use AI as half of a pair programming team or even allow AI to write complex code on their behalf.

AI's prominence in the development process raises the question: How do AI coding tools perform compared to human coders?

In an attempt to answer that question, I wrote a simple program in a language I am highly proficient in and asked an AI tool to write the same program. Next, I wrote the same program in a language I know almost nothing about to see how long it would take me to create a functional program without AI assistance. I then asked the AI tool to write the same program in the same language that I used.

I compared the results on three metrics:

  1. Does the AI-written code run?
  2. How closely does the AI-written code resemble mine?
  3. How long did the AI model take to write the code compared to the time it took me?

Setting up the coding experiment

For the purpose of my test, I wanted to create an application that is more complex than just a basic "Hello, World" program, but not so complex that I couldn't complete the programming task in an unfamiliar language.

As such, I decided to create a GUI-based program that displays the words "Hello World" within a window and includes an Exit button that, when clicked, causes the window to close. The program isn't overly complex, but it should be sufficient to compare my skills to those of AI.

Hardware considerations

An AI tool's speed depends on its underlying hardware. For this experiment, I ran the AI on my own computer. The system is equipped with a 14th-generation Intel i9 processor, 192 GB of RAM, and an Nvidia GeForce RTX 4090 GPU with 24 GB of dedicated video memory. The AI code generation process would take significantly longer on less powerful hardware, especially if the GPU did not have enough dedicated memory to hold the entire AI model.
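
As a rough illustration of the memory point above, the following sketch estimates how much memory a model's weights need at different precisions and compares the result to the 24 GB available on this GPU. The 32-billion parameter count is the size of the model used later in this article. The bits-per-weight values and the function name are assumptions chosen for illustration, and the estimate ignores the context cache and other runtime overhead, so real usage would be higher.

```python
# Rough estimate of weight memory for a model at several precisions.
# Assumption: memory ~= parameters * bits_per_weight / 8. Overhead such as the
# context cache is ignored, so actual usage will be higher than shown.

GPU_MEMORY_GB = 24   # dedicated memory on the RTX 4090 described in the article
PARAMETERS = 32e9    # 32-billion parameter model used in the article


def weight_memory_gb(parameters: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9


# Hypothetical precisions for illustration; the article does not state
# which precision the downloaded model used.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = weight_memory_gb(PARAMETERS, bits)
    verdict = "fits" if size <= GPU_MEMORY_GB else "does not fit"
    print(f"{label}: ~{size:.0f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions, only the most compressed variant fits entirely in GPU memory, which is why the amount of dedicated memory matters so much for generation speed.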

What AI model will I be using?

One of the most difficult decisions I had to make when performing this experiment was which AI tool to use, as quite a few AI tools can write code. One option was OpenAI's ChatGPT, which is well known for its ability to write code (shown in Figure 1).

User prompts ChatGPT to output Python script capabilities.
Figure 1. ChatGPT users can prompt the tool to write code in a variety of languages.

Another option was GitHub Copilot, a cloud-based service that can write code, assist in coding processes and integrate with various development tools. Figure 2, for example, shows GitHub Copilot in use within the code editor Visual Studio Code.

User prompts GitHub Copilot to demonstrate Python script coding capabilities using VS Code.
Figure 2. GitHub Copilot offers coding capabilities and integration with Visual Studio Code.

Ultimately, I decided to use qwen2.5-coder, a large language model (LLM) specifically designed for coding projects. Better still, qwen2.5-coder can run on local hardware using the Ollama platform. This means that you can use it as heavily as you want without worrying about cloud usage limits or incurring costs related to overages.

For my experiment, I used the 32-billion parameter model. Figure 3 shows what qwen2.5-coder looks like.

Qwen2.5-coder runs Python script through PowerShell.
Figure 3. You can run qwen2.5-coder through PowerShell on your own hardware.

The main reason I decided to use qwen2.5-coder is that, when run through Ollama with the --verbose switch, it reports exactly how long it took to finish the job. This is helpful because one of my goals is to quantify the time it takes AI to write code.
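
The same kind of timing data can also be read programmatically. As a minimal sketch, assuming the JSON that Ollama's local REST API returns (for example, from its /api/generate endpoint), where total_duration and eval_duration are reported in nanoseconds and eval_count is the number of generated tokens, the helper below converts those fields into seconds and tokens per second. The function name is hypothetical, and the sample values are invented for illustration; they are not measurements from this experiment.

```python
# Convert Ollama-style timing fields (nanoseconds) into readable numbers.
NANOSECONDS_PER_SECOND = 1_000_000_000


def summarize_timing(response: dict) -> dict:
    """Return total seconds, generation seconds and tokens per second."""
    total_s = response["total_duration"] / NANOSECONDS_PER_SECOND
    eval_s = response["eval_duration"] / NANOSECONDS_PER_SECOND
    # Guard against division by zero if no generation time was recorded.
    tokens_per_s = response["eval_count"] / eval_s if eval_s else 0.0
    return {
        "total_seconds": total_s,
        "eval_seconds": eval_s,
        "tokens_per_second": tokens_per_s,
    }


# Hypothetical response values, made up for this example.
sample_response = {
    "total_duration": 13_500_000_000,
    "eval_count": 300,
    "eval_duration": 12_000_000_000,
}

summary = summarize_timing(sample_response)
print(f"Total: {summary['total_seconds']:.1f} s, "
      f"{summary['tokens_per_second']:.1f} tokens/s")
```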

Comparing human-written vs. AI-generated PowerShell code

For my first test, I decided to write the code using PowerShell. I write PowerShell scripts almost daily and have recorded instructional courses on PowerShell development. As such, I consider myself a PowerShell expert.

Human-written PowerShell code

Below is the code that I wrote:

Add-Type -AssemblyName System.Windows.Forms
$Form = New-Object Windows.Forms.Form
$Form.Text = "Poseys PowerShell App"
$Form.Size = New-Object Drawing.Size(1024, 768)

$Label = New-Object Windows.Forms.Label
$Label.Text = "Hello World"
$Label.Font = New-Object System.Drawing.Font("Arial", 25)
$Label.AutoSize = $True
$Label.Location = New-Object Drawing.Point(400, 300)

$ExitButton = New-Object System.Windows.Forms.Button
$ExitButton.Location = New-Object System.Drawing.Point (450, 350)
$ExitButton.Text = "Exit"
$ExitButton.Font = New-Object System.Drawing.Font("Arial", 14)
$ExitButton.AutoSize = $True
$ExitButton.Add_Click({
   $Form.Close()
})

$Form.Controls.Add($Label) | Out-Null
$Form.Controls.Add($ExitButton) | Out-Null

$Form.ShowDialog() | Out-Null
Desktop screengrab of the PowerShell script and the resulting application.
Figure 4. This is the code and resulting application I wrote in PowerShell.

As a basis for comparison, it took me 8 minutes and 40 seconds to write and test the code, as shown in Figure 5 (although I lost a few seconds to dismissing a scam call that came in while I was writing the code).

Image of ACCUSPLIT timer stopped at 8 min. 40 sec. and 6 hundredths of a second.
Figure 5. Writing the code took me eight minutes and forty seconds.

AI-generated PowerShell code

Now, let's look at the AI-generated code. Figure 6 shows my request and the output produced.

Qwen2.5-coder is prompted to write a PowerShell script and outputs code for the application.
Figure 6. When I prompted the AI tool to write a PowerShell script with the detailed parameters, it output the following code.

Here is the code that the AI tool produced:

# Load necessary assembly for Windows Forms
Add-Type -AssemblyName System.Windows.Forms

# Create a new form object
$form = New-Object System.Windows.Forms.Form
$form.Text = 'Hello World GUI'
$form.Size = New-Object System.Drawing.Size(300,200)
$form.StartPosition = 'CenterScreen'

# Create a label to display "Hello World"
$label = New-Object System.Windows.Forms.Label
$label.Location = New-Object System.Drawing.Point(100,50)
$label.AutoSize = $true
$label.Text = 'Hello World'
$form.Controls.Add($label)

# Create an Exit button that closes the form
$exitButton = New-Object System.Windows.Forms.Button
$exitButton.Location = New-Object System.Drawing.Point(100,100)
$exitButton.Size = New-Object System.Drawing.Size(100,30)
$exitButton.Text = 'Exit'
$exitButton.Add_Click({
    $form.Close()
})
$form.Controls.Add($exitButton)

# Display the form
[System.Windows.Forms.Application]::Run($form)

There are several minor differences between the AI code and my code. For example, the AI model used different sizes and positions for the various GUI elements, centered the window on the screen, and added comments to explain code blocks. It launched the form with Application.Run, whereas I used ShowDialog. It also, annoyingly, used a lowercase letter for the first character of each variable name.

Aside from those differences, the AI code is largely similar to mine. Figure 7 shows what the AI-written script looks like when run.

Desktop screengrab of the AI-written PowerShell script and the resulting application.
Figure 7. Here is what the AI-written PowerShell code does when executed.

So, how long did the AI tool take to write its code? Looking back at the total duration line in Figure 6, you can see that the AI completed the task in less than 14 seconds, whereas it took me 8 minutes and 40 seconds (520 seconds). It took me, a PowerShell expert, at least 37 times longer than the AI to write the code.

Comparing human-written vs. AI-generated Python code

For my second test, I repeated the experiment using Python. Although I've worked with Python in the past as part of a robotics build, it had been nearly a year since I'd last written anything in Python at the time of my test.

In addition, my Python skills are extremely basic and I'd never written any GUI-based code in Python before, meaning that this experiment required me to do some research. There's no way I could have written this app off the top of my head, as I did with the PowerShell app. For the sake of maintaining the integrity of this test, I did not do any research ahead of time.

Human-written Python code

Here is the code that I came up with:

import tkinter as tk
root = tk.Tk()

def exit_program():
    root.destroy()

root.title("Poseys Python GUI App")
root.geometry("1024x768")

exitbutton = tk.Button(root, text="Exit", font="Arial,25", command=exit_program)
exitbutton.place(x=525, y=400)

label = tk.Label(root, text="Hello World", font="Arial,25")
label.place(x=512, y=384)

root.mainloop()

You can see my Python code running in Figure 8.

Desktop screengrab of the Python code and the resulting application.
Figure 8. This is the code and resulting application I wrote in Python.

As shown in Figure 9, it took me 19 minutes and 33 seconds to write, test and debug this app.

Image of ACCUSPLIT timer stopped at 19 min. 33 sec. and 40 hundredths of a second.
Figure 9. Creating a functional Python application took me nineteen minutes and thirty-three seconds.

AI-generated Python code

In comparison, here is the code the AI tool wrote:

import tkinter as tk

def on_exit():
    root.destroy()

# Create the main window
root = tk.Tk()
root.title("Hello World GUI")

# Set the size of the window (optional)
root.geometry("300x200")  # Width x Height

# Create a label widget with "Hello World"
label = tk.Label(root, text="Hello World", font=("Arial", 16))
label.pack(pady=50)  # Center the label vertically by adding padding

# Create an exit button
exit_button = tk.Button(root, text="Exit", command=on_exit)
exit_button.pack(pady=20)

# Run the application
root.mainloop()

The scripts are largely similar to one another, aside from some minor aspects like the window and font sizes. The AI tool also added comments to the Python code, as it did in the PowerShell program, whereas I did not. You can see the AI-generated Python code running in Figure 10.

Desktop screengrab of the AI-written Python code and the resulting application.
Figure 10. Here is what the AI-written Python code does when executed.
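
In this experiment, the resemblance between the two scripts was judged by eye. As one hedged way to put a number on that comparison, the sketch below uses Python's standard difflib module to score how similar two scripts are, line by line. The helper name and the shortened snippets are illustrative assumptions, and a similarity ratio is only a crude proxy for how alike two programs really are.

```python
import difflib


def script_similarity(first: str, second: str) -> float:
    """Return a similarity ratio between 0 and 1 over non-blank lines."""
    # Strip whitespace and drop blank lines so spacing does not affect the score.
    first_lines = [line.strip() for line in first.splitlines() if line.strip()]
    second_lines = [line.strip() for line in second.splitlines() if line.strip()]
    return difflib.SequenceMatcher(None, first_lines, second_lines).ratio()


# Shortened, illustrative snippets based on the two Python scripts above.
human_snippet = """
import tkinter as tk
root = tk.Tk()
root.geometry("1024x768")
root.mainloop()
"""

ai_snippet = """
import tkinter as tk
root = tk.Tk()
root.geometry("300x200")
root.mainloop()
"""

print(f"Similarity: {script_similarity(human_snippet, ai_snippet):.2f}")
```

A line-based ratio treats any changed line as a complete mismatch, so scripts that differ only in values such as window size still lose points. Comparing token by token would be more forgiving.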

Once again, the AI wrote the code in less than 14 seconds, as shown in Figure 11. It took me 19 minutes and 33 seconds (1,173 seconds) to write code that was very similar, meaning the AI tool completed the task more than 80 times faster than I did.

Qwen2.5-coder is prompted to write a Python script and outputs code for the application.
Figure 11. The AI tool wrote the code in less than 14 seconds, according to the total duration.
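
The speed comparisons in both tests reduce to simple arithmetic. The sketch below converts each stopwatch time to seconds and divides it by the AI's time. Because the AI finished in less than 14 seconds in both cases, dividing by 14 gives a lower bound on each ratio. The function and variable names are illustrative assumptions.

```python
# Lower-bound speed ratios, using 14 seconds as the AI's upper time limit.
AI_SECONDS_UPPER_BOUND = 14


def speed_ratio(human_minutes: int, human_seconds: int, ai_seconds: float) -> float:
    """Return how many times longer the human took than the AI."""
    human_total_seconds = human_minutes * 60 + human_seconds
    return human_total_seconds / ai_seconds


# Stopwatch times reported in the article.
powershell_ratio = speed_ratio(8, 40, AI_SECONDS_UPPER_BOUND)   # 520 s / 14 s
python_ratio = speed_ratio(19, 33, AI_SECONDS_UPPER_BOUND)      # 1173 s / 14 s

print(f"PowerShell: at least {powershell_ratio:.1f} times longer")
print(f"Python: at least {python_ratio:.1f} times longer")
```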

Wrapping up: Is AI coding the way to go?

In all honesty, I've only recently begun using qwen2.5-coder, so I don't yet know how well it works on larger, more complex code. But I have little reason to doubt its abilities; I already occasionally use ChatGPT and GitHub Copilot to assist me in creating complex code.

I've had mixed experiences with those tools. Sometimes, they did an excellent job, while other times, they produced code that was not even close to being correct. This often happens when AI tools hallucinate, meaning their output is false or misleading but presented as correct.

AI-generated code has also recently been criticized for possibly violating open source licensing. Because many LLMs train on internet data, their output can sometimes resemble protected code, raising ethical questions and potential legal ramifications.

So, while it's evident that AI coding tools are gaining traction as useful counterparts to human coders, it's best that teams use AI to supplement, rather than replace, software developers. Human oversight is essential to check output accuracy, identify possible flaws or risks, and ensure that AI-generated code is used effectively in larger systems.

Brien Posey is a 22-time Microsoft MVP and a commercial astronaut candidate. In his more than 30 years in IT, he has served as a lead network engineer for the U.S. Department of Defense and a network administrator for some of the largest insurance companies in America.
