Can Generative AI Replace Coding? A Data Scientist's Attempt at Web Development
Creating a Website from Scratch with No Prior Knowledge Using ChatGPT: An Experiment and Reflective Insights
Generative AI will replace coding: this claim has recently gained traction. Developers already use tools like GitHub Copilot or ChatGPT, and Devin, GPT Engineer, and others even promise to (almost) fully automate the whole process.
As a Data Scientist, my daily work involves substantial coding, and the question about the future of this skill inevitably comes up. Since I already have too much implicit knowledge about Data Science to run a fair test in my own field, I undertook an experiment in a related one, Web Development, to test a hypothesis: could I guide ChatGPT into building a simple website with minimal oversight?
So, hold tight - here comes the Data Scientist with his AI going for software development, specifically web development. Spoiler alert: I encountered significant inherent limitations that challenge the feasibility of fully automating this process.
The Experiment: Building a Journalist Assistant Web App
Objective
The goal was to construct a simple, yet fully hosted web app serving as a Journalist Assistant. I’ve seen a couple of those assistants, for example from my colleagues at Aftonbladet (and other media in Schibsted), but also others like OVB Media, JP/Politikens or NOZ/mh:n (KI Buddy). So, the question was whether I could rebuild a reduced version - just a text interface and some selectable prompts to GPT-4.
This application would feature a login system, a text input page utilizing various GPT-based tools, and another page integrating components from Airops. The frontend would be developed using React, the backend with Node.js, and Firebase for authentication, all deployed on GCP Cloud Run via Docker and Cloud Build. The disclaimer here is that this is a very simplistic app, not comparable to existing systems with much more complexity.
Background
My expertise as a data scientist has primarily been in Python, with most of my applications built using Streamlit. My knowledge of web development (JavaScript, HTML, CSS), based on a university course taken ten years ago, was rusty. Nonetheless, I had a basic understanding of security, DevOps, and cloud practices, which proved somewhat helpful.
How it Went: A Four-Stage Journey
Stage 1: Initial Setup and Quick Wins
Using ChatGPT, I quickly established my development environment with WebStorm on WSL and asked which components would work well together. Then I began to lay out the foundational code.
I quickly got the hang of how to do that best:
Generate skeleton files (“I am building ABC, how would you structure that, what files do I need”)
Create individual components (“I am building ABC, create a component that does ABC”)
Iteratively refine components. Create a new chat for every refinement. Add the files as context to GPT. (“I am building ABC. This is component A. I want to change this and that. I attached the following files as context”)
This stage was marked by rapid progress and immediate gratification as the basic structure of the application took shape. I addressed smaller issues myself, with 95% of the code generated by ChatGPT. I felt like a “code monkey on acid” - not thinking about what I did, but progress was great.
Stage 2: Technical Challenges and Learning
As the development progressed, I encountered several technical challenges:
Containerization Issues: My initial approach to squeeze both the front-end and back-end into a single container led to confusion, particularly with how the front-end communicates with the back-end inside a Docker container.
Build Optimization: The transition from a development server setup using npm start to a production-ready build configuration revealed gaps in my understanding of web development best practices.
Cross-Origin Resource Sharing (CORS): Understanding CORS and its implications for resource sharing between domains was a significant hurdle.
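To make the CORS hurdle concrete, here is a minimal, hand-rolled sketch of what a CORS middleware on the Node.js backend actually does. In practice you would use the cors npm package rather than this; the allowed origin below is a hypothetical example, not my real deployment URL.

```javascript
// Minimal Express-style CORS middleware sketch. It echoes the
// Access-Control-Allow-* headers only for a trusted origin, and answers
// preflight (OPTIONS) requests directly with no body.
const ALLOWED_ORIGIN = "https://my-frontend.example.com"; // hypothetical

function corsMiddleware(req, res, next) {
  // Only set the Allow-Origin header for origins we trust.
  if (req.headers.origin === ALLOWED_ORIGIN) {
    res.setHeader("Access-Control-Allow-Origin", ALLOWED_ORIGIN);
    res.setHeader("Access-Control-Allow-Methods", "GET,POST,OPTIONS");
    res.setHeader("Access-Control-Allow-Headers", "Content-Type,Authorization");
  }
  // Preflight requests are answered immediately, without hitting routes.
  if (req.method === "OPTIONS") {
    res.statusCode = 204;
    res.end();
    return;
  }
  next();
}
```

The browser enforces the rest: if the response headers don't name the page's origin, the frontend's fetch call fails, which is exactly the class of error I kept debugging.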
Overall, I spent a lot of time debugging these issues, which was pretty difficult since I was not even aware of what issues I was looking for.
Stage 3: Deployment Success
After overcoming the initial challenges, deploying the containers to GCP Cloud Run was straightforward and successful, aided by my prior experience.
Stage 4: Security Concerns and Reflections
My app was online, but suddenly doubts crept in:
Authentication Needs: I was a bit worried about my GCP bill and my own OpenAI keys. If I don’t want to face existential risk - better add a layer of authentication.
Backend Security: Now I had restricted the front-end with Firebase Authentication, but what about the backend?
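The backend-side answer is to verify the Firebase ID token on every request before doing any (OpenAI-key-burning) work. Below is a sketch of such a middleware; the verify function is injected so the snippet runs without the real SDK, but in the actual app it would be firebase-admin's admin.auth().verifyIdToken(token).

```javascript
// Sketch of backend token verification, Express-style. `verifyToken` is
// a stand-in for firebase-admin's admin.auth().verifyIdToken; injecting
// it keeps this example self-contained.
function makeAuthMiddleware(verifyToken) {
  return async function authMiddleware(req, res, next) {
    // The frontend sends the Firebase ID token as "Authorization: Bearer <token>".
    const header = req.headers.authorization || "";
    const [scheme, token] = header.split(" ");
    if (scheme !== "Bearer" || !token) {
      res.statusCode = 401;
      res.end("Missing bearer token");
      return;
    }
    try {
      // With firebase-admin this resolves to the decoded token claims.
      req.user = await verifyToken(token);
      next();
    } catch (err) {
      res.statusCode = 403;
      res.end("Invalid token");
    }
  };
}
```

Mounted in front of the GPT routes, this means an unauthenticated caller never reaches the code that spends money.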
After fixing those, I got even more paranoid - what if I overlooked something important? Keys leaking somewhere, possible attack vectors, injection opportunities, or whatever. I know there are SAST and DAST tools that I would use in the next step to inspect the code.
Furthermore, I did not think about:
Performance: I don’t know what the code is doing. Yes, on a conceptual level, but not in detail. I have no clue whether it is somewhat performant, or just happens to work on my machine.
Scalability & Availability: What if a lot of people used it? Would it be able to scale? Would it stay available?
Cost: What could potentially happen to my cloud bill? DDoS Attacks, Bots, unexpected behaviour in general?
Testing: Ehhm, yes, no.
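One cheap, concrete guard against the cost worry above is a per-IP rate limiter in front of the GPT endpoints. The numbers below are made up, and in-memory state resets with every Cloud Run instance, so a real deployment would also need budget alerts and quotas; this is only a sketch of the idea.

```javascript
// Naive in-memory per-IP rate limiter: allow at most MAX_REQUESTS
// requests per WINDOW_MS per IP. Limits are illustrative, not tuned.
const WINDOW_MS = 60_000;  // 1-minute window
const MAX_REQUESTS = 10;   // per IP per window
const hits = new Map();    // ip -> array of request timestamps

function rateLimit(ip, now = Date.now()) {
  // Keep only timestamps still inside the window.
  const recent = (hits.get(ip) || []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(ip, recent);
    return false; // caller should respond with 429 Too Many Requests
  }
  recent.push(now);
  hits.set(ip, recent);
  return true;
}
```

Even something this crude caps the damage a bot or a runaway loop can do to an OpenAI or GCP bill.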
So, after 20-25 hours I had a running app that was potentially in a danger zone.
Conclusion: Insights and Implications
Does this mean the end of traditional software development? Since I am not a software developer, I don’t feel confident enough to deliver a verdict. Overall, I would definitely see an issue if people like me started building important applications in unfamiliar territory. Internal tools behind firewalls - yes! Something complex in production - better not.
Limitations
Lack of Context: ChatGPT was not making any deliberate trade-offs; it just happily churned out code. But most of the time, you need to make those decisions between simplicity, performance, security, etc. To make informed calls, you need to understand the requirements, and ChatGPT will only get those right if you explicitly spell out every bit of business/technical context required to derive them. This might be impossible, because we humans often carry implicit knowledge that we are not even aware of. You might say - just write a sufficiently good prompt - but composing that prompt, with all the context included in a structured way, becomes an art of its own.
Incoherent Outputs: If you don’t ask for best practices, you often won’t get them from ChatGPT. I saw it with the CSS part of my code - sometimes it put styles directly into the React component, sometimes into a dedicated CSS file, sometimes into app.css. Apart from that, software developers seem to hold quite strong opinions about how to make code simple, modular, and re-usable - ChatGPT does not make those choices explicit.
Context Length: The more complex your project becomes (and mine was super simple), the more code you need to provide to the models. There is a limit to context length in LLMs, and even if that limit is becoming quite large, there is still the “needle in the haystack” problem - models struggle to retrieve details from large inputs.
Unknown unknowns: The AI often made assumptions or omitted critical aspects like CI/CD unless explicitly queried. You can’t point it to that if you don’t know about it. Partially this is an AI problem, partially a human one.
Recency of Knowledge: I sometimes ran into issues when ChatGPT did not include the latest docs. I probably could have asked it to browse the web, but that might only have confused it further. Also, it would probably be too much for it to scan the source code in its entirety or crawl through all the GitHub issues and pull requests.
Final Thoughts
Overall, I think that we are very far from low-skilled wannabe web devs (like myself) replacing real, solid software development just by using GenAI tools - especially when there are demanding requirements around complexity, security, performance, cost, availability, scalability, and extensibility. I think those limitations are not going to be fixed by an incremental update of LLMs. Formulating the right requirements from a complex context, making deliberate trade-offs, and ensuring best practices might not be automated anytime soon.
However, it still elevated my skills by quite a lot. For internal tools or prototypes shielded behind robust security measures, the use of ChatGPT can be extraordinarily empowering. This experiment expanded my technical skillset and highlighted the importance of a meticulous approach to security and cost management in software development.