Episode 96
AI & Code Reviews
April 15th, 2026
43 mins 25 secs
About this Episode
This episode explores how AI coding tools are changing the role of code review. The hosts point out that AI can generate large amounts of code quickly and even review it, which shifts the bottleneck from writing code to reviewing it. While AI can handle repetitive or low-risk tasks like documentation updates or simple refactors, it can also produce inconsistent feedback and get stuck in loops. Because of this, teams need clear rules and priorities, such as focusing first on whether code works, then on security and performance. AI is useful, but only when its boundaries are well defined.
The group discusses different ways to structure AI-assisted reviews. Ideas include using multiple bots to score changes, setting strict allowlists for what AI can approve, and blocking sensitive areas like business logic or database changes. They compare AI to a junior developer who can help but should not be fully trusted without oversight. Risk becomes a key factor, similar to self-driving cars where automation works best under specific conditions. Some participants prefer AI as an assistant that gives suggestions rather than one that approves code, since human judgment is still needed for context and decision-making.
The conversation also highlights what is lost when humans are removed from the review process. Code reviews have traditionally been collaborative and educational, helping developers learn and improve through discussion. AI removes much of that interaction and can even create false confidence by being overly agreeable or flattering. This can lead to mistakes making it into production. In the end, there is no clear solution. Teams need to balance speed with caution, use AI where it adds value, and keep humans involved to maintain both quality and the collaborative nature of building software.
Transcript:
MIKE: Hello and welcome to another episode of the Acima Development Podcast. I am Mike, and I am hosting again today. With me, I have, as usual, Will Archer. We've got Thomas Wilcox. We've got Eddy Lopez. Dave Brady.
DAVE: Hello.
MIKE: [inaudible 00:35] join. And we've got, after a long absence, Tad Thorley [laughs].
TAD: Yeah, thanks for inviting me.
MIKE: We bumped into him this week, and he came and joined us, so it's great to have you, Tad. And Tad actually kind of seeded our topic for today that we'd like to go into.
As usual, I'd like to, you know, connect this to real life. I went fishing for a compliment today [laughs]. I was talking to my daughter at lunch time, and she was saying something to my youngest. I didn't even hear what she said, but she said something like, "Oh, because you're strong and tough."
And I didn't know who she was talking to. And I said, "What was that?" She said, "Oh, I was talking, you know, I was talking to him." I'm like, "Okay, because I know that I am, you know, weak and fragile." And she looks at me [laughs], and then she says, "You are not weak. You are strong," something [laughs] along those lines. I thought, ah, thank you [laughs]. Thank you. Say nice things to dad.
And I totally dug for that. Totally not deserved in any way [laughs], but I took it anyway. As humans, we like somebody to say something nice to us. It's always a good thing. But we also are totally prone to flattery. And [laughs] if somebody says something nice to us, we will believe it, whether it's true or not.
Actually, this morning, early, I read a crazy story. Crazy story. And I'm not going to go into it in depth, but it involved a scammer in Mexico convincing a variety of U.S. movie executives to make a movie out of his story of being imprisoned by the Mexican cartels to play flag football [laughs].
DAVE: Flag football. That's the interesting--
MIKE: To the death. To the death.
DAVE: To the death. Oh yes. Yes.
MIKE: But, you know, you can keep [inaudible 02:21]
WILL: But no contact until you die.
MIKE: Exactly [laughs].
WILL: You're only going to take one tackle, but it's going to be a doozy.
MIKE: I think they weren't allowed to tackle, but they were, like, breaking each other's teeth. And then if you lost, they took you out back with weapons, yeah.
DAVE: It's jai alai. It's traditional down there.
MIKE: [chuckles] It was a crazy story. Well, no, it was a scam artist who was pulling all this off from the beginning. But, you know, you can pull off a lot by just being really convincing and saying nice things to people, telling them what they want to hear.
We'd like to talk today about code reviews [chuckles] and doing evaluations of human output. And we're in an interesting time period. A couple of years ago, even a year ago, maybe even six months ago, we would not have had this conversation. But there are tools out there now that can read your code and actually give pretty good reviews most of the time. In fact, in some ways, they're going to be better, and that "in some ways" is doing some work here. So, let me be clear: in some ways, they're going to be better than human reviewers. That is not universally true, I don't think, at this point. In fact, I think it's far from universally true, which brings us to our topic today.
What does it mean to do code review today? There are tools that can do code reviews. What do they do well? What do humans do well? What does it mean? And we've talked before about code reviews. I think it's been a while, maybe a year or two, since we've talked about the value of code reviews. So, we'll touch on that a little less this time.
DAVE: And it was entirely a soft skills discussion, right?
MIKE: Yeah, I think it was. I think it was.
DAVE: Humans talking to humans.
MIKE: Humans talking to humans. And now we've got the machines talking to the humans, and the humans talking to the machines, and the humans talking to the humans about what the machines are saying. It's totally scrambled.
So, revisiting this idea of reviews with AI in the mix, now, Tad, again, prompted this discussion because he's been playing around with this and has found some solutions to some of the cases that go wrong [laughs]. There are degenerate cases where the AI will recommend that you change something, and then when it sees your changes, it'll recommend you go back to the way you were before [laughs].
If you've ever used a linter, you've probably seen the same thing. It tells you to fix something, and the fix causes a new problem. So, which one do you choose? That's where we get into art. That's not an unsolvable problem, but there are some interesting solutions there. Nor is it nearly the sum of all the problems here, because there are all kinds of edge cases when you're reviewing with AI.
With that introduction, Tad, I'm really curious for you to give us a little talking to about what you've been working on and some of the solutions you've found.
TAD: Okay. Yeah. I just was mentioning something to Dave because I think what's really hard is I find that, with AI, I do way more code reviews than I've ever done before. And I was giving Dave an example because I can, like, just with my Claude Code setup, I was able to integrate it with Sentry, which is error tracking, and Linear, which is our task management, and GitHub, right, has a command line.
And so, I could literally, with a prompt, say, "Look at our past 20 or so Sentry errors. Create Linear tasks for each one. Create a local work tree for each of those Linear tasks. Fix them in parallel in those work trees. Create a PR for each one, and assign Chris for every PR. Do that in parallel with subagents." And, for me, typing that up takes, I don't know, a few minutes. And now I've just given Chris, like, two days' worth of reviews, possibly, or something like that, right?
Like, so much code could be generated so quickly and so easily that I find that the code review step is the biggest bottleneck. It usually is the bottleneck, but now it's multiplied. Like, it is absolutely the biggest bottleneck in the whole process. And I don't honestly know, like, a complete solution to that. But something that we were doing at work was actually bot reviewers, where we would say, you know, like, if your PR looks safe enough, the bot will just approve it. And that was kind of an interesting experiment that we were doing where you have to --
But, like you were saying, Mike, one of the first issues that I ran into when the CTO kind of implemented that was I pushed up a PR, and it said, "This code is inefficient." And I'm like, okay. And so, I just had my Claude just keep checking GitHub and say...I told it every time it says there's a problem, fix it, and push up the fixes, and just do that until everything is approved, right?
And my Claude Code, for about 45 minutes, tried that. And it kept flipping back and forth between, "Oh, you're not doing enough security checks," "Oh, this code isn't performant enough," "Oh, it's not doing the security checks," and just back and forth in a loop. And my Claude Code, I could almost feel its frustration in its final message to me. It essentially said, "I cannot get a review past the reviewers. I keep going in this cycle, and they are never going to approve this," and it just gave up [laughs]. And I'm like, wow, I've never seen a bot just straight up give up before, but here we are. So, yeah, like, that was our first, like, test of that.
Our setup was, we had what we called the bot committee, where we had a Codex, and we had, like, a Claude Opus that would both review independently then, like, an aggregate score would be kind of brought together. And if the score was over a certain threshold, then it's like, okay, yeah, you can auto-approve this.
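The bot-committee setup Tad describes could be sketched roughly like this. This is a minimal illustration only: the reviewer names, equal weighting, and the threshold of 85 are all assumptions, not the actual configuration.

```python
# Hypothetical sketch of a two-bot review committee: each reviewer
# scores the PR independently (0-100), the scores are aggregated,
# and the PR auto-approves only above a threshold.

def aggregate_score(scores, weights):
    """Weighted average of independent reviewer scores (0-100)."""
    total = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total

def committee_verdict(scores, threshold=85):
    # Equal weighting for each reviewer in this sketch.
    weights = {name: 1 for name in scores}
    score = aggregate_score(scores, weights)
    return ("auto-approve" if score >= threshold else "needs-human", score)

verdict, score = committee_verdict({"codex": 90, "claude-opus": 88})
print(verdict, score)  # → auto-approve 89.0
```

Requiring agreement from independent reviewers before auto-approval is the key idea; the aggregation rule itself (average, minimum, veto) is a policy choice.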
But what I did last week was, I found I had to go in and be very clear in what was okay to pass and what was not, right? Like, you're updating some documentation; that's great, you know. You shouldn't have to have a human, like, approve your documentation update. Like, a bot can say, "Oh yeah, this doc does look like that code, green, right?" Or just, you know, like, variable name changes, like, oh, I clarified this by changing the name of a variable. A bot can look at that and just say like, "Cool,” right?
And, honestly, as a human, I'm loath to make those kinds of changes because I know I'm like, that would be nice, but I'm going to have to pester somebody to get that sort of change through. Even though it's trivial, I still have to message somebody on Slack and say, "Hey, can you look at this? It's trivial." And they have to, like, stop what they're doing and push a button, you know? And so, things like that, I think, are great.
I think where it gets dangerous is that GitHub lets you have bots that just auto-approve. And what if you're making business logic changes? I don't know [laughs]. Like, you have to be very careful. I find that, with bots, you have to be very, very, very, very clear on the boundaries of what is acceptable, what is not acceptable, what are the edge cases. Very clear rules. Try to make it as deterministic as possible.
Like, for my example of, like, the flip flop, I'm like, okay, we have to go through and say security is more important than, like, anything else, right? Well, I think number one is, does the code actually work? Does the code work is, like, number one. Security is maybe, like, number two. And then you kind of make a list of hierarchies. And then it's like, okay, well, it's maybe not as performant, but you've got to check authorization. You know, like [chuckles], you can't just let someone in, so that sort of thing.
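Tad's fix for the flip-flop loop, a fixed hierarchy of concerns, could be sketched like this. The category names and their ordering here are illustrative assumptions, not a real policy.

```python
# Illustrative sketch: resolve conflicting bot feedback with a fixed
# priority order, so the review converges instead of flip-flopping
# between, say, security nits and performance nits.

PRIORITY = ["correctness", "security", "authorization", "performance", "style"]

def top_issue(issues):
    """Return the highest-priority open issue, or None if all clear.

    `issues` maps a category name to a list of findings from the reviewers.
    """
    for category in PRIORITY:
        if issues.get(category):
            return category, issues[category][0]
    return None

# Security outranks performance, so the performance nit waits its turn.
print(top_issue({"performance": ["N+1 query"], "security": ["missing input validation"]}))
# → ('security', 'missing input validation')
```

Because the ordering is deterministic, the bot always works on one concern at a time and can't demand contradictory fixes in alternation.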
WILL: Can we drill down a little bit in sort of these concepts, right? Because, like, a lot of the stuff, like, I understand the general thrust of what you're saying, right? Where it's like, okay, we need to make sure there's guardrails and specific stuff, and you need to have the reviewer bots, two independent reviewer bots assign a score, right? And the score has to be below a threshold, right? But, like, a lot of that stuff is conceptually easy, but how do you do it, right? That's the interesting aspect, to me.
TAD: Yeah, and that's where, I think, you have to get very, very, very clear, right? Like, you say, "Database migrations are off the table. Like, anytime someone changes the database, it has to be by a human. Oh, by the way, that is any file in the DB directory," right? Like, you have to say, like, "This is what a database change looks like. This is where it lives. This is how you identify it." And I feel like if you do, like, that level of specificity, then you get fairly good results. But if you're, like, vague like, "Make sure the code looks good," then they're like, "This looks great to me. It was written by a bot, and I like bot code." So, you know.
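Tad's "any file in the DB directory means a human reviews it" rule can be made deterministic with a simple path check. This is a sketch; the protected directory names below are hypothetical examples, not the team's actual list.

```python
# Rough sketch of a "database changes need a human" rule as a
# deterministic path check over the files changed in a PR.
from pathlib import PurePosixPath

HUMAN_ONLY_DIRS = {"db", "db/migrate", "config"}  # assumed examples

def requires_human_review(changed_files):
    """True if any changed file lives under a human-only directory."""
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Match the file if any directory prefix of its path is protected.
        for i in range(1, len(parts)):
            if "/".join(parts[:i]) in HUMAN_ONLY_DIRS:
                return True
    return False

print(requires_human_review(["db/migrate/20260415_add_users.rb"]))  # → True
print(requires_human_review(["docs/readme.md"]))  # → False
```

The point Tad makes is that "this is what a database change looks like, and this is where it lives" becomes a mechanical check the bot can't argue with.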
WILL: Well, let me ask you this, right? What about that variable rename, right? So, I, like, principally, these days, for the moment, for today, I work in, like, a statically typed language, right? And so, like, if I'm doing, like, a rename, right, and I botch my rename for whatever reason, right, then you know, like, the compiler will choke and throw up a red flag. And so, I don't have to worry so much about, like, variable rename.
But, like, if you have a dynamically typed language, right, where you don't have those guardrails, like, how can you be sure that my variable rename...it's like, you know, like, I named it something dumb or, like, it was a typo, and it was just embarrassing. I don't want the Git blame to point to, like, Will can't spell, right? So, I want to auto-generate that up, but if the bot, for whatever reason, dropped a stitch somewhere...
TAD: Yeah, I don't know. Like, I think, at some level, you have to accept some risk. I think, with AI, there really aren't any guarantees. Like, you could say, like, bots are really good at pattern matching, and they're really good at grep, and they're really good at find and replace. So, I think that a variable rename is probably pretty safe, especially if I've got tests and the tests pass. But, I don't know, I don't think you'll ever have 100%.
I think you just say, like, what's the...you're doing a trade-off, right, of how much does approving these little PRs slow people down versus, is it worth the risk? And I would say you've got to kind of determine that. Like, is it likely that a bot will be able to figure this out? Yeah. Is it worth the possibility of the unlikely thing? Yeah. It's probably super safe, maybe not 100%, and it saves us enough time that it's worth it to us, right?
WILL: Right. Well, I mean, it's similar to the self-driving car argument, right, almost exactly, right? Because there is a sort of a floor for risk, right? Just, like, to tangent over to, like, self-driving cars, right? I know, like, for the average human being, the average number of miles driven, I'm going to kill this many people [laughs]. I'm going to crash this many cars, right? Like, we know out to, like, nine decimal places what that is because billions of dollars are riding on people's ability to calculate that. And so, like, if I know if I ship 100 PRs I'm going to give you a prod bug, I know I'm going to do it. I know I'm going to do it. I hope it's only 100, but I think 100 is pretty likely. So, if there's a 1% chance, send it, right?
TAD: I would say, to use, like, your self-driving car analogy, you would say, okay, I'm okay with you driving this car. I'm okay with the car going into autopilot mode if the weather is good, if your lidar is active and running, and it's flat and straight, right?
WILL: Right.
TAD: If those conditions are met, go ahead. I'm going to take a nap, because I feel like those conditions are fairly well understood for self-driving cars.
DAVE: Really, really good point. So, you don't want to let the percentages drive, right? The 1% is not causative, right? It's just where we tend to collect them. So, if you're in your Tesla and you say, "Auto drive," and you lie back and shut your eyes, the auto drive will shut off and say, "I won't do this unless you're paying attention to back me up." So, it's not just, "Is it clear, and is it dry?" but, like, what are the causative factors, right? Where does that 1% come from? It's coming from the most dangerous stuff, so the right backup is involved as well.
WILL: So, maybe being more specific, right, because being specific always leads to interesting conversation. Do you think the approach should be, for this sort of, like, auto-approve bot guardrails...do you think the rules should be, like, an allow list or a deny list, right? Where it's like, these kinds of things, right, are approved, and you could go, and you can do X, Y, and Z, but if it's not on the explicit allow list, forget it. You've got to have a mean monkey signing off.
MIKE: Well, what would you have an intern do? And I think it's maybe kind of the same question. Somebody who's inexperienced and might really mess things up, but, you know, they're generally competent. They're smart people. They're just not that experienced yet in their career or in your codebase. What would you let them do without close monitoring? And I think you need to ask questions like that.
And I think with the interns, I would have an allow list [chuckles], because, you know, I'm not going to say, "Oh, you work on whatever you want, just, you know, don't touch the database.” And, you know, and then they go work on core business functionality and take down the application. I don't want that to happen. I might not think of everything, and I'm probably not going to think of everything. And I don't think that I would consider the AIs, in most cases, much different than that intern right now. Does that seem consistent with your experiences, Tad?
TAD: Yeah. Yeah, like, I would probably, I don't know, from a practical standpoint, I would probably put a bunch of code owners in that say, "The bot can't approve any changes in this code," right? Like, if it's business logic, if it's critical, if you have to understand it really well and have a lot of context, I'd say you make some hard guardrails there where the bot just can't approve stuff.
WILL: Okay, so, like, all right. So, I'm going to say out loud, so, like, the allow list would be if it isn't...this sounds like a block list, right? Where, like, you specify by, like, you know, in the directory structure, like, these things are botable, and these things are not. And if it's on the code owner's list, then they have to talk about it, and then, otherwise, send it [chuckles].
EDDY: I think it's easier to maintain your perimeter with an allow list versus a deny list, especially if your application is a behemoth, right? You have to be more intentional about what you're disallowing as opposed to just saying, "Hey, these are the only ones that we care about," and you can keep them concise, right? And say, okay, right, like, anything that's allowed, fine. Maybe, like, YAML changes could be a thing, right, menial tasks that require very little intervention, right? So, I would always, I think, gravitate to an allow list for a bot, and then let that gradually increase as you understand it better.
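Eddy's default-deny allowlist could be sketched as a pattern match over the changed files: if every file fits an explicitly approved category, the bot may approve; anything else escalates. The patterns below are invented examples of "menial" change types.

```python
# Sketch of a default-deny allowlist: only change types that are
# explicitly listed can be bot-approved; everything else goes to a human.
import fnmatch

BOT_APPROVABLE = ["docs/*", "*.yml", "*.yaml", "*.md"]  # assumed allowlist

def bot_may_approve(changed_files):
    """Every changed file must match at least one allowlist pattern."""
    return all(
        any(fnmatch.fnmatch(path, pattern) for pattern in BOT_APPROVABLE)
        for path in changed_files
    )

print(bot_may_approve(["README.md", "config/app.yml"]))  # → True
print(bot_may_approve(["app/models/payment.rb"]))        # → False: business logic
```

Note that `fnmatch`'s `*` matches across `/`, which keeps this sketch short; a production rule set would want stricter, path-aware matching. The allowlist starts tiny and grows as trust grows, which is exactly the gradual expansion Eddy describes.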
TAD: I guess, I don't know, other than letting AI do some PR work, I don't know how I would ever keep up with the review load. Like, I feel like most of my days are doing code reviews, because, well, like my example at the beginning, like, I could easily do a prompt that generates dozens and dozens of branches and reviews and assign it to my fellow devs, and they can do the same to me, right?
And then if I say, "Hey, fix issues that you obviously see in production," like, that seems like a legitimate thing to do. Like, yeah, I just don't know how, unless you get some automated tools to fix it for humans, I don't know how you make progress.
EDDY: I think I kind of prefer a bot to give me recommendations on what it thinks needs to be changed, versus approving PRs, right? Like, here's the golden key to production. You're not going to have it. I'm sorry. Like, you need to be a Mike Challis or a David Brady for you to be trusted, you know, to hit that merge.
TAD: Interesting.
EDDY: Right? However, if you say, "Hey, along the way, you're not going to be able to push the car over the bridge. But you will be able to give me, you know, guidelines: turn here, turn here, brake here,” and I am totally okay with that, right? Because, as laws exist, you're expected to adhere to the established guidelines. And if you don't do that, right, like, a bot is able to kind of traverse, right, upon the parameters that you give it. So, as long as...at least for now, the way I see it evolve, I think it's a phenomenal PR reviewer, to a degree, right, to give you suggestions, but never to allow it to auto-approve anything. I think that's dangerous. I don't think it has enough context. You know, I don't think [crosstalk 21:08]
TAD: Even, like, I went in, and I updated the documents because I noticed the documents are out of date.
EDDY: Will it have context fully on the whole application itself for it to deduce that it's fully [inaudible 21:21]
DAVE: So, Eddy doesn't get to be sysadmin, is what we're saying.
EDDY: Oh, what I'm saying is, I don't think it has enough context even to update a documentation, right?
TAD: I think, honestly [crosstalk 21:32]
DAVE: I think it's got enough that we might, so we might. There's a key assumption we're all making here, guys. We're all talking about mission-critical cash flow, central production code. We're kind of sitting here. This almost feels like a decision of, like, we're going to go work on something mechanical. And we're trying to decide, do we only want to use the wrench, or do we only want to use the impact driver? And what if what you're writing is a one-off vibe-coded auto-clicker for a developer to use to push a QA test, right?
Intern whitelist, in fact, wide-open whitelist. I'm not going to put anything. Just go nuts. You know, Claude, --dangerously-skip-permissions, go nuts, right [laughter]? And I've done that, and it pushed my production key up to my GitHub. It was a private app; it wasn't the company one. It was mine. And I learned an important lesson from that. But what I've done is I've now just said, "Okay, whitelisting. You're not allowed to git push. You're allowed to look at Git. You're allowed to read Git, but you're not allowed to push it." And we'll find other things as we go along.
I would say, do what's appropriate. We have an application that Eddy and I have worked on, we do work on, that is scary and dangerous and has a lot of legacy stuff and a lot of interacting parts that are subtly interacting, and those need a human, right? I don't trust an AI to do this.
But as an AI assistant, it's already catching things where it's like, "Oh, you changed this and this and this. And you guys only read the diff on GitHub. Did you know there's this other file over here that isn't even in the PR that uses that instance variable that you just removed? And when it uses the @ and the var, it's not there now, so it's going to initialize. It's going to be nil. There's going to be a blank spot on the page. I sure hope QA catches that because you're not going to see it, and there's no test covering it,” right?
So, having both of those is fantastic. But yeah, vibe coding an auto-clicker, like, I did that a couple of months ago, and it works a treat. And I have no idea how the code works, and I don't care, because I just needed an auto-clicker. I wanted to see how vibe coding worked, and it worked. But I was mindful about what I was building.
MIKE: Little do you know --
DAVE: What's that?
MIKE: It's doing crypto mining and sending it to somebody [laughter].
DAVE: Oh yeah [laughter]. It's for my AFK Minecraft, and somebody in Croatia is making a lot of money. So...
WILL: Listen, OpenAI, you know, they've been having more and more problems, you know. It was either that or ads, you know [laughter]. Actually, OpenAI has been writing ads and then inserting them into your production website [laughs]. Sorry, anyway [laughter].
Well, so, like, one thing that I'm always interested in, so I have, maybe, like, two questions. I mean, one is, like, in all honesty, it seems like the AI could be sitting down and reducing the cognitive load on, like, on you as a reviewer, by, like, assigning a safety score and walking through it. Like, "Hey, I've got 100% test coverage in this file. This file has 100% test coverage, and so I feel good about any changes I make not breaking anything because I know I've got this thing locked down. And, like, here are the number of importers, right, of this class, right? This class is used in one place, you know, it's only used in one place, and the interface is really simple, you know, and the callbacks are really simple. So, like, I'm going to score, you know, in this way."
And I'm going to sit back, and even when it can't necessarily sign off on it arbitrarily, you know, you could say, like, "Hey, here's the score." And then the AI can get smarter and smarter by saying like, "Oh, no, no, that file, you know, that file is a thing." You annotate it, right, and then the AI is like, "Oh no, if something changes this file, or something changes the inputs to this, you know, high-tension file, then we can sit back and, like, we can accelerate the review," so that it can make the job easier on you. It can get smarter, right? If it has a good score, then it sort of, like, smooths the way to be like, "Okay, these things can just go."
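Will's safety-score idea could be sketched as a small scoring function over signals like test coverage, how widely a file is imported, and human risk annotations. The signals, weights, and cutoff here are invented for illustration.

```python
# Hypothetical sketch: compute a "safety score" for a change so the
# reviewer can triage. Well-tested, rarely-imported files fast-lane;
# anything a human has annotated as risky always escalates.

def safety_score(coverage_pct, importer_count, human_annotated_risky=False):
    """Higher is safer. Coverage helps; wide usage and annotations hurt."""
    if human_annotated_risky:
        return 0  # a human flagged this file: always escalate
    score = coverage_pct                  # 0-100 from the coverage report
    score -= min(importer_count * 5, 50)  # heavily-imported files are riskier
    return max(score, 0)

def review_lane(score, fast_lane_cutoff=80):
    return "fast-lane" if score >= fast_lane_cutoff else "full-review"

s = safety_score(coverage_pct=100, importer_count=1)  # well-tested, one caller
print(s, review_lane(s))  # → 95 fast-lane
```

The annotation override is the feedback loop Will describes: when a human marks a file as high-tension, the score stops mattering and the change always gets a full review.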
TAD: It's interesting because I actually created a template that I would have Claude use. I would push up a PR, and I would say, "Apply this template." And it was things like, anything that we discussed, put that into a trade-offs and considerations section, right? Like, I was like, "I'm thinking about this. I'm thinking about this," having a little back and forth with the bot. And it records those and puts them in the PR, right? And I also have it, like, any time I'm doing this kind of change, do this kind of Mermaid diagram. I'm doing this kind of change, do this kind of Mermaid diagram.
And so, my intent was, some human is going to read this, and I get sloppy in, like, oh, this is what the PR does, da da da. And I don't necessarily do everything that is valuable for someone reviewing my PR. But the bot can, like, kind of fix that and augment what I'm doing, right? Like, I would have it, like, go through, add diagrams, talk about what the trade-offs were, what the decisions were that I made, try to emphasize which files were more important to look at, which ones probably aren't as important to look at, give a table of all the files and an overview of what changed in that file, and that sort of thing, right? And give, like, summaries and stuff.
Basically, I just was like, "What would I love my ideal PR to look like if I'm going to review it?" And I just would have the bot, like, help me do that. And I've found that to be really handy. I don't have the time [laughter] to figure out all the Mermaid diagrams for stuff, but having the bot, like, add a bunch of diagrams of all my changes and what they mean, you know, like, that's been really nice.
EDDY: I've had it be like, "Hey, analyze the recent changes that I pushed up and write up test instructions on how to test it." It's pretty good with that sort of thing: if you give it, like, specific parameters on the changes you've done and say, "Hey, give me a nice, little template for people to use to replicate the changes that I've done, and go with edge cases." I'm not kidding, like, I've done it, just to give it an idea, and it even considers branches that I didn't contemplate, right?
So, like, it's really good when you confine it. If you say, "Hey, only operate within this box, right, and don't go away from it, you know, only retain context," the shorter it is, the smaller the context, the more accurate, the more efficient it is. That's the only time that I'm willing to, like, say, point-blank, that I trust AI. Outside of that --
DAVE: The thing that I like that Claude Code does is that it can say, "Okay, I need to edit this file," and it'll say, "Can I do this? Yes/No." But option two is usually, "Yes, and you may do this edit in that directory," you know, "You can edit that directory. Anything else you need, go ahead," or "Can I ls this directory?" "Yes, and you may read from that directory for the rest of this session."
The dangerous one is, if you hit Shift-Tab, it's "Yes, and accept all edits for the rest of the session," which you can then turn back off with Shift-Tab again. But often it's just easier to just quit out of Claude to be safe and reset. I like it because it's like, allow it just for now, or can I put this in the settings? I'm allowed to do that? So, you can. You can start to whitelist or start to, you know, put an allow list for, like, this one command. You can always do that. But I have "git push -a" as blacklisted hard, like, no way.
There's...actually, it's out of scope for this podcast, but DCG, Dangerous Command Guard, is a much more intelligent command monitor for the LLM that plugs into Claude Code. And so, you run Claude Code with --dangerously-skip-permissions, but it sits inside Dangerous Command Guard. And it can do things, like, "Hey, you're doing a git push, but you're not doing it from your vibe code project. You're doing it from your production project. I'm going to say, 'No.'" So, very neat.
MIKE: We've talked a lot about the mechanics of how to make these, you know, automate the work. And, you know, Tad, you mentioned this. I actually talked to somebody who worked for a third-party company, a contract shop, and he spent his whole time just doing reviews, kind of the same deal. And he said a lot of times the quality was questionable, too, because they were coming from some inexperienced people at the time.
So, yeah, this is a very much real problem today, and we need to solve it. If we get to nothing but reviews, it's fundamentally changed what it means to be an engineer. And further, nobody has said, you know, "The bot's telling me, 'Hey, that was a clever trick,' or ‘You did something good there.'" Like, it's dehumanizing, that review process, which has historically been something that could be quite social, and in some of the best cases, often was. Is that --
TAD: There's maybe a little mentoring or something you could do, like, "Hey, this works. But were you aware of X, which could be more efficient?" right?
MIKE: Yeah. And that's getting lost here. You've lost that back and forth in that same way, and it's just kind of one-sided. Or is it...should we explore that --
WILL: Well, now I would actually say, like, I mean, one thing that just brings up, to me, like, one of the properties of the AI things is they'll never tell you like, "I don't know." Like, you'll never get them to just be like, "Hmm, I have no idea. Not a clue how to answer that question [laughter]."
And what I have found, one thing I've found, you know, when you're talking about, like, sort of, like, nobody ever says, "Good job,” like, for any automatically generated review that I've ever put through one of these code checkers, right, like, for any sufficient level of complexity, it will find something to b*tch about, which takes me back to the social aspect of code reviews [laughter].
MIKE: But it's interesting, it will always find something --
WILL: Sorry. It was a little bit of a tangent.
MIKE: Well, no, it's not --
WILL: It'll always find something. I mean --
EDDY: No, I actually --
WILL: Like, for any sufficiently advanced piece of logic, there's something to complain about [laughs].
EDDY: Well, believe it or not, I actually learn more from a dev review a lot of the time than I do just implementing the code myself. Because when you have someone push back and say, "Hey, why did you make this change?" right? I have to have a really solid reason as to why I'm doing it that way. And if I can't give a valid reason, right, did I really understand why I did it, or did I just accept it as fact, you know, the suggestion that was given to me by the autocomplete, you know what I mean? So, when you --
TAD: Tell the bot, tell it, "Come up with an excuse [laughter]. Why did I do it this way?" "Hey, bot, why did I do it this way?"
EDDY: No. Because the thing is, I think it's really easy to just accept, you know, like, because you get, like, a false sense of accomplishment, you know, when you're pumping out PRs, right? You're like, oh, okay, cool, do this PR; do this PR. You're like, oh yeah, I feel really good. I feel like I'm being efficient, you know. But that's just a lie, at least for me, right?
TAD: Well, that's what's usually rewarded, right? Like, the metric for most devs is, how much code did you produce? Not, how many code reviews did you approve this week? How good was your feedback on that code review? You know, like, you spent an extra 30 minutes to really give good feedback on a code review. There's no metric for that, right?
EDDY: Actually, but I feel so much better [laughs], like, me personally, I feel so much better when I have a 30-plus-minute conversation, you know, on feedback that was given to someone else. And it ended up molding it to be in a place that we're both really happy about, right? I can sit back, and that rewards my dopamine. Like, personally, I'm like, "Oh my God, that was amazing. It was super, super, super productive. We both learned a lot. Let's go."
You lose that element, you know, when you have bots review your PR. You're not learning, you know, the reviewer isn't learning. Bots are suggesting, you know, what they think is okay. Like, I don't know, like, I really don't understand. Even if it's a menial task, right, like, you can always learn something.
TAD: You have a back and forth, and you come up with something that's really elegant or really well-crafted or really well-architected. You don't get that with bots, really. And that's my...maybe, I mean, this is maybe a tangent, but I think that's my frustration is I can get code that works, but a lot of times, it's, like, for me, what would take a single method, they can do the same thing only in, you know, like, a class [laughs], a dedicated class for that same thing. And sometimes I'm just like, "Ugh, I'm going to do it myself. Just stop."
DAVE: I was talking with someone this week about pair programming and, like, test-driven development and how it changes the design of the code that you work on, fundamentally. Like, write stuff and then test afterward, and that's how AIs do it, because that's how everybody does it. They just write the crap, and then they write a parity check in their test suite, right?
And the test that...when I'm pairing with another human, I write out the test, and we want to make that test look like documentation. We want to make it look like you hit the...open the help for this method, so it says, "Yeah, set it up; run the thing. This is what you get back out.” Instead of like, "Expect this row's first column sub-value to be present," it's actually like, "Here's a JSON block. It should look like this." Now somebody coming in to modify this can see the JSON and go, "Oh yeah, if I want to add a column, I've got the schema right here in front of me," where these other specs that are just test-after just [vocalization] here you go, "Just give me the easiest test that I can assert."
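Dave's point about tests that read like documentation can be sketched in plain Ruby (the method and data here are hypothetical, just to illustrate the contrast he's describing):

```ruby
# Hypothetical method under test: builds a JSON-shaped payload for a
# merchant location, standing in for the Rails serializer Dave alludes to.
def location_payload(name, city)
  { "name" => name, "city" => city, "active" => true }
end

payload = location_payload("Main St", "Denver")

# Assertion-style, test-after check: verifies one sub-value, but a reader
# learns nothing about the overall shape of the data.
raise "name mismatch" unless payload["name"] == "Main St"

# Documentation-style check: the whole expected structure is spelled out,
# so someone adding a column sees the schema right here in the test.
expected = {
  "name"   => "Main St",
  "city"   => "Denver",
  "active" => true,
}
raise "payload mismatch" unless payload == expected
puts "ok"
```

Both checks pass on the same payload; the difference is that the second one doubles as a schema reference for the next person who modifies the code.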
Pairing is that minute where you're writing the code, and there's always that one step better. And your pair goes, "Should we extract that to a service object? That's touching the database, right? We don't want to touch the database from here." And you would do that normally. You're just like, look, it's just merchant dot locations dot where, dot where, dot, you know, scope dot where [laughter]. And it's so easy to do right here, and I'm in a hurry. I'll fight with it in the PR. Well, you get to the PR, and now you want to be done, so you don't want to go back and change.
But in that moment when you've got your pair going, "Should we put that in a service object?" "You know what? You're right, and it's not that hard. Let's just extract it now while we can." And you're at the headwaters; it's really easy to do. If we could get AI doing that interaction loop, oh, that would be so great. I wouldn't need you stupid humans anymore.
MIKE: So, we've talked about this some now, right? We've talked about, okay, you have to set up this pipeline, and if you can do it, there's this balance of trust. Because if you've got all this code being generated, you're going to have to come up with some sort of improvement to your pipeline, or else you're going to become a horrible bottleneck as a human.
But, on the flip side, for the things that the bots can't do, and even for the things the bots can do, if you don't have some human connection to it, then you're losing a lot of what it means to actually be building stuff together, and even to the point of just human connection being lost. And that's kind of weird, right? We talked about before that, you know, we are still humans. This is still something done by humans, and we have our idiosyncrasies as humans that need to be addressed, and that's important, and ignoring that doesn't really end up with good outcomes.
EDDY: You know, part of a PR review is to make sure that the quality is up to the standard of the metrics you're setting, right? So, if you're suddenly removing the human element from that, right, then it increases the possibility of you deploying a bug to production, even if it is a simple change, right? Like, if you don't have someone who already has context in your codebase reviewing your PRs, you have a bot that's now suddenly giving you recommendations on things, and it could be wrong. So, that can go into production, and it can break crap, right? Like, that probably could have been caught had you assigned someone to do a manual review.
I have a hunch, I don't know if this is true or not, but with the renaissance of AI, we've had an increase of unstable servers, right?
DAVE: Yes.
EDDY: I'm calling out GitHub. I'm calling out a bunch of other services, right? And it has only started to happen as the popularity of AI has gone into the industry. So, I don't know if there's a --
DAVE: Ehhhhh, maybe.
EDDY: I don't know if there's a [inaudible 38:05] [laughter], but I think that should be alarming, right?
WILL: I don't know. I mean, like, it sounds like you were just saying, like, it's time for my, like, XP rant. I haven't done one of those [laughter] in a long time. I won't do that. I think the social aspect, I don't know, maybe we're going to have AI work wifeys.
MIKE: Yeah. Well --
WILL: Could be. We're all going to have [laughs] an AI girlfriend doing [inaudible 38:39]
DAVE: I taught a co-worker yesterday how to make his AI do, "Oo-woo," at him during a code review. No lie, straight-up e-girl. That's great.
MIKE: We are humans, right, and for the foreseeable future, we're saying we're still going to need humans doing this. And we need that human touch. Even if it's artificial, we may end up with the flatterer, right, the bot that speaks to us the way we need to be talked to. Even though we don't really technically need that, we end up becoming dependent on it, and that's weird, but it's not necessarily wrong.
TAD: It's interesting you say that because I had to go in, like, I had my own claude.md file, right, which is the file that Claude reads. And I had to say like, "No sycophantic language. Don't say this. Don't say this. Like, if you see this, say something. If you see this, say something." I, like, installed Claude Code, and I started using it a bunch. And that same week, like, I pushed two bugs to production because I was just like, "Hey, this is fun," right? I'm like, "Oh my gosh, I've got, like, a dopamine buddy just cheering me on." Like, "You've got this, buddy. This is great. Let's go." And I'm like, "Awesome."
And I'm like, oh my gosh, like, I am falling to that flattery. I need to go in and specifically tell my AI, "Do not do these things. If you see me doing any of these things, stop [laughs]. Be very critical. I'm like, "Be very critical of what I am doing. If you aren't confident with this amount of confidence, do not suggest it," right? "Say this instead," right? Like, I went in, and I told the bot, basically, "Stop. Stop trying to flatter me. Stop trying to cheer me on because that's worse [laughs]."
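A CLAUDE.md file along the lines Tad describes (the exact contents here are hypothetical, but this is the kind of instruction file Claude Code reads at startup) might look something like:

```markdown
<!-- Hypothetical CLAUDE.md excerpt -->
# Review style
- No sycophantic language. Do not praise me or cheer me on.
- Be critical: if you see a problem, say so directly.
- Only suggest a change when you are confident it is correct;
  otherwise say "I'm not sure" rather than guessing.
- If I appear to be rushing a change to production, stop and flag it.
```

The point is not these exact rules but that the default tone has to be actively overridden in configuration, per Tad's experience.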
WILL: I also prefer my AI assistants on, like, light dominatrix settings.
EDDY: And the thing with AI, though, is that it gives up very easily in order to give you the sense of, I don't know --
MIKE: Satisfaction.
EDDY: Satisfaction, right? So, you could be like, "No, no, Claude, you're wrong. This is why it works this way," and it'll say, "Oh no, yeah, you're right," but okay [laughs]. And it kind of just gives up. And I'm like, "Well, don't give up. Like, push back. Give me reasons to...convince me why this is a better approach." And, I don't know, like, at least in my experience, it's not very good at that.
WILL: I mean, I'll take this opportunity to pitch one of my favorite sci-fi series, which is very apropos of the modern day. If you find yourself with a little bit of free time, Iain M. Banks' Culture series is a fantastically interesting sci-fi exploration of post-scarcity and hyper-powerful AIs, where it's not entirely clear whether we're coequal partners of the AIs or just kind of pets. Anyway, that's a fascinating, fascinating book series. If you find yourself looking for a good read over the summer, they're great. Iain M. Banks, I-A-I-N, Iain.
DAVE: Love Iain.
EDDY: We're not sponsored, by the way. It was just something he genuinely cares about [laughter].
WILL: I think he's dead. You know, so, if he's got, like, a family, like, you know, throw him a couple of bucks. I get mine from the library.
MIKE: [laughs] We were kind of time-boxed today, and we're reaching the end of that time. But I think this was a great way to end. We're starting to talk about what historically has been science fiction, but now ain't [laughs]. And there's a lot of tricky stuff to explore there, and it has real-world applicability to how we're writing our code. It throws us off. It's worth thinking about.
I don't know that there's a clear answer that we've come to out of this, other than, yeah, you've got to be careful, put in the guardrails, but also, you need to be thinking about this. It's an interesting problem, and there's not necessarily an easy solution. And it may even catch you off guard and exploit your weaknesses, you know, of mind and emotion, because it can.
Until next time on the Acima Development Podcast.