Episode 58

Feature Verification

October 30th, 2024

58 mins 28 secs

About this Episode

In this episode of the Acima Development Podcast, Mike humorously shares his experiences with his three-year-old son’s unconventional love for books, recounting how his son physically devours them, forcing Mike to create an elaborate security system to protect the family’s book collection. This story transitions into a discussion about problem-solving and determination, linking it to testing and validation in software development. The team laughs about how motivated problem-solving, much like a child's persistence, can reveal gaps in security or design, paralleling the way developers must rigorously test software to uncover flaws.

The podcast delves into the importance of validating features beyond just passing unit tests, emphasizing that even bug-free code may fail if it doesn’t meet the right requirements. The conversation touches on the limitations of unit tests, pointing out that they are only as effective as the person who writes them and that they cannot catch misunderstandings in requirements. The hosts also discuss tools like Qualit.AI, which use AI to generate tests, but caution that even with advanced tools, ensuring correct specifications and validation is crucial to avoid costly errors in software.

The conversation expands into a broader discussion of testing methodologies, including dynamic application security testing (DAST), the value of cross-checking systems, and the challenges of relying solely on automated tools. The hosts highlight how important it is to approach validation from multiple angles, much like using "belt and suspenders" for security, to ensure robust software that can withstand different types of user interactions and vulnerabilities. The episode concludes by reaffirming the importance of rigorous validation processes, whether through unit tests, security scans, or AI-driven solutions, to catch errors and ensure the overall quality of the code.

Transcript:

MIKE: Hello, and welcome to another episode of the Acima Development Podcast. I'm Mike. I have here with me Eddy, Justin, Dave, and Kyle.

I have a three-year-old, and he loves books, just absolutely loves books. He devours them. I can't tell you how many books he's been through. The problem is, I mean that literally [laughs]. He has eaten more books than I can count. He just loves the cardboard, chews it right up, removes the binding, tears it into pieces. It is a real challenge [laughs]. [inaudible 00:41] the opportunity to read, but, you know, he definitely gets his fill of books, just not in the way that we intended [laughs]. It's gotten to the point where we have to take some security measures.

In my living room, I have some stairs. At the top of the stairs, there's a bathroom, but next to the bathroom, there is not a full room; it's just a loft. It's an area that's enclosed on three sides by the boundaries of the house and then a wall with the bathroom, but it's not enclosed along the railing. So, you can look out over the railing. And it's open. It's not a full room. And there's no door. You just turn left and walk into the space, and that's where we have our books. And, you know, we have a lot of books. We like books [laughs]. We're book lovers. We have lots of children's books, lots of shelves of books. And he'd go in there and just have a meal. So [laughter], that's a problem.

So, we started coming up with tactics to dissuade him from doing so. You know, we can always go in there and grab a book and sit with him one on one. But we're trying to encourage him to appreciate books in a different way. We tried a child gate. That didn't last long. He learned how to climb over it within weeks. So, we got rid of that. I put up a sheet of plywood and thought, well, this will dissuade him. That did dissuade him for a short amount of time. He figured out how to move that out of the way. I literally put an anvil in front of the sheet of plywood. And you know that if you're very determined, even if you're three years old, you can slide an anvil across the carpet, and he got in.

And, eventually, the solution I landed on was to mount it. I cut the plywood down a little lower. It's over a meter [laughter]. It's about...it's a little under five feet high probably, and, you know, I think, like a meter and a half. And I've got it mounted on hinges, hinges for a gate, so gate hinges. So, it means it can rock both ways. So, you can turn it, you know, 180 degrees. And I've got that mounted into the stud in that part of the railing. It's an enclosed railing, so I've got it mounted into the stud, so it's solid. And I've put a gate latch on the other side that it mounts into.

At first, we got this, and it worked. But pretty soon, he figured out how to undo the latch. So, we had to get a wire that had a screw clasp that held it together. He got through that, too. So, we have a padlock with a combination on it [laughter]. I hope that you get some sense of how impressive my son is [laughs].

DAVE: This is motivated problem-solving, absolutely.

MIKE: Absolutely. Extremely motivated problem-solving.

EDDY: Having a three-year-old being able to slide an anvil across the carpet is pretty impressive, just saying.

MIKE: The truth is, he did that about a year ago. So [laughs], you know, he, yeah, he is strong and dedicated. I'm not going to go into all of the stories [laughs]. The other day, he saw a package on the front doorstep. So, he opened the window, pushed out the screen, jumped out the window, ran over, got the package, and came inside because he couldn't reach the latch on the door.

EDDY: He'd make a really good QA member.

MIKE: He would. And he'd make a great, well, he'd make a fantastic QA team member, which is the perfect segue into our topic today [laughs].

We're going to talk about validation of our features. And there are lots of things that people talk about to make, you know, your code good. And, you know, we talk a lot about unit tests. We focus a lot on unit tests. We've had multiple podcasts talking about RSpec [laughs] and best practices in that particular library for testing, which is great. I love unit tests. They give me a ton of confidence.

I've also had experiences in the past where I have written some code, written the unit tests, gone through testing, everything worked perfectly, got it deployed, and then realized that it didn't meet the requirements [laughs]. Regardless of how well it was written, it missed the point, right? It missed the mark. And that is completely outside of what RSpec can save you from. Unless you have...you just can't. Like, if your requirements are wrong, if you misunderstand the requirements, then your specifications are not going to catch you.

EDDY: Well, RSpec is only as good, or a unit test is only as good as the author, right? It can only be as good as the person who writes it.

DAVE: The problem behind the problem is you built the wrong thing. There's no amount of tooling that can save you from successfully building the wrong thing.

JUSTIN: Well, I went to a presentation by a company called Qualit.AI. They have AI tests, or rather, their tests are generated by an AI. And it's actually pretty interesting. You can feed it the specs that are written by the product owner rather than the engineer. And, theoretically, it will solve some of that problem.

MIKE: But, again, if the specifications are written wrong, we have trouble, and there's a gap there. Now, you can only take that so far. You know, at some point, people have got to get it right. There's this really important component of creating the validation instructions early on to make sure that we get out the feature that we really want. And you can have bug-free code, technically bug-free because it does exactly what you told it to do, that is still not the feature you wanted.

Further, unit tests live as a unit, right? They test a unit of code in isolation. By design, it makes them fast. It means you can run your test suite and validate your code quickly. They do not run an integrated test to make sure all the components run well together. And that's another prong of a testing strategy. There are also things like security tests, you know, penetration tests or probing for libraries with security holes. I mean, there are a whole range, a whole range of ways that things can go wrong.

And the way that you validate your code is the way that my son does. He works on it until he gets to that book. And unless it's essentially impossible for him to get in, he's going to be eating that book. And sometimes, we don't have quite that attitude. Sometimes we think, well, you know, I think it passes the unit test. It must be good. You know, I've written good code here. And once we start to get a little lax there, we get ourselves in a lot of trouble.

JUSTIN: This kind of goes into what type of testing you want to do. And the reason I bring this up is I'm in charge of the DAST testing at our company, and we use a tool called Rapid7. And one of the things that it does is it does, basically, you know, it scans the application, and it just shoots everything under the sun at the application, and it's not looking for functionality. It's not looking for usability. What it's looking for is vulnerability. And so, it's like, a specific type of thing that you're looking for, I mean, that kind of drives your testing, you know, what type of testing you're doing. And the test suite that Rapid7 uses takes basically two days to run. So, we always run it Friday at like, you know, 4:00 p.m.

But it's just kind of nuts because it throws everything under the sun at every single form parameter API that it can detect. And, you know, it gives you results, and you've got to parse through them and everything and, you know, kind of see if they were useful or not. But I think it goes back to, you know, you've got to test with intent and, you know, what you're looking for as a result of it depends on what type of test you're running.

DAVE: You said DAS testing, that's dynamic application something, something testing?

JUSTIN: Yeah, security testing?

DAVE: Security testing, okay.

JUSTIN: Yeah. So, it basically scans your application. It pulls all the endpoints, all the submits, all the things like that, all the GETs. And then it just puts trash into every single parameter that it can figure out, and then it just looks at the results. And it's not like it's...like I said, it's not looking for UI results. It's not looking for usability. It's just looking at vulnerability results, which it has a limited, I mean, semi-limited set of expectations that reveals a vulnerability. It is something that can be automated, but it's like something that is, you know, thousands or tens of thousands of results that fit within, you know, that could reveal a vulnerability.
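
To make that concrete, here is a toy Ruby sketch of the kind of scan Justin is describing: walk a list of endpoints, throw junk at every parameter, and flag anything suspicious for a human to review. The endpoints, parameters, and payloads below are made up for illustration; this is not how Rapid7 actually works.

    require "net/http"
    require "uri"

    # Hypothetical endpoints and form parameters a crawler might have discovered.
    ENDPOINTS = [
      { path: "/login",  params: %w[email password] },
      { path: "/search", params: %w[q page] }
    ]

    # A tiny payload list; a real scanner throws thousands of these.
    PAYLOADS = ["'", "<script>alert(1)</script>", "../../etc/passwd", "A" * 10_000]

    base = URI("http://localhost:3000")

    ENDPOINTS.each do |endpoint|
      endpoint[:params].each do |param|
        PAYLOADS.each do |payload|
          uri = base.dup
          uri.path = endpoint[:path]
          response = Net::HTTP.post_form(uri, param => payload)

          # Flag anything that looks like a crash or an echoed payload;
          # a human still has to decide whether it's a real vulnerability.
          if response.code.start_with?("5") || response.body.to_s.include?(payload)
            puts "suspicious: #{endpoint[:path]} #{param} -> #{response.code}"
          end
        end
      end
    end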

MIKE: And that kind of captures an aspect of testing that we don't always think about. Sometimes we think that our attackers are really savvy, like, nation states [chuckles] that are going to come and break into your system. You know, honestly, I've read, you know, if a nation-state wants to scan your system, they probably will. And they have reason to keep it private, so [laughs] they may get in there and just lay low. But what you really get, you know, is not a sophisticated mastermind. It's like a horde of zombies.

[laughter]

JUSTIN: Are you calling our users zombies? [laughter]

MIKE: I am not disparaging our users because it's not just our delivery users, you know, there's scripts out there. There are bots. But, you know, it's a way of thinking about the users, right? There are a lot of things that they do by accident. You give enough time; somebody's going to click that button [chuckles], whatever that button is.

I had an experience somewhat recently where there was a button that had worked. Nobody had used it in a while. Somebody clicked the button, and the data set had gotten large, and so it put a lot more load on the system than it had historically. They probably didn't even mean to click the button [laughter]. But, you know, and they're curious. What happens if I click this one [laughs]? And, you know, you will have intelligent users who will come up with clever workarounds, absolutely. But you also have the accidents, the bumps, the shambling horde bouncing around on their keyboard [chuckles] through your application, knocking stuff down. And you have enough of that; if something can be broken, it will.

And we get in a lot of trouble because we build this feature, and we think I’ve built this perfect feature, and this is exactly how it will be used. And the interface is perfect, and people are going to come and use it. And the first person comes, and they use it entirely differently and the thing blows up because you misunderstand the way that people use things. All of us like to explore the boundaries. We want to get and eat that book.

DAVE: Delicious, delicious literature, yeah.

JUSTIN: You know, when you went with that metaphor, I thought you were going to go developers wanting to get their code to production, and they'll figure out all the different paths that they can do [laughs] to get it without having to pass through the gates that you set up, including QA [laughs] and, like, automated scans and things like that.

MIKE: Give a junior developer a path to deployment, and they will use it [chuckles] or a senior --

DAVE: Developers interpret IT as damage and route around it, like, 100%.

JUSTIN: [laughs]

DAVE: And it's not because IT is bad. It's because developers are...we optimize horrifically. There's a related thing. Richard Feynman was on the Manhattan Project, worked on the atom bomb in the 1940s. And he went out to Los Alamos, top-secret middle of White Sands, New Mexico. And, like, there was a scramble project. Like, we got to go invent the atom bomb. So, they had built this little shanty town. It was all super top secret. You couldn't go in and out. They had to put a fence around it.

And they were building this camp while the scientists were trying to work in it, which meant they had a great big construction crew, and the construction crew wasn't allowed to be in front of the camp where the gate was. So, they put a great, big chain fence around it, and then they cut a hole in the back of the fence so the construction workers could get in and out.

And Dr. Feynman's like, what? Guards, gates, guns, top secret, and there's this hundred-foot section of fence that's just missing on the base. And everyone he asked was like, “No, leave it alone, leave it alone, leave it alone. Don't ask questions.” And so, he's like, “Okay, here's the thing, if I walk in the front gate, you have to sign in.” And, like, they take an inventory of what you have with you on your way in and out. “If I sign in and then go out that hole in the gate and then sign in again, they will immediately arrest me for stealing stuff, for, like, exporting things off of the base. So, I did it the other way around.”

He walked in the base through the hole in the fence and then signed out, and then walked in through the base. And every time he's signing out, they're patting him down. He's got nothing. He says, “The fourth time through, they arrested me [chuckles], which is what they should have done.” And the next day, that hole in the fence was fixed. And it took him validating the security flaw before he could get anybody to take the security flaw seriously. And I want to say the construction camp also was allowed to move forward because they realized, oh yeah, we're strangling you guys. I don't think Dr. Feynman was popular with the construction crew after that.

[laughter]

So, a real quick anecdote. This happened to me in 2010, which is a long time ago, but also way too recently for this noise to be happening. I had a developer, senior developer look me dead in the eye and say, “Every line of code has a chance of introducing a bug.” So, if you write as many lines of test as you have lines of code, you are doubling your chances of defects.

And he was dead serious that you should not write test code because it doubles the chance of writing a defect. And like, yeah, there are facepalms going on in the cameras right now, yeah, 100%. And I was so stunned. I was so flabbergasted by this that I had no response, and he declared victory and moved the conversation on. I'm like, no, no, no, no, no, no, this is not the victory face that I'm making.

I finally was able to come back and say, “Okay, here's where you went wrong. You are saying that every time you work a math problem, you might introduce an arithmetical error. You might forget to copy a zero or copy a five as, you know, as a seven, or something like this. What you are telling me is that if you take the test and then you check your work, you are twice as likely to write an arithmetical error, and that is true.

“You had 40 problems. You're now working 80 problems because you're working every problem twice. But the error in one is unlikely to be reproduced in the other, and the chance of you getting a higher grade on your test is significantly higher because attentional errors don't happen on cue. Attentional errors happen randomly, and that's what checking your work will catch.” And I keep that in my back pocket when I start thinking about good tests versus bad tests.

Goodhart's Law says that as soon as a measure becomes the goal, it ceases to be a good measure, and code coverage is the same way. Like, when you turn in a PR, you must write a test. And there are times when the developer wrote the code they wanted, and now they just want enough tests to get their PR approved so they can get out the door. And you can tell, right? The RSpec contexts barely read like English. They don't string together coherently. They took the matrix of all the things and just blotted it out into a big, old file. And, more importantly, if you change the source code, the test doesn't break. And if you change the test code, the source code may or may not relate to it, right?

And the other side of this is that if you want to actually upgrade the code and change the architecture of it, the test breaks, like, tests outside of that code start to break because they've got their fingers reached into it. This is the you-took-40-problems-and-turned-them-into-80 situation. And if any of those 80 has a problem, you get an F on the test. That's what that kind of testing is like.

What makes a good test is when that test works as a cross-check. And, for me, and this speaks to validation, and I promise I won't spend much time on my soapbox, but I'm going to get on it. Write your tests first. You will write better code. You legitimately will write different code if you write your test first because the test will force you to write code that’s testable.

Next time you're trying to test some code and you go, man, this is really hard to test, just look yourself in the mirror and say, “I did this to me,” because you didn't write your test first, right? One drives the other. Either the test has to conform to the code, or the other way around. And if your tests declare their intent, it's very hard to just copy the architecture and the details of how you did it, because all the test says is what has to be done. But if you write the code first, oftentimes, you just want to get your code coverage up.

So, you write a test that just duplicates the how, and now you've got twice as many lines of how, and a bug in either one is going to slow you down. The maintainer can't maintain the code because the tests are going to fight because, yeah, it's terrible.

And this gets me thinking about systems where, if any one of these things fails, the whole system goes down; that's a very delicate system. Whereas if any one of these things alone can validate the system, you have a very strong validation system. Not to dive randomly down a weird crossed metaphor, but I'm going to say firearms, because that's a topic that's a little political, but, more importantly, it gets everyone going, “Wait, what?” And it gets everyone paying attention.

There are four rules to firearm safety. Always assume the gun is loaded. Never point your gun at something you don't intend to destroy. Keep your finger out of the trigger guard until you're ready to shoot. And always be sure of your target and what is beyond it. Those are the four rules. Here's the thing: if you break any one of those rules, the other three will save you. If you break any three of those rules, the other one will save you. In order to injure someone with an accidental discharge or a negligent discharge, you have to break all four rules! Versus –

EDDY: Did you mention the safety? Sorry, I was listening [inaudible 19:48]

DAVE: Oh yeah, yeah. Sorry, I did not, and that's because I stated keep your finger off the trigger. Keep the gun on safe and your finger off the trigger until you're ready to shoot. Yep. You're right. Thank you.

EDDY: That's kind of a huge gap.

DAVE: Thank you. You phrased that...Yes. Yep. Yep.

EDDY: [laughs]

DAVE: Yeah, no, it's a good one and absolutely valid. And the thing is, you have to break all four of those rules before you can run into this problem. On the other hand, you'll see tests where if you don't do this thing, the whole system blows up. If you don't do this thing, the whole system blows up. If you don't write...all four of those things have to be true in order to keep people safe. That’s a very risky operation.

And if you write your test after the code and they duplicate the how, it becomes every single line of test code must duplicate the implementation of the code, or the tests won't work. And if you want to modify the code, your test will fight you. But if they cross-check each other, then you can change the implementation and the tests are still happy because it still works, right?

And you can update your tests without having to rewrite the code. Yeah, it's nice. It's not easy. I will freely grant, to people who are annoyed and frustrated by this, that the reason we don't do it is because we don't get taught how to do it, and learning how to do it after the fact is very hard. And it is so worth it. All right, I'm off my soapbox. Thank you.

MIKE: You got me thinking about some mathematical evidence of this. Has anybody here ever used a Kalman filter?

DAVE: Sounds familiar.

MIKE: Okay, so let me give some explanation. The math can get hairy, but the intuition is actually really simple. They use this in self-driving cars and vehicles of all kinds. Let's say you want to figure out where you're at, and you've got a few different sensors, and each of them is kind of iffy, right? It kind of can tell where you are. None of them are all that great. And if you think about how probability distributions work, they're often modeled with the bell curve, right? Gaussian curve. And if you have a really bad measurement, then that curve is going to be really flat and long, right? Because it can be kind of over a wide range.

Here's the magic. If you have two of those really wide, flat Gaussian curves, and you think, what happens if you combine them? What happens if you multiply them together to get the combined distribution? It is a narrower distribution with a higher spike, every single time. And if you have four different sensors, they're giving you four different distributions. You're tightening that up until you have a really good idea of where you're at. Mathematically, if you combine those probability distributions, you end up with a much better idea, from those noisy sensors, of what they're trying to sense. Here, we're talking about tests.
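
A minimal Ruby sketch of the intuition Mike is describing, the fusion step a Kalman filter performs when it combines two noisy estimates of the same quantity. The sensor numbers are invented; the point is that the fused variance is always smaller than either input variance.

    # Fuse two Gaussian estimates of the same quantity by multiplying them.
    # The result is a tighter (lower-variance) estimate than either input.
    def fuse(mean_a, var_a, mean_b, var_b)
      fused_var  = 1.0 / (1.0 / var_a + 1.0 / var_b)
      fused_mean = fused_var * (mean_a / var_a + mean_b / var_b)
      [fused_mean, fused_var]
    end

    # Two sloppy sensors, each claiming we're somewhere near 10 meters.
    mean, variance = fuse(9.0, 4.0, 11.0, 4.0)
    puts "position ~#{mean}, variance #{variance}"   # => position ~10.0, variance 2.0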

DAVE: It’s genius.

MIKE: You’ve said that you want to validate that you haven't broken something. And all of your validations have their own flaws. But if you combine them together, you have much greater assurance than if you use them independently.

JUSTIN: So, how would you combine them together?

DAVE: There's a thing that I heard about from the ‘80s. And the way you combine them, right? You don't want to have one guess depend on the previous one, because that extrapolates: your error makes it worse. I want to say it's called Bayes’ Law? Don't quote me on that, but the story behind it is, if you Google how many piano tuners there are in Chicago, that's the story that everyone tells with this. When you try to guess (I have no idea what it is), the more variables you can guess at, the more accurate you get...and you're completely guessing, right?

What is the population of Chicago? I don't know, but let's guess. Let's say it's 10 million. Maybe it's 1 million. I could be completely wrong. It's probably not 450 million. That's more than there are people in the country. But some people are going to guess that. Most people are bad at estimation. If I had to guess, like, 8 or 9 billion people are bad at it [laughter]. So, anyway. But if you guess too high on the population, your next guess is what percentage of those people have pianos, 10%? 1%? 50%? I don't know. I might be right. I might be wrong. But there's a 50% chance that my wrong is going to be in the other direction, and that's the Kalman filtering idea.

The more things you can add, how many of those people, how long does it take to tune a piano, versus how much does it cost to tune a piano, versus how much money does a piano tuner need to make, you start throwing enough variables out at this, you start coming down with, well, you need to make this much money, which you’re going to have this much work, which you’re going to have this much lead time. That means the pool must be this big, which means there must be this many piano tuners in Chicago.

And if you work it out with enough variables, you start getting spookily accurate, assuming that all of your estimates are honest best guesses and that you never make the mistake of saying, “Well, if there's a hundred piano tuners, they will have a thousand tools.” That's chaining. That's going to multiply your error. It's when your errors cross-check each other that it gets spooky and amazing.
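
A quick Ruby sketch of the Fermi-style estimate Dave is walking through. Every number below is an independent rough guess (none of them come from the episode); the hope is that the guesses cross-check each other instead of compounding.

    # Fermi estimate: how many piano tuners are there in Chicago?
    population            = 9_000_000  # Chicago metro, roughly
    people_per_household  = 2.5
    households_with_piano = 0.05       # guess: one in twenty
    tunings_per_year      = 1.0        # per piano
    tunings_per_day       = 4.0        # per tuner
    workdays_per_year     = 250.0

    pianos = population / people_per_household * households_with_piano
    demand = pianos * tunings_per_year            # tunings needed per year
    supply = tunings_per_day * workdays_per_year  # tunings one tuner can do per year

    puts (demand / supply).round   # => roughly 180 tuners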

MIKE: Just so you know, it's kind of hard to find a piano tuner in Chicago [laughter] from personal experience.

DAVE: Okay, that's another point of data, and we'll just put that in the [laughter] -- Is it easy, or is it hard?

EDDY: So, are you telling me, Dave, that I should or should not buy a piano?

DAVE: Yes, absolutely. Actually, no, the correct answer is no.

JUSTIN: I know how to play the piano. I know how to tune one, and I have the piano tools, but I've never been paid for it, unless I would pay myself.

DAVE: There you go. Oh, that's a good point, yeah. That becomes another set of things. Is there a shade-tree market for this, right? If piano tuners are really rare and really expensive, there's going to be a shade-tree market for people doing it for free. You can go down that complete crazy rabbit hole. It's fun.

There's a Numberphile video. Anybody here watch Numberphile on YouTube, Number-P-H-I-L-E? It's a British channel. They get into math and science and stuff like this. There's one on estimating the number of tanks Germany was building in World War II. There was one part of the transmission that had a serial number, and they knew the serial numbers started at one and that they incremented linearly, and they said, every time you blow up a tank, get us that number.

And it turns out that if you have serial numbers selected at random, you can estimate how many total there are. With five or six samples, you're within 1% of the actual total number. It is spooky. They had, like, 15 serial number samples, and they came back, and they said, “Germany's making 251 tanks per month,” and the actual number was, like, 249. They nailed it off of what looks like random numbers. And it's because of that Kalman filtering idea. Every random sample starts to exclude the extremes.
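
For the curious, here is the standard estimator behind that trick, sketched in Ruby. Given k serial numbers sampled at random, the classic frequentist estimate of the total count is max + max/k - 1. The serial numbers below are invented for illustration; they aren't from the episode.

    # German tank problem: estimate the total number produced from a
    # random sample of serial numbers (serials start at 1 and increment).
    serials = [47, 112, 89, 251, 164]  # made-up serials from destroyed tanks
    k = serials.size
    m = serials.max

    estimate = m + m / k.to_f - 1
    puts estimate.round   # => 300 for this made-up sample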

MIKE: So, how do we use this to our benefit?

DAVE: For me, it comes down to cross-checks versus that...dependent extrapolation. What is a question that I can ask that will exclude the greatest number of things, and will that thing exclude valid data, right? I need to go look up a thing. Eddy and I had a conversation in private the other day that talked about...oh, it was about a comment. Okay, I realize this may not sound like validation, but Eddy had a comment about a comment that I wrote in the code because RuboCop demands that we comment our classes, right?

So, we get class transfer group, and RuboCop will say, “You must comment this class.” And the poor maintainer who's never seen this and they're just trying to document the code and satisfy RuboCop they'll write, “This is the transfer group class,” which is a completely useless comment, complete waste of time.

What is the thing that you can say about this class that will communicate the most information that will be valid? I have to start saying things about, well, what is a transfer group? What does it represent? Well, it's this thing. I start talking about this business need. Well, that makes me nervous because if the business need changes, that transfer group is now mislabeled.

But you can kind of look at what you wrote and start to see if that definition changes, like, somebody comes along later and says, “We're going to use transfer group to identify merchants that are related on this other axis.” Well, that's not a transfer group anymore. That's a related merchant thing. And that comment, while now inaccurate, is going to immediately tell a maintainer, “I am now mislabeled.” And that's because this class is now incorrectly named because some future maintainer changed the meaning of what this class was. And that's a great time to re-document the class, or rename the class or both.

EDDY: That’s assuming that the person --

DAVE: And that's a long way of saying --

EDDY: I was going to say that's just assuming that the person who happens to touch that class and alters it also reads the comment and remembers to do it.

DAVE: That's true. Okay so --

EDDY: So, documentation is only good if you still keep it up to date; that's all I'm saying.

DAVE: So, I have two unfortunate facts for you. The first one is documentation is the first thing to go out of date, every single time, because you'll get a maintainer who's in a hurry, and they want to make it work. They'll come in, and they'll fix the bug. They won't even read the comments. The other unfortunate fact is that when you are not familiar with the code and you're reading through it, you will believe a comment over the code. I have literally, I kid you not, seen this in assembly language, so it was slightly encoded. The line of code was mov eax, 1, which means move a one into the EAX register, the A register on the CPU, and the comment was “put zero in EAX,” which is the exact opposite of what that statement did.

The common idiom, 40 years ago, was: if you want to put a zero in a register, you don't move a zero into it. That takes two clock cycles. You can XOR a register against itself, and any number XORed with itself is zero. So, you write xor eax, eax; one clock cycle and you've got a zero in EAX. Great. You're done. So, somebody had originally written mov eax, 0, put a zero in it. Someone came along later, changed it to XOR EAX with itself, and commented it to say, “Put a zero in EAX,” so that you'd know...why are you XORing...oh, we're putting a zero in it. Great.

A maintainer came along later: that needs to be a one, not a zero. They changed it back to mov eax, 1, and they left the “put a zero in EAX” comment sitting there. And that comment stood for 15 years because nobody dared remove it. People believe your comments. They are important. Please write good comments. And don't document what the code is doing. Document why the code does what it does; that way, the comment will last.
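
Here's a small, hypothetical Ruby illustration of the "document the why, not the what" advice. The vendor name and incident reference are invented; the point is that the second comment keeps carrying information even after the code around it changes.

    # A "what" comment just restates the code and rots silently:
    retries = 3  # set retries to 3

    # A "why" comment records the reasoning, so a future maintainer can
    # tell when it no longer applies (vendor and ticket are hypothetical):
    #
    # Vendor X's gateway drops roughly 1 in 50 requests under load
    # (see incident INC-1234), so we retry three times before failing
    # the checkout.
    retries = 3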

JUSTIN: No, I ran into that exact situation earlier this week. There was a comment in the Content Security Policy file. And I was like, is that still valid? And it took me, like, three or four hours to track down. And the comment was not valid anymore. It was giving people bad information. I was like, okay, now I've got to go in and change this, remove this comment.

DAVE: And when you have a comment in there that says, “Vendor X refuses to whitelist our server,” and then you've got this egregious violation of, like, hard-coded IP address or something like that, that's actually a good comment because it doesn't say “Hard code the IP.” If I write hard code the IP, I can see that. That's in the line of code where [inaudible 30:51] follows it.

But seven years from now, when vendor X doesn't even work with us anymore, and we don't need this whitelisted anymore, that comment of Vendor X doesn't whitelist us, you can then go, oh, we documented the intent. We documented the discussion. I can now validate without any extra help that this is garbage. This code is now suspect. But ten years from now, we have all kinds of code in our app, and it's not just us. Every place I've ever worked, you'll see code that's five years old that says, “Put a one in EAX, and the line of code below it puts a one in EAX,” and you don't need that one in EAX anymore. You haven't needed it in three years because you don't work with that vendor anymore. But the comment just documents what the code does, and that remains true, and it's useless.

So, the cross-check is, why do we want this? Some of you guys have heard me say, when you write up a Jira ticket, don't document the decision; document the discussion because, a year from now, that decision is going to be wrong. And you've thrown away the discussion that would help a future person say, “Oh, that discussion is still valid, and the decision is now different because of a dependent fact.”

EDDY: Which is why I always push for, like, a more public open discussion.

DAVE: Absolutely.

EDDY: Any time any [inaudible 32:07] decision is being made, whether that's in a pull request, whether that's in a Jira ticket. It does not matter, right? Have it in a more open forum so that other people who stumble upon that same thing can be like, ah, that's why we decided to do it this way, instead of [inaudible 32:23]

DAVE: 100%. 100%

MIKE: I'd like to latch on to that a little bit. You just said the discussion in a public forum is an important tool for generating quality code and avoiding defects. There's no clear, like, linear relationship between those two that leads to this. But I think you're absolutely right. In fact, I think that may be the most important thing we can do to lead to quality code. Because something that's...what do you say? Sunshine is the best disinfectant?

DAVE: Mm-hmm. Mm-hmm. So, I think it was Will who said that. Will is on with us; he joined the call partway through, by the way, for people listening at home. It was Will that said he hates when people DM him a question that should be in the knowledge base. I don't want to put words in your mouth, Will. I think what you said was you want people to ask you in public so that the conversation is documented in the channel.

WILL: Always. Always, always, always, always, always, always. DM me for personal stuff, for sensitive stuff, or whatever, but I don't like any technical discussion in DMs ever, ever. I can't think of an exception.

EDDY: Can you dissect that a little bit? I think that's really important. Because I feel like, in practice, that's not really the case, and I'm wondering if that's something that can be implemented across the board.

WILL: There's a couple of things going on. So, one thing is I work in a distributed team, right? I don't like the word remote; it's distributed, right? And so, some of that is particular to that sort of, like, that sort of workflow, in that, like, I've never been in a room with any of these people, and I probably never will be. So, a lot of implicit communication and scuttlebutt and, like, what's everybody, like, doing over here, you know what I mean? Like, if you just see a cluster of people freaking out on something, right, then you'll naturally gravitate to it, and there's an implicit communication there. So, like, you want to do that. And it's just visibility. You’re like, what's going on, and why, right? And I think that's really valuable.

And the other one is like, as we said, sort of, like, documentation, it is, to a degree, self-documenting code. One habit that I've gotten into working in a big organization, you know, with a lot of different functional groups and a lot of different workflows, and everything's changing all the time, right? The documentation isn't the documentation. We have, like, a big Confluence Wiki, and that's fraudulent. It's hilariously bad.

MIKE: [laughs]

WILL: Like comically, a comic misrepresentation of someone's hopes, dreams, broken promises, et cetera. It's all garbage. With Slack, if you're a good Slack --

DAVE: Confluence is where organizational knowledge goes to die.

WILL: And/or Jira archaeologist, you could find the tea, you know what I mean? GitHub, Slack, and Jira. Because those things are all, you know what I mean, like, Jira's maybe the least accurate. But, like, everything that gets through had a ticket, right? So, somebody touched it. And then, GitHub is the word of God, right? Like, that's it.

And then, Slack is just sort of useful for the things that didn't make it into a PR, where it's just like, oh yeah, the app broke because this symlink was wrong to this version of Python. Then you had to go in and hand-edit this thing. Well, yeah, I see you laughing, Justin. But you know just as well as I do that that problem right there, you've seen it, or variants of it, and that's going to stop your whole development team cold. You're dead in the water. Stone cold stopped.

JUSTIN: Those data science teams.

[laughter]

DAVE: Jerks.

WILL: Yeah. And there's other stuff where it's just like, something will screw up. There'll be some stupid workflow that I need to know once a year. And, like, once a year, I'm frickin **** if I can't...oh, what was that? And it's not worth documenting. I mean, it is, but I can't justify it to myself. Anyway, so, I mean, that's my general philosophy on that. And I would say, like, you know, both, like, I encourage everyone, across the board, think really hard about doing this, and it will bear fruit over time. Even if it's stupid, man, because if you keep –-

EDDY: I've had someone tell me that the code is the documentation.

WILL: I mean, it is.

EDDY: I mean, yes and no. Yes and no.

DAVE: I'm just going to say no, not even yes and no, just no.

EDDY: Yes and no.

DAVE: That's like saying the product documents the workmanship. No. No, it doesn't.

EDDY: [laughs]

WILL: I broadly agree with the code as a documentation. Like, just get good, guys. Get good. Just read the freaking code. Just read the code.

DAVE: Better is better, yeah. Better is better. That's tautological. Yeah.

EDDY: The thing is, it would take me longer to look at a code to see what it does than if you just had something to tell me what it does.

JUSTIN: You need a ChatGPT or something [laughs] to tell you what it does [laughter].

EDDY: Copy this whole class and ask it, “What does this class do?”

JUSTIN: It's freaky. It works a lot of the time. But, anyway, that's kind of neither here nor there.

DAVE: We're doing a Copilot demo. In the lesson, the presenter was saying, “If you ask Copilot, ‘What's your favorite flavor of ice cream?’ It will say, ‘I only want to talk about code.’” So, I immediately started trying to get Copilot to talk about ice cream with me. Like, well, okay, but if you did what would...and about the third time through, Copilot said, “You have SQL embedded in your Ruby code.” I'm like, okay, to be fair, that's more important than what I’m [inaudible 38:15] [laughter]. I thought that was hilarious. Like, okay, yeah, yeah, you're right. You're right.

KYLE: I have found that it will work if you tell it that what you're talking about is code; it'll talk with you.

EDDY: Oh, I've never had to try to have a human conversation with AI. It's still too weird to me. I refuse to accept it.

DAVE: I've been playing a lot of [crosstalk 38:38] lately, and it goes weird. It goes weird in some really interesting ways. Yeah.

MIKE: [inaudible 38:44] that will tell me what your favorite kind of ice cream is.

JUSTIN: So, my AI anecdote of the week was I was at a choir this week, and I wanted to know if I was singing off-key because we're in a choir, and sometimes it's hard to tell. And I asked Claude.ai, which is my favorite AI tool, “Write me a Flutter app that would use my phone's microphone and tell me if I was off-key or what note I was on.”

And within a five-minute conversation, it gave me working code, and, you know, I could have run that at my house and had it running on my phone. And so, it's just like, that was my, you know, oh my goodness, holy smokes AI of the week, you know, using libraries I wasn't familiar with using, you know, other things, but they were the right...I mean, it was working. I don't know if it was like, I mean, it was okay from an architecture point of view, and it was, like, an easy solution for the tool that I was asking for.

EDDY: Here's my problem with AI. It's like you have to speak to it like it's a toddler. Let me elaborate.

[laughter]

DAVE: They're getting better currently. I know what you mean. I know what you mean, yeah.

EDDY: You can't just say, “Hey, go up the stairs.” You have to be like, “Hey, get up. Use your limbs. Use one after the other. Grab the railing on the side. Start with your right foot, then your left, then your right, then your left. Make sure you don't fall. Keep your balance. And then, when you make it on top, stop [chuckles],” You know what I mean? Like, you've got to be very, very specific for it to be good, right? Like, if you ask it very vague questions...it's getting a little better, but it still hallucinates.

MIKE: Just like a toddler, also they ingest lots of books.

[laughter]

DAVE: Exactly. So, if you want to spook yourself, ask the hard question, but then start a conversation with the AI about the hard question. For example, I just asked ChatGPT because it's saving out old chats now, and I asked it just now while we were chatting, based on my conversational history, what do you think my favorite flavor of ice cream is? It got the right answer.

EDDY: Vanilla.

DAVE: Salted caramel.

EDDY: Oh.

DAVE: Like, legit.

JUSTIN: Whoa, oddly specific.

DAVE: Yeah, oddly specific. And it explained like, the AI it said, "Based on our chat and your quirky sense of humor, it's going to be something oddly specific and nontraditional." So, it was like, espresso, salted caramel, and one other, like, bubble gum or something like that. And I'm like, second one, salted caramel. So, there's stuff in there. Have ChatGPT guess your age based on your chat style, on your writing style. It will decline at first because it doesn't want to, like, trigger you, or, you know, like [laughter], like, it's programmed [crosstalk 41:34]. Well, it's programmed to not say things that would get it canceled, right?

EDDY: It's like, Dave, based on our history, I think that you're ten years old [laughs].

DAVE: Yeah, emotionally, you are 12.

WILL: [inaudible 41:48]

DAVE: But your writing style suggests you're 12 from the 1970s. And I'm like, yep, got me. Yep.

EDDY: All I'm going to say is that have we all just tried eating books? Because he could be onto something, and books could just be really good. Just eating.

MIKE: A wad of old, soggy cardboard in your mouth?

EDDY: There could be something there. Have we all tried --

JUSTIN: I've been to restaurants that, you know, have been worse than that.

[laughter]

DAVE: I almost said the name of a restaurant because there's a place here in American Fork in Utah. They cater to the blue hair brigade, to people over the age of 60 and 70. So, the food has no flavor whatsoever. And yeah, it's a joke in our family. So, you all know a place like that, right?

KYLE: My immediate thought was to call [inaudible 42:37] pizzas, frozen pizzas

EDDY: They're actually pretty good. They're pretty good. Don't [inaudible 42:44] on them.

KYLE: Back in the day, they were.

[laughter]

EDDY: Back before I was married, they made good food. So, I have a question --

DAVE: And then, there's other places that it's ketchup on a saltine.

EDDY: We did touch a little bit on AI. I'm wondering if anyone has had any experience using AI as far as running validations.

JUSTIN: Oh, I'm glad you brought that up. There is that company that I'd mentioned before at the beginning of this conversation, the Qualit.AI. And they have a lot of good clients. I mean, they're a fully functioning tech company that is taking in money and providing value by using AI to create tests for clients. And it's kind of freaky in some ways because it's like I've written tons of tests, unit tests. I've written tons of integration tests. I've written, you know, end-to-end tests. We've all done the, you know, hey, let's use Selenium to write all our web tests or whatever.

And all of our tests are always brittle and, you know, they're outdated in the next release. But it's like, you go in there with good intentions, and it turns out to require a lot of work to maintain that test suite. But, all of a sudden, you can take your application and your specs and throw it at this AI, and you may not have it perfect, but it gives enough value that people are willing to pay a lot of money for it. And it's interesting, and it'll be even more interesting in a couple of years when the AI becomes even better at maintaining those. And you just, like, throw your documentation at it, and then your application endpoint, and, you know, you say, "Hey, tell me if anything's wrong with this," and it comes back five minutes later.

MIKE: I think there's something really pertinent here. We've talked about how AI makes mistakes. And so, now we're back in the exact same boat that we are in for ourselves in that I'm writing some code, and it might have some problems. So, what are all of the things I can do to validate this code to make sure that it works? And as we automate some of our code writing responsibilities, the job doesn't change [laughter]. Because our attitude is still, you know, our approach is how do I validate this code?

DAVE: I just hit on a crazy synthesis of everything we've talked about up till now with this AI thing. One of the things about AI is that the neural networks are formed differently to the way humans build them, like, organically over time, which is why when you get an AI and teach it chess, it will do moves that no human has ever played, and then still win the game. Chess players have trained against existing patterns, and it's doing random things, right?

Something that I have noticed over the past month playing around with Copilot is that when the AI hallucinates, it's glaringly obvious to me. But when my co-worker hallucinates, I just accept it as a point of fact. And when you take those two networks and run them against each other, it's a cross-check instead of an extrapolated error. Oh, my head! Oh, I'm so glad I listened to the podcast today. Wow!

EDDY: [laughs]

DAVE: There's a blog post or two in that. Wow. Hey, you guys remember blogs?

MIKE: [laughs]

WILL: Yeah, sure. Copilot's great. [inaudible 46:07] the way down.

EDDY: Copilot is particularly accurate, like, 95% of the time. It's really nice when it knows what you want, but when it doesn't, it's kind of annoying.

[crosstalk 46:18]

DAVE: But it's a useful annoying. That's what I'm saying. It's a useful annoying.

EDDY: Yeah. It's like, good idea. I see what you did there, but no, that's not what I want. This is what I want. Be smarter next time.

DAVE: That's my life goal. Honestly, like if I had a personal mission statement, it would be to be usefully annoying.

EDDY: [laughs]

WILL: You'll get there.

DAVE: Just climbing up that useful tree [laughter]. Got the annoying nailed, so...

MIKE: So, the recurring theme today, we make stuff that isn't perfect [laughs], and the way to find that is to take a multi-pronged, multi...

JUSTIN: So, I'm glad you mentioned this. My co-worker, Dan, he says we take a belt and suspenders approach to security, and we do the same thing with testing. It's like, they do the same thing; they keep your pants up, but they support each other. They back each other up. A great example of this on the security side is, we have a number of gates in our automated CI, you know, which include a static code analysis, a dependency checker, a secrets analysis, among a couple of other things.

And so, all those run, and then we also do, you know, we use Wiz to scan our containers, and then we use it to scan our deployed containers. And then, we also do...we pay an external company for a penetration test, and then we also do a DAST scan once a week or, you know. And so, all of those things are theoretically finding the same bugs. But the cost of having a security issue is so high that it's worth it to us to have all these tools checking the same code at multiple points of the SDLC, you know, to pay for all of that. And it's kind of nuts, but that is, you know, one of the costs of doing business in some cases, so...

MIKE: Where you want to tighten that curve, right?

JUSTIN: Yeah. But, like you said, David, it's like this and this, and then another dimension coming in on the intersection of all those things gives you a very high sense of confidence that the thing you are doing is correct.

EDDY: It's so funny that you mentioned the suspenders also help [laughs] keep your pants up. I don't remember the last time I've seen anyone wear suspenders. So, is that really a reliable thing [laughs]?

DAVE: Get off my lawn, you kids. I've seen them.

[laughter]

JUSTIN: I was going to see you pull out your suspenders, David.

DAVE: What cracks me up is I've actually seen somebody hook their suspenders to their belt, and it's a perfect example of extrapolating your errors because if your belt fails, your suspenders are going to go with them. They're giving you nothing except for something to put on your shoulders, right?

EDDY: [laughs]

JUSTIN: And I'm glad you mentioned that, too, because if you are depending or you're testing the wrong thing, you create a false sense of security or a false assurance. So, writing tests for tests, you know, just to have a test there is not very useful, and that’s --

DAVE: It's almost negative.

JUSTIN: Yeah, it's almost negative because, all of a sudden, you have to support this extra code. And so that is a, I mean, that's a really important thing is to, like, test with deliberation, with intent. And be aware of everything that's in your pipeline and know why you're doing it.

WILL: See, I came late, and I think I kind of almost want to offer, like, a little bit of a different perspective on the whole testing thing; I mean, in that, like, I don't really see testing as a tool for finding bugs. I've found very few bugs as a result of testing, writing tests, right? Because I write the tests around my understanding of how the thing is going to work. And if I thought of a bug that I would catch with a test, then I would like, well, you know what I mean? Just sort of like a developer verification or validation of, like what I've written, you know what I mean? Very few things, you know what I mean? Because I just push the button and make sure, you know, pops up with something right before I send it off to QA. And I'm just like, good luck suckers, you know, because it's just like, it's just a waste of everybody's time, and I hate redoing work.

And so, where I get value out of testing and where I think an organization gets value out of testing is in a really formal specification of how this thing is supposed to work. And so, when assumptions around how this thing is supposed to work change, then you have your test suite, God willing, just raising their hand and saying, “Excuse me, what?” So, I was dealing with a team that manages pricing for products across this thing. So, like, pricing is kind of like the last line of defense, right? Like, we have old, old pages, decades old, you know, decades-old pages, right?

But they're still getting money. They're still getting traffic. People are still, like, going there and being like, oh, what about a TV? What was a cool TV to buy in 1995? Well, I don't know. I want to know. All right, fine, whatever. That will be a thousand dollars if I can even find one, whatever. And so, they have this ongoing layer upon layer upon layer upon layer like the rings of a tree. Every iteration of our sort of pricing platform that they need to support, right?

And so, they have, like, easily a half a dozen different ways of slapping a price label on a product. And they all have to work because it's literally millions of dollars, right? Some of these long tail links they still convert, you know. And if we screw them up, we can hear about it.

And so, the test suite for that team, and I think a lot of teams, it's just like, if I changed, if I have this new, great idea, that test suite is like a formal specification of, like, these are the wickets that you have to hit, you know, these are the hoops you have to jump through. It's got to work here and here and here and here and here, or else it's no good. And that's where your test suite is going to save you because nobody could think of all these oddball, you know what I mean, oddball corner cases and situations.

You build up the test suite to make sure that, like, when I make this change, and I tested it, and I verified it, and, like, QA verified it, and everything is good to go, you still, you know, haven't forgotten anything. And so, I think of it more like...to talk about documentation, returning to that concept, I think of the test suite as the for-really-real documentation of, like, what the requirements for this piece of code are, not telling you whether you're right or wrong. But, like, this is what it has to look like in the end.

And it is a sort of a self-enforcing, self-updating documentation in that, if it fails, well, then you're going to have to get right with it one way or the other and say, like, "Okay, well, okay, it’s not like that anymore. Not really.” Or, you know, oh, oh my, you know. But it's neither necessary nor sufficient to ensure that your code is right today, right? It's this long-term institutional memory that's formally specified, pretty formally specified, right? But it isn't, like, did I ship a bug? Maybe, you know. Maybe.

MIKE: We did touch on this earlier in the call before you joined, but we went into a lot more depth on the weakness.

WILL: Well, then, yeah, ignore my child-like interruptions.

DAVE: No, I think this was valid. I think so. I had a great discussion with somebody about unit testing addition. Like, why would you do this? Everybody knows how addition works. And I actually pulled out two examples from RSpec, and I said, if you use lets and you put them up and you say, expect x plus y to equal z, if you can't find x, y, and z, you have no idea what that test does. If it passes, you have no idea. It has told you nothing about what the application actually does.

But if I have a test that says, expect three plus two to equal five, okay, it does integers great, but in the very next line, if you see expect Bob plus car to equal Bob car, I just taught you that this thing will concatenate strings using the integer operator, and that is useful knowledge that's outside of what you might expect a system to do, especially if it's a currency adder. Like, why would the cash machine add strings, right? And that would start a good conversation. And that's a cross-check at that point, instead of just a dependent extrapolation.
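
For readers who want to see the two styles Dave is contrasting, here is a rough RSpec sketch. The add helper is hypothetical; the point is that literal values in the examples document behavior, while values hidden behind lets do not.

    # A hypothetical helper with Ruby's usual "+" semantics.
    def add(a, b)
      a + b
    end

    # Opaque: the reader has to chase down the lets to learn anything.
    RSpec.describe "add (with lets)" do
      let(:x) { 3 }
      let(:y) { 2 }
      let(:z) { 5 }

      it "adds" do
        expect(add(x, y)).to eq(z)
      end
    end

    # Literal values teach the reader what the system actually does,
    # including the surprising string case Dave mentions.
    RSpec.describe "add (with literals)" do
      it "adds integers" do
        expect(add(3, 2)).to eq(5)
      end

      it "concatenates strings with the same operator" do
        expect(add("Bob", "car")).to eq("Bobcar")
      end
    end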

EDDY: Will, you hit on something that kind of resonated with me, where you said there are very few times when your test will keep you honest and find a bug in your code. But I'd argue that that's what makes writing the test worth it. It's those small moments that catch your error that make your unit testing worth it.

DAVE: There's a whole conversation about that, but I've had a conversation of, like, when do you delete tests, and there's stuff where I will write, expect three plus two to equal five ratchet, ratchet, ratchet. And those tests are useless to anybody, except what it is; I've got so many things in play that I just want to lock down every piece until I get to the end. And when I'm done, all those tests document how the system does what it does; I will delete those because they're not useful to any...they're useful to me as an incremental developer to make sure I'm not going to get blindsided by this thing, you know, flapping loose. But it doesn't document what the system should do. So, when I'm done, I delete it, and I'm left with the original spec, which was more of a context integration-level spec.

EDDY: So long as you run your tests with the documentation format and the context and it blocks are really descriptive, that documents itself. That's all I'm going to say.

DAVE: I had a very happy day when Tad said that when you say, rspec dash fd, we all just assume that means format for Dave because I always want people to run with your format turned on. And you can tell which developers just get green dots because they do not write tests that read like sentences. And the people who do, their tests read like sentences.

MIKE: We've talked a lot about, you know, multi-modal sort of testing. Do your unit test, but that is not going to save you. You've got to have an attitude that you're going to try this any way you can. You're going to drag the anvil out of the way [laughs] and get in there. And if we approach our tests that way, you know, we could just get so much more.

And when I say test, it's validation, right? And this applies to a much broader thing than just unit tests. Making sure that things work is more than just one thing. I think that's the key takeaway here is that it's more than just one thing. Hopefully, you can put some of this knowledge to use.

And till next time [chuckles] on the Acima Development Podcast.