Episode 54
Monoliths vs Microservices
September 4th, 2024
55 mins 59 secs
About this Episode
In this episode of the Acima Development Podcast, Mike kicks off the discussion with a few personal stories before diving into a technical debate. He recounts his recent epic hike in the Rockies with his son, which turned into a grueling trek due to altitude sickness. He also shares a nostalgic memory of visiting the massive Edmonton Mall in Canada as a teenager, noting the stark contrast between the pre-internet shopping experience and today's online marketplaces like Amazon. The conversation then shifts to a comparison of governance structures in various countries, which sets the stage for the episode's main topic: software architecture and the debate between monoliths and microservices.
The panel explores the evolution of software architecture, with a focus on the trade-offs between large monolithic systems and distributed microservices. Mike and the others reflect on how early approaches to software development often favored building large, cohesive systems, but over time, the trend shifted toward breaking these systems into smaller, independent services. They touch on the challenges of communication and synchronization among microservices, and some express a sense of regret over the movement toward extreme modularity, suggesting that, while microservices have their advantages, they can introduce significant complexity and overhead when overused.
The conversation also delves into the practicality of maintaining and managing distributed systems, particularly from a DevOps perspective. The panel discusses the fine balance between creating boundaries in code and allowing flexibility for different teams to work on separate components. There is consensus that, while microservices can improve scalability and modularity, they can also lead to what one panelist dubs "micromonoliths" if not properly managed. Ultimately, the panel agrees that the choice between monoliths and microservices depends on the specific needs of the system and the organization, underscoring the importance of choosing the right tool for the job.
Transcript:
MIKE: Hello, and welcome to another episode of the Acima Development Podcast. I'm Mike. I'm hosting today. We've got a great panel with us here today. We've got Will, Eddy, Tad, Dave, and Kyle. And today...well, I'm going to hold off on what we're talking about. I'm going to give three stories; three things to start us off.
WILL: [laughs]
MIKE: But the first one is a story. I'll start there. So, last week, I was on vacation, drove out west, saw the Rockies, went on a, I'd say, an epic hike, which is true, but it was also just, like [laughs], in some ways, a brutal death march hike [laughs]. Went on a hike with my oldest son, and we did, like, 8,000 feet of elevation gain, and, like, 20 miles, like, 16-hour hike, just a big, brutal hike.
Loved it, but it was tough, particularly because we got up to the high elevation, and my son got altitude sickness, and he was just sick. I'll spare you the details, but there were some digestive issues going on that were serious [laughs], and that cost us quite a bit of time [laughter]. And he was not having a good time. Everything's fine. We got down to a lower elevation, and he felt a lot better.
And it was great, I mean, the view was amazing. Didn't see very many people. It was a gorgeous hike, but [laughs] it was rough. We've been back home this week, and he's been doing some online classes, but, in between, he's been running up and down the stairs for exercise. So, like, you know, go one flight of steps up, and I think he's been doing 60 times up and down the stairs, seeing how low of a time he can get it, and no altitude sickness [laughs]. It's working out better for him. So, there's my first story.
Second story, when I was a kid, and this was a long time ago, I went up to the Edmonton Mall up in Canada. I think it's the largest mall in North America. And I don't know how old I was, like, 14? So, I'd say it was a long time ago. So, I really can't speak to it now, but I can speak to when I was 14. It was this huge mall, and some stores repeated. There were, like, three of the same store within this mall. It was so big that there was, like, a region where you go to a store and then go to another region of the mall and go to the same store again. And you just get lost in this place [laughs].
I don't know how many, you know, fast food places they had. I remember there's some places that had at least three stores. It was crazy. And they have, like, an indoor amusement park, and water park, big mall. And, you know, thinking about the era that I'm talking about, that was, like, the pinnacle of shopping experiences because you could get all the places, you know, all the stores in one place. Since that time, times have changed. That was pre-internet era.
Well, now, if you want to go to something similar, you just pull up your phone, and you go to Amazon, right [chuckles]? Not that people don't still go to the big mall in Canada; I'm sure they do. And they probably enjoy that experience, but they probably lean more into the experience part of it, your amusement park and water park. Because you go to Amazon, instead of having everything in one place, they have their little stores all over the world, right?
And by distributing that work, and just making it a central, like, bazaar, you know, where you come and see all the stores in one place, you get way more selection. And the world's kind of fallen for it. We're not going to debate the pros and cons of Amazon here [laughs] [inaudible 03:54], but they have been widely embraced with that different model.
Third story. And this one is not so much a story as a comparison. Let me talk a little bit about governments. Now, in most large...I say most, I'm going to mention the United States and the European Union. You do have a central federal government, but you also have a contentious group of states within that central government that don't always agree and regulate themselves internally. And sometimes it works, sometimes it doesn't, but [laughs] it does allow for some local rule and for those independent states to run things, you know, to their own liking. And then, they federate for the big decisions that need to be made together.
There are nations that don't work that way. I was thinking about Saudi Arabia. Very much, we've got a monarchy [laughs]. We say what you do, and that's it. The interesting thing about that, though, is it's not like those fractious internal debates don't happen in Saudi Arabia. If you read anything about the internals of the ruling family there, it's scary [laughs], murderous sometimes, and I don't want to mess with that. So, it's not like those fractious internal debates go away. They just get handled very differently when you've got kind of a monolithic rule, which brings us to the topic of our discussion today.
We're going to talk about software architecture and about monoliths versus microservices. This has been an ongoing discussion for a long time now in the community. It used to be kind of everybody went the [inaudible 05:33] think [laughs]. You'd build a big thing and then deal with it. And people start saying, "Well, no, big is bad. We need to have lots of small things." And then, some people pull back and say, "Well, no, maybe big wasn't so bad after all because there's a lot of issues with communication among lots of small services and keeping those all in sync."
I brought up a few examples outside of the software world to give us some fodder to chew on, and I'm going to start off by just asking you all in this spectrum, saying, hey, you know, putting everything in one big application versus breaking everything up into small apps. Where do you stand? Where do you think that right balance should be?
TAD: Can I get a definition of what you would say a monolith is and a microservice? Because I remember way back when we were just...everything was services, and then this huge trend came along of, oh, what if we made, like, really, really small, very, very narrow services and did everything with Sinatra and never touch Rails again because it's too big and bloated? So, what makes something a microservice, versus a service, versus a monolith, versus anything?
MIKE: There you touched it, right [chuckles]? I used the word spectrum deliberately [laughs] because I don't think that anybody has a perfect definition there. And you could certainly say, "Hey, we're going to have an architecture where there's no service that has more than one endpoint." Sure, right? I don't think most people would like that. Just going out on a limb --
DAVID: It's arbitrary. That feels like an arbitrary decision, right?
TAD: Well, I've seen that. I've seen places that they're like, oh, we're going to have a routing number validation service. We're going to have a credit card validation service. We're going to have, for any particular thing that you need to do, you've got a service that you hit. And it's divided up amongst...it's almost like function-level [laughs] services.
MIKE: But how did it work? [inaudible 07:43]
TAD: Not great. Not great. You can probably tell that I'm like...every time I hear the word microservice, I just [vocalization] shudder and just, I don't know.
WILL: So, one, I feel like a monolith versus microservices architecture is like pornography, you know what I mean, it's hard to define, but you know it when you see it. And, two, what I've seen, I have some direct experience, and then I have received...maybe a year and a half, a year, year and a half ago, there was this top-down mandate at the large electronics retailer that I work for that all of the teams are going to be cross-functional teams, right? So, they said like, "Nah, we're not doing monoliths. Everybody's doing everything," right?
And almost immediately, all of the cross-functional teams immediately spread out into their, you know what I mean, on a micro level, so instead of having a team of mobile developers, or a team of backend developers, or a team of frontend developers, everybody sort of split out and said, "All right, I know how to do the mobile app stuff, so I'll do mobile app stuff. You know how to do back-end stuff, so you'll do back-end stuff. And you know front-end stuff, so you'll do front-end stuff." And everybody just sort of self-organized into their silos of choice.
And even if it was one giant repo with one giant Windows desktop application, right? Like, if you had a team of developers...and they will naturally, instinctively, like honeybees, self-organize into functional groups. It's like how even if you have an open seating plan, everybody will kind of pick a chair, and they will sit in their chair automatically. And they will get familiar with this one siloed piece of the codebase because it takes time to boot up. And so, it's an emergent sort of organizational thing. Microservices will happen instinctually. And they're like, oh, maybe we don't have separate GitHub repos. Even if it's all in just one big pile, humans will instinctively do that, you know what I mean?
MIKE: I think that there's a lot of truth to that. Well, we have limits as to what we can fit in our brains [chuckles]. There's finite limits to human capacity to think through a problem, to think through some domain within a problem. And once something gets outside of that space, you know, once it outgrows that, it becomes really hard to think about. And, you know, you probably have to map something out and think about sub-pieces of it. So, you just naturally look at sub-pieces of it.
And there's a few ways to approach that. You can have it within the same repo, right? Not the same repo but the same application. You deploy it all together. And then, you have, you know, lots of units, functional units within a single codebase, or you can just blow it up and have separate, you know, fully independent. If you're out on the web, if you have the advantage of doing so, you can have all these separate units, you know, deployed separately, managed separately. You don't even have to think about the other ones, except via API. And really kind of harden those boundaries to where, you know, you only talk to each other via API.
And by enforcing that, now you don't even have to think about those other things, maybe [laughs]. Maybe you don't have to think about them. And I think that's kind of the microservices or the, you know, the small services advocates would say, "Well, yeah, no, you should definitely make a hard boundary. That way, you're not tempted to cross those boundaries." Where people who've seen that go too far tend to say, "Well, you know, maybe we can maintain boundaries in code, and we don't have to put it, you know, deploy separately." You're saying, "Well, this happens automatically." And I think that there's a lot of truth to that. Do you think there's value in hardening those boundaries to enforce an API?
WILL: You know, if I'm being totally honest with you, what I see, like, the difference between sort of a monolith versus a microservice thing is, I see it as analogous to sort of serial versus parallel programming, right? So, with serial programming, you do it right, and then if you do everything right in a row, procedural, like, tick, tick, tick, tick, tick, tick, you know? Add it all up in a line, and then there's my result. I return my result, and I'm good. But it's not as fast, you know, broadly.
Like, if I want to write some slick parallel algorithm thing, that's a harder job to do. But you can get that 16 core, you know, server. You can keep them hot, and that's great. But it takes higher skill. And so, similarly, the spectrum you're choosing your path on in terms of microservice versus monolith, if you have a monolith, you're leaning really, really hard as things get bigger. As the monolith becomes larger, you're leaning really, really hard on super high-skill, super domain knowledge engineers that can wrap their heads around everything and understand how to deal with the monolith without knocking it over. Because we think of it like a pyramid, but really, it's like a pillar just swaying in the breeze.
And so, when you go over to the microservices thing, then you're leaning on sort of managerial organizational capacity to manage this sort of 16 core. Instead of a 16-core CPU, you've got a 16-dev engineering team. And you need to make sure that everybody is coordinated and pulling in the same direction and all those things. And so, that's sort of the spectrum. Like everything in engineering, it's not good versus evil, angel versus devil things. It's just sort of where are you on that spectrum, and what's the right tool for the job for where you are right now?
DAVID: And, Mike, you mentioned at the top of the call you were talking about monoliths versus services and whatnot. And do we sit down and write a big thing? And I think, no, when you start out, you always write a small thing that, you know, every monolith started out small somewhere. And, for me, it's always down to a trade-off, right? If you've got services, you now have very expensive communication between those services. But the things that are in those services are now decoupled, which is nice.
In a monolith, the communication between them is very fast and very easy. I just want to look at that module, just go look at that module. The drawback is that if that other module wants to look into your stuff, they can just do it because they're inside the firewall, and so there's that coupling back and forth.
So, I made a laundry list as you guys were talking. Tad, you mentioned definition of microservices. What I saw in the teens, the last decade, was a lot of people doing like, oh, monolith's bad; SOA good. And what they went out and wrote was every service was its own monolith. It's like, oh, instead of having one, you know, 100,000-line codebase, you now have five 100,000-line codebases. This is not moving in the right direction.
TAD: Now you've got two problems.
DAVID: Now you've got two problems, exactly. And so, yeah, every problem in computer science can be solved with another server in the microservice cluster, except for the problem of too many services in the cluster. And I heard a really good definition of what a microservice is, and that is something that you could rip out and rewrite in two weeks. And I've never [laughter] seen somebody write...yeah, exactly. That is the correct reaction. And if you're saying, "We're doing microservices," and your services are all 30,000 lines and have 73 models, those are not micro. You couldn't rip them out and rewrite them.
The interfaces between things are very expensive. Well, they're much more expensive because you have to negotiate with both sides, and, often, there's, like, three or five different deploys that you have to do to make it possible, and then use it, and then require it, and da da da. And you have to work out that dance. And if you need that secured, and hardened, and guaranteed, then you want that to be formal. You want it to be in a service. So, you want it to be locked down on a contract. But if it's fluid and it's flowing, you're just paying through the nose for something that you could have spiked out with an agile team overnight. But now it's a six-month initiative because you made it as expensive as possible.
EDDY: I'm actually curious to hear this from a DevOps perspective. Do they prefer to maintain microservices over a monolith?
DAVID: Do we have DevOps on...Kyle?
KYLE: Yeah, I was going to say, at least from my perspective, I'm all about orchestration tools, and something like Kubernetes is extremely powerful. So, I would much prefer a microservice, but I do feel like we've kind of talked about it. A lot of microservices end up being micromonoliths, is what I've called them in the past. And it's still one of those things, smaller units that can be divided out. And then, it's, at what point do you continue to divide? Because you can have quote, unquote, "microservices," or you can go [inaudible 16:54] the functions because there's stuff like AWS Lambda or serverless, right? So, at what point do you stop...each rendition complicates the next one. With functions, at what point do you keep them hot? Because that's something with Lambdas.
Originally, they weren't kept hot, so you had to wait for them to spin up and be able to function; now, they have hot functionality, and the same thing with even running microservices on servers. You have to have hot servers. Otherwise, you're waiting to deploy. And then, you go to the smaller units, at least in my experience, and you've always got that one server that you can deploy to because it's always up. But you start going out. It is nice in the sense that you can divide your workload amongst smaller machines and cut costs and stuff that way at times. But that's a long way of me saying it really depends on the implementation, I guess. I prefer microservices, but not to an exaggeration.
EDDY: So, micromonoliths.
WILL: Well, I always felt like, from a DevOps perspective, you've got really heterogeneous workloads, in that you've got boxes that are like, okay, I'm doing this thing. I need this amount of stuff. And I have other things that, like...and so, the more heterogeneous jobs you have running on the same machine, I feel like, correct me if I'm wrong, because this isn't really my trade, right? It's like, it makes it hard to correctly provision a machine. Because it could be doing anything under the sun, then you have to create a larger machine, larger buffer because the workload could widely vary.
KYLE: Yeah, yeah. And even inside of our company, we've got services that are doing different things that are different sizes. And we will make it so that they won't land on the same nodes as themselves or the other large services, and sometimes we don't even want some of the small services to land on there. And kind of what you're talking about, we do have more general-type workload servers, along with we have more scheduled workloads, like our data science teams, where they've got crons that sit there and run more so than anything, maybe serverless is what it would be closer to, where it's just very spiky workloads. It's just, throughout the day, just bam, bam, bam. And, like you're saying, those need very different machines to support them.
WILL: Right. Right, yeah.
DAVID: One of the things about Kubernetes and Docker—and its predecessors Vagrant, Chef, and Puppet—this idea of, like, oh, you say you're doing microservices, but you need an entire operating system now? Your microservice needs...you got to configure how many CPUs it has? That's not a microservice anymore. If you need to install Ubuntu, all of Ubuntu [laughs], to run your service, it's not a microservice anymore.
WILL: I don't think I track.
DAVID: Oh, just something you can rip out and rewrite in two weeks. Oh, but you're going to do a Docker container. So, you have to provision an entire operating system drive space. That used to be called full-stack dev. And it's like [crosstalk 20:17]
WILL: And maybe I don't understand because, like, I don't know, I'm stupid, you know what I mean? I thought Docker was cool. I was like, oh, I could just do a Docker. Like, I'll fill up a pod, and I don't have to do a full-on EC2 install. This is so great. What should the microservice, I mean, if it's a proper microservice, how should we be doing it if not [inaudible 20:38] a Docker image? [inaudible 20:40]
DAVID: I want to be careful to not walk into the trap of proper because I think the way we do it now is kind of proper. But 15 years ago, all the hardware in the colo was the same. Like, everyone had the same processor. Everyone had the same hard drive. And everyone would say, "Can I please have more RAM on the server?" And ops would say, "No, you can't have another gig of RAM because another gig of RAM on your box means we have to provision every machine in the colo. It's a $20 million upgrade." And Docker gets us away from that. And I'm 100% on board that that's a feature.
I'm just saying when I see Docker scripts where it's like, okay, yeah, well, let's go get this thing, oh, and now let's download Python and patch it, and da da da da, and it all becomes part of the entrails that are actually part of what you're working on, suddenly, it's very, very complicated.
Mike, you mentioned story time at the top of the hour, and there was a specific story that I think is actually relevant to this. When I was at a company that shall remain nameless because I might still have friends there and I want to keep it that way, they wanted to do a big push for SOA. And we had over 700 separate services at one point, and it was trending up. So, when I parted ways with the company, it was 700 different services, and there was incredibly painful fatigue around adding a new service. Like, if you wanted...oh, I want to do a new service to this thing. You would get...like, everyone would just automatically just...
Sorry, Eddy just asked in the chat, "What is SOA?" Service-oriented architecture. So, take your monolith. Instead of a single application, you spread it out. And then, most people, when they do SOA, they'll take...instead of doing micro, they start tiny, but they get bigger and bigger and bigger. And then, you've got complexity inside each of these distributed services. So, that's where that came from. We had three different services that were putting records into the God table, right? Every startup has that one table, right? For us, it's like the contracts that we do. And in a medical company, it'll be your prior authorization request, that kind of thing.
And we had three separate services that were creating new items, new documents, which they needed to create, and I can't remember...they could not just stick it in the database and read it back to find out what the primary key was. They were, like, physically disconnected from each other, which means they had to come up with primary keys without colliding with each other. And for whatever reason, it wasn't GUIDs. So, these services had this...I want to point out this was in the last 10 years of my life, and I've been here for four.
Every time you created one of these documents, it would try to insert, and then we'd go, did it work? Uh, no [laughter]? That primary key was taken? Okay. Here's a new one. Does it...no? Okay. And, yeah, and it was fun because we were running out of key space. We had meetings about exhausting the key space. We had been picking keys randomly through the entire search space and had used over 50% of them, and so we had this system that would retry 5 times, and it was failing 0.01% of the time, then 0.02%. And we're like, you know what? We're going to retry six times. We're like, this is not solving the problem. We are running out of places to randomly roll dice and try to hit a blank spot somewhere in the key space.
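In code, the pattern Dave describes might look something like this sketch. It's a hypothetical reconstruction, not the actual service; Document stands in for the God-table model:

```ruby
require "securerandom"

MAX_ATTEMPTS = 5 # bumped to six at one point; it didn't help

def insert_document(attrs)
  MAX_ATTEMPTS.times do
    # Pick a random, non-sequential key (no GUIDs, no ordinal IDs),
    # then try the insert and see if it stuck.
    attrs[:id] = SecureRandom.random_number(2**31)
    begin
      return Document.create!(attrs)
    rescue ActiveRecord::RecordNotUnique
      # Collision -- with over 50% of the key space taken, this branch
      # gets hit more and more often. Roll the dice again.
    end
  end
  raise "key space exhausted: #{MAX_ATTEMPTS} collisions in a row"
end
```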
WILL: [laughs]
DAVID: It was a nightmare. And so, I came in and I said, "Guys, this is the textbook example for a microservice," a random number generator that can look at the database and see, is it colliding? No, no, it's not. Great. Oh, and, by the way, we could make the random number generator deterministic. They couldn't have ordinal IDs because they were worried about ID prediction attacks. So, they had to be random.
And so, yeah, little microservice. It was five lines of code, baby. Fantastic. And everybody would talk to this to get their primary key and insert it in the database, and it was guaranteed to be available to them. Life was good. And everyone dug their heels in and said, "No. No more new services. No more." And I'm like, primary key on your God object table, if there was a reason for a microservice, this is it. And because we get away from why are we doing this and into this is the right way to do it, we stuck to the doctrine rather than to the principle. And, hmm, yeah, good times, good times.
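A sketch of what that five-line service might look like, with invented names (KeyLedger, the /keys route); the point is that exactly one process checks the database for collisions:

```ruby
require "sinatra"
require "json"
require "securerandom"

# One tiny service owns primary-key allocation. KeyLedger is a
# hypothetical model whose claim method does an atomic insert into the
# shared key table and returns false if the key is already taken.
post "/keys" do
  loop do
    candidate = SecureRandom.random_number(2**31)
    if KeyLedger.claim(candidate)
      content_type :json
      break({ id: candidate }.to_json)
    end
  end
end
```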
MIKE: So, they'd done so many microservices that people refused to do another one, even when --
DAVID: Yeah, there were over 700. Somebody came back from DEF CON...I won't name names, but his initials are David Brady. He came back from DEF CON...I told my team, "You got to see some of this stuff." And I started talking about war gaming and red team versus blue team, and, you know, what's our disaster recovery. And the company was in the process of getting a second data center. So, all of a sudden, we went from 700 servers to 1,400 servers because we had to duplicate everything. And the war game plan was, if this colo gets nuked from orbit, we have to be able to fail over to this one. It was a nightmare.
And it took over a year to get to a point where we could get everyone together, and we had a war game night. And it was literally when traffic died down on Saturday night. We all lined up, and we started...we basically simulated an outage as politely as possible. We let everybody shut their services down politely so that you didn't have to do data recovery on the restart. But we took everything down out of the Atlanta data center and started bringing everything up in the Chicago data center, and those war games took 9 hours. We started at 6:00 p.m. And we figured this will probably go till midnight. No, it took till 4:00 a.m. It was a nightmare.
And the plan was to bring it down, move it over, bring it up, test it, bring it down, come [inaudible 26:17] back. And once we got up on the new data center, we were, like, "This is our new primary data center. Nobody ever move this again." And, a year later, they said we needed to war game again. So, every year, that company swaps data center. I don't know if they're still doing it that way, but that's...yeah. So, too many...literally, just the act of saying, "Where is the next service going to be?" It gets expensive. And it's a low exponent, but it is exponential. And if you get enough, that line will start to curve up on you.
MIKE: Well, I think that your story captures something, and it's been touched on a few times in this call, and it goes to the idea of coupling. If your services are tightly coupled, you have not separated out your system. You've just made the same system with a more challenging interface. I've heard it put that a tightly coupled system that's distributed is just a distributed monolith.
DAVID: I like it. I like it.
MIKE: You've got all of the problems of both a monolith and a distributed system. And the coupling there is, I think, absolutely the key. If you can't have one of those services go down, then you haven't really extracted it. Anything that's a single point of failure is not really meaningfully extracted. It might as well be inside your system because you depend on it just as much as you did internally. And you have all of the cost of network overhead, and managing that service, and everything else that is involved in the cognitive overhead of having that separate. You really haven't separated it out.
There are tools, though, I mean, you can have...if you build your system right, you can, many times, have a service that can go down and nobody notices [laughs]. I mean, hopefully, you're monitoring it, and you notice, but the other services can keep running. And there's challenges there. You have to have some data redundancy. You have to have, you know, whatever. It's very situational-dependent, right? So, I'm not even going to drill into that.
DAVID: So, kind of a weird...I was reading some stuff on how Amazon did their architecture way, way back when. They did things in Lisp because they had a bunch of crazy gearhead PhD candidates that loved to do stuff in Lisp, which is wildly inefficient, but they were writing this highly distributed, super scalable system, right? And it was because they had everything super, super tiny. The weird thing that they said that I have taken away and I just keep it in my back pocket...and I've forgotten the why. So, I'm going to throw this out there undefended and let you guys just pick it apart. And anybody listening to this can pick this apart but just think about it.
When you're building two Rails models, if you do a belongs to in one direction, it's just instinctive, right? You automatically go to the other one and create a...if it belongs to, then you go to the other one and do a has many, right?
TAD: Has many. Sure.
DAVID: Yeah, and the guy that was writing the...I want to say this was the microservices book that came out of Amazon from this guy. He said, "Resist that temptation." Bidirectional communication in CS is often ten times harder than single direction. And he gave a very specific example that if your customer has a shopping cart, okay, if customer, you know, has shopping cart, that's great. If I want to pull up your order page, I have to loop in the customer service, and I have to loop in the shopping cart service. But if it's unidirectional, then if you turn around and you look at the leaf node, as long as you know the customer name or stuff that you can go look up later, that's fine.
But if it's indexed and it requires this thing to be present in the other, like you said, you now have just duplicated monoliths. You have to have...that's what it was. He was specifically talking about having a database where you might have shopping carts and no customers. Like, you literally don't have customers in that database. You have to go to the service to get it.
But in the other system, the customer is there, and the shopping carts are there because they're linked, and they have to be efficient. So, that was the rule that I tried. It's like, I try not to just immediately do...and I want to say we actually have a RuboCop rule that says, if you put the one side in, always put the other side back in. And that's not always a bad idea, but it's not always a great idea.
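In Rails terms, the rule Dave describes might look like this sketch (models hypothetical): declare the belongs_to side and deliberately leave the reverse has_many off until something actually needs it.

```ruby
class Customer < ApplicationRecord
  # Deliberately no `has_many :shopping_carts` here. The customer stays
  # ignorant of carts until some caller genuinely needs the reverse
  # traversal -- adding it later is a conscious choice, not a reflex.
end

class ShoppingCart < ApplicationRecord
  belongs_to :customer # the cart knows its owner; the owner doesn't know its carts
end

# cart.customer works; customer.shopping_carts raises NoMethodError,
# which is the point of the guardrail.
```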
TAD: I think it's pretty surprising to not have bidirectional stuff. Like, if you encounter something, you assume that that relationship goes both ways. So, --
DAVID: I could give you a better example. If you want to...so, we do leasing here, for those listening online. You need the customer record in order to service the lease. If you're in the call center, you need the lease, and you need the customer. But if you are at the contract origination system, you don't need any of the servicing stuff on the account because the account has...but you do need the customer information. And that was where the argument was.
And I agree with you that if you're down over in the servicing side, it should probably be bidirectional because you always need both. But if you're at the other end, single point, and that was the guy's push, was, like, by default, go single directional until you need bidirectional. And I've been caught in the neck with, you know, it's like, I just need this item, and it's over here, but there's no way to propagate. And so, you have to do a refactor PR to add that, so...
TAD: Well, I could understand you don't want to load everything up, but it seems surprising to me that you wouldn't want both things to know about their relationship with each other.
WILL: I'm with David. I'm with David. I'm strongly with David because, like, I don't know whether you guys saw the lightbulb turn on, right? Because one of the easy patterns that I've seen in this sort of database-centric things, I mean, we talked about a Rails application, but there are many where the lines of ownership get really muddy because you can get to anywhere from anything. And you'll find these situations where you'll get sort of, like, hop ons, right?
A better thing is I do have this sort of aggregator object. I've got, let's say, a lease, and I have things attached to a lease. Or I have a customer, and I have things attached to a customer. And it promotes very clear lines of ownership, where if I've got a customer that, I can get what I need from the customer. Or I have a lease, I can get what I need from the lease.
A way to visualize the thing that I'm thinking about, at least, is, like, think about an unstructured graph and traversing this unstructured graph where things can just go topsy turvy, any old place, versus a tree where I go from the node, and I could go to the leaves, but I don't go up, right? There are no cycles in this graph. And think about sort of traversing and just writing sort of traversal algorithms on an extremely structured tree, where it has rules, and the rules must be followed, like a B-tree or whatever.
TAD: But doesn't a tree have that? Like, the parent node knows about its children, and the children know which node they belong to.
WILL: It depends on the tree, [inaudible 33:13]
TAD: It seems super weird to me that a leaf, you can say like, "Who's your parent?" And it says, "Oh, that guy." But I can ask that guy, and he's like, "I don't have any kids. I don't know of any relationship to my children." That just seems really surprising to me.
WILL: I'm thinking about really sticky, complicated, little data graphs, data associations because I've seen, I don't know, I've seen them go off the rails, where there are all these things snarled up together because there isn't clear ownership. There's not a clear, I am the container node that everything sort of spins around. And what I'm thinking about specifically is these situations where you need to put in...so let's say I want...I want to make a service call, and I have to attach a bunch of different extraneous stuff to my thing because it's like, oh, well, I have the shopping cart.
The shopping cart has orders, but I have to have the customer because the customer has the rewards account index, right? So, I can give them their rewards points. And then, I have to have the associated store because the order is going to be fulfilled from their preferred store. And you get all these things that are just sort of, like, there isn't a reason that they all have to go together, but because these things have not been organized...and, I mean, I don't want to be, like, I'm not a zealot on this. But, as a rule of thumb, making things a one-way hop, unless there's an overriding need for it not to be, that's really compelling as a guiding principle.
DAVID: I may have explained it poorly. Actually, I may have achieved exactly what I said when I started this story, which is I'm defending this poorly, and so it'll start discussion on it. But yeah, it's interesting to kind of push and think about, like, if you've got literally two services, one for orders and one for customers, one of them needs to know about the other one, but the other one might not always need to know, and that's, yeah, even though they do always have that relationship, yeah.
MIKE: Well, I think that's actually kind of important. Let me take your idea a little further. Let's say you have customers, and you've got leases, and you've got locations, and you've got merchants, and you've got shopping carts, and you've got, you know, any number of things. I think that if the customer has to know about all of those things, then it's just a big ball of mud. You've got all of that complexity tied up in the customer. But if you have a customer and it's just customers and your shopping cart is in some other service that handles shopping carts, and they have a reference back to the customer, so they know who the customer is.
But the customer knows nothing about shopping carts. Doesn't have to know a thing about shopping carts and doesn't even have the ability to look it up because he doesn't know about shopping carts and doesn't have to. But your customer stays simple, and you don't have to deal with that complexity. Shopping cart knows what the customer record is and could look it up if it needed to, right? But that's a one-directional relationship, and it doesn't need to be otherwise.
And where those shopping carts are very ephemeral, right? I mean, they come into existence; they pop out of existence; you have many, many of them, and they really have nothing to do with the core idea of the customer, that actually disentangles, I think, your architecture dramatically. And when you have all kinds of things, you know, and I could probably come up with 100 things that need to reference those customers, if the customer has to know about all of those, then your customer gets extremely complex, where I think that you haven't really decoupled, unless the customer doesn't have to know, right?
You've got something that maintains, here's your customers. And somewhere in a data warehouse, you can map everything together, right? Or maybe even in the frontend, you have some ability to tie things together because you can make the, you know, you can [inaudible 36:56] backend and put things together. But I don't think that whatever is holding, you know, managing your customers should know about what payment you made, you know, what your payment method was that was used three days ago. It gets entangled with something that has nothing to do with that. And that's the idea, I think, in the best sense of what Dave is trying to say.
DAVID: I think I might see a distinction here as well that if you talk to the entire system, you're going to have to know about both and the system output. But if you want to have a conversation about shopping carts, you can't just go talk to the customer service. You have to go talk to the shopping cart service. It's allowed to talk to customer, so there's that direction. But the other direction, there's no link. There's no way to link customers down to their shopping carts because that service is whittled down and kept very, very small. That's the piece that, at the system level, you absolutely can go both directions. I'm not saying sever the link in one way. I'm just saying this particular tiny service we sever it just for that service.
WILL: Well, I like this shopping cart customer kind of thing, right? Because there's a clear ownership thing, right? Customer has a shopping cart, right? But maybe case study, like, what I'm imagining here. It's like, okay, I've got, let's say, I'm trying to do some new feature from this shopping cart, and I want to go out, and I want to say, "Based on the stuff you've bought, I'm going to machine learning up some things so that when you're in your shopping cart, I can show you some other stuff because you've got cash in hand." And I'm like, okay, I got a hot buyer. Let me see if I can upsell them on some stuff based on the things that I know about them.
And so, then as developers start developing out the feature, then you start to tunnel through the shopping cart, then through to the customer, and then through to their purchase history. And now you're going from this shopping cart thing, and you're tunneling through, and you're getting into all this other stuff, which is like, whoa, whoa, whoa, whoa, whoa this is a little bit, you know, this is a little funky, and now you're sort of injecting these dependencies by going through the path of least resistance. Like, I have a shopping cart. I'm checking out, right?
So, rather than pulling in the stuff in maybe kind of a top-level way, you're side-loading this relationship, and you're using that to tunnel your way through the system to avoid doing it in a straight-up way, where it's like, hey, I want you to, like, you know, like, you're bringing in a shopping cart, and I want you to bring in some recommendations for me, you know what I mean? Is this analogy coming off the rails? I'm freestyling a little.
TAD: I think what I'm kind of thinking about now is, I think that you need to know the relationship, but you also need to respect the relationship, meaning Eddy and I are coworkers. And he's like, "Hey, let's go for pizza," or something like that, and I'm like, "Hey, yeah. Can I borrow some money? [laughter]" He'd be like, "Sure," right?
But if I reach into his pants and pull out his wallet [laughter] and then grab that money out of his wallet, I've violated that relationship, right? I think it's important that I know like, oh, Eddy and I...Eddy is someone I could borrow money from, right? But if I'm going past that interface of that relationship, I'm violating the boundaries of that relationship. So, I think, that's kind of how I'm seeing it right now.
WILL: Well, yeah. I think we're all seeing it. I think we actually are vigorously agreeing because, like --
DAVID: Yeah, violent agreement, yeah.
WILL: We all understand that this is maybe a guideline, but not a hard and fast rule, which is why it would be totally fine if you were like, "Hey, I need to borrow a couple of bucks for lunch," and Eddy was like, "Yeah, yeah, yeah, just go grab my wallet out of my jacket. It's over there," right? Like, obviously, that's not a blank check to just get into Eddy's pockets anytime you'd like, but, like, reasonable people can make exceptions.
And the thing that I like about not doing these bidirectional relationships by default is it sort of puts guardrails on the relationship where you need to do it on purpose. Like, I'm doing this on purpose because these things are so coupled, and I actually think shopping cart to customer might be. In the end, it might be one of those such tight relationships, where it's like, yeah, we should probably go both ways.
MIKE: Well, it depends on your business.
WILL: Yeah, exactly. But, on the other hand, I don't trust you guys with nothing. Like, a big piece of my life is just hiding the sharp objects from the junior developers.
EDDY: Jeez, well, I don't know, man, if someone's reaching into my pocket, I want to be able to reach into theirs, too, you know?
[laughter]
DAVID: That's a bidirectional relationship, yeah [laughter].
TAD: That's a violation.
WILL: [laughs]
DAVID: I just want to say I'm so happy to be on this call with y'all and to not be the person who launched the sentence about reaching into somebody else's pants. That's normally the Dave Brady brand, but thank you. You did give me a...I'm sorry, go ahead.
MIKE: If HR might need to get involved, then it's a violation of best practices.
DAVID: Yes. Yes.
WILL: That's why I work remote, man.
[laughter]
DAVID: I had a light bulb moment when we were talking about this, that if customers and shopping carts are coupled...and stop me when you've had this...we've had this pain. Everyone in this room, I'll wager...I know, maybe not Will, and maybe not Kyle. I don't know how much coding you're doing lately. But everyone else on this call, I know you've had this problem where it's like, oh, I need to, Will talked about the shopping cart, I need to calculate the shipping and, in order to calculate shipping, I need to know how many boxes and how big the items are. So, what boxes so that I've got to do the packing on it, and then I need the weight on this.
We can go to a shipping calculator. It needs to know what state we're going to send it to, and that's going to come from the customer. But other than that, the shipping service doesn't need to know anything about the customer. It just needs...and, honestly, the only thing it needs to know from the shopping cart is what items are in it. It needs an inventory list.
And if you can say, "Here's the state, and here's the list of items," that service now doesn't need to know anything about the shopping cart or the customer. But we've all written specs, where you're like, okay, I want to do this. Oh, I need...this is a shopping cart. Oh, I need a merchant. Okay, in order to have a merchant, I need a location. And to get a location, I need a merchant, to get a merchant...oh, I need the legal paperwork that says, "The merchant has been onboarded and has signed the paperwork." Because they're a valid merchant, they're supposed to go through the system, so they're going to be checked at every point.
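A sketch of the narrow interface Dave is arguing for, with hypothetical names and rates: the calculator takes just a destination state and an item list, so its specs never need a merchant, a location, or onboarding paperwork.

```ruby
module ShippingCalculator
  RATE_PER_POUND = { "UT" => 0.55, "MI" => 0.80 }.freeze # hypothetical rates
  DEFAULT_RATE   = 0.70

  # items: an array of hashes with at least :weight in pounds. No shopping
  # cart, customer, or merchant objects ever cross this boundary.
  def self.quote(state:, items:)
    total_weight = items.sum { |item| item.fetch(:weight) }
    (total_weight * RATE_PER_POUND.fetch(state, DEFAULT_RATE)).round(2)
  end
end

# In a spec, no factories or database rows are needed:
# ShippingCalculator.quote(state: "UT", items: [{ weight: 3.0 }, { weight: 1.0 }])
#   #=> 2.2
```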
EDDY: You don't have to worry about that if you use factories, Dave.
DAVID: Right? Exactly. Exactly. So, just, you know, we'll just do everything with factories, put it all in the database. And do you want half-hour spec suites? Because this is how you get half-hour spec suites [chuckles].
WILL: And also, the other thing about shipping is like, maybe it's not for me. And I can speak to it because, at a large electronics retailer, we do a lot of shipping, and let me assure you, shipping is much more complicated than just the state. There's lots of places [inaudible 44:17] you can get stuff the same day, and there's a lot more places in Utah you cannot [laughs]. Like, my dad lives in Michigan, but he lives in the part of Michigan where Prime is not a thing. You'll get it next week, period [laughs].
DAVID: The last mile problem we were all talking...like, in 2008, we were talking about the...whoever solves the last mile problem...this is before Uber existed, right? I might have that year wrong, but, you know, before Uber existed, solving the last mile problem was the reason why there were only three big players, you know, FedEx, UPS, and DHL were it for shipping stuff, and now it's all super federated, and everybody's distributed.
I tracked a package on Amazon yesterday, and I saw a new status update, which said, "Delivery appointment scheduled," And I'm like, "What?" I'm used to carrier picked up package, package in transit, received at facility, left facility, you know, out for delivery. That's all you need. And, all of a sudden, we've got a delivery appointment. And I'm like, somebody's working on that last mile problem, and it's not well...what is it? The future is here; it's just not well-distributed, right?
There's still places...there's a place in Nevada that does not have cell phone services. There's only four pairs of copper going in and out of the town. Great Basin National Park, that's what it is. There's a little town there and a little national park. And yeah, I remember pulling in there, and I'm like, oh, my phone doesn't work. Let's have a modern horror movie now. We've now solved that problem.
WILL: Yeah. I mean, Great Basin, I want to write that down. Like, I want to go camping. I only want to go camping places if there are no bars.
DAVID: It's a...not to turn it into an ad for this, Great Basin National Park is a glacier in Southern Nevada, in the middle of the desert, right adjacent to the Mojave Desert. It's a glacier because it's on a great, big, volcanic talus cone. And you're just driving across the desert, and then, all of a sudden, there's this mountain sticking up out of nowhere, and there's a glacier on top of it. It's crazy. And Lehman Caves, which is gorgeous, so...I may have vacationed there recently.
WILL: I'm underlining it [laughs] --
EDDY: So, I kind of want to...Sorry, Will. I just kind of want to go back to where we initially started. So, I guess, essentially, there is no rule of thumb to follow, right?
DAVID: Maybe not a rule of thumb. Well, there's lots of rules of thumb. You just have to remember that they're all based on principles, right? You can distribute something. And if you distribute it in such a way that the distributed nodes have very slow communication between them, and they don't do very much communication, you're great. You can do all your high performance in your silos now, and everything's great. And if you do it wrong, you're going to take the high-performance stuff and then stretch it over a boundary, and now everything is over HTTP that used to be in the CPU cache. Ugh, been there, done that.
WILL: I don't know, I've got a rule of thumb. I've got a rule of thumb in terms of, like, monoliths versus distributed services. And this is going to be on the record hot take right now: You should keep it as monolith for as long as possible.
DAVID: I agree.
WILL: Because, as I see it, the fundamental trade-off that you've got between monoliths and distributed services is you're levering engineering output versus sort of management and the ability to communicate, you know what I mean, to coordinate and keep everybody pulling in the same direction and keep the children from fighting with each other because it's just natural human nature. And, in the end, this is probably my engineering bias, but I've run into a lot more effective engineers than I have effective engineering leadership and [inaudible 48:12], you know what I mean? I see you, Kyle. You can hide behind your hand all you want, but I saw that [laughs].
EDDY: Isn't there a cost, though, to having a monolith? Like, if you say, like, "Hey, no, like, I have a monolith that was the first service that we developed for the startup. It's been developed actively for 10-plus years," right?
DAVID: Sure.
EDDY: And you're like, oh, go and develop for this feature, right? Suddenly, isn't there, like, a higher ceiling to come in and get acquainted with, versus if it had a microservice, a true definition, to Dave's point, to a microservice, right? Wouldn't it be easier for you to hit the ground running because you know how to scope or comb through all this [inaudible 48:57]?
TAD: Well, the company I worked for before, and it's out of business, but I won't say their name anyway, the entire company was a single Go codebase. And you're talking 60-70 developers all working on a codebase that has to compile. And so, every subdirectory had a README file, and it's like, if you want to touch the code in this folder, you got to get sign-off from these three people [laughs].
DAVID: The stakeholders.
TAD: And that's how they --
WILL: Stakeholders, David, yeah.
TAD: Like, took care of it. And it's like, oh my gosh, like...
WILL: So, what you run into doing that stuff is, rather than management fiefdoms, you'll have these engineering fiefdoms, where you have this increasingly important hardcore set of engineers that know everything, and everything has to run through them. It's just like parallel programming.
TAD: Well, I'm just saying, like --
WILL: People will max out, and you'll feel it, you know?
TAD: It felt like the worst parts of a monolith, where I want to change these three files; therefore, I have to get, like, eight people to sign off on it. Oh, two of them are on vacation right now, well...Whereas if it were a single service with five people, any of the other four people on my team could probably sign off on it, and we could roll it out that day, right?
DAVID: Yeah. To Will's point, there's an agile principle, which is to try to defer your decisions as far as possible. The idea is to defer the decision to the point of maximum information because the information increases monotonically over time. So, if you make a decision early before you've got all the data in, and that decision commits you to a path, and then you find out you were wrong, it can get very, very expensive.
In SOA, once you split, if there's a decision that has to be made on both sides, that affects both sides, that's twice as...well, more than twice as expensive, probably four or eight times more expensive because they both have to change, and they both have to communicate. We've all had that thing where you have to do a tandem deploy because my service wants to change this, and it'll crash you if I do it before, and da da da, and that whole game.
So, to Will's point, yeah, I think you should start small and start monolith, until there is pressure to do otherwise. At Acima, we have a great, big monster service that has been calving icebergs off of it. It's not even a monolith now. It's a continent, and it's just, you know, calving little subcontinents off. And that's a great way to do it. I don't know if it's a great way to do it. It's exhausting, and it's painful. And we have architecture meetings every week where we scream about it. But it worked. It started small. It got big. It got too big, and then we broke it down.
And if we had started out eight years ago breaking things into, okay, you're going to be this service; you're going to be that service...the first time we tried to change a low-level thing, like, the first time you try to move outside the United States to do business, and, all of a sudden, you need postcodes instead of zip codes, right? If you have to change that in every single service and there's validations that are hard that will crash your service if that's wrong, in every single service, it's going to be a nightmare. If it's a monolith, just update the zip code table, right? Just, right? Less expensive if it's all in one place. Yeah.
TAD: How do you know when to start slicing off baby services from your service? Is it complexity? Is it team size? Like, what's your rule of thumb?
DAVID: I let pain be my guide. Like, when something hurts, I kind of make a mental note, and if it hurts twice or three times, I start going, maybe we should fix this, right? I personally am constantly driven to just polish out the tracks where I'm frequently going. So, if I'm doing something, like, pairing with somebody and I opened up a PR, and then I went back to my terminal, and I typed git set PR number, and they're like, why did you do that?
And I switched over to another branch, and I did git open PR. And it opened a browser to the pull request for that thing. And I'm like, well, it's because I wrote a little thing that ties branches to PR numbers. And they're like, yeah, that's hard work to do that. I'm like, yeah, but it's really painful to not have this. And, for me, it was less painful to stay up over a weekend writing a little PR management, you know, a little number associator.
WILL: I think if you talk to your senior engineers, you'd be like, "What sucks?" Like, everybody –-
DAVID: You'll get answers [laughs].
WILL: Yeah. And they're going to be...I'm saying there's a whole, like, how do you chip away your icebergs? That's a [inaudible 53:56] question that we could spend a series of podcasts on. Honestly, I'm more interested in, how do you maintain the communication and collaboration among teams? Because I have seen the distributed services architecture is a tremendous leadership challenge.
And I've seen so much just...we could just call it deadlocks, just like parallel programming. I figure it's exactly like parallel programming, with sort of deadlocks, and race conditions, and stuff like that with these sort of distributed things. It becomes a leadership problem. I mean, everybody knows the reasons why you should start branching out services, why you spawn a new thread. But how do you keep these things together? How do you keep the teams working with each other, right?
DAVID: Let's do that next week. I think that could be a whole call. How do you keep parallel teams productive? Yeah.
EDDY: I guess to kind of –-
WILL: Absolutely. To keep people from fighting, you know, keep people accountable.
EDDY: I guess the answer really is...it was a long-winded conversation to just say, "It depends."
DAVID: That's the other name of this podcast, a long-winded conversation to say, "It depends," yeah.
WILL: Well, yeah, I mean, it always does. There's no right or wrong in engineering. It's just the right tool for the right job. And, believe me, there can be a wrong tool for the job, and you're going to feel that [laughs].
DAVID: Mm-Hmm. Cool. Well, we have lost our host. Mike got called out to a production issue. Thank you for listening and watching our podcast. We appreciate it.