2021 Year End Review

Kapa’a, Hawaii photo by Rebecca

Here’s a quick recap of blog posts I wrote in 2021.

Agile Experience Reports

Juggling Multiple Scrum Teams I introduce Iuri Ilatanski’s experience report about life as a multi-tasking Scrum Master. Juggling involves meeting each team’s specific needs. I was Iuri’s “shepherd”—his sounding board and advocate—as he wrote this report presented at Agile 2021. Thank you, Iuri, for being so open to discussion, reflection, and the hard work of revising your writing.

Agile Experience Reports: A Fresh Look at Timeless Content I spent August organizing the vast experience reports collection hosted on the Agile Alliance’s website. The collection includes reports from 2014 to 2021 as well as five XP conferences. Experience reports are personal stories that pack a punch. There are many gems of wisdom here.

Domain Driven Design

Splitting a Domain Across Multiple Bounded Contexts Sometimes it can be more productive to meet the specific needs of individual users rather than to spend the time designing common abstractions in support of a “unified” model.

Design and Reality We shouldn’t assume domain experts have all the language they need to describe their problem (and all that you need to do as a software designer is to “capture” that language and make those real-world concepts evident in your code).

Models and Metaphors Listening to the language people use in modeling discussions can lead to new insights. Sometimes we find metaphors that, when pushed on, lead to a clearer understanding of the problem and clarity in our design.

Decision Making

Noisy Decisions After reading Noise: A Flaw in Human Judgment by Daniel Kahneman, Olivier Sibony, and Cass Sunstein, I wrote about noisy decisions in the context of software design and architecture. These authors define noise as undesirable variability in human judgment. Often we want to reduce noise, and there are ways we can do so, even in the context of software.

Is it Noise or Euphony? At other times, however, we desire variability in judgments. In these situations variability isn’t noise, but instead an opportunity for euphony. And if you leverage that variability, you just might turn up some unexpected, positive results.

Heuristics Revisited

Too Much Salt? We build a more powerful heuristic toolkit when we learn the reasons why (and when) particular heuristics work the way they do. I now think it is equally important to seek the why behind the what you are doing as you cultivate your personal heuristics.

Models and Metaphors

When a complex technical domain isn’t easily captured in a model, look for metaphors that bring clarity.

One of us (Mathias) consulted for a client that acted as a broker for paying copyright holders for the use of their content. To do this, they figured out who the copyright holders of a work were. Then they tracked usage claims, calculated the amounts owed, collected the money, and made the payments. Understanding who owned what was one of the trickier parts of their business.

-“It’s just a technical problem.”

-“But nobody really understands how it works!”

-“Some of us understand most of it. It just happens to be a complicated problem.”

-“Let’s do a little bit of modeling anyway.”

Case Study

Determining ownership was a complicated data matching process which pulled data from a number of data sources:

  • Research done by the company itself
  • Offshore data cleaning
  • Publicly available data from a wiki-style source
  • Publicly available, curated data
  • Private sources, for which the company paid a licence fee
  • Direct submissions from individuals
  • Agencies representing copyright holders

The company had a data quality problem. Because of the variety of data sources, there wasn’t a single source of truth for any claim. The data was often incomplete and inconsistent. On top of that, there was a possibility for fraud: bad actors claimed ownership of authors’ work. Most people acted in good faith. Even then, the data was always going to be messy, and it took considerable effort to sort things out. The data was in constant flux: even though the ownership of a work rarely changes, the data did.


Data Matching

The engineers were always improving the “data matching”. That’s what they called the process of reconciling the inconsistencies, and providing a clear view on who owned what and who had to pay whom. They used EventSourcing, and they could easily replay new matching algorithms on historic data. The data matching algorithms matched similar claims on the same works in the different data sources. When multiple data sources concurred, the match succeeded.

Initially, when most sources concurred on a claim, the algorithm ignored a lone exception. When there was more contention about a claim, it was less obvious what to do. The code reflected this lack of clarity. Later the team realised that a conflicting claim could tell them more: It was an indicator of the messiness of the data. If they used their records of noise in the data, they could learn about how often different data sources, parties, and individuals agreed on successful claims, and improve their algorithm.

For example, say a match was poor: 50% of sources point to one owner and 50% point to another owner. Based on that information alone, it’s impossible to decide who the owner is. But by using historical data, the algorithm could figure out which sources had been part of successful matches more often. They could give more weight to these sources, and tip the scales in one direction or the other. This way, even if 50% of sources claim A as the owner and 50% claim B, an answer can be found.
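
Sketched in TypeScript, the weighting idea might look roughly like this. It’s a minimal illustration, not the client’s code; the source names, agreement rates, and the `resolveOwner` function are invented.

```typescript
// Hypothetical sketch: weight conflicting ownership claims by each source's
// historical agreement rate, so a 50/50 split can still tip one way.

interface Source {
  name: string;
  // Fraction of this source's past claims that ended up in successful matches.
  historicalAgreementRate: number; // 0.0 .. 1.0
}

interface OwnershipClaim {
  owner: string;
  source: Source;
}

function resolveOwner(claims: OwnershipClaim[]): string | undefined {
  // Sum the weight (historical agreement rate) of the sources backing each owner.
  const weightByOwner = new Map<string, number>();
  for (const claim of claims) {
    const current = weightByOwner.get(claim.owner) ?? 0;
    weightByOwner.set(claim.owner, current + claim.source.historicalAgreementRate);
  }
  // Pick the owner with the highest accumulated weight, if any.
  let best: { owner: string; weight: number } | undefined;
  for (const [owner, weight] of weightByOwner) {
    if (!best || weight > best.weight) best = { owner, weight };
  }
  return best?.owner;
}

// Two sources claim owner A and two claim owner B, but A's sources have been
// part of successful matches more often, so A wins the tie (1.5 vs 1.2).
const wiki = { name: "wiki", historicalAgreementRate: 0.6 };
const curated = { name: "curated", historicalAgreementRate: 0.9 };
const agency = { name: "agency", historicalAgreementRate: 0.7 };
const direct = { name: "direct", historicalAgreementRate: 0.5 };

console.log(resolveOwner([
  { owner: "A", source: curated },
  { owner: "A", source: wiki },
  { owner: "B", source: agency },
  { owner: "B", source: direct },
]));
```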


Domain Modelling

The code mixed responsibilities: pulling data, filtering, reformatting, interpreting, and applying matching rules. All the cases and rules made the data matching very complicated. Only a few engineers knew how it worked. Mathias noticed that the engineers couldn’t explain how it worked very well. And the business people he talked to were unable to explain anything at all about how the system worked. They simply referred to it as the “data matching.” The team wasn’t concerned about this. In their eyes, the complexity was just something they had to deal with.

Mathias proposed a whiteboard modeling session. Initially, the engineers resisted. After all, they didn’t feel this was a business domain, just a purely technical problem. However, Mathias argued, the quality of the results determined who got paid what, and mistakes meant customers would eventually move to a competitor. So even if the data matching was technical, it performed an essential function in the Core Domain. The knowledge about it was sketchy: engineering couldn’t explain it, and business didn’t understand it. Because of that, they rarely discussed it, and when they did, it was in purely technical terms. If communication is hard, if conversations are cumbersome, you lack a good shared model.

Through modeling, the matching process became less opaque to the engineers. We made clearer distinctions between the different steps: pulling data, processing it, identifying a match, and coming to a decision. The model included sources, claims, reconciliations, and exceptions. We drew the matching rules on the whiteboard as well, making those rules explicit first-class concepts in the model. As the matching process became clearer, the underlying ideas that led to the system design started surfacing. From the “what,” we moved to the “why.” This put us in a good position to start discovering abstractions.


Trust

Gradually, the assumptions the algorithm was built on surfaced in the conversations. We stated those assumptions, wrote them on stickies, and put them on the whiteboard. One accepted assumption was that when a data source is frequently in agreement with other sources, it is less likely to be wrong in the future. If a source is more reliable, it should be trusted more, and therefore claims from that source pulled more weight in the decision of who has a claim to what. When doing domain discovery and modeling, it’s good to be observant and listen to subtleties in the language. Words like “reliable,” “trust,” “pull more weight,” and “decision” were being used informally in these conversations. What works in these situations is to have a healthy obsession with language. Add this language to the whiteboard. Ask questions: what does this word mean, and in what context do you use it?

Through these discussions, the concept of “trust” grew in importance. It became explicit in the whiteboard models. It was tangible: you could see it, point to it, move it around. You could start telling stories about trust. Why would one source be more trusted? What would damage that trust? What edge cases could we find that would affect trust in different ways?

Trust as an Object

During the next modelling session, we talked about trust a lot. From a random word that people threw into the conversation, it had morphed into a meaningful term. Mathias suggested a little thought experiment: What if _Trust_ was an actual object in the code? What would that look like? Quickly, a simple model of Trust emerged. Trust is a Value Object, and its value represents the “amount” of trust we have in a data source, or the trust we have in a claim on a work or usage, or the trust we have in the person making the claim. Trust is measured on a scale of -5 to 5. That number determines whether a claim is granted or not, whether it needs additional sources to confirm it, or whether the company needs to do further research.
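
A rough sketch of such a Trust Value Object, in TypeScript, might look like the following. The thresholds and method names are invented for illustration; they are not the team’s actual rules.

```typescript
// Hypothetical sketch of Trust as a Value Object on a -5..5 scale.
class Trust {
  private constructor(readonly value: number) {}

  static of(value: number): Trust {
    if (!Number.isInteger(value) || value < -5 || value > 5) {
      throw new Error("Trust must be an integer between -5 and 5");
    }
    return new Trust(value);
  }

  // Invented threshold: trusted enough to grant a claim without corroboration...
  grantsClaimOutright(): boolean {
    return this.value >= 4;
  }

  // ...needs other sources to confirm it...
  needsCorroboration(): boolean {
    return this.value >= 0 && this.value < 4;
  }

  // ...or needs the company to do further research.
  needsFurtherResearch(): boolean {
    return this.value < 0;
  }
}

const fromLicensedDatabase = Trust.of(5);
console.log(fromLicensedDatabase.grantsClaimOutright()); // true
```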

It was a major mindshift.

The old code dynamically computed similar values to determine “matches.” These computations were spread and duplicated across the code, hiding in many branches. The team didn’t see that all these values and computations were really aspects of the same underlying concept. They didn’t see that the computations could be shared, whether you’re matching sources, people, or claims. There was no shared abstraction.

But now, in the new code, those values are encapsulated in a first-class concept: Trust objects. This is where the magic happens: we move from a whiteboard concept to making Trust an essential element in the design. The team cleaned up the ad hoc logic spread across the data matching code and replaced it with a single Trust concept.

Trust entered the Ubiquitous Language. The idea that degrees of Trust are ranked on a scale from -5 to 5, also became part of the language. And it gave us a new way to think about our Core Domain: We pay owners based on who earns our Trust.

Trust as a Process

The team was designing an EventSourced system, so naturally, the conversation moved to what events could affect Trust. How does Trust evolve over time? What used to be matching claims in the old model, now became events that positively or negatively affected our Trust in a claim. Earning Trust (or losing it) was now thought of as a process. A new claim was an event in that process. Trust was now seen as a snapshot of the Trust earning process. If a claim was denied, but new evidence emerged, Trust increased and the claim was granted. Certain sources, like the private databases that the company bought a license for, were highly trusted and stable. For others, like the wiki-style sources where people could submit claims, Trust was more volatile.
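
As a minimal sketch of that shift, assuming invented event names and Trust deltas, the current Trust could be derived by replaying the history of claim events:

```typescript
// Hypothetical sketch: Trust as a snapshot of a Trust-earning process.
// The event names and deltas are invented; the real system derived Trust
// from its event-sourced history of claims.

type TrustEvent =
  | { kind: "ClaimCorroboratedByTrustedSource" }
  | { kind: "ClaimDisputed" }
  | { kind: "NewEvidenceSubmitted" };

const deltas: Record<TrustEvent["kind"], number> = {
  ClaimCorroboratedByTrustedSource: +2,
  ClaimDisputed: -3,
  NewEvidenceSubmitted: +1,
};

// Replay the event stream, clamping the running value to the -5..5 scale.
function trustFrom(history: TrustEvent[], initial = 0): number {
  return history.reduce(
    (value, event) => Math.max(-5, Math.min(5, value + deltas[event.kind])),
    initial,
  );
}

// A claim that was disputed, then backed by new evidence and a trusted source.
console.log(trustFrom([
  { kind: "ClaimDisputed" },
  { kind: "NewEvidenceSubmitted" },
  { kind: "ClaimCorroboratedByTrustedSource" },
])); // 0
```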


Business Involvement

During the discussions about the new Trust and Trust-building concepts, the team went back to the business regularly to make sure the concepts worked. They asked for their insights into how they should assign Trust, and what criteria they should use. We saw an interesting effect: people in the business became invested in these conversations and joined in modeling sessions. Data matching faded from the conversations, and Trust took over. There was a general excitement about being able to assign and evolve Trust. The engineers’ new model became a shared model within the business.

Trust as an Arithmetic

The copyright brokerage domain experts started throwing scenarios at the team: What if a Source A with a Trust of 0 made a claim that was corroborated by a Source B with a Trust of 5? The claim itself was now highly trusted, but what was the impact on Source A? One swallow doesn’t make Spring, so surely Source A shouldn’t be granted the same level of Trust as Source B. A repeated pattern of corroborated Trust on the other hand, should reflect in higher Trust for Source A.

During these continued explorations, people from the business and engineering listed the rules for how different events impacted Trust, and coded them. By seeing the rules in code, a new idea emerged: Trust could have its own arithmetic, a set of rules that defined how Trust was accumulated. For example, a claim with a Trust of 3 that was corroborated by a claim with a Trust of 5 would now be assigned a new Trust of 4. The larger arithmetic addressed various permutations of claims corroborating claims, sources corroborating sources, and patterns of corroboration over time. The Trust object encapsulated this arithmetic and managed the properties and behaviors for it.
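
A tiny sketch of what one rule in such an arithmetic could look like; the averaging formula is an assumption on our part that happens to reproduce the 3-corroborated-by-5-becomes-4 example above:

```typescript
// Hypothetical sketch of a single Trust arithmetic rule.
function clamp(value: number): number {
  return Math.max(-5, Math.min(5, value));
}

// One possible rule: a corroborated claim moves to the (rounded) average of
// its own Trust and the corroborating claim's Trust.
function corroborate(claimTrust: number, corroboratingTrust: number): number {
  return clamp(Math.round((claimTrust + corroboratingTrust) / 2));
}

console.log(corroborate(3, 5));  // 4
console.log(corroborate(-2, 5)); // 2
```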

From an anemic Trust object, we had now arrived at a richer model of Trust that was responsible for all these operations. The team came up with polymorphic Strategy objects, which allowed them to swap out different mechanisms for assigning and evolving Trust. The old data matching code had mixed fetching and storing information with the sprawling logic. Now the team found it easy to separate the plumbing into its own layer, apart from the clean Trust model.
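
The shape of that design, sketched with invented interface and strategy names, might look roughly like this: the plumbing depends only on a `TrustStrategy` interface, so different mechanisms for assigning Trust can be swapped in and out.

```typescript
// Hypothetical sketch of polymorphic Trust strategies, kept apart from the
// plumbing that fetches and stores claims.

interface Claim {
  source: string;
  corroborations: number;
}

interface TrustStrategy {
  trustFor(claim: Claim): number; // -5 .. 5
}

// One strategy: trust grows with the number of corroborating sources.
class CorroborationCountStrategy implements TrustStrategy {
  trustFor(claim: Claim): number {
    return Math.min(5, claim.corroborations);
  }
}

// Another strategy: a short allow-list of highly trusted, licensed sources.
class LicensedSourceStrategy implements TrustStrategy {
  constructor(private readonly licensedSources: Set<string>) {}
  trustFor(claim: Claim): number {
    return this.licensedSources.has(claim.source) ? 5 : 0;
  }
}

// The resolver (the "plumbing") only knows about the interface.
class ClaimResolver {
  constructor(private readonly strategy: TrustStrategy) {}
  isGranted(claim: Claim): boolean {
    return this.strategy.trustFor(claim) >= 4;
  }
}

const resolver = new ClaimResolver(new LicensedSourceStrategy(new Set(["licensed-db"])));
console.log(resolver.isGranted({ source: "licensed-db", corroborations: 0 })); // true
```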


The Evolution of the Model

In summary, this was the evolution:

  1. Ad hoc code that computes values for matches.
  2. Using Trust in conversations that explained how the current system worked.
  3. Trust as a Value Object in the code.
  4. Evolving Trust as a process, with events (such as finding a matching claim) that assigned new values of Trust.
  5. Trust as a shared term between business and engineering, that replaced the old language of technical data matching.
  6. Exploring how to assign Trust using more real-world scenarios.
  7. Building an arithmetic that controls the computation of Trust.
  8. Polymorphic Strategies for assigning Trust.

When you find a better, more meaningful abstraction, it becomes a catalyst: it enables other modeling constructs, allowing other ideas to form around that concept. It takes exploration, coding, conversations, trying scenarios, … There’s no golden recipe for making this happen. You need to be open to possibility, and take the time for it.


Discussion

The engineers originally introduced the concept of “matching,” but that was an anemic description of the algorithm itself, not the purpose. “If this value equals that value, do this.” Data matching was devoid of meaning. That’s what Trust introduces: conceptual scaffolding for the meaning of the system. Trust is a magnet, an attractor for a way of thinking about and organizing the design.

Initially, the technical details of the problem were so complicated, and provided such interesting challenges to the engineers, that that was all they talked about with the business stakeholders. Those details got in the way of designing a useful Ubiquitous Language. The engineers had assumed that their code looked the way it needed to look. In their eyes, the code was complex because the problem of matching was complex. The code simply manifested that complexity. They didn’t see the complexity of that code as a problem in its own right. The belief that there wasn’t a better model to be found obscured the Core Domain for both business and engineering.

The domain experts were indeed experts in the copyrights domain, and had crisp concepts for ownership, claims, intellectual property, the laws, and the industry practices. But that was not their Core Domain. The real Core was the efficient, automated business they were trying to build out of it. That was their new domain. That explains why knowledge of copyright concepts alone wasn’t sufficient to make a great model.

Before they developed an understanding of Trust, business stakeholders could tell you detailed stories about how the system should behave in specific situations. But they had lacked the language to talk about these stories in terms of the bigger idea that governs them. They were missing crisp concepts for them.


Good Metaphors

We moved from raw code, to a model based on the new concept of Trust. But what kind of thing is this Trust concept? Trust is a metaphor.1 Actual trust is a human emotion, and partly irrational. You trust someone instinctively, and for entirely subjective reasons that might change. Machines don’t have these emotions. We have an artificial metric in our system, with algorithms to manipulate it, and we named it Trust. It’s a proxy term.

This metaphor enables a more compact conversation, as evidenced by the fact that engineers and domain experts alike can discuss Trust without losing each other in technical details. A sentence like “The claims from this source were repeatedly confirmed by other sources,” was replaced by “This source has built up trust,” and all knew what that entailed.

The metaphor allows us to handle the same degree of complexity, but we can reason about determining Trust without having to understand every detail at the point where it’s used. For those of us without Einstein brains, it’s now a lot easier to work on the code; it lowers the cognitive load.

A good metaphor in the right context, such as Trust, enables us to achieve things we couldn’t easily do before. The team reconsidered a feature that would allow them to swap out different strategies for matching claims. Originally they had dismissed the idea, because, in the old code, it would have been prohibitively expensive to build. It would have resulted in huge condition trees and sprawling dependencies on shared state. They’d have to be very careful, and it would be difficult to test that logic. With the new model, swapping out polymorphic Strategy objects is trivial. The new model allows testing low level units like the Trust object, higher level logic like the Trust-building process, and individual Claims Strategies, with each test remaining at a single level of abstraction.

Our Trust model not only organizes the details better, but it is also concise. We can go to a single point in the code and know how something is determined. A Trust object computes its own value, in a single place in the code. We don’t have to look at twenty different conditionals across the code to understand the behavior; instead we can look at a single strategy. It’s much easier to spot bugs, which in turn helps us make the code more correct.

A good model helps you reason about the behavior of a system. A good metaphor helps you reason about the desired behavior of a system.

The Trust metaphor unlocked a path to tackle complexity. We discovered it by listening closely to the language used to describe the solution, using that language in examples, and trying thought experiments. We’re not matching data anymore, we’re determining Trust and using it to resolve claims. Instead of coding the rules, we’re now encoding them. We’re better copyright brokers because of this.


Bad Metaphors

Be wary of bad, ill-fitting metaphors. Imagine the team had come up with Star Ratings as the metaphor. Sure, it also works as a quantification, but it’s based on popularity and calculates an average. We could still have built all the same behavior as the Trust model, but with a lot of bizarre rules, like “Our own sources get 20 five-star ratings.” When you notice that you have to force-fit elements of your problem space into a metaphor, and there’s friction between what you want to say and what that metaphor allows you to say, you need to get rid of it. No metaphor will make a perfect fit, but a bad metaphor leads you into awkward conversations without buying you clarity.

To make things trickier, whenever you introduce a new metaphor, it can be awkward at first. In our case study, Trust didn’t instantly become a fully explored and accepted metaphor. There’s a delicate line between the early struggles of adopting a new good metaphor, and one that is simply bad. Keep trying, work on using your new metaphor, see if it buys you explanatory power, and don’t be afraid to drop it if it does not.

And sometimes, there simply isn’t any good metaphor, or even a simpler model to be found. In those cases, you just have to crunch it. There’s no simplification to be found. You just have to work out all the rules, list all the cases, and deal with the complexity as is.

Conclusion

To find good metaphors, put yourself in a position where you’ll notice them in conversation. Invite diverse roles into your design discussions. Have a healthy obsession with language: What does this mean? Is this the best way to say it? Be observant about this language, listen for terms that people say off the cuff. Capture any metaphors that people use. Reinforce them in conversations, but be ready to drop them if you feel you have to force-fit them. Is a metaphor bringing clarity? Does it help you express the problem better? Try scenarios and edge cases, even if they’re highly unlikely. They’ll teach you about the limits of your metaphor. Then distill the metaphor, agree on a precise meaning. Use it in your model, and then translate it to your code and tests. Metaphors are how language works, how our brains attach meaning, and we’re using that to our advantage.


Written by Rebecca Wirfs-Brock and Mathias Verraes

  1. If you want to learn more about metaphors and how they shape language, read “Metaphors we Live By” (George Lakoff and Mark Johnson, University of Chicago Press, 2003).

Too Much Salt?

    Practiced speakers and writers know that good examples rarely tell the whole story. Instead they shape their narratives to make the big ideas stand out. Stories are bent ever so slightly, plot details are pared down, leaving space for emphasis and audience impact.

    I wouldn’t go so far as to say we invent fiction, but rather that we simplify our stories to make them compelling. Too many details and our audience would tune us out. And when we repeatedly tell these stories, we come to believe we’ve pared down the narrative to its essence. We’ve nailed it!

    But what happens when you encounter information that sheds new light on such a story? What if the story you’ve told no longer rings quite true?

    The past few years I’ve explored Billy Vaughn Koen’s definition of heuristics as they relate to software design and architecture. I’ve written blog posts and essays, presented talks, keynotes, and workshops about heuristics (for a gentle introduction to different kinds of heuristics see Growing Your Personal Design Heuristics Toolkit).

    Along the way I’ve encouraged people to discover, distill, and own their personal heuristics. I advise them to not just take every bit of advice they find about software design as being authoritative. Instead, they should question the validity of that advice’s applicability to their specific context. They should also bring their own heuristics they’ve accrued through experience to bear on the problem at hand.

    I start most heuristics presentations with a story about my experience cooking my very first Blue Apron recipe for Za’atar Roasted Broccoli Salad (for details see Nothing Ever Goes Exactly by the Book). I jokingly point out all the places that the recipe suggests adding salt. I then postulate that if I just blindly followed Blue Apron instructions without applying any judgment, the dish would be way too salty.

    Instead of following the recipe, I told how I used my past experiences to “modify” the instructions to fit with my understanding of what makes for a tasty dish. In short, I ignored lots of places where the recipe suggested adding salt.

    My heuristic for this situation was to ignore advice on where to add salt if it seems excessive and only add salt to taste at the end. Following that heuristic, I most likely made a much blander dish that, while it looked great, undoubtedly lacked flavor.

    But… achieving a tasty dish wasn’t the point of my original story!

    Instead, it was to encourage using personal judgment and heuristics based on past experiences. I wanted to emphasize that we each have experiences and insights that we can and should draw on in many situations. Simply trusting and blindly following “experts” or “recipes” because they are published or credentialed can lead us astray—or to cooking inedible dishes. We should value and treasure our experiences and draw upon the heuristics we’ve accrued through those experiences.

    Ta-da! Point made! Perhaps…

    A week ago as I was waiting for surgery to repair my broken nose (that’s another story for another time) I started reading How to Taste, by Becky Selengut. At the time I was detached, slightly impatient, and resigned to just being there in the moment. The doctor was late and I had time to kill.

The introductory first chapter starts: “Telling you to ‘season to taste’ does nothing to teach you how to taste—and that is precisely the lofty goal of this book. Once you know the most common culprits when your dish is out of whack, you’ll save tons of time spinning your wheels grabbing for random solutions. You’ll start thinking like a chef. Some people are born knowing how to do this—they are few and far between and most likely have more Michelin stars than you or I; the rest of us need to be taught. I’ve got your back.”

    Now that grabbed my attention!

    Unless I was superhuman (I’m not), I can’t rely on my instincts to become a better cook, knowing when and how much seasoning or salt to add.

    My experiences cooking have certainly been ad hoc. And the heuristic I applied for salting that Blue Apron dish came from who knows where. I never learned why I was doing what I was doing when following a recipe or ignored some parts of it. Instead, I learned a few shortcuts and substitutions, largely through combing the internet. And while my technique may have improved over time, I haven’t developed the ability to craft a dish with nuanced flavors, let alone improvise one.

Becky suggests reading her book “…start[ing] at the beginning, as I intend to build upon the concepts one puzzle piece at a time.” Each chapter presents fundamental facts, reinforced by a recipe that highlights the important points of the chapter, and then suggests Experiment Time activities intended to develop a reader’s palate.

    Aha, again!

    A good way to learn how to exercise judgment is to perform structured experiments after you’ve learned a bit of theory and why things—in this case, the chemistry of cooking—work the way they do.

I quickly read through the chapter on Salt and learned: Salt is a flavorant—bringing out the flavor of other ingredients. Salting early and often can improve taste dramatically. For example, adding salt to onions as they sauté can speed up the cooking process and cause them to sweat out water. And when you only season a soup at the end, no matter how much salt you add, the flavors of unsalted ingredients (for example, potatoes) fall flat. You end up over-salting the soup stock and still having tasteless, bland potatoes. Salt needs to be added at the right time, often at several steps in the cooking process, to have the desired result. And to my surprise, different kinds of salt (iodized, kosher, flaky, fine-grained sea salt) each have their own flavoring properties and ratios in recipes.

This brought to mind a whole new way of thinking about my Blue Apron cooking experience. Blue Apron didn’t have bad recipes, but their recipes didn’t make me a better cook either. This is because most recipes focus on the how—not the why. Their pretty little pictures and step-by-step instructions did nothing to help me understand how to achieve tasty dishes.

    And that’s a problem if I want to get better at cooking tasty dishes and not simply at following recipes.

    I’m afraid way too much information we absorb—whether it is about cooking or agile practices or software development—is presented as step-by-step lists of instructions, without any explanation of why it makes sense to do so or the consequences of not doing a particular step specifically as instructed.

    Consequently, we learn a bunch of procedures, or simply cut and paste them. We follow instructions because somebody says this is what we should do. Over time we may build up a playbook of those procedures but our understanding of why these procedures work isn’t very deep or rich or adaptable.

    If we want to truly gain proficiency in cooking (or software design or programming or running or gardening or basket weaving), instruction that emphasizes the why along with the how is what we need.

    Teach me some facts that ground what I’m about to do in a bit of knowledge. Spark my curiosity. Inspire me. And then give me tasks that let me tinker and practice applying that knowledge. Only then will my actions become integrated with that knowledge, allowing me to build up adaptable heuristics that I can use in novel situations.

In hindsight, I now believe that the story I told about applying my personal heuristics and knowledge to a problem was OK. It’s reasonable to be a healthy skeptic when someone says, “Just do as I say. Trust me,” when attempting a new task. Distilling your own heuristics from previous experiences and applying them in familiar situations is also good. And writing them down helps to bring them to your awareness.

    But in addition, I now think it is equally important to seek the why behind the what you are doing. And to loosen your grip on those simpler narratives you’ve held dear. They are not the whole story and they may be holding you back. Be open to new information that may reshape your stories and enhance your heuristic toolkit.

    Perhaps one day, with enough knowledge and practice, I’ll be able to create a flavor profile for a dish instead of merely following the recipe.

    Design and Reality

    Reframing the problem through design.

    “The transition to a really deep model is a profound shift in your thinking and demands a major change to the design.”

    Domain-Driven Design, Eric Evans

    There is a fallacy about how domain modelling works. The misconception is that we can design software by discovering all the relevant concepts in the domain, turn them into concepts in our design, add some behaviors, and voilà, we’ve solved our problem. It’s a simplistic perception of how design works: a linear path from A to B:

    1. understand the problem,
    2. apply design,
    3. end up with a solution.

    That idea was so central to early Object-Oriented Design, that one of us (Rebecca) thought to refute it in her book:

    “Early object design books including [my own] Designing Object-Oriented Software [Wirfs-Brock et al 1990], speak of finding objects by identifying things (noun phrases) written in a design specification. In hindsight, this approach seems naive. Today, we don’t advocate underlining nouns and simplistically modeling things in the real world. It’s much more complicated than that. Finding good objects means identifying abstractions that are part of your application’s domain and its execution machinery. Their correspondence to real-world things may be tenuous, at best. Even when modeling domain concepts, you need to look carefully at how those objects fit into your overall application design.”

    Object Design, Rebecca Wirfs-Brock

    Domain language vs Ubiquitous Language

    The idea has persisted in many naive interpretations of Domain-Driven Design as well. Domain language and Ubiquitous Language are often conflated. They’re not the same.

    Domain language is what is used by people working in the domain. It’s a natural language, and therefore messy. It’s organic: concepts are introduced out of necessity, without deliberation, without agreement, without precision. Terminology spreads across the organization or fades out. Meaning shifts. People adapt old terms into new meanings, or terms acquire multiple, ambiguous meanings. It exists because it works, at least well enough for human-to-human communication. A domain language (like all language) only works in the specific context it evolved in.

    For us system designers, messy language is not good enough. We need precise language with well understood concepts, and explicit context. This is what a Ubiquitous Language is: a constructed, formalized language, agreed upon by stakeholders and designers, to serve the needs of our design. We need more control over this language than we have over the domain language. The Ubiquitous Language has to be deeply connected to the domain language, or there will be discord. The level of formality and precision in any Ubiquitous Language depends on its environment: a meme sharing app and an oil rig control system have different needs.

    Drilling Mud

    Talking of oil rigs:

    Rebecca was invited to consult for a company that makes hardware and software for oil rigs. She was asked to help with object design and modelling, working on redesigning the control system that monitors and manages sensors and equipment on the oil rig. Drilling causes a lot of friction, and “drilling mud” (a proprietary chemical substance) is used as a lubricant. It’s also used as a carrier for the rocks and debris you get from drilling, lifting it all up and out of the hole. Equipment monitors the drilling mud pressure, and by changing the composition of the mud during drilling, you can control that pressure. Too much pressure is a really bad thing.

    And then an oil rig in the gulf exploded.

    As the news stories were coming out, the team found out that the rig was using a competitor’s equipment. Whew! The team started speculating about what could have happened, and were thinking about how something like that could happen with their own systems. Was it faulty equipment, sensors, the telemetry, communications between various components, the software?

    Scenarios

    When in doubt, look for examples. The team ran through scenarios. What happens when a catastrophic condition occurs? How do people react? When something fails, it’s a noisy environment for the oil rig engineers: sirens blaring, alarms going off, … We discovered that when a problem couldn’t be fixed immediately, the engineers, in order to concentrate, would turn off the alarms after a while. When a failure is easy to fix, the control system logs reflect that the alarm went on and was turned off a few minutes later.

But for more consequential failures, even though these problems took much longer to resolve, they still showed up in the logs as being resolved within minutes. Then, when people studied the logs, it looked like the failure had been resolved quickly. But that was totally inaccurate. This may look like a software bug, but it’s really a flaw in the model. And we should use it as an opportunity to improve that model.

    The initial modeling assumption is that alarms are directly connected to the emergency conditions in the world. However, the system’s perception of the world is distorted: when the engineers turn off the alarm, the system believes the emergency is over. But it’s not, turning an alarm off doesn’t change the emergency condition in the world. The alarms are only indirectly connected to the emergency. If it’s indirectly connected, there’s something else in between, that doesn’t exist in our model. The model is an incomplete representation of a fact of the world, and that could be catastrophic.

    A Breakthrough

    The team explored scenarios, specifically the weird ones, the awkward edge cases where nobody really knows how the system behaves, or even how it should behave. One such scenario is when two separate sensor measurements raise alarms at the same time. The alarm sounds, an engineer turns it off, but what happens to the second alarm? Should the alarm still sound or not? Should turning off one turn off the other? If it didn’t turn off, would the engineers think the off switch didn’t work and just push it again?

    By working through these scenarios, the team figured out there was a distinction between the alarm sounding, and the state of alertness. Now, in this new model, when measurements from the sensors exceed certain thresholds or exhibit certain patterns, the system doesn’t sound the alarm directly anymore. Instead, it raises an alert condition, which is also logged. It’s this alert condition that is associated with the actual problem. The new alert concept is now responsible for sounding the alarm (or not). The alarm can still be turned off, but the alert condition remains. Two alert conditions with different causes can coexist without being confused by the single alarm. This model decouples the emergency from the sounding of the alarm.
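
A simplified sketch of the distinction, with invented class and method names, might look like this: silencing the alarm no longer touches the alert condition, so the record of the emergency stays accurate.

```typescript
// Hypothetical sketch: alert conditions decoupled from the alarm that
// announces them. The real control system is far more involved.

class AlertCondition {
  private resolved = false;
  constructor(readonly cause: string, readonly raisedAt: Date) {}
  resolve(): void { this.resolved = true; }
  isActive(): boolean { return !this.resolved; }
}

class Alarm {
  private silenced = false;
  silence(): void { this.silenced = true; }

  // The alarm sounds while any alert condition is active and it hasn't been silenced.
  shouldSound(alerts: AlertCondition[]): boolean {
    return !this.silenced && alerts.some(alert => alert.isActive());
  }
}

// Two simultaneous alert conditions; an engineer silences the alarm to concentrate.
const mudPressure = new AlertCondition("mud pressure above threshold", new Date());
const sensorFault = new AlertCondition("pressure sensor offline", new Date());
const alarm = new Alarm();

alarm.silence();
console.log(alarm.shouldSound([mudPressure, sensorFault])); // false: the alarm is off...
console.log(mudPressure.isActive());                        // ...but the emergency persists
```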

    The old model didn’t make that distinction, and therefore it couldn’t handle edge cases very well. When at last the team understood the need for separating alert conditions from the alarms, they couldn’t unsee it. It’s one of those aha-moments that seem obvious in retrospect. Such distinctions are not easily unearthed. It’s what Eric Evans calls a Breakthrough.

    An Act of Creation

There was a missing concept, and the team didn’t know something was missing. It wasn’t obvious at first, because there wasn’t a name for “alert condition” in the domain language. The oil rig engineers’ job isn’t designing software or creating a precise language; they just want to be able to respond to alarms and fix problems in peace. Alert conditions didn’t turn up in a specification document, or in any communication between the oil rig engineers. The concept was not used implicitly by the engineers or the software; no, the whole concept did not exist.

    Then where did the concept come from?

    People in the domain experienced the problem, but without explicit terminology, they couldn’t express the problem to the system designers. So it’s us, the designers, who created it. It’s an act of creative modeling. The concept is invented. In our oil rig monitoring domain, it was a novel way to perceive reality.

    Of course, in English, alert and alarm exist. They are almost synonymous. But in our Ubiquitous Language, we agreed to make them distinct. We designed our Ubiquitous Language to fit our purpose, and it’s different from the domain language. After we introduced “alert conditions”, the oil rig engineers incorporated it in their language. This change in the domain is driven by the design. This is a break with the linear, unidirectional understanding of moving from problem to solution through design. Instead, through design, we reframed the problem.

    Is it a better model?

    How do we know that this newly invented model is in fact better (specifically, more fit for purpose)? We find realistic scenarios and test them against the alert condition model, as well as other candidate models. In our case, with the new model, the logs will be more accurate, which was the original problem.

    But in addition to helping with the original problem, a deeper model often opens new possibilities. This alert conditions model suggests several:

    • Different measurements can be associated with the same alert.
    • Alert conditions can be qualified.
    • We can define alarm behaviors for simultaneous alert conditions, for example by spacing the alarms, or picking different sound patterns.
    • Critical alerts could block less critical ones from hogging the alarm.
    • Alert conditions can be lowered as the situation improves, without resolving them.

    These new options are relevant, and likely to bring value. Yet another sign we’d hit on a better model is that we had new conversations with the domain experts. A lot of failure scenarios became easier to detect and respond to. We started asking, what other alert conditions could exist? What risks aren’t we mitigating yet? How should we react?

    Design Creates New Realities

    In a world-centric view of design, only the sensors and the alarms existed in the real world, and the old software model reflected that accurately. Therefore it was an accurate model. The new model that includes alerts isn’t more “accurate” than the old one, it doesn’t come from the real world, it’s not more realistic, and it isn’t more “domain-ish”. But it is more useful. Sensors and alarms are objective, compared to alert conditions. Something is an alert condition because in this environment, we believe it should be an alert condition, and that’s subjective.

    The model works for the domain and is connected to it, but it is not purely a model of the problem domain. It better addresses the problems in the contexts we envision. The solution clarified the problem. Having only a real world focus for modelling blinds us to better options and innovations.

    These creative introductions of novel concepts into the model are rarely discussed in literature about modelling. Software design books talk about turning concepts into types and data structures, but what if the concept isn’t there yet? Forming distinctions, not just abstractions, however, can help clarify a model. These distinctions create opportunities.

    The model must be radically about its utility in solving the problem.

    “Our measure of success lies in how clearly we invent a software reality that satisfies our application’s requirements—and not in how closely it resembles the real world.”

    Object Design, Rebecca Wirfs-Brock

    Written by Mathias Verraes and Rebecca Wirfs-Brock. Special thanks to Eric Evans for the spot on feedback and constructive advice.

    Is it Noise or Euphony?

    The book Noise: A Flaw in Human Judgment by Daniel Kahneman, Olivier Sibony, and Cass Sunstein has me thinking deeply about noisy decisions.  In this context, noise is defined as undesirable variability in judgments. They explain two different kinds of noise—level noise (variability in the average level of judgments by different people) and pattern noise. Pattern noise is further broken down into the unique noise individuals bring into any decision and occasion noise—noise caused by the particular context surrounding particular decisions. Occasion noise can be influenced by our mood, the interactions with people we’re deciding with, what we ate for dinner last night, or even the weather.

    So when is noise worth reducing?  And what can we do to reduce that noise? How do we know our efforts at noise reduction have the desired effect?

    Are there situations where variability might be desirable? I haven’t found a name in the literature for such desirable variation. Perhaps euphony—a harmonious succession of words having a pleasing sound—is one possibility. In these situations we’re favoring finding some euphony over conforming to a noise-free rigid standard for our judgments.

    I’ll use the review of conference submissions of papers, talks, and workshops as an example of where both noise and euphony play a part in our decision-making, as it is one I am quite familiar with.

    One major source of variability is when new reviewers join a review committee. Newcomers often look at submissions differently than experienced reviewers. But not all variability is noise. If some variability is welcomed, expected, and encouraged, the review process greatly benefits from fresh perspectives. This kind of variability adds spice.

And yet, there may be standards (whether formally written down or more loosely held) we’d like to uphold for what we consider a worthy submission. One way to reduce level noise in reviews is to ensure that reviewers understand these expectations. A good way to convey this information is to hold a meeting where we discuss and present examples of submissions and exemplary reviews (reviews from prior years are a good source). Newcomers can learn what a reasonable proposal is and what is expected in a review. They also get to know their peers, ask questions and, in effect, “calibrate” their expectations for reviewing.

    But this meeting is insufficient to remove another major source of noise—occasion noise caused by group interactions. Kahneman, Sibony, and Sunstein state: “Groups can go in all sorts of directions depending in part on factors that should be irrelevant. Who speaks first, who speaks last, who speaks with confidence, who is wearing black, who is seated next to whom, who smiles or frowns or gestures at the right moment—all these factors, and many more, affect outcomes.” Group dynamics introduce noise.

    But there are several practical ways to further reduce the noise in group decisions. Oscar Nierstrasz wrote a set of patterns called Identify the Champion for reviewing academic papers. I encourage anyone running a conference to consider a review process along the lines of what Oscar introduces. I’ve adapted these patterns and process to non-academic conference reviewing with only a few minor tweaks.

    The key ideas in these patterns are the roles of champion and detractor, and a structured process for discussing submissions. Champions are strong advocates for a submission who are prepared to discuss its merits; detractors disapprove of a submission and are prepared to discuss its weaknesses.

Submissions are discussed in groups according to their highest and lowest scores. Care is taken to identify proposals with both extreme high and low scores, and not to rank submissions numerically. If a submission has no champion, it isn’t discussed; it is rejected. Ranking and then discussing submissions one-by-one in order would only add level noise (actually, I find we get numbed by reviewing and tend to reject “lower ranked” submissions without enough consideration).

The review committee is asked to suspend final judgments until all championed submissions are presented. The champion is first invited to introduce the submission and explain why it should be accepted. Then, detractors are invited to state their reasons. At the end of all presentations, discussion is opened to all and the committee tries to reach consensus.

In practice, following this discussion protocol, it is easy to accept outstanding submissions—they typically have plenty of champions. This leaves the bulk of our time to dig into the strengths and weaknesses of those championed submissions that have mixed reviews.

    The Identify the Champion process forces me to hit “pause” on my judgments and to not jump to premature conclusions. And the first thing we hear about any submission are its positive aspects. When detractors speak, I get a richer understanding of that submission. Although I might have had some initial impressions, I find they can and do change.

    Sometimes I warm up to a submission. At other times, detractors’ perspectives grab my attention and make me revisit whether the submission is as strong as I had initially thought.

    The cumulative weight of all this discussion has an even more profound effect. I find I am much more accepting of the outcome: what will happen will happen. Yes, there is unpredictability in this decision-making process. But we’re all trying to make reasonable decisions as a group. I end up actively engaged in making the outcome the best it can be and supportive of our collective decisions.

    Although the Identify the Champion review process still has noise (it is hard to eliminate noise caused by group dynamics entirely), I believe it to be less noisy than most other review processes I’ve participated in.

    One downside, however, is that it can be exhausting. To avoid having some occasion noise creep back in, it’s good to ensure that reviewers get sufficient breaks to meet their personal needs, and not get too tired or cranky or hungry.

One place I’ve applied my adaptation of the Identify the Champion pattern is for Agile Alliance experience report submissions. Experience report submissions are “pitches” for written experience reports. Only after a submission has been accepted does the actual writing begin. So as reviewers, we’re not only judging the topic of the pitch but also whether the submitter will be able to write a compelling report. Champions of experience reports also commit to shepherding the writing of the reports. These shepherd-champions commit to reviewing and commenting on drafts of reports as they are written over a period of several weeks. Now that’s real commitment! Frequently we have more championed submissions than room in the conference program. So our judgments come down to some difficult choices.

Before we hold our review meeting, we ask reviewers to give us two lists: submissions they’d like to shepherd and an optional list of submissions they’d like to see on the program (but do not want to shepherd). At our meeting, we then have a lively discussion where champions forcefully advocate for their proposals and gain others’ support. Once again, I find we spend most of our time discussing those submissions that have mixed reviews. But we also spend a lot of time listening to champions and then, as a group, making tradeoffs between submissions (remember, we have more good submissions than we have capacity to accept). The message we convey to all reviewers is that if you really want to shepherd a submission, we as a group will support your decision to be a shepherd-champion. But let’s discuss first.

    We can’t guarantee the quality of any final report. We base our judgments on both what the submitters have written (in many cases, there has been a back and forth conversation between submitters and reviewers that we can all see that has led submitters to reshape and refine their proposals) as well as the convincing arguments of champions.

    Judging conference submissions is subjective. Our process acknowledges that. We accept the risk of selecting a less-than-stellar report proposal over missing an opportunity for a novel or insightful report.

Is it our goal to eliminate noise in our decision-making? Where we can, yes. But that isn’t our only goal. If we tried to eliminate it entirely, we might end up establishing standards for experience report submissions that would inadvertently filter out newness or novelty. In our search for a bit of euphony we stretch to accept a submission if there is a convincing champion. Consequently, we accept a little variability (and unpredictability) in our decision-making. However, at the end of our review process, reviewers are generally happy with the proposals we accept, happy with their shepherding assignments, and eager to begin working with their experience report authors. An important aspect of our process, which cannot be overstated, is that we also work hard to make good matches between each champion-shepherd and prospective authors. Not only do reviewers buy into the review process, they also commit to being ongoing champions.

    Noise reduction is important in many situations, especially group decisions. Paying careful attention to how the group is informed, discusses, and then decides can reduce noise. Paying attention to the voices of champions is one way to turn up euphony. By tuning your decision-making processes you can achieve these goals.

    Noisy Decisions

    “The world is noisy and messy. You need to deal with the noise and uncertainty.”–Daphne Koller

    I have tinnitus. When there isn’t much sound in my environment, for me it still isn’t quiet. I hear a constant background hum. It is hard to describe what this noise sounds like. I’ve lived with it for too long.  Remembering back to when I first noticed it, I thought there was some nearby electrical device humming. Was it my phone plugged into the wall outlet? Or??? I remember getting up from bed to hunt for the source of that noise.

    I can’t forget that noise or ignore it. It doesn’t go away. But it doesn’t dominate my headspace. I’ve learned to slip between that noise and my desire to sleep or to just be in a quiet place, and not let it distract me. I’ve learned to deal with tinnitus.

    Recently I finished reading Noise: A Flaw in Human Judgment by Daniel Kahneman, Olivier Sibony, and Cass Sunstein.

The entire book is about the “noise” in human judgments and what we can do to lessen its effects. So what exactly is this noise? A simple definition: “noise” is undesirable variability in judgments. Call this system noise if you will.

Both recurring and one-time decisions are influenced by noise. Depending on the time of day, how well I slept last night, what others say, and even how we as a group decide how to decide, my judgments vary. This noise, in addition to any biases I have, affects all my judgments.

    Kahneman, Sibony and Sunstein introduce two different types of system noise: level noise and pattern noise.  Let’s consider each in turn.

    Level noise is easiest to understand. It is the variability in the average level of judgments by different people. People judge on different scales. Consider rating a talk at a conference. Perhaps you never give a conference speaker the highest possible rating because you believe they could do better. Or, maybe if you are star struck, you always rank a presentation from a well-known speaker more highly. Personally, I know that I tend to not rate speakers either as very high or very low, because, well…I’m sort of middling with my ratings. On average, humans aren’t average in their judgments.

The other kind of noise, pattern noise, is often an even bigger factor in our judgments. It comprises two parts: occasion noise and our own personal idiosyncratic tics. Occasion noise is the variability in judgment at different points in time. Depending on my mood, how stressful the situation is, how well I slept last night, or how the question is put to me, my judgment will vary. A simple example of occasion noise that software folks can relate to is estimating how long it will take to complete a task. My mind isn’t the same today as it was yesterday. Heck, from moment to moment, I might give a different answer simply because I am thinking about the task differently, or because I am hungry (and hence tend to come to a snap judgment), or I’m grumpy, or I’m happy.

    The second source of pattern noise is our personal attitudes toward the particular judgment context. Consider, for example, this kind of noise when reviewing conference proposals for papers or talks. Some reviewers are harsher in their personal rating for some proposals and more lenient in others. This variability reflects a complex pattern in the individual attitudes of reviewers toward particular proposals. For example, one person may be relatively generous in their review of proposals on a particular topic. Another may be particularly keen on proposals that seem to break new ground but be a harsher judge of proposals on topics that are perceived to cover familiar territory.

    As Kahneman, Sibony, and Sunstein state: “Noise in individual judgment is bad enough. But group decision making adds another layer to the problem. Groups can go in all sorts of directions depending in part on factors that should be irrelevant. Who speaks first, who speaks last, who speaks with confidence, who is wearing black, who is seated next to whom, who smiles or frowns or gestures at the right moment—all these factors, and many more, affect outcomes.”

Having been part of many conference review teams as well as on the receiving end of their reviews over the years…I find the dynamics of group decisions to be a particularly salient example of system noise.

The information about system noise in general, and noise in group decision making in particular, can be rather depressing. If we humans are naturally wired to be imperfect and flawed in our judgments, how can we hope to make reasonable decisions? And once we become aware of our judgment errors and try to be better decision-makers, the actions these authors suggest for lessening the effects of noise seem surprisingly difficult to carry out.

    Awareness is a good first step. But it’s not enough. In contrast to tinnitus, of which I’m constantly aware, the noise in our judgments is at first difficult to perceive. But once you become aware of sources of system noise, you start noticing them everywhere. How and when (and even whether) it is appropriate or feasible to mitigate these sources of noise is a topic for another day.

    Splitting a Domain Across Multiple Bounded Contexts

How designing for business opportunities and the rate of change may give you better contexts.

    I have started collaborating with Mathias Verraes and writing on the topic of Bounded Contexts and strategic design. This blog post is the first in what I hope will be an ongoing discussion about design choices and their impacts on sustainable software systems. –Rebecca

Imagine a wholesaler of parts for agricultural machines. They’ve built a B2B webshop that resellers and machine servicing companies use to place orders. In their Ubiquitous Language, an Order represents this automated process. It enables customers to pick products, apply the right discounts, and push the order to Shipment.

    Our wholesaler merges with a competitor: They’re an older player with a solid customer base and a huge catalog. They also have an ordering system, but it’s much more traditional: customers call, and an account manager enters the order, applies an arbitrary discount, and pushes it to Shipment.


    The merged company still has a single Sales Subdomain, but it now has two Sales Bounded Contexts. They both have concepts like Order and Discount in their models, and these concepts have fundamentally the same meaning. The employees from both wholesalers agree on what an order or a discount is. But they have different processes for using them, they enter different information in the forms, and there are different business rules.

    In the technical sense that difference is expressed in the object design, the methods, the workflows, the logic for applying discounts, the database representation, and some of the language. It runs deeper though: for a software designer to be productive in either Bounded Context, they’d have to understand the many distinctions between the two models. Bounded Contexts represent Understandability Boundaries.
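A minimal sketch of what “same concepts, different models” might look like in code. Everything here (context names, fields, discount rules) is invented for illustration, not taken from the actual systems:

    from dataclasses import dataclass
    from decimal import Decimal
    from typing import List, Optional

    # webshop_sales context: the self-service B2B webshop
    @dataclass
    class WebshopOrder:
        customer_id: str
        product_lines: List[str]
        promo_code: Optional[str] = None  # discounts come from codified promo rules

        def discount(self) -> Decimal:
            # computed automatically from the promo code
            return Decimal("0.05") if self.promo_code else Decimal("0")

    # phone_sales context: account managers entering phoned-in orders
    @dataclass
    class PhoneOrder:
        account_manager: str
        customer_id: str
        product_lines: List[str]
        negotiated_discount: Decimal = Decimal("0")  # arbitrary, set by a human

        def apply_discount(self, pct: Decimal) -> None:
            # no promo rules here; the account manager decides
            self.negotiated_discount = pct

Both classes are “an Order with a Discount,” yet the data, the rules, and the workflow around them differ, which is exactly why a developer must switch models when crossing the boundary.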

In a perfectly designed system, our ideal Bounded Contexts would usually align nicely with the boundaries of the subdomains. In reality though, Bounded Contexts follow the contours of the evolution of the system. Systems evolve along with the needs and opportunities of organisations. And unfortunately, needs and opportunities don’t often arise in ways that match our design sensibilities. We’re uncomfortable with code that could be made more consistent, if only we had the time to do so. We want to unify concepts and craft clean abstractions, because we think that is what we should do to create a well-designed system. But that might not be the best option.

    Deliberate Design Choices

    The example above trivially shows that a single subdomain may be represented by multiple Bounded Contexts.

    Bounded Contexts may or may not align with app or service boundaries. Similarly, they may or may not align with domain boundaries. Domains live in the problem space. They are how an organisation perceives its areas of activity and expertise. Bounded Contexts are part of the solution space; they are deliberate design choices. As a systems designer, you choose these boundaries to manage the understandability of the system, by using different models to solve different aspects of the domain.

    You might argue that in the wholesaler merger, the designers didn’t have a choice. It’s true that the engineers didn’t choose to merge the companies. And there will always be external triggers that put constraints on our designs. At this point however, the systems designers can make a case for:

    • merging the two Sales Contexts,
    • migrating one to the other,
    • building a new Sales Context to replace both,
    • postponing this effort,
    • or not doing anything and keeping the existing two Contexts.

    These are design choices, even if ultimately the CEO picks from those options (because of the expected ROI for example). Perhaps, after considering the trade-offs, keeping the two Sales Contexts side by side is the best strategic design choice for now, as it allows the merged company to focus on new opportunities. After all, the existing systems do serve their purpose. The takeaway here is that having two Bounded Contexts for a single Subdomain can be a perfectly valid choice.

    Twenty Commodities Traders

When there is no external trigger (unlike the wholesaler merger, where the decision was forced from outside), would you ever deliberately choose to split a single domain over multiple Contexts?

    One of us was brought in to consult for a small commodities trader. They were focused on a few specialty commodities, and consisted of 20 traders, some operational support roles, and around 10 developers. The lead engineer gave a tour of the system, which consisted of 20 Bounded Contexts for the single Trading Domain. There was one Context for each trader.

    This seemed odd, and our instinct was to identify similarities, abstract them, and make them reusable. The developers were doing none of that. At best they were copy-pasting chunks of each other’s code. The lead engineer’s main concern was “Are we doing the optimal design for our situation?” They were worried they were in a big mess.

    Were they?

    Every trader had their own representation of a trade. There were even multiple representations of money throughout the company. The traders’ algorithms were different, although many were doing somewhat similar things. Each trader had a different dashboard. The developers used the same third party libraries, but when they shared their own code between each other, they made no attempt at unifying it. Instead, they copied code, and modified it over time as they saw fit. A lot of the work involved mathematical algorithms, more than typical business oriented IT.
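For a flavor of what that looked like, here is an invented sketch (names and fields are my own, not the client’s): two traders’ Contexts might each model a trade, and even money, in their own way.

    from decimal import Decimal

    # trader "alpha": money as integer cents, quantity as whole units
    class AlphaTrade:
        def __init__(self, symbol: str, qty: int, price_cents: int):
            self.symbol = symbol
            self.qty = qty
            self.price_cents = price_cents

    # trader "beta": not even called a trade; money as Decimal plus a currency code
    class BetaPosition:
        def __init__(self, contract: str, lots: float, price: Decimal, currency: str):
            self.contract = contract
            self.lots = lots
            self.price = price
            self.currency = currency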

It turned out that every trader had unique needs. They needed to move fast: they experimented with different algorithms, projections, and ways of looking at the market. The developers served the traders and worked in close collaboration with them, constantly turning their ideas into new code. The traders were highly valued; they were the prima donnas in a high-stress, highly competitive environment. You couldn’t be a slow coder or a math slacker if you wanted to be part of this. There were no (Jira) tickets, no feature backlogs. It was the ultimate fast feedback loop between domain experts and programmers.

Things were changing very rapidly, every day. Finding the right abstractions would have taken a lot of coordination and would have slowed development drastically. An attempt at unifying the code would have destroyed the company.

    This design was not full of technical debt. It also wasn’t a legacy problem, where the design had accidentally grown this way over the years. The code was working. This lack of unifying abstractions was a deliberate design choice, fit for purpose, even if it seems like a radical choice at first. And all the developers and traders were happy with it.

These weren’t merely separate programs with a lot of repeated code either. This was a single domain, split over 20 Bounded Contexts, each with its own domain model, its own Ubiquitous Language, and its own rate of change. Coordinating the language and concepts across the models would have increased the friction in this high-speed environment. By deliberately choosing to design individual Contexts, they eliminated that friction.

    Trade-offs

There are consequences of this design choice: when a developer wanted help with a problem, they had to bring the other developer up to speed. Each developer, when working in another Bounded Context, expected to make a context switch. After all, their terms and concepts differed, even though they shared similar terminology. Context switching has a cost, which you’ve probably experienced if you’ve worked on different projects throughout a day. But here, because the Contexts were clearly well bounded, this didn’t cause many problems. And sometimes, by explaining a problem to another developer with a similar background (but a different Bounded Context), solutions became obvious.

    Multiple Bounded Contexts in Ordinary IT Systems

The trading system is an extreme example, and you won’t come across many environments where a single Subdomain with 20 Bounded Contexts would make sense. But there are many common situations where you should consider splitting a domain. If in your company, the rules about pricing for individual and corporate customers are different, perhaps efforts to unify these rules in a single domain model will cost more than it is worth. Or in a payroll system, where the rules and processes for salaried and hourly employees are different, you might be better off splitting this domain.
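As a sketch of that pricing case (the rules and names are my own invention), two small, context-specific models can stay simpler than one unified model that has to serve both:

    from decimal import Decimal

    # consumer_pricing context: flat list prices, occasional promotions
    def consumer_price(list_price: Decimal, promo_discount: Decimal = Decimal("0")) -> Decimal:
        return list_price * (Decimal("1") - promo_discount)

    # corporate_pricing context: negotiated contract rates plus volume tiers
    def corporate_price(list_price: Decimal, contract_rate: Decimal, volume: int) -> Decimal:
        tier_discount = Decimal("0.05") if volume >= 100 else Decimal("0")
        return list_price * contract_rate * (Decimal("1") - tier_discount)

Each function can change when its part of the business changes, without forcing a shared “Price” abstraction to absorb every rule.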

    Conclusion

The question is not: Can I unify this? Of course you can. But should you? Perhaps not. The right Context boundaries follow the contours of the business. Different areas change at different times and at different speeds. And over time, what appears similar may diverge in surprising and unexpectedly productive ways (if given the opportunity). Squeezing concepts into a single model is constraining. You’re complicating your model by making it serve two distinct business needs, and you’re taking on a continued coordination cost. It’s a hidden dependency debt.

There are two heuristics we can derive here:

    1. Bounded Contexts shouldn’t serve the designer’s sensibilities and need for perfection, but enable business opportunities.
    2. The Rate of Change Heuristic: Consider organizing Bounded Contexts so they manage related concepts that change at the same pace.

    Written by Mathias Verraes @mathiasverraes and Rebecca Wirfs-Brock @rebeccawb.

    Design (Un)Certainty and Decision Trees

Billy Vaughn Koen, in Discussion of the Method: Conducting the Engineer’s Approach to Problem Solving, says, “the absolute value of a heuristic is not established by conflict, but depends upon its usefulness in a specific context.”

    Heuristics often compete and conflict with each other. Frequently I use examples I find in presentations or blog posts to illustrate this point.

    For example, I took this photo of a presentation by Michiel Overeem at DDD Europe where he surveyed various approaches people employed to update event stores and their event schemas.

Five different alternatives for updating event stores


Event stores are often used (though not exclusively) in event-sourced architectures, where instead of storing data in traditional databases, changes to application state are stored as a sequence of events. For an introduction to event-sourced architectures see Martin Fowler’s article. An event store is composed of a collection of uniquely identified event streams, each of which contains a collection of events. Events are typed, uniquely identified, and contain a set of attribute/value pairs.
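Here’s a minimal sketch of that structure in Python, just to fix the vocabulary (the event type and attributes are hypothetical, not from any particular product):

    import uuid
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Event:
        event_id: str
        event_type: str                # e.g. a hypothetical "OrderPlaced"
        attributes: Dict[str, object]  # attribute/value pairs

    @dataclass
    class EventStream:
        stream_id: str
        events: List[Event] = field(default_factory=list)

        def append(self, event_type: str, **attributes) -> Event:
            event = Event(str(uuid.uuid4()), event_type, dict(attributes))
            self.events.append(event)
            return event

    # the store itself: a collection of uniquely identified streams
    event_store: Dict[str, EventStream] = {}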

Michiel found five different ways designers addressed schema updates, each with different tradeoffs and constraints. The numbers across the top of the slide indicate the number of times each approach was used across the 24 people surveyed. Several used more than one approach (if you add up the x/24 counts at the top of the slide, the total is more than 24). Simply because you successfully updated your event store using one approach doesn’t mean you must do it the same way the next time. It is up to us as designers to sort things out and decide what to do next based on the current context. The nature of the schema change, the amount of data to be updated, the sensitivity of that data, and the amount of traffic that runs through apps that use the data all play into deciding which approach or approaches to take.

    The “Weak schema” approach allows for additional data to be passed in an event record. The “upcaster” approach transforms incoming event data into the new format. The “copy transform” approach makes a copy and then updates that copy. Michiel found that these were the most common. “Versioned events” and “In-Place” updates were infrequently applied.
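As a rough illustration of the weak-schema idea (the attribute names and defaults are invented), a reader tolerates missing or extra attributes rather than demanding an exact schema version:

    def read_order_placed(attributes: dict) -> dict:
        # Tolerant reading: required fields must be present, newer or removed
        # fields get defaults, and unknown extra attributes are simply ignored.
        return {
            "order_id": attributes["orderId"],
            "total": attributes.get("total", 0),
            "discount_code": attributes.get("discountCode"),
        }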

I always want to know more about what drives designers to choose a particular heuristic over another. So I was happy to read the research paper Michiel and colleagues wrote for SANER 2017 (the International Conference on Software Analysis, Evolution and Reengineering), appropriately titled The Dark Side of Event Sourcing: Managing Data Conversion. Their paper, which gives a fuller treatment of options for updating event stores and their data schemas, can be found here. In the paper they characterized updates as being either basic (simple) or complex. Updates could be made to events, event streams, or the event store itself.

    Basic event updates included adding or deleting an attribute, or changing the name or value of an attribute. Complex event updates included merging or splitting attributes.

    Basic stream update operations were adding, deleting, or renaming events. Complex Stream updates involved merging or splitting events, or moving attributes between events.

    Basic store updates were adding, deleting or renaming a stream. Complex store updates were merging and splitting streams or moving an event from one stream to another.

    An example of a simple event update might be deleting a discountCode attribute. An example of a complex event update might be splitting an address attribute into various subparts (street, number, etc.).
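An upcaster for that complex update might look roughly like this sketch (the event shape and names are my own invention, not the paper’s):

    def upcast_customer_registered_v1_to_v2(event: dict) -> dict:
        # Split the single "address" attribute into street and number
        # when an old (v1) event is read, producing the v2 shape.
        attrs = dict(event["attributes"])
        if "address" in attrs:
            street, _, number = attrs.pop("address").rpartition(" ")
            attrs["street"] = street
            attrs["number"] = number
        return {**event, "schema_version": 2, "attributes": attrs}

    old_event = {"event_type": "CustomerRegistered",
                 "schema_version": 1,
                 "attributes": {"name": "Ada", "address": "Main Street 42"}}
    print(upcast_customer_registered_v1_to_v2(old_event))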

    Importantly, their paper offered a decision framework for selecting an appropriate update strategy which walks through a set of decisions leading to approaches for updating the data, the applications that use that data, or both. I’ve recreated the decision tree they used to summarize their framework below:

Decision Tree for Upgrading an Event Store System from The Dark Side of Event Sourcing: Managing Data Conversion


    The authors also tested out their decision framework with three experts. Those experts noted that in their experiences they found complex schema updates to be rare. Hmm, does that mean that half of the decision tree isn’t very useful?

They also suggested that, instead of updating event schemas, compensating techniques could accomplish similar results. For example, they might write code to create a new projection used by queries which contains an event attribute split into subparts—no need to update the event itself. Should the left half of their decision tree be expanded to include other compensating techniques (such as rewriting projection code logic)? Perhaps.

Also, the experts preferred copy or in-place transformations over approaches that involved writing and maintaining lots of conversion code. I wonder: did they do this even in the case of “basic” updates? If so, what led them to that decision?

Decision trees are a good way to capture design options. But they can also be deceptive about their coverage of the problem/solution space. A decision table is considered balanced or complete if it includes every possible combination of input variables. We programmers are used to writing precise, balanced decision structures in our code based on a complete set of parameters. But the “variables” that go into deciding which event schema update approach to take aren’t so well-defined.

Initially, this decision framework seemed comprehensive to me. On further reflection, I believe it could benefit from incorporating the heuristics used by more designers and maintainers of production event-sourced systems. While the framework is a good first cut, the decision tree for schema updates needs more detail (and some rework) to capture more of the factors that go into schema update design. I don’t know of an easy way to capture these heuristics except through interviews, conversations, and observing what people actually do (rather than what they say they prefer).

    For example, none of the experts preferred event record versioning, and yet, if it were done carefully, maybe event versions would require less maintenance. Is there ever a good time for “converting” an old event version to a newer one? And, if you need to delete an attribute on an event because it is no longer available or relevant, what are the implications of deprecating, rather than deleting it? Should it always be the event consumers’ responsibility to infer values for missing (deleted) attributes? Or is it ever advantageous to create and pass along default values for deleted attributes?

This led me to consider the myriad heuristics that go into making any data schema change (not just for event records): under what conditions is it better to do a quick, cheap, easy-to-implement update to a schema—for example, packing on additional attributes and adopting a policy of never deleting any?

People in the database world have written much about gritty schema update approaches. While event stores are a relatively new technology, heuristics for data migration are not. But in order to benefit from these insights, we need to reach back in time and recover and repurpose those heuristics for event stores. Those who build production event-sourced architectures have likely done this, or else they’ve had to learn some practical data schema evolution heuristics through experience (and trial and error). As Billy Vaughn Koen states, “If this context changes, the heuristic may become uninteresting and disappear from view awaiting, perhaps, a change of fortune in the future.” He wasn’t talking about software architecture and design heuristics per se, but he could have been.

    Reconciling New Design Approaches with What You Already Know

    Last week at the deliver:Agile conference in Nashville I attended a talk by Autumn Crossman explaining the benefits of functional programming to us old timey object-oriented designers. I also attended the session led by Declan Whelan and Sean Button, on “Overcoming dys-functional programming. Leverage & transcend years of OO know-how with FP.”

    The implication in both talks was that although objects have strengths, they are often abused and not powerful enough for some of today’s problems. And that now is an opportune time for us OO designers to make some changes to our preferred ways of working.

    Yet I find myself asking: when should I step away from what I’ve been doing and know how to do well and step into a totally new design approach?

    No doubt, functional programming is becoming more popular. But objects aren’t going away, either.

There are some benefits of pure functional solutions to certain design problems. Pure functional programming solutions don’t have side effects. You make stream-processing steps easily composable by designing little, single-purpose functions operating over immutable data. You are still transforming data; it just isn’t being mutated in place. In OO terms, you aren’t changing the internal state of objects, you are creating new objects with different internal state. By using map-reduce you can avoid loop/iteration/end-condition programming errors (letting powerful functions handle those details). No need to define variables and counters. This is already familiar to Smalltalk programmers via the do:, collect:, select:, and inject:into: methods which operate on collections (Ruby has its equivalents, too). And by operating on immutable data, multi-threading and parallelization get easier.
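In Python, that style looks something like this contrived little pipeline (the numbers are made up; it’s just to show the shape):

    from functools import reduce

    prices = (12.0, 7.5, 3.25, 42.0)   # an immutable tuple of inputs

    with_tax = tuple(map(lambda p: p * 1.2, prices))        # like Smalltalk collect:
    expensive = tuple(filter(lambda p: p > 10, with_tax))   # like select:
    total = reduce(lambda acc, p: acc + p, expensive, 0.0)  # like inject:into:

    # prices is untouched; each step produced a new value instead of mutating in place
    print(with_tax, expensive, total)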

    I get that.

    But I can create immutable data using OO technology, too. Ever hear of the Value Object pattern? Long ago I learned to create designs that included both stateful and immutable objects. And to understand when it is appropriate to do so. I discovered and tweaked my heuristics for when it made sense to stream over immutable data and when to modify data in place. But in complex systems (or when you are new to libraries) it can be difficult to suss out what others are doing (or in the case of libraries, what they are forcing you to do).
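For instance, a frozen dataclass gives you an immutable Value Object in plain OO Python; the Money example below is the classic one, sketched from memory rather than from any particular codebase:

    from dataclasses import dataclass
    from decimal import Decimal

    @dataclass(frozen=True)
    class Money:
        amount: Decimal
        currency: str

        def add(self, other: "Money") -> "Money":
            assert self.currency == other.currency
            return Money(self.amount + other.amount, self.currency)

    price = Money(Decimal("10.00"), "EUR")
    total = price.add(Money(Decimal("2.50"), "EUR"))  # price itself never changes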

    But that’s not the point, really. The point is, once you understand how to use any technique, as you gain proficiency, you learn when and where to exploit it.

    But is pure functional programming really, finally the panacea we’ve all been looking for? Or is it just another powerful tool in our toolkit? How powerful is it? Where is it best applied?

I’m still working through my answers to these questions. My answers will most likely differ from yours (after all, your design context and experience are different). That’s OK.

    Whenever we encounter new approaches we need to reconcile them with our current preferred ways of designing. We may find ourselves going against the grain of popular trends or going with the flow. Whatever. We shouldn’t be afraid of trying something new.

    Yet we also shouldn’t too easily discount and discard approaches that have worked in the past (and that still work under many conditions). Or, worse yet, we shouldn’t feel anxious that the expertise we’ve acquired is dated or that our expertise can’t be transferred to new technologies and design approaches. We can learn. We can adapt. And, yet, we don’t have to throw out everything we know in order to become proficient in other design approaches. But we do have to have an open mind.

We also shouldn’t be seduced by promises of “silver bullets.” Be aware that evangelists, enthusiasts, and entrepreneurs frequently oversell the utility of technologies. To get us to adopt something new, they often encourage us to discard what has worked for us in the past.

    While I like some aspects of functional programming, I see the value in multi-paradigm programming languages. I’m not a purist. Recently I’ve written some machine learning algorithms in Python for some Coursera courses I’ve taken. During that exercise, I rediscovered that powerful libraries make up for the shortcomings and quirks of any programming language. I still think Python has its fair share of quirks.

And while some consider Python to support functional programming, it isn’t a pure functional language. It takes that middle ground, as one Stack Overflow writer observes:

    “And it should be noted that the “FP-ness” of a language isn’t binary, it’s on a continuum. Python doesn’t have built in support for efficient manipulation of immutable structures as far as I know. That’s one large knock against it, as immutability can be considered a strong aspect of FP. It also doesn’t support tail-call optimization, which can be a problem when dealing with recursive solutions. As you mentioned though, it does have first-class functions, and built-in support for some idioms, such as map/comprehensions/generator expressions, reduce and lazy processing of data.”
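The idioms the quote mentions are easy to show in a few lines (a contrived example of my own):

    from functools import reduce

    squares = (n * n for n in range(1_000_000))       # lazy generator expression
    odd_squares = (s for s in squares if s % 2 == 1)  # still lazy, nothing computed yet

    first_five = [next(odd_squares) for _ in range(5)]  # work happens only on demand
    total = reduce(lambda a, b: a + b, range(10))       # first-class functions + reduce
    print(first_five, total)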

Python’s a multi-paradigm language with incredible support for matrix operations and a wealth of open source machine learning libraries.

I haven’t had an opportunity to dial up the knobs and solve larger design problems in a pure functional style. I hope to do so soon. My current thinking is that a pure functional style of programming works well for streaming over large volumes of data. I’m not sure how it helps support quirky, ever-changing business rules and lots of behavioral variations based on system state. Reconciling my “go to” design approaches with new ways of working takes some mental lifting and initial discomfort. But when I do take the time to try new design approaches, I have no doubt that I’ll find some new heuristics, polish existing ones, and learn more about design in the process.

    What we say versus what we do

    I’ve been hunting design heuristics for a couple of years. I’ve had conversations with designers in order to draw out their “go to” heuristics. I’ve joined design and programming sessions with experienced designers and captured on-the-fly what we were doing. My goal is to learn ways to effectively find heuristics in the wild, distill them, and then share them broadly.

    But lately, I’ve been thinking about how to deal with this puzzle: What people say they do isn’t what they really do.

    Let me give you an example. I joined the Cucumber folks last summer for several remote mobbing sessions. One heuristic they shared with me was this:

    Heuristic: the person who has the most to learn (or knows the least about how to solve the problem) should take on the role of driver.

    In “classic” mob programming as initially described, the person who is the driver and has his or her hands on the keyboard follows guidance of navigators—other mobbers who ostensibly guide the driver on what to do in order to make progress.

    “In this “Driver/Navigator” pattern, the Navigator is doing the thinking about the direction we want to go, and then verbally describes and discusses the next steps of what the code must do. The Driver is translating the spoken English into code. In other words, all code written goes from the brain and mouth of the Navigator through the ears and hands of the Driver into the computer.”

    What I observed the Cucumber mob doing was somewhat different. Sometimes the driver had an initial design idea and was keen to try it out. In this case, they often actively navigated and drove at the same time. Occasionally others would comment and offer advice. But mostly they just watched the design and implementation unfold. Sometimes that eager driver asked the others, should we try this now? But instead of waiting an uncomfortable length of time for them to chime in, the driver often continued on without any discussion. And I don’t think that driver was asking a rhetorical question. They wanted feedback if someone had any.

At other times the driver would stop to collect their thoughts and force a discussion. In this case the driver became uncomfortable when they didn’t get enough feedback. And sometimes they took themselves out of the driver’s role, asking someone else to fill in. In short, while I observed that the driver was often in control of the wheel (and of forward progress), they didn’t overly dominate. Drivers rotated. Everyone got their turn. But how these switches happened was very dynamic.

    In all fairness, the mob programming website did touch on drivers and their participation in discussions:

“The main work is Navigators “thinking, describing, discussing, and steering” what we are designing/developing. The coding done by the Driver is simply the mechanics of getting actual code into the computer. The Driver is also often involved in the discussions, but her main job is to translate the ideas into code.”

While the main job of the driver may be “mechanics,” the small, fast-moving Cucumber team didn’t insist that getting the code into the computer be the driver’s main function. Now mind you, I suspect being remote affected their style of communication. They also knew each other well and knew each other’s common design approaches and preferences.

    So why did the Cucumber mob behave this way? Did they believe one way but consciously act in another way? Did they intentionally lie about their heuristics? Or were they deceiving themselves? Are people wired to explain what they do through some kind of distortion field? How often do people believe one thing (and hold it up as an ideal) but then choose alternative heuristics? If so, is this OK?

    I’m not sure the team was aware that their ways of driving/navigating deviated from the conventional driver/navigator roles until I shared my observations with them. I suspect that when they first started mobbing they were more rigorous about following the “rules” for these roles. Over time they found their own ways of working. And so the heuristics they collectively use to decide what to do, what design approach to try next, and how they interact with each other are much more fluid and nuanced than the simple descriptions of drivers and navigators on the mob programming website. They don’t exactly go “by the book.” And I suspect their heuristics for how they work together are still evolving.

    So how should I as a heuristics hunter reconcile my simple goal of distilling essential heuristics with the messy realities I find on the ground?

Should I plunge into a concerted effort to sort out and formulate more nuanced heuristics? The short answer is yes. While I want to find and record both general and more particular heuristics, I’m not inclined to sort them into tidy, neat categories. After all, as Billy Vaughn Koen says, there is more than one way to solve any design problem and more than one heuristic that can work. By recording these nuances, I hope to get richer insights into the different conditions and cases and situations that lead to choosing them.

    This still leaves me with one nagging question: How can I reconcile what people say they do and believe with what they actually do? My (current) approach is that as I distill heuristics I also describe the context where I find them. Should it bother me that designers don’t do as they say they do all the time? Probably not. After all, we’re wonderfully creative problem solvers. And there are always options.