BACKGROUND

This is a talk called “The Maybe Great Idea”, originally written for Etsy’s Code As Craft speaker series.

TALK ABSTRACT

You have a great idea for a software project. But… will it work? Is it actually a great idea? Are you sure that there aren't huge risks you're not seeing that could cause the design to be torpedoed at the last minute? Could other people be quietly solving the same problem? It sure would be nice if your organisation could endorse the idea and say yes, absolutely, do it!

Once an engineering org has more than a few people, making big decisions can be hard. Often there are lots of people who can say no to an idea, but it's hard to find anyone to definitively say yes. So how do we make it easy for people with good ideas to get support and get started?

In this talk I'll tell you about our journey from the popular #YOLO method of software development, through writing RFC (Request for Comments) design documents, writing better RFCs, reviewing designs as a massive terrifying group, reviewing designs in secret smoky rooms, finding a balance that made us happy, and accidentally building a community along the way. It's a journey we're glad we've taken, and it can work for other people too.

CREDITS

The emoji came from emojipedia.com! I love that site.

Thank you Rebecca Sliter, Katie Sylor-Miller, Gordon Radlein and Sarah Milstein for reviewing! Thank you Eva Parish for “You have a great idea. Is it actually a great idea?”, which is succinct and perfect. Thank you Alex, Andrew, David, Gabe, Johnathan, Karim, Regan, Shane and Tom who came to the dry run of this talk and gave me excellent advice. And of course thank you Trish for the project.

Thanks to Biz for making amazing slides (I hope she will at some point share her Facts About Animals deck. You will all learn so much!) and to Joel for being hilarious about Herman Melville and both of them for putting up with me spending every moment tweaking slides for the last couple of months.

LINKS

See also https://noidea.dog/blog/yes-if, the blog post that inspired this talk. I also have opinions about RFCs at https://noidea.dog/blog/design-documents and https://noidea.dog/blog/nobody-could-have-predicted-that.

TWEETS

Etsy was proud to host @whereistanya as our @codeascraft speaker with our very own @ksylor moderating. Awesome lessons from her work using RFC design documents. pic.twitter.com/DprhErOVMF
— Mike Fisher (@MikeFisher_Fish) December 4, 2019

Y’all, tonight I got to listen to @whereistanya dropping knowledge and then have a conversation with her and it was awesome! Very excited to share the video when it’s ready! pic.twitter.com/Qx8y2RufeO
— Katie Sylor-Miller (@ksylor) December 4, 2019

SLIDES

Say there’s a technical problem you want to solve. You think a lot about how to solve it.

And over time, your thoughts start coming together and finally you think you know exactly how to do it. You have an idea.

It might even be a great idea! You have this whole design in your brain. What happens next?

You could just build it.

I've seen people do this, go off in a corner for a month or a quarter and churn away at their idea, refining it and creating it. When you emerge at the end of that time, then you get to show other people what you've made.

You tell your potential users, other teams, people whose software you might want to integrate with…

…and maybe all of those people love it! Maybe it's exactly solved their problems too.

But what's more likely is that someone will say...

“This doesn't quite fit my use case.”

“I don't get why this is useful? Why did you spend time on it?”

Or “I just built something really similar!”

One approach to avoid this situation is, back when we're at the point of knowing what we want to build, we write it all down. There are a bunch of names for this kind of document but for this talk I'm going to call it an RFC because that's what we call it at Squarespace and I've finally, after coming up on two years, trained myself out of saying "design document", which is what we called it at Google.

RFC stands for Request For Comments. And it is a request for comments. You might not always like the comments you get. But we'll get to that.

For now, let's say that an RFC is a way of explicitly saying...

"Here's what I want to do!"

Here's why I want to do it.

Here's when I think it will be done.

We send it to other people and ask for their review.

And they look at the goals, and decide if they think those are good goals.

And they look at the plan and decide if they think that's a good plan.

And they look at the timeline and decide if they think that's a good timeline. And they tell you.

And then you incorporate their comments and build the system, right?

And everyone is happy? The end.

So I think writing RFCs is amazing and great, but I also think the process of turning a maybe great idea into a reality is more complicated than that and we should talk about why. But I should get my bio out of the way first.

Hi! My name's Tanya. I'm a principal engineer at Squarespace, which has its headquarters here in lovely New York City! Squarespace, as you'll know if you've ever heard a podcast, is the all-in-one platform for domains, websites, online stores, and marketing tools! We're hiring! I'm whereistanya on GitHub and Twitter and I blog (here!) at noidea.dog, which is of course a Squarespace site.

Before I go back to talking about RFCs, I want to take a moment to acknowledge my domain bias which is that I've been an infrastructure/backend person for most of my career. While I have worked with wonderful product folks and mobile folks and frontend folks, that's not where my experience lies.

While this talk will look at how an organization agrees that someone should go ahead with a design, I will not be able to say intelligent things about how to collaborate with product managers and designers, or what kind of scaffolding and process is useful for making that smooth.

But, can we agree that, wherever in the stack we sit, we have a lot in common? For example, we all want to work on good and useful things that other people want to use? Let’s agree on that.

So my job's a bit weird, because I was hired as the first principal engineer at Squarespace and I had to figure out what that meant. Now that I've been doing this job for 20 months or so, I've come to the conclusion that whatever I end up doing on any given day is probably my job. But when I first joined in March of last year, the guidance I had was to "solve the big cross-team problems". Which, btw, is the best job and I recommend it. I *love* solving big cross-team problems.

So in my first week of the job, I was getting to know people, learning systems, looking around for problems to solve.

And after a few days I had this meeting with my colleague Trish. So, Trish is one of those people who can just look at a situation and see what's not working, and know exactly what needs to change for it to be wildly successful. She's a technical program manager, a TPM, which means she starts out with +15 to competence, but she has even more than usual of this incredible power to see what things need to be tweaked to make everything better. At this point, I didn't even know that but cool, someone put a 1:1 in my calendar so I went!

And we talked about a bunch of things, and one of them was that Trish asked "If you're looking for problems to solve, can you do something about RFCs?"

And I said "maybe, what's wrong with RFCs?"

Trish said “Well, infrastructure engineering at Squarespace was a small group until relatively recently. The way a small group makes decisions is that everyone who needs to know something probably sits together, eats lunch together, and talks to each other. Information flows easily.

But we've grown a lot and we recently split infrastructure into multiple teams. As we've grown, we can't all be in the same discussion. Here's our current RFC process.”

“Someone writes an RFC, a Request For Comments document. They share the doc directly with some people, and then they create a Jira ticket for it in the RFC project as well.”

“A lot of people have configured Jira so that they get a mail for every new issue in the RFC project. So an arbitrary bunch of people will find out about this RFC and some subset of them will find the subject line interesting and choose to go read it. They aren't necessarily the set of reviewers you would have chosen; you just get whoever happens to have time to read the document.”

“And they JUMP on the doc and beat the hell out of it. You requested comments? You get comments! Some of them are useful comments. Lots of them are nitpicks. Some are spelling corrections. Some of them rip the core of your idea apart. Some of them obsess about things you consider out of scope. There's a general feeling of being beaten up.”

“And then the reviewers go back to their regular lives and never think about your document again.”

“So then the author has a bunch of comment threads — everyone typing the great American novel into that little comment sidebar that’s like ten pixels wide. The author probably accepts the spelling corrections. They deal with the easier nitpicks. And then they're left with the big difficult comments. Someone wants to know how your design will handle future integration with a technology you've never heard of, and you have to go do a bunch of reading and figure out how much that matters. Someone says don't do this design, instead use this other system we're building, but it won't be ready for a quarter. And now you have to decide whether you can wait a quarter and whether you really believe they'll be ready.”

“But even worse than the comments are the lack of comments. What do you do about the comments you didn't get?

Like, you see someone has their little icon in the corner of Google docs so you know they've opened the doc, and you can even see them highlighting things like they're going to comment and then... NOTHING. And then a couple of days later they're not there any more. What does that mean? Does that mean they have no objections? Maybe they really like it? Or maybe the doc was open in one of their 700 tabs and when their icon disappeared later it's because they restarted their browser?”

“In the end the author has a decision to make.

Should they implement the thing? What do all of the comments and lack of comments add up to? Well, in the end it depends on their level of self-confidence.

People just kind of go with their gut. Maybe they do whatever they would have done if they'd never sent the thing for review at all.”

“This means we have good ideas that never get implemented because the RFC authors don't know they'll have broad support. Because people left some questions and comments but nobody said yes.

Lots of people can say no, but nobody can say yes.”

“But even if you do decide to do it, there are more problems.”

“Maybe you’re half way in and then you get to a point where someone else tells you they missed the initial RFC and are surprised about what you’re doing… and it turns out that's a team you were depending on to do something. And they don't have time to do the something you need them to do and so that part of your system is going to be delayed by a quarter.”

“Or a different team are mad that you're making a change that will break their workflow or that you're proposing to change something that they don't really believe you understand.”

“Or senior people come along and see the design for the first time, after it's already implemented, and are like ‘I wish you'd shown me this earlier. There's a security risk you haven't thought of, or a fatal flaw, or I just philosophically HATE IT. You cannot launch this.’”

“Anyway, if you have time, can you work on that?”

I’m like “Phew ok, so, to recap:

We've got big risks and design flaws that are discovered at the end.

We've got authors being overwhelmed with comments many of which are nitpicks and some of which are essays.

But we still don't know if the right people have seen the doc.

And in the end nobody really gives guidance on whether to do the thing.

Is that about right?”

Trish is like “YES. Do you want to do something about that?”

“DEFINITELY!”

“This sounds like EXACTLY the sort of bike shed I should get involved in on my first week in a new company.”

And now it’s 20 months later.

I want to talk about the three steps we took in Squarespace Infrastructure to try to fix some of these problems.

We built an opinionated RFC template.

We set up a regular group meeting called Infrastructure Council.

And we set up a much smaller meeting called Architecture Review.

Each of these was not perfect, but was an incremental improvement on the one before. And, spoiler, what we end up with is also not perfect. We're not done. But I like what we have so far.

I wanted to start with that first problem: fatal flaws, risks, surprising dependencies and objections not being discovered until the end, after the thing was being implemented. That's annoying for everyone involved. It would be better if the early reviewers asked all of the questions to shake out important information.

But it can be hard to ask questions. When someone sends an RFC for review, it's easy to nitpick small things, point out spelling errors and minor technical issues. It's much harder to ask nicely "how much do you know what you're doing here?"

And it's hard to ask important questions that you think maybe the author *must* have thought about, but just didn't mention because it's so obvious. Like say I'm writing some new service that stores private data and I'm putting it directly on the internet, you might want to ask me "you talked to the security team about reviewing this, right?". But that's such an obvious question to ask.

You might think I’ll get mad at you for even asking! (I wouldn't btw. I'd say thank you for asking!)

But a lot of questions can feel really obvious like that. Like, "You said you're doing DNS lookups. Do the people running the name servers know you're about to send them 10x their current traffic?" or "When you've said you're going to use this new type of database you heard about at a conference, were you planning to run that yourself, or are you hoping the databases team will do that for you?". It feels almost RUDE to ask those questions, because SURELY the author already has thought of it. But someone has to ask or you're setting yourself up for trouble later.

So the first thing was to add an RFC template that asks all of these questions for us.

So we don't need to push through the discomfort of a slightly weird social situation while reviewing.

Let's be clear that this is not an original idea. Squarespace already had several RFC templates. Google has this great design doc template called Bluedoc that I think has been rewritten from memory at every company that has engineers who used to work at Google.

Anyway, I updated our RFC templates to make sure they included some very specific questions.

Like, what problem are you trying to solve? And just as important, what problems are you not trying to solve? I added an explicit non-goals section.

What existing internal and external systems will this one depend on? Do the people who run those systems know you're coming and do they know your scale?

What other approaches did you consider? What existing solutions were close but not quite right? How will this project replace or integrate with the alternatives?

Are you adding any new regular human processes or extra work for any teams? Do you need anything new from customer operations? If this is a new system, who will run it?

What security/privacy/compliance aspects should be considered? The template says if you're not certain, never assume there aren’t any. And it spells out how to get in touch with the security team.

And then: what risks can you think of which might make this not work, and why aren't you worried about them? And we have some examples to spark ideas: complexity, compatibility, latency, service immaturity, lack of team expertise, other work taking priority, etc.

And of course there's space to talk about the details of the design, the context it's all happening in, and so on. Diagrams. I like diagrams.

If you do all of that, you're shaking out the answers to a lot of questions up-front. The people who will need to interact with the system in future will know what they're getting and their lives get easier.

I think doing this work upfront helps the author too though. A lot of the power of the RFC template is that it lets you become your own first reviewer. It's a checklist for stuff to think about as you design.
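To make the "checklist" idea concrete, here is a minimal sketch of what being your own first reviewer could look like if the template were expressed as code. It is purely illustrative and not the actual Squarespace template or tooling: the section names are paraphrased from the questions above, and `RFCDraft`, `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical section names, paraphrasing the template questions in the talk.
REQUIRED_SECTIONS = [
    "goals",
    "non_goals",
    "dependencies",
    "alternatives_considered",
    "operational_impact",
    "security_privacy_compliance",
    "risks",
    "design",
]


@dataclass
class RFCDraft:
    """An RFC in progress: a title plus free-text answers keyed by section."""

    title: str
    sections: Dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> List[str]:
        """Return the required sections that are absent or blank."""
        return [
            name
            for name in REQUIRED_SECTIONS
            if not self.sections.get(name, "").strip()
        ]


# Example draft (made-up content): one section answered, one left blank.
draft = RFCDraft(
    title="New metadata service",
    sections={"goals": "Store service metadata in one place.", "risks": "  "},
)
print(draft.missing_sections())
```

Anything the function returns is a question a reviewer would otherwise have to ask out loud.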

So that should help with design flaws. But what about all of those comments and deciding which ones take priority? If one person hates the design, how much does that matter? In the end, after all the comments, can the author go ahead and build the thing?

We added an approvers section. The approvers section lists the people whose opinions you have decided you most care about. If all of these people say yes, then you're going to go ahead.

It's unambiguous. If the approvers leave vague comments, they still need to come up here to the top of the doc and type the words yes, or occasionally no, or "yes if you address my inline comments" or "not yet; I don't think this data structure will be efficient at scale". And if one of your approvers said nothing or is vague, that's not implied approval; you get to chase them down and remind them to read your doc.

So that added a lot of clarity. The RFC author stated in advance whose comments they planned to care most about, and got an explicit yes from those people.
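The approvers rule can be written down as a small sketch. This is an illustration only, not a tool we actually run: `Response`, `can_proceed` and `needs_chasing` are hypothetical names, the approver names are placeholders, and treating "yes if" as not yet a go-ahead until the condition is resolved is an assumption made for this example.

```python
from enum import Enum
from typing import Dict, List, Optional


class Response(Enum):
    YES = "yes"
    YES_IF = "yes if"    # conditional; assumed here to block until resolved
    NOT_YET = "not yet"
    NO = "no"


def can_proceed(approvers: Dict[str, Optional[Response]]) -> bool:
    """True only when every listed approver has explicitly said yes.

    None means the approver has not responded. Silence is never approval.
    """
    return bool(approvers) and all(
        response is Response.YES for response in approvers.values()
    )


def needs_chasing(approvers: Dict[str, Optional[Response]]) -> List[str]:
    """Approvers who have said nothing yet and need a reminder."""
    return sorted(name for name, response in approvers.items() if response is None)


# Placeholder approvers: one yes, one conditional yes, one silent.
approvers = {
    "approver_a": Response.YES,
    "approver_b": Response.YES_IF,
    "approver_c": None,
}
print(can_proceed(approvers))
print(needs_chasing(approvers))
```

The important design choice is that a missing response is a distinct state from any answer, so it can never be mistaken for approval.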

But we didn't have a good way of making sure those were all of the right people. We still had projects being implemented without domain experts or teams running dependencies seeing the design or knowing it existed.

People should be able to choose their reviewers, but if you're going to do something weird with databases, the databases team should find out before it's in production.

So we needed a forum to advertise RFCs and let people ask questions about them. And so the next idea was to add a regular meeting called Infrastructure Council.

Why is it called Infrastructure Council? Well, it was for infrastructure. That much is easy. And our friends in product engineering already had a regular meeting called Product Backend Council. And that seemed pretty good, so I copied it.

This talk is basically the story of me copying other people's good ideas. I'm cool with that.

The Product Backend Council had some features that I really liked.

Someone who wanted their RFC reviewed would schedule it for Product Backend Council. The moderator would send out an email to say what's on the agenda. Most of the org would turn up. The RFC author would… Well, kind of read the RFC to the room, which wasn't great but then people would talk about it in depth, problems would get discovered, there was good discussion. As a nice side effect, people on teams that didn't interact every day would get to know each other. I liked that.

But I also added our own little flair to it. We started every meeting by asking:

Is anyone new?

Did anything cool happen this week?

And what's on the radar? Is anything coming up soon that everyone needs to know about?

I sent an email telling people about it and I was actually pretty nervous about whether this was going to be ok.

Because for all that I act like I know what I'm doing, I'd never had the hubris to organise a meeting and invite an entire organisation before. That's an expensive meeting! And I'd been at the company for maybe two months at this point.

I didn't know if people would come. I didn't know if the agenda was interesting to people who weren’t me.

But we did the meeting and… it was honestly pretty great. Tons of people came — we were standing room only — and something happened that I hadn't really thought about. People applauded at the end of each presentation. And laughed together a lot. And at the end I said something like "ok hi thanks for coming, we're going to do this every two weeks so if you have something to present come talk to me" and… my friends … everyone applauded that too and then we all kind of laughed at ourselves for applauding and it was so nice.

On the way out I met Trish (whose opinion, you've probably noticed, matters to me) and she told me she'd loved Infra Council and I felt like:

I have solved all the problems!

Trish was right to trust me with this important mission!

Don't worry, this feeling of success didn't last.

Because over time, some flaws started to become clear. Brand new problems!

One was that people were scared to present. It turns out, if you have an idea, even if you think it's maybe great, it's intimidating to go in front of your whole org and ask what they think.

And actually, people were kind of right to be scared, because the review was sometimes absolutely BRUTAL. It's bad enough when someone leaves a comment on your design document saying that you're full of crap. It's a whole other thing when they say so in front of your whole org! Like, your manager is sitting there and your whole team and your director or whatever and the first comment on your presentation is "I HAVE GRAVE CONCERNS HAVE YOU REALLY THOUGHT THIS THROUGH?". And the comments weren't even always good comments. We had all the nitpicks and vague questions that we'd had in RFCs.

So once again the presenter had the ambiguity of not knowing how much they needed to address the comments. They were from someone who wasn't an approver, so was it ok to ignore them?

So, although we still had a good community vibe, we were finding that people didn't always want to present. Or if they did, they waited until the end, until their RFC was really really polished and the prototype was really polished. Often they were basically running in production by then, so any changes would have been really unwelcome.

So the review came too late, and design flaws and risks still weren't being noticed until the end. And anyway, with three presentations in the meeting, each RFC only got 15 minutes of review, which wasn't enough time to introduce an idea and go deep on it.

So, ok, this helps, but it's not enough. Let's try again.

If we had any hope of people wanting to come, and being willing to come early, we needed a smaller group than infra council. And that brought us to architecture review.

I'm certain I didn't invent this either btw. My best guess is that I stole it from Gordon Radlein.

The idea of architecture review was that we would have five people -- and always the same five people -- and they would review every major RFC in infrastructure. We started with me, three staff engineers from infrastructure and one staff engineer from product engineering, to make sure we were well calibrated. I picked people who I thought could cover a lot of different areas and who would have useful comments.

Five people is not a super scary group.

So the author would still write an RFC and list any approvers they wanted, and get them to approve. And then they'd take the RFC to architecture review.

The idea was that, by always having the same people at architecture review, they'd start to build up a feeling for what stuff is happening in the organisation, and notice overlaps or inconsistencies. We would all commit to showing up having read the RFC, and then we'd spend a whole hour on it. And at the end, we'd spend the last 10 minutes explicitly saying yes, or not yet or usually "yes if...". And we'd write in the notes what everyone said, so the design author would leave the room with the documented support of a bunch of senior engineers.

And actually that worked pretty well.

Except that people immediately started to act like we were this creepy shadowy group judging from behind closed doors and we really didn't want that. We didn't want to invite the whole org but it still needed to feel open. And also sometimes we needed extra experts in the room; we don't know everything.

So now we invite a few subject matter experts for each review. Sometimes these are technology experts. Sometimes it's some key users of the system, or teams who run systems this one depends on.

Since the experts rotate through, it makes the room not feel so secret. Also we take detailed notes and those are always public. And we announce in every Infra Council what's coming up for review and what's happened at Arch Review since the previous one.

I think this has genuinely gone well. I've been asking people about how it's going and collecting feedback.

People said they found Architecture Review useful for thinking through all of the details, that the comments they got from reviewers have made them better at their jobs, and that they like being able to see their design in a big context.

I also got the best testimonial I've ever had:

"it's actually not scary".

So that’s working!

So where does that leave Infrastructure Council? Well, we've put away the weapons and leaned more on the applauding good work and giving kudos and having a community feeling. Infra Council is still for sharing information, but it's not for review. It would be considered very bad form to ask a mean question at it now.

We use it for mini tech talks, for sharing really good retrospectives and post-mortems, for describing how we did something, sometimes for presenting an RFC after architecture review is bought in on it. We have a regular “Meet A Product/Internal Engineer” segment where we invite teams from other parts of engineering to come give us perspective on how they use our infrastructure.

It's kind of our community meeting, like a tiny all hands every two weeks. 50% informational, 50% celebrating good work. I love infra council.

Product Backend Council has changed to use the same model as Infrastructure Council, btw.

So you might think I'm now like AHA I have solved all the problems! again. But no, we have a ways to go.

Because the RFC template we use was pretty big and ended up feeling too intimidating for some simple use cases and just the wrong fit for some other orgs. We've just done another pass on it to make it a bit more approachable and also to make sure it's flexible enough to meet the needs of, say, the folks in mobile. And we're waiting to see how that ends up working for everyone.

And our friends in product engineering and internal engineering started doing architecture reviews, and the load on some of the reviewers got too heavy. We're splitting it into multiple core groups, but it's not clear yet if that's the right way to do it.

And we need some sort of story for following up on what happened afterwards. Like, what if someone finds a problem with their design and needs to do some other thing? Should they do another review? TBD.

And we don't have a good story for rotating people in or out of the groups. Architecture reviews have mostly been done by staff engineers. But I think we should be inviting some senior engineers and giving them opportunities to do staff-level work.

So for all that I think we've made good progress, we're still iterating and we'll probably always be iterating. And I think that's good.

Because the point of all of this is to make something good and useful that other people want to use.

RFCs and reviews should be helping people move faster over the long term, because they're catching problems earlier and making it easier to build consensus. If it feels like gatekeeping or red tape, or it's too scary or it's not solving our problems for some other reason, we should take another look at what we’re doing and try again.

IN CONCLUSION. If you're trying to solve a problem, be prepared to try things out and iterate and iterate because process is only useful if it's actually useful. I know this is the kind of profound observation you came here for.

I’ll leave you with a slide from my six year old, who saw me making this deck and asked to help.

She wants you to know that if you keep trying to make things better, you'll feel happy at the end.

Also then she got very serious and said that I should tell everyone to write good code. So please write good code.

And that's all I have. Thank you.