SRE Interviews in Silicon Valley

A few months ago I had several interviews at some of the bigger Silicon Valley tech companies. I learned a tremendous amount in the process, and while I couldn’t openly talk about it back then, I thought that I should at least write down my experiences after the fact.

Disclaimer

Some of these companies require NDAs for the interview process in case you run into any secrets while you’re on site, so I won’t name any names or describe any specific details.
This post is supposed to summarize the similarities between the companies and to give an overview of what to expect during the tech interviewing process.

The position

My experiences were particular to the type of position I got approached for. The position was very similar for all of them. Some companies call it “Production Engineer”, some call it “Site Reliability Engineer” (SRE). The idea is the same: it is the middle ground between a systems engineer and a software engineer. The position requires in-depth knowledge of 5 different areas:

Coding
Systems
Networks
Troubleshooting
Scalable Architectures

You don’t have to know all of them, but should at least have a good knowledge of 2-3 and a basic understanding of all of them.

Coding

For the SRE position, the coding questions are not usually the brain teasers you can read about in the tabloids. They also didn’t involve very data-structure-heavy acrobatics (I didn’t have to rebalance a red-black tree or implement mergesort).

Most of the time people just want to see that you can develop reasonably complex tooling and know the pitfalls that you encounter in production.
It usually starts out as a basic task (e.g. log parsing, file pruning, …) and then gets extended a bit (“What if this had to run continuously?”).

The gotchas are the usual things that you run into when working on an actual system and not just sitting in a lecture about one.
It starts with escaping spaces and ends at multiline syslog messages and “does this file fit into RAM?” kind of problems. Most of the time, you get to pick your programming language of choice. I would usually suggest Ruby or Python. Nobody wants to get stuck in weird IO interfaces or languages that don’t support strings natively ;)
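
To make the “does this file fit into RAM?” and multiline gotchas a bit more concrete, here is a minimal sketch of the kind of log parsing task described above. The line format, the regular expression and the function name are my own assumptions for illustration, not something an interviewer handed out. The point is only that the input is consumed line by line, so the file never has to fit into memory, and that lines without a header are treated as continuations of the previous entry.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape (hypothetical): "Jan  2 03:04:05 host program[pid]: message"
HEADER = re.compile(r"^\w{3}\s+\d+\s[\d:]{8}\s(\S+)\s([^\s:\[]+)(?:\[\d+\])?:")


def count_entries_by_program(lines: Iterable[str]) -> Counter:
    """Count log entries per program while reading one line at a time."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match:
            counts[match.group(2)] += 1
        # Lines without a header (e.g. a stack trace) are assumed to be
        # continuations of the previous entry, so they are not counted.
    return counts
```

A file object is already a lazy iterator over lines, so something like `with open(path) as f: count_entries_by_program(f)` keeps memory use flat regardless of file size.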

Depending on the interviewer, you might end up having to do a little bit of string manipulation (find all palindromes, group by x, …), but since most of these string manipulations are relatively approachable, I rather enjoyed myself even though I would classify my remaining theoretical datastructure/algorithm knowledge as “could need some polish”.
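
As an illustration of how approachable these string questions tend to be, here is one possible reading of “find all palindromes”: collect every palindromic substring of a string by expanding around each possible center. The exact wording of the task, the function name and the `min_len` parameter are assumptions on my part.

```python
from typing import Set


def palindromic_substrings(text: str, min_len: int = 2) -> Set[str]:
    """Return all distinct palindromic substrings of at least min_len characters."""
    found = set()
    n = len(text)
    # There are 2n - 1 centers: n single characters and n - 1 gaps between them.
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        # Grow outwards for as long as both ends match.
        while left >= 0 and right < n and text[left] == text[right]:
            if right - left + 1 >= min_len:
                found.add(text[left:right + 1])
            left -= 1
            right += 1
    return found
```

This runs in quadratic time in the worst case, which is usually good enough to then talk about trade-offs with the interviewer.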

Systems

The systems part of the interviews is usually targeted towards Linux.
It includes filesystem knowledge (What are inodes?), knowledge about the process lifecycle (What is fork+exec? How do signal handlers work? Thread vs. process?) and Linux internals (What is load? Describe the boot process. How does dynamic linking work?).

These all require relatively in-depth answers of more than a sentence.
The more detail you can go into, the better.
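
For the “What is fork+exec?” question, it can help to have a tiny example in mind. The following is a minimal sketch using Python's thin wrappers around the underlying system calls; the function name `run_child` and the exit code 127 for a failed exec are my own choices (the latter borrowed from shell convention). It only works on Unix-like systems.

```python
import os
import sys
from typing import List


def run_child(argv: List[str]) -> int:
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; leave without running parent cleanup.
            os._exit(127)
    # Parent: block until the child has terminated and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    print(run_child([sys.executable, "-c", "print('hello from the child')"]))
```

The parent and the child both return from `os.fork()`, just with different return values, which is usually the first thing worth explaining out loud.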

Networking

At least for me, this wasn’t too much about Spanning Tree or BGP.
The networking interviews focused more on the application side of things.
There were a lot of conversations about TCP (Nagle’s algorithm, TCP_CORK, …), DNS (glue records, recursive resolvers, …), IP (CIDR), SSL, …
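
CIDR questions mostly boil down to quick arithmetic on prefix lengths. As a sketch of what that arithmetic looks like, here is a small example using Python's standard `ipaddress` module; the helper names and the example network are made up for illustration.

```python
import ipaddress


def describe_network(cidr: str) -> dict:
    """Summarize an IPv4/IPv6 network given in CIDR notation."""
    net = ipaddress.ip_network(cidr)
    return {
        "netmask": str(net.netmask),
        "addresses": net.num_addresses,
        "first": str(net[0]),
        "last": str(net[-1]),
    }


def contains(cidr: str, address: str) -> bool:
    """Check whether an address falls inside the given network."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)
```

For a /22, for example, 32 - 22 = 10 host bits remain, which gives 2^10 = 1024 addresses.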

I was once even asked what my favorite protocol was. Luckily I had skimmed my thesis on anonymous filesharing on the flight over, so I had some talking points :)

A lot of the time, you will hear open ended questions (“You type a URL in your browser and hit enter, what happens?”) and can go down the stack to your heart’s desire :)

Troubleshooting / Incident response

This part of the interview is the one that differs most between the companies.
It is probably also the hardest one to come up with as an interviewer.
It ranges from actual debugging of LAMP problems inside a VM to looking at alerts and prioritizing them, to looking at a 32 thread stacktrace and telling a story of what happened.
Some of the interviewers are able to play a D&D style “dungeon master” role and give you a hypothetical system on which a defect is manifesting itself. You then have to describe your steps to zero in on the problem while the interviewer tells you the results of your queries (“I check for inodes using df -i” - “You see that you have a utilization of 30%”).
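
If you want to understand where a number like that inode utilization comes from, here is a rough sketch that computes a similar percentage from `statvfs`. The function name is hypothetical, and the result is only an approximation of what `df -i` prints, since `df` applies its own rounding and treats inodes reserved for root slightly differently.

```python
import os
from typing import Optional


def inode_utilization(path: str = "/") -> Optional[float]:
    """Return the percentage of inodes in use on the filesystem holding path."""
    stats = os.statvfs(path)
    total = stats.f_files  # total number of inodes
    free = stats.f_ffree   # number of free inodes
    if total == 0:
        # Some filesystems do not report a fixed inode count.
        return None
    return (total - free) / total * 100.0


if __name__ == "__main__":
    print(inode_utilization("/"))
```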

Scalable architectures

This is one of the interviews that is probably a big unknown to people who have mostly dealt with smaller systems before.
The interview is usually an interactive whiteboarding session in which you have to design a system that withstands a certain number of requests. The initial requirements are relatively tame and the interviewer will gradually force your architecture to scale more and more. This is where you can bring in your knowledge about load balancers, caching layers, consistent hashing and sharding. Bonus points for fancy things like Bloom filters or HyperLogLog :)
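
Since consistent hashing tends to come up once the design needs more than one cache or storage node, here is a minimal sketch of a hash ring with virtual nodes. The class name, the choice of SHA-256 and the default of 100 virtual nodes per server are assumptions for illustration, not a reference implementation.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        # Each node is placed on the ring many times to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

The property worth pointing out on the whiteboard is that removing a node only remaps the keys that node owned; every other key keeps its owner.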

It probably also doesn’t hurt to know some of the technology that has emerged from the company in question. Most of the tech companies have 1-2 open source projects that might be worth a look beforehand.

The interview process

It seems like all of the big tech companies have agreed on a way to do interviews.

Initially, I got contacted by a recruiter. This usually seems to happen either via LinkedIn or email (maybe via a GitHub profile?).
Most recruiters will talk a bit about the position, learn about your experience and, once they deem you a fit, do a little pop quiz. The pop quiz consists of a set of 20-ish questions about all of the topics mentioned above. Usually they can be answered with a word or two. (“What port does DNS run on?”, “What is saved in an inode?”, …)

Once the initial screening has been completed successfully, there will be 3-4 phone interviews of about 45-60 minutes each. Each interview will go into one of the topics mentioned above. The coding will be done using a collaborative online editor. This editor can also be used to paste stacktraces and log entries for systems questions. You might want to brush up on what all of the letters in vmstat mean ;)

At the end of each interview, there are usually 10 minutes set aside for questions.
This is a good time to ask about the day to day stuff that the engineers might be able to answer a bit better than the recruiter.

Once all of these phone interviews are over and have gone reasonably well, the fun part starts: the on-site!

For me, this meant free flights from Boston to San Francisco! Not only did this allow me to escape the winter, but it also allowed me to spend some time driving around SF and the valley. I had never been before and was able to connect with some old friends and colleagues.
Usually the companies cover the whole trip. From airport parking to a rental car, an allowance for food and hotel stays, it’s all taken care of.

Preparation

Work is keeping me reasonably busy and I usually stay up to date by reading lots of blog posts in my free time, so my only preparation for the interviews was the book Modern Operating Systems by Andrew Tanenbaum.

Besides a few Google searches about interview questions and a look at Glassdoor, I think using the 5+ hour flight to read over the Tanenbaum book was probably the thing that helped me the most.

For the 3rd interview, I also spent some time reading Programming Pearls. Solving these kinds of math-heavy problems is nothing that comes naturally to me, but I think I got a bit of a better grasp of the problem space and of how a different perspective can sometimes reveal elegant solutions.

Conclusion

Honestly, the whole experience was highly entertaining and I learned a lot.
I wasn’t actively looking for a job, so I was able to come into those interviews without any pressure on me. It was nice to see how a well-executed HR/recruiting organization can work, and taking a peek inside all of these companies was really interesting.
I really enjoyed talking to the engineers during the interviews and getting a bit of a feeling for how the companies operate and what the people who make these giant infrastructures work do on a regular day. As an added benefit, knowing one’s market value does help a lot on the professional development side of things.