How Facebook Develops Code

    Translation of the original article.



    I am fascinated by the way Facebook operates. It is a unique environment that is not easily replicated (nor would their system work for every company, even if they tried). These notes were gathered from conversations with many friends at Facebook about how the company develops and releases software.

    More than six months have passed since I collected these observations, and I'm sure Facebook has kept refining its development process since then, so these notes are probably a bit out of date. Facebook's engineering-driven culture is also attracting more and more public attention, so I now feel more comfortable publishing them. HUGE thanks to the many people who helped me put together this inside view of Facebook! Thanks also to epriest and fryfrog, who sent corrections and edits.

    UPD: This translation is not a polished literary work that will take your breath away, so if you can, read the original in English.


    Notes:
    • As of June 2010, the company had nearly 2,000 employees, up from about 1,100 ten months earlier: almost doubling the headcount in less than a year!
    • The two largest departments are engineering and operations, with roughly 400-500 people each. Together they make up about 50% of the company.
    • The ratio of managers to engineers is roughly 1:7 to 1:10.
    • All engineers go through a 4-6 week training program ("Bootcamp"), where they learn the Facebook system by fixing bugs and attending lectures given by senior engineers. About 10% of each Bootcamp class does not make it through and is counseled out of the company.
    • After Bootcamp, all engineers get access to the live production database (accompanied by the standard "with great power comes great responsibility" talk and a clear list of "fireable offenses", for example sharing users' private data).
    • [Ed. thx fryfrog] "There are also very good safeguards in place to keep anyone at the company from doing the terrible things you might imagine. People around you can step in and try to help solve a problem. But if you do need to "become" someone who needs help, this is logged along with the reason and carefully audited. Wandering off the straight and narrow here is, of course, not tolerated."
    • Any engineer can modify any part of Facebook's code base and check it in at will.
    • Very engineering-driven culture. "Product managers are practically useless here," as one engineer put it. Engineers can change specs mid-process, reorder work projects, and inject new feature ideas at any time.
    • During monthly cross-team meetings, engineers are the ones who report on progress. Marketing and product management attend these meetings, but if they are too outspoken, this is reported back to leadership as "product spoke too much at the last meeting." The company really wants engineers to publicly own their products and be the main point of contact for the things they built.
    • Staffing of projects is entirely voluntary.
      • A PM lobbies a group of engineers and tries to get them excited about their ideas.
      • Engineers decide which ideas sound most interesting to work on.
      • Engineers talk to their managers and say: "I'd like to work on these 5 things this week."
      • Engineering managers mostly leave engineers' preferences alone, though they may occasionally ask for certain tasks to be done first.
      • Engineers handle the entire feature themselves: front-end JavaScript, back-end database code, and everything in between. If they want help from a designer (there are only a limited number of dedicated designers), they have to get a designer interested enough to take on their project. The same goes for architects. But in most cases, engineers are expected to handle everything they need on their own.

    • Debates about whether a feature idea is worth doing are usually settled by spending a week implementing it and then testing it on a sample of users, for example 1% of users in Nevada.
    • In general, engineers prefer to work on infrastructure, scalability, and "hard problems", which is where the prestige is. It can be hard to get engineers excited about front-end and user-interface projects. This is the opposite of what you see in some other consumer businesses, where everyone wants to work on things users touch directly, so they can point at a specific part and say "I built that." At Facebook, back-end work such as the news feed algorithms, ad-targeting algorithms, memcache optimizations, etc. is the glamorous work engineers compete for.
    • Commits that touch certain high-profile features (for example, the news feed) go through code review before being merged. The news feed is important enough that Zuckerberg himself reviews any change to it, but this is an exception.
    • [Correction, thx epriest] "There is mandatory code review for all changes (by one or more engineers). I think this point was just explaining that Zuck does not personally review every change."
    • [Fix, thx fryfrog] "All changes are reviewed by at least one person, and the system is such that anyone else can look at your code even if you didn't ask them to. It would take intentionally malicious behavior to get unreviewed code in."
    • Engineers are responsible for testing, fixing bugs, and maintaining their work after launch. Several unit-testing and integration-testing frameworks are available, but they are used only sporadically.
    • [Fix, thx fryfrog] "I'd also add that we do have QA, just not an official group. Every employee who is in the office or on the VPN uses a version of the site that includes all the changes queued for the next push. This version is continuously updated and is usually 1-12 hours ahead of what the rest of the world sees. All employees are strongly encouraged to report any bugs they find, and this works very well."
    • Re: surprise at the lack of QA or automated unit tests: "Most engineers are capable of writing bug-free code; at most companies they just have no incentive to do so. When there is a QA department, it's easy to throw everything over to them to find the bugs." [Note that this was one person's subjective opinion; I included it because of the striking contrast with standard development practice at other companies.]
    • [Fix, thx epriest] "We have automated testing, including push-blocking tests that must pass before a release goes out. We absolutely do not believe that 'most engineers are capable of writing bug-free code', much less that this is a reasonable principle to base development on."
    • Re: surprise at the lack of PM influence and control: product managers have a lot of independence and freedom. The key to being influential is building really good relationships with engineering managers, and you need to be technical enough not to suggest stupid ideas. There is also no need to ask for approval or go through any roadmap/backlog review. "My product director doesn't even know everything that's on my roadmap." Accordingly, there are relatively few PMs, but each of them feels personally responsible for a really important area of the company.
    • By default, all code commits are bundled into the weekly release (on Tuesdays).
    • With extra effort, changes can go out the same day.
    • Tuesday releases require every engineer who committed code to that week's release candidate to be present.
    • Before the release starts, engineers must check in on a dedicated IRC channel for "roll call", or they are publicly shamed.
    • The operations team runs the release, rolling it out to the servers gradually.
      • Facebook has about 60,000 servers.
      • There are 9 concentric rollout levels.
      • [Fix, thx epriest] "The nine push phases are not concentric. There are 3 concentric phases (p1 = internal release, p2 = small external release, p3 = full external release). The other six phases are auxiliary tiers, such as internal tools, the video upload servers, etc."
      • The smallest level is 6 servers.
      • For example, each Tuesday release is first rolled out to 6 servers (level 1); the operations team monitors those 6 servers and makes sure they are working correctly before rolling out to the next level.
      • If there are any problems with the release (for example, errors start showing up), the rollout is halted. The engineer who committed the offending change is called in to fix it, and then the rollout starts over from the beginning.
      • So a release may go through the levels several times: 1-2-3-fix, back to 1; 1-2-3-4-5-fix, back to 1; 1-2-3-4-5-6-7-8-9.
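    The rollout loop described above can be modeled as a small simulation: walk through the levels in order, check the release after each one, and on any failure have the change fixed and restart from level 1. This is a hypothetical sketch, not Facebook's tooling; the function name `staged_rollout`, the `is_healthy` and `fix` callbacks, and the `max_attempts` cap are assumptions, and it treats the levels as a simple linear sequence even though, per the correction above, only three of the nine phases are actually concentric. The health check stands in for whatever error rates or other metrics the operations team watches.

```python
from typing import Callable, List


def staged_rollout(
    num_levels: int,
    is_healthy: Callable[[int, int], bool],
    fix: Callable[[int], None],
    max_attempts: int = 10,
) -> List[List[int]]:
    """Simulate a staged push and return the levels reached on each attempt.

    Hypothetical model: is_healthy(attempt, level) reports whether the release
    looks fine at `level`; fix(attempt) is called after a failure, before the
    push restarts from level 1.
    """
    history: List[List[int]] = []
    for attempt in range(max_attempts):
        reached: List[int] = []
        ok = True
        for level in range(1, num_levels + 1):
            reached.append(level)  # roll out to this level
            if not is_healthy(attempt, level):
                ok = False  # halt the push at the first unhealthy level
                break
        history.append(reached)
        if ok:
            return history  # every level passed: the push is complete
        fix(attempt)  # the engineer behind the bad change fixes it
    raise RuntimeError("release did not stabilize within max_attempts")


if __name__ == "__main__":
    # Fail at level 3 on the first attempt and at level 5 on the second.
    failures = {(0, 3), (1, 5)}
    runs = staged_rollout(9, lambda a, lvl: (a, lvl) not in failures, lambda a: None)
    for run in runs:
        print("-".join(str(lvl) for lvl in run))
    # prints:
    # 1-2-3
    # 1-2-3-4-5
    # 1-2-3-4-5-6-7-8-9
```

    Restarting from level 1 after every fix, rather than resuming where the push stopped, means the fixed build is always re-validated on the smallest tier first, which is the behavior the notes describe.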

    • The operations team is really well trained, close-knit, and cares about the business. Their server metrics go beyond the usual error reports, load, and memory usage: they also track user behavior. For example, if a new release changes the percentage of people using Facebook features, the operations team sees it in their numbers and can halt the release to investigate.
    • During the rollout, the operations team uses an IRC-based paging system that can reach engineers via Facebook, email, IRC, IM, and SMS when their attention is needed. Ignoring pages from operations results in public shaming.
    • Once the code has been rolled out to level 9 and is stable, the weekly push is considered complete.
    • If a feature isn't finished in time for the weekly push, that's not a big deal (unless it has hard external dependencies): it will simply ship in full whenever it is done.
    • Getting svn-blamed, publicly shamed, or slipping projects too often can get an engineer fired. "This is a very high-performance culture." People who are not productive or not super-talented really do put themselves at risk. Managers will literally take underperformers aside within 6 months of hiring and say, "It's just not working out; you're not a good fit for this culture." This applies at every level of the company: even C-level and VP-level hires have been quickly let go if they were not highly productive.
    • [Correction, thx epriest] "People are not called out for bugs. They are called out only if they asked for their changes to go into the release but were not around to support them when something went wrong (and did not find anyone to cover for them)."
    • [Correction, thx epriest] "You will NOT be fired for getting svn-blamed. We are extremely forgiving in this regard, and most of the top engineers have pushed at least one horrible thing, myself included. As far as I know, nobody has ever been fired for a mistake like that."
    • [Correction, thx fryfrog] "I also don't know of anyone who was fired for the kinds of mistakes mentioned in the article. I know people who have accidentally taken the site down. They work hard to fix whatever caused the problem, and everyone learns from it. In my opinion, public shaming is far more effective than the fear of being fired."
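    The notes mention testing a new idea on a sample of users, for example 1% of users in Nevada, but do not say how that sample is chosen. A common way to carve out such a slice, shown here purely as an assumption and not as Facebook's actual mechanism, is to hash each user ID together with a feature name into one of 10,000 buckets, so the same user always gets the same answer. The function names, the `"NV"` state code, and the SHA-256 bucketing scheme are all hypothetical.

```python
import hashlib


def in_rollout(user_id: int, feature: str, percent: float) -> bool:
    """Deterministically place user_id in the first `percent`% of 10,000 buckets.

    Hypothetical scheme: hashing feature name + user ID means the same user
    always gets the same decision for a given feature.
    """
    key = f"{feature}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < percent * 100  # e.g. percent=1.0 -> buckets 0..99


def feature_enabled(
    user_id: int, state: str, feature: str, percent: float, target_state: str = "NV"
) -> bool:
    """Enable the feature only for the sampled slice of users in target_state."""
    return state == target_state and in_rollout(user_id, feature, percent)


if __name__ == "__main__":
    sampled = sum(in_rollout(u, "new_feed_ranking", 1.0) for u in range(100_000))
    print(f"{sampled} of 100000 users fall into the 1% sample")
```

    Mixing the feature name into the hash keeps the samples for different experiments roughly independent, and because the decision is deterministic, a user does not flip in and out of the test between page loads.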


    It will be fascinating to see how Facebook's development culture evolves over time, and especially whether it can keep scaling as the company grows to many thousands of employees.
    What do you think? Would a "developer-driven culture" work at your company?

    Original article

    If you find any inaccuracies or errors in the translation, please send me a private message and I will correct them.
