> Over time, engineers realize that Code Search is more important than their IDE.
You nailed it.
One particularly hard problem Google faced w.r.t. IDE integration was the intersection of thousands of RPCs defined by protobuf declarations => generated protobuf code in 10 different languages being done transparently by blaze + not checking in generated code => where does the IDE-based indexing find the generated source files, and how does it tie them back to the original .proto file declaring the RPC? Blaze had the information available through a query, but needed ways for the user (the IDE) to optimize the query plan so that it could deliver just the proto-related dependencies with sub-second response time to keep the human users happy.
Once you found the .proto file that defined your data, you could leverage the monorepo and search over it. Let's say I am a Java programmer working on a server endpoint and want to change the format of some data I am stuffing into a protobuf. I could find the original declaration, and then find all the instances of Objective-C code that our iOS apps were using to consume it. Trivially easy with the combination of global code search and a global dependency graph.
Caveat: Code search and a monorepo let you do some amazing things. But there is a LOT of cost which Googlers tend to nostalgically gloss over. piper, citc, kythe, critique, and depserver* represent (wet finger in the wind guess) probably $100M of development effort.
Context: Googler from 2007-2025, worked on API serving infrastructure, desktop apps (yes, we checked out source code to Windows laptops), blaze/bazel, and a few others. I've seen all the developer tooling problems.
*depserver is essentially a service that holds the entire blaze dependency graph, from every file up to every buildable object across the code space. That drives the automatic testing infrastructure.
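To make the footnote concrete, here is a minimal sketch of the idea, not depserver's actual design: keep a reverse dependency graph, then walk upward from the changed files to find every target that could be affected. The labels, file names, and function names are all hypothetical.

```python
from collections import defaultdict, deque


def build_reverse_graph(deps):
    """Invert a {target: [inputs]} mapping into {input: {dependents}}."""
    reverse = defaultdict(set)
    for node, inputs in deps.items():
        for dep in inputs:
            reverse[dep].add(node)
    return reverse


def affected_nodes(reverse, changed_files):
    """Breadth-first walk from changed files to everything that depends on them."""
    seen = set(changed_files)
    queue = deque(changed_files)
    while queue:
        current = queue.popleft()
        for parent in reverse.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - set(changed_files)


# Hypothetical graph: each target lists the files and targets it depends on.
DEPS = {
    "//api:user_proto": ["api/user.proto"],
    "//server:endpoint": ["server/Endpoint.java", "//api:user_proto"],
    "//server:endpoint_test": ["server/EndpointTest.java", "//server:endpoint"],
    "//ios:profile_view": ["ios/ProfileView.m", "//api:user_proto"],
    "//tools:unrelated": ["tools/main.py"],
}

affected = affected_nodes(build_reverse_graph(DEPS), ["api/user.proto"])
```

In this toy graph, touching `api/user.proto` pulls in the Java endpoint, its test, and the iOS view, while `//tools:unrelated` is left alone. That set is what an automated test selector would schedule.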
+1. CitC and its relationship to version control (perforce/piper) is central to all development.
Let's say there were 100M files in the monorepo (an underestimate). You obviously never want to do a git clone of that. But what if clone and checkout were free? That's what CitC did. Creating a new workspace took less than 1 second and got you a FUSE filesystem that looked like it had everything in the repo. But nothing was actually downloaded until you opened files. And your local changes were also stored in the service. And this was available to the CI machines. See where I am going? CI did not have to clone the repo and apply your branch. CI just had your changes available. If you were just testing your 10 files there was no cost to having 100M other source files that were unrelated to your project.
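A toy model of that behavior, as I understand it from the description above and not how CitC is actually implemented: reads are fetched lazily on first open, and writes land in an overlay that something like CI could read directly. The class, the `fetch` callback, and the file names are all made up for illustration.

```python
class LazyWorkspace:
    """Sketch of a workspace that looks complete but downloads nothing up front."""

    def __init__(self, fetch_from_repo):
        self._fetch = fetch_from_repo  # hypothetical backend: path -> contents
        self._cache = {}               # files pulled in on first read
        self._overlay = {}             # local edits (held by the service in the real thing)

    def read(self, path):
        # Local edits win; otherwise fetch once and remember the result.
        if path in self._overlay:
            return self._overlay[path]
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
        return self._cache[path]

    def write(self, path, contents):
        self._overlay[path] = contents

    def pending_changes(self):
        # What CI would need: only the edited files, never the whole repo.
        return dict(self._overlay)


# A three-file stand-in for the 100M-file repo.
REPO = {
    "server/Endpoint.java": "class Endpoint {}",
    "api/user.proto": "message User {}",
    "ios/ProfileView.m": "@implementation ProfileView @end",
}
fetched = []


def fetch(path):
    fetched.append(path)
    return REPO[path]


ws = LazyWorkspace(fetch)        # creating the workspace fetches nothing
ws.read("server/Endpoint.java")  # first open triggers the only fetch
ws.write("server/Endpoint.java", "class Endpoint { /* edited */ }")
```

The cost scales with the files you touch, not with the size of the repo, which is the whole point.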
This solves many problems that git worktrees attempt to address, but 10 years earlier, at less local disk cost, and more performant. I miss that.
I want to start by saying that I do not want to diminish or disparage the work that Russ, Rob, Laurent, and others have done. It has made the Google code base better. That is an inarguable fact. Nor do I want to pick on buildifier or gofmt or any other tool as a singleton problem. I'll talk about buildifier because that is what I personally fight with. Others may have different demons. (YDMV - "your daemon may vary". I'm taking the authorship of that one now, in case it ever takes off).
But back to the point... formatting rules without firm, incredibly strict enforcement end up being a tax on the janitors - the people who clean the code base and do large scale changes. That makes me sad. These are the people who care a great deal about code health, and their work is hindered by the lint checks that we have imposed.
Let me give an example.
I'm trying to eliminate a constraint in the build system. It's a "small" large change - only O(30K) instances. (Yeah, Google scale is different). I have an incredible wealth of tools available to me to automate the process. For the benefit of the Googlers, I can identify Blaze targets to change, use buildozer to fix them, and ship off CLs to review. But the changes I want to make are often ones which should be reviewed by the code owners, and not globally approved. So possibly O(10K) individuals might be involved in reviews.
Let's explore the problem. First, shouts to y2mango for bringing up incremental formatters. This should be the default for all tools. And another to flymasterv for raising the question of "why not just format as each person touches a BUILD file". Here's the situation.
1. buildozer is really good at rewriting BUILD files syntactically correctly.
2. It has an unfortunate side effect of not being incremental. It calls buildifier to rewrite the entire file.
3. We update the formatting rules to make them stricter over time. That means that a "correct" BUILD file on January 1, might require changes on March 1.
4. Buildifier findings are advisory, rather than mandatory.
5. No team is staffed to repeat the monumental work this post started with.
The reality on the ground is that little-touched BUILD files become stale, and would require a formatting update over time. It is actually worse than that, because many teams take the path of ignoring buildifier warnings and committing their working code anyway. Without continual BUILD file reformatting there is a lot of stale formatting floating around. [Root cause: We could fix this by promoting people for doing that repeat work. But we don't. We promote for the initial sprint.]
And then a janitor comes along.
I use buildozer to fix a problem. It reformats an ancient BUILD file completely (not incremental). I send it to the code owner. They see changes far beyond my 2 line fix. They reject it, or ask for a change to only the two lines that actually mattered. Sure. I can hand-build the change once or twice. But not for a few hundred, or thousands of files. So.... I have to hack up an incremental format. Or, it turns out that users are very happy if I don't bother with formatting at all, and just change single lines. It's not that any individual is right or wrong. It is that they all have a choice and a preference and Google created a policy that allowed individual teams to have a choice of strict compliance or not. That is the failure.
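For the curious, the "hacked up incremental format" looks roughly like the sketch below. It diffs the original file against the edited one and applies formatting only to lines the edit actually touched, leaving stale-but-untouched lines alone. Big assumption: the formatter can work one line at a time, which a real buildifier cannot do in general, so `toy_format_line` is a stand-in I made up, not anything buildifier exposes.

```python
import difflib
import re


def format_touched_lines(original, edited, format_line):
    """Return the edited file with formatting applied only to changed lines."""
    matcher = difflib.SequenceMatcher(a=original, b=edited, autojunk=False)
    result = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag == "equal":
            # Untouched lines keep their original (possibly stale) formatting.
            result.extend(original[a_start:a_end])
        elif tag in ("replace", "insert"):
            result.extend(format_line(line) for line in edited[b_start:b_end])
        # "delete": the lines are gone in the edited file, nothing to emit.
    return result


def toy_format_line(line):
    """Hypothetical per-line rule: 4-space indent and spaces around '='."""
    body = re.sub(r"\s*=\s*", " = ", line.strip())
    top_level = body.endswith("(") or body == ")"
    return body if top_level else "    " + body


original = ["cc_library(", '  name="a",', '  deps=[":b"],', ")"]
edited = ["cc_library(", '  name="a",', '  deps=[":c"],', ")"]

result = format_touched_lines(original, edited, toy_format_line)
```

The `name` line stays in its old style and only the `deps` line is reformatted, so the reviewer sees a one-line diff. The cost is a file with mixed formatting, which is exactly the policy gap I am complaining about.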
If you are going to have a policy about code formatting:
- make it hard mandatory for everything except a "break glass" situation
- if the policy can evolve, staff a team with enforcing it globally
The fact that Google, as a company, does not reward this behavior does not take away from any individual's accomplishments. This post may sound grumpy to an outsider, but I am constantly amazed at the tools I have available to fix things on an enormous scale. The friction is usually only where we have good intentions, without the policy teeth to enforce alignment with the intentions. That's a management problem, not a technical one.
I can't speak to what has happened with Buildifier, but in general you are right. It has to be a hard rule that if you change the format rules, you have to reformat everything to match.
If that means it's too hard to change the format rules, then don't change the format rules. And if you don't reformat, then it has to be a clear rule known to everyone (or written down somewhere you can point to) that incidental formatting changes are acceptable and not something you are allowed to push back on.
I can speak to Go and gofmt, and there we are VERY reluctant to change formatting rules. It does happen for the odd corner case once in a while, but nothing that would cause "changes far beyond my 2-line fix".
That is missing the point about the misdirection of costs. Your suggestion forces the people doing meaningful semantic changes to become involuntary servants to the goal of cleaning up the stylistic problems. It's fine to have a policy that costs a little to each of the owners of their own code. It's a tax for the overall good. It becomes a problem if the cost of compliance is shifted to "the next person who looks at it." That encourages people to not look at it.
Being written in Java is essentially a non-issue. Bazel ships as a binary that self-extracts a JRE for its own use. You don't have to install Java unless you want to build Java apps.
The real issue for highly reusable products like Qt is portability of the build tool. CMake works on the long tail of OSes, some of which do not have any usable JRE available. That limits the ability to use Bazel as the only build tool.
For Bazel to be useful to projects like Qt (and other horizontal libraries like openssl, ICU, curl, ...) one would need a system for generating CMakefiles from BUILD files. The developers of those apps could use Bazel to improve build and test scaling, and then generate CMake (or other build tool) files as part of their packaging and distribution. I don't propose that this is an easy thing to do - all rules would need translation into other languages - just that it is a path which may benefit some projects.
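To give a feel for what "translation" means at its very simplest, here is a rough sketch that turns a cc_library-like description into CMake text. It assumes the rule is already available as a plain dict (hypothetical; a real tool would have to get this out of Bazel somehow) and it only understands same-package deps like `":base"`. Everything hard (selects, toolchains, custom rules, cross-package labels) is ignored.

```python
def cc_library_to_cmake(rule):
    """Emit add_library / target_link_libraries for one simple library rule."""
    name = rule["name"]
    lines = [f"add_library({name} {' '.join(rule['srcs'])})"]
    # Only same-package labels (":base") are handled in this sketch.
    deps = [dep.lstrip(":") for dep in rule.get("deps", [])]
    if deps:
        lines.append(f"target_link_libraries({name} PUBLIC {' '.join(deps)})")
    return "\n".join(lines)


# Hypothetical rule, written as the dict this sketch expects.
rule = {"name": "net", "srcs": ["net.cc", "socket.cc"], "deps": [":base"]}
cmake_text = cc_library_to_cmake(rule)
```

Multiply that by every rule type and every attribute and you can see why I said it is not easy.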
LOL. Understatement of the year, Laurent. :-)