Could it be that whenever coders today need some fairly trivial functionality, they tend to go out and find a library that contains it? So you end up with lots and lots of libraries where only tiny bits of them are used. Just a hypothesis though.
There are many reasons why apps are so big, but I think you are right about a major part of the problem. Development environments these days make it very easy to add in third party libraries for very little effort.
At a previous job we had a monolithic Java server that ended up at over 350 MB of compiled code simply because each development team had imported whatever libraries they thought they needed. In some cases, 2 or 3 versions of the same library were included.
How did they get multiple versions of the same library to work together? Java loads things via the class path so I would have thought that would cause some sort of error.
Some libraries change the namespace (package names) between major versions, specifically to allow transition from one to another in a gradual manner, or to start following namespace guidelines better.
Most of it is actual code - not assets. For some types of apps (see: games) assets do take up a significant portion of total size, but for most everyday apps bloat by code over-inclusion is likely a bigger problem than asset-bloat.
I wonder if it's possible to get major open-source libs to move towards more fine-grained build targets and internal dependency management, so that devs don't pull in a gigantic binary when they're only using a small slice of the functionality.
Depending on how they are made, how the linker is configured, and the phase of the moon, static linking can help by just taking the required functions.
Also I think we, as programmers, have taken the "don't reinvent the wheel" principle too far. The idea is to use 3rd party code to (1) save time writing, (2) reduce the risk of bugs, and (3) lower the maintenance burden.
But this makes sense only if the benefits outweigh the corresponding costs of integration, which also (1) takes time, (2) might be done wrong (especially because you don't understand the part you added), and (3) creates a maintenance burden. Of those costs, only #1 is reduced when your development environment makes adding new libs quick and easy.
> I wonder if it's possible to get major open-source libs to move towards more fine-grained build targets and internal dependency management, so that devs don't pull in a gigantic binary when they're only using a small slice of the functionality.
Fwiw, this is already a trend in JavaScript land. Libraries are moving towards many small packages so you can import only the parts you need either manually or using a tool like Webpack to throw away what isn't used.
I don't think that's the right solution to the problem. Nobody wants libleftpad.so, and as someone who works on a distribution the very concept is horrific (making distribution packages for every three-line package does not make anyone happy).
What I think GP was arguing for is that you have libstring.so which you can strip down to just having leftpad or w/e with Kconfig or similar configurations (preferably at link time not build time -- otherwise you still have the same problem).
JavaScript people do not create libleftpad.so, they create libleftpad.o (metaphorically speaking), which will not clutter up your distribution, it will just clutter up some apps. I'd rather see 50 little files like libleftpad.o than 5 gigantic libraries.
But I agree with what you're saying in the second paragraph, which actually sounds just like what my GP meant by using "Webpack to throw away what isn't used". Having unused parts of larger libraries cut out would be a much cooler solution than just telling everyone to eschew kitchen sinks in favor of many tiny, bespoke items. Especially since finding the right small libraries is much harder than finding a kitchen sink that just works and has a sizable community behind it.
> which will not clutter up your distribution, it will just clutter up some apps.
Not necessarily. Distributions have to build everything, and in openSUSE we have an integrated build system[1] which has a lot of very useful features (rebuild when dependencies are updated, and automated QA runs before releases). Those features require you to have all of the dependencies of a project in an RPM package. Even if you don't end up shipping that package, the package is used for tracking security issues and backports and so on. You can't pull anything from the internet during a build, you have to depend on your BuildRequires.
Now take any average JS application that has over 800 dependencies. Ruby was already bad enough with ~80 dependencies, but adding another order of magnitude is just not maintainable. One of my colleagues gave a talk at LCA about this problem[2].
I had no idea about this - thank you for bringing it up! But we seem to be discussing different ideas. I was trying to argue about apps, specifically iOS apps, like the original article was about. iOS apps aren't distributed on openSUSE, so if some devs make some kind of libleftpad.o for Objective C developers making iOS apps, I'm saying that's fine, and I doubt it would clutter up your distro because Objective C isn't exactly known for its cross-platform adoption :)
If we're going to talk about Linux packages, aren't they often written in languages with excellent optimizing toolchains? If I write some C99 code that uses a gigantic library but only use like 2% of it, the linker will cut out the 98% my code doesn't use, so libleftpad sounds like an awful idea. That's one reason why packages on Linux distros aren't too big.
But I'm talking about iOS apps where, as others have pointed out, the available optimizations suck[0], and as such, I think that having libleftpad.o included in everyone's iOS apps isn't a big deal (note that iOS doesn't really have a nice way to create libleftpad.so anyway AFAICT because all code for your app is supposed to be sandboxed so no one else can mess with or use it). I agree that it would be really cool to just cut out the 98% of $GIGANTIC_LIBRARY that isn't used at compile time, but since Objective C doesn't seem to have that now, I think small things would be a really nice way to give users more space on their phones without removing features.
Ah okay, I was talking about things like electron applications or desktop/server javascript projects. If you want to use $latest_js_web_framework and package it in a distribution, you're going to be in a world of pain (the same applies for Rust because their package management was based on NPM).
> I'd rather see 50 little files like libleftpad.o than 5 gigantic libraries.
And then you'll have to use software written in 50 different styles, with 50 different type and error conventions, with pieces that couple together or not at random.
> And then you'll have to use software written in 50 different styles, with 50 different type and error conventions, with pieces that couple together or not at random.
All of which matters not one hoot to folks _using_ the software.
Right, and most people could have much more free space on their phones (or, more apps). I'd much rather write some glue code than ask my users to dedicate 275 MB to install my app, not even counting cached data or anything else. There's a reason many of my friends and I have deleted apps like LinkedIn - it's not that we don't like them, it's that the apps were too big for the little value they gave us, especially when considering the web versions give us almost all the same capabilities, and only cost a couple KB of persistent storage (cookies + history + maybe some localStorage or whatever).
Maybe you're not familiar with how it works. The idea is that you write, let's say, "import lodash", and then your packaging system says "A-ha! But you're only using these 5 methods!" and only packages those.
Behind the scenes, lodash is built out of many modules, but you as a developer don't need to think about that.
Because tooling isn't perfect yet, you have the option of helping it out manually and being explicit about importing submodules. We were able to reduce our compressed JavaScript size by some 60% using a combination of automatic and manual tricks like that at my day job.
Right, but from what I've seen (though I'm not a JS developer, so I might be wrong) is that most projects are not designed like lodash. They are small, single-purpose libraries that would be much better served as part of a module of a larger library. That was the point I was trying to make with the whole "libstring" and "libleftpad" point.
I've seen people try to package JS projects inside a distribution. >800 nested dependencies is neither sane nor sustainable. The fact that they're small is even less excusable because it means that there was an active decision not to consolidate them into a single dependency that is properly maintained.
If you want "just pull in the things I use" you can already get that by having a static library, and if you were going to be shipping a private copy of the .so file in your app then surely you would be better off with a static library -- you don't get any of the benefits of the DLL (sharing with other apps, can update the DLL without shipping a new copy of the whole app), so why pay the cost of it?
(If you link against the static library then the linker pulls in only the .o files that the app used, so assuming that the library was written in a sensibly modular way you pay only the binary size cost for more-or-less what you use. The linker can probably discard at a finer granularity than whole-object-file these days, but only-the-.o-you-use has been there for decades.)
Probably the only widely used static C library with fine-grained dependency tracking is glibc. The source-code structure required for that is probably one of the major reasons people call glibc's source unreadable and/or unmaintainable.
Young minds don't like it when someone questions the wisdom of how they reinvented the wheel, even when the question comes from confusion instead of malice.
This happens a lot, especially with dynamic linking. I have a project that uses Qt, OpenCV, CUDA, PCL and VTK - fairly standard stack for 3D imaging and visualisation.
Since you normally need to bundle dependencies to account for different versions, this adds up quite fast. Qt adds 20MB for OpenGL, 15MB for the VC++ redist, about 30MB for other core libraries. Some stuff in OpenCV requires Nvidia's performance primitives (NPP), so in goes nppi64_80.dll - that's 100MB alone. opencv_cudaarithm310.dll is 70MB and even opencv_imgproc310.dll weighs in at 30MB. And on and on.
So yes, one little call to cv::cuda::remap adds in a boatload of dependencies when all the algorithm is doing is using a lookup table to sample an image.
It's pretty ironic that dynamic linking's main motivation was to eliminate duplicate code and reduce code size: deploy a DLL once, and every program can use it without shipping its own copy.
Now it has turned out exactly the opposite way. If we went back to linking object code together we would get smaller sizes. Instead we have to include huge DLLs.
> dynamic linking's main motivation was to eliminate duplicate code and reduce code size
I'm pretty sure the main motivation is/was to allow patching of dependencies independently from applications. Very popular shared libraries might save space overall, but that is a secondary effect.
I had a fun time with this once. The Jetbrains Toolbox app is written using Qt and is 150 MB. I opened the .app package on Mac, deleted the frameworks directory, and symlinked it to my system's Qt install.
They seem to build it against a reasonable version of Qt. I haven't updated it in a while (because it's just a launcher), and when I do I'll just re-symlink against whatever Qt is currently installed (which I do keep updated).
I understand your argument, but in this particular case things aren't volatile enough that it causes any problems, and if it does it's just as easy to solve.
Depends on the language. If you're using something like C++, you can probably remove a lot of dead code. But with more dynamic languages, like Objective-C or Ruby or JavaScript, it becomes difficult or impossible to prove that a given chunk of code is never used. In Objective-C (which I'm most familiar with), the linker will keep all Objective-C classes and methods because it has no idea if you might be looking things up by name at runtime and invoking them that way.
Even if you can throw away dead code, you can run into bloat because a library has a massive set of foundational APIs that the rest is built on, and using one little feature of the library ends up bringing in half the library code because it's used everywhere.
You could use static analysis to find out whether anything calls objects dynamically by name at run time, which could be a useful optimization. To be safe, this would have to be a single yes/no answer for the entire code base, but with some care it could help. A larger issue is that size is simply not considered a significant problem.
With Objective-C, that will always come back "true." Even if you never use dynamic lookups, Apple's frameworks do. I suspect the same is true of other languages.
Nibs and storyboards are full of this stuff. They instantiate classes by name, call methods by name, etc. Some example documentation if you're curious:
Objective-C's dispatch is built around looking up methods by their selector, which is effectively an interned name. Looking up the selector can be slow, but once you have one, invoking a method using a dynamic selector is as fast as invoking one with a selector that's known at compile time.
> Interesting, I find that surprising due to the overhead involved.
Due to the frequency of this, objc_msgSend (which handles dynamic method calls) is hand-written in assembly, with caching and "fast paths" to improve speed. The overhead can usually be brought down to that of a virtual function call in C++.
That's the thing about performance. If you do something a million times, it's usually OK if the first time takes a thousand times longer than the fast case, as long as subsequent times are fast.
Look at how much code is written in Ruby and Python. Their method dispatches are way slower. To put it in perspective, it takes CPython about an order of magnitude longer to add two numbers together than it takes Objective-C to do a dynamic method dispatch.
The interesting part of this is that all this dynamism and dependence on the runtime can actually improve performance compared to C++. For example, it is perfectly possible to implement objc_msgSend in portable C such that, on modern superscalar and heavily cache-dependent CPUs, it is on average faster than a C++-style virtual method call.
Really? That seems highly unlikely, and doesn't match with any of the speed testing I've done.
When you get down to it, an ObjC message send performs a superset of the work of a C++ virtual call. A C++ virtual call gets the vtable pointer from the object, indexes into that table by a constant offset, loads the function pointer at that offset, and calls it. An ObjC message send gets the class pointer from the object, indexes into that table by a constant offset, loads the method cache information at that index, uses the selector to look up the entry in the cache's hash table, and then if all goes well, it loads the function pointer from the table and jumps to it.
Depends on whether you're statically or dynamically linking. When statically linking, the final binary shouldn't contain unused functions from external libraries.
Leftpad is not an example of the issue mentioned. Leftpad is an example of a small module that was being used by a lot of projects. GPP is talking about pulling in larger libraries which contain code not actually used by the app.
(Also Electron bundles 'just the rendering library from Chromium' [1], not 'all of Chrome').
Well it wasn't that you weren't being specific, more that you were casually maligning the JS community unfairly, when the article isn't even about web/JS :)
(I also don't agree with 'don't need to'. The main takeaway from the leftpad debacle was the fixes to the npm module deletion policy, and hopefully people learning they shouldn't rely on an 'npm install' for production deployments! Whether people should use small modules is still up for debate, there are trade-offs [1] [2]).
As I understand it, when you link your binary, the linker will only include the used parts of the libraries. Linking to a 10M library but calling only one function will not increase your binary size by 10M, probably just a few bytes.