Writing C software without the standard library (ddns.net)
492 points by andxor on Nov 29, 2016 | hide | past | favorite | 182 comments


A value in the range between -4095 and -1 indicates an error, it is -errno.

The syscall/errno stuff has always seemed unusual, inelegant, and inefficient --- instead of just returning a negative error code directly, the function returns the vague "an error has occurred" -1, and you have to then check errno separately after that. It only adds insult to injury when you realise that the kernel itself isn't doing it, but the syscall wrappers. And thanks to POSIX standardising this mechanism, the alternative will likely never get much adoption; of course, if you write your own syscall wrappers like this article, then you can skip that bloat.

For now this guide is linux-only, but I will be writing a windows version when I feel like firing up a virtual machine.

Unfortunately the Windows syscalls are not officially documented and even less stable than on Linux, changing even between service packs.

http://j00ru.vexillium.org/ntapi/

http://j00ru.vexillium.org/ntapi_64/

At least on Linux the first few (i.e. the oldest, most common and useful) syscalls have not really moved around over the years:

https://filippo.io/linux-syscall-table/


> At least on Linux the first few (i.e. the oldest, most common and useful) syscalls have not really moved around over the years

IIRC raw syscalls are an officially supported kernel API; that's why you can have alternate libc implementations (e.g. musl). Linux is an oddity in that respect: on most systems, even if the syscalls are fairly stable, there are no actual guarantees about them, and the only officially supported interface to the kernel is the standard library. OS X does not allow statically linking libSystem for that reason, for instance.


     IIRC raw syscalls are an officially supported kernel API
The term you are looking for is "We don't break userland". Once a system call goes live it is engraved in stone. This is why, if you look through the syscall table, you see call, call2, call_ext.


You're implying that exposing syscalls as the ABI instead of a higher level ABI is the only way to get backwards compatibility but that's not true. OS X can change its syscalls without breaking userland in exactly the way GP says: don't allow static linking of e.g. libSystem and its ABI is the ABI you're expected to use instead of directly calling syscalls.

Sure, Linux makes promises about its syscalls. But that's not the only way to get robust backwards compatibility and as GP says it's the oddity in that respect.


golang also targets syscalls instead of the C standard library (except on Windows, and maybe other platforms), which is interesting on e.g. Darwin:

https://github.com/golang/go/issues/17490


Yeah the linked issue (16570) is the most interesting one, with the Go runtime breaking multiple times in the runup to Sierra as Apple changed the ABI of the underlying gettimeofday syscall.


Wow, that seems like a really bad approach for Go to take. So is anybody actually using Go on OS X? I guess maybe not, if all the real Go deployments are on servers running Linux.


I do a bunch of development work using Go on OSX but I do deploy the resulting solutions on Linux.


Aha, that's kind of what I'd figured. Work on Mac, deploy on Linux.

It makes me think twice about e.g. using Go as part of the build workflow in a big project if it's going to randomly stop working on various OS X versions.


>"golang also targets syscalls instead of the C standard library"

What would have been the reasons for targeting syscalls directly instead of the C standard library?


> What would have been the reasons for targeting syscalls directly instead of the C standard library?

Since the standard library probably assumes (and requires) a C stack, linking against the standard library would require cgo (or some other specific workaround) on non-linux platforms.


I'm guessing it has to do with the fact that you can build Go binaries not linked with libc. They also expose syscall.RawSyscall anyway, which all the other syscall wrappers call out to, last time I looked.


Interesting, thanks for linking that.


Another example: for OpenBSD part of the manual upgrade procedure (not the recommended method, but useful if you don't have a console to run the installer) is to make a copy of /sbin/reboot, because there's a good chance that the new /sbin/reboot you're about to install will have the wrong syscalls for the running kernel.


> Unfortunately the Windows syscalls are not officially documented and even less stable than on Linux, changing even between service packs.

You can get pretty close though; it's possible to skip the C runtime and most of the other user space libraries and call into ntdll directly. Many of the functions it exports are fairly thin wrappers over the system calls.


ntdll.dll is still not a stable API - if you use anything exported from it (excepting a few documented ones), you basically have zero guarantee that your app will still be working with the next Windows update.


I didn't see it mentioned at a cursory glance of the comments, but syscalls != standard library. The Standard Library for C is that which is defined by the ISO C standard. syscalls are whatever *nix decides them to be. Additionally, Standard Library != POSIX.

Sure, there may be some overlap or interleaving between Standard Libraries, syscalls, and posix, but they are definitely not the same.


> The syscall/errno stuff has always seemed unusual, inelegant, and inefficient

And non-reentrant.

errno is usually defined as TLS (thread local storage).

It's a mess.


Yes, if UNIX had supported multithreading from the start, there is no way they would have chosen "errno" as the solution.

Obvious in hindsight, not so much in the 1970s.


Can you elaborate on how "errno"'s design is rooted in the fact that there was no multithreading at the time? I am not understanding the connection. Thanks.


Multithreading usually means that threads share the address space and so can (inadvertently) stomp on each other's "errno". (Hence making it Thread-Local Storage to solve that issue. Doesn't solve the reentrancy issue, though.)

If multithreading were pervasive nobody in their right mind would choose to use a global variable for error status codes.


Ah yes that makes sense. Thanks!


Somehow VMS managed to avoid that same mistake in the 70s.


I keep hearing that a lot!

VMS sounds amazing, but somehow I'm still thinking this might be nostalgia.


Is TLS actually needed for anything else than errno/GetLastError/SDL_GetError/etc. ?


Thread id :)


He claims that your code will be easy to port but then goes straight to Linux system calls.

Still, I like the idea. This is something that should be covered in a CS 102 type course. I know way too many CS guys who have no idea how to debug, let alone how their code is implemented.


> He claims that your code will be easy to port but then goes straight to Linux system calls.

Many Linux developers believe "code is portable" means "can run on different Linux distros".

Edit: To be fair, he said architectures, and I think he meant CPU architectures, i.e. AMD64 and i386.


But still, Linux syscalls are different on each architecture.


Indeed, stupidly different. Different syscall numbers, different argument orders, different values for constants, different struct layouts, ... Some architectures, like MIPS, are particularly bad, with nods to Irix compatibility thrown in.

The BSDs are all exactly the same for every architecture, sane.


This dates back to how Linus did the first non-x86 port of Linux to the DEC Alpha. Rather than copying the x86 linux syscall conventions, he used the DEC OSF/1 syscall conventions to ease bootstrapping. This is a perfectly sane approach. But he probably should have reverted to a native syscall numbering rather than leaving this bootstrapping hack in place.


Different platforms have different sets of registers and alignment requirements. You want to pass things in registers if possible. If you want to avoid inefficiency, you need platform differences. And this is all hidden from users and even developers anyway.


Does that mean you can run Irix binaries on MIPS Linux?


...which is why he used the SYS_ #defines.


When I was learning C, I loved Plauger's book on the standard library for just that reason; it's very informative to be guided through the implementation of the whole thing, not only to see how it works, but it's also a good way just to be introduced to everything in it without any magic. It's an excellent 2nd book on C, I think.


Plauger is one of my unsung heroes.

I discovered a little book by him after wading through the recursive descent parsers chapter by Herb Schildt... I don't want to get down on Schildt, I enjoyed that too; he showed some cool stuff..

but Plauger had me thinking that programming could be a craft, an art, a noble intellectual pursuit.


What is the title of the book by Plauger you are referring to?


"The Standard C Library"


Thank you.


I've written Windows programs without the std library. The Win32 API has plenty of replacement functions you can use instead, though, so it's much less work than what's presented here.

You get the same sorts of benefits though.


Win32 is itself a layer on top of the syscalls, and already provides much of the functionality of the C standard library.


Yes, that was partly my point.

You can get reduced executable size and a lack of dependencies on various msvc*.dlls, without giving up much of the functionality of the C standard library and without having to write it all yourself.


Fine, but you get this thanks to already being linked with these megagodzillas like kernel32.dll. Try without them.


On Windows NT, ntdll gets loaded into all processes; it's hardcoded in the kernel (http://gate.upm.ro/os/LABs/Windows_OS_Internals_Curriculum_R...). That's where the loader resides, so you can't go without it. And that's where all the Zw/Nt* functions are, i.e. where the syscalls are.

So you can't really go without it.


Except the new picoprocesses, which don't get ntdll (or anything really, I guess).


A program which isn't linked against the Win32 stuff is an invalid program, since syscalls in Windows are not at all stable.


Suppose you want to port to some architecture not supported by libc. If you were using libc, you would have to find a replacement that works and targets that arch, or port libc yourself. If you wrote everything from scratch instead, you just have to read the specification and add support to your code. That's what I meant.

Of course, if your target archs are all supported by libc, porting is much easier with libc.


Well, not if you expect that you will have exactly the same time/space behaviour and pre/post conditions.

This is why lots of embedded, secure and defense software doesn't use standard libraries, and printf will instead be called sio04583 or something....

Also.. Don't forget that you can get gcc to get rid of unused code when compiling static executables, and use sstrip (yes, two 's' - it's a different program) to strip even more, if it's an ELF binary...


Depending on the scope of your project porting libc itself might be good idea. Then you simplify porting any tools you need that also use libc. Then you can port a few extra dependencies, then more tools...


What are some archs that aren't supported by libc? Just embedded systems?


Portability is impossible without the standard library.


What ? You just write your OS layer for each platform and it's portable.


True. The OS layer you write (and port to other OSes) is called "the standard library". :-)


But at least now you know what's in there, what's going on behind the scenes. And on top of that, it's filled with all sorts of brand new bugs and vulnerabilities! ;-)


A .h file with different syscall IDs and perhaps a few inline functions is hardly a standard library though


Not exactly syscall ids: I built network services (telnet daemon, snmp client and server, DNS) that compiled without change for win32 (both on Windows and some embedded win32 compatible abomination) and Linux with just a handful of defines and function wrappers as part of a contract once. A lot of apps can get away with very narrow interfaces to the OS.


> A .h file with different syscall IDs

That's not portable even across versions of the same OS, many OS don't support raw syscalls and don't make any guarantees about them. That's why you can't statically link libc on OSX for instance, libSystem is literally the system's interface and necessarily dynamically linked.


That would be the few inline functions, so as to abstract the syscalls a bit.


It doesn't abstract anything; if the syscall changes, your "abstract a bit" will be broken all the same. On many if not most OSes, the machine-code side of the syscalls can change with no notice: Windows has changed syscalls in minor revisions, and Go broke several times during the Sierra beta due to syscall changes (because it handrolls gettimeofday(2), whose assembly calling convention changed).

The proper abstract interface on non-linux systems is the standard library.


As far as PortableOSIX compliant syscalls are concerned, they are literally portable.


The userspace/kernel interface for invoking syscalls which the OP calls with assembly code is obviously not portable. Not even between similar OSs on one architecture and absolutely not between architectures.

POSIX only specifies a set of C library functions. Once you stop using them, you are on your own.


> He claims that your code will be easy to port but then goes straight to Linux system calls.

These are nowadays supported by Windows and a few BSDs too; who needs more portability than that :)


"Supported on Windows" is a strong claim. They are supported in the Linux subsystem, but asking your users to install that seems far fetched to me.


> He claims that your code will be easy to port but then goes straight to Linux system calls.

Well, it is hard to see an alternative to that outside supplying your own OS with the library...


A few thoughts:

* The space savings are moot, as other processes such as daemons are going to load libc into virtual memory anyway, and the kernel shares libc's pages among all processes.

* This adds a lot of LOC you have to maintain instead of shoving it off on the compiler/libc vendor, which increases the chance of bugs.

* This will prevent the use of the vDSO to optimize high-volume system calls like gettimeofday.

* It's still probably good to know how these happen, even if you're not doing them yourself.

* The only place this would really see a benefit is in a single-process environment, but in those cases I would suggest a unikernel anyway for simplicity's sake.


I'd be interested to see benchmarks to weigh point one. At face value, I fully agree with your point and doubt it has any benefit. However, it is less that has to be paged in for your program to run. I'd be curious whether this has any odd cache friendliness for an application.

The tooling answer to this, it seems, would be to support statically linked libraries. But again, I would want to see numbers before personally worrying about this.


So a few things to consider in regards to point 1:

* Your executable is going to be loaded as a whole page regardless of how large it is; on most platforms this means you'll need at least 4K of user VM.

* You'll need a page table, which has its own overhead. If someone was going to push back on the libc assertion I'd expect it here, as the PTEs for libc and the VDSOs cannot be shared between processes (as far as I'm aware).

* I would expect it to in theory RUN faster, assuming it was a small toy program like the example, because there is less work to be done even with shared pages.


Right, my question is along the lines of avoiding the pages of libc. An easy question here would be how many pages libc takes up. I'm assuming not many, but more than one.

I think there is a strong argument that this page is often already paged into memory from everyone using it. However, if the function you used from it would have fit in the pages you were already using for your application, I could imagine some benefit.

I continue to stress, though, that this is just imagined. Numbers would be first thing I would have to collect before acting on this. (And I hope it doesn't sound like I am tasking you or anyone else with this. That is not my intent.)


At least two is the best answer I can give without specifying a specific libc and architecture. Something to think about is that the binary size of libc is only half the story: even if the executable portion of libc fits in a single page, a libc implementation has a lot of per-thread and per-process statics it holds onto.

A good libc developer could actually put these into separate pages based on how often they change. In other words, if a static only is ever set once then coalesce it into a page with other statics that are only set once and are not process dependent.

Why that's important: because of fork. When fork creates a new process, it marks the parent process's pages read-only and then performs copy-on-write when they are modified. In theory you can share both the binary and some of the statics between all the processes.


He spoke about the space savings of the binary itself (which must be debug info and statically linked code). There is no sharing of that, unless multiple processes run that same binary.


Unless the binary size exceeds VM page size this has no real value even in the unshared case. Because the OS is still going to have to map a full page for the executable to be loaded into.

That said techniques to allow a binary to fit under page size do have value; as the unused extra page can be used for other things.


The OS may map a full page for the binary but I'd expect that the time to load the binary from disk to memory would be saved. And once in memory the program would certainly take up fewer cache lines when executed.


At 4K the majority of the load from disk will be seek time (even on an SSD).


That's totally true, and there is more overhead associated with processes; 8K executables are negligible. Some other languages produce executables in the dozens of megabytes.


Can you elaborate on why this "will prevent the use of VDSOs"?

Why can't I just call gettimeofday() which will access the vdso page mapped into my address space?


Because the code that knows how to call the VDSOs is in libc, and how VDSOs work is very architecture dependent. Not saying you couldn't in theory do it. But at that point you might as well just load libc.


The guy is definitely a fan of old-school minimalism: http://weeb.ddns.net/0/articles/modern_software_is_at_its_wo... I have to say I miss the old days of Gopher, too. It was so much easier to focus on the content back then.


I find myself agreeing with him 100%. Not just on Gopher (although i did write a Gopher client some time ago - http://runtimeterror.com/tools/gopher/) but on the entire rant about wasting resources: UIs that waste screen real estate and become unusable at smaller resolutions; fonts that only look good with anti-aliasing and have weird misplaced pixels with anti-aliasing disabled (which i also notice); websites that make reading things harder and waste unnecessary resources on bling and fluff, with all those JavaScript frameworks slowing them down; pagination often being replaced with "endless scrolling", which makes it hard not only to skip large bits of content but also to see how much content is there in the first place. And of course the newest, worst trend of all: using an entire web browser as a UI framework for a text editor (i mean honestly, how did people decide that HTML and CSS are the best technologies to use as the foundation for user interfaces?).

The only bit i'd disagree would be with games, at least on AAA games since i have some experience there and -at least at the engine level- there is still a lot of low level wizardry being done there.


  > I've even seen functionality regression for the sake of modern and
  > "responsive" design in many cases.
  >
  > For example, you used to be able to browse youtube favorites in
  > pages. Whoops, not anymore! Now you need to scroll down and
  > painfully wait for the site to make its stupid animation and load
  > the next page.
  > What's worse is, you can't just skip through pages. You have to
  > load EVERYTHING and it all stays loaded, so after 10-20 pages the
  > browser starts to lag and hog cpu just to scroll.
I see this issue more and more; scrolling is broken on so many sites, even popular ones like FB. Plus, often search doesn't look in the "scrollable" content, i.e. you need to load (=scroll) all these "pages" and rely on the browser search function. Many small things like this, e.g. snippets of code that don't fit the DIV box so you need to scroll horizontally. And news pages lagging on an i7 w/ 16GB of RAM. It's all so sad.


> Plus, often search doesn't look in the "scrollable" content, i.e. you need to load (=scroll) all these "pages" and rely on the browser search function.

That's a feature. They want all this valuable search data, so they make you search via the site instead of the browser.


I agree that most of the software stack seems bloated and wasteful, but there are good reasons for it. IT has exploded in the last few decades and it's no longer just engineers and computer scientists writing code. If it were, software would maybe be of higher quality, but it would be rare, expensive, and insufficient to support all industries.

The web is an especially good example. With all its shortcomings, it's still a sandboxed and portable platform with a very capable (and ugly) programming language for delivering the same codebase across desktop and mobile devices (even servers, unfortunately). There are many hits and misses, like TODO apps built on Electron, but that's to be expected with any technology and isn't a reason to dismiss all the good applications.

To me, consuming all the hardware resources of each new generation seems to be the price we have to pay to build more complex software in larger and larger quantity.

What's bothering me is that it doesn't seem to be converging on the desired unification of different applications and services. Everyone is building their own stacks and environments without much care for interoperability. For example, a few years ago it seemed clear that XMPP was the way forward for chat, and that everyone should adopt it to enable federation. Nowadays we are expected to have 3+ chat clients installed on our phones and remember which contact is on which service. I really don't understand how users put up with it.


> (although i did write a Gopher client some time ago - http://runtimeterror.com/tools/gopher/)

I keep meaning to finish up my Gopher Client and release it upon the world...

https://www.weegeeks.com/upload/Jar-Gopher-Browser-Full-Scre...


> wasting resources

I don't consider variable-width fonts, nice margins and dynamic word wrapping to be a waste of resources.

There's a sane middle ground between Electron and the 70s.


That ground is mostly unexplored. IMHO, the sane middle ground between Electron and the 70s isn't shoving an entire browser inside a desktop app. It's using modern programming and compiler techniques to get close to the metal without having to deal with all of the shortcomings of C/C++. It's also using UI libraries that make creating desktop UIs as easy as making a website, but with better performance.

Sadly, neither of the things I mentioned above exist yet. People are working on languages that fit that space, but most of them aren't done yet. Nim is the only one that's officially released.

As for desktop UI libraries, your choices these days seem to be between bad and worse. I'm working on my own desktop UI library, but I don't know what the hell I'm doing. It'll be a learning experience, if nothing else.


You would be well-served to use Qt and Cocoa enough to get familiar with them. You may or may not like the language, but the organization and consistency of both are excellent. In particular, Qt's signals/slots, naming consistency, and layouts make writing UIs easy without looking at the documentation (once you've used it a bit). Cocoa is well thought out, but a bit dated. (UIKit is not so well thought out, and forces MVC on you, which is often unnecessary and unhelpful, and then it completely botches UITableView by muddling up the M, V, and C. Qt's QTableView is much saner.) Unfortunately, it's also hard to write a UI in code in Cocoa, since you have to write the positioning system yourself due to the lack of layouts. How they handle different screen resolutions is brilliant, though.

You might also want your UI library to be cross-platform, which will limit your choice of languages, especially if you ever want to port to mobile.


I agree. Something like QML is sort-of what we need, but that is still too closely tied to C++ and not mature enough.

I'm hopeful about Go GUIs though. Go is an easy language to use and performant enough for complex GUI apps. Google abandoned gxui but it seems like they are backing Shiny at least a little.


> The only bit i'd disagree would be with games, at least on AAA games since i have some experience there and -at least at the engine level- there is still a lot of low level wizardry being done there.

He explicitly calls out people "writing poorly performing code on top of pre-made engines". I'm pretty sure he's aiming most of his ire at non-indies using Unity3D or Unreal.


Indie games do this too. My favorite examples are:

Hatoful Boyfriend (a remake that uses Unity), a simple graphical choose-your-own-adventure with no animation at all that is painfully slow on my 3-4 year old laptop (although this must be intentional for some reason; it is hard to imagine how that could be achieved purely by abuse of Unity)

Lethis - Path of Progress, a 'hipster retro' SimCity in a steampunk world that fails to work at all with Intel integrated graphics.


Personally, I don't mind this at all, mostly because they wouldn't be writing anything if it weren't for Unity and Unreal.


They could be decomposable libraries rather than singular massive engines.


I can get behind some of that. I do hate the argument "it doesn't matter if it's inefficient, modern computers are fast", because everyone taking that stance results in things staying the same speed. When you do encounter modern software that's well optimised and not bloated, it's a real pleasure to be in awe at its speed (Redis is an example I can think of off the top of my head).

That being said, I'm happy the web isn't still just plaintext or minimally styleable


Member when webpages were all content and no ads?



The .comment section where gcc puts ident info can be omitted with -fno-ident, and syscall(2) is usually a very thin wrapper[0]. If you follow musl's syscall(2), it simply maps errors to errno[1] and uses the fancy count-args-in-macro trick[2] to call the respective syscall$n numbered functions from $arch/syscall_arch.h[3].

[0] https://git.musl-libc.org/cgit/musl/tree/src/misc/syscall.c

[1] https://git.musl-libc.org/cgit/musl/tree/src/internal/syscal...

[2] https://git.musl-libc.org/cgit/musl/tree/src/internal/syscal...

[3] https://git.musl-libc.org/cgit/musl/tree/arch/x86_64/syscall...


Highly recommend reading the Musl source code if you want to find out how things work. Don't bother trying to look at Glibc.


Cannot agree more.

glibc's source feels like archaeology: there's so much history and so many remnants of bygone eras.

musl feels like a structured, well-engineered and specified piece of architecture.

This isn't a knock against glibc; but it was grown, not built.

musl and the team's great documentation have been incredibly handy when I've been building tightly constrained applications.


Why does syscall use varargs and then assume that there will be 7 arguments?


If the asm was written a little more cleverly, the syscalls would avoid almost all moves, because the compiler'd put everything in place:

  _syscall5:
    mov %r9, %r10
  _syscall3:
    mov %rcx, %rax
    syscall
    ret
And then:

  extern unsigned long _syscall3(
    unsigned long, unsigned long,
    unsigned long, unsigned long);

  extern unsigned long _syscall5(
    unsigned long, unsigned long, unsigned long,
    unsigned long, unsigned long, unsigned long);

  #define syscall0(NUM)             _syscall3(0,0,0,NUM)
  #define syscall1(NUM,A)           _syscall3(A,0,0,NUM)
  #define syscall2(NUM,A,B)         _syscall3(A,B,0,NUM)
  #define syscall3(NUM,A,B,C)       _syscall3(A,B,C,NUM)
  #define syscall4(NUM,A,B,C,D)     _syscall5(A,B,C,NUM,0,D)
  #define syscall5(NUM,A,B,C,D,E)   _syscall5(A,B,C,NUM,E,D)


Is that going to be inlined properly?


A little self-promotion, but mostly because it addresses some of the other commenters' concerns about malloc (or the lower-level API around sbrk): a couple of years ago I wrote rt0 [0], a small (mostly minimal) C runtime for i386 & amd64 that makes it easier to replace libc & crt0 (as long as you have the kernel headers installed). Also, as part of the examples, I wrote wrappers around the sbrk syscall. Pretty easy to do, and all documented in the repo. I expect to eventually port the lib to arm (Raspberry Pi) and aarch64. There are also lots of references there to other small C runtimes. I'll be adding this one as well.

[0] https://github.com/lpsantil/rt0


I think this is unnecessary when you have <stdint.h>:

    typedef unsigned long int  u64;
    typedef unsigned int       u32;
    ...
if you define your own types like this, you may need to revise them when you switch architectures or even compilers.

Now you could argue that this is part of the standard library, but I actually see it as a part of the standard C language.


You're getting down-voted (Which is unnecessary IMO) but you're really not wrong. `gcc` itself provides `stdint.h`, it's not actually part of libc - or rather, you can use it without actually having a libc in place. Generally this is a good move, because you can always write a `stdint.h` replacement on arch's that don't have one, but on ones that do you're guaranteed to get the types correct.


I use stdint.h too, but I'm honestly curious if there is any common platform around today where one of the following asserts fails:

    #include <assert.h>

    int main() {

        assert(sizeof(signed char) == 1);
        assert(sizeof(short)       == 2);
        assert(sizeof(int)         == 4);
        assert(sizeof(long long)   == 8);

        return 0;
    }
I'm not interested in the language lawyering, because yes, I know the standard provides more freedom to compilers. I just think those definitions are very universal for any real computer that would otherwise run my software. Please don't bring up Windows 3.1; that's about as relevant to most of us as a PDP-11.

And for what it's worth, using typedefs based on the above provides more readable printf strings. This is hideous:

    #include <inttypes.h>
    #include <stdio.h>

    int main() {

        int64_t portable = 123;
        printf("Ugly: %" PRId64 "\n", portable);

        return 0;
    }
Where this is acceptable:

    #include <stdio.h>

    int main() {

        long long palatable = 123;
        printf("Better: %lld\n", palatable);

        return 0;
    }


Well, there are a few things to keep in mind. For one, `sizeof(char)` is guaranteed to be `1` - the standard simply says so. That doesn't mean that a `char` is 8 bits long, however. On certain systems, like TI DSPs, `char` is 16 bits long, and thus `sizeof(short)` is also 1. `sizeof(int)` may be either 1 or 2, I can't recall (both are standards compliant).

All that said, POSIX requires `CHAR_BIT == 8`, which then basically ensures what you wrote to be true. So if you're willing to target POSIX (or POSIX-supporting systems) then such a thing is perfectly fine.

The `stdint.h` types are still better though, IMO, because they make your intentions a lot more clear.


I think int is 8 bytes on the PS4. (I just got bitten by this...)


How does the PS4 declare a 4 byte int? If that's "short", is there a way for a 2 byte int?


I haven't programmed on the PS4, but an 8-byte `int` sounds very suspect and fairly unlikely (but not impossible). That said, it wouldn't be a huge issue. `short` could either be a 2-byte or 4-byte int (either would be standards compliant), and `char` would presumably still be byte-sized (not doing so would be a fairly big issue to deal with).

That leaves out either the 2-byte or 4-byte int from the standard data types, but you can gain that back by simply using a compiler attribute or compiler-defined type to allow access to it. While that sounds non-standard, it really wouldn't be that bad, because it could simply be used in `stdint.h` to expose the standard `int16_t` and `int32_t` types, which could be used like normal.


A lot of good programmers do that and the reason given is that types in stdint.h have ugly long names like uint32_t. "_t" thing rubs a lot of people the wrong way so I am not surprised they change it to something more pleasant. I like using uint64, int32 etc. but the convention where it's u64, i32, f32, f64 is something I would get behind.


> When we learn C, we are taught that main is the first function called in a C program. In reality, main is simply a convention of the standard library.

Well, I've already learned something new. I assumed that convention was from the compiler. This is a great resource.
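For the curious, here is roughly what the article's point looks like in practice - a sketch for x86-64 Linux, built with `gcc -nostdlib -static` (the flag matters, since the normal libc startup objects would otherwise supply their own `_start`):

```c
/* tiny.c - no libc, no main.
   Build (x86-64 Linux assumed): gcc -nostdlib -static -o tiny tiny.c
   The linker's default entry symbol is _start; libc's startup code
   normally provides it and eventually calls main, but here we
   supply it ourselves and main never enters the picture. */
void _start(void)
{
    /* exit(42) via raw syscall 60 (x86-64 Linux); we cannot simply
       return, because there is no caller to return to. */
    __asm__ volatile ("syscall" : : "a"(60L), "D"(42L) : "rcx", "r11");
    __builtin_unreachable();
}
```

Running `./tiny; echo $?` should print 42, demonstrating that the kernel entered at `_start` directly.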


While this seems mainly useful as an academic exercise, the `printf "#include <unistd.h>" | gcc -E - | grep size_t` bit to easily grep in header files was worth the read.


indeed. under bash, it can be shortened to

cpp <<< "#include <stdio.h>"|grep size_t

which is super convenient


It's interesting to read the sources[1] of lots of djb's[2] code, as he often works around problems with (or perhaps dislikes the style of) standard libraries by re-implementing parts.

[1] https://github.com/abh/djbdns/blob/master/str_len.c [2] http://cr.yp.to/djb.html


On a side note, isn't the choice of exactly four unrolls very architecture specific? As in, it works, but may be sub-optimal for your specific machine. I've done the exact same thing myself, and IIRC its performance varied a lot between which ISA it was compiled for.

This is almost what Duff's device solves, except then you need to know the length beforehand.


Absolutely. It's a (possible) optimisation that is either based on evidence (seems likely, because DJB) or hope. Actual behaviour is impossible to predict on untested platforms.

My assumption is that DJB tested this locally and found enough of a speedup that it was worth it, considering the very low added complexity and risk of major degradation / defects on untested platforms.


Fantastic until you need to malloc. You're reimplementing libc, but at least you know what's going on at every level.


`malloc`'s not really that bad. There's a few different approaches you can take, but none of them are terribly complicated, since the two basic memory allocation interfaces, `sbrk` and `mmap`, are fairly simple in terms of usage for generic allocations. But getting it all working and bug free still takes time. Same with stuff like `printf` and `scanf` (though I'd actually argue those are harder to write than `malloc` if you're looking to be feature complete. `printf` has billions of features and I'm pretty sure `scanf` requires some extra black magic internally).

There's no doubt that this is a fun project though - if you or someone-else enjoys this type of stuff, you should definitely try your hand at writing a simple Unix kernel or similar, you'd probably enjoy it.

On that note though, the writer's aversion to inline assembly is unfortunate. It's a necessary evil for this type of programming. The syntax is ugly, but it's not really that hard to get used to (especially since the large majority of inline assembly is just a few lines long, or even just one line long). In particular, the syscall wrappers can be done in a one-line piece of inline assembly, and then you can avoid the function-call overhead for the syscall by placing the inline assembly in a `static inline` function in your headers (or a macro if you prefer), as well as avoid the extra .S file (which IMO is the better part - it's always easier when you don't have to mix different languages like that).

I would also add that, while I used to share the aversion for AT&T asm syntax the author does, virtually all of the assembly code out there related to linux is written in AT&T, so it's worth it to get used to it and at least be able to read it. On that note, you can use the Intel syntax in inline assembly though, if you prefer, so even if you hate AT&T with a passion you can still write inline assembly ;)


You can get surprisingly far without using libc's malloc/free. E.g. TeX, the typesetting system by Knuth, implements its own dynamic memory handling. It has a large static array of bytes, and allocates from that when needed.


That's an arena allocator: https://en.wikipedia.org/wiki/Region-based_memory_management

Arenas are really nice if you're allocating a lot of objects of the same size, whereas malloc() must be prepared to handle a lot of different memory usage patterns.


I don't think it is. Differently sized objects can be allocated and released individually. Have a look at part 9 of [1]. In an arena based allocator you typically deallocate all the objects in an arena at once.

TeX basically uses a special purpose implementation of malloc/free, with a static array as backing instead of memory requested from the OS with mmap(2) or sbrk(2). The main reason is portability (the original version was released in 1978 using WEB/Pascal).

[1] http://brokestream.com/tex.pdf


FreeRTOS also provides a few malloc implementations backed by static arrays (not dependent on sbrk), which can be useful for running malloc-based test code on embedded platforms without native malloc: http://www.freertos.org/a00111.html


While one of the benefits of an arena allocator is to be able to deallocate everything at once, it's not that unusual to have an arena allocator that you can deallocate from "early" if needed.


Unless you also need to free, it's pretty simple.


Also easy if your free/malloc is just a wrapper around munmap/mmap.


Of course.

When I wrote that comment I asked myself should I have written "malloc" or "malloc/free" - surely one implies the other.


For short-lived processes that do lots of allocations and where you can rely on the OS to release resources, just leaving out the deallocations is often faster.

Of course, you need to be careful, as if you write code like that in a language without garbage collection, it's inherently not reusable - retrofitting deallocation is often really painful, because it gets easy to adopt patterns that make object ownership etc. unclear when you don't have to ensure it's easy to deallocate in the right order.


Not surely. Many programs are written in a way that allocates all the needed heap space at startup, and just reuse it forever. And those are overrepresented on the minimal-system kinds of environment.


Oh well, with 16 GB RAM even in laptops, who needs free anymore? Just restart the program. It's simpler anyway.


ironic for a minimalist/anti bloat pamphlet to start with "with 16GB ..."


malloc() isn't particularly hard; K&R provides a working implementation using a freelist and sbrk in about a page of code. It's printf() that's the horrendous feature-crammed nightmare.


It's posts like this (and the accompanying comments) that make me realize how much I still have left to learn!

One of the reasons that I love HN is how informative you all are!


The server is getting hit pretty hard right now, did not expect this much traffic. In the meantime, you can find a bbcode mirror of the guide here: https://ccplz.net/threads/writing-c-software-without-the-sta...


Some of the reasons that he mentions for avoiding the standard library could also be mitigated by using another library like dietlibc (I played around with it back in the day, last release seems to be from 2013): https://www.fefe.de/dietlibc/


This is required if you do systems programming (e.g: kernel development).


Or demo coding on old hardware. I sometimes write demos for the Atari ST (68000-based home computer launched in 1985), and the modern way of doing that is to develop on a modern computer, and cross-compile to a native ST executable.

The main loops are all assembler, but the support code is in C, but the C code is used as a more expressive assembler, and linking with libc requires way too much memory.

All this means that I don't even have things like memcpy() available. In a way it's a quite liberating way to program, since you are in full control of the hardware.

I guess yesterday's computers is today's embedded hardware.


I'm curious what the use case of memcpy is in highly-optimised software. Are there any scenarios where copying bytes is better than using a char*+length tuple?


There can be hardware-driven requirements that force you to simply have data in a particular place in memory, and if you want that data to hang around you might need to manually move it somewhere else.

There's also the case where your API accepts a pointer and a size, but you don't want to have lingering pointers into the caller's memory, so you have to copy the data over to the "inside" of the API. This kind of design is perhaps less common in demo software, but certainly plausible in embedded products which at least try to be somewhat optimized.


That is exactly it. For example, on the Atari ST you display graphics by copying the bitmaps to the screen address.

Much of the C code is used during precomputation of data before the actual time-critical code is run. This involves copying lots of data in order to set it up so that as little computation as possible is performed in the actual time-critical parts.


Are you seriously asking if there are valid uses of memcpy? Serialization, concurrency, io to name some very specific uses.


> Executables are incredibly small (the http mirror server for my gopherspace is powered by a 10kb executable).

Is this ever a real issue, even on any embedded system in the last 20 years?


I have seen colleagues have to do very similar things on embedded platforms, when there is a large pressure on price, a smaller amount of flash on an MCU will make the MCU cheaper.


Even if you've got the RAM, it never hurts to improve CPU cache locality.


> Is this ever a real issue, even on any embedded system in the last 20 years?

Ask Cisco when they cut the Linksys routers' RAM in half a few years ago. Every byte counts. Component cost savings add up when you make a few million of them.


AVR atmega based devices generally have about 2kb SRAM, and 32kb flash. Maybe 2kb EEPROM.


This is true, but the stdlib provided by the compilers aimed at these chips is usually very bare bones and size-optimized to begin with. So it's likely hard to save much space by reimplementing subsets of it.


True, but you often end up avoiding parts of the stdlib like malloc anyway, because they tend to be heavy handed on the board.

(I've never needed to remove stdlib yet).


Honestly, if I can, I try to cut down on sizes of compiled binaries and even minified and combined web app code. Sure many times it's unnecessary but if I can save on bandwidth and memory typically that means I'm including less crap that can have problems, too. I see it as including far more benefits than simply getting a file size smaller.

But sometimes it can take so much extra work that it's not worth it. So you gotta do the cost-benefit analysis - or hell, if it's something you're interested in doing, just do it anyway.


Yes, executable size is still a modern day problem. Imagine you're shipping an OS; do you want the hundreds of thousands of executables in your system to all be a few percentage larger when you're trying to ship to a customer who may only have 16 or 32 GB of storage space?

Of course, removing libc won't be your first (or second or third or ...) step for removing bloat from a mature codebase.


If you're trying to reduce bloat in this OS, you would be dynamically linking anyway, wouldn't you? I've installed those 300MB printer drivers, so I'm all for reducing bloat. This just seems to be on the extreme end of things.


It's not a real issue, no.

It's a statement of one's professional competency.


I've tried this myself. What you'll run into is that you tend to need a few things that are non-trivial:

An implementation of malloc/free

Functions to parse and print floats (somewhat system dependent)

Assembly implementations of any trigonometric functions used

While there is code that goes to that effort (The Go runtime comes to mind), it's quite a pain for "normal" code.


I had a question about this sentence:

"It's often necessary to either push useless data or simply align the stack pointer when the pushed values don't happen to be aligned."

That's kind of hand-wavy. How do we "simply align the stack pointer"?


All modern platforms have decreasing stacks, so just AND it with ~(alignment-1).

Of course, you now need to keep track of the old stack pointer so you can restore it. Most code saves the old stack pointer into the fp, so you can just do sp = fp to undo any pushes without needing to care about how much was pushed; but it's cleaner and more efficient to have the compiler arrange things on the stack so that everything's already aligned and you don't need to do it programmatically.

...why, yes, I have spent the past couple of months with my head buried inside a compiler backend; why do you ask?


On most architectures, by decrementing it appropriately. E.g. subtract 4 to align from a 4-byte to an 8-byte boundary.


Thanks for the responses.


It would be better to wrap this up in a lightweight libc. There is uSTL for C++: https://msharov.github.io/ustl


It's a very good learning process. But once your project scales up, you are essentially writing your own libc.

And there is no portability. It only works with the specific architecture's calling convention and the specific c compiler.


> xor rbp,rbp /* xoring a value with itself = 0 */

Is this faster than a (const) mov ?


Apart from the replies to your question here, there was another interesting sub-thread on HN on this topic recently:

"Also, I like how returning 0 is "xor eax, eax"."

https://news.ycombinator.com/item?id=13052503

That led to multiple replies on it.


Traditionally yes. On the latest CPUs - who knows.


Probably yes, because Intel knows this is the code every compiler outputs for zeroing a register.

Also, the reason it is "faster" is that the encoding is much shorter: 2 bytes for "xor ebp, ebp" (which zeroes all of rbp, since 32-bit ops zero-extend), or 3 with the REX prefix for "xor rbp, rbp" - vs. 7 bytes for "mov rbp, 0" (a REX.W mov with a sign-extended 32-bit immediate), or 10 for a full 64-bit "movabs".


Technically you could get by with 5 bytes for "mov ebp, 0".

Another reason why it was faster was that the processor recognized it and avoided partial flags stalls after an "inc". But in 64-bit code you rarely have "inc" at all, so it matters less. On the other hand, a few years ago XOR had a false dependency on the register you're clearing; I'm not sure it is still that way on more recent processors.


I tip my hat to you, your analysis is far more interesting than mine.


Wrong too, it's partial register stalls not partial flags stalls.


What is necessary to do this with C++? Is there a tutorial available on the web?


behold the demoscene https://in4k.github.io/wiki/c-cpp

focused on windows, because grafix. also, linux guys are more likely to use C anyway.

I only remember a comment about avoiding exceptions, which I might have read first in the context of micro controllers.

https://www.google.de/search?q=site%3Apouet.net+c%2B%2B+exce...

https://www.google.de/search?q=site%3Apouet.net+c%2B%2B+stdl...


Your first paragraph makes me wish this site supported Markdown.


We detached this subthread from https://news.ycombinator.com/item?id=13061454 and marked it off-topic.


Yeah, off topic, but if HN supported markdown (and GitHub's flavor of markdown, so we could tag code blocks with their language) it would be amazing. There would be a ton more coding examples and discussion, in my opinion, if this were to happen.

How do we summon Dang? :)


Code coloration wouldn't be that useful (if the snippets are that long they should probably go in a separate pastebin or a proper repository, reddit's comments do just fine without)

On the other hand backslash escaping (the lack of it and emphasis clusterfuck drives me nuts), block quotes, inline code/monospace and links I really miss.

Section titles, lists and tables would be nice as well, though not deal breakers.

Considering the maintainers have repeatedly refused to make any improvement to comment formatting in almost 10 years since HN was created, I'm not holding my breath though, it's obvious nobody cares about comment authorship and craft.


my netiquette says, more than about three lines of code should go in a pastebin anyway


Then in a few years when someone is reading old posts the links to old services are broken or the service is gone. If the code is central to the comment, why would you put it some place other than the comment? If it's not small enough you can collapse it inline and if it's so large that you don't want it with the comment then perhaps you should be rethinking what you are posting.

tldr; practicality and longevity should trump netiquette.


I agree, and this was my original thought. For example Reddit, for most of its life, relied on imgur for hosting images. Before that single host, images constantly broke after services died. With imgur being third party, even it has led to broken images at times (though with significantly less frequency). Now reddit is doing their own.

Hacker News is very technical and code heavy. Seems to make sense to me that some may want to communicate / discuss code itself. I could even see it opening up more conversations like "this is my implementation of X; thoughts?" or "do this in any other language" challenges.


That's a very personal netiquette :-)


Or at least the backtick syntax for inline code


And being able to escape "*" instead of the current clusterfuck.


Interesting - my McAfee web washer blocked this site. Don't know why.


A lot of web/internet filters block ASM related content. My company uses Barracuda Networks filter and most ASM references/content are blocked, Reason: Hacking


They're not technically wrong... it's just 'hacking' in the traditional black magic/voodoo manipulation of actual systems components sense. That is, literally taking a hacksaw to a circuit board and altering it or making a new board.


the C standard library is not perfect but good enough


An essential function of the standard library is to wrap the syscalls. Besides that, you can live without the library. But why would you do that?


I can't believe this post has 409 points. Are we in the '80s again?


Myth busted: printf("Hello world") is simple and is a relevant C program for a beginner.

The "hello world" example is just the first step to annihilating your capacity to understand how things work, by relying on institutional black magic that may be wrong

(see all the scanf bugs that have been living in C code for so long, and all the bugs that come from respecting the old man's wisdom)



