C is like the marketing slogan for Othello (reversi):
A minute to learn, a lifetime to master.
The language is fairly simple in terms of features and aspects you have to learn to use it, but concepts like pointers and direct memory management are difficult to master. Programming in C is like building a building out of bricks; bricks are relatively simple objects, building a building out of them is not a simple task.
Agree with both parents. "Simple" to me means that the code does exactly what I ask: no more, no less. For better or for worse. No behind-the-scenes magic (garbage collection, etc). I know exactly how much memory I am using, where it's allocated, and how it's passed around. Again, for better or for worse. With great power comes great responsibility!
This has very little to do with the language, apart from what undefined and platform-dependent behavior allow the compiler to get away with. Of course, if the compiler goes nuts or is buggy, anything can happen "behind the scenes", but as far as I'm aware, this has nothing to do with the language as specified in the standard.
The problem is when what you ask for is incompatible with how it's done. Hence me today observing a price field in a record containing $9.500000000000000000002, knowing that making the obvious problem there "easy" will result in painful data manipulations elsewhere, and remembering that under certain conditions a Boolean can be neither true nor false. Asking a binary computer to handle non binary data just plain gets weird sometimes.
I think "pointers are hard" is a self-fulfilling prophecy. We studied them in the 9th grade. Because nobody bothered to tell us they were hard, almost my whole class (around 80%) grokked pointers and linked lists after the first two lessons.
The same went for pointer arithmetic and function pointers.
Right now I am struggling mightily with monads, mostly because the voice in my head repeats what the whole internet is singing: they are hard, don't bother.
Is it a small language? The standard takes a dense 179 pages to describe the language. I guess it's debatable if that is small or not. I certainly don't agree that it's a simple language though. There are myriad rules and exceptions to rules to remember. When and where one integer type will be converted to another. When overflow is defined and when it's undefined. Just those things alone mean that much real world C code is full of undefined behaviour[1] just related to arithmetic on numeric types, because the rules are hard to remember and reason about.
John Regehr asked people to submit str2long() implementations that didn't execute undefined behaviour. Despite being well warned about avoiding overflows and other undefined behaviour only 35 of the 78 submissions passed the test suite[2].
Even a trivially small 2-line C function can produce different results on the common compilers depending on which compiler and which optimisation level you use[3]. Compiler engineers struggle to agree on and correctly implement the "simple" rules of C. To me that's an indication that they aren't so simple.
I would say that C doesn't do anything behind your back as long as you don't ever use an optimising compiler or you pay very very close attention to the standard. If you forget then suddenly your check for pointer arithmetic overflow has been "helpfully optimised away" behind your back for reasons that are not immediately apparent at all[4].
I do agree that apparently simple rules can lead to complexity in use, and C is full of this too. For example you would think in a lower level language like C viewing a chunk of memory as a different type would be trivial, but the strict aliasing rule means you have to be a language lawyer to understand what is allowed and what isn't[5] as it makes the intuitive solution into undefined behaviour (which is extra pernicious since most of the time it will work as intended, until somebody compiles the code with a smarter compiler at a high optimisation setting).
Compared to other languages, 179 pages for the full specification is pretty small. The Java and C# specifications are each more than three times that size. C++ is roughly five times as large. If you're a fan of functional languages, the Haskell report is almost twice as large. Even Scheme, an extremely simple language, is only about 30 pages shorter.
The book "The Definition of Standard ML" is ~150 pages for a language that is simple (IMO) and rich. Bonus, the definition is not written in prose, but with typing rules and operational semantics.
> Bonus, the definition is not written in prose, but with typing rules and operational semantics.
While that is certainly nice, I suspect it makes it hard to really compare based on just pagecount. I doubt prose and such a formal definition like that are equally dense.
Agreed. From what I've read [1], the syntax rules and semantic rules take up about 40 pages with the rest of the book being "introduction, exposition, core material, appendices and index".
I wouldn't say that any of that means that C itself is complex. It means your computer's native instruction-set architecture is complex (and full of undefined behavior), and C, being simple, just gives you relatively transparent access to it rather than trying to abstract it and standardize it and generally clean it up.
That does not contradict what was said. C says "undefined" so that compilers are free to simply let each CPU do whatever is easiest for that CPU. On any particular CPU with a particular compiler you get consistent behavior. But when you switch CPUs, watch out.
Thus YOUR SPECIFIC CPU's instruction set may be well-defined. But if you say "your CPU" to a group of people with different CPUs, there may be no simple statement that is well-defined and generalizable across all of them.
> C says "undefined" so that compilers are free to simply let each CPU do whatever is easiest for that CPU. On any particular CPU with a particular compiler you get consistent behavior. But when you switch CPUs, watch out.
That was how it worked in the 1990s. Nowadays, a C programmer needs to figure out which undefined behaviors are justifiable and which should be avoided at all costs because they will be used by the compiler to justify optimizations. Signed arithmetic overflow used to be in the first category; now it is in the second, as is the use of uninitialized variables.
Not to appeal to authority, but I worry about these things for a living:
If you cannot be bothered to read that much, then please simply compile int f(int x) { return x + 1 > x; } at different optimization levels with the compiler you already have, and observe the values for f(INT_MAX) in each case.
By small, I meant that you can hold every little part of the syntax and normal behavior in your head pretty easily. Yeah, there are edge cases (important ones even) and you can build god-awful complicated expressions if you want. But look at the K&R book -- it is slim. The language is just not that big.
By not-doing-stuff-behind-your-back, I mean that you can map the code you write onto machine instructions in many cases. An optimizer will move stuff around on you but it is typically local(-ish) manipulations. That's much less black magic than garbage collection. The tradeoff is that you have to manage memory yourself.
By being-ballet, I mean that it is hard. I am a swing dancer. The best dancers I know all took ballet classes as kids. None of them dance ballet today. It is probably coincidence/age, but the best programmers I know all spent a decade or more working in C. I think working in C gives you a level of understanding of what a computer does that a higher-level language doesn't. That said -- if you want to get stuff done, use a higher-level language. I'm really good at C coding, but I'm 2-10x faster when in C#.
>By small, I meant that you can hold every little part of the syntax and normal behavior in your head pretty easily
Well I think that is where we disagree. I don't find it easy to hold all the rules of C in my head. It's plenty large enough that there are things I hardly ever use. Even the common parts of the language like arithmetic on integer types can get complicated very quickly if you want to be sure your code contains no undefined behaviour or works correctly for INT_MAX etc. Often you have to understand not just what the standard says but also what your compiler/target architecture does for the many implementation defined things.
>Yeah, there are edge cases (important ones even) and you can build god-awful complicated expressions if you want.
You don't have to write long or complicated expressions for things to get tricky. That was the point of this example: http://blog.regehr.org/archives/482 It's a simple function yet mainstream compilers got it wrong for years.
I don't consider these things as edge cases because they come up all the time and have caused countless serious bugs in real world C code.
>By not-doing-stuff-behind-your-back, I mean that you can map the code you write into machine instructions in many cases.
That is becoming less and less true with modern compilers. Vectorizers will kick in at different optimisation levels and depending on various heuristics that I'm not sure even the compiler authors would be confident in predicting for more complex code. They can perform a lot of complicated transforms. Undefined behaviour means lots of code can be modified in fairly unintuitive ways.
>An optimizer will move stuff around on you but it is typically local(-ish) manipulations
The size of the spec in part reflects the intensity and vastness of real world usage, having to get into painstaking detail regarding otherwise ignorable issues. It agonizes over minutiae because enough people are going into those dark corners that rules and expectations must be laid out for a profusion of obscurities.
If you have a relatively small number of users who understand the gist of the language, it can be expressed in a few pages.
If you want the language to be useful beyond a trivial description, you'll have to add some complexity, which leads to weaknesses.
If your language becomes world-scale popular, you're going to have to spend specification space dealing with things like explaining that under certain conditions yes in fact a Boolean variable can have a value other than "true" or "false".
1. Your first example ends with "C may be a small language, but it’s not a simple one."
By your argument, it feels like you'd say the language is large because you have to understand that rule if you really cared about consistent results everywhere. I agree with the author -- small but not simple.
Platforms differ, and it leaks into the language precisely because it is such a simple language. It is simple like HTML 1.0 is simple. It is simple like the Bill of Rights is simple. There is a limited set of rules, but there is a lot of undefined behavior as a result.
I guess you believe HTML is complex because you have to understand CSS these days to do anything.
2. I find it interesting that you use what is clearly an edge case and then argue that because they are common it is not an edge case.
The example of "int foo(char x) { char y = x; return ++x > y; }" is almost the textbook example of an edge case. Seriously, don't trust me. Ask around and see if you can find 10 people who know C well that would consider this mainstream (excluding embedded developers).
There are countless serious bugs in real world C code because (a) there is so much damn real world C code and (b) it doesn't exactly protect from shooting yourself in the foot.
My experience working with a big program that ran on Windows, 2-5 flavors of UNIX, and VMS (both VAX and Alpha) is that the bulk of the real-world errors in C code do not have anything to do with undefined behavior across platforms. They have to do with memory management (null pointers, buffer overruns, etc.) and poorly written macros.
3. I agree with you that adding optimizers introduces a whole set of things you have to hold in your head that push you into 'large' territory. Just like programming on a GPU makes you rethink everything about how you organize code, and writing embedded code has its own set of rules.
But how does that make the language large?
My point is that a language that says "we manage memory on your behalf inside of a VM" is doing a lot more for you.
Boy, are we beating this thing to death or what...
...
I like C quite a bit but I wouldn't want to make a living programming in it today. I drop back into C when I have a compute kernel that needs it but 99% of the code remains in C#. C is small but too simple for the problems that I'm solving today.
But the article contains many examples of the compiler doing things behind your back. For example:
> The answer depends on whether the optimizations are turned on. If they are then the answer is 3 (the first definition is inlined at all occurrences until the second definition). If the optimizations are off, then the first definition is ignored (treated like a prototype) and the answer is 4.
Even if you don't do something like that, a modern C compiler will do all sorts of things behind your back, like removing null checks[1] or pointer overflow checks[2].
However it's going to take some time to learn all the gotchas in string functions... Failing to zero-terminate strings (strncpy), potential buffer overflows (many), global internal buffer and overwriting an argument (strtok), etc etc etc etc
To get an even better explanation of what the word "simple" means and what it derives from, I would recommend "Simplicity Ain't Easy" by Stuart Halloway[1]. While it is very similar to "Simple Made Easy", it focuses a lot more on the etymology of the words "simple" and "complex" and how people misuse the word.
Someone else at Cal summarized C succinctly when he wrote "C is the machete of programming languages. It's great for clearing a quick path through the jungle of programming problems, but be alert as there are no safety mechanisms to protect the wielder of this wonderful weapon."
-David Patterson
http://www.informit.com/promotions/promotion.aspx?promo=1389...
These little teasers are starting to annoy me. Congratulations, you can twist C to be as unintuitive as you'd like it to be. But why would you write code like that to begin with? I don't care what value __ return -3 >> (8 * sizeof(int)); __ returns because no sane program anyone contributes to will have constructs like that.
C is like chess. There are few simple rules (compared say to C++ spec). But knowing the rules doesn't mean you'll end up beating Kasparov. It still takes skill and practice to be a good C programmer.
So the language itself is pretty simple compared to other popular programming languages, insofar as it has a simple syntax. I can teach someone the rules of chess pretty quickly; it doesn't mean I'll create a chess grandmaster in a day or two.
I disagree. C comes with a lot of edge cases and subtleties that can surprise even people like myself who 'know' C.
Some examples: Exact semantics of restrict, C99 inline semantics (eg I wasn't aware that it's possible to make a non-extern/non-static inline definition into an extern one with a single redeclaration), effective typing rules and whether it's possible to circumvent them, whether or not it's undefined behaviour to cross boundaries of the sub-arrays of a multi-dimensional array if it doesn't happen in a single expression, ...
The difference is between gimmicky ways one can use ambiguity to produce convoluted edge cases vs knowing enough to do actual work. Sure, C being low level and having a relatively short specification makes it ripe for compiler-specific and hardware-specific hacks. But I still think it has the basic core defined, and that core is pretty small.
With C++ and its typical libraries it is a bit different. It has so many features (templates, classes, streams, friends, shared pointers, unique pointers, destructors, constructors, polymorphism rules, and combinations of those) that code gets complicated without using obscure features; just sticking to the standard ones gets hairy and needs someone who knows the whole spec.
The difference is that I don't want my programming language to take me a lifetime to become reasonably skilled with it. There are many languages that are simpler than C, and easier to use as well.
C is not a simple language. It is confoundingly difficult for a machine to parse. Add in the fact that it pretty much requires a preprocessor to be useful and it gets even more difficult. Clearly the article is not about whether the C paradigm is complex. It obviously isn't, because it doesn't come with a whole lot of pre-baked abstractions. But for parsing reliably, it's an absolute nightmare.
tl;dr: C is simple. Bad code is bad. Shitty compilers are shitty.
This needs to be qualified. Simple compared to what? C is simple when compared to C++, Java, and many other languages. Obviously, C is simple because it lacks syntactic sugar, classes, polymorphism, templating (generics), memory management, etc. So, uh, yeah. It's simple.
The fact that bad code can be written in a language doesn't really make the language non-simple. I could write bad code in any language (often times I do!) - so I'm not sure how any judgment about a language can be made with these kinds of examples. As far as the GCC/VC examples are concerned, they are a non-issue. Shitty compiler keywords are shitty[1]. We know. This is one of the many pains of writing cross-platform code in compiled languages. These examples are contrived and I highly doubt most come from production code.
Signed/unsigned conversions and conversion to/from pointers of functions and arrays cause real, production-code confusion. In terms of how much mental overhead the language takes up - how many edge cases you have to keep in mind to read code in the language (because if you're programming right then you spend more time reading than writing) - no, it's not simpler than Java, because classes, generics and memory management are more straightforward and less confusing than C's typing rules.
Languages have a complexity budget, and C blew its budget on a squillion different kinds of integer and a bunch of arbitrary-seeming rules to minimize the amount of typing needed to change between them. That was a good tradeoff in its day, when hand-optimization was practical and program source needed to be small. It's not an appropriate language to use now outside of very specific circumstances.
You say this as if Java doesn't have its fair share of oddities. It seems you're saying that the highly granular typing found in C doesn't have an equally-annoying analogue in Java; it does[1].
That's not a corner case, it's a particular instance of a general Java misfeature. In Java, "==" is an abomination that you must never use and will give essentially random results; "equals" is what you use to compare things for equality, and this is consistent throughout your codebase. Is it annoying? Yes. But it's easy to remember because it applies everywhere; any case of "==" in a java source file sticks out like a sore thumb. (Also any use of an array, anywhere, for anything. Sometimes I think I should write "Java: the good parts").
C is simple like the game Go is simple. Both are defined by (comparatively) simple rules, yet the possible situations that both present are far more complex than languages or games defined by far more complex rules.
No, assembly languages are not simpler, except for a few processors, mostly didactic ones.
Programming in assembly generally requires understanding registers, how the processor works, lots of branching variations, a few other ways to loop, memory organization on the processor, and at least tens or hundreds of different opcodes, some with surprisingly subtle differences between them.
C is not simple because its grammar is not simple and is hard to parse. You can have a simple (feature-wise) C-like language with a much cleaner syntax, e.g. Pascal, Modula, Oberon. You can simplify it a lot just by dropping += :)
> Why does the following code return 0 and not -1?
This question appears twice in the article.
Note that shifting a 32-bit integer (no matter the sign) by 32 is undefined behavior (§6.5.7, 3). I'm using Clang, and its `int` has 32 bits even on a 64-bit system.
The compiler shouldn't care where a label exists. If you're telling it to go to label L:, execution will jump there. GCC appears to look out for the programmer and always prints an error (and I don't see an option to shut it off). Oracle Studio compiles it with no issue and the above statement returns 1.
I just slapped it into Xcode and compiled using LLVM 4.2 (-std=gnu99), and it works fine. Using Xcode's LLVM GCC 4.2 setting (-std=gnu99 by default), it chokes.
Conclusion: I kind of lost interest at this point. :-) I don't quickly know how to get it to compile using GCC, though.
There isn't a single feature in C that isn't present in safer languages like Modula-2 or Turbo Pascal. They are as powerful as C, or even more so.
Just because C won the battle with those languages, it does not mean we need to live with its design issues forever.
* Eliminate sources of undefined behavior
* Remove implicit type casting
* Have a way to check the hardware overflow flag from the language
* Saner syntax for declaring variables (e.g. function pointers)
* Get rid of null-terminated strings
* A module system instead of the preprocessor
• Undefined behavior avoids massive performance penalties on hardware that wouldn't match the defined behavior, so it's a feature and unlikely to go away (compilers may warn you though).
Yeah, with luck they will be part of C++17; you just need to wait 4 years for them to be defined and then around 5 more for all major compilers, across all OSes, to support them.
They are not even being discussed for the next C standard, and thus similarly to C blocks, it will remain a clang language extension.
1. Undefined behavior is because not all hardware is the same
2. Pedantic, but it's nul-terminated; NULL is something either the same or completely different depending on the implementation. Also, what would you recommend for non-nul-terminated strings? Passing around a struct of char *s and size_t len? _That_ is a horrid idea.
C is a small language. C is gussied-up assembly. It is the ballet of programming languages.
People who say C is simple really mean "it doesn't do anything behind my back."