What type of Machine is the C Preprocessor? (theorangeduck.com)
62 points by cremno on Jan 27, 2014 | 47 comments


"We imagine Turing Machine memory having an infinite tape which is just there, not having a tape that must be generated as the head moves."

This is one of those times when you're probably better off saying and thinking "unbounded" rather than "infinite". The point of a TM is not that it "literally" (to the extent that term applies anyhow) has infinitely many cells, the point is that there is no point in the computation at which it will reach for another cell and be told that there isn't one. (So there isn't even a way to represent that.) Many "infinite" things are better conceived of as "unbounded". For another example, "infinite lists" in Haskell. Obviously, they cannot concretely manifest infinitely many cells in memory, but there is no point at which the runtime will reach for "the next cell" and be told it has reached the end of the list.

That is also why it's better to model our real computers with TM math, even though they obviously do not have literally infinite amounts of memory. As long as a software process runs along and is never told that it is out of memory when it asks for more, we get fairly TM-like behavior; if it does run out, we get a crash instead. By contrast, while the FSM model is in some sense more mathematically accurate, it is less useful for modeling real computers, since it is so weak.

I think this formulation also better intuitively explains why we use infinity so often in our proofs; by saying "and there's always another one if you want it", we remove the case where we have to handle there not being another one. And that case can get quite hairy, as anyone who has watched their machine self-immolate after it discovers it is out of RAM can attest. :) (It can also be every bit as mathematically tedious to deal with, too.)

The further I got through grad school, the more I said "unbounded" rather than "infinite".


I once read an analogy that said to think of the "infinite tape" of a Turing machine as similar to how physicists think about infinite parallel plates, etc. The infinite case can be easier to treat theoretically, and also give key insights, that might not be attained in the finite case. Even though you have an understanding that any real system is finite, imagining an infinite system is a useful approximation.

And as long as you're operating in a region that's not close to the edge, the edges don't make a difference to the outcome.

Likewise, while real machines are really DFAs since they have a finite amount of memory, thinking of them as having infinite memory is a much better approximation. And as long as you don't "get close to the edges" by e.g. trying to allocate more memory than your machine has available, it works perfectly.


"The further I got through grad school, the more I said "unbounded" rather than "infinite"."

I think there's a huge difference between simply saying "unbounded" compared to going through that detailed explanation you just gave. In the latter case, it's very clear what the difference is. But without the explanation I'd guess that people will think something like this:

    unbounded -> without bound -> infinite
In other words, without the explanation "unbounded" becomes a synonym for "infinite".


"Unbounded" means "any finite number, no matter how large." For many people, this might be what they think of when they hear "infinite," but in mathematics, "infinity" is an entirely different concept.


Even in math "infinity" often means "unbounded". For example, in real analysis if we have a sequence x_n, we write lim x_n = ∞ to mean "given any real number alpha, we can find a natural number K, such that x_n >= alpha for all n > K".


As I mention in another comment, I am specifically referring to the fact that when we computer scientists or discrete mathematicians use the term in proofs, we are more often interested in the "unbounded"-ness than in something with "actual" infinite cardinality... the fact that I understand the difference is precisely why I tended to start saying unbounded more often. Using infinity to mean unbounded is really too much power, and a proof should use the least power necessary. (It was only an intuitive feeling at the time; I have a much better understanding of that fact now.)

When infinity is the relevant concept, use it.


This reminds me of the distinction between random and arbitrary. Seems to trip people up sometimes.


The relevant distinction isn't between "bounded" and "finite", but between the size of a container and the size of its elements.

The difference between "bounded" and "finite" is the difference between "measure" and "cardinality". That is, "bounded" refers to a notion of distance, while "finite" refers to a notion of count. There are bounded infinities (such as the set of all rational numbers between 0 and 1), but there are no unbounded finities.

A container that contains finite objects of all sizes must be an infinite/unbounded container.


Depending on the context, unbounded and infinite may mean very different things; for example, the surface of a sphere is unbounded but finite.


That's why I didn't simply drop the word infinite. There are cases where it is appropriate. What is the cardinality of the natural numbers? Infinite... "unbounded" is definitely not right there. But when we use infinity in proofs, we are more often using it to mean "unbounded" rather than truly a "manifested infinity", to use a sloppy phrase; we're using it to get rid of the nasty edge case, not because we expect a real implementation of whatever the proof is proving to actually require infinite RAM.

Also, by "we", I should clarify I mean discrete mathematicians and computer scientists, who are usually dealing in the discrete case. In the continuous case, infinities come up "for real" more often.


A sphere is bounded but infinite*

http://en.wikipedia.org/wiki/Compact_space

*Infinite cardinality of points, not infinite in measure.


I don't think there's a specific name for this kind of Turing Machine, but it looks to me like it's primitive-recursive, similar to Hofstadter's BLooP ('bounded loop') language. It's basically the difference between for loops, which specify up-front how many times to loop, and while loops, which keep going until some dynamic condition is met. For loops are total, while loops are Turing complete. Note that these are the CompSci definitions of for and while; most languages with "for" syntax are actually implementing while loops.
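The for/while distinction above can be sketched in Python (the function names are illustrative, not from any particular source): a loop whose iteration count is fixed before it starts always terminates, while a loop driven by a dynamic condition need not.

```python
# Primitive-recursive style: the iteration count is fixed up front,
# so this always terminates (a BLooP-style bounded loop).
def add_bounded(a, b):
    result = a
    for _ in range(b):   # number of steps known before the loop starts
        result += 1
    return result

# General-recursive style: the loop runs until a dynamic condition holds.
# Nothing guarantees termination in general (a FLooP-style free loop).
def collatz_steps(n):
    steps = 0
    while n != 1:        # no a-priori bound on how long this runs
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps
```

Whether `collatz_steps` halts on every input is a famous open problem, which is exactly the kind of question that cannot arise for the bounded form.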

There are some interesting kinds of Turing Machine which use limited tapes, rather than limited transition tables. For example, a monotone Turing machine can have heads which only move one way. They're useful for modelling demand-driven input and output (one read-only, monotone input tape, one write-only, monotone output tape and one read-write non-monotone work tape). Other machines that I've seen in research are 'enumerable-output machines', which can only edit their previous output if it ends up lexicographically higher, and 'Generalised Turing Machines' which can edit their output as long as each bit takes a finite number of steps to 'stabilise'. These machines have been investigated by Jürgen Schmidhuber, among others.


I agree. After reading in the article that:

> The main argument against Turing completeness was that my implementation did not support unbounded recursion.

I’m almost sure they are primitive recursive functions, but I didn’t have time to write the complete proof. http://en.wikipedia.org/wiki/Primitive_recursive_function#Li...


From a previous article of the same author: http://theorangeduck.com/page/c-preprocessor-turing-complete

> This is a fairly clear distinction and a compelling argument. Clearly there are languages where you can effectively express unbounded recursion function() { function() } and the C preprocessor is not one of them.

> But there are a number of subtleties going on here. For example, if we consider the set of macros I created to be the "machinery", and the "language" to be the Brainfuck input to my system, then it is indeed Turing complete. That is - I have created machinery which can simulate Turing complete languages. Taking all the above definitions into account, consider how odd it is that Turing complete machinery can be expressed in a non Turing complete language.

And from the “Brainfuck interpreter written in the C preprocessor”: https://github.com/orangeduck/CPP_COMPLETE

> Currently the maximum recursion depth is set to around 1000 and the data array size around 100. These can be easily extended but for now, as a general rule of thumb computations exceeding 1000 steps may not run.

Here is the problem. Consider the one-step-interpreter: a function that takes as arguments a program, the current instruction index, and the whole memory state, and computes the next instruction index and the next whole memory state. This one-step-interpreter is primitive recursive. He put that function inside a bounded loop, which is also implementable as a primitive recursive function.

So essentially he created an interpreter that runs a program for at most a fixed number of steps, where this number is an argument of the function (or worse, in this case a constant). Interpreter(Program, Memory, MaxSteps) is a primitive recursive function. To be Turing complete, he would need to write InterpreterForeverIfNecessary(Program, Memory).

With a fixed MaxSteps value, it can't even compute all primitive recursive functions.
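The argument can be sketched in Python (a toy register machine, not the author's macros; all names here are illustrative): a one-step transition function iterated a fixed number of times stays primitive recursive, while Turing completeness needs the unbounded `while` noted in the final comment.

```python
def step(program, ip, memory):
    """One-step interpreter for a toy register machine with opcodes
    ('inc', r), ('dec', r), ('jnz', r, target) and ('halt',)."""
    op = program[ip]
    if op[0] == 'inc':
        memory[op[1]] += 1
        return ip + 1, memory
    if op[0] == 'dec':
        memory[op[1]] -= 1
        return ip + 1, memory
    if op[0] == 'jnz':  # jump to target if register is non-zero
        return (op[2] if memory[op[1]] != 0 else ip + 1), memory
    return ip, memory   # 'halt': stay put

def interpreter(program, memory, max_steps):
    """Primitive recursive: the loop bound is fixed before iteration starts."""
    ip = 0
    for _ in range(max_steps):
        if program[ip][0] == 'halt':
            break
        ip, memory = step(program, ip, memory)
    return memory

# A program that moves the contents of register 0 into register 1:
program = [
    ('jnz', 0, 2),  # 0: if r0 != 0, enter the loop body
    ('halt',),      # 1: done
    ('dec', 0),     # 2: r0 -= 1
    ('inc', 1),     # 3: r1 += 1
    ('jnz', 0, 2),  # 4: repeat while r0 != 0
    ('halt',),      # 5: done
]
result = interpreter(program, {0: 3, 1: 0}, 100)

# An InterpreterForeverIfNecessary would instead loop
#   while program[ip][0] != 'halt': ip, memory = step(program, ip, memory)
# and that unbounded while is exactly what a total macro system lacks.
```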


This situation crops up in total languages like Coq. For any iterative process, like the stepping of a Turing Machine, we can create an infinite stream of iterations, eg.

Cons (tape, head, state) (Cons (tape, head, state) (Cons ...))

But since our program is total, we can only extract the first N states, for some finite N. Compare this to a Turing Complete language like Haskell, where we can also define an infinite stream of iterations, but we can also traverse this stream indefinitely.
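The same pattern can be sketched with a Python generator standing in for the lazy stream (illustrative names only; Python plays the role of the Turing-complete host here):

```python
def iterations(step, state):
    """Lazily yield state, step(state), step(step(state)), ... forever."""
    while True:
        yield state
        state = step(state)

def take(n, stream):
    """Total consumer, Coq-style: extract only the first n elements."""
    return [next(stream) for _ in range(n)]

# A toy 'machine step': advance the head and flip a flag.
stream = iterations(lambda s: (s[0] + 1, not s[1]), (0, True))
first_four = take(4, stream)
# A Turing-complete consumer could instead traverse `stream` indefinitely,
# e.g. searching it for a state that may never occur.
```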


"What we are missing is the ability to re-enter old states."

That's deadly. Any machine with a finite number of states that cannot re-enter a state it has previously been in must necessarily halt on all input. That makes it strictly less powerful than even an FSM.


It is worth noting that "must necessarily halt on all input" is quite a desirable property in a macro language evaluated at compile time.


That is arguable. All else being equal it might be true, but all else is not equal. The benefit of never having your compiler enter an infinite loop may not be worth the cost in terms of lost expressive power.


Has it been proven that the ability to write undecidable programs confers expressive power such that you can't have one without the other? It seems self-evident that running forever can't be useful, so you should be able to say WLOG that useful programs always halt. So the set of all useful programs is strictly smaller than the set of Turing machines. Is there a formal reason why Turing machines are desirable?


> Has it been proven that the ability to write undecidable programs confers expressive power such that you can't have one without the other?

Yes.

> It seems self-evident that running forever can't be useful

Really? Do you not think that it might be useful for, say, an operating system to run forever?


>> Has it been proven that the ability to write undecidable programs confers expressive power such that you can't have one without the other?

Yes.

Can you provide a reference?

>> It seems self-evident that running forever can't be useful

Really? Do you not think that it might be useful for, say, an operating system to run forever?

It's nice when the OS has the option of shutting down deterministically.


Do you think it is useful to implement an operating system entirely at compile-time, in an implementation which uses separate compile and run phases?


I'm not sure I follow what you're talking about. Operating Systems are tangential to the point, though. When I said all useful programs halt, the kinds of programs I meant are the kind used to prove things about computation. Those programs are simplified abstractions of real programs, and they generally compute one result and then halt (if they halt). Being simple makes those programs easier to use in proofs than real programs. However, the proofs still generalize to real programs; a real-world program with multiple functions is equivalent to several single-function programs glued together.

So my point in saying that all useful programs halt was really to argue that all useful programs must have the constraint that their behavior is always predictable, or controllable, in some sense. In a trivial way, computers are always predictable, but Turing proved that there are cases where it's impossible to predict the eventual outcome of a program in advance, i.e., the halting problem. In precise terms, the outcome (halt/not halt) of some programs is said to be undecidable or uncomputable. Undecidability is a hallmark of Turing machines. But since real programs are written by humans, who can't solve undecidable problems, real programs will never be able to gain any advantage from undecidable behavior.


It's still a desirable property, it's just that it's a property in essential tension with another desirable property.


On the other hand, in the Lisp/Scheme family the macro language is the full, Turing-complete language itself: Lisp or Scheme respectively. In those languages, macros are powerful and must be used with care. The advantage is that complex macros are easy to write.


It is not that simple. I have a little problem with finding a comprehensive definition of Turing completeness: in order to decide whether a certain machine is Turing complete, does one have to prove that it can also simulate infinite, non-halting algorithms?

If yes, then not being able to re-enter old states disqualifies the C preprocessor as a Turing complete machine; consider a simple machine that just goes left and right until the end of time. However, is that really an algorithm? If there is no output, one could simply replace it with an empty machine that halts immediately, and that can certainly be simulated in the C preprocessor.

On the other hand, if this is not a requirement, then every algorithm we have to simulate to prove Turing completeness halts eventually. There is no ability to re-enter old states, but if the algorithm halts we could just add enough additional states (finitely many) to allow it to finish.


I would look at it a slightly different way. It's not that the definition of Turing completeness explicitly says that a language must be able to simulate "infinite" algorithms. It's that it must be able to simulate what a Turing machine can do. Turing machines are quite capable of running non-halting algorithms, so the language must be able to as well.

Contrived example: One algorithm a Turing machine could perform would be to write the characters "TM" every two cells on its entire tape. The tape is unbounded, so this algorithm is, colloquially speaking, infinite. Clearly, you cannot simulate this with a halting machine like the C preprocessor.
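A sketch of why a step-bounded evaluator falls short of the contrived example (hypothetical names; Python stands in for both machines): the generator below is a perfectly good non-halting process, but a total simulator cut off after `max_steps` steps only ever produces a finite prefix of its tape.

```python
def write_tm_forever():
    """Non-halting TM-style process: write 'T', 'M', 'T', 'M', ...
    across the tape, one cell per step, never stopping."""
    pos = 0
    while True:
        yield pos, 'TM'[pos % 2]
        pos += 1

def bounded_simulation(machine, max_steps):
    """What a total, preprocessor-like evaluator can do: run the
    machine for at most max_steps steps and return the partial tape."""
    tape = {}
    gen = machine()
    for _ in range(max_steps):
        pos, sym = next(gen)
        tape[pos] = sym
    return tape

prefix = bounded_simulation(write_tm_forever, 4)
```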


This is exactly the problem I was talking about. Every definition I've read has the strict condition that the set of all possible states Q is non-empty (and finite). But the definitions differ on the set of final states F. In [1] there is a comparison: in the columns labeled "Peter Linz" and "Wikipedia", F is defined as a subset of Q, which implies it could be empty, while in the column labeled "Michael Sipser" there is a special state q_accept that explicitly exists in Q, so F is non-empty. Unfortunately there is no information on whether q_accept has to be reachable from the start state.

But consider this: how could one say that this machine is actually writing "TM" every two cells? If you inspect the tape while the machine is still running, its contents might still change (on a meta-level you may expect that the pattern holds, but you might, for example, have some bug you're not aware of). The only way to be sure that "TM" has been written is after the machine halts.

[1] http://www.cis.upenn.edu/~matuszek/cit596-2012/NewPages/turi...


> I have a little problem with finding comprehensive definition of Turing completeness

http://lmgtfy.com/?q=turing+completeness

> in order to decide if certain machine is turing complete does one have to prove that it can also simulate infinite, not halting algorithms?

Yes.


> Yes

And you're saying this based on what? Thanks for googling this for me, although the key word was "comprehensive", not just any definition.


Ah, then what you're looking for is a comprehensive description, not a comprehensive definition. (I gather your native language is not English?) For that you will have to read a book on computability theory (at least -- it depends on just how "comprehensive" you want to get).

> And you're saying this based on what?

Based on the definition of Turing-completeness, and the (trivial) fact that there exist Turing machines that do not halt. A Turing-complete system is, by definition, a system that can emulate any Turing machine. Because some Turing machines don't halt, any system that never halts cannot emulate a Turing machine that does not halt, and so by definition is not Turing-complete.


You're right; maybe I do not understand because of the different meanings of those words in English, which is not, in fact, my native language, but let me try to explain what I think.

From wikipedia we read:

Turing completeness

A computational system that can compute every Turing-computable function is called Turing complete (or Turing powerful). Alternatively, such a system is one that can simulate a universal Turing machine.

From the first part, I think 'compute' means that from some input A there is an output B. It is explained further in the 'computable function' article:

Each computable function f takes a fixed, finite number of natural numbers as arguments. Note that the functions are partial in general, i.e. they may not be defined for every possible choice of input. If a computable function is defined for a certain input, then it returns a single natural number as output (this output can be interpreted as a list of numbers using a pairing function).

So if a computable function returns something, that means it halts. We are able to compute every halting algorithm in the preprocessor, hence it is Turing complete.

The second part of this definition says that one has to simulate a Turing machine (and it uses the word 'alternatively', so basically it has to amount to the same thing, but I can argue either way). I think that 'simulate' in this context means 'producing the same output' rather than 'doing the same thing step by step'. Internal tape states at a certain point in time should not matter, because they might change while the machine is running (or, if they do matter, we could just return a list of pairs {state, time} at the end).

There is no doubt that non-halting Turing machines exist, but I'm not sure whether that applies to this particular problem, since they are not computing anything.


> The second part of this definition says that one has to simulate a Turing machine (and it uses the word 'alternatively', so basically it has to amount to the same thing, but I can argue either way). I think that 'simulate' in this context means 'producing the same output' rather than 'doing the same thing step by step'. Internal tape states at a certain point in time should not matter, because they might change while the machine is running (or, if they do matter, we could just return a list of pairs {state, time} at the end).

> There is no doubt that non-halting Turing machines exist, but I'm not sure whether that applies to this particular problem, since they are not computing anything.

The exact steps of a Turing Machine don't need to be emulated exactly, only the output needs to be the same. However, it's trivial to turn your back-and-forth example into a Turing-computable function:

Let's call your back-and-forth machine "M1" and, without loss of generality, let's say it starts at cell 0, moves to cell 1, then back to cell 0. We can then define a new machine "M2" which outputs a list of M1's head positions at each step, based on a direct simulation of M1. In other words, M2 will output 01010101010101.... forever, never halting.

The C preprocessor cannot simulate M2, precisely because it cannot simulate M1.

Also, note that we don't need to wait for a Turing machine to halt before we read its output. There are many useful programs which should not halt, like servers and operating systems.


> I think that 'simulate' in this context mean 'producing same output' rather than 'doing step by step the same'.

That is correct. But "producing the same output" means not halting when a TM does not halt. Otherwise it is not the same output.

But halting is actually a red herring in this case. Because the number of states that a cpp-machine can visit is finite, it is easy to construct a TM (or even an FSM) that halts on all inputs but produces a different output than any given cpp machine for some input. For example, a TM can do bignum arithmetic that halts on all inputs, but for some input it must be the case that such a TM will produce a different output than any given FSM, and hence any given cpp-machine, since cpp-machines are strictly less powerful than FSMs.


Wouldn't it simply be a linear bounded automaton (LBA)? I understand you can implement recursion via some tricks, but only for as many steps as you predefine (see e.g. https://stackoverflow.com/questions/7508475/is-c-preprocesso...). This smells like an LBA to me.


My first instinct is that you're right: at least, I don't think it can be any more powerful than an LBA. If there's a finite number of state transitions it can make, then it can only actually "get to" a certain distance along the tape, and you can just chop the tape at that point.


Nope, a cpp-machine is strictly less powerful than an LBA, or even an FSA. An LBA (and an FSA) can enter an infinite loop. A cpp-machine can't.


A few months ago I got fed up with losing track of what I had written in C, so I knocked up some Python scripts to write templates for me; each template displays the date and time the file was created and the filename in a comment. This concept can be taken much further.

CPP has been known to be abused by programmers as a general text processor, has anyone else started using another language to help them write C?
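A minimal sketch of the kind of script described above (the header format and function names are made up for illustration):

```python
from datetime import datetime
from pathlib import Path

def c_template(filename):
    """Return a C file skeleton with a provenance comment at the top."""
    stamp = datetime.now().strftime('%Y-%m-%d %H:%M')
    return (f"/* {filename} -- created {stamp} */\n\n"
            "#include <stdio.h>\n\n"
            "int main(void)\n"
            "{\n"
            "    return 0;\n"
            "}\n")

def new_c_file(filename):
    """Write a fresh, ready-to-compile skeleton to disk."""
    Path(filename).write_text(c_template(filename))
```

Calling `new_c_file("scratch.c")` drops a compilable skeleton on disk with the filename and timestamp baked into the leading comment.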


C++ and Objective-C both began as preprocessors for C.


Wow, it's surprising what you can do with it.


Just to clarify a bit, C++ and Objective-C were not written in the C pre-processor, but rather implemented as their own pre-processor programs, analogous to Qt's moc.


Yes, like Cfront [1]. I used that back in the day; the pain of it caused me to rewrite all my C++ in C though :p Call me a coward!

[1] -- http://en.wikipedia.org/wiki/Cfront


> Just to clarify a bit, C++ and Objective-C were not written in the C pre-processor, but rather implemented as their own pre-processor programs, analogous to Qt's moc.

I bet these pre-processor programs were themselves written in C.


You mean like $ SCCS %G% %M% %W% $ ?


Good times. I used to have emacs mode hooks that ensured "$Id$" was written into files. I don't miss that.


In what I considered minor blasphemy one of my friends wrote git-hooks to keep using them.


Cheers, your comment has just fried my neurons.

