Barbeau was a professor of mine back in the late 1980s, great to hear that he's still around. It was a wonderful class, with Ravi Vakil [1] and Nima Arkani-Hamed [2] in it.
A few years ago he taught our Calculus II course, which had rather famous exams.
A couple years later I was poking around the library and found a book he'd just published entitled something like "50 challenging undergraduate mathematics problems", of which I was annoyed to recognize several :-)
I feel like I'm missing something. Isn't the derivative of a constant just defined to be 0? Why does the definition in the article restrict it to primes?
You're right, it's not the normal definition of derivative. He's defining a function on numbers that has some similarities to the derivative (the multiplication rule is the same, and therefore so are some other properties), but isn't the derivative.
It's common in mathematics to take a word that already means something, and re-use it for a different but related concept. For example, in group and field theory, you talk about addition and multiplication, but they're not necessarily the standard definition.
> It's common in mathematics to take a word that already means something, and re-use it for a different but related concept. For example, in group and field theory, you talk about addition and multiplication, but they're not necessarily the standard definition.
That's true, but I feel like the article misled me because it didn't give me a proper and timely explanation of what it means by "derivative".
This article is defining a new notion which is different from the derivative of a function. It's called derivative because of its inspirations and similarities to the classical idea of a derivative. It's interesting because it appears to have close relationships to important open problems in number theory, such as the Goldbach conjecture.
And it's not restricted to primes, it's just defined first for primes and then extended to all integers later.
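For readers who want to poke at it, here's a minimal sketch of that extended function (names are mine, not the article's). Since D(p) = 1 on primes and D(ab) = a D(b) + b D(a), the rule forces D(n) = n * (e1/p1 + ... + ek/pk) for n = p1^e1 * ... * pk^ek:

```python
def arithmetic_derivative(n):
    # D(p) = 1 for prime p; D(ab) = a*D(b) + b*D(a) extends this to all n.
    # Equivalently, each prime factor p of n contributes n/p to the total.
    if n < 2:
        return 0  # D(0) = D(1) = 0 by convention
    total = 0
    m, p = n, 2
    while p * p <= m:
        while m % p == 0:
            total += n // p  # one contribution per factor of p
            m //= p
        p += 1
    if m > 1:
        total += n // m  # leftover prime factor
    return total
```

For instance, D(5) = 1 since 5 is prime, and D(60) = 60 * (2/2 + 1/3 + 1/5) = 92.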
For better or worse that's very common. A big part of learning to read advanced math is keeping track of the "kind of thing" each variable is meant to be. If you do that, then overloading D is a convenience which illustrates similarities.
But if you're not a professional mathematician then such overloading is terrifically confusing. Actually, I'm fairly certain it still is even for professional mathematicians.
That's the intuition used to develop the concept, but it becomes increasingly difficult to apply that intuition in more exotic locales.
Thus, it's important to eventually seek out more abstract ways of characterizing the derivative (and integral). In more advanced mathematics, you usually state that the derivative is any operation which follows two rules:
1. Linearity, d(ax + by) = a d(x) + b d(y)
2. The Product Rule, d(xy) = x d(y) + d(x) y
and then try to squeeze things until that operation is defined uniquely.[0]
Likewise, it's often valuable to define integration as nothing more than the relationship

I(region, d(quantity)) = I(boundary(region), quantity)

which is known as the Generalized Stokes Rule. It is basically the "Fundamental Theorem of Calculus" on steroids, and it gives a characterization of integration in terms of nothing more than its algebraic/topological relationship with derivation... which is itself abstracted as mentioned above.
---
Why do all this? Because you can squeeze most of Calculus so that it depends only upon this "abstract interface" and then apply things you learned from calculus all over the place.
---
Finally, note that this is more like a "proposed" derivative than "the" derivative on natural numbers. The author notes that linearity fails, for instance. Thus, some intuition might "port over" but we shouldn't expect too much of it to do so.
Which echoes back to your original question—there's not really a notion of instantaneous change for us to be talking about... so how much sense does it make to talk about a derivative here?
Apparently, more than no sense at all, but less than you might want.
[0] Note that all we need to state this property of the derivative is a notion of multiplication and addition. This structure is, at its most abstract, usually called a ring (but can be made even weaker if needed). An example "exotic" ring might be concurrent processes. If P and Q are two processes then P*Q is P "followed by" Q and P + Q is P and Q "together". Can we write a derivative here? Who knows? (As another comment in this thread suggests, this kind of formulation can be used to consider the "derivative of a grammar" to be a parser! It's also well-known that the derivative of an algebraic data type is its "zipper"!)
I agree, there are useful generalizations of the derivative. But an important detail is that the operator that is discussed in this post is not linear, i.e. D(x+y) != D(x) + D(y), so it doesn't have all the expected ("intuitive") properties that the usual derivative has.
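That failure is easy to see concretely. A quick sketch (the helper `D` is my own, computing the arithmetic derivative from the prime factorization):

```python
def D(n):
    # arithmetic derivative: each prime factor p of n contributes n/p
    t, m, p = 0, n, 2
    while p * p <= m:
        while m % p == 0:
            t += n // p
            m //= p
        p += 1
    return t + (n // m) if m > 1 else t

# The product rule holds:
assert D(6 * 10) == 6 * D(10) + 10 * D(6)   # both sides are 92
# ...but additivity fails: D(6 + 10) = D(16) = 32, while D(6) + D(10) = 12.
assert D(6 + 10) != D(6) + D(10)
```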
I don't know the origin of this notation, but I can make up a possible explanation. Suppose every prime is a function of an abstract variable x, evaluated at x=0: so 2 = two(x) = 2 + x, 3 = three(x) = 3 + x, 5 = five(x) = 5 + x, ...
Then, for example, 60 = sixty(x) = two(x) * two(x) * three(x) * five(x) = two(x)^2 * three(x) * five(x)
A number then becomes a function of the primes in its factorization: you replace each prime by its function, but leave the exponents alone.
Then D is the standard derivative, plus evaluation.
(I'm mixing up the functions and their values when they are evaluated. That's usually not a good idea; if you do it in a Calculus exam the TA will be rightfully angry. But the notation in plain text is horrible, so please forgive the technical details.)
With this idea if you have numbers A, B, C such that A = B * C, then A(x) = B (x) * C(x). But if A = B + C then A(x) != B(x) + C(x). This "explains" why the operator D follows the multiplication rule, but not the sum rule.
For example, ten(x) = two(x) * five(x) and six(x) = two(x) * three(x), so ten(x) * six(x) = sixty(x) = two(x) * two(x) * three(x) * five(x).
But fifteen(x) = three(x) * five(x) and ten(x) = two(x) * five(x), while twenty-five(x) = five(x)^2, and clearly three(x) * five(x) + two(x) * five(x) != five(x)^2. So fifteen(x) + ten(x) != twenty-five(x).
Edit: Don't take this "explanation" very literally, for example this idea doesn't extend to D^2 directly
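Despite that caveat, the multiplicative part of the idea is easy to check mechanically. A sketch (function names are my own): map n to the product of (x + p) over its prime factors with multiplicity, then read off the derivative of that product at x = 0 via the product rule.

```python
def prime_factors(n):
    # prime factors of n with multiplicity, e.g. 60 -> [2, 2, 3, 5]
    fs, p = [], 2
    while p * p <= n:
        while n % p == 0:
            fs.append(p)
            n //= p
        p += 1
    if n > 1:
        fs.append(n)
    return fs

def D_via_polynomial(n):
    # d/dx of prod_i (x + p_i) at x = 0 is sum_i prod_{j != i} p_j
    fs = prime_factors(n)
    total = 0
    for i in range(len(fs)):
        prod = 1
        for j, q in enumerate(fs):
            if j != i:
                prod *= q
        total += prod
    return total
```

For 60 = 2^2 * 3 * 5 this gives 2*3*5 + 2*3*5 + 2*2*5 + 2*2*3 = 92, which agrees with the arithmetic derivative computed directly from the Leibniz rule.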
> But an important detail is that the operator that is discussed in this post is not linear, i.e. D(x+y) != D(x) + D(y), so it doesn't have all the expected ("intuitive") properties that the usual derivative has.
I think some care is appropriate here. The property you quote is additivity, not linearity; for linearity, one would like a ground field (or a ground ring if one is discussing linear maps between modules, I suppose). Since one is not considering any vector space / module structure on the natural numbers, this may hint why linearity (or even its weaker sibling additivity) is not thought to be necessary here.
(EDIT: With that said, I like very much your description of transforming numbers to functions.)
> That's the intuition used to develop the concept, but it becomes increasingly difficult to apply that intuition to in more exotic locales.
Then maybe it's better to use a different term for things that are different. Maybe it's better to keep the term "derivative" for the rate of change of one thing with respect to another thing, and to let the generalizations that aren't that be called something else.
Perhaps. Would you also say "Maybe it's better to keep the term 'sum' for combining the count of two sets, and to let the generalizations that aren't that (e.g., adding integers, adding vectors, adding polynomials, ...) be called something else"?
I'd say that you're historically wrong - that "sum" was used for adding integers, and then found to extend naturally to adding vectors, polynomials, matrices, real numbers, complex numbers, quaternions, and so on. If you want to extend it to combining the count of two sets (because you're trying to re-found all of mathematics on set theory), then the word still fits.
Just don't try to make the set theory version the "real" version, and then try to deny the use of the word in other places. Those other places were using it first; you don't have the right to hijack the word.
By "count of two sets", I don't mean to invoke set theory in any imposing, modern sense. Just the observation that, historically, we were adding counting numbers (in particular, non-negative ones with such properties as "The sum of x and y is always at least as large as x itself") long before we were adding integers.
Regardless, the point still stands: why would you allow the word "sum" to fit all those uses (disparate, but with a web of family resemblances), but not grant the same to "derivative"?
Because it seems to me that the web of family resemblances for "derivative" should include the rate of change of one thing with respect to another, not just that the product rule is satisfied. That is, it seems to me that the attempts to extend "derivative" are extending it to the point that the web of family resemblances no longer fits.
Unfortunately, all I can say to comfort that is that choosing "rate of change" as your centralizing analogy for "derivative" has been shown through the history of mathematics to be a great start, but a slow finish.
Frequently mathematics benefits a lot from abstracting to algebra because, at this point, it's purely about how to define elements and operations by their apparent behavior instead of by their metaphor or interaction with a larger idea (such as notions of space, continuity, rate, change... all of those require quite a lot of mechanics to get in place, while algebra is very light-weight).
As an example, there have been a lot of attempts to discretize calculus for computers. Usually, the goal here is to create a scheme of discretization which, in the limit, resembles the smooth computations we'd like to perform. This has been a successful program in practice, but it's known to be fraught with weird edge cases. It's easy to create discretized situations which violate intuition.
Much of the reason these failed is because they attempted to generalize from the notion of "rate of change".
There's also the idea of discrete calculus (not "discretized") which is what you get when you apply the algebraic laws alone to some very standard notions of discrete spaces (oriented simplicial complexes, in particular—the simplest discrete object which "has enough topology" to meaningfully have the algebraic laws of integration applied to it).
What you get in this case is a rich theory of discrete calculus which rederives half of manifold learning and graph theory as a special case. All of the laws follow precisely—and they must, as the entire construction was built to prevent such violations.
Finally, you can examine discrete calculus to find a notion of "rate of change" if you like. But it's alien from that which you might be familiar with from continuous domains. It would have been very difficult to arrive at this point trying to generalize that intuition.
But it's practically inevitable (not to say it's easy, just inevitable) if you say that you want to take the algebraic structure of derivatives and integration and apply it to oriented simplicial complexes.
It seems to me a sum should be at least as large as each of its summands (or rather, it once did). The world paid no heed, and life trudged on. I don't see a need to pick one particular archetypal trait or another and say the word "derivative" (or any other bit of mathematical jargon) mustn't ever be extended by analogy to a situation no longer directly manifesting that trait. A web of family resemblances doesn't depend on any one fiber running through all of it.
It's not as though the similarity of terminology is chosen with intent to confuse; the intent is to illuminate. The name for the generalization is chosen to match its more familiar relative because it is often _useful_ to think in terms of the analogy, imperfect though it be. [It seems humans are such that we would never find our way to powerful abstractions without such overloading; the combinatorial explosion of names would be too great to comprehend.]
I think the concept of sum predates integers by a substantial margin. It's a bit of a theft to extend it from its true domain of applicability, the (positive) natural numbers.
Sometimes it's called "derivation". In Grassmann algebra you call it the exterior derivative. Ultimately these are all motivated by trying to find "the" derivative in different contexts and eventually in greater and greater generality... So despite the mild opportunity for confusion they really are, in a sense, each named "derivative".
What you need to do is take "region" to be an interval of the real line like [a, b]. Now, the boundary of "region" is the set of the end points, {a, b}. Thus, we've now transformed this equation to
I([a,b], d(quantity)) = I({a, b}, quantity)
If we see I as being a sum, it's clear that the right-hand-side must be the same as a regular sum, though we need to account for the idea that we're summing from a and to b. Ultimately, we do this by negating where we're coming from[1]
I([a,b], d(q)) = -q(a) + q(b)
Then we just have to recognize I([a,b], _) as the definite integral from a to b
DefiniteIntegral(a, b, d(q)) = q(b) - q(a)
and this is a statement that the definite integral of a function on an interval [a,b] is the difference of the antiderivative of that function evaluated at the endpoints---the standard fundamental theorem of calculus!
But notice that in this transformation we've destroyed some information. No longer is it so apparent that "boundary" and "derivative" have some kind of kinship. We also cannot easily generalize this notion to higher dimensional spaces (unless we already know the Stokes Law trick).
[1] This is a bit arbitrary. If we chose it the other way the equation would still hold, it'd just not be the standard convention and thus less recognizable. The reason we choose this is because calculus, it turns out, depends upon a notion of orientation. We have to know whether we're going with or against "the flow".
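A discrete sanity check of that equation (the grid and the test function are my own choices, not anything canonical): approximate I([a,b], d(q)) by summing forward differences of q over a partition of [a,b]. The sum telescopes, so it equals the oriented boundary sum -q(a) + q(b) exactly, up to floating-point rounding.

```python
# Discrete sketch of I([a,b], d(q)) = -q(a) + q(b): on a 1-D grid the
# "integral" is a sum over cells and "d" is a forward difference.
def q(x):
    return x ** 3  # any antiderivative-style quantity works here

a, b, steps = 1.0, 2.0, 1000
h = (b - a) / steps

# sum of d(q) over the cells of [a, b] -- this telescopes
lhs = sum(q(a + (i + 1) * h) - q(a + i * h) for i in range(steps))
rhs = -q(a) + q(b)  # oriented boundary sum: negate where we come from

assert abs(lhs - rhs) < 1e-9
```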
I like your explanation very much, but I think that calling the integral 'the relationship' (emphasis on the definite article) satisfying Stokes is an overstatement. There are many such; for example, I(_, _) = 0.
Great answer, but on a pedantic note, it's not quite true that the derivative of a data type is its zipper. The derivative is the one-hole context, but the zipper is more like a one-substructure context instead. There's a bit more elaboration here: [1].
True, I should have been more precise. The zipper just is a technique for using one-hole contexts to make a path as you drill through a recursive type.
The standard way of generalizing the derivative is to require it to be linear and to satisfy a version of the product rule. See, for instance, derivations:
In a sense this is d/d{primes}. D(8) is how fast 8 changes with respect to 2, while D(60) is how fast 60 changes if 2, 3, and 5 simultaneously increase (at the same rate).
So... we can use it to determine the age of the universe, if we know at which point in time 6 * 9 = 42 held, by extrapolating back to at what point 6 * 9 = 0? ;-)
[ed: looks like my multiplication signs got eaten by hn]
No, it can be an operator on any kind of (usually) ring or more general algebraic structure. What you refer to is the "usual" derivative of functions of one variable, just one of many derivatives one can define.
I implemented Brzozowski's regex derivatives to build a regex implementation back-end. That back-end is used whenever exotic constructs (negation, intersection) appear in the abstract syntax of the regex; in their absence, the implementation falls back on the NFA-graph-based back end.
Yes, this seems to build on the idea of regex derivatives. If regex derivatives can be used to transform a regular expression into a recognizer for strings, why not transform a more general grammar into a recognizer of strings.
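For the curious, the regular-expression case is small enough to sketch end-to-end. Here's a toy matcher in the Brzozowski style (the AST encoding and all names are mine, not from any particular library): the derivative of r with respect to a character c is a regex matching exactly the strings s such that r matches c + s.

```python
# Regex AST: Empty (matches nothing), Eps (empty string only),
# chr_(c), cat(r, s), alt(r, s), star(r) -- encoded as tagged tuples.
EMPTY, EPS = ("empty",), ("eps",)

def chr_(c): return ("chr", c)
def cat(r, s): return ("cat", r, s)
def alt(r, s): return ("alt", r, s)
def star(r): return ("star", r)

def nullable(r):
    # does r match the empty string?
    tag = r[0]
    if tag == "eps": return True
    if tag in ("empty", "chr"): return False
    if tag == "cat": return nullable(r[1]) and nullable(r[2])
    if tag == "alt": return nullable(r[1]) or nullable(r[2])
    return True  # star always matches ""

def deriv(r, c):
    # Brzozowski derivative of r with respect to character c
    tag = r[0]
    if tag in ("empty", "eps"): return EMPTY
    if tag == "chr": return EPS if r[1] == c else EMPTY
    if tag == "alt": return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "cat":
        d = cat(deriv(r[1], c), r[2])
        # if the first part can match "", c may also start the second part
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d
    return cat(deriv(r[1], c), r)  # star: peel one iteration

def matches(r, s):
    # match by repeatedly differentiating, then testing nullability
    for c in s:
        r = deriv(r, c)
    return nullable(r)
```

So matches(cat(chr_('a'), star(chr_('b'))), "abb") is True while "ba" is rejected. Practical implementations additionally simplify terms (e.g. alt(EMPTY, r) -> r) so the derivatives don't blow up; that's omitted here for brevity.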
Yes, it is completely arbitrary. However, this arbitrary definition follows certain rules and properties and can therefore be used for certain types of mathematical reasoning.
This is why, when we talk about rings and fields and such we say "multiplication-like" or "addition-like" operators. The operators defined for the algebraic structure may not be exactly like "standard" operators, but they still follow rules and you can still do cool things with them.
If you like, think of this as first turning a number into the unique monic polynomial with negated prime roots whose value at 0 is that number, then taking the derivative of that polynomial at 0.
...Not that there's any particular reason you should like this.
Think of the classical Derivative as a function which inputs a function and outputs a function. When you say "the derivative of 5 is 0", you really mean "the derivative of f(x)=5 is f'(x)=0".
This article seems to define a function called D which inputs a number and outputs a number. In this setting "the derivative of 5 is 1" really does mean D(5) = 1, because 5 is prime; there is no underlying function of x at all.
LEMMA: D(pq•r) = D(p•qr):
D(pq•r) = pq D(r) + r D(pq)
= pq D(r) + rp D(q) + qr D(p)
= p (q D(r) + r D(q)) + qr D(p)
= p D(qr) + qr D(p)
= D(p•qr)
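Numerically, the lemma says that extending D from the primes via the Leibniz rule cannot depend on how you group the factors. A quick sketch (helper names are mine; D is the arithmetic derivative):

```python
def D(n):
    # arithmetic derivative: each prime factor p of n contributes n/p
    t, m, p = 0, n, 2
    while p * p <= m:
        while m % p == 0:
            t += n // p
            m //= p
        p += 1
    return t + (n // m) if m > 1 else t

def leibniz(x, y):
    # one step of the extension rule D(xy) = x*D(y) + y*D(x)
    return x * D(y) + y * D(x)

p, q, r = 2, 3, 5
# grouping as (pq)*r or p*(qr) gives the same value, namely D(pqr)
assert leibniz(p * q, r) == leibniz(p, q * r) == D(p * q * r)
```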
There are a number of generalizations of the notion of "derivative"; in particular, you can take the derivative of a regular expression:
https://github.com/scythe/dreg/blob/master/re-deriv.pdf
...which I've been slowly turning into a regex matcher.