Barbeau was a professor of mine back in the late 1980s, great to hear that he's still around. It was a wonderful class, with Ravi Vakil [1] and Nima Arkani-Hamed [2] in it.
A few years ago he taught our Calculus II course, which had rather famous exams.
A couple years later I was poking around the library and found a book he'd just published entitled something like "50 challenging undergraduate mathematics problems", of which I was annoyed to recognize several :-)
I feel like I'm missing something. Isn't the derivative of a constant just defined to be 0? Why does the definition in the article restrict it to primes?
You're right, it's not the normal definition of derivative. He's defining a function on numbers that has some similarities to the derivative (the multiplication rule is the same, and therefore so are some other properties), but isn't the derivative.
It's common in mathematics to take a word that already means something, and re-use it for a different but related concept. For example, in group and field theory, you talk about addition and multiplication, but they're not necessarily the standard definition.
> It's common in mathematics to take a word that already means something, and re-use it for a different but related concept. For example, in group and field theory, you talk about addition and multiplication, but they're not necessarily the standard definition.
That's true, but I feel like the article misled me because it didn't give me a proper and timely explanation of what it means by "derivative".
This article is defining a new notion which is different from the derivative of a function. It's called derivative because of its inspirations and similarities to the classical idea of a derivative. It's interesting because it appears to have close relationships to important open problems in number theory, such as the Goldbach conjecture.
And it's not restricted to primes, it's just defined first for primes and then extended to all integers later.
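For readers who want to poke at it, here's a minimal sketch of that extended function (names are mine, not the article's). Since D(p) = 1 on primes and D(ab) = a D(b) + b D(a), the rule forces D(n) = n * (e1/p1 + ... + ek/pk) for n = p1^e1 * ... * pk^ek:

```python
def arithmetic_derivative(n):
    # D(p) = 1 for prime p; D(ab) = a*D(b) + b*D(a) extends this to all n.
    # Equivalently, each prime factor p of n contributes n/p to the total.
    if n < 2:
        return 0  # D(0) = D(1) = 0 by convention
    total = 0
    m, p = n, 2
    while p * p <= m:
        while m % p == 0:
            total += n // p  # one contribution per factor of p
            m //= p
        p += 1
    if m > 1:
        total += n // m  # leftover prime factor
    return total
```

For instance, D(5) = 1 since 5 is prime, and D(60) = 60 * (2/2 + 1/3 + 1/5) = 92.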
For better or worse that's very common. A big part of learning to read advanced math is keeping track of the "kind of thing" each variable is meant to be. If you do that, then overloading D is a convenience which illustrates similarities.
But if you're not a professional mathematician then such overloading is terrifically confusing. Actually, I'm fairly certain it still is even for professional mathematicians.
That's the intuition used to develop the concept, but it becomes increasingly difficult to apply that intuition in more exotic locales.
Thus, it's important to eventually seek out more abstract ways of characterizing the derivative (and integral). In more advanced mathematics, you usually state that the derivative is any operation which follows two rules:
1. Linearity, d(ax + by) = a d(x) + b d(y)
2. The Product Rule, d(xy) = x d(y) + d(x) y
and then try to squeeze things until that operation is defined uniquely.[0]
Likewise, it's often valuable to define integration as nothing more than the relationship

I(region, d(quantity)) = I(boundary(region), quantity)

which is known as the Generalized Stokes Rule. It is basically the "Fundamental Theorem of Calculus" on steroids, and it gives a characterization of integration in terms of nothing more than its algebraic/topological relationship with derivation... which is itself abstracted as mentioned above.
---
Why do all this? Because you can squeeze most of Calculus so that it depends only upon this "abstract interface" and then apply things you learned from calculus all over the place.
---
Finally, note that this is more like a "proposed" derivative than "the" derivative on natural numbers. The author notes that linearity fails, for instance. Thus, some intuition might "port over" but we shouldn't expect too much of it to do so.
Which echoes back to your original question—there's not really a notion of instantaneous change for us to be talking about... so how much sense does it make to talk about a derivative here?
Apparently, more than no sense at all, but less than you might want.
[0] Note that all we need to state this property of the derivative is a notion of multiplication and addition. This structure is, at its most abstract, usually called a ring (but can be made even weaker if needed). An example "exotic" ring might be concurrent processes. If P and Q are two processes then P*Q is P "followed by" Q and P + Q is P and Q "together". Can we write a derivative here? Who knows? (As another comment in this thread suggests, this kind of formulation can be used to consider the "derivative of a grammar" to be a parser! It's also well-known that the derivative of an algebraic data type is its "zipper"!)
I agree, there are useful generalizations of the derivative. But an important detail is that the operator that is discussed in this post is not linear, i.e. D(x+y) != D(x) + D(y), so it doesn't have all the expected ("intuitive") properties that the usual derivative has.
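That failure is easy to see concretely. A quick sketch (the helper `D` is my own, computing the arithmetic derivative from the prime factorization):

```python
def D(n):
    # arithmetic derivative: each prime factor p of n contributes n/p
    t, m, p = 0, n, 2
    while p * p <= m:
        while m % p == 0:
            t += n // p
            m //= p
        p += 1
    return t + (n // m) if m > 1 else t

# The product rule holds:
assert D(6 * 10) == 6 * D(10) + 10 * D(6)   # both sides are 92
# ...but additivity fails: D(6 + 10) = D(16) = 32, while D(6) + D(10) = 12.
assert D(6 + 10) != D(6) + D(10)
```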
I don't know the origin of this notation, but I can make up a possible explanation. Suppose every prime is a function of an abstract variable x, evaluated at x=0: so 2 = two(x) = 2 + x, 3 = three(x) = 3 + x, 5 = five(x) = 5 + x, ...
Then, for example, 60 = sixty(x) = two(x) * two(x) * three(x) * five(x) = two(x)^2 * three(x) * five(x)
A number then becomes a function of the primes in its factorization: you replace each prime by its function, but leave the exponents alone.
Then D is the standard derivative, plus evaluation.
(I'm mixing up the functions and their values when they are evaluated. That's usually not a good idea; if you do it in a Calculus exam the TA will be rightfully angry. But the notation in plain text is horrible, so please forgive the technical details.)
With this idea if you have numbers A, B, C such that A = B * C, then A(x) = B (x) * C(x). But if A = B + C then A(x) != B(x) + C(x). This "explains" why the operator D follows the multiplication rule, but not the sum rule.
For example, ten(x) = two(x) * five(x) and six(x) = two(x) * three(x), so ten(x) * six(x) = sixty(x) = two(x) * two(x) * three(x) * five(x).
But fifteen(x) = three(x) * five(x) and ten(x) = two(x) * five(x), while twenty-five(x) = five(x)^2, and clearly three(x) * five(x) + two(x) * five(x) != five(x)^2. So fifteen(x) + ten(x) != twenty-five(x).
Edit: Don't take this "explanation" very literally, for example this idea doesn't extend to D^2 directly
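Despite that caveat, the multiplicative part of the idea is easy to check mechanically. A sketch (function names are my own): map n to the product of (x + p) over its prime factors with multiplicity, then read off the derivative of that product at x = 0 via the product rule.

```python
def prime_factors(n):
    # prime factors of n with multiplicity, e.g. 60 -> [2, 2, 3, 5]
    fs, p = [], 2
    while p * p <= n:
        while n % p == 0:
            fs.append(p)
            n //= p
        p += 1
    if n > 1:
        fs.append(n)
    return fs

def D_via_polynomial(n):
    # d/dx of prod_i (x + p_i) at x = 0 is sum_i prod_{j != i} p_j
    fs = prime_factors(n)
    total = 0
    for i in range(len(fs)):
        prod = 1
        for j, q in enumerate(fs):
            if j != i:
                prod *= q
        total += prod
    return total
```

For 60 = 2^2 * 3 * 5 this gives 2*3*5 + 2*3*5 + 2*2*5 + 2*2*3 = 92, which agrees with the arithmetic derivative computed directly from the Leibniz rule.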
> But an important detail is that the operator that is discussed in this post is not linear, i.e. D(x+y) != D(x) + D(y), so it doesn't have all the expected ("intuitive") properties that the usual derivative has.
I think some care is appropriate here. The property you quote is additivity, not linearity; for linearity, one would like a ground field (or a ground ring if one is discussing linear maps between modules, I suppose). Since one is not considering any vector space / module structure on the natural numbers, this may hint why linearity (or even its weaker sibling additivity) is not thought to be necessary here.
(EDIT: With that said, I like very much your description of transforming numbers to functions.)
> That's the intuition used to develop the concept, but it becomes increasingly difficult to apply that intuition to in more exotic locales.
Then maybe it's better to use a different term for things that are different. Maybe it's better to keep the term "derivative" for the rate of change of one thing with respect to another thing, and to let the generalizations that aren't that be called something else.
Perhaps. Would you also say "Maybe it's better to keep the term 'sum' for combining the count of two sets, and to let the generalizations that aren't that (e.g., adding integers, adding vectors, adding polynomials, ...) be called something else"?
I'd say that you're historically wrong - that "sum" was used for adding integers, and then found to extend naturally to adding vectors, polynomials, matrices, real numbers, complex numbers, quaternions, and so on. If you want to extend it to combining the count of two sets (because you're trying to re-found all of mathematics on set theory), then the word still fits.
Just don't try to make the set theory version the "real" version, and then try to deny the use of the word in other places. Those other places were using it first; you don't have the right to hijack the word.
By "count of two sets", I don't mean to invoke set theory in any imposing, modern sense. Just the observation that, historically, we were adding counting numbers (in particular, non-negative ones with such properties as "The sum of x and y is always at least as large as x itself") long before we were adding integers.
Regardless, the point still stands: why would you allow the word "sum" to fit all those uses (disparate, but with a web of family resemblances), but not grant the same to "derivative"?
Because it seems to me that the web of family resemblances for "derivative" should include the rate of change of one thing with respect to another, not just that the product rule is satisfied. That is, it seems to me that the attempts to extend "derivative" are extending it to the point that the web of family resemblances no longer fits.
Unfortunately, all I can say to comfort that is that choosing "rate of change" as your centralizing analogy for "derivative" has been shown through the history of mathematics to be a great start, but a slow finish.
Frequently mathematics benefits a lot from abstracting to algebra because, at this point, it's purely about how to define elements and operations by their apparent behavior instead of by their metaphor or interaction with a larger idea (such as notions of space, continuity, rate, change... all of those require quite a lot of mechanics to get in place, while algebra is very light-weight).
As an example, there have been a lot of attempts to discretize calculus for computers. Usually, the goal here is to create a scheme of discretization which, in the limit, resembles the smooth computations we'd like to perform. This has been a successful program in practice, but it's known to be fraught with weird edge cases. It's easy to create discretized situations which violate intuition.
Much of the reason these failed is because they attempted to generalize from the notion of "rate of change".
There's also the idea of discrete calculus (not "discretized") which is what you get when you apply the algebraic laws alone to some very standard notions of discrete spaces (oriented simplicial complexes, in particular—the simplest discrete object which "has enough topology" to meaningfully have the algebraic laws of integration applied to it).
What you get in this case is a rich theory of discrete calculus which rederives half of manifold learning and graph theory as a special case. All of the laws follow precisely—and they must, as the entire construction was built to prevent such violations.
Finally, you can examine discrete calculus to find a notion of "rate of change" if you like. But it's alien from that which you might be familiar with from continuous domains. It would have been very difficult to arrive at this point trying to generalize that intuition.
But it's practically inevitable (not to say it's easy, just inevitable) if you say that you want to take the algebraic structure of derivatives and integration and apply it to oriented simplicial complexes.
It seems to me a sum should be at least as large as each of its summands (or rather, it once did). The world paid no heed, and life trudged on. I don't see a need to pick one particular archetypal trait or another and say the word "derivative" (or any other bit of mathematical jargon) mustn't ever be extended by analogy to a situation no longer directly manifesting that trait. A web of family resemblances doesn't depend on any one fiber running through all of it.
It's not as though the similarity of terminology is chosen with intent to confuse; the intent is to illuminate. The name for the generalization is chosen to match its more familiar relative because it is often _useful_ to think in terms of the analogy, imperfect though it be. [It seems humans are such that we would never find our way to powerful abstractions without such overloading; the combinatorial explosion of names would be too great to comprehend.]
I think the concept of sum predates integers by a substantial margin. It's a bit of a theft to extend it from its true domain of applicability, the (positive) natural numbers.
Sometimes it's called "derivation". In Grassmann algebra you call it the exterior derivative. Ultimately these are all motivated by trying to find "the" derivative in different contexts and eventually in greater and greater generality... So despite the mild opportunity for confusion they really are, in a sense, each named "derivative".
What you need to do is take "region" to be an interval of the real line like [a, b]. Now, the boundary of "region" is the set of the end points, {a, b}. Thus, we've now transformed this equation to
I([a,b], d(quantity)) = I({a, b}, quantity)
If we see I as being a sum, it's clear that the right-hand-side must be the same as a regular sum, though we need to account for the idea that we're summing from a and to b. Ultimately, we do this by negating where we're coming from[1]
I([a,b], d(q)) = -q(a) + q(b)
Then we just have to recognize I([a,b], _) as the definite integral from a to b
DefiniteIntegral(a, b, d(q)) = q(b) - q(a)
and this is a statement that the definite integral of a function on an interval [a,b] is the difference of the antiderivative of that function evaluated at the endpoints---the standard fundamental theorem of calculus!
But notice that in this transformation we've destroyed some information. No longer is it so apparent that "boundary" and "derivative" have some kind of kinship. We also cannot easily generalize this notion to higher dimensional spaces (unless we already know the Stokes Law trick).
[1] This is a bit arbitrary. If we chose it the other way the equation would still hold, it'd just not be the standard convention and thus less recognizable. The reason we choose this is because calculus, it turns out, depends upon a notion of orientation. We have to know whether we're going with or against "the flow".
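A discrete sanity check of that equation (the grid and the test function are my own choices, not anything canonical): approximate I([a,b], d(q)) by summing forward differences of q over a partition of [a,b]. The sum telescopes, so it equals the oriented boundary sum -q(a) + q(b) exactly, up to floating-point rounding.

```python
# Discrete sketch of I([a,b], d(q)) = -q(a) + q(b): on a 1-D grid the
# "integral" is a sum over cells and "d" is a forward difference.
def q(x):
    return x ** 3  # any antiderivative-style quantity works here

a, b, steps = 1.0, 2.0, 1000
h = (b - a) / steps

# sum of d(q) over the cells of [a, b] -- this telescopes
lhs = sum(q(a + (i + 1) * h) - q(a + i * h) for i in range(steps))
rhs = -q(a) + q(b)  # oriented boundary sum: negate where we come from

assert abs(lhs - rhs) < 1e-9
```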
I like your explanation very much, but I think that calling the integral 'the relationship' (emphasis on the definite article) satisfying Stokes is an overstatement. There are many such; for example, I(_, _) = 0.
Great answer, but on a pedantic note, it's not quite true that the derivative of a data type is its zipper. The derivative is the one-hole context, but the zipper is more like a one-substructure context instead. There's a bit more elaboration here: [1].
True, I should have been more precise. The zipper just is a technique for using one-hole contexts to make a path as you drill through a recursive type.
The standard way of generalizing the derivative is to require it to be linear and to satisfy a version of the product rule. See, for instance, derivations:
In a sense this is d/d{primes}. D(8) is how fast 8 changes with respect to 2, while D(60) is how fast 60 changes if 2, 3, and 5 simultaneously increase (at the same rate).
So... we can use it to determine the age of the universe, if we know at which point in time 6 * 9 = 42 held, by extrapolating back to at what point 6 * 9 = 0? ;-)
[ed: looks like my multiplication signs got eaten by hn]
No, it can be an operator on any kind of (usually) ring or more general algebraic structure. What you refer to is the "usual" derivative of functions of one variable, just one of many derivatives one can define.
I implemented Brzozowski's regex derivatives to build a regex implementation back-end. That back-end is used whenever exotic constructs (negation, intersection) appear in the abstract syntax of the regex; in their absence, the implementation falls back on the NFA-graph-based back end.
Yes, this seems to build on the idea of regex derivatives. If regex derivatives can be used to transform a regular expression into a recognizer for strings, why not transform a more general grammar into a recognizer of strings.
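For the curious, the regular-expression case is small enough to sketch end-to-end. Here's a toy matcher in the Brzozowski style (the AST encoding and all names are mine, not from any particular library): the derivative of r with respect to a character c is a regex matching exactly the strings s such that r matches c + s.

```python
# Regex AST: Empty (matches nothing), Eps (empty string only),
# chr_(c), cat(r, s), alt(r, s), star(r) -- encoded as tagged tuples.
EMPTY, EPS = ("empty",), ("eps",)

def chr_(c): return ("chr", c)
def cat(r, s): return ("cat", r, s)
def alt(r, s): return ("alt", r, s)
def star(r): return ("star", r)

def nullable(r):
    # does r match the empty string?
    tag = r[0]
    if tag == "eps": return True
    if tag in ("empty", "chr"): return False
    if tag == "cat": return nullable(r[1]) and nullable(r[2])
    if tag == "alt": return nullable(r[1]) or nullable(r[2])
    return True  # star always matches ""

def deriv(r, c):
    # Brzozowski derivative of r with respect to character c
    tag = r[0]
    if tag in ("empty", "eps"): return EMPTY
    if tag == "chr": return EPS if r[1] == c else EMPTY
    if tag == "alt": return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "cat":
        d = cat(deriv(r[1], c), r[2])
        # if the first part can match "", c may also start the second part
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d
    return cat(deriv(r[1], c), r)  # star: peel one iteration

def matches(r, s):
    # match by repeatedly differentiating, then testing nullability
    for c in s:
        r = deriv(r, c)
    return nullable(r)
```

So matches(cat(chr_('a'), star(chr_('b'))), "abb") is True while "ba" is rejected. Practical implementations additionally simplify terms (e.g. alt(EMPTY, r) -> r) so the derivatives don't blow up; that's omitted here for brevity.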
Yes, it is completely arbitrary. However, this arbitrary definition follows certain rules and properties and can therefore be used for certain types of mathematical reasoning.
This is why, when we talk about rings and fields and such we say "multiplication-like" or "addition-like" operators. The operators defined for the algebraic structure may not be exactly like "standard" operators, but they still follow rules and you can still do cool things with them.
If you like, think of this as first turning a number into the unique monic polynomial with negated prime roots whose value at 0 is that number, then taking the derivative of that polynomial at 0.
...Not that there's any particular reason you should like this.
Think of the classical Derivative as a function which inputs a function and outputs a function. When you say "the derivative of 5 is 0", you really mean "the derivative of f(x)=5 is f'(x)=0".
This article seems to define a function called D which inputs a number and outputs a number. In this setting "the derivative of 5 is 1" really does mean D(5) = 1, because 5 is prime; there is no underlying function of x at all.
LEMMA: D(pq•r) = D(p•qr):
D(pq•r) = pq D(r) + r D(pq)
= pq D(r) + rp D(q) + qr D(p)
= p (q D(r) + r D(q)) + qr D(p)
= p D(qr) + qr D(p)
= D(p•qr)
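Numerically, the lemma says that extending D from the primes via the Leibniz rule cannot depend on how you group the factors. A quick sketch (helper names are mine; D is the arithmetic derivative):

```python
def D(n):
    # arithmetic derivative: each prime factor p of n contributes n/p
    t, m, p = 0, n, 2
    while p * p <= m:
        while m % p == 0:
            t += n // p
            m //= p
        p += 1
    return t + (n // m) if m > 1 else t

def leibniz(x, y):
    # one step of the extension rule D(xy) = x*D(y) + y*D(x)
    return x * D(y) + y * D(x)

p, q, r = 2, 3, 5
# grouping as (pq)*r or p*(qr) gives the same value, namely D(pqr)
assert leibniz(p * q, r) == leibniz(p, q * r) == D(p * q * r)
```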
There are a number of generalizations of the notion of "derivative"; in particular, you can take the derivative of a regular expression:
https://github.com/scythe/dreg/blob/master/re-deriv.pdf
...which I've been slowly turning into a regex matcher.