As far as research articles go, I've noticed that I never finish them on the first read, and I don't think anyone should expect to. Too often I see students start reading an article, get lost after a page, and just give up.
It's perfectly normal to stop understanding after a while and put the article down. What you need to realize is that it takes time for your brain to process new abstract concepts. Give your mind some time to digest them. Put the article down and think about it in your off time over the following days. Then pick the article up again and see if the part you already read looks more approachable, and whether you're able to get further this time around.
Most of the time, you will. Repeat.
There is no shame in taking multiple reads, spread over months, to get through a research paper: keep in mind that it took the author(s) years to reach the level of expertise on display in that article. Cut yourself some slack. But don't give up after the first try.
This is great advice for reading proofs. In particular, the bit about never diving into a proof until you know why you expect it to be true is a big one. I find it very helpful, after I've understood a proof, to try to put together the informal version of why it works, and try to explain that informal version to someone else.
Has anyone done a Rap Genius for research papers? I think informal context attached to proofs would be a huge help in understanding them. And even as a proof writer, I'd put details in it that I wouldn't put in the formal proof.
Several years ago I started a website called WTFormula.com
Unfortunately I was the only contributor, and since I'm not a mathematician my contributions were few. I wasn't able to convince other people to help write content, and eventually I dropped it for lack of interest (the domain is also available again). I still believe it would be a very useful and nice tool to have, though.
Currently I'm working on http://libflow.com (in my spare time) and I have a lot of ideas to improve it (this "rap genius for research papers" was one of the possible improvements I had in mind). Unfortunately the project is on hold because I currently don't have the money to move it to a dedicated server or enough free time to make major changes to the site.
So I'm going to make a controversial claim and then try to defend it:
There is a cultural problem in mathematical communication and pedagogy, and it is keeping the discipline from achieving wider accessibility and appeal.
I come from a background where I ended up studying mathematical logic without having any mathematical skills to speak of at all. You'd be surprised how far you can go in the former without any training in the latter. I'm now teaching myself maths.
The reason this is significant is that the notion of a proof as defined in mathematical logic is different from the proofs you see presented in mathematical papers and textbooks. In mathematical logic, what we care about is the precise mapping between syntax and semantics. The ideal (which some systems meet entirely) is a perfect functional mapping between the two. And because we care about this, a proof is defined carefully so as to ensure that no invalid WFF (well-formed formula) is derivable from your axioms and rules of inference. Obviously maths doesn't want to prove falsehoods either, but the difference is in the level of explicitness. Every line in a proof of logic must have its justification written beside it, whether by appeal to a rule of inference or to a previously derived theorem. You don't skip steps just because they seem 'obvious' to you. And my wonderful logic teacher at the time would take off marks whenever we did.
Reading proofs in mathematical logic - even long and complex ones - is more often than not straightforward. Even the meta-logical results about the systems I studied were presented with greater clarity than your average proof in normal mathematics.
Why is this? Well, it seems to me that mathematicians unapologetically skip steps in their proofs all the time, and it's extremely rare for them to state their justifications. To me this is as bad as uncommented, undocumented code with no unit tests. Any coder who inherited that sort of code would be rightfully annoyed!
And this leads to situations that require this blog post. I have a friend who is a maths PhD (and who has been a great help in my mathematics studies) who defends this aspect of mathematics. His main argument is one of simple convenience. The problem, he says, is not the presentation but my lack of mathematical experience. If every proof had to be written for readers with little mathematical experience, no maths would ever get done.
As I get more experienced with maths, I can't disagree that reading proofs gets easier. But I feel his reply misses the point. These gappy proofs are endemic in pedagogical literature as well - not just research papers - and a pedagogical text shouldn't be about the convenience of the author. Different textbooks assume different levels of background knowledge, and I find it a complete mess. Besides this, I generally have little sympathy for this reply because of my background in logic, which trained me to be explicit, whereas all these mathematicians grow up thinking it's normal to just assume everyone knows what you're on about.
And beyond the teaching context, I've heard things aren't exactly great in research either: you get the absurd situation, described to me by my friend, where mathematicians gather at conferences and present papers, but generally have little understanding of what anyone else is talking about. Mathematicians seem fine with this, apparently. Why? I believe this is a cultural reality that could be changed - not something essential to the nature of the discipline.
Don't get me wrong - there are some wonderful examples of mathematicians working really hard to change this. I found that the Ohio State University calculus course on Coursera really tried to be innovative in its pedagogical approach (their textbook was a bit meh, but their online exercise application was brilliant). But in general, I just don't see a lot of this.
On one hand I disagree, because mathematics is abstraction built on top of abstraction, many times over. It can be incredibly dense with information. An example comes to mind: Stokes' theorem. It's an amazingly elegant piece of maths, and it can be stated very neatly with six symbols - the integral sign, omega, d omega, Omega, del Omega, and an equals sign. If you were to unravel what each symbol meant using the closest terms related to it, then did the same for those terms and continued a few levels further, you'd have a small book.
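For concreteness, the six-symbol statement being described here is the generalized Stokes' theorem, where omega is a differential form on the manifold Omega:

```latex
\int_{\partial\Omega} \omega \;=\; \int_{\Omega} \mathrm{d}\omega
```

Every one of those symbols (boundary operator, exterior derivative, integration of a form over a manifold) sits on top of its own tower of definitions, which is exactly the density being described.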
This kind of thing is paralleled in programming, where you rarely want to just push bits around. I imagine you as a person who specializes in assembly, complaining about the lack of rigor in more abstract languages.
However, I agree with you insofar as the above is sometimes just an excuse for writing terribly. Firstly, there are practical considerations about the sheer length of the text. Then you have problems of exposition: a professor writing a textbook can be far removed from the time when they first learned the material. My own teachers always told me never to write "obviously" unless I also explained why it is obvious (it can be very useful to tell the student what needs to be obvious, and why).
The best solution I can think of is writing with the online world in mind. A mathematical paper does not need to be short and concise. I envision two versions - the long online version and the shorter published version. There are, after all, practical and physical constraints on the latter, while the former shouldn't be a big deal. That way, whoever is interested in more than the overview can access the longer online version and learn from it.
The biggest problem I see in this scheme is that of laziness - not everyone will want to write long texts.
Bear in mind though that logic books and their proofs generally rely on a 'stack' as well. You prove a theorem and then you're able to use that theorem in your proofs going forward - with the theorem itself being used as the justification.
From a pedagogical standpoint - the act of writing that justification is enormously valuable since the book will also have an index where you can quickly look up a theorem if you've forgotten the basics of it.
Perhaps the difference is more in the fact that maths textbooks will often rely on theorems proved in other mathematical contexts (where it doesn't make sense to provide the proof in the context currently being explored). The problem being that there can be no reliable way to index that information. Logic textbooks, however, present systems that are much more self-contained (if not completely).
When I took Logic for CS, we were taught the natural deduction proof calculus, which mirrors the kind of proofs we do in maths. We had an exercise during the course to write out which ND inference rules were used in every step of some natural-language maths proof, and it really shows you how one sentence can pack in 10 or more inference rules once you try to unpack it!
He also put a similar question on the test: first prove a theorem in natural language, then identify the ND inference rules used in your own proof (that wasn't easy... even though you wrote the proof yourself).
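To illustrate the kind of unpacking being described (a minimal Lean 4 sketch, not from the course in question): even a claim as "obvious" as commuting a conjunction decomposes into several explicit inference steps.

```lean
-- "Clearly A ∧ B implies B ∧ A", unpacked into explicit ND steps:
theorem and_swap (A B : Prop) : A ∧ B → B ∧ A := by
  intro h                  -- →-introduction: assume A ∧ B
  have hA : A := h.left    -- ∧-elimination (left)
  have hB : B := h.right   -- ∧-elimination (right)
  exact ⟨hB, hA⟩           -- ∧-introduction
```

One English sentence, four inference rules - and this is about as simple as theorems get.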
I think the best response to this is Terence Tao's post, "There’s more to mathematics than rigour and proofs". I'll just quote verbatim:
"One can roughly divide mathematical education into three stages:

The “pre-rigorous” stage, in which mathematics is taught in an informal, intuitive manner, based on examples, fuzzy notions, and hand-waving. (For instance, calculus is usually first introduced in terms of slopes, areas, rates of change, and so forth.) The emphasis is more on computation than on theory. This stage generally lasts until the early undergraduate years.

The “rigorous” stage, in which one is now taught that in order to do maths “properly”, one needs to work and think in a much more precise and formal manner (e.g. re-doing calculus by using epsilons and deltas all over the place). The emphasis is now primarily on theory; and one is expected to be able to comfortably manipulate abstract mathematical objects without focusing too much on what such objects actually “mean”. This stage usually occupies the later undergraduate and early graduate years.

The “post-rigorous” stage, in which one has grown comfortable with all the rigorous foundations of one’s chosen field, and is now ready to revisit and refine one’s pre-rigorous intuition on the subject, but this time with the intuition solidly buttressed by rigorous theory. (For instance, in this stage one would be able to quickly and accurately perform computations in vector calculus by using analogies with scalar calculus, or informal and semi-rigorous use of infinitesimals, big-O notation, and so forth, and be able to convert all such calculations into a rigorous argument whenever required.) The emphasis is now on applications, intuition, and the “big picture”. This stage usually occupies the late graduate years and beyond.

The transition from the first stage to the second is well known to be rather traumatic, with the dreaded “proof-type questions” being the bane of many a maths undergraduate. (See also “There’s more to maths than grades and exams and methods”.) But the transition from the second to the third is equally important, and should not be forgotten."
I don't disagree that there is a lot of conceptual stuff going on behind any proof... but I contend that the same is going on behind logic proofs as well. You wouldn't gain a good understanding of a particular logical system (what motivates it, what makes it interesting, why we work toward proving the particular theorems that we do) just by working through a set of proofs. Yet logic strives for explicitness in presentation, while mathematics doesn't seem to.
It may just be that you're right, and if I ever get that high up in understanding I'll similarly look down upon the plebs and discount their frustration. But consider - if I get that high, I must have used SOME LADDER OR OTHER... and the plebs, I think, could be forgiven for thinking that the masters are tossing away their ladders as soon as they are done with them.
Fully agree, and thanks for writing it up so clearly. I'll be bookmarking your comment.
The same problem happens in computer science research. A lot of formal CS proofs, axioms, and rule sets are very badly and informally written. All variables are single characters (preferably from a non-Latin character set, because hey, why write `contextSet` when you can write Ξ?). Variables are overloaded, often even within the same scope.
In fact, I've seen proofs about variable binding and scoping where the proofs themselves contained unbound, undefined variables named "a". That's like writing your healthy-lifestyle blog from a McDonald's.
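By way of analogy (a hypothetical sketch, not taken from any actual paper), the same naming habits translated into code look like this:

```python
# Hypothetical illustration of the naming habits described above.
# Both functions compute the same thing; only one says what it does.

def f(Ξ, a):
    # 'Ξ' is a threshold and 'a' is the data -- you'd never guess.
    return [x for x in a if x > Ξ]

def values_above_threshold(threshold, values):
    # Same logic, with names that carry their meaning.
    return [v for v in values if v > threshold]

print(f(2, [1, 2, 3, 4]))                       # [3, 4]
print(values_above_threshold(2, [1, 2, 3, 4]))  # [3, 4]
```

Both pass review in the sense of being "correct"; only one survives being handed to a stranger, which is roughly the complaint about single-character, overloaded notation in proofs.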
I'm not sure where I fall on this. Sometimes I want to machine-validate all of my proofs, and sometimes I feel it doesn't matter.
A couple of arguments from the other direction:
- Reading and verifying each step of a proof is not necessarily enough to understand the proof as a whole. So writing a proof with no steps skipped doesn't alleviate the need to write a sketchy 'human' proof to go along with it. Lamport's hierarchical proofs[1] are one way to do both.
- It takes a long time to justify every step, especially if you want to give references.
- Is it really necessary for us to be 100% certain that a proof is correct? As a proof ages and more people read it without finding errors, it becomes less and less likely that it's incorrect. For most practical purposes, that's good enough.
I am OP's "friend who is a maths PhD". I'd just like to expand on Grovulent's version of my position a bit to make it a little more convincing. To my mind, all of the problems are explained by the following three things:
1) Echoing vlasev [1], I think a very good analogy for the relation between proofs in mathematical logic and proofs in "normal" mathematics is the difference between programming in assembly (portability issues notwithstanding, though mathematical logic is probably a friendly RISC arch) and programming in a dense high-level language like Haskell. The productivity gain from using the high-level language, which permits skipping all those pesky unimportant details, is astronomical, and no amount of complaining from people who find Haskell hard to grok is going to stop us using it. (Drink the Kool-Aid, my friend, drink it up!) I think it is uncontroversial that this is true for research mathematics, but of course we don't want to just throw students in the deep end, which is why the education system attempts to build this knowledge gradually, which leads me to...
2) More practically, I think most people's problems learning mathematics come from (if you'll forgive a mathematical metaphor!) the person being out of phase with the material. This is subtler than what Grovulent described. Excellent material exists for all stages of one's mathematical development, of this I am certain; the hard part is ensuring that one is using material that is hard enough to be challenging, but not so hard as to be crushingly incomprehensible. This goes for people doing research in mathematics as well as people being introduced to calculus for the first time, and it is very difficult to know when you are the student in question. We generally rely on the education system and expert advice to match students to the appropriate material, and clearly the system doesn't always work very well given how many people are out of phase with the material in front of them. In general, most mathematicians love talking about mathematics, and they love teaching it too! They love helping people understand and appreciate this marvel of human ingenuity. Just browse Math.SE[2] and Mathoverflow[3] for a while if you don't believe me.
3) Finally, an additional element is that many people expect to be able to learn mathematics quite easily without much or any deep reflection, because that's how they've always learned things in the past. But you simply can't learn mathematics like that, and you'll be disappointed and frustrated if you try to. (I certainly concede that mathematicians could do a better job of warning students about this.) Learning mathematics involves being in an almost constant state of uncertainty, making many, many mistakes, often in front of many other people, and being comfortable with all that. This is a state of affairs that many people find disconcerting, and there's nothing mathematicians can do about that beyond suggesting that the student chill out and try not to let their ego get in the way of their education.
Just on point 2 - you say it's a problem of matching the right material for the student.
Let's test that intuition. If you're an educator, and you needed to choose a proof to present to your students on say - the Binomial Theorem. Which of these two would you choose?
Now - I would submit that irrespective of who your students are, you should choose the second. It's explicit; it doesn't skip steps. No one is left behind. If your students are clever enough to follow the first proof - which skips heaps of steps - then giving them the second proof isn't going to disadvantage them, because they can just skip it! It's a book! Sure, you've got more of a problem in a lecture scenario, where more advanced students are going to get bored. But in the context of a textbook I really see no reason not to be explicit.
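For readers without the two linked proofs in front of them, the theorem under discussion is the standard binomial theorem:

```latex
(x + y)^n \;=\; \sum_{k=0}^{n} \binom{n}{k} \, x^{k} \, y^{\,n-k}
```

Both proofs being compared establish this same identity by induction on n; they differ only in how many of the intermediate algebraic steps are written out.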
I think your examples illustrate each of my first two points perfectly. I don't think I've ever seen such a long and laborious proof for the binomial theorem as that in the second example. The two proofs are the same, of course (sometimes there exist several different ways to prove the same thing), the second is just way more explicit (also the first is missing equations 4.1 and 4.2, which are integrated into the second proof, making the first proof artificially shorter and artificially less comprehensible).
As to my second point, it seems pretty obvious to me that the goal of the second proof is to teach mathematical induction to people who are still struggling with basic arithmetic, not principally to prove the binomial theorem, since it includes weird things like putting "inductive hypothesis" and "inductive conclusion" in quotes, and the author even writes out in longhand that multiplication distributes over addition, i.e. that a(x+y) = ax + ay. Until one has internalised these basic arithmetic operations, of course you're going to struggle with induction and the binomial theorem, and you're going to prefer examples where everything is spelled out in detail. And that's fine! But if you have internalised those details, making them explicit really does get in the way. Really. The core of the proof of the binomial theorem (which is to say, the reason why it's true) is the first proof you gave; the second proof completely obscures that behind a haze of mechanical arithmetic manipulation and pattern-matching.
So, if you find the second proof clearer and/or more appropriate (which is fine!), all that says is that you are "in phase" with that material, it does not necessarily make one better than the other or more appropriate than the other in any other context or for anyone else. If students are having trouble with basic arithmetic manipulation, just don't force them to learn harder stuff until that's sorted out. They have to be ready for it. It is a good and happy thing that you have found material with which you are "in phase"; as I said last time, this is often very difficult. Please stop blaming your difficulties on the material when the problem is that you're using inappropriate material. Thanks! :)
Thanks for pointing out the contrasts (and the original article). I have battled this for a long time - there are classroom scenarios where there is only so much you can ask, and when you get to self-study situations, it gets even harder.
Can you tell me which book the second proof is from?
I've been experimenting with Coq[1] recently, which I really hope could be the best of both worlds: everything is formalized and complete (or else the compiler yells at you), but it's easy to build new abstractions and reference prior proofs, in a way that lets the reader jump to a definition with a keystroke.
It's harder to use than I want it to be though. Still, I would recommend checking it out if you like explicit formal proofs. It's really good at that.
Also, I know how hard it can be to get started. It really does help to have someone to ask dumb questions of, so feel free to email me (email in profile), or ask on SO.
I remember my Calculus I and II textbook always seemed to skip an important step:
1. Some Math
2. A therefore obviously B
3. Q.E.D.
It was always the "obviously" part where I got lost.
The proofs in my more math heavy CS classes (especially Automata) were always a lot clearer even though I thought the material was conceptually harder than calculus.
Calculus is incredibly complicated. Keep in mind it wasn't properly formalised until the late 19th century. It depends a lot on the book and the student, but essentially, to read Calculus I+2\varepsilon you need a solid foundation in Calculus I+\varepsilon. So, usually most of the skipping is because the text assumes the prior is known and digested (or it is just a bad book.)
Most undergraduate students in the United States who major in mathematics take six courses in functions and limits (e.g. elementary calculus (two courses), multivariable calculus, elementary analysis, complex analysis, and real analysis). Moreover, while the proofs in analysis are far more satisfying than the proofs encountered in elementary calculus, many students remain unsatisfied until their analysis courses in graduate school!
As far as I see it, every proof I have ever encountered has been of two types:
A => B, or A <=> B.
The first of which has the following structure:
A: Given conditions (a,b,c,...) are true;
B: We can say (..., x, y, z) are true.
The second having a structure:
A: Given conditions (a,b,c,...) are true;
B: We can say (..., x, y, z) are true.
AND
B: Given conditions (..., x, y, z) are true;
A: We can say (a,b,c,...) are true.
The easiest (and quickest) way for me to parse proofs given the above representation is by ensuring I know what the conditions (a, b, c, ...) mean and how removing one of them changes the proof, therefore ensuring I know exactly how each condition contributes to the final result.
(This is probably what he was talking about in his second point)
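Both shapes can be made literal in a proof assistant. A minimal Lean 4 sketch of the A <=> B case, proved as its two component implications:

```lean
-- An A ⇔ B proof is literally two A ⇒ B proofs, one per direction.
theorem swap_iff (A B : Prop) : A ∧ B ↔ B ∧ A := by
  constructor
  · intro h; exact ⟨h.right, h.left⟩  -- forward:  A ∧ B ⇒ B ∧ A
  · intro h; exact ⟨h.right, h.left⟩  -- backward: B ∧ A ⇒ A ∧ B
```

The condition-by-condition reading strategy described above corresponds to seeing what breaks when you delete one of the hypotheses: here, dropping either conjunct makes both directions unprovable.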