## Blog: Quod Erat Faciendum — Our journey to centaur theorem provers. Part-1 of a 3-part series

QEF = “**That which was to be done”**

That = Machines constructing proofs

Done? NOT really

*[Euclid used QEF at the end of his constructions which were not proofs. I might have appropriated the Latin term just for effect — sorry Euclid!]*

As of 2019, machines can prove some theorems but not all. Why? How do we know the machine is even correct? Does the machine understand the notion of elegance (*whatever that means to humans*) when it comes to proofs or are we stuck with some long-winded convoluted logic that requires another machine to verify, adding an additional layer of complexity? Does it boil down to a combinatorial search problem? Will it require Artificial General Intelligence (whatever that means) to create a fully automated theorem prover that can resolve the million dollar unsolved problem of **P vs NP**? How should the knowledge (axioms, models, rules) be represented and reasoned on? Should we be using predicate logic as the underlying formal deduction system or some alternatives like univalent foundations? What exactly are axioms and how are they chosen?

*“For an answer which cannot be expressed the question too cannot be expressed. The riddle does not exist. If a question can be put at all, then it can also be answered.” – *Ludwig Wittgenstein**,** **Tractatus Logico-Philosophicus**

Sure! But only if **“ I don’t know”** is also a legitimate answer. Many of the above questions are loaded and do not have clear answers. But that’s what makes this space exciting. So lets go scratch the surface a bit and see how far we can get. In part-1, we will only look at the first few human and machine proofs.

Axiom reminded me of a scene from this graphic novel titled Logicomix: An epic search for truth. The student in the picture is Bertrand Russell.

The story goes that the idea of axioms drove Russell to eventually come up with the Russell’s or Barber’s paradox described as follows:

English: *“The barber is the one who shaves all those, and those only, who do not shave themselves. The question is, does the barber shave himself?”*

Set Theoretic: *“Consider the set S of all sets which are not members of themselves: is S a member of itself?”*

This is a paradox because of the universal quantifier around ‘y’ which will include every element in the domain including the barber ‘x’. So when x is assigned y, the sentence becomes: Shaves (x, x) ←> **~**Shaves (x, x) — a contradiction!

That deceptively simple question from Russell shook the foundations and was particularly detrimental to Gottlob Frege’s Foundation of Mathematics. The paradox exposed inconsistencies in the axioms that Frege was using to formalize his logic.

Good news: The logical self-contradiction was later resolved by an axiomatic system called ZFC (Zermelo-Fraenkel). And who is “C”? No one. It stood for the axiom of **C**hoice.

### The First Human Proof

Before we try to tackle the loaded questions above, I’d like to take us back a few thousand years to an arguably simpler time when there were no machines (*at least not the kind that we are used to now*) and humans were just beginning to create proofs.

Note that the definition of a proof has also evolved from something as general as using reason to convince someone that something is true to more descriptive ones such as the popular Bell’s 1976 definition in “A study of pupil’s proof-explanations in mathematical situations”:

“*A proof is a directed tree of statements, connected by implications, whose end point is the conclusion and whose starting points are either in the data or are generally agreed facts or principles*”.

So we might need to keep an open mind on what was considered a proof back in the day.

Thales of Miletus [**634–548** **BC**] is credited for having come up with the first ever proof for the statement “**A circle is bisected by any diameter**”. I have to admit though that this may only be in the Neighborhood of Truth since Thales’ original proofs did not survive and it is Proclus of Athens [**412–485 AD**] who attributed the theorems and proofs to Thales almost a thousand years later in his “A commentary on the first book of Euclid’s elements”. Proclus claimed that geometry began in Egypt and Thales brought geometry from Egypt to Greece. Add to that, the fact that Babylonian and Egyptian mathematics dates back to 2000 BC. So it is hard be certain that the mathematicians did not come up with some notion of proofs much earlier than recorded even though there is no explicit mention of mathematical proof. But I found this snippet from the 1977 book titled “The Phenomenon of Science: A Cybernetic Approach to Human Evolution”, particularly interesting because of how it highlights that the concept of proof was almost a way of thinking or life among the Greeks.

*“It is not surprising that the Greeks, with their democratic social order, created the doctrine of mathematical proof. Disputes and proofs played an important part in the life of the citizens of the Greek city-state (polis). **The concept of proof already existed; it was a socially significant reality.** All that remained was to transfer it to the field of mathematics, which was done as soon as the Greeks became acquainted with the achievements of the ancient Eastern civilizations.”*

Pythagoras of Samos [**570–495 BC**], a Greek philosopher, is credited for discovering a fundamental relationship in Euclidean geometry among the three sides of a right triangle. We learn this in school as the Pythagoras theorem. Remember the three positive integers (a, b, c) such that a² + b² = c²?

Since Pythagoras was born around **570 BC**, it is tempting to assume that all work related to the Pythagorean theorem happened after **570 BC**. Except, a Babylonian tablet dating back to 1800 BC (over a thousand years earlier), with inscriptions in cuneiform which had hints of Pythagorean triples was discovered. It is now popularly known as **Plimpton 322**.

The Plimpton 322 was originally thought to be some kind of Sumerian ledger-book or spreadsheet. Babylonians were known to keep records, so this kind of made sense. But in the 1940s some historians (Otto Neugebauer and Sachs) mapped or rather interpreted the 15 columns as being Pythagorean triples. I tried to make sense of the numbers and piece some pictures together from this reference and the above link. Here’s what the mapping looks like. I only verified the first row.

Pretty cool I thought. But it just appears to be an observation, albeit a very good one. To arrive at these calculations, maybe they required rich mathematical constructions and tools. Maybe. But can we call that a proof? Did it generalize?

**Takeaway-1: Observe and record patterns**

Don’t worry, we’ll get back to Thales soon but lets take a quick look at a timeline of a few other attempts from China and India that are interesting.

During the Zhou dynasty in China, one of the oldest Chinese mathematical texts called the **Zhoubi Suanjing** (or Chou Pei Suan Ching from **1046–256 BC**) was written. The book was dedicated to astronomical observations and calculations. Hidden in this text is a geometric proof of the Pythagorean theorem.

Much like Proctus compiled, synthesized and brought to light the works of Thales thousand years later, Zhao Shuang and Liu Hui who lived in the third century AD compiled and interpreted the Zhoubi Suanjing and Jiuzhang Suanshu or The Nine Chapters on the Mathematical Art respectively. They saw it as being very important to share with a wider audience and resurrected the effort thousand years later with their own commentaries and illustrations. One of the many topics was the Pythagorean theorem, or what was then known as the **Gou Gu** or **Gou Gu Xian **theorem or rule. Gou and Gu are the orthogonal sides of the right triangle while Xian is the hypotenuse.

As captured in this translation of Zhoubi Suanjing by Christopher Cullen, the theorem is described as a dialog between the Duke of Zhou and Gao of Shang, a sage of the Shang dynasty. Here’s the dialogue that begins with a question from the duke:

*“Long ago, the Duke of Zhou asked Shang Gao , “I have heard, sir, that you excel in numbers. May I ask how Bao Xi laid out the successive degrees of the circumference of heaven in ancient times? Heaven cannot be scaled like a staircase, and earth cannot be measured out with a footrule. Where do the numbers come from?*”

The response goes:

As the author notes, this is more like a reference to a 3–4–5 triangle rule rather than a generalizable proof. But there appears to be a certain geometrical style of reasoning in the ancient Chinese mathematical texts as shown below.

Liu Hui in his commentary on the Nine Chapters provides an explanation of the algorithm to the first three problems in the Jiuzhang Suanshu where the Pythagorean theorem is introduced and described. The explanation may be viewed as a statement about the conservation of area.

**Takeaway-2: Geometric thinking, rearrangement, notion of stability and conservation.**

Between 800–600 BC, out of India emerged this vedic text called the Baudhayana Shulba Sutras, which contained the Baudhayana theorem or the Pythagorean theorem. Here’s a snippet from the original manuscript in Sanskrit.

The relevant statement in the form of an empirical proof:

which translates to:

#### “A rope stretched along the length of the diagonal produces an area which the vertical and horizontal sides make together”

There is no mention of what the horizontal and vertical slides refer to. It could be a square or a rectangle. But it does call out a demonstrable relationship among the sides.

Shown on the left is an exhibit of the Baudhayana theorem in the Science Exploration Hall in Kolkata, India.

**Takeaway-3: Empirical Proof**

There have been several other proofs like Euclid (Greece), Bhaskara (India) to mention a few that are more popular but came later. So I’ll stop here and return to the first theorem attributed to Thales of Miletus by Proclus.

Thales Theorem: “A diameter of the circle is a straight line drawn through the centre and terminated in both directions by the circumference of the circle; and such a straight line also bisects the circle.”

Proclus goes on to say in his commentary that “** The cause of this bisection is the undeviating course of the straight line through the center**” (

*Proclus, A Commentary, 157.9*). And the method suggested was to imagine a line drawn through the centre of the circle, and then to imagine one part of the circle placed over the other, when it would be shown that each part is identical. Proclus continues: “

**”.**

*the one part, then, fits the other, so that they are equal. Consequently the diameter bisects the circle***Takeaway-4: Imagine and demonstrate by action**

So does that count as a solid proof? Is it any more or less formal than the other proofs we saw earlier? Should we consider Thales to have come up with the first human proof?

Maybe the ideas were independently developed around the world at the same time. Maybe people traveled and spread the ideas. We could either go down a rabbit hole of investigating how the word proof was defined and used back in the day, how rigorous it was (formalism, consistency, soundness, decidability), when the first one was created and by whom. Or we can ask different questions.

What can we take away from **how** our ancient mathematicians approached theorems thousand years ago? How do our modern mathematicians **approach** theorems today? How much of this can be encoded into a machine?

Lets register a few takeaways from the above “proofs” in history:

- Observe and Record Patterns (come
*up with hypotheses/explanations*) - Geometric thinking, rearrangement, notion of stability and conservation
- Empirical Proof
- Imagine and demonstrate by action

Before we jump to the first ever machine proof, do you think a machine can do the above? That is, imagine? think geometrically? observe patterns? know what to try and then try those things? demonstrate by action? understand concepts such as stability and conservation? Wait, but we’ve made so much progress in the last two thousand years. Does a machine even need to do what our ancient mathematicians did?

Lets keep those questions in mind. We’ll keep coming back to them.

### The First Machine Proof

Fast forward a few millennia. The first major theorems that were successfully proven with the help of machine were the Four Color Theorem in **1976** and the Robbin’s conjecture in **1996** by William McCune’s EQP (**EQ**uational **P**rover) theorem prover. The proof for Robbins’ conjecture was found in just 8 days using an RS/6000 processor with 30 Megabytes of memory.

To give you a sense of how readable or understandable some of these proofs are, here’s what the fully automated proof for the Robbin’s conjecture looked like in two different notations:

Yep. Not very human friendly but then if you’re in the field, this may appear a lot simpler compared to some gnarly proofs that run into pages. I’d like to focus more though on some of the earlier efforts in automated theorem proving in the 1950s which proved simpler theorems in Principia Mathematica (PM).

The story begins in **1953** with JOHNNIAC (**JOHN** von Neumann **N**umerical **I**ntegrator and **A**utomatic **C**omputer), a machine based on the Von Neumann computer architecture that was built by RAND Corporation and had the following specs:

- CPU Technology: Vacuum Tubes
- Memory Technology: Magnetic Core
- Memory Size: 4096 40-bit words in 1955 (1953–54 used Selectron tubes with 256 words)
- Memory Cycle Time: 15 microseconds (0.067 MHz)
- Weight: 5000 pounds

Just compare this 5000 pound computing machine running an instruction set containing 83 instructions (initially) to the 3 pound human brain from a few thousand years back and it still couldn’t do a lot of what our brains could/can!

In **1929**, Mojzesz Presburger, a Polish mathematician, logician and philosopher had shown that a restricted fragment of arithmetic, involving only the addition of integers was decidable, meaning, there exists a decision procedure for it. In **1954**, Martin Davis, a mathematician/logician implemented Presburger’s decision procedure on the original JOHNNIAC, which at the time could only handle simple theorems because of how resource intensive it was. But the great triumph was proving the theorem: **Sum of two even numbers is even **— possibly the first machine proof!

JOHNNIAC was also the computer on which Logic Theory Machine or just Logic Theorist (LT), one of the first theorem provers by Newell and Simon ran in 1956. The task set for LT was to prove that certain expressions can be derived by application of specified rules of inference from a set of 5 axioms. Two rules of inference were taken from Principia Mathematica namely Rule of Substitution and Rule of Detachment and one additional rule called Rule of Replacement was added. As defined in this original article by Newell and Simon,

*“A proof is a sequence of expressions, the first of which are accepted as axioms or as theorems, and each of the remainder of which is obtained from one or two of the preceding by the operations of substitution, detachment or replacement.*”

Sounds simple. But the magic (when it did work) was in the systematic algorithms to construct the proofs. Essentially these were heuristic symbol manipulation techniques that under the covers were a nested set of searches through memory. LT went on to prove 38 of the first 52 problems in chapter 2 of Principia Mathematica. And amazingly the proof for *2.85 in LT turned out to be more elegant than the one given in PM. As mentioned in Perspectives on the History of Mathematical Logic, Herb Simon wrote to Bertrand Russell late 1956 describing the work of the LT. In the letter, Simon stated

*“The machine’s proof is both straightforward and unobvious, we were struck by its virtuosity in this instance”*

to which Russell replied:

*“I am delighted to know that Principia Mathematica can now be done by machinery. I wish Whitehead and I had known of this possibility before we wasted 10 years doing it by hand. I am quite willing to believe that everything in deductive logic can be done by machinery*.”

LT was named as a co-author on a paper submitted to Journal of Symbolic Logic but was rejected! Yes, rejected!

Also captured in the Perspectives book above, are both the proofs from PM and LT. Apparently the one in PM had an error or it wasn’t obvious how step (2) could be obtained as indicated. But we’ll save the actual proofs for later as it requires learning the rules and axioms a bit.

Around the same time, Hao Wang another mathematician, logician, philosopher (*its beginning to sound like these 3 disciplines always went together*) got interested in automated theorem proving and developed decision procedures that were more efficient than LT. As described in the book titled “Mechanizing Proof: Computing, Risk, and Trust” by Donald MacKenzie, Wang’s successes revealed a fundamental inadequacy in the LT approach. Wang had this to say about LT:

*“There is no need to kill a chicken with a butcher’s knife. Yet the net impression is that Newell-Shaw-Simon failed even to kill the chicken with their butcher’s knife. They do not wish to use standard algorithms such as the method of truth tables… [But] to argue the superiority of “heuristic” over algorithmic methods by choosing a particularly inefficient algorithm [the “British Museum” algorithm, indiscriminately constructing sequences of deductions] seems hardly just.*”

To which Herb Simon responded: *“I remember explaining to him [Wang] on several occasions that he was in a different business than we are…We were trying to understand how human beings suggests that the reason they are able to do interesting things is heuristic search. Occasionally things that human beings do that way, we can bludgeon through on a computer, but why go that way?”*

Marvin Minsky in his famous 1960 publication of Steps Towards Artificial Intelligence attempts to reconcile the two views by saying that

*“a program to solve real mathematical problems will have to combine the mathematical sophistication of Wang with the heuristic sophistication of Newell, Shaw and Simon*”

but ends with the essence of the problem — **search**, saying this**:**

*“All these efforts are directed toward the reduction of search effort. In that sense, they are all heuristic programs. Since practically no one still uses “heuristic” in a sense opposed to “algorithmic, serious workers might do well to avoid pointless argument on this score. The real problem is to find methods that significantly delay the apparently inevitable exponential growth of search trees.”*

I would like to pause here and compare at a high level how humans and machines approached theorem proving initially (note that these all don’t hold today). Here’s a table with a possibly oversimplified or poorly abstracted set of dimensions. But I feel its still a good starting point. We can keep unpacking the features and differences as we dig into automated theorem provers.

So do machines prove theorems 2019? In Part-2, we will explore the landscape of automated theorem provers, the different approaches and rationale behind them and spend some time on interactive theorem provers. Our journey will look something like this:

*Source: Artificial Intelligence on Medium*