The blog post suggests "What are the prime factors of 15683615?" as an example, and Bard does indeed appear to write and execute Python code (although I don't know how I can be sure it's actually executing and not hallucinating an execution) and returns the right answer.
But what about, "What is the sum of the digits of 15683615?"
To find the sum of the digits of a number, you add together all the individual digits. In the case of the number 15683615, the sum of its digits would be:
1 + 5 + 6 + 8 + 3 + 6 + 1 + 5 = 35
Therefore, the sum of the digits of 15683615 is 35.
====
Good job ChatGPT. This sort of “near miss” for Bard has been my experience the entire time I’ve compared Bard and ChatGPT.
In the most hilarious example, I asked both to reword an email to someone named "Helena" to explain that I'd been busy and hadn't returned her email. I included her message for context, including her asking if I was on vacation. ChatGPT did a good job of adjusting my wording to be better understood by a non-native English speaker and used her message appropriately. Bard spun a story about how I was vacationing with my non-existent children in Helena, Montana.
It's frankly amazing how bad it is. On benchmarks and human comparisons, ChatGPT obliterates the small but growing open LLM community--but Bard is right in there with the open models. It's insane to me that Google, with the people who basically invented transformers and LLMs and basically unlimited TPUs, can't do better than a few hobbyists with like three A100s between them.
No one has trained an LLM of open-source-level quality with just 3 GPUs. Fine-tuning, sure, but pretraining even the smaller models takes more than that.
To be fair, ChatGPT's answer is wrong when given the Bard example.
To find the prime factors of 15,683,615, we can use a process called prime factorization. We divide the number by prime numbers until it can no longer be divided evenly. Here's the step-by-step process:
We start by dividing 15,683,615 by the smallest prime number, which is 2. However, 15,683,615 is an odd number and not divisible by 2.
Next, we try the next prime number, 3. If we divide 15,683,615 by 3, we get 5,227,871. Again, this is an odd number.
We continue with the next prime number, 5. Dividing 15,683,615 by 5 gives us 3,136,723. Still an odd number.
Moving on to the next prime number, 7. Dividing 15,683,615 by 7 yields 2,240,516. Again, this is an odd number.
We try the next prime number, 11. Dividing 15,683,615 by 11 gives us 1,425,783. Still an odd number.
Next, we try the prime number 13. Dividing 15,683,615 by 13 yields 1,205,659. This is an odd number.
Continuing with the next prime number, 17. Dividing 15,683,615 by 17 gives us 920,755. Still an odd number.
We try the prime number 19. Dividing 15,683,615 by 19 yields 823,035. This is an odd number.
The next prime number is 23. Dividing 15,683,615 by 23 gives us 680,563. Still an odd number.
Finally, we try the prime number 29. Dividing 15,683,615 by 29 yields 539,475, which is divisible by 5.
Therefore, the prime factors of 15,683,615 are 3, 5, 5, 7, 11, 13, 17, 19, 23, and 29.
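For reference, the quoted walkthrough is wrong at nearly every step (15,683,615 ends in 5, so it is divisible by 5, and none of the listed quotients are even correct). A minimal trial-division sketch in Python (the function name is mine) gives the actual factorization:

```python
def prime_factors(n):
    """Factor n by trial division: divide out each candidate
    as many times as possible, stopping once d * d exceeds n."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors

print(prime_factors(15683615))  # → [5, 151, 20773]
```

So the correct answer is 5 × 151 × 20773, which neither chatbot's narrated "reasoning" comes anywhere near.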
A couple of days ago I gave both of them a prompt similar to:
"Create a map as text using only the letters B, W, S. The map should be a 20-line by 20-column text grid. Each line should contain only B, W, or S letters, where W is a wall, B is blank/background space, and S represents "stairs" out of the room...."
The query was a bit longer, with more specs.
Neither ChatGPT nor Bard could give me a good answer. They used other letters, they made lines of 19 or 21 characters, they made maps of only 5 or 6 lines. They basically made a mess.
That's my current test for reasoning, analysis and intelligence for these things.
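If you want to check that kind of output mechanically rather than by eye, a quick validator along these lines (the function name and structure are my own sketch, not from either model) catches all three failure modes above: wrong letters, wrong line lengths, wrong line count:

```python
def validate_map(text, size=20, alphabet=frozenset("BWS")):
    """Check that text is a size x size grid using only the allowed
    letters. Returns a list of problems; an empty list means valid."""
    problems = []
    lines = text.splitlines()
    if len(lines) != size:
        problems.append(f"expected {size} lines, got {len(lines)}")
    for i, line in enumerate(lines, 1):
        if len(line) != size:
            problems.append(f"line {i}: expected {size} chars, got {len(line)}")
        bad = set(line) - alphabet
        if bad:
            problems.append(f"line {i}: illegal characters {sorted(bad)}")
    return problems

good = "\n".join(["W" * 20] * 20)
print(validate_map(good))       # → []
print(validate_map("WWX\nWW"))  # flags the line count, both lengths, and the X
```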
They are both pretty bad. I ask about templates for CI/CD and they imagine parameters that don't exist, and no amount of wrestling can suppress this. People like to cherry-pick examples where they work great and then proclaim it's the best thing since sliced bread, but it simply isn't.
If that were the case, shouldn't Google be equally capable of including as many examples in their own dataset?
Like, regardless of how it works under the hood, I as an end user just want a useful result. Even if ChatGPT is "cheating" to accomplish those results, it looks better for the end user.
The continued trickle of disappointing updates to Bard seems to indicate why Google hadn't productized their AI research before OpenAI did.
Google isn't even able to keep Google Authenticator working¹. Since the last update its icon has been "improved", but it doesn't reliably refresh tokens anymore. Since we have a policy of at most 3 wrong tokens in a row, a few people on my team almost got locked out.
Feel free to downvote, as I'm too tired to post links to recent reviews in the Play Store :)
Sorry for the snark in this post, but I have been less than impressed by Google's engineering capability for more than 10 years now. My tolerance for quirks like the one I just posted is, kind of, low.
¹ An authenticator app is a very low bar to mess up
This is like when their speech-to-text service always got "how much wood could a woodchuck chuck if a woodchuck could chuck wood" right even if you replaced some of the words with similar words, but then failed at much easier sentences.
I downvoted you because you didn't give the correct answer in this case. (It's easy, but it's better to state the correct answer and save the reader the thought.)
But what about, "What is the sum of the digits of 15683615?"
Bard says:
The sum of the digits of 15683615 is 28.
Here's how I got the answer:
1 + 5 + 6 + 8 + 3 + 6 + 1 + 5 = 28
====
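For what it's worth, the correct sum is 35, as ChatGPT's answer earlier in the thread shows; a one-liner confirms it:

```python
# Sum the decimal digits of a number by iterating over its string form.
print(sum(int(d) for d in str(15683615)))  # → 35
```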
I don't think this is ready for prime time.