
The blog post suggests, "What are the prime factors of 15683615?" as an example, and Bard does indeed appear to write and execute Python code (though I don't know how to be sure it's actually executing rather than hallucinating an execution) and returns the right answer.

But what about, "What is the sum of the digits of 15683615?"

Bard says:

The sum of the digits of 15683615 is 28.

Here's how I got the answer:

1 + 5 + 6 + 8 + 3 + 6 + 1 + 5 = 28

====

I don't think this is ready for prime time.



Meanwhile care of ChatGPT:

To find the sum of the digits of a number, you add together all the individual digits. In the case of the number 15683615, the sum of its digits would be:

1 + 5 + 6 + 8 + 3 + 6 + 1 + 5 = 35

Therefore, the sum of the digits of 15683615 is 35.

====
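For anyone who wants to check it without an LLM, the digit sum is a one-liner in plain Python:

```python
# Digit sum of 15683615, computed directly
n = 15683615
digit_sum = sum(int(d) for d in str(n))
print(digit_sum)  # → 35
```

So ChatGPT's 35 is right and Bard's 28 is not.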

Good job ChatGPT. This sort of “near miss” for Bard has been my experience the entire time I’ve compared Bard and ChatGPT.

In the most hilarious example, I asked both to reword an email to someone named “Helena” to explain how I’ve been busy and did not return her email. I included her prompt for context, including her asking if I was on vacation. ChatGPT did a good job of adjusting my wording to be better understood by a non native English speaker and used her prompt appropriately. Bard spun a story about how I was vacationing with my non-existent children in Helena Montana.

Bard just isn’t there yet.


It's frankly amazing how bad it is. On benchmarks and human comparisons, ChatGPT obliterates the small but growing open LLM community, but Bard is right in there with the open models. It's insane to me that Google, with the people who basically invented transformers and LLMs, and effectively unlimited TPUs, can't do better than a few hobbyists with like three A100s between them.


No one has trained an LLM of open-source-level quality with just 3 GPUs. Fine-tuning, sure, but pretraining even the smaller models takes more than that.


Their big new model will be Gemini ("twins"), apparently a joint DeepMind / Google Brain project that was underway even before the two merged.


What it really shows is how far ahead of Google Meta has gotten.


To be fair, ChatGPT's answer is wrong when given the Bard example.

To find the prime factors of 15,683,615, we can use a process called prime factorization. We divide the number by prime numbers until it can no longer be divided evenly. Here's the step-by-step process:

We start by dividing 15,683,615 by the smallest prime number, which is 2. However, 15,683,615 is an odd number and not divisible by 2.

Next, we try the next prime number, 3. If we divide 15,683,615 by 3, we get 5,227,871. Again, this is an odd number.

We continue with the next prime number, 5. Dividing 15,683,615 by 5 gives us 3,136,723. Still an odd number.

Moving on to the next prime number, 7. Dividing 15,683,615 by 7 yields 2,240,516. Again, this is an odd number.

We try the next prime number, 11. Dividing 15,683,615 by 11 gives us 1,425,783. Still an odd number.

Next, we try the prime number 13. Dividing 15,683,615 by 13 yields 1,205,659. This is an odd number.

Continuing with the next prime number, 17. Dividing 15,683,615 by 17 gives us 920,755. Still an odd number.

We try the prime number 19. Dividing 15,683,615 by 19 yields 823,035. This is an odd number.

The next prime number is 23. Dividing 15,683,615 by 23 gives us 680,563. Still an odd number.

Finally, we try the prime number 29. Dividing 15,683,615 by 29 yields 539,475, which is divisible by 5.

Therefore, the prime factors of 15,683,615 are 3, 5, 5, 7, 11, 13, 17, 19, 23, and 29.
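For comparison, a short trial-division sketch in Python gets the actual answer (note that 15,683,615 is divisible by 5, which the walkthrough above computes and then ignores):

```python
# Prime factorization by trial division
def prime_factors(n):
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:  # pull out each prime factor as many times as it divides n
            factors.append(d)
            n //= d
        d += 1
    if n > 1:  # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factors(15683615))  # → [5, 151, 20773]
```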


Note that ChatGPT is using CoT (chain of thought: a reasoning process before the answer) while Bard isn't.


It's even worse than that. The simple prompt "Compute 1 + 5 + 6 + 8 + 3 + 6 + 1 + 5" returns 36.


One wonders what they mean in the headline by "logic" because it doesn't exactly strike me as the kind I learned in school.


A couple of days ago I gave both of them a prompt similar to:

"Create a map as text using only the letters B, W, S. The map should be 20 lines by 20 columns of text. Each line should contain only B, W, or S characters, where W is a wall, B is blank/background space, and S represents "stairs" leading out of the room...."

The query was a bit longer, with more specs.

Neither ChatGPT nor Bard could give me a good answer. They used other letters, they made lines of 19 or 21 characters, they made 5- or 6-line maps. They basically made a mess.

That's my current test for reasoning, analysis and intelligence for these things.
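For reference, what's being asked for takes only a few lines of code (a sketch with a layout of my own choosing: solid walls, blank interior, one stair tile on the right wall):

```python
# Generate a 20x20 text map using only B, W, S
SIZE = 20
rows = []
for r in range(SIZE):
    if r == 0 or r == SIZE - 1:
        rows.append("W" * SIZE)              # top and bottom walls
    else:
        rows.append("W" + "B" * (SIZE - 2) + "W")  # side walls, blank interior
rows[10] = rows[10][:SIZE - 1] + "S"          # stairs on the right wall
print("\n".join(rows))
```

Every line is exactly 20 characters, there are exactly 20 lines, and only B, W, S appear, which is more than either model managed.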


They are both pretty bad. I ask about templates for CI/CD and they imagine parameters that don’t exist, and no amount of wrestling it around can suppress this. People like to cherry-pick examples where they work great and then proclaim it’s the best thing since sliced bread, but it’s just simply not.


My favorite so far is Copilot writing code with variables like “testO1”.

Took me an hour to figure out why it didn’t work.

O != 0


(facepalm emoji)


That's composition: there it's trying to stack its universal function approximators, and the errors propagate out of control.

You're also right about hallucinating the execution.

I was testing PaLM 2 today and I noticed it's quite a lot more resistant to sycophancy attacks...


Interestingly though, improved Bard isn't...


I haven’t noticed that, it’s still hallucinating badly.


I mean this is just an issue with convincing it to use code when it should, which seems surmountable.


oof, Bard...my three drafts for this example:

draft 1: The sum of the digits of 15683615 is 27.

draft 2: The sum of the digits of 15683615 is 26.

draft 3: The sum of the digits of 15683615 is 30.


ChatGPT may only be getting this right because so many examples are in its dataset.

Do we know if it has actually learned how to do the operation?


If that were the case, shouldn't Google be equally capable of including just as many examples in their own dataset?

Like, regardless of how it works under the hood, I as an end user just want a useful result. Even if ChatGPT is "cheating" to accomplish those results, it looks better for the end user.

The continued trickle of disappointing updates to Bard seems to indicate why Google hadn't productized their AI research before OpenAI did.


Google isn't even able to keep Google Authenticator working¹. Since the last update its icon has been "improved", but it doesn't reliably refresh tokens anymore. Since we have a policy of at most 3 wrong tokens in a row, a few people on my team almost got locked out.

Feel free to downvote, as I'm too tired to post links to recent reviews in the Play Store :)

Sorry for the snark in this post, but I have been less than impressed by Google's engineering capability for more than 10 years now. My tolerance for quirks like the one I just described is, kind of, low.

¹ An authenticator app is a very low bar to mess up


I’ve had constant issues with 2FA through YouTube not functioning too. The quality rot is really remarkable.


This is like when their speech-to-text service always got "how much wood could a woodchuck chuck if a woodchuck could chuck wood" right even if you replaced some of the words with similar words, but then failed at much easier sentences.


I downvoted you because you didn't give the correct answer for this case. (Though it's easy, it's better to state the correct answer and save the reader the thought.)



