People usually make the determination by reading at least part of the text and then finding multiple smoking guns / LLM-isms.
The comment you responded to did not have those.
Fwiw, the article we're commenting on was likely not LLM written. The sentence structure is too convoluted, no LLM would've generated it like that - unless very carefully prompted ... But at that point it's no longer pure AI slop (imo).
Isn't that precisely the reason why we introduced the term hallucination? Because LLMs have historically always made up bullshit if they cannot answer directly... If they've now nailed getting the model to not respond instead of responding incorrectly, then a lot of previously unusable use cases would become feasible.
So I feel like that's exactly the right metric and the way to track it wrt hallucinations.
The point is that it's not a useful metric on its own. For example, redirecting from /dev/null also achieves a zero hallucination rate.
We want the hallucination rate to decrease while the overall answer rate of queries remains sufficiently high. For more specifics, look into ROC and AUC.
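To make the /dev/null point concrete, here is a minimal sketch that reports the hallucination rate next to the answer rate. The `Outcome` type, the `rates` helper, and the definition of hallucination rate as wrong answers divided by all queries are assumptions for illustration, not how any particular benchmark defines them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    answered: bool  # False means the model abstained
    correct: bool   # only meaningful when answered is True


def rates(outcomes):
    """Return answer rate and hallucination rate over all queries."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("need at least one outcome")
    answered = [o for o in outcomes if o.answered]
    wrong = [o for o in answered if not o.correct]
    return {
        "answer_rate": len(answered) / total,
        # Assumed definition: wrong answers as a share of all queries.
        "hallucination_rate": len(wrong) / total,
    }


# A "model" that never answers: zero hallucinations, zero usefulness.
dev_null = [Outcome(answered=False, correct=False)] * 10

# A model that answers 9 of 10 queries and gets 1 of them wrong.
useful = (
    [Outcome(True, True)] * 8
    + [Outcome(True, False)]
    + [Outcome(False, False)]
)

print(rates(dev_null))
print(rates(useful))
```

Looking at only the second number would rank `dev_null` above `useful`, which is why the two have to be tracked together.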
Also, if I were to guess, the damages from Sci-Hub are higher than those from Anthropic training its models. I don't think I know anyone who didn't buy a book because a summary is available or because they can ask an AI about it.
All AI companies should be forced to re-train their models without the offending materials, and this should also extend to all LLMs distilled from models exposed to copyrighted works. It should cover code under licences such as the GPL as well. Not to mention patents and designs. This whole LLM business is a giant IP laundromat.
I see, I actually like these tells. They let us easily distinguish garbage from someone's thoughts.
And you can also see how brainrotten someone's gotten when they start accidentally sneaking these tells into their normal communication.
As a matter of fact, after a full workday in which I'm essentially forced to read LLM garbage for 9h a day... I sadly notice myself adding the same pointless fluff to how I express myself.
Like I caught a viral contagion that's actively siphoning my humanity away.
And expectedly, when coming back to those opinions with a less infected mindset, I frequently have to reevaluate these thoughts later on
And it's gonna be interesting to see where this narrative shifts over the next 5 yrs.
I keep hearing that property in the USA is in its biggest bubble yet, with the affordable housing shortage being a red herring: real estate managers and boomers are unwilling/unable to reduce their prices despite not getting renters/buyers, because doing so would kick off a death spiral as their interest rates would consequently go up (because of lower security). Along with the AI layoffs etc.
I'm not American and only hear the occasional interview, so I don't have any idea if it's really as pressing as these industry professionals keep saying, but I'm definitely on the edge of my seat watching...
I was on a quarterly demo the other day and the project lead for AI innovation was talking about the things he's preparing for the company.
I will not address the things he pitched (as coming soon), as I'm a developer and (hopefully) not the target audience, but I was quite surprised when they ran a questionnaire asking how many people use AI and how frequently. (The target demographic was middle management, product owners etc)
75% of people answering said they're using it daily and considered it an essential tool they need to work.
Considering it was anonymous I was expecting lower numbers, honestly.
In the recent past, my department received an email from on high with a list of people who were yet to complete the "anonymous" survey.
I always assume my work-survey answers are traceable back to me, whether it's via self-doxxing with my answers, tracing links, or the rootkit-level MDM software that can record my screen but which they pinky-promise to only use for remote assistance, in case I open a ticket with IT.
Talked to someone at a large company who had admin access to survey results (required to do some analytics). The survey was “anonymous” but results were geo-located, and had some information about the team they came from, which in many cases was enough to clearly identify people. There is a difference between “doesn’t have a person’s name on it” anonymous and actually anonymized in a way hardened against figuring out who is who. I don’t think anyone really does the latter.
I've seen questions asking for my org, team size, role, and when I joined, and thought it would have saved me time had they asked for my employee number instead.
Most external survey providers claimed anonymity but in their T&Cs stated in a very roundabout way that they could provide some information to customers for quality purposes or something. Read “we’ll deanonymize some users if the paying customer wants it”. Internal survey tools are subject to internal management pressure.
Even when you use a tool like Microsoft Forms, where MS really can’t be bothered to deanonymize users unless 3 letter agencies get involved, it’s still possible to do timestamp matching between the proxy/VPN logs and the submission time.
Assume real anonymity only if the URL is the same for everyone and you can fill in the survey from any computer on the internet.
But the explanation for why people overhype AI usage is probably simpler. They want to keep their license because it’s a nice perk. They’ll use it to get the gist of a long email thread without bothering to read the details, to get some meeting minutes without validating if that was actually what was said, to generate some crappy modern equivalent of wordart graphics for their presentations, and feel like the time saved generating what is mostly slop was worth it.
When I worked on this (outside of coding) it was a pain to find a use case that really benefited. These were all niche uses that fit an LLM like a glove. The rest was slop; I could see the usage reports and the BS self-reporting surveys. Everyone inflated the numbers and usage to justify keeping their license.
It's perfectly possible. Two tables, one stores answer responses only, the other just marks off who has responded. No link between them and you have anonymous data but can tell who hasn't responded.
Of course if you record created/updated timestamps on both, insert both records in the same order, accidentally record the user code in the response data, take backups in between responses, have identifying questions or just don't have that many people responding, it's not hard to reverse engineer.
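A minimal sketch of the two-table idea, assuming SQLite and made-up table/column names (`participants`, `responses`, `submit`, `pending` are all hypothetical). It tries to dodge the pitfalls above by storing no timestamps, no user code on the answer, and keying answers by a random id so that storage order doesn't follow submission order. Backups between responses, identifying questions and small populations are outside what code like this can fix.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Who has responded; nothing about what they said.
    CREATE TABLE participants (
        user_id   TEXT PRIMARY KEY,
        responded INTEGER NOT NULL DEFAULT 0
    );
    -- What was said; random key, no rowid, no timestamp, no user reference.
    CREATE TABLE responses (
        id     TEXT PRIMARY KEY,
        answer TEXT NOT NULL
    ) WITHOUT ROWID;
""")


def submit(conn, user_id, answer):
    """Mark the participant as done and store the answer with no link between them."""
    with conn:  # one transaction: both writes happen or neither does
        cur = conn.execute(
            "UPDATE participants SET responded = 1 "
            "WHERE user_id = ? AND responded = 0",
            (user_id,),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown participant or already responded")
        conn.execute(
            "INSERT INTO responses (id, answer) VALUES (?, ?)",
            (secrets.token_hex(16), answer),
        )


def pending(conn):
    """Return the people who still need a reminder."""
    rows = conn.execute(
        "SELECT user_id FROM participants WHERE responded = 0 ORDER BY user_id"
    )
    return [r[0] for r in rows]


with conn:
    conn.executemany(
        "INSERT INTO participants (user_id) VALUES (?)",
        [("alice",), ("bob",), ("carol",)],
    )

submit(conn, "alice", "yes")
submit(conn, "carol", "no")
print(pending(conn))
```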
But it's quite possible to do right, I did it quite effectively almost by mistake years ago. Sent a customer survey out with generated codes as identifiers recorded with answers. Before sending reminder emails a script grabbed the codes, marked the customer as responded and wiped the code (so I could just get future responses where code was not null to mark next people off). Although I had timestamps the script meant customers were updated in blocks, there really wasn't any data to link them.
I know because the Boss was not happy he couldn't find out which customer had said what, and I had to point out all the communication (with customers and me) called it an anonymous survey, so why would I have saved them?
So it is possible, just not easy even if you intend it, and it's often not intentional...
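For illustration, the mark-and-wipe script described above might look roughly like this. It is a sketch, not the original script: SQLite, the `customers`/`responses` schema and the `mark_and_wipe` name are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        name      TEXT PRIMARY KEY,
        code      TEXT UNIQUE,
        responded INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE responses (
        code   TEXT,          -- generated code, wiped by the script below
        answer TEXT NOT NULL
    );
""")


def mark_and_wipe(conn):
    """Mark everyone whose code appears on a response, then wipe those codes.

    Run before sending reminders. Returns how many responses were processed.
    """
    with conn:  # single transaction
        codes = [
            r[0]
            for r in conn.execute(
                "SELECT code FROM responses WHERE code IS NOT NULL"
            )
        ]
        conn.executemany(
            "UPDATE customers SET responded = 1 WHERE code = ?",
            [(c,) for c in codes],
        )
        # After this, nothing in responses points back at a customer.
        conn.execute("UPDATE responses SET code = NULL WHERE code IS NOT NULL")
    return len(codes)


with conn:
    conn.executemany(
        "INSERT INTO customers (name, code) VALUES (?, ?)",
        [("A", "k1"), ("B", "k2"), ("C", "k3")],
    )
    conn.executemany(
        "INSERT INTO responses (code, answer) VALUES (?, ?)",
        [("k1", "great"), ("k3", "meh")],
    )

first_run = mark_and_wipe(conn)
second_run = mark_and_wipe(conn)  # nothing new came in
print(first_run, second_run)
```

Until the script runs, the code on each response still identifies the customer, so this only holds if the script runs in batches and nobody peeks in between.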
If the participant has to trust the survey creator, then it is not anonymous. The survey creator can link the data.
If the survey creator has to trust the participant, the survey is anonymous. The participant can lie in the survey, lie about participating, or submit the survey multiple times.
Your example was not anonymous. But you did not break the participant's trust, thank you! (Or maybe you are lying.)
Anonymous example:
Send the same clean link to everyone to take the survey.
If not enough answers have been received, a reminder can be sent to all, with a clause that says: "if you have already done it, you can ignore the reminder."
Never expect anonymous voting/quiz/whatever to be fully anonymous in big corporations. If it's something about touchy topics and/or can affect the employment/performance of a given person, results will be skewed. If a metric becomes the target it ceases to be a good metric and all that.
It all rests on how moral the responsible manager(s) are. Many are not.
It wasn't, and it was visibly updating while people were submitting their answers. I just rounded it as I don't remember the exact number at the time they closed the submission.
Could still be faked ofc, but I don't think they did.
> 75% of people answering said they're using it daily and considered it an essential tool they need to work
> (The target demographic was middle management, product owners etc)
This leaves a fairly wide set of options for what "essential" entails.
Do 75% of middle management and product owners actually need AI for their job? Seems unlikely.
Do 75% of middle management and product owners use AI to slop up emails, meeting "summaries", and reports? That's quite possible. Would they declare it to be an "essential tool"? One imagines they are not too fond of actually doing meaningful work.
It's quite easy to get high percentages like this when the AI is involved in make-work and the costs are low if not zero. The moment inference costs go up, most of this usage will evaporate.