We've seen the US sanction the ICC, and it has the CLOUD Act and the Patriot Act. The US has shown both a willingness and a capability to weaponize your tech against you.
It should profoundly worry the world that three companies — whose heads all have frequent dinners at the White House — control virtually every phone, tablet, and computer in the world. If you expand that to data centres and clouds, email addresses, services and software it's far worse.
It should be considered a matter of national defence for basically all nations to ensure digital sovereignty.
(It doesn't matter who is in the White House; my point is that it's a massive security nightmare to give this much control to one group.)
I’ve done this in pure Python for a long time. Single file prototype that can mostly function from the command line. The process helps me understand all the sub problems and how they relate to each other. Best example is when you realize behaviors X, Y, and Z have so much in common that it makes sense to have a single component that takes a parameter to specify which behavior to perform.
It’s possible that already practicing this is why I feel slightly “meh” compared to others regarding GenAI.
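To make the "single component that takes a parameter" idea concrete, here is a minimal sketch of what that consolidation might look like. The function name `summarize` and the three behaviors ("mean", "total", "peak") are hypothetical stand-ins for X, Y, and Z, not anything from a real project.

```python
from typing import Callable

def summarize(values: list[float], how: str = "mean") -> float:
    """One component; the `how` parameter selects which behavior to perform."""
    # Hypothetical behaviors standing in for "X, Y, and Z".
    reducers: dict[str, Callable[[list[float]], float]] = {
        "mean": lambda v: sum(v) / len(v),
        "total": sum,
        "peak": max,
    }
    if how not in reducers:
        raise ValueError(f"unknown behavior: {how!r}")
    return reducers[how](values)

if __name__ == "__main__":
    # Mostly usable from the command line, as in a single-file prototype.
    import sys
    mode = sys.argv[1] if len(sys.argv) > 1 else "mean"
    print(summarize([1.0, 2.0, 3.0], mode))
```

The point is less the dispatch table itself than that writing the prototype is what exposes the shared shape in the first place.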
Yes, I’ve been working on this and you need a clear semantic layer.
If there are multiple paths or perceived paths to an answer, you’ll get two answers. Plus, LLMs like to create pointless “xyz_index” metrics that are not standard, clear, or useful. Yet I see users just go “that sounds right” and run with it.
There are some days where it acts staggeringly bad, beyond baselines.
But it’s impossible to actually determine if it’s model variance, polluted context (if I scold it, is it now closer in latent space to a bad worker, and performs worse?), system prompt and tool changes, fine tunes and AB tests, variances in top P selection…
There are too many variables and no hard evidence shared by Anthropic.
As a long time DS I sadly feel we filled the field with people who don’t do any actual data science or engineering. A lot of it is glorified BI users who at most pull some averages and run half baked AB tests.
I don’t think the field will go away with AI. Frankly, with LLMs I’ve automated the bottom 80% of queries I used to have to do for other users, and now I just focus on actual hard problems.
That “build a self serve dashboard” or number fetching is now an agentic tool I built.
But the real meat of “my business specializes in X, we need models to do this well” has not yet been replaceable. I think most hard DS work is internal so isn’t in training sets (yet).
A dataframe API allows you to write code in Python, with native syntax highlighting and your LSP can complete it, in one analysis file. Inlined SQL is not as nice, and has weird ergonomics.
UDFs in most dataframe libraries tend to feel better than writing UDFs for a SQL engine as well.
Polars specifically has a lazy mode that enables a query optimizer, so you get predicate pushdown and all the goodies of SQL, with extra control/primitives (sane pivoting, group_by_dynamic, etc.).
I do use ibis on top of duckdb sometimes, but the UDF situation persists and the way they organize their docs is very difficult to use.
Map is one operation pandas does nicely that most other “wrap a fast language” dataframe tools do poorly.
When it feels like you’re writing some external UDF that’s executed in another environment, it does not feel as nice as throwing in a lambda, even if the lambda is not ideal.
Personally I find it extremely rare that I need to do this given Polars expressions are so comprehensive, including when.then.otherwise when all else fails.
That one has a bit more friction than pandas because of the return schema requirement -- pandas lets you get away with this bad practice.
It also does batches when you declare scalar outputs, but you can't control the batch size, which usually isn't an issue, but I've run into situations where it is.