There are so many layers to it, but the short version is yes.
For example: You could ban em dash tokens entirely, but there are places like dialogue where you want them. You can write a sampler that only allows em dashes between quotation marks.
That's a highly contrived example because em dashes are useful in other places too, but samplers in general can be as complex as your performance budget allows (they sit on the hot path for token generation).
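To make the quote-aware sampler idea concrete, here's a minimal sketch. The token IDs and the flat logits list are toy assumptions; a real version would plug into an inference engine's logits-processing hook and operate on tensors.

```python
# Toy constraint: ban the em dash token unless generation is currently
# inside an open quotation. Token IDs below are hypothetical.

EM_DASH_ID = 3   # hypothetical id for the em dash token
QUOTE_ID = 4     # hypothetical id for the double-quote token
NEG_INF = float("-inf")

def mask_em_dash(logits, generated_ids):
    """Return logits with the em-dash entry set to -inf when we are
    outside quotation marks.

    An odd count of quote tokens generated so far means a quotation
    is open, so dialogue punctuation like the em dash is allowed.
    """
    inside_quotes = generated_ids.count(QUOTE_ID) % 2 == 1
    if not inside_quotes:
        logits = list(logits)           # copy so the caller's list is untouched
        logits[EM_DASH_ID] = NEG_INF    # token can never be sampled
    return logits

# Outside quotes: em dash is masked out.
print(mask_em_dash([0.1, 0.2, 0.3, 0.9, 0.0], [1, 2]))
# After an unmatched quote token: em dash stays available.
print(mask_em_dash([0.1, 0.2, 0.3, 0.9, 0.0], [1, QUOTE_ID, 2]))
```

The only per-token cost here is a count over the generated sequence, which you would replace with an incrementally updated flag in practice, since anything on this path runs once per generated token.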
Swapping samplers could be a thing, but you need more than that in the end. Even the idea of the model accepting loosely worded prompts for writing is a bit shaky: I see a lot of gains from breaking the writing task down into very specific, well-defined parts during post-training.
It's ok to let an LLM translate loose prompts into that format for UX, but during training you'll do a lot better than trying to learn from every way someone can ask for a piece of writing.