At the end of the day, whatever the source file looks like, you’re turning it into an analog signal in order to make the speakers move. So how is it at all effective?
You can even send the bits as a digital stream over Bluetooth to speakers. So being able to decode the file seems like a fundamental requirement for it to be useful at all. So why even bother? It seems like it would be trivial to copy it into a different format.
What you describe is called the “analog loophole”. At some point, your media has to be converted into sound waves and images, and you cannot stop someone from recording them. But you can make it inconvenient. At the end of the day, it’s a numbers game. Many people don’t want to spend hours recording, cutting and labelling music, and are okay with paying a bit of money to get the end result, nicely packaged. The people who are willing to spend the time and effort to copy it are likely those who would not have paid anyway, since they tend to be broke students, idealists and the like.