Advertisers buy from data brokers, not necessarily directly from Meta or Discord. Meta and Google act as data brokers themselves, but they also sell to other data brokers. Those data brokers will definitely scrape your posts themselves if they can’t buy them, or the derived data, directly.
Lemmy, like the rest of the Fediverse, is made up of multiple instances that federate and get handed copies of what we post. We don’t really know what’s going on at each and every instance, and there’s no way of knowing.
(don't do this)
If I were a data broker wanting to siphon data from the Fediverse, I’d set up several instances with fake communities and fake users, federate with the different shards of the Fediverse, have the fake users subscribe to as many feeds as possible (easier to do on Lemmy/Kbin than on Mastodon), create accounts on some of the larger instances to get the “Local” feed, and just wait for the data to arrive. It would miss some of the posts, mostly from smaller, less-federated non-Lemmy instances, but I’m guessing close to 99% could be siphoned with relatively little effort, and more cheaply than buying the data from any single instance. Scraping historical data is extra easy, since instances return JSON and leave the parsing to clients, be it in JS or in apps. Deleted messages can either be gathered with the custom instance setup, or retrieved from instances that didn’t honor the delete action (there are still some out there).