Proof that bots are manipulating content

HTTP_404_NotFound@lemmyonline.com · edit-2 1 year ago

Tugg@lemmyverse.org · edit-2 1 year ago

I don’t have much to add other than that I am an experienced admin and was dismayed at how vulnerable Lemmy is. Having an option to have open registrations with no checks is not great. No serious platform would allow that.

I don’t know of a bulletproof way to weed out the bad actors, but a voting system that Lemmy can leverage, with a minimum reputation required in order to stay federated, might work. This would require some changes that I’m not sure the devs can or would make. Without any protection in place, people will get frustrated and abandon Lemmy. I would.

Martineski@lemmy.fmhy.ml · edit-2 1 year ago

When I made a post saying that 90% (now ~95%) of accounts on lemmy are bots the amount of people saying that there’s no proof and/or saying to me that there’s a lot of people joining from reddit right now was astonishing.

Edit: one person told me that no one would make 1.6 million bots when there are only 150k-200k users on the platform, like WTF.

flambonkscious@sh.itjust.works · 1 year ago

Another thing is people are likely pre-creating bot accounts and then sitting on them in case additional protections are created…

The problem is, these accounts look to us just like any new user, lurking around getting a feel for the place - there’s no way to distinguish them until these bots start acting in some fashion

bren42069@thelemmy.club · 1 year ago

that’s a problem with democracy itself as a concept

Ataraxia@lemmy.world · 1 year ago

Lol well it was fun while it lasted! Man there are some really greedy assholes out there.

HTTP_404_NotFound@lemmyonline.com · 1 year ago

Well- I have not seen much evidence that supports this is actively being used… yet.

Just- bringing more attention to how easy it is to do.

Rottcodd@kbin.social · 1 year ago

The place feels different today than it did just a couple of days ago, and it positively reeks of bots.

I’m seeing far fewer original posts and far more links to karma-farmer quality pabulum, all of which pretty much instantly somehow get hundreds of upvotes.

The bots are here. And they’re circlejerking.

HTTP_404_NotFound@lemmyonline.com · edit-2 1 year ago

Yup. And, I would bet money, it will get progressively worse, unless steps are taken to prevent it.

towerful@beehaw.org · 1 year ago

There are some that aren’t just about money.
There are bots that mirror content from Reddit, just linking to them.
I’ve seen posts that are 3 or 4 crossposts (between community/instances) deep.

I want content.
I don’t want bot content

HTTP_404_NotFound@lemmyonline.com · 1 year ago

Give it a week or two, and you will start to see the emergence of tools to assist with combating these issues.

I am working on trying to build a GUI for one project to help combat spam.

There is also lemmy_helper. And- it’s only a matter of time before we gain access to much more powerful tools to help.

yesdogishere@kbin.social · 1 year ago

how about going through the 4chan approach of nobody cares, everybody spams whatever they like? then the corpos can wallow in their own poo?

HTTP_404_NotFound@lemmyonline.com · edit-2 1 year ago

What corrective courses of action shall we seek?

(Tagging large instance owners)

@ruud@lemmy.world (lemmy.world)
@nutomic@lemmy.ml (lemmy.ml)
@TheDude@sh.itjust.works (sh.itjust.works)
@db0@lemmy.dbzer0.com (dbzer0)

I sent messages to these users, notifying them to come to this thread.

https://startrek.website/u/ValueSubtracted (startrek.website)

They were able to get back with me- and provided this comment:

Thank you - we increased our security and attempted to purge our bots three days ago - if further suspicious activity is detected, we want to hear about it.

https://oceanbreeze.earth/u/windocean (oceanbreeze.earth)
https://normalcity.life/u/EuphoricPenguin22 (normalcity.life)

User returned this comment to me:

We just banned and subsequently deleted well over 2500+ of these accounts. We’ve just switched to closed registration as well.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · edit-2 1 year ago

Just wanted to point out that according to your stats, unless I don’t understand them well, only 26 bots come from lemmy.world (which has open sign-ups, and uses the “easy to break” (/s) captcha) and 16 from lemmy.ml (which doesn’t have open sign-ups and relies on manual approvals).

For some perspective, lemmy.world has almost 48k users right now. Speaking of “corrective action” is a bit of a stretch IMO.

HTTP_404_NotFound@lemmyonline.com · edit-2 1 year ago

This post isn’t about lemmy.world, nor am I blaming lemmy.world!

I am trying to drag in the admins of the big instances, to come up with a collective plan to address this issue.

There isn’t a single instance causing these problems. The bots are distributed amongst normal users, in normal instances.

With the exception of an instance or two with nothing but bot traffic.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · edit-2 1 year ago

I’m just saying that context and scale matter. If an anti-spam solution is 99% effective, then chances are that on an instance with 100k users you are still going to have around 1k bots that have bypassed it.
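To make that arithmetic concrete, here is a minimal sketch. It assumes the 100k figure is the number of sign-up attempts the filter actually sees, which is a simplification for illustration; the function name `expected_bypasses` is hypothetical.

```python
def expected_bypasses(signup_attempts: int, filter_effectiveness: float) -> int:
    """Expected number of sign-ups that slip past a filter.

    filter_effectiveness is the fraction blocked, e.g. 0.99 for "99% effective".
    Both the function and its parameters are illustrative assumptions.
    """
    if not 0.0 <= filter_effectiveness <= 1.0:
        raise ValueError("filter_effectiveness must be between 0 and 1")
    # The share that gets through is simply 1 minus the share that is blocked.
    return round(signup_attempts * (1.0 - filter_effectiveness))


if __name__ == "__main__":
    print(expected_bypasses(100_000, 0.99))
```

The point is only that a small failure rate multiplied by a large volume still leaves a sizeable absolute number.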

HTTP_404_NotFound@lemmyonline.com · 1 year ago

You’re right- But, the problem is-

At a fediverse-level, we don’t really have ANY spam prevention currently.

Let’s assume, at an instance level, all admins do their part, enable applicant approvals, enable captchas, email verification, and EVERY TOOL they have at their disposal.

There is NOTHING stopping these bots from just creating new instances, and using those.

Keep focused on the problem- the problem, is platform-wide lack of the ability to prevent bots.

I don’t agree with the beehaw approach of bulk defederation, so a better solution is needed.

fubo@lemmy.world · 1 year ago

Some older federated services, like IRC, had to drop open federation early in their history to prevent abusive instances from cropping up constantly, and instead became multiple different federations with different policies.

That’s one way this service might develop. Not necessarily, but it’s gotta be on the table.

o_o@programming.dev · 1 year ago

There is NOTHING stopping these bots from just creating new instances, and using those.

I read somewhere that mastodon prevents this by requiring a real domain to federate with. This would make it costly for bots to spin up their own instances in bulk. This solution could be expanded to require domains of a certain “status” to allow federation. For example, newly created domains might be blacklisted by default.
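A rough sketch of what a "newly created domains are blocked by default" rule could look like. This is not how Mastodon or Lemmy actually does it; `MIN_DOMAIN_AGE` and `allow_federation` are made-up names, and the registration date is assumed to come from some external lookup (for example WHOIS or RDAP) that is not shown here.

```python
from datetime import date, timedelta

# Hypothetical threshold: domains younger than this are not federated with by default.
MIN_DOMAIN_AGE = timedelta(days=30)


def allow_federation(
    domain_registered_on: date,
    today: date,
    min_age: timedelta = MIN_DOMAIN_AGE,
) -> bool:
    """Return True if the domain is old enough to federate with by default.

    domain_registered_on is assumed to be supplied by the caller; how it is
    obtained (WHOIS, RDAP, a manual allowlist) is outside this sketch.
    """
    return today - domain_registered_on >= min_age


if __name__ == "__main__":
    today = date(2023, 7, 1)
    print(allow_federation(date(2023, 1, 1), today))   # long-standing domain
    print(allow_federation(date(2023, 6, 25), today))  # registered last week
```

An admin could still manually allow a young domain they trust; the age check would only be the default.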

Mutelogic@sh.itjust.works · 1 year ago

It looks like the OP is responsible for the upvote bots (inferred from his edit?). Maybe to prove the original point?

Martineski@lemmy.fmhy.ml · 1 year ago

You may also want to block lemmit.online

HTTP_404_NotFound@lemmyonline.com · 1 year ago

Eh- it’s not really a spam instance.

They are very straightforward with what their instance does- It crossposts reddit to lemmy, in that instance’s communities.

In that case, it’s as simple as don’t subscribe to it. Don’t subscribe, and it won’t pop up on your feed.

Martineski@lemmy.fmhy.ml · edit-2 1 year ago

Comments under this post describe the problems with something like that pretty well.

https://lemmy.fmhy.ml/comment/378514

Martineski@lemmy.fmhy.ml · 1 year ago

Yeah, but the problem is that you don’t have to subscribe yourself. Once someone else from your instance interacts with communities from that instance, it will flood the “new” feed on your instance, making this feed useless.

HTTP_404_NotFound@lemmyonline.com · 1 year ago

My viewpoint-

If the users of my instance want to view reddit data redistributed to lemmy- that is their choice.

A plus side- lemmy allows you to set the defaults to only show subscribed content too.

Martineski@lemmy.fmhy.ml · edit-2 1 year ago

I guess some people may like those posts, but it’s just mindless posting dependent on reddit, and posting on those bot instances will get you buried by the rest of the posts made by bots. I don’t see how using bots for posting stuff would help to build an active community, but if people really need all of the posts regardless of quality from some subreddits then it’s fine.

HTTP_404_NotFound@lemmyonline.com · 1 year ago

I am in agreeance with you, regarding the usefulness of the posts. However- I am looking at it from an administrative perspective.

Going back to my stance- I do not limit the content my users wish to see, UNLESS, it involves illegal, or extremist/hateful content.

It’s not my cup of tea- but, I am also running an instance for people who may share different viewpoints, and I do not wish to limit what they are able to do.

Martineski@lemmy.fmhy.ml · 1 year ago

Fair stance

Martineski@lemmy.fmhy.ml · 1 year ago

Have a nice day/night, I’m going to sleep now.

csm10495@sh.itjust.works · 1 year ago

I hope you mean a user can block it if they don’t want it.

Generally though: I don’t understand this logic. Like I want content, I subscribe over there to pull some content from reddit. Not all bots are bad.

It’s kind of weird how the fediverse kind of seems like a bubble of anti bot, anti big companies and constant self-political squabbles.

Martineski@lemmy.fmhy.ml · edit-2 1 year ago

Yeah, moving some content is fine but posts on this instance are straight up spam IMO. There’s no quality to the content.

csm10495@sh.itjust.works · 1 year ago

For clarity: When you say ‘this’ … which instance are you referring to?

Martineski@lemmy.fmhy.ml · 1 year ago

Lemmit ofc

csm10495@sh.itjust.works · 1 year ago

I don’t understand. That server is mostly just reddit cross posting. What spam are you talking about? Like I’m genuinely confused what your definition of spam is here. To me it’s content that I enjoy.

If you don’t like it: then block the bot account that posts it. I would not at all recommend defederation or anything like that with it.

Martineski@lemmy.fmhy.ml · 1 year ago

Like I said, the content is not quality controlled. It reposts posts made by users on reddit, so the OP won’t respond to you, and there’s so much content pumped out at once everywhere that there’s no point in engaging in those communities because no one will respond to you on topic. Another problem is that once someone interacts with some of the communities on the instance, the posts will flood your “all” feed, worsening its quality significantly.

Fedora@lemmy.haigner.me · edit-2 1 year ago

Hiring kids in africa and india to create accounts for 2 cents an hour.

Heads up that this depends on the operation size. Captchas are a solved problem. Commercial software exists that can solve Captchas automatically. You migrate from pay on demand services to computer vision software when it’s financially beneficial.

Computers are cheaper and better at solving Captchas than humans atm, and it doesn’t look like that’s going to change any time soon. As long as you pay attention to your proxies, it’s rare to see solution attempts fail. Some pay on demand services no longer employ people.

can@sh.itjust.works · 1 year ago

Computers are cheaper and better at solving Captchas than humans atm

This is hilarious

Hizeh@hizeh.com · 1 year ago

Hilarious and true

bren42069@thelemmy.club · 1 year ago

the problem is that activity pub is dumb and bad

HTTP_404_NotFound@lemmyonline.com · 1 year ago

This, isn’t a problem specific to activity pub, lemmy, or any individual platform in general.

Reddit faces this problem every day. Facebook faces this problem. Twitter faces this problem.

They all do.

And, each platform has to determine the best method for that platform to deal with this issue.

o_o@programming.dev · edit-2 1 year ago

Honestly, I’m interested to see how the federation handles this problem. Thank you for all the attention you’re bringing to it.

My fear is that we might overcorrect by becoming too defederation-happy, which is a fear it seems that you share. However I disagree with your assertion that the federation model is more risky than conventional Reddit-like models. Instance owners have just as many tools (more, in fact) as Reddit does to combat bots on their instance. Plus we have the nuke-from-orbit defederation option.

Since it seems like most of these bots are coming from established instances (rather than spoofing their own), I agree with you that the right approach seems to be for instance mods to maintain stricter signups (captcha, email verification, application, or other original methods). My hope is that federation will naturally lead to a “survival of the fittest” where more bot-ridden instances will copy the methods of the less bot-ridden instances.

I think an instance should only consider defederation if it’s already being plagued by bot interference from a particular instance. I don’t think defederation should be a pre-emptive action.

Lvxferre@lemmy.ml · 1 year ago

Honestly, I’m interested to see how the federation handles this problem.

Ditto. Perhaps we’re going to see a new solution for an old problem.

RoundSparrow@lemmy.ml · 1 year ago

There is no built-in “real-time” methods for admins via the UI to identify suspicious activity from their users, I am only able to fetch this data directly from the database. I don’t think it is even exposed through the rest api.

The people doing the development seem to have zero concern that all the major servers are crashing with nginx 500 errors on their front page under routine moderate loads, nothing close to a major website. There is no concern to alert operators of internal federation failures, etc.

I am only able to fetch this data directly from the database.

I too had to resort to this, and published an open source tool - primitive and non-elegant, to try and get something out there for server operators: !lemmy_helper@lemmy.ml

HTTP_404_NotFound@lemmyonline.com · 1 year ago

Thanks, I’ll take a look at that one.

RoundSparrow@lemmy.ml · 1 year ago

If you have SQL statements to share, please do. I’ll toss them into the app.

HTTP_404_NotFound@lemmyonline.com · 1 year ago

I believe you already saw my post yesterday, for auditing comments, voting history, and post history, right?

bikesarethefuture@kbin.social · 1 year ago

If Twitter can’t avoid bots how will the fediverse avoid it, using some captcha maybe?

I_Miss_Daniel@kbin.social · 1 year ago

The only thing I can think of, which would probably be wildly unpopular, is ID checking.

Or perhaps SMS based 2FA on each account, which needs to be reconfirmed monthly?

Perhaps also rate limiting per account.
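For what it’s worth, per-account rate limiting is usually done with something like a token bucket. A minimal sketch, assuming an in-memory limiter; `TokenBucket` and `AccountRateLimiter` are hypothetical names and none of this is taken from Lemmy’s code. The clock is passed in so the behaviour is easy to test.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Classic token bucket: holds up to `capacity` tokens, refilled over time."""
    capacity: float
    refill_per_second: float
    tokens: float
    last_refill: float

    def allow(self, now: float) -> bool:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        # Each action (post, comment, vote) costs one token.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class AccountRateLimiter:
    """Keeps one bucket per account name (illustrative, in-memory only)."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self._capacity = capacity
        self._refill = refill_per_second
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, account: str, now: float) -> bool:
        bucket = self._buckets.get(account)
        if bucket is None:
            # New accounts start with a full bucket.
            bucket = TokenBucket(self._capacity, self._refill, self._capacity, now)
            self._buckets[account] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = AccountRateLimiter(capacity=2, refill_per_second=1.0)
    for t in (0.0, 0.0, 0.0, 1.0):
        print(t, limiter.allow("alice@example.org", now=t))
```

This only slows down a single account; it does nothing against someone spreading activity over thousands of accounts, which is the harder problem in this thread.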

delendum@lemdit.com · 1 year ago

If you always had e-mail verification turned on then you can get rid of some of these junk sign-ups relatively easily. I wrote a guide for it here: https://lemdit.com/post/16430

From what I’ve seen, most of the bot sign-ups that are swelling instance User numbers wouldn’t have passed e-mail verification. I think it was done mostly to prove a point, rather than an attempt to actually use those accounts.

Instances that didn’t have e-mail verification turned on are in a much harder spot.

HTTP_404_NotFound@lemmyonline.com · 1 year ago

I have a kubernetes cronjob, which automatically cleans those up every few days.

Along- with one that cleans up the activity table.

AmbientChaos@sh.itjust.works · 1 year ago

Hey look, it’s me in the picture! What a waste of my 15 minutes of fame

𝒍𝒆𝒎𝒂𝒏𝒏@lemmy.one · 1 year ago

This is troubling.

At least we have the data though, hopefully these findings are useful for updating the Fediseer/Overseer so we can more easily detect bots

HTTP_404_NotFound@lemmyonline.com · 1 year ago

I really wish we would have a good data scientist, or ML individual jump in this thread.

I can easily dig through data, I can easily dig through code- but, someone who could perform intelligent anomaly detection would be a god-send right now.

monobot@lemmy.ml · 1 year ago

There are data scientists around and we are monitoring where this goes.

The biggest problem I currently see is how to effectively share data but preserve privacy. Can this be solved without sharing emails and IP addresses, or would that be necessary? Maybe securely hashing emails and IP addresses is enough, but that would hide some important data.

Should that be shared only with trusted users?
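As a sketch of the hashing idea: a keyed hash (HMAC) lets admins who hold the same secret compare values without exchanging the raw emails or IPs. The function name `pseudonymize` and the idea of a secret shared between trusted admins are assumptions for illustration, not an existing Lemmy feature.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of an email or IP address.

    A plain unkeyed hash of an IPv4 address is easy to brute-force, so this
    sketch uses HMAC with a secret that only trusted parties hold (assumption).
    """
    # Normalize so "User@Example.org " and "user@example.org" match.
    normalized = value.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"example-shared-secret"  # placeholder, not a real key
    print(pseudonymize("User@Example.org", key))
    print(pseudonymize("203.0.113.7", key))
```

The downside is the one already mentioned: once hashed, you can only test for exact matches, so signals like "same email provider" or "same subnet" are lost unless those parts are hashed separately.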

Can we create a dataset where humans would identify bots and then share it with a larger community (like kaggle), to help us with ideas?

There are options, and they will be built; it just cannot happen in a few days. People are working non-stop to fix (currently) more important issues.

Be patient, collect the data and let’s work on a solution.

And let’s be nice to each other, we all have similar goals here.

𝘋𝘪𝘳𝘬@lemmy.ml · 1 year ago

We need browser fingerprinting for this.

Cinner@kbin.social · 1 year ago

No.

Fingerprinting is against the goals of Lemmy and privacy. Lemmy should be for the good of people.

If anything there should be SOME centralization that allows other (known, somehow verified) instances to vote to allow/disallow spammy instances. In some way that couldn’t be abused. This may lead to a fork down the road (think BTC vs BCH) due to community disagreements but I don’t really see any other way this doesn’t become an absolute spamfest. As it stands now one server admin could spamfest their own server with their own spam, and once it starts federating EVERYONE gets flooded. This also easily creates a DoS of the system.

Asking instance admins to require CAPTCHA or whatever to defeat spam doesn’t work when the instance admins are the ones creating spam servers to spam the federation.

HTTP_404_NotFound@lemmyonline.com · 1 year ago

If anything there should be SOME centralization that allows other (known, somehow verified) instances to vote to allow/disallow spammy instances

We are working on this currently. Stay tuned.

Lemmy and privacy.

I would be careful using both of those words in the same sentence. The ONLY private things on this entire platform are your email address and your IP. If you post, comment, or vote on a public instance- that data is sent to every other subscribing instance.

That being said- unless you volunteer information to lemmy, it doesn’t know who you are.

That also being said- I am against letting Google handle data collection for lemmy.

db0@lemmy.dbzer0.com · edit-2 1 year ago

I noticed a lot of instances which were flooded with bots due to the open registration. I have most of them *defederated for this reason.

HTTP_404_NotFound@lemmyonline.com · edit-2 1 year ago

We need a better solution for this, rather than mass-bulk defederation.

In my opinion- that is going to greatly slow down the spread and influence of this platform. Also IMO- I think these bots are purposely TRYING to get instances to defederate from each other.

Meta is pushing its “fediverse” thing. Reddit, is trying to squash the fediverse. Honestly, it makes perfect sense that we have bots trying to upvote the idea of getting instances to defederate each other.

Once- everything is defederated- lots of communities will start to fall apart.

towerful@beehaw.org · 1 year ago

Is this finally an application for a Blockchain?
Some sort of decentralised registry of instance reputation?

HTTP_404_NotFound@lemmyonline.com · 1 year ago

Well- we have a centralized registry of instance reputation being worked on and developed right now.

towerful@beehaw.org · 1 year ago

Which is awesome.
I actually have no idea where Blockchain tech could exist.
A reputation could be an excellent example. But if it can be manipulated or gamed, it kinda makes it pointless.
At which point a centralised registry makes sense.
As long as the central registrar can be trusted.
But I don’t think Blockchain solves that point of trust.

So, once again, turns out Blockchain tech is pretty useless.

HTTP_404_NotFound@lemmyonline.com · edit-2 1 year ago

The blockchain would just add the ability to verify that somebody said what it says they said.

Ie- if I say, hey, towerful is a great person. A blockchain could be leveraged to ensure that that was said by me.

It does have a use- but, there is a big price to pay for using it, in terms of complexity, performance, and storage size.

In this case, I would call it unnecessary overhead, unless we determine there is foul play occurring at the point of centralization.

Edit- Although, it is still possible for users to sign messages, and still use a centralized location. That gives the best of both worlds, without the needless added complexity.

CAPSLOCKFTW@lemmy.ml · 1 year ago

deleted by creator

HTTP_404_NotFound@lemmyonline.com · 1 year ago

We are working on it. :-)

https://fediseer.com/

And, associated (work in progress) GUI. https://fediseer.kube.xtremeownage.com/instances/whitelisted

(EXTREMELY… work in progress. Literally just got that set up a few hours ago… But, will continue to push changes, until we have something that is usable.)
