RobinTempleton

HCoop member since 2005; served as Secretary 2011--2018, 2022--present

Contact

Admin notes

Useful commands:

Admin changelog

Website performance (2026-02)

Dynamic websites (e.g., webmail) performed very poorly: a simple PHP application took seconds to load a page, and webmail was inaccessible due to errors like "Could not load message from server". We already suspected that the cause was gitweb and poorly behaved crawlers. (ClintonEbadi had previously blocked many UAs and IP ranges, but that didn't stop everyone.) I logged in to our main web server, ServerShelob, to investigate. ps aux and top confirmed that gitweb processes were responsible, and ~hcoop/.logs/apache/shelob/git*.hcoop.net*/access.log showed a lot of requests for blobdiffs (diffs of a file between two arbitrary commits), an expensive operation.

How expensive? nproc shows the number of logical CPU cores, and uptime shows the 1-, 5-, and 15-minute running load averages (1.0 = one core fully utilized):

nproc: 8
uptime: 17:49:36 up 177 days, 22:48, 2 users, load average: 127.39, 61.01, 33.24

So the 1-minute load was about 16 times what the 8 cores could serve at once. I think the CPU load was actually noticeable over SSH at some points.

User hcoop runs the wiki, so I edited ~hcoop/.domtool/hcoop.net and added rewriteRule "/blobdiff/" "-" [forbidden]; to the publicVhost "git" block. (Translation: match any request whose path matches the PCRE regexp /blobdiff/; "-" is an "identity" rewrite, as we just want to apply the flag; and the forbidden flag causes Apache to send a 403 Forbidden response instead of serving the request normally.) I ran DOMTOOL_USER=hcoop domtool hcoop.net to apply the changes, then confirmed with a URL from the access log that blobdiff requests were being blocked. Now the load average is down to 7.29, 29.50, 46.29, and things are subjectively much better.

(The load average later spiked again when a bot started spamming URLs containing a specific substring, which I also blocked, so this will be an ongoing process unless we block more of gitweb or apply some form of rate limiting. To keep an eye on the load average, I'm running watch uptime, which runs uptime every two seconds.)
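The log inspection and load arithmetic above can be sketched as a few shell functions. This is a minimal sketch, assuming Apache's "combined" log format (request line in the first double-quoted field, user agent in the third); the function names are hypothetical, not existing HCoop tools.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assume Apache "combined" log format, so splitting a
# line on '"' puts the request line in $2 and the user agent in $6.

# Report how many requests in the given log file(s) were gitweb blobdiffs.
blobdiff_share () {
  awk -F'"' '{ split($2, req, " "); total++; if (req[2] ~ /\/blobdiff\//) hits++ }
    END { printf "%d of %d requests were blobdiffs\n", hits, total }' "$@"
}

# List the ten most frequent user agents among blobdiff requests.
blobdiff_agents () {
  awk -F'"' '{ split($2, req, " "); if (req[2] ~ /\/blobdiff\//) print $6 }' "$@" \
    | sort | uniq -c | sort -rn | head -n 10
}

# Divide a load average by the core count: 1.00 means every core is busy.
load_per_core () {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}
```

For example, blobdiff_share ~hcoop/.logs/apache/shelob/git*.hcoop.net*/access.log gives a quick sense of how much traffic a blobdiff rule would affect, and load_per_core 127.39 8 prints 15.92.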

Wiki spam cleanup (2026-02)

There's a new spam edit on RecentChanges. Spammers can no longer register, but they can still spam: Our wiki was once open-registration, and spammers still have registered accounts from back then. Searching for "moinmoin delete user" brought up MoinMoin:UserManagement, which links to both HelpOnUserHandling and HelpOnMoinCommand.

HelpOnUserHandling documents a web-based approach: a Moin superuser can "sudo" to another user's account -- in this case, a spammer's -- and then disable the account interactively. This isn't particularly practical with 19,968 accounts, though!

The page also mentions the option of simply deleting the user file in the Moin data directory, noting that a name2id cache file also needs to be deleted and that moin must be restarted if it runs under FastCGI. ps aux | grep wiki shows that we are, indeed, using FastCGI rather than old-school CGI. How do you restart it? No idea.

HelpOnMoinCommand documents the moin command-line interface. I thought it might have an account delete option, but it only has account disable. We don't want to disable the accounts; we want to take them from enabled to eviscerated. (The accounts, not the spammers. We're consulting our lawyers about whether spammers are protected under the Geneva Convention.) As a first step without side effects, I ran moin account check --emailsunique. This showed that spammers often have multiple accounts associated with one email address, which will prove useful.
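A rough shell counterpart to that check, counting accounts per email address directly from Moin's flat user files, might look like this. It is a sketch, assuming each user file stores its address on a single email=... line; shared_emails is a hypothetical name, and unlike moin account check it does not normalize case.

```bash
# Hypothetical sketch: count accounts per email address and list only the
# addresses shared by two or more accounts, most-shared first.
# $1 is the Moin user directory (defaults to the current directory).
shared_emails () {
  local dir=${1:-.}
  # '^email=.' skips accounts with an empty email field.
  find "$dir" -maxdepth 1 -type f -exec grep -h '^email=.' {} + \
    | sed 's/^email=//' \
    | sort | uniq -c \
    | awk '$1 > 1 { print $1 "\t" $2 }' \
    | sort -k1,1nr -k2
}
```

Pointing it at the wiki's user directory prints one count and address per line.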

Looking in ~hcoop, I saw an hcoop-wiki directory. (This is also discoverable by grepping for wiki in ~hcoop/.domtool/hcoop.net.) A bit of poking around revealed that the "user files" in question are flat files under hcoop-wiki/data/user. Safety first: from that directory, tar cvf ~/backup-moinusers.tar ., because one slip-up could destroy the user database!
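Before deleting anything, it is worth checking that the backup is complete. A minimal sketch with a hypothetical function name: compare the number of regular files in the archive listing against the directory it came from. Restoring would then be tar xvf ~/backup-moinusers.tar, run from the same directory.

```bash
# Hypothetical check: does the archive hold as many regular files as the
# directory it was made from? $1 = archive, $2 = directory (default: .).
backup_matches () {
  local archive=$1 dir=${2:-.}
  local in_tar in_dir
  # Directory entries in the tar listing (e.g. "./") end in '/', so skip them.
  in_tar=$(tar -tf "$archive" | grep -vc '/$')
  in_dir=$(find "$dir" -type f | wc -l)
  if [ "$in_tar" -eq "$in_dir" ]; then
    echo "ok: $in_tar files"
  else
    echo "MISMATCH: $in_tar in archive, $in_dir on disk" >&2
    return 1
  fi
}
```

For example, backup_matches ~/backup-moinusers.tar from inside hcoop-wiki/data/user.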

I believe the file names are Unix timestamps suffixed with other data; in any case, they're not very meaningful. The format is a simple $key=$value list, including name and email fields. Let's start by getting a list of accounts and the relevant data:

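lsmoin () {
  # Run from inside hcoop-wiki/data/user. Prints filename, name, and email
  # as tab-separated fields, sorted by email.
  for f in *; do
    [ -f "$f" ] || continue
    name=$(grep '^name=' "$f" | sed 's/^name=//')
    email=$(grep '^email=' "$f" | sed 's/^email=//')
    # Quote the values in case a name contains spaces or an email is empty.
    printf '%s\t%s\t%s\n' "$f" "$name" "$email"
  done | sort -t$'\t' -k3
}
# This command takes ~15min to run
lsmoin > ~/moinusers.txt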
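lsmoin spawns several grep and sed processes per user file, which plausibly accounts for much of the ~15 minute run time. A single awk pass should be considerably faster, though this has not been benchmarked. It is a sketch under the same assumptions (run from the user directory, one name= and one email= line per file); lsmoin_fast is a hypothetical name, and unlike lsmoin it skips empty files entirely.

```bash
# Hypothetical one-pass replacement for lsmoin; run from inside the Moin user
# directory. If a file has several name= or email= lines, the last one wins.
lsmoin_fast () {
  awk '
    # Print the record collected for the previous file, if any.
    function emit() { if (file != "") printf "%s\t%s\t%s\n", file, name, email }
    FNR == 1  { emit(); file = FILENAME; name = ""; email = "" }
    /^name=/  { name  = substr($0, 6) }
    /^email=/ { email = substr($0, 7) }
    END       { emit() }
  ' * | sort -t$'\t' -k3
}
```

The output format matches lsmoin, so lsmoin_fast > ~/moinusers.txt would be a drop-in replacement.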

I'm defining a function so I can easily regenerate the list. I now have a list of the format $filename TAB $name TAB $email. Now in Emacs, I can M-x keep-lines with a spammer's email, and...

This is already overcomplicated. Am I going to mark lines with legitimate users? Mark spammers' lines and make deletion idempotent? No, let's keep it simple until there's some strategy for processing this stuff: grep email $(grep -l SomeSpammer *) to find the spammer's email address, then rm all files containing that address. (A manual sanity check is still required: what if a spammer signed up with a member's email address?) In total, this eliminates...one account. Oh well.
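The grep-then-rm step can be made a little safer with a helper that matches a whole key=value line exactly (no regex, no substring matches), so that a search for one address cannot also hit a longer address containing it. A sketch with a hypothetical name; it only prints file names, leaving the manual review and the rm to you.

```bash
# Hypothetical helper: print the Moin user files containing the exact line
# "$1=$2", e.g. key "email" or "name". $3 is the user directory (default: .).
moin_files_where () {
  local key=$1 value=$2 dir=${3:-.}
  # -l: file names only; -x: whole-line match; -F: literal string, not regex;
  # -e: safe even if the pattern starts with '-'.
  find "$dir" -maxdepth 1 -type f -exec grep -lxF -e "$key=$value" {} + | sort
}
```

For example, moin_files_where email spam@example.com lists candidate files for one address, and moin_files_where name SomeSpammer does the same lookup by account name.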

I now need to clear the name2id files. find . -name '*name2id*' finds a few files under cache, perhaps corresponding to different wiki namespaces? But what happens if I don't clear them? Visiting RecentChanges shows that the spam edit no longer has an associated username, with no further action needed. (I guess I'd have just restarted Apache if needed?)

Now the interactive option comes in handy. (This is only possible for MoinMoin superusers.) I visit Settings -> Switch user, note the first spam account, and delete its file. Is it still in the (alphabetically sorted) user list? No, so deleting the file is all I have to do.

This is all a bit much for deleting two spammer accounts, but I'll have to return to this later.

Tasks

<robin> i will personally commit to making a tinkerable "hcoop-in-a-box" in 202[45], depending on the work situation [...]

*scratch*


CategoryHomepage

RobinTempleton (last edited 2026-02-26 04:30:06 by RobinTempleton)