User talk:Wabbajak


Looks like you had a great first run. You should probably ask your owner to get you bot rights should you ever develop abilities beyond moving files. One minor tip: slow down a bit, will ya? The poor wiki servers get a little perturbed when you edit at high speed. Apparently, the humans want a chance to do things too. HotnBOThered (talk) 09:53, 27 January 2013 (GMT)

This human supports the recommendation to slow down and insists that the creator clean up the mess on the RC. I'm not patrolling that! Snowmane (talk • email) 19:49, 27 January 2013 (GMT)
Creator here. I've already asked Daveh to give Wabbajak bot rights (since you have to be a bureaucrat to give out that right). Once he has those rights, he can tag edits as bot edits and suppress the redirect, saving the RecentChanges from his wrath. And HotnBOThered, ask your owner (unless you know) how much delay I should put in between edits (specifically, the time between receiving confirmation that the previous edit was successful and sending the new edit). Would one second be adequate, or is 5 seconds better? • JAT 20:24, 27 January 2013 (GMT)
For getting bot permissions, you're supposed to ask the community first, per "What should a bot do and what shouldn't a bot do?"; then Dave can approve the request. As for the number of edits/reads, Wikipedia recommends no more than 10 per minute. That said, we are not Wikipedia and don't have dozens of bots running. I generally allow read-only jobs to run at 20 or 30 reads per minute, depending on the time of day, but I always limit write requests to 10/minute. I've also got some primitive logic in there that lets it slow down to as little as 1/minute if it starts detecting lag. Robin Hood (talk) 22:49, 27 January 2013 (GMT)
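A minimal sketch of per-minute limits like the ones above, in Python (an assumption; nothing here says what language either bot is written in). The class name and parameters are hypothetical, and the rates are the figures mentioned in this thread, not formal wiki policy. The gap is measured from the start of one request to the start of the next, so time already spent waiting on a slow connection counts toward it.

```python
import time
from typing import Callable, Dict


class RequestThrottle:
    """Hypothetical helper: enforce a minimum gap between request starts, per kind.

    Rates are requests per minute, e.g. {"read": 30, "write": 10}.
    """

    def __init__(self, rates_per_minute: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        # Convert each rate into the minimum number of seconds between starts.
        self.intervals = {kind: 60.0 / rate for kind, rate in rates_per_minute.items()}
        self.clock = clock
        self.sleep = sleep
        self.last_start: Dict[str, float] = {}

    def wait(self, kind: str) -> float:
        """Block until a request of this kind may start; return the seconds slept."""
        now = self.clock()
        last = self.last_start.get(kind)
        delay = 0.0
        if last is not None:
            # Time already spent on the previous request counts toward the gap.
            delay = max(0.0, last + self.intervals[kind] - now)
        if delay > 0:
            self.sleep(delay)
        self.last_start[kind] = now + delay
        return delay
```

With a 10/minute write limit and an edit that takes 5 seconds to come back, `wait("write")` only sleeps about one more second before the next edit. Reads and writes are tracked separately here; a single shared limit would be an equally reasonable choice.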
My problem with this proposed guideline of ours is that it tries to impose unrealistically high standards (one edit every 10 seconds; "Before even being considered by the community, the bot must have been thoroughly tested beforehand, either offline, or on a private website"). We could use some guideline on what to do when running a new bot, but that page seems designed to make sure we have no bots at all. So in some sense it is worse than no guideline, because it is constantly ignored anyway. Still, some form of community feedback may be good. I'm a bit worried that bots on UESP have often been used in the past as a tool for an editor to make many edits fast, without anyone noticing and without any discussion of widespread changes. HotnBOThered is to be commended for its task log here; it makes things more transparent. --Alfwyn (talk) 23:24, 27 January 2013 (GMT)

I didn't realize that there was a nomination process. Well, it does about half of those things, and I can easily add most of the other things. However, I have a few questions.

  1. How do I have it slow down or pause if it detects that the server is busy? What would that type of query (or error message from a standard query) even look like? I can easily code it, but I have to know what to code.
  2. Currently this bot's task is to update image links, so in some instances it has to edit User, User talk, and Template pages - there's simply no way around it. How should I adhere to this rule, if at all?
  3. If I'm supposed to have at most a one-edit delay after the talk page is edited, then that means I'll have to run a query after every single edit, which nearly doubles the operating time (and significantly increases the server load). Is this necessary? I can have it check every other edit instead, which will have a much smaller impact.
  4. Does limiting the speed of the bot include the time it takes for the edit/query to be returned? For instance, you said that you like to limit your writes to 10/min. As it is, it takes about 5 seconds to actually edit the page (send the URL and get a response), due to my slow connection. Should I wait an additional 5 seconds before firing off the next edit?

I understand that most of these guidelines are for limiting the load on the servers. Looking at the wiki's monitor, my bot run last night (which included ~240 queries, ~120 file moves, 10 more queries to get the article text, 10 page edits, and then ~120 more edits to mark the redirects for deletion) had no significant impact on the servers' CPU load, and this was at full speed. I'm not sure how necessary other speed limitations are, considering that my network speed is doing most of the throttling. If you feel that they are necessary, I can implement them.

I'd like to work these details out now, rather than during the bot's nomination, so that way other people aren't inundated by technical jargon. • JAT 00:26, 28 January 2013 (GMT)

I think the nomination process is fairly loose, given that we've only ever had three bots prior to yours in the history of the wiki. It's mostly to cover off why you feel we need it and what you intend to do with it. (And don't take that the wrong way, I have no qualms about having another bot around. Rpeh and I were the prime example of why that's a good thing: we were both working from the same framework, but our interests and how we structured our bots gave each of us clear advantages in certain areas over the other.) As for the requirements, keep in mind that that is a proposed policy, and has never really been formalized, so those requirements may not be appropriate in all situations, as you point out. If I recall, most of them came from NepheleBot, which was almost exclusively intended to get database info onto the wiki, so some limitations were designed with that in mind.
  1. The way I'm detecting lag is by timing responses between edits/reads, and adding those times to a weighted average (weighted in favour of the most recent job). As long as the response time is below the requests/minute threshold, it goes ahead and does another job after the minimum time has elapsed. If it detects that it's taken longer than that, it increases the delay time before the next job. That sounds more complicated than it is. :) If you still have my code, and you can understand it, have a look at @BaseTask.cs, specifically, PreInvoke() and PostInvoke(). There have been some minor bug fixes to prevent inordinate/infinite waiting times under certain circumstances, but the basic logic hasn't changed since I sent it to you. In case you're doing your own research, you may come across mention of a "maxlag" parameter. That, unfortunately, does nothing in our setup, so don't try to use that.
  2. In cases where it has to edit those pages, I generally make a note of it in my log (though there have been exceptions/errors before, but generally speaking, I try). I'm not sure if that's entirely necessary, but at least it shows that you've thought of it beforehand and are deliberately letting the bot edit those pages. Truthfully, I'm not thrilled with my method of filtering them out, or not doing so, and it's something that I want to reimplement at some point. In an ideal world, though, the bot should automatically exclude those pages unless told to override that restriction. Also, you should always question whether or not editing those namespaces is truly necessary. Generally speaking, there aren't a lot of edits to do in those spaces, and sometimes it's better to do them by hand rather than letting the bot do them. In cases like images, yeah, it probably makes sense to edit user space, especially if there are a lot of changes. Template space is something I feel bots should pretty much never touch...too much potential for things to go wrong in a hurry. Imagine a bot making a mistake on some of our largest templates...the results could trash the wiki in a matter of a minute or two.
  3. While I haven't implemented it, rpeh was using the trick of having the bot check for the "you have new messages" header, which meant no extra reads were required. Unfortunately, that only works if you're using index.php. If you're using api.php to do your edits, you're stuck doing it as you suggest, where every edit is followed by reading the talk page. On the other hand, api.php generally sends so much less extraneous information than index.php that the extra read may not even be noticeable. In some cases, you may even be able to load the talk page along with whatever page you're editing. The API does allow that sort of thing; it's just coding it that might be a challenge. Since the framework I'm using uses index.php by default, but also has API code in it, I've settled for reading after every write, with no checks at any other times, like page loads. Like you, though, I've wondered whether that's really an appropriate requirement at all. I think once every 4 or 5 edits ought to be plenty. The point is that the bot shouldn't do horrendous things to half the wiki in a matter of minutes, and a few extra edits won't usually make a big difference one way or the other.
  4. My take on it is that it's the server/database load that's the biggest concern, virtually all of which comes from the fact of the edit, not the time it takes to upload or download. So with a slow connection like yours, I'd only wait one extra second. (And again, if you look at my timing code, you'll see that's how I've constructed it.)
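To make point 1 concrete, here is a rough Python sketch of an adaptive delay driven by a weighted average of response times. It is not the code in BaseTask.cs; the class name, the 0.5 weight given to the newest sample, the doubling back-off and the 60-second ceiling (the 1/minute floor mentioned earlier) are all assumptions chosen for illustration.

```python
from typing import Optional


class AdaptiveDelay:
    """Hypothetical lag detector: a weighted average of response times drives the delay.

    The interval is the minimum time from the start of one job to the start of
    the next, so a slow round trip uses up part of it (see point 4).
    """

    def __init__(self, min_interval: float = 6.0, max_interval: float = 60.0,
                 weight: float = 0.5) -> None:
        self.min_interval = min_interval  # 6 s between starts = 10 jobs/minute
        self.max_interval = max_interval  # never slower than 1 job/minute
        self.weight = weight              # share given to the newest sample
        self.average: Optional[float] = None
        self.interval = min_interval

    def record(self, response_time: float) -> float:
        """Fold in one response time and return the interval before the next job."""
        if self.average is None:
            self.average = response_time
        else:
            self.average = self.weight * response_time + (1 - self.weight) * self.average
        if self.average > self.min_interval:
            # Responses are slower than the target pace: assume lag and back off.
            self.interval = min(self.max_interval, self.interval * 2)
        else:
            # Back to normal: return to the minimum interval.
            self.interval = self.min_interval
        return self.interval

    def pause_after(self, elapsed: float) -> float:
        """Seconds still to wait, given how long the last job took end to end."""
        return max(0.0, self.interval - elapsed)
```

For example, a 5-second edit keeps the interval at 6 seconds and `pause_after(5.0)` returns 1.0, matching the "one extra second" in point 4. A sustained run of slow responses doubles the interval until it reaches the ceiling. Abruptly resetting to the minimum once the average recovers is a simplification; a gradual step down would avoid oscillating.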
Server CPU load is a part of it, but also keep in mind that there are other factors, like database read/edit load, how much bandwidth you're hogging, giving yourself and others a chance to inspect the changes, and how much you have to undo if something goes wrong. Even if you just write something primitive as a start, like, say, delay 3 or 4 seconds between edits, between that and your connection speed, that ought to be sufficient. Keep in mind that sometimes, your connection is actually pretty good. Take a look at 06:49, 27 January 2013, for instance. If I'm counting correctly, it managed 41 edits in a minute.
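Returning to point 3: if the edits go through api.php, a lighter check than reading the whole talk page is to ask the API whether the account has unread messages (`action=query&meta=userinfo&uiprop=hasmsg`) every few edits. Because this is an ordinary query module, it can also be combined with a query the bot already makes. This is a Python sketch under those assumptions. `fetch` stands in for whatever the bot uses to send an api.php request and parse the JSON, and `run_edits`, `do_edit` and the every-5 interval are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable

# api.php query asking whether the logged-in account has unread talk-page messages.
HASMSG_PARAMS = {"action": "query", "meta": "userinfo", "uiprop": "hasmsg", "format": "json"}


def should_check(edits_done: int, every: int = 5) -> bool:
    """Check after every `every`-th edit rather than after each one."""
    return edits_done > 0 and edits_done % every == 0


def has_new_messages(response: Dict[str, Any]) -> bool:
    """Interpret the userinfo reply for both older and newer JSON output formats."""
    userinfo = response.get("query", {}).get("userinfo", {})
    # Older output marks new messages by including a "messages" key (value "");
    # formatversion=2 uses true/false. Either way, absent or false means none.
    return "messages" in userinfo and userinfo["messages"] is not False


def run_edits(jobs: Iterable[Any],
              do_edit: Callable[[Any], None],
              fetch: Callable[[Dict[str, str]], Dict[str, Any]],
              every: int = 5) -> int:
    """Run edit jobs, stopping if someone leaves a message; return edits made."""
    done = 0
    for job in jobs:
        do_edit(job)
        done += 1
        if should_check(done, every) and has_new_messages(fetch(HASMSG_PARAMS)):
            break  # stop and let the operator read the talk page
    return done
```

At most `every - 1` edits go out after a message arrives, which is the trade-off discussed above: fewer reads in exchange for a slightly slower reaction.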
Lastly, as Alfwyn says, it's generally a good idea to get community input, both for the bot itself as well as any major jobs it's going to be doing. Most of the jobs HotnBOThered has done without discussion are either repeatable jobs that have been discussed previously or minor, uncontroversial things. Virtually anything else it does is discussed on the wiki beforehand, or if it's something that's more along the lines of "major but uncontroversial", I'll post about it at the start of the bot's run. I think page moves to conform to site naming standards easily fall into the minor, uncontroversial category, but especially for its first run I would have mentioned it first, just so editors would know what was going on when a relatively unused account starts moving a lot of pages in a very short space of time.
Oh and lastly lastly :), going back to Alfwyn's point about testing, now that we have the development server, I'm thinking that us bot-writers should probably start doing more testing there before doing our actual runs on the main servers. Perhaps since we both have server access, we can even set up some on-demand method of updating the dev database from the main one or a recent backup so that we're testing on something reasonably up-to-date. Updating directly from the main database may put too much load on it, but I think a backup should be good enough 99% of the time, and places the load on content3 rather than the main servers. Robin Hood  (talk) 02:37, 28 January 2013 (GMT)
I just reverted a couple of speed tags on image redirects that are still in use; those might need looking into. The speed tag stops the redirect from working, so it would be a nice addition if the bot checked for uses first. I know, yet another tweak request for the poor bot. --Alfwyn (talk) 13:43, 1 February 2013 (GMT)
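A sketch of the check asked for here: before tagging a file redirect, ask api.php whether any page still displays it, using `list=imageusage`. The code is Python. The helper names are hypothetical, and `fetch` stands in for whatever sends the request and returns the parsed JSON. Plain `[[:File:...]]` links are recorded as ordinary links and would show up under `list=backlinks`, which this sketch does not check.

```python
from typing import Any, Callable, Dict


def imageusage_params(file_title: str) -> Dict[str, str]:
    """api.php parameters listing pages that use the given file title."""
    return {
        "action": "query",
        "list": "imageusage",
        "iutitle": file_title,  # e.g. the redirect's own "File:..." title
        "iulimit": "1",         # one hit is enough to know it's still used
        "format": "json",
    }


def redirect_in_use(response: Dict[str, Any]) -> bool:
    """True if the imageusage query returned at least one page."""
    return bool(response.get("query", {}).get("imageusage"))


def safe_to_tag(file_title: str,
                fetch: Callable[[Dict[str, str]], Dict[str, Any]]) -> bool:
    """Hypothetical gate: only tag the redirect for deletion when nothing uses it."""
    return not redirect_in_use(fetch(imageusage_params(file_title)))
```

Asking for a single result keeps the extra read cheap, since the bot only needs a yes/no answer and not the full list of pages.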
While it seems like you're not around much anymore, in case you return, I thought I'd note that, contrary to what I said earlier in this discussion, we actually do support the maxlag parameter now, if you want to use it to govern the bot's speed. I remember looking into it when we were still on MW 1.16, and at that time our particular setup wasn't recognized as replicated. Under MW 1.19 it apparently is, so if the database is lagged by more than 5 seconds and you add maxlag=5 to either an index.php or api.php request, you'll get an error and a suggested retry time in the response headers. You can read more about it here, if you're interested. Robin Hood (talk) 19:42, 27 September 2014 (GMT)
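A minimal Python sketch of honouring maxlag, based on the behaviour described here. According to MediaWiki's maxlag documentation, api.php reports the condition as an error with code `maxlag`, while index.php returns HTTP 503. Both set a Retry-After header and an X-Database-Lag header. The helper names and the 5-second fallback are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional

DEFAULT_RETRY = 5.0  # assumption: fallback wait when Retry-After is missing or unparsable


def with_maxlag(params: Dict[str, str], seconds: int = 5) -> Dict[str, str]:
    """Return a copy of the request parameters with maxlag added."""
    return {**params, "maxlag": str(seconds)}


def maxlag_retry_after(status: int, headers: Mapping[str, str],
                       body: Optional[Dict[str, Any]]) -> Optional[float]:
    """Seconds to wait if the server refused the request for lag, otherwise None."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    # api.php reports lag as an error with code "maxlag"; index.php answers
    # with HTTP 503 and an X-Database-Lag header.
    api_lagged = body is not None and body.get("error", {}).get("code") == "maxlag"
    index_lagged = status == 503 and "x-database-lag" in h
    if not (api_lagged or index_lagged):
        return None
    try:
        return float(h.get("retry-after", DEFAULT_RETRY))
    except ValueError:
        return DEFAULT_RETRY
```

A bot loop would send `with_maxlag(params)`, and whenever `maxlag_retry_after` returns a number, it would sleep that long and resend the same request instead of moving on to the next job.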

More congrats!

I just saw a post by Daveh that confirms Wabbajak now has official bot status. Congrats! Darictalk 13:33, 18 February 2013 (GMT)

Congratulations! Like me, you now have the ability to make edits nobody ever looks at and, largely, they never care about, either! HotnBOThered (Talk to Owner) • (Stop That!) 20:19, 18 February 2013 (GMT)
Another test... Wabbajak (talk) 02:32, 8 April 2014 (GMT)