This is an archive of past UESPWiki:Administrator Noticeboard discussions. Do not edit the contents of this page, except for maintenance such as updating links.
Request for Server Access
I have emailed Daveh and asked him to allow me to have direct access to UESP's server so that I can help more with the technical side of keeping the server running. Daveh is open to the idea. I wanted to update the community to let everyone know what's in the works and also to see if anyone has any feedback.
First, although I am bringing this up on the wiki Administrator Noticeboard, it is not actually a request to change my wiki privileges, and in particular it is not a request to become a wiki bureaucrat. As I stated in a previous discussion, I think having Daveh make decisions about wiki user status, such as adding admins and patrollers, works well overall.
Rather, the issues that I believe have been more problematic for UESP are issues that can only be fixed with "behind the scenes" access, i.e., the ability to log in directly to the computer hosting UESP. The primary motivation for the request is to make it possible to restart the site's apache server more easily (as a fix for one of the most common causes of site slowdowns). However, subject to Daveh's and the community's approval, there are other tasks I could help with on the server if I have access. Some ideas include: helping to upload non-wiki files such as the Oblivion/SI map tiles; or tweaking the PHP code to fix problems (such as various broken special pages).
The decision about what is possible here is primarily up to Daveh. However, the community's feedback is also valuable in particular on questions such as:
- Does anyone have any objections or concerns?
- Are there any specific tasks that the community thinks should be a high priority for me to pursue?
Any thoughts? --NepheleTalk 15:49, 9 December 2007 (EST)
Followup
Thanks for the support :)
To clarify, as far as restarts are concerned, I'm only hoping to be able to restart apache (the service that provides web access). If I understand correctly, that's all that Daveh has had to do in most cases; I think he's only had to reboot the entire computer a few times. If any situations were to arise that required a full reboot, then I wouldn't be up for handling those myself: the chances are good that someone with full root access to the computer, and perhaps even physical access (not just remote access) to the machine, is needed.
Rpeh raises a good point about documenting/coordinating efforts. The Upgrade History page has been used so far to keep track of configuration and code changes. I'd be sure to continue Daveh's precedent of adding changes to that page. I'm not sure to what extent apache restarts need to be kept track of, especially since the server-status page always provides information on when apache was last restarted. But if there's interest in a longer record of the restart history, I could set up a page for a listing of restarts (including when/why).
Also, rpeh's point about the log files reminded me of another point: being able to login to the server would in general make it easier for me to diagnose some of the issues with the server. Not that I can guarantee that I'll be able to find fixes to the problems ;) But I think being able to view details on active processes and being able to monitor some of the log files would be useful for trying to figure out the root cause of problems, such as frozen connections. --NepheleTalk 20:51, 9 December 2007 (EST)
Feedback
- Support: I think this is a necessary step. At the moment, Daveh is effectively a single point of failure for various aspects of the site and that is Not Good. He obviously can't be available all the time but that has led to downtime that could have been obviated had another technical admin been available at the time. Of the administrators, Nephele is clearly the most qualified and has proved that several times through, for instance, nailing site problems, identifying add-ons and generally being a fount of knowledge and wisdom on the site. So on the first point, no I don't have any objections! On the second:
- Restarting the server. I don't know enough about Unix to say whether this should be limited to restarting the web service or whether it should be the whole machine, but this will enable the dead connections to be killed.
- Killing log files. The recent problems are possibly caused by a too-full log disk. Being able to empty that disk would stop such issues.
- Uploading Files. This would be a definite help - with the final tweaks to the SI map and then the MW map, having a 2nd pair of hands should speed things up.
- Config Tweaks. There have been several cases where Nephele has suggested tweaks to various aspects of wiki configuration - task priority and spam filter spring to mind. The ability to make such changes would be a definite benefit.
- The other things to look into would be installing add-ons and making changes to the wiki source code, although that's possibly something to go for as a second step.
- One thing I would suggest is that a page be created to record changes, reboots and so on. At the moment it's generally all held on Daveh's talk page but if two people are going to be doing this kind of thing, a (protected) page to record any changes would be better. Just so there's a list that can be referred back to. --Rpeh•T•C•E• 17:18, 9 December 2007 (EST)
- Support: This is something that has long been on the "really needs to be done" list. Of course, whoever gets this extra authority has to be hugely trusted -- and Nephele has clearly earned that. Technical competence is also a must, and again she fits the bill on that requirement. --Wrye 17:33, 9 December 2007 (EST)
- Support: I can't see why anyone would possibly object; Nephele has shown time and time again that she is completely trustworthy and capable. --GuildKnightTalk2me 20:26, 9 December 2007 (EST)
- Strong Support: I've said for over a year now that we need to have another person who's more active on the site with this level of access, and I can't think of anyone better suited for the task. --TheRealLurlock Talk 22:50, 9 December 2007 (EST)
- Support: The site sometimes gets too slow, especially on weekends. Just waiting for pages to load is very boring. Something definitely needs to be done about it. I think, this should be the highest priority for now. Of course, there is no reason not to trust Nephele. --Mankar Camoran•T•C•E• 14:33, 10 December 2007 (EST)
- Support: I have no problems with this whatsoever! It'll be a great help to the site and I can't think of a better person for the job. --Eshetalk 14:50, 10 December 2007 (EST)
Update
As some of you may have noticed, Daveh gave me login access to the server on 4 January. At the moment I mainly just have the ability to restart apache (which was the original request). I finally had an opportunity to test it today, and it worked successfully :) So if anyone notices any problems such as frozen connections that can be fixed with a restart, either Daveh or I can be contacted to take care of it.
Next I'm thinking of identifying what specifically I could help with in terms of uploading and/or editing files. I have the ability to upload files, but only to my personal directory; for any other directories (e.g., all of the directories containing the website), I need to be given write access on a per-directory basis. To start with, the web directories where I'm aware of some potential for uploads are:
- the directory containing the alc_calc tool
- the directories containing the map tiles for the Oblivion map
- the directories containing the map tiles for the SI map
I've asked Daveh to look into write access for those directories, at which point I should be able to help out with some of the missing/to-be-updated tiles for the maps (and I might even get around some day to some upgrades to my alchemy calculator!).
As for other possible tasks....
- The log files turned out to be somewhat of a red herring, so I don't think log file cleanup is a particularly high priority.
- With the access I already have, it will make it easier for me to help out with suggestions about possible configuration tweaks. For example, I can now see what the existing wiki settings are instead of guessing from the default wiki configuration. And I can see the php files actually being used on the server, which will allow me to figure out what tweaks are necessary, for example, to fix some of our special pages. For now I think that's enough progress :) I'm more comfortable with continuing to forward suggested wiki-related tweaks to Daveh, and letting him be responsible for actual implementation... at least until I've had more time to familiarize myself with the behind-the-scenes universe.
Thanks everyone :) --NepheleTalk 17:17, 9 January 2008 (EST)
- Congratulations on your first restart! Is your personal directory web-accessible? In other words, can you put files in there and access them through the web site? The thought occurs that long-mooted projects like the NPC stats calculator become rather more feasible if you've got server access... –Rpeh•T•C•E• 05:04, 10 January 2008 (EST)
Semi-Protection of Main Pages
I've just semi-protected the main pages for those games that weren't already protected. The nonsense bots are back and they seem to like targeting pages like Redguard:Redguard and Lore:Tamriel so it seemed like a sensible change to me - especially as the main pages for some of the games were already SPed. I know it's generally considered a Bad Thing to protect too many pages but I'd say it's a Worse Thing to have some of the site's central pages bearing the marks of vandalism. If we're going to see a big resurgence of the nonsense bots we may need to look at protecting other important pages - at least temporarily. –Rpeh•T•C•E• 10:36, 8 January 2008 (EST)
- No complaints here - I personally thought all those pages were already protected, surprised to find that they weren't. (May have just been move-protection.) --TheRealLurlock Talk 10:51, 8 January 2008 (EST)
Special Characters
I notice that the edit window has these cute little JavaScript links to insert special characters into the text box. Very handy, of course, but may I put forth that this is ultimately a bad idea? I've never been an admin of a MediaWiki installation, but I've been a major contributor to two other MediaWikis (the Homestar Runner Wiki, and the Kingdom of Loathing Wiki), and both of them have seen firsthand the havoc that can be caused by putting non-ASCII characters directly into WikiCode.
The problem is subtle. This technique appears at first glance to be no problem, as long as the database, MW installation, and web server are configured appropriately. But I've seen the troll fat hit the fan when a Wiki is migrated to a different platform. Migrating from Windows to Linux, I believe, is a particular disaster. I couldn't tell you exactly what causes it, but what can happen is that all non-ASCII characters in page texts will be silently replaced with garbage or question marks. The cleanup of a spill like this, as you can imagine, is a nightmare, especially if your old installation is no longer accessible by the time you notice it.
On the other hand, non-ASCII characters that are entered as character references (e.g., &amp;eacute; = é or &amp;#233; = é) never require the database, MW software, web server, migration script, or browser to understand any encodings but ASCII. Much more bulletproof. And they will still display just as nicely in the browser; possibly better, if the browser doesn't like the encodings that the server knows.
I recommend that a brief explanation and directive to use character references be added to the Format section of the UESPWiki:Style Guide. See the Standards page of the HRWiki for an example. And if possible, I strongly recommend that those quick-insert links be modified to insert the appropriate character references instead of real characters. That would make compliance with such guidelines much easier than it has been on other wikis. Thoughts? Does anyone know an argument against this, or reason to be positive it will never be an issue for this wiki? --TheNicestGuy 14:36, 10 January 2008 (EST)
- Well, for one thing I'm not sure it's POSSIBLE to have those little links display something different from what actually appears in the code. It's not really JavaScript so much as an obscure, modified form of wiki markup. (Though it's not actual wiki markup, so the usual techniques won't work there.) You can look at the code here: MediaWiki:Edittools (though only Admins can edit it). You'll notice that the biggest grand-daddy wiki of them all, Wikipedia, has a similar feature on their edit page, see the code here. They include far more in the way of accented characters and such to account for the various subjects likely to be discussed there. But you'll notice their code for creating these characters is pretty much identical to ours. If it's not a problem for them, I don't see why it should be for us. As for possibly migrating to a Linux server - I think we already are on one, but you'd have to ask the experts. At any rate, I don't anticipate that we're going to move to another OS any time soon. A faster server, maybe, but I imagine we'll continue to run the same software for quite some time. At any rate, changing all this would be such a major task that it's almost not worth the effort. --TheRealLurlock Talk 15:00, 10 January 2008 (EST)
Advertising Update
As you may have noticed I've been playing a bit with different ad layouts recently. Unfortunately ad revenue has been dropping steadily over the past few months, from over $10 to now approaching $1 per day. This is not due to a decrease in site traffic or even a decrease in ad 'clicks' but solely from a large drop in average revenue per click. While I don't expect to get rich from ad revenue it would be nice if it could at least break even so I'll be playing with ad settings and see if anything makes a difference.
I'm still being cautious about where the ads will appear and as usual I'm open to comments/suggestions from anyone. I still do not want ads to be too obtrusive. There probably would be a significant gain in placing ads at the top of the page or within articles, which I won't do even if I can't break even with other layouts. I might be able to live with an ad unit on the top-left side but am worried that it would eat too much space for users with small monitors (I use large/widescreen monitors but 33% of visitors use a 1024x768 resolution size so the site has to work well at that resolution).
Another option is to try something other than Google Adsense but we'll see how the tweaking goes before considering that.
And just in case someone wonders...no, neither the site nor I are running out of money; I'm just thinking longer term. With a site this size we really should be running on multiple/bigger servers (as you might tell by recent performance) and a more stable ad revenue would help.
PS: My home computer 'died' last night so it might be a few days until I'm back up reliably, depending on what the problem is. -- Daveh 13:25, 16 January 2008 (EST)
- Unless Google is dropping ad revenue across the board, I imagine that the problem is that ad clicks here are not resulting in end purchases -- hence the drop in revenue. If this is the case, it is likely an intrinsic limitation on the workability of ads on the site -- people who come here for information are very unlikely to click through and purchase the end product. --Wrye 16:31, 16 January 2008 (EST)
- AFAIK Google Adsense pays by the click or per 1000 ads displayed as decided by the advertiser. I think the Google referral program might work by 'end purchases' but I'm not currently using that. You (well, I) can see this by looking at the ad stats. Daily views have gone up slightly over the past few months while the number of clicks has gone down slightly. The big change, however, is the amount Google has paid for each click which has dropped, probably simply due to a lack of relevant advertisers.
- My main goal is to try and balance revenue with the number/size/placement of ads. Too few or the wrong ones and it's just not worth it. Too many or in the wrong place and it ruins the site. -- Daveh 18:24, 16 January 2008 (EST)
- Okay, I've been dealing with the ads for a while tonight while working on Cobl related pages, and I gotta say, I hate 'em. A busy color ad is tolerable, but a moving image is just extremely distracting and annoying. Page loads have also been hanging a lot (though that might be related to adding new pages, which always seems to drag down the ad server as it tries to figure out what ad to stick on a page it never heard of before). Anyway, this engendered a visit to my "Site Preferences" tab in Opera. Which makes me wonder if the more aggressive ads are actually self-defeating -- since the only way to get rid of them is remove them completely (no inline frames).
- OTOH, of course, breaking even is good. Some suggestions/thoughts:
- I imagine that game companies lay out big bucks for advertising in the months leading up to Christmas, and then drastically tighten the belt at the beginning of the New Year. In other words, advertising should pick up later -- especially late in the year, and that may cover losses earlier in the year. (Hmm... But you said it had been dropping for "several months". So I may be totally off base on that one.)
- The intellisense ads are notoriously not intelligent. Essentially, what we should see advertised here are ads for games, esp. ads for RPGs that compete with Bethsoft. Isn't there some way to direct the adserver to do that? (Instead of linking to useless stuff that happens to share a word with the current page topic?)
- --Wrye 02:26, 17 January 2008 (EST)
- Looking at the image ads that are being displayed it seems that most of them are more game oriented. Adsense is 'dumb' in the sense that it looks at the current page and tries to figure out the best ad to display. I thought I could narrow down the ad categories but it seems like the options are pretty narrow (blacklist or whitelist a specific site). There's nothing to say "show only game/RPG ads" unfortunately. Looking at the numbers from the past day is encouraging and we should be able to break even. I'll have to let it run a few weeks to get a good average though. -- Daveh 11:35, 17 January 2008 (EST)
Forgot to mention it earlier, but if anyone notices a 'bad' ad, either annoying (e.g., blinking) or inappropriate just note the ad link and let me know. I can easily blacklist certain ads. I haven't noticed any yet myself but I know I've seen some on other sites. -- Daveh 14:31, 18 January 2008 (EST)
- I haven't noticed anything annoying so far. They are all quite harmless. Surprising thing is that most of the ads I am seeing are completely unrelated to games like Naukri.com, MonsterIndia.com, Intel, ICICI and such things. --Mankar Camoran•T•C•E• 15:07, 18 January 2008 (EST)
- Yah, it depends on the page and what Google 'thinks' it is about. On some pages I get game oriented ads and on others (like the main page or recent changes) I typically get the general type of ads you mention. -- Daveh 16:10, 18 January 2008 (EST)
- I did notice some game related ads, although general ads are more common. But I haven't been looking at many articles lately (I have stopped playing Oblivion completely for some reason), which may explain why I haven't seen many of them. --Mankar Camoran•T•C•E• 08:07, 19 January 2008 (EST)
I noticed a very annoying ad; it's flashing neon pink and purple with scrolling names. It links to perfectmatch4u.com. --GuildKnightTalk2me 22:37, 1 February 2008 (EST)
Blocking Rogue IPs at the Server
Given that the site is generally struggling so much, it really irks me to notice IPs that are clearly trying to systematically download large fractions of the site (even though they keep getting blocked by 503 errors), or that are repeatedly trying to post spam (even though the attempts are all getting blocked by captcha). Yet these unquestionably bot-controlled IPs keep showing up in the server logs. For example, 72.165.35.198 was denied access by the server 353 times during a 12-hour period today; some of the articles that this IP was so interested in obtaining included <sarcasm>highly popular</sarcasm> pages such as Category:Oblivion-Factions-Nine_Divines-Primate and Special:Recentchangeslinked/Oblivion:Esbern (each was denied 6 separate times). And these were just the requests denied by mod_limitipconn (denied because the IP was trying to open too many connections at the same time).
Using iptables, it is possible to completely block certain IPs. This is a block at the server level, not just at the wiki, and completely denies the IP all access to the site. The IP would no longer be able to view a single wiki page, view any of the old site, view the forums, or anything else. If used against a legitimate user, that user would have no way to contact the site to point out the mistake. It's a pretty extreme measure, but one that has been used in a few past cases (as documented at Bad Addresses).
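For anyone unfamiliar with iptables, a block of this kind might look roughly like the sketch below. The helper names block_cmd and unblock_cmd are made up for illustration, and they only print the commands instead of running them, since the real thing needs root and the actual rule layout on the server isn't described here.

```shell
# Print (do not run) the iptables command that would silently drop
# every packet arriving from one IP address.
# block_cmd / unblock_cmd are hypothetical helper names, not existing tools.
block_cmd() {
    printf 'iptables -I INPUT -s %s -j DROP\n' "$1"
}

# Print the matching command that deletes that rule again
# when the block is lifted.
unblock_cmd() {
    printf 'iptables -D INPUT -s %s -j DROP\n' "$1"
}

# Example: show the commands for the IP mentioned above.
block_cmd 72.165.35.198
unblock_cmd 72.165.35.198
```

Because the packets are dropped before apache ever sees them, the blocked IP gets no error page at all; the connection simply never completes.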
So what I'd like to throw open for debate is: Should we start blocking a few more of these IPs? And if we want to start doing it more widely, should there be a protocol in place to prevent the possibility of an IP used by a real reader from getting blocked?
A few ideas:
- Before blocking an IP at the server level, add a message to the IP's talk page. For example "Unusual server activity has been reported for this IP, as a result of which we believe that this IP is being used by a bot to monopolize system resources. To protect the site, this IP address is about to be completely blocked from any further access to UESP. If you have been directed to this page because you are using this IP address, please post a message here immediately to tell us that a legitimate reader is using this IP."
- If after an hour (?) no responses appear on the IP talk page, and the IP is clearly continuing to download site content, then proceed to block.
- Keep track of the IP, date, and time of all such blocks on Bad Addresses (tweak the table format perhaps, or add a new table to mark the start of this new protocol).
- After sufficient time has elapsed (one week? one month?), lift the block, again recording the info at Bad Addresses.
- As long as the IP resumes its suspicious activity, continue to reinstate blocks. I'm really reluctant to impose such an extreme block on a permanent basis. I think it's worth the small amount of extra effort to lift any such blocks periodically, even if the block just needs to be reinstated again the next day.
As for what types of behaviour would trigger this, unfortunately, I'm not sure that it's easy to come up with a clear set of rules. I think it will ultimately have to be a judgment call on the part of the person who makes the block. However, an IP would have to trigger numerous error messages (hundreds) over a period of several hours. We clearly want to avoid at all costs blocking a legitimate user who just hit refresh too many times while trying to get a page to load when the site was busy. Also, I'd say the downloaded pages would have to appear "unusual"... which is where the judgment comes in.
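As a starting point for that judgment call, something like the following sketch could list the IPs that show up most often in the apache error log. It assumes the usual "[client a.b.c.d]" field in each error line; the function name noisy_ips, the threshold, and the sample log lines are all invented for illustration.

```shell
# Read error-log lines on stdin, count lines per client IP, and print
# "IP count" for every IP at or above the given threshold.
# noisy_ips is a hypothetical helper name.
noisy_ips() {
    threshold=$1
    sed -n 's/.*\[client \([0-9.]*\).*/\1/p' |
        sort | uniq -c |
        awk -v min="$threshold" '$1 >= min { print $2, $1 }'
}

# Made-up sample lines (not real log entries), threshold of 2:
printf '%s\n' \
    '[Thu Jan 17 01:00:00 2008] [error] [client 192.0.2.10] made-up rejection message' \
    '[Thu Jan 17 01:00:01 2008] [error] [client 192.0.2.10] made-up rejection message' \
    '[Thu Jan 17 01:00:02 2008] [error] [client 192.0.2.20] made-up rejection message' |
    noisy_ips 2
```

A count alone can't tell a bot from a reader hammering refresh, so the output would only be a shortlist for someone to look at, not an automatic block list.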
At the moment, the only person who can do iptable blocks is Daveh. If we wish to move forward with this, I'd like to request that I also be given permissions to add/delete IPs. If other admins notice highly suspicious behaviour from an IP in the server logs, they could post the user talk page warning and add a request (e.g., at UESPWiki talk:Bad Addresses); then Daveh or I could take care of the actual block.
Until we try it, it's hard to say whether this will have a noticeable effect on site performance. Worst case, it will at least reduce the frustration of seeing bots show up in the server logs when you're unable yourself to connect to the site. Even in the best case, I doubt it will fix all the server slowdowns (I'd like to believe that the majority of the connections to the site are coming from legitimate users rather than bots!), but maybe it can at least make it so that the site no longer refuses to respond to anything for 15 minutes at a time.
(P.S., I've also been posting a series of other more mundane/technically obscure suggestions for performance tweaks at UESPWiki talk:Upgrade History. So this isn't the only option for how to improve the site's performance.) --NepheleTalk 03:20, 17 January 2008 (EST)
- Support: Not sure if this is a voting one but hey... As we discussed earlier, I'm in favour of this. I'm not going to deny that such an extreme measure makes me feel a bit nervous but I can't think of anything else that's going to have the desired effect and the safeguards you've mentioned seem adequate. My only remaining concern is that it's yet more work being loaded on to you and Daveh. –Rpeh•T•C•E• 04:44, 17 January 2008 (EST)
- As an addendum to that, I'd suggest that any IP already blocked, say as a nonsense bot or for spam, can be added immediately without the hour waiting period. If they're blocked, a legitimate user would already have appealed. I'm seeing several known nonsense bots accessing the site and it seems a waste of time to ask them if they'll be inconvenienced :-) –Rpeh•T•C•E• 06:13, 17 January 2008 (EST)
- What would a blocked IP see if they tried to access the site? If it's some sort of error message (404, 503, etc.), can we customize that error message to explain to them exactly why they've been blocked, and maybe give them a means of contacting someone to contest it? I mean, I'm all for going gung-ho against bots whenever possible, as anyone knows who's seen some of my more extreme suggestions for dealing with them, but leaving people without any explanation or way to contest a block makes even me a bit nervous. I know it's possible to make your own 404, 503, etc. error messages instead of using the browser-default, and it seems to me that this would be one way to at least leave some sort of recourse on the off-chance that a legit user is somehow affected. (It's possible that a legit user might have a trojan that is running from their IP, or that a proxy could fake its IP from another location, or even that certain dynamic IPs which get moved around and used by many separate locations might be affected in this way.) All of our other methods of blocking, such as those used on Nonsense/Spam bots and other open proxies, still allow the blocked IP to post on the talk page if they wish to contest the block, but this would prevent any such chance, and has the potential to affect legitimate users if we're not extra careful about it. --TheRealLurlock Talk 13:43, 17 January 2008 (EST)
- I have seen scripts that automatically IP block an address at the server level by monitoring server logs for DoS-like events (like the ones Nephele was talking about). This sort of block results in no error page (that I'm aware of)...it's just like the server does not exist (the web server never sees the request). The 503 error page results from the web server DoS module kicking in but if the client is running some sort of download software (or whatever) it probably wouldn't make any difference. Perhaps a temporary automatic IP block (for a few days) is more appropriate in such an event. -- Daveh 13:56, 17 January 2008 (EST)
- It's quite clear from the error logs that whatever bots are involved here are basically ignoring the 503 error message. They just keep trying again and again until they get the page they're trying to access. So it seems likely that ultimately the iplimitconn isn't doing anything to limit the number of IP connections; in fact, it's really doing the opposite since the bots will now make 5 or 10 HTTP requests instead of 1 to obtain a single page. To the extent that it's true that the bots keep trying, it may not be doing anything to limit the bandwidth use either, because they still get the document in the end. Not to say that iplimitconn is doing nothing. At least it's slowing down their requests: the downloads are spread out over a longer period of time, and in the meantime more regular users can get in (hopefully).
- I'm also concerned, although I haven't been able to confirm it yet, that when the bots are blocked by iplimitconn, the bots are somehow forcing the 503 connection to stay open until apache forcibly times out the connection. It is clear that when the site gets busy there is a problem with incoming "R" requests hanging in "R" mode for a full 300 seconds; when one quarter of the site's connections are stuck open for 5 minutes at a time that's definitely going to have an impact on site accessibility. Unfortunately, the apache server status doesn't allow you to see the IP address of these "R" requests so I can't confirm where they're coming from. All I can say is that times when a lot of "R's" show up in the server status reports do correspond to times when a lot of iplimitconn blocks show up in the error logs (which admittedly could also just be that when the site is busy, there's more of everything going on).
- In any case, staring at the logs too much over the last few days does make me think that we need something that's more effective against these bots. Even if it's only a short term measure until we can find other ways to improve the site performance: if the site was running smoothly 100% (or even 95%!) of the time, I wouldn't really care about them. But right now, it seems to me very likely that legitimate readers (and editors) of the site are being denied access because of these bots every time there's a site slowdown. I'd much rather take the (small) chance of locking out a real person with a bot-infested computer than continue to very certainly turn away real users day after day.
- More specific comments:
- It's true that IPs who have already been blocked as nonsense bots on the wiki probably don't need an extra message. But as I've been pondering the feedback, I think it may still be worth adding the extra message just in case there is a legit user who never cared about the wiki block, but suddenly notices the problem when he loses all access. In this case, it wouldn't necessarily be an ahead-of-time warning, but more of an after-the-fact explanation once the user gets access again (yes, the wording of the message would also need to be tweaked accordingly... assuming we go with the manual approach instead of some newfangled automatic apache mod!). Also, just to be clear, I don't think we need to go through and do a server-level block on every IP that's ever been used by a nonsense bot. I'd say that only bots that continue to show up in the logs need further action (and, again, only with temporary blocks that get reinstated as long as activity continues).
- We could customize the 503 error messages that are currently being displayed when IPs get blocked. Which might in fact be helpful, since it's clear that most editors don't know what the messages mean when they first see them.
- The whole point of a server-level block is to completely prevent our computer from having to do any work at all. Apache (the web server that provides all HTTP responses) never even needs to see the connection, and therefore apache doesn't need to waste any of its resources deciding how to respond. Therefore, it's not really possible to provide a friendly explanation message. Thus the caution about extended length blocks and trying to notify ahead of time.
- --NepheleTalk 19:57, 17 January 2008 (EST)
- I too have been wondering about those lingering 'R' connections which I don't recall seeing before, at least in the amounts there have been lately. If you're familiar with netstat you can log in to the server and do more specific lists of IPs connected to the server. While I haven't noticed anything recently I have used it in the past to catch 'bad' addresses DoSing the server in some manner. For example:
-
netstat -an | grep ESTABLISHED | sort -k5 | more
-
-
- lists all established connections sorted by IP. Note that it's not too unusual to see a few IPs with a dozen connections, since the OB/SI maps can easily generate a dozen server requests for each view. -- Daveh 09:22, 23 January 2008 (EST)
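- A per-IP tally can be easier to scan than the full sorted list. The following is only a sketch: it assumes Linux-style `netstat -an` output (foreign address in column 5 as `IP:port`, state in column 6), and the function name `count_established` is made up for illustration.

```shell
# Hypothetical helper: tally ESTABLISHED connections per remote IP.
# Assumes Linux-style `netstat -an` output on stdin
# (column 5 = foreign address as IP:port, column 6 = state).
count_established() {
    awk '$6 == "ESTABLISHED" {
             ip = $5
             sub(/:[0-9]+$/, "", ip)   # strip the trailing :port
             count[ip]++
         }
         END { for (ip in count) print count[ip], ip }' |
        sort -rn   # busiest IPs first
}
```

- Used as `netstat -an | count_established | head`, this would print one line per remote IP, with the connection count first. On systems whose netstat prints addresses in a different layout, the column numbers and the port-stripping pattern would need adjusting.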
-
- OK, I just happened to catch one of the particularly suspicious clusters in action. On server-status 20 R connections appeared within 10 seconds of each other and, when I noticed them, had been lingering for 199-208 seconds. The server was otherwise pretty quiet (only 17 other active requests) and had been quiet for a while, so it's unlikely that these were triggered by the server getting bogged down. When I used the netstat command, lo and behold, there were 20 established connections from 89.128.216.85. Then in the process of writing this, even more R's appeared, and netstat is showing a huge burst of connections from both 24.201.104.51 and 89.128.216.85. Neither of these IPs is being reported by server-status (i.e., the connections do seem to correspond to the unidentified Rs). In netstat, both the sendQ and recvQ columns are 0 for all of these connections which (if I'm reading the man pages properly) says that neither direction claims that more data needs to be sent. Most of the other established connections had non-zero values in the sendQ column.
- The final interesting piece of the puzzle is checking the error_log file for apache. Just doing a grep on the last 5000 lines of the error log, 89.128.216.85 is only showing up once as being blocked by apache for exceeding the connection limit; 24.201.14.51 is showing up 6 times, but all from 4 hours ago. (Both do come up more times as I scan deeper back into the error log). Which means that I'm not sure that iplimit is doing anything about these connections. I'm guessing that iplimit is waiting for the IP to send a request before trying to block or get rid of them (since the iplimit criteria are all based upon which files are being requested). As long as they just hang there, the server's letting them monopolize our connections until finally the connection times out.
- Everything seems to confirm that the lingering R connections can be tied to one or two IPs that are misbehaving. And our current measures aren't doing much to control these IPs. --NepheleTalk 02:58, 24 January 2008 (EST)
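- The error-log check described above can be repeated for any IP with a small helper. This is a rough sketch rather than the exact command that was run: the function name `count_ip_hits` is hypothetical, and it simply counts lines that mention the IP anywhere, whatever the message says.

```shell
# Hypothetical helper: count how many of the last N lines of a log mention an IP.
# Usage: count_ip_hits LOGFILE N IP
count_ip_hits() {
    # -F: treat the IP as a literal string (dots are not wildcards)
    # -w: whole-word match, so 24.201.14.51 does not match 124.201.14.51
    # -c: print only the number of matching lines
    tail -n "$2" "$1" | grep -Fwc -- "$3"
}
```

- For example, `count_ip_hits /path/to/error_log 5000 89.128.216.85` (the log path is a placeholder) would report how many of the last 5000 lines mention that IP. Comparing that number with the netstat connection count for the same IP gives a quick sense of whether the connection limit is actually catching it.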
-
- Last night I specifically checked during a time when the site was quiet to be sure there weren't other extenuating factors; on the other hand, it meant that having two IPs block 40 of our 100 connections for 5 minutes at a time wasn't really interfering with any other readers. Today I figured I'd snoop as the site got busy and confirm whether the same activity is happening when the site starts to bog down.
- In the server-status snapshot, all 100 connections on the server are now busy. 55 of those connections are lingering R connections that are more than 2 minutes old. netstat shows 28 established connections from IP 71.180.214.84 and 27 established connections from IP 81.214.45.167. Neither IP is visible in server-status, so these two IPs are indeed responsible for all 55 R connections. With them blocking more than half of our connections from legitimate users, it's no real surprise that we're all having trouble accessing the site. In the time it's taken me to write this, all of those 55 connections timed out. But now 71.180.214.84 is back again with 65+ connections, from that one IP alone. Needless to say, server-status is completely clogged up using all 100 connections, but the vast majority are Rs.
- Just to do some quick math: the server averages more than 30 requests per second. If one of these IPs blocks 20 connections for 300 seconds, that's nearly 10,000 requests that are unable to get through each time one of these IPs attacks us. And from what I've seen in the logs, these IPs keep doing it time and time again for hours. We really need to find a way to get rid of these pests. --NepheleTalk 13:06, 24 January 2008 (EST)