UESPWiki:Upgrade History/2010

This is an archive of past UESPWiki:Upgrade History discussions. Do not edit the contents of this page, except for maintenance such as updating links.

27 January 2010

  • Created the OBMobile and Stormhold namespaces on content1/2/3.

12 December 2010

  • Created the Skyrim namespace on content1/2 (SR prefix).
  • Current squid1 cache hit rate is 61% (3-day average of 10.9 hits/second and 17.8 requests/second). Traffic is currently 36 requests/second at a 64% hit rate due to the recent Skyrim announcement.
  • Fixed replication error on content3 and backup1. Appears to be an error in the phpBB3 query:
   DELETE zebra
      FROM zebra z, user_group ug
      WHERE z.zebra_id = ug.user_id
      AND z.foe = 1
      AND ug.group_id = 4982
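The correct query names the table alias z instead of zebra between DELETE and FROM (DELETE z FROM zebra z, user_group ug WHERE ...). In a multiple-table DELETE, MySQL requires the alias to be used once one has been declared.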
  • Memcached on content1 is running at a 94% hit rate with an average request rate of 11 req/second.
  • Updated Morrowind map images.
  • Changed lighttpd settings server.max-fds to 4096 and server.max-connections to 2000 on files1 to fix recent performance issues.
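The hit rates quoted above are simply hits divided by total requests. The following is a minimal sketch of that arithmetic in Python, using the squid1 figures from this entry. The function names are illustrative and are not part of any UESP script.

```python
def hit_rate(hits_per_sec: float, requests_per_sec: float) -> float:
    """Return the cache hit rate as a percentage of all requests."""
    if requests_per_sec <= 0:
        raise ValueError("requests_per_sec must be positive")
    return 100.0 * hits_per_sec / requests_per_sec


def hits_per_second(requests_per_sec: float, rate_percent: float) -> float:
    """Inverse calculation: hits/second implied by a request rate and a hit rate."""
    return requests_per_sec * rate_percent / 100.0


# Figures taken from the squid1 entry above.
print(f"3-day average: {hit_rate(10.9, 17.8):.1f}%")                 # 3-day average: 61.2%
print(f"Hits at 36 req/s and 64%: {hits_per_second(36, 64):.2f}/s")  # Hits at 36 req/s and 64%: 23.04/s
```

At the post-announcement load, squid1 would therefore be answering roughly 23 requests/second from cache and passing about 13 requests/second through to the content servers.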

16 November 2010

  • Restored the latest database dump from db1 onto content3 and restarted replication. The previous restoration was started using incorrect settings.
  • Note: squid1 is currently averaging a cache hit rate of 53%. This is lower than the previous 65-70% because all CSS/JavaScript requests have since been split off and are served directly from files1.
  • Restored zabbix monitoring on content3 along with other minor services.

15 November 2010

  • Restored latest database dump from db1 on content3 and started live replication.
  • Restored a copy of the UESP and Dave web sites and started the web server on content3.

12 November 2010

  • Received content3 back after a server hardware swap. The random bit-setting issue was not caused by the hard drive but by something else (memory, motherboard, etc.).
  • Initial installation of applications on content3.

6 November 2010

  • Installed reCaptcha on the UESP blog.
  • Copied blog files from content1 to content2/3 (backup purposes only, blog is still solely served from content1).
  • Rpeh given moderator access on the UESP blog.
  • Updated new SI map tiles on files1.
  • Changed HTTP headers in the getmaplocs.php map data file to permit caching of results in Squid (currently up to 1 week).
  • Reinstalled MySQL server on content3.
  • Restored zabbix database and restarted on content3.
  • Set up rc.local on squid1 to correctly start the required applications on reboot.
  • Started ntpd and vsftpd on squid1.
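The exact headers added to getmaplocs.php are not recorded here. The following sketch shows the general kind of headers that let a shared cache such as Squid keep a response for up to one week. It is written in Python rather than PHP, and the `public` directive and the helper name are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict, Optional

# Hypothetical: the log only says results may be cached "up to 1 week".
MAX_AGE = timedelta(weeks=1)


def cache_headers(max_age: timedelta = MAX_AGE,
                  now: Optional[datetime] = None) -> Dict[str, str]:
    """Build response headers that allow a shared cache to store a result."""
    if now is None:
        now = datetime.now(timezone.utc)
    seconds = int(max_age.total_seconds())
    return {
        # HTTP/1.1 caches use max-age; "public" marks the response as storable by shared caches.
        "Cache-Control": f"public, max-age={seconds}",
        # Expires is the HTTP/1.0 equivalent, given as an absolute date in GMT.
        "Expires": format_datetime(now + max_age, usegmt=True),
    }


for name, value in cache_headers(now=datetime(2010, 11, 6, tzinfo=timezone.utc)).items():
    print(f"{name}: {value}")
```

One week corresponds to `max-age=604800`. Sending both headers covers caches that only understand one of them.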

31 October 2010

  • Added purge permissions to the Squid cache.

30 October 2010

  • Cleared the shared MediaWiki file cache (in preparation for moving back to a Squid cache).
  • Removed Google Analytics code from all UESP content.
  • Added weekly rotation for the synced server file backups on backup1.
  • Pointed the www.uesp.net and uesp.net domains to squid1.

24 October 2010

  • Backup scripts set up and run for the first time on the new backup1. Weekly/monthly backup rotation still needs to be finished.
  • Old backup1 shut off.
  • Received squid1 back with a new OS installation and confirmed an ext3 file system.

23 October 2010

  • Copied Oblivion water colored map tiles from test to live on files1.

21 October 2010

  • Confirmed that database replication is working on the new backup1.
  • Set up daily database backups on backup1.
  • Set up initial daily/weekly/monthly rotation of database backups on backup1.
  • Requested squid1's OS to be reinstalled with an ext3 file system to hopefully avoid the performance issue with journaling.

17 October 2010

  • Database replication set up and running on the new backup1.
  • Performing initial sync of all Wiki image directories on the new backup1.
  • Krusty/RobinHood70 given cartographer rights as requested. Rpeh given basic access on content1 to update map files.

16 October 2010

  • Finished setting up the OS on the new backup1 server. Set up uesp users and key files. Now downloading the last database snapshot to begin live replication.

31 August 2010

  • Made minor change to MetaTemplate on content1/2/3 to fix this issue.

23 August 2010

  • Added TorBlock wiki extension to content1/2/3; downgraded the extension to r36018 for compatibility with MediaWiki 1.14.

22 August 2010

  • Backup1 finally up and running again since the move. Multiple replication errors fixed manually due to some boot up issues (all duplicate IDs in session tables).

13 August 2010

  • Gave bots allspacepatrol rights on content1/2/3 (so bot edits are auto-patrolled in all namespaces)

25 July 2010

  • Added the "Dawnstar" namespace to content1/2/3.

19 July 2010

  • Reduced squid1 cache size from 50GB to 5GB.
  • Pointed www.uesp.net domain to content2 so squid1 can be worked on without taking down the entire site.
  • Changed main domains (www and .uesp.net) to point to content1 (more RAM and CPU than content2).

18 July 2010

  • dungeonhack.conf (apache configuration script) moved from apache/ to apache/conf.d subdirectory on content1. Done in response to this AN error report.
  • apache restarted on content1 so dungeonhack change can take effect.

3 July 2010

  • Update on Squid changes. IOWait% on squid1 has been below 10% overnight with a nominal value of 2-3%.
  • content3 database replication error:
'Duplicate entry '61760-17747' for key 1' on query. Default database: 'uesp_net_phpbb3'. Query: 'INSERT INTO topics_track  (user_id, topic_id, forum_id, mark_time) VALUES (61760, 7747, 6, 1277555351)'
  • Master_Log_File: uesp-mysql-bin.082
  • Read_Master_Log_Pos: 149201910
  • Relay_Log_File: cl-t169-500cl-relay-bin.064335
  • Relay_Log_Pos: 10435271
  • Relay_Master_Log_File: uesp-mysql-bin.081
  • Exec_Master_Log_Pos: 240680539
Manually updated duplicate column, set skip counter to 1, and restarted replication. One other similar replication error fixed. Currently 570,000 seconds behind master.
  • Turned off Apache access logs on content3 and set error logging level to critical.
  • Upgraded UespCustomCode to version 0.9.6 on content1/2/3 (new userspace-viewing options to handle logs such as the Block log and User creation log that are implicitly treated as part of the User namespace)
  • Upgraded MetaTemplate to version 1.0.5 on content1/2/3 (fix obscure issue with unsetting named/numbered parameters)

2 July 2010

  • Turned off Squid logging on squid1.
  • Created a new Squid directory (/var/spool/squid1/).
  • Restarted Squid on squid1.
  • IOWait% on squid1 went from 14-50% to 0-1%.

28 June 2010

  • Corrupt table zabbix.history on content3 caused MySQL to restart continually. Forced a valid restart with innodb_force_recovery=4. The table could not be repaired because it is an InnoDB table, so it was exported, dropped, and reimported.
  • Several replication errors after MySQL restart on content3. Some easy ones initially fixed but replication is currently stopped.
  • Increased the cache repository size on squid1 from 10 to 50GB. Restarted squid.

25 June 2010

  • Note that content3 database replication is now caught up to db1.
  • Started httpd from rc.local on content3.
  • Drastically reduced the number of Apache max clients/servers on content3. The existing settings were using up all memory, pushing ~1GB into swap and drastically hurting the server's performance. The client/server limits could probably be raised somewhat, depending on how much memory is used/needed. They will also have to be increased if content3 is ever needed as a content server.
  • Deleted some old mysql log files on content3 to free up some disk space.

24 June 2010

  • Manually started zabbix_server on content3. Changed order of execution in rc.local to ensure it starts automatically on server start.
  • content3 database replication error: Error 'Unknown column 'sessiïn_time' in 'field list'' on query. Default database: 'uesp_net_phpbb3'.
  • Relay_Log_File: cl-t169-500cl-relay-bin.064325
  • Relay_Log_Pos: 42119217
  • Relay_Master_Log_File: uesp-mysql-bin.080
  • Last_Errno: 1054
  • Exec_Master_Log_Pos: 41604648
Manually executed the query on content3 correcting the obvious error, increased the skip counter SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; and restarted the slave replication.

20 June 2010

  • Note that database replication on content3 is stalled due to a binlog reading error (Could not parse relay log event entry).
  • Master_Log_File: uesp-mysql-bin.081
  • Read_Master_Log_Pos: 52147018
  • Relay_Log_File: cl-t169-500cl-relay-bin.022016
  • Relay_Log_Pos: 4635
  • Relay_Master_Log_File: uesp-mysql-bin.076
  • Exec_Master_Log_Pos: 222044179
  • Relay_Log_Space: 3179338907
To fix this issue, run the following commands on the MySQL slave, using the Relay_Master_Log_File and Exec_Master_Log_Pos values listed above:
    STOP SLAVE;
    CHANGE MASTER TO MASTER_LOG_FILE='uesp-mysql-bin.076', MASTER_LOG_POS=222044179;
    START SLAVE;
content3 slave replication is currently 20 days behind db1 but should take less than a day to catch up (assuming no further replication errors).
  • Fixed issue backing up UESP Wiki images on content3. Manually synced images (automatic hourly backup).
  • Enabled noatime on the main partitions of content3/squid1. The partitions were only remounted; the servers were not rebooted.
  • Turned Wiki page counters off on content1/2 (already off on content3).
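The relay-log fix above always uses the same two fields from SHOW SLAVE STATUS. The following Python sketch, which is not an existing UESP tool, pulls those fields out of the `\G` output and builds the three statements. The helper names are hypothetical. The file-name check is an added safeguard against pasting something unexpected into the SQL. Because CHANGE MASTER TO discards the existing relay logs when the log position is changed, the slave re-fetches events starting from the last executed master position.

```python
import re
from typing import Dict, List


def parse_slave_status(text: str) -> Dict[str, str]:
    """Parse the 'Field: value' lines printed by SHOW SLAVE STATUS\\G."""
    status = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s*(.*?)\s*$", line)
        if match:
            status[match.group(1)] = match.group(2)
    return status


def reposition_statements(status: Dict[str, str]) -> List[str]:
    """Re-point the slave at the last master position it actually executed."""
    log_file = status["Relay_Master_Log_File"]
    log_pos = int(status["Exec_Master_Log_Pos"])
    if not re.fullmatch(r"[\w.-]+", log_file):
        raise ValueError(f"unexpected log file name: {log_file!r}")
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};",
        "START SLAVE;",
    ]


# Values from the 20 June 2010 entry.
sample = """\
Relay_Log_File: cl-t169-500cl-relay-bin.022016
Relay_Master_Log_File: uesp-mysql-bin.076
Exec_Master_Log_Pos: 222044179
"""
for stmt in reposition_statements(parse_slave_status(sample)):
    print(stmt)
```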

14 June 2010

  • Rebooted content2 (not responding); had to also manually start apache after content2 had restarted (apache did not automatically start)
  • Rebooted content3 (not responding). Manually killed the initial mysql process (could not connect with the client) and restarted.
  • Manually deleted a duplicate record in the content3.uesp_net_blog.evo_sessions table to permit replication to continue. Note that content3 is currently ~23 days behind in replication of db1 (will catch up quickly however).
  • Changed apache's MaxClients from 200 to 70 on content2 (allowing 200 clients obviously overloads the system; 70 is value of MaxClients on content1)
  • Modified apache's expires, cache-control directives for static content on content1/2 (trying to improve caching of static content); restarted apache on both content1/2.

13 June 2010

  • Squid1 acting up with a high usage of %wa from the kjournald process. Set data=writeback in fstab and rebooted. Since it didn't come back up in a reasonable time the uesp.net and www.uesp.net domain entries were pointed to content1.uesp.net to at least get some traffic through.
  • iWeb support reverted the /etc/fstab changes to get squid1 back up. Reverted the DNS changes. Note that squid1 is back to using both content1 and 2.
  • More content2 issues with very high load. Removed temporarily from squid1 and rebooted. After reboot restored content2 back into squid1.

11 June 2010

  • content2 unreachable (can ping but not ssh into). Power cycled to no effect (was actually powering down content3 which is why it didn't work).
  • Removed content2 from Squid content peers and restarted Squid server on squid1 (all content currently served by content1).

10 June 2010

  • Upgraded UespCustomCode to 0.9.5 on content1/2/3 -- fix error message related to redirects when searching
  • Updated MetaTemplate to 1.0.4 on content1/2/3 -- enable #preview on non-templates; tweaks to catpagetemplate

9 June 2010

  • Set max_filedesc=4096 for squid; added ulimit command to squid initialization; restarted squid for changes to take effect (all to deal with "Your cache is running out of filedescriptors" warnings)
  • Started memcached on content2 (including uncommenting line in rc.local); changed LocalSettings.php to use local memcached
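The exact ulimit line added to the squid initialization is not recorded. Resource limits are inherited by child processes, which is why the limit is raised in the startup script before squid is launched. The following Python sketch uses the standard `resource` module to make the same change to the soft open-file limit that `ulimit -Sn` makes. The 4096 target comes from the max_filedesc value above. The function name is illustrative.

```python
import resource

# Value from the squid max_filedesc setting above; used here only as an example target.
TARGET_FDS = 4096


def raise_nofile_limit(target: int = TARGET_FDS) -> int:
    """Raise this process's soft open-file limit toward `target`.

    An unprivileged process may raise its soft limit only up to the hard limit,
    so the target is capped there. Returns the resulting soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough; never lower the limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


print("open-file soft limit:", raise_nofile_limit())
```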

8 June 2010

  • Enabled mod_expire in lighttpd on files1.uesp.net; configured all content to expire after a month; restarted lighttpd
  • Fixed some typos in ExpiresDefault apache directives on content1/2; restarted httpd. But still can't get an expiration date set for favicon.ico

7 June 2010

  • Disabled wiki's CustomCategory extension on content1/2/3 -- conflicts with MetaTemplate's category functions (if CustomCategory is being used, MetaTemplate can be modified so both can be enabled simultaneously)

3 June 2010

  • Added nephele to uesp group on content3
  • Applied all updates from last three days to content3 as well as content1/2 (MetaTemplate, UespCustomCode, GlobalFunctions.php, prefs.js)
  • Cleanup/refresh scripts have all finished running

2 June 2010

  • Updated MetaTemplate to version 1.0.3 on content1/2 -- #ifexistx parser function
  • Updated UespCustomCode to version 0.9.4 on content1/2 -- incorporated Nx's RestrictBlock function; gave admins ability to add/remove from blockuser group; final tweaks to prevent wanted categories/pages bugs; more search query optimization
  • Added a new MySQL user so that wiki updates can be run from content1
  • Running refreshLinks to clean up old issues with wanted categories

1 June 2010

  • Updated MetaTemplate to version 1.0.2 on content1/2 to fix a bug in cleantable.
  • Updated UespCustomCode to version 0.9.1 on content1/2 -- tweaks to recentchanges; Userspace Patroller group; search query optimization

31 May 2010

  • Updated MetaTemplate from version 0.5.1 to version 1.0.1 on content1/2 but not (at the moment) on content3
    • Leaving servers in inconsistent state because new version decreases the chance of content1/2 deadlocking (and the different versions shouldn't cause any conflicts, at least not until new features start to be used).
  • Ran mysqlcheck on all databases again, and found no issues
  • Checked innodb logs again; latest detected deadlock at May 31 13:10:55, which would be about 30 minutes before MetaTemplate code was updated; will continue to monitor to see whether new cases appear once code is updated on all servers.
  • Applied mediawiki revision r49428 to GlobalFunctions.php on content1 and content2. Issue was generating ~10 warning messages per day on content1.
  • Updated UespCustomCode from version 0.8 to version 0.9 on content1/2 (but, again, not yet on content3) -- new Special:Recentchanges customization features

30 May 2010

  • Restarted content3 (not accessible via ssh/ping this morning).
  • Increased file handles on files1 to 8192.
  • Disabled lighttpd access log on files1
  • Restarted lighttpd on files1.

29 May 2010

  • Ran mysqlcheck on all databases, which fixed numerous cases of incorrectly-closed tables
  • Checked innodb logs, and noticed an entry under latest detected deadlock from May 23, caused by MetaTemplate.

21 March 2010

  • Added a new blockuser group to content1/2/3.

1 March 2010

  • Disabled memcached in LocalSettings.php on content2/3. Benchmarking shows a performance increase of 150% without memcached. content1 shows a small performance improvement on some pages since it hosts the memcached server locally. Moving all content servers onto the same subnet would likely improve memcached performance on content2/3.

28 February 2010

  • Switched out content1 for content3 in the squid1 load balancing. Content1 still exhibits the "reload lag" issue that content2/3 do not.
  • Stopped NFS server programs on content1. These were apparently the source of the "reload lag". After stopping them the issue did not occur.
  • Swapped back content1 for content3 in the squid1 configuration.

27 February 2010

  • Redirected all squid1 traffic to content2 via the DNS entries.
  • Backed up all configuration and custom applications on squid1 to content3.
  • Files1 now supports compression of text/css/javascript files.
  • Increased Apache's MaxClients on content2 from 75 to 200 to deal with the increased traffic from the squid1 outage.
  • Received squid1 back after its hard drive replacement; set it up and tested it.
  • Switched www.uesp.net to point back to squid1.

26 February 2010

  • Squid1 server was unresponsive this morning for ~1.5 hours. Ultimately several resets of power got it back up but the root cause is unknown.
  • The DNS entry for www.uesp.net was pointed to content2.uesp.net for most of the morning to try and direct some traffic to a working site.
  • Added Apache startup to /etc/rc.local on squid1.
  • Wiki skin resources are now served from http://skins.uesp.net on content1/2/3.
  • Installed DenyHosts on content3. Will see how it works before moving to the other servers.
  • Fixed e-mail notifications on the Zabbix server.
  • Figured out how to send text messages for events on the Zabbix server.

25 February 2010

  • Reset memcached on content1.
  • Made a few changes to the squid configuration on squid1 and restarted.
  • The previous changes had little to no effect on the "reload lag" issue.

24 February 2010

  • Created a symbolic link /shared/uesp/wikiimages/images.new to point back to /shared/uesp/wikiimages/ on files1. Older cached content is still referring to the old images location which was preventing images from being displayed to anonymous users (anyone not logged into the Wiki).
  • Wiki images now reference images.uesp.net directly.
  • All Google based maps load the map images directly from maps.uesp.net.
  • Installed eAccelerator on content3.

23 February 2010

  • Switched content1/2/3 to use files1 as the source of the UESP wiki images, file cache, and php sessions. Tested on each server to confirm upload operation.
  • Updated backup scripts on content3/backup1 to copy from the new home of the UESP wiki images on files1.
  • Removed the affiliate ad from Monobook.php on content2 (due to the caches it will still show up for a while).
  • All wiki images from content1/2/3 are now being served from images.uesp.net (files1 server). Currently this is a direct path into the wiki images directory, i.e. www.uesp.net/w/images/3/30/Mainpage-logo.jpg translates to images.uesp.net/3/30/Mainpage-logo.jpg.
  • Changed the DNS entry on maps.uesp.net to point to the new files1 server. Will wait a few days for the change to fully propagate before switching all the maps to use this for the map images.
  • Fixed link to the lower-left Somerights.png image on content1/2/3.
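The images.uesp.net mapping described above is a pure prefix swap: the host changes and the leading /w/images is dropped, while the hashed subdirectories (3/30/ in the example) are kept. The following Python sketch of that rule uses the example URL from this entry. The function name is hypothetical, and input URLs are assumed to include the scheme.

```python
from urllib.parse import urlsplit, urlunsplit

WIKI_IMAGE_PREFIX = "/w/images/"


def to_images_host(url: str) -> str:
    """Map a www.uesp.net wiki image URL onto images.uesp.net; leave other URLs unchanged."""
    parts = urlsplit(url)
    if parts.netloc != "www.uesp.net" or not parts.path.startswith(WIKI_IMAGE_PREFIX):
        return url
    # Drop the /w/images prefix; the hashed subdirectories stay as-is.
    new_path = "/" + parts.path[len(WIKI_IMAGE_PREFIX):]
    return urlunsplit((parts.scheme, "images.uesp.net", new_path, parts.query, parts.fragment))


print(to_images_host("http://www.uesp.net/w/images/3/30/Mainpage-logo.jpg"))
# http://images.uesp.net/3/30/Mainpage-logo.jpg
```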

22 February 2010

  • Started setting up files1.
  • Installed ImageMagick on content3.
  • Installed wikidiff2 on content3.
  • Installed tidy on content3.

20 February 2010

  • Upgraded MySQL on content3 to v5.0.
  • Installed the Zabbix server on content3 and the agent on all servers (content1/2/3, db1, squid1).
  • Zabbix web frontend installed on content3 under monitor.uesp.net (or zabbix.uesp.net). Anonymous users set to read-only display of monitor status.
  • Set up content3 so the Wiki can be served by it (http://content3.uesp.net/wiki/Main_Page for example). It currently does not use the memcached server, does not cache PHP opcodes, and is not part of the squid1 load balancer. Its main purpose is testing changes to the Wiki before deploying them to content1/2.
  • Modified skins/common/prefs.js on content1/2/3 to make the user preferences appear properly in a tabbed panel.
  • Upgraded extensions/Icon/Icon.php on content1/2/3 to 1.6.2 to fix some issues in the older version.
  • Installed the Gadgets extension on content1/2/3. Gadgets are not working on content1/2, apparently due to a memcached-related issue.
  • Current memcached stats:
    STAT uptime 11157356
    STAT time 1266720823
    STAT version 1.2.2
    STAT pointer_size 32
    STAT rusage_user 2732.099657
    STAT rusage_system 9565.813775
    STAT curr_items 2028518
    STAT total_items 12071830
    STAT bytes 547480918
    STAT curr_connections 1
    STAT total_connections 90812029
    STAT connection_structures 142
    STAT cmd_get 226291001
    STAT cmd_set 12197883
    STAT get_hits 214786706
    STAT get_misses 11504295
    STAT evictions 0
    STAT bytes_read 28848473825
    STAT bytes_written 5502509679862
    STAT limit_maxbytes 805306368
    STAT threads 1
  • Cleared memcached. This fixed the issues on content1/2 with the recently installed Gadgets extension. Users may still have to reload their pages in order to see any changes.
  • Experimenting with a new Google affiliate ad program. Content served from content2 has a small ad in the lower-left for PC Connection. If it can make around the same amount as the Adsense ads it may make sense to switch all ads.
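The raw memcached stats above can be reduced to two useful numbers. The first is the hit ratio, get_hits / (get_hits + get_misses); in this snapshot get_hits + get_misses equals cmd_get. The second is the fraction of limit_maxbytes (768 MiB) in use. The following Python sketch parses a subset of the recorded STAT lines. It is illustrative only, and the helper names are not from any existing tool.

```python
from typing import Dict

# A subset of the memcached stats recorded above.
STATS_TEXT = """\
STAT cmd_get 226291001
STAT get_hits 214786706
STAT get_misses 11504295
STAT evictions 0
STAT bytes 547480918
STAT limit_maxbytes 805306368
"""


def parse_stats(text: str) -> Dict[str, str]:
    """Parse 'STAT <name> <value>' lines as returned by memcached's stats command."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "STAT":
            stats[fields[1]] = fields[2]
    return stats


def hit_ratio(stats: Dict[str, str]) -> float:
    """Fraction of get requests served from the cache."""
    hits = int(stats["get_hits"])
    lookups = hits + int(stats["get_misses"])
    return hits / lookups if lookups else 0.0


def fill_ratio(stats: Dict[str, str]) -> float:
    """Fraction of the configured memory limit currently in use."""
    return int(stats["bytes"]) / int(stats["limit_maxbytes"])


stats = parse_stats(STATS_TEXT)
print(f"hit ratio:  {hit_ratio(stats):.1%}")   # hit ratio:  94.9%
print(f"memory use: {fill_ratio(stats):.1%}")  # memory use: 68.0%
```

With about 68% of the memory limit used and zero evictions, this snapshot suggests that the cache size was not limiting the hit rate.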

19 February 2010

  • Restarted lighttpd on content2 after noticing it was not running. This would have prevented most images from loading from any page being served from content2.

13 February 2010

  • Added the movefile permission to the bots group on content1/2.

11 January 2010

  • Changed $wgJobRunRate on content1/2 from 0.01 to 0.1.
  • Updated common.css with the custom code from shared.css which was changed to a minified version of the MediaWiki 1.14 file on content1/2.
  • Applied this patch to wikibits.js on content1/2.
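$wgJobRunRate is the number of queued jobs MediaWiki runs per web request. Values below 1 act as a probability, so raising it from 0.01 to 0.1 moves from roughly one job per 100 requests to roughly one per 10. The following Python sketch is a simplified model of that behaviour, not MediaWiki's own PHP code. The function names are hypothetical.

```python
import random


def jobs_for_request(job_run_rate: float, rng: random.Random) -> int:
    """Simplified model of how many queued jobs a single web request runs."""
    if job_run_rate >= 1:
        return int(job_run_rate)
    # Below 1, the rate acts as the probability of running a single job.
    return 1 if rng.random() < job_run_rate else 0


def simulate(job_run_rate: float, requests: int, seed: int = 0) -> int:
    """Total jobs run over `requests` page requests (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(jobs_for_request(job_run_rate, rng) for _ in range(requests))


for rate in (0.01, 0.1):
    print(rate, simulate(rate, 100_000))
```

Under this model, the change lets the job queue drain about ten times faster, at the cost of more requests doing extra work.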


Prev: 2009 Up: Upgrade History Next: 2011