UESPWiki:File Database Design
Currently any files available for download on the site are manually ftped to the server and hard-linked. This greatly limits the amount of files we can host simply due to the effort involved in maintaining these files and prevents us from easily including content such as Morrowind and Oblivion mods. A simple database system, preferably integrated with the existing Wiki, needs to be developed to allow file downloads to be easily added and maintained.
Basic Requirements and IdeasEdit
The following are the more important requirements for whatever system is used.
- Integrated with the Wiki as much as possible (reuse Wiki user information).
- Web front end for adding new files and editing existing ones.
- Track basic stats such as number of downloads, last update, etc....
- Each file is associated with an owner (Wiki user) that can edit and update the file details.
- File information is stored in database form but the actual files are stored in the file system.
- Both the file database and files themselves must be capable of being backed up and restored in some manner.
- Simple search capabilities.
The following are less important ideas that may be desired in the future.
- Optional ability to upload files via FTP (useful for large files).
- Some form of version tracking may be nice. Instead of having different versions of the same download being seperate, combine them into a single download resource record. By default the newest version is downloaded, but older versions may still be available.
- Possibly use a custom domain for storing of files (ex: files.uesp.net) rather than a local/absolute path.
- Files can belong to multiple categories.
- Simple rating system (0-5 stars, default at 3).
- Optional comments thread for each file.
- Optional Screenshots (useful for mods but probably not worth it initially).
- Add a MediaWiki extension for easily adding file links to pages (<file>SomeFile</File>).
- 1. Existing Wiki Add-on/Extension
- The Wiki itself is capable of handling binary files (through the Image: namespace) but doesn't provide anything to properly organize and manipulate 1000s of files. The binary file capability could be expanded. The main advantage to this design is that almost no work needs to be done as everything is handled by the Wiki. The major downside is that the file uploading/downloading and file organization of the Wiki is extremely basic and may lead to problems later on. Extending the Wiki to better handle binary files is a possibility.
- 2. Article for Each File
- File information could be stored in a Wiki article with just the file itself having to be stored somewhere. This is nice from the user perspective and requires relatively little effort to implement. This still requires a good amount of custom work to be added as well as system for handling the file uploads and downloads. It does save us the effort of managing the file page content as it is handled by the Wiki.
- 3. Reusing Image Namespace
- Simply use the unmodified Image: namespace to handle files.
- 4. Modified Image Namespace
- Use the existing Image: namespace and tweak it slightly to better handle general file uploads/downloads.
- 5. Custom File Namespace
- This design uses a system similar to the existing behavior of the Image: namespace. A custom File: namespace will be created with additional handling for the extra file information.
- 6. Custom File Database
- The most complex design but also the most flexible as it allows essentially any content and presentation without having to conform to the existing Wiki structure.
For the remainder of this design we will be generally considering options #3/4/5 as they appear to be the best compromise between a fully custom solution and integration into the existing Wiki code.
Reusing Image NamespaceEdit
Simply using the existing image namespace is the easiest solution as it requires no modification.
Modified Image NamespaceEdit
This extends the previous design concept further by taking the existing Image: namespace and modifying it slightly to better handle binary files.
Custom File NamespaceEdit
This design adds a custom File: namespace that operates in a manner similar to the Image: namespace except specifically for binary files.
File Record InformationEdit
The following is an incomplete list of data that can possibly be stored for each file record.
- File name (no path information)
- File size (cached for easy access)
- Last update date/time
- Original record creation date/time
- Author's user ID (preferably a Wiki user ID if possible)
- Download count
- RatingSum and RatingCount (used to compute an average rating)
- Version information (depending on how we implement this)
- Requirements (useful to list required Expansions for mods)
- Category list
- Contact information (may not be needed if available from the Wiki user information)
- Website (useful for mods or utilities that have their own website/forums)
- Alternate resource (another place the mod can be downloaded)
- Wiki Database (existing)
- User Table (existing)
- UserID (long, PK)
- ID (long, PK)
- UserID (long, FK)
- Name (text)
- UpdateDate (date/time)
- CreationDate (date/time)
- Description (text)
- Downloads (long)
- RatingSum (long)
- RatingCount (long)
- VersionID (long, FK)
- Website (text)
- ID (long, pk)
- Name (text)
- Description (text)
- CategoryID (long, FK)
- FileID (long, FK)
- ID (long, PK)
- FileID (long, FK)
- Name (text)
- Description (text)
- ReleaseDate (date/time)
- Filename (text)
- Size (long64)
- Requirements (text)
- User Table (existing)
File Table DiscussionEdit
The file table holds the basic information about the file that won't change (or won't change much) from version to version. Some of the information here could be technically stored in the file version table (ratings for example) but this would likely only be useful in the rare case that a file changes dramatically between versions (in which case a new file record can be created instead). The specific file information (such as filename and size) is stored in the file version table which allows an arbitrary number of versions to exist for each file. The current/default version is stored in the VersionID field of the table which allows the owner of the file to be able to explicitly state which file is the default (merely choosing the most recent version by date could possibly cause problems).
File Category DiscussionEdit
The categories for a file are stored in a separate table which allows each file to belong to an arbitrary number of them. The FileCategory table holds a list of the currently available categories. This would include the applicable game (Morrowind, Oblivion, None, etc...) and any other specific types (plug-in, utility, editor, quest pack, etc...). New categories can only be created/edited by the database administrator.
The FileCategoryLink table connects one file to a specific category. To get all categories for a file then requires a join to be performed between this table the file table, matching the specific FileID. Performance wise this shouldn't be a concern with a proper index. Even with 10,000 files with an average of 3 categories each the table size will be ~30,000 records.
File Version DiscussionEdit
The FileVersions table holds the information for each version of all files in the File table. This design allows an unlimited number of versions to exist for each file. The version of the file is stored in the Name field. The Filename field holds the actual filename which will be used to download the file from file server (ex: files.uesp.net). Relative paths may be added to the filename to better organize the file server rather than having 1000s of files in a single directory. Specific version information can be stored in the Description and Requirement fields. The ReleaseDate field stores when the version was made available (not necessarily the creation date of the version record).
The given design of file categories is good in that's quite simple and allows unlimited categories per file. A problem arises, however, when actually storing the file on the file server as all files would naturally be stored in a single directory. This isn't a huge deal, but when (or if) we start having to deal with 1000s, or 10000s of files in a single path it may being to be annoying.
It would be nice to have a very basic directory structure for the file locations to indicate their general type: \Morrow\Mods, \Morrow\Editors, \Morrow\Patches, etc.... This would make any manual maintenance and backup procedures easier. With the given category design, however, implementing this structure is non-trivial. It is likely possible to general a semi-appropriate path from a file's list of categories, but this would probably be a complex algorithm and also makes changing a file's category non-trivial (instead of just editing the database you now have to worry about physically moving a file).
A design that makes choosing a file's physical location easier to determine involves using two forms of category information. A separate category field would be added to the FileVersion table (or File table) which would specify the file's base category (Morrowind-Mods, Daggerfall-Patches, etc...). The existing category design would still be used for specifying additional category levels as needed. This also as the side effect of reducing the size of the FileCategoryLink table considerably. The splitting of category information does make some things more difficult, such as the searching specific categories for example.
By storing the file related tables in the wiki database and reusing the wiki User table, we introduce a possible issue when upgrading to future Wiki versions. There is no guarantee that future versions will use the same user structure or that the upgrade procedure won't nuke our custom file tables. We should look at how flexible and forgiving a Wiki upgrade is to the database and ensure that including the file tables to the Wiki database won't be regretted.
A Wiki extension allows functions to be developed to parse custom tags, <newtag></newtag>, in article the markup. There is a potential to use this capability to easily link files from article text.
- Tag attributes may or may not be allowed (must confirm either way). If possible it would make adding a file tag trivial, <file id="1234" version="1.5">Some File</file>
- Files can potentially be referred to by their unique ID or their name. Referring to files by name might be easier from a user standpoint but it causes problems since the name is not necessarily unique and can be modified.
- Referring by file ID prevents us from specifying anything other than the current version of the file.
- Referring to a file by ID yields a tag format of: <file>12345</file>. The custom extension function would replace this with a valid URL for the file: [a href="files.uesp.net/download.php?id=12345"]Some Filename[/A].
A web interface for the adding, editing, downloading, and browsing of files in the file database will need to be created.
- Add new file (requires user login)
- Browse files (table format with basic information, sortable)
- Search files (simple text search with basic category selection)
- View file details (view all available information on a file including all versions)
- Download file
- Edit file information (only available to sysops and the file author)
- Add file version (sysops and file author only)
- Edit file version information (sysops and file author only)
- Database statistics (read only)
The following is more specific design notes for a file upload/download system using a custom File: namespace operating similarly to the existing MediaWiki Image: namespace.
The following are example procedures on how the file upload/download system should function.
Upload a New FileEdit
- User uploads a new file selecting the initial article name and optionally any categories and license.
- New file page is created with default content (much like a new image page). The file is not yet available for download until verified (patrolled).
- A file patroller checks the new file upload request. This includes checking the content to ensure it is what is says it is, is not corrupted, is not copyrighted material, scanning for virus/trojan, selects an appropriate filename, etc.... If the file is verified it now becomes available for download and shows up in searches. If refused the file is either requested to be fixed and downloaded again or the file page deleted (depending on why it was refused).
- Upon verification the file is moved/copied from the temporary upload location to the permanent download location.
Upload a New Version of Existing FileEdit
- User uploads a new version of an existing file by clicking on a link from the file's page (much like reloading a new image).
- The new version is subject to the same verification as a new file with the addition of verifying that it is indeed a new version of the given file.
- Upon a successful verification the new version is moved/copied to the permanent download location and is now available.
- Upon a failed verification the new version info is simply deleted.
- The default version for a file will either be the most recently released (not uploaded) or might be set manually.
Download a FileEdit
- Each file page will contain a conspicuous download link for the current version.
- Older versions can also be downloaded from the file page.
- Downloading will forward you to a short page (for statistic purposes) which will automatically begin the download.
- Linking to a file download should be made to this intermittent page in order for the file statistics to be updated.
Find a FileEdit
- Article name and text should be searched by the default search function.
- A custom search function for just files can be written if additional functionality is required.
Delete a File or VersionEdit
- A deletion request is handled similarly as a request to delete an image page.
- If a file version is to be deleted the specific version file and database records are moved into an archive location where they can be later restored if needed.
- Similarly, a complete file deletion would move all file versions and database records to an archive location.
The following is the initial database structure for the basic file upload/download system. Not all features are included and the intention will be to add them at a later date.
The files table stores the basic information on each file in the same manner that the existing image table stores data on images.
|file_id||int(8)||Unique identifier for the file.|
|file_name||varchar(255)||The file article title.|
|file_versionid||int(8)||The default/current file version or NULL if there are no current versions.|
|file_downloads||int(8)||Total downloads for all versions of this file. This should just be a sum of the version download counts but is included for simplicity.|
|file_author||varchar(255)||Author name and optionally an e-mail or website link. This is not a Wiki username. Allows Wiki code or HTML.|
This table serves the same purpose as the existing patrollers table used for patrolling articles. It is used to ensure only that one file patroller sees a particular file upload request.
|fileptr_versionid||int(8)||The version record being patrolled.|
|fileptr_timestamp||varchar(14)||The timestamp for the start of the patrolling session.|
This table holds information on the specific versions available for each file.
|filever_id||int(8)||Unique identifier for the file version.|
|filever_fileid||int(8)||The parent file this version belongs to (cannot be NULL).|
|filever_title||varchar(255)||Short description identifying this version.|
|filever_userid||int(8)||User that uploaded this version.|
|filever_releasets||varchar(14)||The release timestamp of this version (if available, otherwise NULL). The format matches the MediaWiki timestamp fields (YYYYMMDDHHmmSS).|
|filever_uploadts||varchar(14)||The upload timestamp of this version.|
|filever_name||varchar(255)||The filename of this version as used in the file system.|
|filever_size||int(16)||The size of the file in bytes.|
|filever_verified||tinyint(1)||Has this version been verified yet? Unverified versions will not be available for download.|
|filever_downloads||int(8)||Number of times this version has been downloaded.|
When deleting versions or entire files the data will be moved into archive tables rather than simply removed. The archive tables (archive_files and archive_fileversions) will have the identical structure as their matching tables.
The following describes the needed custom and modified MediaWiki pages/files needed along with their requirements.
Similar to the Special:Upload page for uploading images except for uploading files.
This might be modified to forward to the new Special:FileUpload page if the input filename is not an image/movie/sound file. It is also possible that no Special:FileUpload page will be created but all images and files will be downloaded from this single page.
The main file for serving up file content, both the actual file page as well as for downloading.
The logging page may need to be updated to support proper output of file uploads and patrols.
A new page for patrolling (verifying) file uploads needs to be created. On this page the patroller can download new uploads, perform the necessary checks and then verify or refuse the upload. This should be very similar to the existing Special:Patrol page.
The following describes the file system specific design.
All uploads will be in their own directory until their are verified (moved) or refused (deleted). This directory may also be available for uploads via anonymous FTP (upload.uesp.net). Preferably uploads should be saved using the lighttpd web server to reduce the load of the main Apache server.
Once verified all files will be moved from the upload directory to their permanent download location.
- The directory structure will have some manual structure as chosen by the file patrollers. Files will be roughly organized by game and type. This organization is optional and has no impact on the Wiki side of things but may be needed if the download site is available via FTP.
- The filename is largely irrelevant and is not used to uniquely identify the file (the file article name and file/version ID is used instead).
- The filename can be changed at anytime by an admin or file patroller.
- Downloads may be accessible via anonymous ftp (downloads.uesp.net).
- The lighttpd web server will be used to serve file content and reduce the load of the main Apache server.
- Bandwidth/connection limiting will likely be needed to prevent download abuse of a single user.
When a file/version is deleted the actual file will be moved into an archive location in case it needs to be undeleted at a later date. The archives will not be publicly accessible.
Backups of the files can be done in a similar manner to the image backups, rsync via a private NFS share.
Initially all three file systems (uploads, downloads, and archives) will be on a single server. In the future, however, it may be necessary to separate one or more of these to their own server. The system must be capable of supporting this without requiring too much effort.
- Uploads can easily be moved to their own server as it is essentially just a file storage. Files can be moved to the download server once verified via a NFS (or similar) share or via FTP. The only obvious issue is timing...once verified it may take a significant amount of time before a file has been copied from the upload to the download server. To avoid this the file transfer can be first to a temporary file which is then moved to its permanent name once the transfer is complete.
- The download server will likely see the most traffic by far and is just a file server. The only special features required is the ability to move/copy files from the upload server and to the archive server.
- Similarly the archive server only needs to be able to copy files from the download server and occasionally copy them back to the download server. This server does not require any public access and does not actually serve any of the files it stores.
- If may be possible to code the download source to check the load/availability of the download server and use the backup (or mirrors) to serve a file. This requires a live backup of the file downloads to be in place. Multiple download servers can be implemented in this case with the one with the lowest load used.