Creating an offline copy of a Web site can be tricky; it's not always as simple as copying the needed files. Suppose, for instance, that you want to package up a Web site and include it on a CD or DVD with promotional material. You might have a hard time doing it if the site uses server-based technologies such as server-side includes (SSIs), which are resolved on the server and never appear in the HTML that reaches the browser -- so a raw file copy won't reproduce what visitors actually see.
I've looked at a number of programs for creating copies of a Web site for offline browsing. Among the best designed is HTTrack 3.33 (WinHTTrack in its Windows incarnation). It's free, open source and runs on multiple platforms. It makes site copies that are remarkably faithful to the originals, so offline browsing resembles online browsing as closely as possible. A site mirrored with WinHTTrack will have an automatically generated index page and a few invisible comments in the body of each page, but no other differences.
WinHTTrack uses a wizard-style interface to set up a site-copying job. The defaults work fine, but the user has a great deal of control over the copying process: which file types to download, the recursion depth, whether to follow offsite links and how deeply to do so, which objects to scan for or ignore, how to handle dropped connections and other flow control, how many simultaneous connections to open to the server, whether to copy Java applets or Flash objects, and so on. It's this extra level of fine control that makes WinHTTrack particularly useful.
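The same knobs are exposed by the httrack command-line tool that installs alongside the GUI. Here's a minimal sketch of a mirroring job -- the site URL and output folder are hypothetical placeholders, and you should double-check the flags against `httrack --help` for your version:

```shell
# SITE and DEST are placeholders -- substitute your own values.
SITE="http://www.example.com/"
DEST="./mirrors/example"

# Assemble the command as a string so the flags are easy to inspect;
# replace the final 'echo' with an eval (or type the command directly)
# to actually run it -- that requires httrack to be installed.
CMD="httrack $SITE -O $DEST -r3 -%e0 -c4 +*.html +*.gif +*.jpg"
#   -O      where to write the mirror
#   -r3     recursion depth: follow links up to 3 levels deep
#   -%e0    external depth 0: don't follow links to other sites
#   -c4     open at most 4 simultaneous connections to the server
#   +*.ext  filters: limit which file types get downloaded
echo "$CMD"
```

These correspond roughly to the "scan rules" and "limits" panels in the WinHTTrack wizard.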
If you've already made a site copy with WinHTTrack, you can point the program at that copy and have it update the mirror from the original site according to various criteria -- e.g., update only changed files or recopy everything from scratch. Even more interesting, you can create a word database of the entire site -- an index.txt file in the top directory of the mirror -- which can be used for linguistic analysis by third-party programs (for instance, to determine the average reading level of the site).
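From the command line, refreshing an existing mirror is a matter of re-running httrack against the same project directory. The sketch below assumes a hypothetical mirror path; `--update` and `-%I` are the options as I recall them for "update changed files only" and "build the word index," so verify both against `httrack --help` before relying on them:

```shell
# DEST is a placeholder path to a mirror created earlier.
DEST="./mirrors/example"

# --update  recopy only files that changed on the server
# -%I       also build the index.txt word database at the
#           top of the mirror (usable by third-party tools)
# As above, the command is only echoed here for inspection.
CMD="httrack --update -%I -O $DEST"
echo "$CMD"
```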
Serdar Yegulalp is editor of the Windows Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!