
Utility creates free offline browsing

This tip originally appeared on SearchWinSystems.com, a sister site of SearchSMB.com.


Creating an offline copy of a Web site can be tricky, and it's not always as simple as copying the needed files. For instance, suppose you have a Web site that you want to package up and include on a CD or DVD with promotional material. You might have a hard time doing it if the site uses server-based technologies like server-side includes (SSIs). The server resolves those directives before it sends each page, so they don't show up in the page source a browser sees, and copying the raw files won't reproduce what visitors actually get.
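To make the SSI problem concrete, here is a minimal Python sketch of what a server does with an include directive. It assumes only the simple `<!--#include virtual="..." -->` / `file="..."` form and treats both as paths under a hypothetical document root; real servers (Apache's mod_include, for example) support far more. The function name `expand_ssi` is illustrative, not part of any tool mentioned here.

```python
import re
from pathlib import Path

# Matches simple directives such as <!--#include virtual="/inc/header.html" -->
SSI_INCLUDE = re.compile(r'<!--#include\s+(?:virtual|file)="([^"]+)"\s*-->')

def expand_ssi(html: str, docroot: Path, depth: int = 0) -> str:
    """Replace include directives with the included file's contents, recursively.

    Simplification (assumption): both 'virtual' and 'file' are resolved
    relative to docroot, which is not exactly how real servers treat 'file'.
    """
    if depth > 10:  # guard against include cycles
        raise RecursionError("SSI include nesting too deep")

    def substitute(match: re.Match) -> str:
        included = (docroot / match.group(1).lstrip("/")).read_text()
        return expand_ssi(included, docroot, depth + 1)

    return SSI_INCLUDE.sub(substitute, html)
```

The file on disk still contains the bare directive; only the expanded text ever reaches a browser. That is why a mirroring tool has to capture pages as the server delivers them rather than copy the site's files.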

I've looked at a number of programs for creating copies of a Web site for offline browsing. Among the best designed is HTTrack 3.33 (WinHTTrack in its Windows incarnation). It's free, open source and runs on multiple platforms. Its copies are remarkably faithful to the originals, so offline browsing resembles online browsing as closely as possible. A site mirrored with WinHTTrack gets an automatically generated index page and a few invisible comments in each page, but is otherwise unchanged.
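Faithful offline browsing depends on links in the saved pages pointing at the saved copies rather than back at the live server. Below is a minimal Python sketch of one common approach, assuming a hypothetical `<host>/<path>` folder layout with directory URLs saved as `index.html`. HTTrack's own naming scheme differs and is configurable, and `local_path` and `rewrite_link` are illustrative names only.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

def local_path(url: str) -> str:
    """Map an absolute URL to a file path inside the mirror.

    Hypothetical layout: <host>/<path>, with directory URLs saved as index.html.
    Query strings and fragments are ignored in this sketch.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path

def rewrite_link(page_url: str, href: str) -> str:
    """Rewrite a link found on page_url so it points at the saved copy, relative to that page."""
    target = local_path(urljoin(page_url, href))
    here = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(target, here)

# A link to /about/ on a products page becomes "../about/index.html"
print(rewrite_link("http://www.example.com/products/widgets.html", "/about/"))
```

Using relative paths means the mirror keeps working wherever it is copied, whether that's a hard drive folder or the root of a CD.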

WinHTTrack uses a wizard-style interface to set up a site-copying job. The program's defaults work fine, but you can exercise a great deal of control over the copying process. For example, you can choose which file types to copy; the recursion depth; whether to follow off-site links, and how deeply; which objects to scan for or ignore; how to handle dropped connections and other flow-control issues; how many connections to open to the server; whether to copy Java applets or Flash objects; and so on. It's this extra level of fine control that makes WinHTTrack particularly useful.
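Two of those settings, recursion depth and off-site link depth, interact in a way that's easiest to see in code. The following is a minimal breadth-first crawler sketch in Python, not HTTrack's actual algorithm. The `fetch` callable is a hypothetical stand-in for an HTTP client (injected so the sketch runs without network access), and the parameter names `max_depth` and `max_offsite_depth` are illustrative.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin, urlsplit

class LinkParser(HTMLParser):
    """Collect href and src attribute values from a page."""
    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def crawl(start: str, fetch: Callable[[str], str], max_depth: int = 2,
          max_offsite_depth: int = 0) -> Dict[str, str]:
    """Breadth-first copy of start, following at most max_depth link hops.

    Off-site pages are followed only while the number of consecutive off-site
    hops stays within max_offsite_depth (0 means never leave the start host).
    """
    home = urlsplit(start).netloc
    pages: Dict[str, str] = {}
    queue = deque([(start, 0, 0)])  # (url, depth, consecutive off-site hops)
    while queue:
        url, depth, offsite = queue.popleft()
        if url in pages:
            continue
        pages[url] = html = fetch(url)
        if depth == max_depth:
            continue  # recursion limit reached: save the page but don't follow its links
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            target = urljoin(url, href).split("#")[0]
            hops = offsite + 1 if urlsplit(target).netloc != home else 0
            if hops <= max_offsite_depth and target not in pages:
                queue.append((target, depth + 1, hops))
    return pages
```

Breadth-first order guarantees that each page is first reached by its shortest link path, so the depth limit behaves predictably. Raising `max_offsite_depth` lets the copy pick up, say, an externally hosted page that the site links to, without wandering across the whole Web.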

If you've made site copies with WinHTTrack before, you can point the program at an existing copy and have it update the copy from the original site according to various criteria: for example, updating only changed files or recopying everything from scratch. Even more interesting, you can create a word database of the entire site, stored as an index.txt file in the top directory of the mirror, which third-party programs can use for linguistic analysis (for instance, to determine the site's average reading level).
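To illustrate what a word database of a mirrored site involves, here is a minimal Python sketch that extracts visible text from each saved page and records how often each word appears on each page. The line format written by `write_index` (`word page:count ...`) is a hypothetical choice for this sketch; it is not the format of the index.txt file WinHTTrack produces.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import Dict, List

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

WORD = re.compile(r"[^\W\d_]+")  # runs of letters only

def word_database(pages: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Map each lowercased word to {page name: occurrence count}."""
    db: Dict[str, Dict[str, int]] = {}
    for name, html in pages.items():
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        counts = Counter(w.lower() for w in WORD.findall(" ".join(parser.chunks)))
        for word, n in counts.items():
            db.setdefault(word, {})[name] = n
    return db

def write_index(db: Dict[str, Dict[str, int]], path: str) -> None:
    """Write one line per word, e.g. 'offline about.html:1 index.html:2' (hypothetical format)."""
    with open(path, "w", encoding="utf-8") as f:
        for word in sorted(db):
            refs = " ".join(f"{page}:{n}" for page, n in sorted(db[word].items()))
            f.write(f"{word} {refs}\n")
```

A plain, sorted text file like this is easy for other programs to consume, which is what makes a site-wide word list useful for analysis such as vocabulary or reading-level checks.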


Serdar Yegulalp is editor of the Windows Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!


This was first published in November 2005
