There are a number of reasons to be concerned about access to web pages and other information on the system. One reason is security. If you serve certain kinds of information, you are making your system, and possibly other systems, vulnerable to attack.
Another reason may involve the potential for embarrassment. Certain communications made public may be a surprise to some of the people involved. An example is email that goes to a list that is archived in a web-accessible way where the sender was unaware of the lack of privacy. Another example is references to vendors that should not be made publicly available. You may think it unlikely that your web page will be found, but many people and companies regularly do searches on their names and the names of their products, so if the search engines index your pages, they may be found by an audience you didn't intend.
Set up your server correctly! For security reasons, don't serve password files, email, CGI source code, etc. Here are some references for webmasters:
- Information for Fermilab Webmasters, which links to:
- Security Information Resources, which includes a link to:
Ways to Restrict Access
Access to web pages can be controlled in a number of ways:
- Limit what the search engines can index
- Control access by IP address
- Allow access by password only
Note that none of these methods is very secure with the standard servers. For example, passwords are generally passed without encryption. If you have sensitive information to which access must be controlled, get expert advice.
Limit what search engines index
Since search engines are one of the most common ways to find pages that were never meant to be found, you can limit what the search engines index. Of course, these methods don't keep anyone from seeing your pages if they already know the URL.
Also note that restricting access to a file, or unlinking it, will not remove it from a search engine. Even re-indexing your site may not get the file removed; entries can remain for many months. In fact, even dead links persist for a long time, as anyone who uses search engines regularly knows. If a file was once listed, the only way to be sure it is no longer available from a search engine is to move it (change the URL) or remove it.
Web Robots are programs that explore the web automatically. They are also sometimes called spiders or crawlers. The search engine indexers are web robots.
One way to exclude log files, dynamic pages, and anything else you don't want indexed is to list the directories or files in a robots.txt file in your root area. Whenever an indexing robot visits your site, it first looks for a file named robots.txt. For example, such a file on the main web server would have the URL http://www.fnal.gov/robots.txt. Only one robots.txt per site is recognized.
You can exclude all robots (*) or specific robots, and you can exclude all pages, specific directories, or individual files. Regular expressions are not allowed.
The following robots.txt file restricts all search agents from indexing the directories /logs/ and /private/ and any subdirectories thereof:

# no robots
User-agent: *
Disallow: /logs/
Disallow: /private/
If you don't have access to the root area of your server, you can specify access on a page-by-page basis using META tags. (Note that not all search engines honor this META tag.) META tags belong in the HEAD section of your document. In this example, the page containing this META tag should neither be indexed nor have its content analyzed for links:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
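As a minimal sketch of where the tag goes (the title and page content here are placeholders), the META tag sits inside the HEAD element of the page you want excluded:

```html
<HTML>
<HEAD>
<TITLE>Internal Working Notes</TITLE> <!-- placeholder title -->
<!-- keep this page out of search engine indexes -->
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</HEAD>
<BODY>
...
</BODY>
</HTML>
```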
Allow only certain IP addresses to access some pages
Restricting access to web pages by IP address allows you to grant or deny specific computers, groups of computers, or domains access to Web sites, directories, and in some cases, individual files. For example, you can allow access only from computers in the fnal.gov domain or only by specific computers based on their IP address.
Restricting access by IP address is web-server specific. On Unix with an Apache server, IP address restrictions can be placed in the server configuration, or the server can be configured to allow access-control commands on a directory-by-directory basis. If your server is an NT system running IIS, such configuration is done through the Internet Service Manager.
Note that it is sometimes preferable to restrict access to Fermilab staff and users using 131.225 rather than fnal.gov. The reason is that there are occasionally non-fnal computer domains that are used and administered by Fermilab staff and users; these have domain names other than fnal.gov but still have IP addresses in the 131.225 range.
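As a sketch of the Apache per-directory case (the classic Order/Deny/Allow syntax of Apache 1.3/2.0 is assumed; the directory must also have AllowOverride Limit enabled in the server configuration), an .htaccess file restricting a directory to Fermilab machines might look like:

```apache
# .htaccess - allow only Fermilab hosts (sketch)
Order deny,allow
Deny from all
# hostname-based rule; requires the server to resolve client addresses
Allow from fnal.gov
# IP-prefix rule; also catches non-fnal.gov domains on lab addresses
Allow from 131.225
```

Listing both the domain and the 131.225 prefix covers machines whose reverse DNS is not in fnal.gov but which still sit on Fermilab's address range.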
To test that your IP-based protections are working, you need to browse your pages from an IP address that is off-site or outside the subnets/IPs you have allowed. You can do this from the comfort of your own computer by using the following URL to surf anonymously:

http://jproxy.uol.com.ar/jproxy/http://your_web_site/your_directory/...
Allow access by password only
By allowing access by password only, you can restrict access to small groups of people or individuals depending on the level of security you need. This kind of restriction is also server specific:
On a Unix server running Apache, password access is also granted on a directory-by-directory basis: you first set up a password file and then create an .htaccess file in the target directory. On NT systems, web authentication uses the built-in NT authentication.
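For the Apache case, a minimal sketch (the file paths, realm name, and username here are placeholders): create the password file with the htpasswd utility, then place an .htaccess file in the directory to protect:

```apache
# Create the password file once, outside the web-served tree:
#   htpasswd -c /www/conf/.htpasswd someuser
#
# .htaccess in the directory to protect (requires AllowOverride AuthConfig):
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /www/conf/.htpasswd
Require valid-user
```

Keep in mind that Basic authentication sends the password unencrypted on each request, which is why these passwords must never be system passwords.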
An easy way to administer restrictions to a web area for a group of people is to have one username and password for everyone. This way people can find out the username and password from their colleagues. Obviously, this method is only appropriate for pages which are not really "secret".
Do not use your system password as your web-access password. Web access passwords are not as secure as system passwords.