By far the most well-known and talked-about Internet service today is HTTP, or the Web service. HTTP stands for "Hypertext Transfer Protocol," and it is widely popular because it allows information to be quickly transferred from your DWH Server to anyone in the world and displayed in an attractive format by a client program called a Web browser. When most people talk of "getting on the 'Net" or "surfing," what they are really talking about is accessing the HTTP service through their Web browser.
Some of the most common files that are accessed through a Web browser include:
.html or .htm files: which are often called Web pages. These files are written in a language called Hyper Text Markup Language which is used because of its easy formatting characteristics, its simplicity, and its ability to create links to other Web pages or other files.
.gif files: graphics files which load quickly and contain no more than 256 unique colors. Some gif files are animated (called animated gifs), some come through in layers of increasing detail (called interlaced gifs), and some are displayed normally.
.jpg files: graphics files which can contain millions of colors and generally take longer to transfer. These are generally used for photographs and highly detailed artwork.
Many other types of files are also viewed or otherwise processed through Web browsers, and the list of the files continues to grow.
HTTP (Hypertext Transfer Protocol): the method that makes it possible for Web browsers and Web servers to communicate with each other
WWW (World Wide Web): a network of files spread out among the vast number of computers connected to the Internet. These files contain information, pictures, sounds, and other media and can be easily viewed through a client program called a Web browser. When most people refer to the Internet, they are actually referring to the World Wide Web.
Web Server: a computer connected to the Internet that makes files available to the public
HTML (Hyper Text Markup Language): a language that is used to create documents for the Web. HTML documents, also called Web pages, are easily formatted by a Web browser for quick display.
Web Site: a collection of linked files on a Web server
Web Browser: a program that allows you to view files on a Web server from your computer
URL (Uniform Resource Locator): an address which identifies a specific file on the Internet. URLs follow a standardized format which consists of a protocol type, a domain name or IP address which identifies the computer which contains the file, and a path to the file.
SSL (Secure Sockets Layer): a method of encrypting information as it is sent between a DWH Server and a Web browser, in order to protect the information from being intercepted and viewed by someone other than the intended viewer. It is often used to securely transfer credit card numbers and other sensitive information.
Your DWH Server comes pre-configured with the Apache Web Server program installed as your HTTP (or Web) server. From the moment your DWH Server is turned on, it is providing HTTP services. The Apache Web server is very flexible and has many configuration options. In most cases, you will never need to make any changes to the configuration files that define how your HTTP service operates. On the other hand, if you use some of the more advanced features offered by your DWH Server, you may wish to learn more about the "ins and outs" of these files. Additional information about your Web service's configuration files is available at the Apache Web site at http://www.apache.org/.
Your DWH Server has been assigned a unique IP address that distinguishes it from all the other computers on the Internet. It looks something like 203.37.xxx.xxx, where the x's depend on where your DWH Server resides physically on our network. You should have received this IP address when your DWH Server order was processed. You can access your DWH Server at any time through a Web browser by typing this IP address into the address bar. Since you want your DWH Server to be easily accessible to the public, you most likely registered a domain name to point to this IP address. If so, you can also access your DWH Server using this domain name. Be aware, however, that if this is a new domain name, or a domain name that previously pointed somewhere else, it may take a few days for the domain to be registered or redelegated, whichever the case may be, and for the new information to propagate throughout the Internet, making your domain "live" on your DWH Server.
Every file on every Web server on the Internet is accessed through a unique address. These addresses, called Uniform Resource Locators or URLs, all follow the same format. All URLs start with a protocol type, followed by a domain name or IP address which identifies a computer, then the path to the file. For example, to access the home page of the DWH Servers.com Web site, you would put the following URL in your Web browser's address bar:
http://www.dutchwebhosting.nl/index.html
The http:// is the protocol definition, which means that you want to access the server's HTTP service (as opposed to another service like FTP, Email, or Telnet). In many browsers, you can simply omit the protocol definition, because the browser assumes you want to use http:// if no other protocol is specified. Next comes www.dutchwebhosting.nl, the computer name and domain name, which points to the actual computer that houses the DWH Servers Web site. Finally, /index.html is the path to the file containing the HTML code that makes up the home page of the DWH Servers Web site. By default, if no file name is specified, your DWH Server looks for a file called index.html. If no such file exists, it builds an index of all the files in the specified directory and displays this instead. This means that the page in this example could also be referred to as http://www.dutchwebhosting.nl/, or even www.dutchwebhosting.nl in most Web browsers.
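The three parts of a URL described above can be pulled apart programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the function name describe_url and the dictionary keys are illustrative choices made for this example and are not part of your DWH Server's software.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into protocol, host, and path (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http"
        "host": parts.netloc,       # domain name or IP address
        "path": parts.path or "/",  # an empty path means the top-level directory
    }


print(describe_url("http://www.dutchwebhosting.nl/index.html"))
print(describe_url("http://www.dutchwebhosting.nl"))
```

Note that the sketch expects the protocol to be present; a bare www.dutchwebhosting.nl, which a Web browser would complete for you, is not handled here.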
When your DWH Server is first set up, a very simple, generic home page named index.html is created for you. This file is located in the /www/htdocs directory. Log in to your DWH Server as described in Chapter 3 and type the following at the prompt:
cd /www/htdocs Enter
Make sure you're in the correct place. Type:
pwd Enter
Your DWH Server should respond with /usr/local/etc/httpd/htdocs, where username is the username that was assigned to you upon ordering. It is important that you understand why this directory is named as it is. If this is not what you expected to see, or you do not understand why the name of this directory is so long, please refer to Chapter 3. Because the name of this directory is so long, we refer to it in this guide as /www/htdocs, using the symbolic link mentioned in Chapter 3.
The /www/htdocs directory is a special directory on your DWH Server. It's commonly referred to as the Document Root. Only files and directories contained within this directory are accessible by the public through a Web browser. Therefore, if you create your own Web pages or have someone else create them for you, they need to be placed in this directory before they will be available through your DWH Server's HTTP service. When your server is first set up, there should be a single file in this directory called index.html. To verify this, type:
ls Enter
You should see the file listed. What's in this file? You can take a look by using the more command. Type:
more index.html Enter
You will see the contents of this file, which is simply a text file written in HTML. Remember that the more command will wait for you to press the space bar before it scrolls past the limit of your screen, so continue pressing the space bar until your normal prompt appears. Now, depending on your experience with HTML, you may or may not understand the contents of the file that you just viewed. Don't panic! You do not necessarily ever need to learn HTML to create a great Web site if you don't want to, but it is a good idea to at least see what HTML looks like. Your Web browser is fluent in HTML and can make sense of HTML code even if you cannot. Take a look at what this page looks like through a Web browser.
Open your Web browser and enter either the IP address that your DWH Server was assigned, or its domain name (if your domain name is already properly pointing to your server). You should see a simple Web page indicating that you've found the future home of your Web site. If you have trouble connecting to your DWH Server in this way, you may need to contact your local Internet provider or the DWH Servers support department for assistance.
Use the pico editor on your DWH Server to create a simple HTML file from scratch. Return to the command prompt and be sure that you are in your DWH Server's document root. If you're not there, or you're not sure if you are there or not, type:
cd /www/htdocs Enter
This command will take you to your document root regardless of where you currently are. To keep things tidy, create a test directory in your document root which you can use to store your Web page. Type:
mkdir test Enter
Move into this newly-created directory by typing:
cd test Enter
You should now be in the /www/htdocs/test directory. Now, create a new html file. Using an online or local text editor, type in the following lines:
<html>
<head>
<title>My First Web Page</title>
</head>
<body>
This is my first Web page.
</body>
</html>
When you're finished, save the file as test1.html and place it in your /www/htdocs/test/ directory. Return to your Web browser and enter the following into the address bar, replacing 203.37.xxx.xxx with your IP address:
http://203.37.xxx.xxx/test/test1.html
You should see a very simple Web page containing the text "This is my first Web page." You've just created your first Web page on your DWH Server, and you should be starting to feel comfortable with UNIX, Telnet, the command prompt, your Web browser, and the way that your DWH Server serves HTML documents. Notice that the "/test/" in the URL that you entered into your Web browser refers to the directory that you created in your document root using the mkdir command. The "test1.html" part refers to the name of the file you created with the pico editor. Congratulations! You've come a long way.
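To get a feel for how a program such as a Web browser reads an HTML file, the following sketch uses Python's standard html.parser module to pick out the title and the body text of the page above. The class name TitleAndTextParser and the function summarize_page are made up for this illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class TitleAndTextParser(HTMLParser):
    """Collect the text inside <title> and <body> (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.in_body = False
        self.title = ""
        self.body_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Text arrives in pieces; keep only what is inside title or body.
        if self.in_title:
            self.title += data
        elif self.in_body:
            self.body_text += data


def summarize_page(html_text):
    """Return (title, body text) with surrounding whitespace removed."""
    parser = TitleAndTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.body_text.strip()


PAGE = """<html>
<head>
<title>My First Web Page</title>
</head>
<body>
This is my first Web page.
</body>
</html>
"""

print(summarize_page(PAGE))
```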
What would happen if you pointed your Web browser to this test directory, but did not specify the "test1.html" file? Try it and see. Put the following URL into your Web browser, substituting your IP address as you did before:
http://203.37.xxx.xxx/test/
You should see an index that was generated by your DWH Server displaying the contents of this directory. The only file in the directory should be your test1.html file. If you click on this file with your mouse, your Web browser will show you the contents of the HTML file that you created. What if you wanted this file to come up automatically when you referred to the /test/ directory? Remember that by default, your DWH Server will look for a file called "index.html" if no file name is specified in the URL. If a file with that exact name doesn't exist, your DWH Server builds its own index and displays that instead. You can see this in action by renaming your "test1.html" file to "index.html" using the mv command. Return to your command prompt and type:
mv test1.html index.html Enter
Now return to your Web browser and press the refresh or reload button. You should now see your homemade Web page being displayed. What happens if you try to access the file "test1.html" through your Web browser now? Try it by entering the URL http://203.37.xxx.xxx/test/test1.html and see. You should see a message that says that the file could not be found. Why? Because it no longer exists. That file was renamed to "index.html."
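The lookup behavior you just observed (serve index.html if it exists, otherwise build a listing, otherwise report that the file was not found) can be sketched in a few lines of Python. This is a simplified model for illustration, not the Web server's actual code: the function name resolve_request is hypothetical, and the sketch deliberately ignores security checks that a real server performs.

```python
import os
import tempfile


def resolve_request(document_root, url_path, index_name="index.html"):
    """Return ("file", path), ("listing", names), or ("not found", None)."""
    target = os.path.join(document_root, url_path.lstrip("/"))
    if os.path.isdir(target):
        index_file = os.path.join(target, index_name)
        if os.path.isfile(index_file):
            return ("file", index_file)          # default page exists
        return ("listing", sorted(os.listdir(target)))  # generated index
    if os.path.isfile(target):
        return ("file", target)
    return ("not found", None)                   # corresponds to a 404


# Recreate the exercise in a temporary directory standing in for /www/htdocs.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "test"))
with open(os.path.join(root, "test", "test1.html"), "w") as page:
    page.write("<html><body>This is my first Web page.</body></html>\n")

print(resolve_request(root, "/test/"))            # a generated listing
os.rename(os.path.join(root, "test", "test1.html"),
          os.path.join(root, "test", "index.html"))
print(resolve_request(root, "/test/")[0])         # now a file is served
print(resolve_request(root, "/test/test1.html"))  # the old name is gone
```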
You've now learned the basics of Web publishing on your DWH Server. In the next few sections, you'll learn how you can customize your DWH Server's HTTP service to make it do more powerful things.
In certain circumstances, you may want to restrict access to certain areas of your Web site to one person or a specific group of people. It's easy to restrict access to a certain directory based on a user name and password. You can also allow or deny access to areas of your Web site based on the IP address of the computer that is trying to access the site. Restricting access is a simple process that is done from the command prompt. If you're using Microsoft FrontPage, there's an easy way to do this, which is explained in the FrontPage documentation.
Suppose you wanted to restrict access to the /test/ directory that you just created in your document root. The key to restricting access to specific directories is a file called .htaccess. An .htaccess file contains instructions about who can and cannot access files in a certain directory, and its instructions apply to that directory and all directories below it. Depending on what you put in the .htaccess file, you can allow or deny many types of access. There are three main types of access restriction.
The simplest, and most common, form of access restriction is to define a user name and password that your DWH Server will require in order for someone to access the directory through a Web browser. For this example, you'll restrict access so that the username "myuser" and the password "mypassword" are required in order to access the /test/ directory in your DWH Server's document root.
The first step in basic password restriction is the use of the htpasswd command to create a password file containing an encrypted, or scrambled, version of the password. The most common location for such a file is in the /etc directory of your DWH Server. To get there, type:
cd /etc Enter
Now, use the htpasswd command with the -c option to create a new password file. You only need to use the -c option if you're creating a new password file, which you are in this example. Call the new password file "test.pwd," although the name can be anything that is meaningful to you. To use the htpasswd command, you need to type the command, followed by the appropriate option, then the file that will contain the encrypted password, and then the username you wish to add to the password file. To do this for "myuser," type:
htpasswd -c test.pwd myuser Enter
You'll be prompted to enter a password twice. For this example, use "mypassword" (without the quotes) as the password for this user. When you're done, a file called test.pwd will have been created in this directory. If you would like to see what it looks like, type:
more test.pwd Enter
You'll see something like:
myuser:x2i8Pk9WufYJ
The garbled text next to the username is actually the password you chose, but it's understandable only by your DWH Server. This helps protect your password from being discovered by unauthorized persons. It's important to remember that the -c option shown above should only be used when creating a new password file. If you want to add another entry to an existing password file, you need to omit the -c option, or you'll delete the existing password entries in the file.
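Each line of a password file pairs a username with its encrypted password, separated by a colon. As a rough illustration of that layout, the Python sketch below reads such text into a dictionary. The function name parse_password_file is hypothetical, and the sketch only reads the file format; it does not check passwords or reproduce what htpasswd does internally.

```python
def parse_password_file(text):
    """Map each username to its encrypted password string."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        user, encrypted = line.split(":", 1)
        entries[user] = encrypted
    return entries


sample = "myuser:x2i8Pk9WufYJ\n"
print(parse_password_file(sample))
```

Because each user occupies exactly one line, adding a user appends a line, while recreating the file with -c starts again from an empty file.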
Now that you have successfully created the password file, you're ready to create the .htaccess file. First, be sure you're in the directory that you want to protect. In this example, this is the /test/ directory you created in your document root, so to get there you should type:
cd /www/htdocs/test Enter
Create the .htaccess file with an online text editor such as "pico" or "ee," or with Windows Notepad on your local PC. To use pico, type:
pico .htaccess Enter
Using the editor, carefully type the following lines :
AuthUserFile /etc/test.pwd
AuthName This is a Protected Area
AuthType Basic
<Limit GET>
require user myuser
</Limit>
The "AuthUserFile" line tells the Web server where the password file that protects this directory is located. The "AuthName" line contains the text that will be displayed in the password box that pops up in your Web browser when you try to access this page. The "AuthType" should always be set to Basic. The "Limit" area contains the specific instructions as to how you want to limit access to this directory. First of all, you're limiting "GET" access, which is the standard way in which Web browsers "get" files from a Web server. Next, you're telling your Web server that in order to access the files in this directory, you're requiring that the username be "myuser."
After typing this all in, be sure that you press Enter a couple of times at the end of these lines. When you're done save the file and exit the editor. When you try to access this directory through your Web browser, you will now be required to enter this user name and password. Only upon doing so will you be able to view the contents of this directory.
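With "AuthType Basic," the Web browser answers the password prompt by sending an Authorization header that contains the username and password joined by a colon and encoded in base64. The Python sketch below shows that encoding and its reverse; the function names are invented for this example. Keep in mind that base64 is an encoding, not encryption, so the password can be recovered by anyone who sees the header unless the connection itself is encrypted (for example with SSL).

```python
import base64


def basic_auth_header(username, password):
    """Build the value a browser sends in the Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


header = basic_auth_header("myuser", "mypassword")
print(header)
print(decode_basic_auth(header))
```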
Suppose that instead of granting access only to a single username and password, you wanted to grant access to a group of users with their own passwords. This is done in a similar manner to the single password protection method described above. However, you will also need to create a user group file, and there are some minor differences in the contents of your .htaccess file. For this example, assume that you want to give access to three different users, named "user1," "user2," and "user3." You want "user1" to have access with the password "password1," "user2" to have access with "password2," and "user3" to have access using "password3." The first step is to use the htpasswd command to create a password file containing encrypted versions of all three of these passwords. See the previous section for more information on the htpasswd command. The most common location for a password file is in the /etc directory of your DWH Server. To get there, type:
cd /etc Enter
Now, create the password file. Name it "test2.pwd," although the name can be anything that is meaningful to you. Type:
htpasswd -c test2.pwd user1 Enter
You'll be prompted to enter a password twice. For this example, use "password1" (without the quotes) as the password for this user. When you're done, a file called "test2.pwd" will have been created in this directory. Now, you'll want to add entries in this file for "user2" and "user3." You'll need to use the htpasswd command again to do this, but this time you won't use the -c option to create a new password file, since the password file already exists. If you did use the -c option, it would create a new file over the top of the old one, erasing the entry you just made for "user1." Type:
htpasswd test2.pwd user2 Enter
This time, type "password2" when prompted for a password. When your prompt returns, type:
htpasswd test2.pwd user3 Enter
Enter "password3" as the password, and your password file is now complete. If you would like to see what it looks like, type:
more test2.pwd Enter
Your next step is to create the user group file. This file defines the group of users that will have access. This saves you from having to include the name of every member of the group in the .htaccess file in every restricted directory on your Web server. If you plan on restricting more than one area, then using a group file is a smart thing to do. Like password files, the most common place for a group file is in the /etc directory. A group file simply contains one or more groups of users that you will later authorize to have access. For this example, create a group file called "test2.grp," and define a group called "users" that contains the users "user1," "user2," and "user3." Use your editor to create the file /etc/test2.grp containing the following single line:
users: user1 user2 user3
If you later wanted to define additional groups in this same file, you could add a new line for each additional group. It is important to keep all the users that belong to a single group on the same line. They cannot be wrapped around onto more than one line.
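A group file of this kind conventionally holds one group per line: the group name, a colon, and a space-separated list of usernames (for example, users: user1 user2 user3). The Python sketch below reads text in that layout and checks membership; the function names parse_group_file and user_in_group are hypothetical and exist only for this illustration.

```python
def parse_group_file(text):
    """Map each group name to the list of usernames on its line."""
    groups = {}
    for line in text.splitlines():
        name, separator, members = line.partition(":")
        if not separator:
            continue  # no colon, so this is not a group definition
        groups[name.strip()] = members.split()
    return groups


def user_in_group(groups, group, user):
    """Return True if the user is listed in the named group."""
    return user in groups.get(group, [])


groups = parse_group_file("users: user1 user2 user3\n")
print(groups)
print(user_in_group(groups, "users", "user2"))
print(user_in_group(groups, "users", "user4"))
```

Because the parser works one line at a time, the sketch also shows why a group's members must stay on a single line.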
The last step in this example is to create or modify the .htaccess file in the directory that you want to protect. In this case, you're protecting the directory called "test" in the document root, so you'll need to edit the .htaccess file located there.
Use your editor to make this file look like the following:
AuthUserFile /etc/test2.pwd
AuthGroupFile /etc/test2.grp
AuthName Enter Your Password
AuthType Basic
<Limit GET>
require group users
</Limit>
Access to this directory is now restricted to a user that appears in the group file. That user is required to enter his matching password that is encrypted in the password file. Try accessing this directory through your Web browser now by pointing your browser to http://yourdomain.com/test/ and you should be asked for a username and password.
One of the most popular modifications users make to their DWH Server settings is customizing their Web server's error pages. By default, these error pages have an ugly gray background with plain black text. There are many reasons why a DWH Server would return an error page. The most common are summarized in the following table:
Error Type | What causes the error | Error Code
--- | --- | ---
Page Not Found | A file was requested that does not exist. | 404
Access Forbidden | A user tried to access a file or directory that they did not have permission to access. | 403
Authorization Required | A user tried to access a password protected file or directory and did not give the correct username and password. | 401
Generic server error | For any of a number of reasons, the server encountered some type of error. | 500
You can customize these pages with three easy steps. First, create an "errors" directory in your DWH Server's document root. Next, create your error pages as HTML documents and place them in the "errors" directory. Third, edit the srm.conf file in the /www/conf directory of your DWH Server, and define your custom error pages. Following is an example of how this is done.
First, log in to your DWH Server. Move to the document root by typing:
cd /www/htdocs Enter
Create a directory to contain your error pages by typing:
mkdir errors Enter
Next, you should create any custom error pages that you want to use and place them in this directory. They don't have to be placed in the "errors" directory; they can be placed in any directory you wish, as long as it's in or below the document root. This example, however, will assume they're in the /www/htdocs/errors directory.
Finally, you'll need to edit the srm.conf file which resides in the /www/conf directory of your server. This file contains several important settings for your DWH Server, so you should make a backup copy before editing it. From the command prompt, type:
cd /www/conf Enter
cp srm.conf srm.bak Enter
Open up your srm.conf file and scroll down near the end of this file until you see a section that looks something like the following:
# If you want to have files/scripts sent instead of the built-in version
# in case of errors, uncomment the following lines and set them as you
# will. Note: scripts must be able to be run as if the were called
# directly (in ScriptAlias directory, for instance)
# 302 - REDIRECT
# 400 - BAD_REQUEST
# 401 - AUTH_REQUIRED
# 403 - FORBIDDEN
# 404 - NOT_FOUND
# 500 - SERVER_ERROR
# 501 - NOT_IMPLEMENTED
#ErrorDocument 404 /errors/NotFound.html
#ErrorDocument 302 /cgi-bin/redirect.cgi
#ErrorDocument 500 /errors/server.html
#ErrorDocument 403 /errors/Forbidden.html
Remember that any time you see a # placed at the front of a line in one of your DWH Server's configuration files, it means that what follows on that line is simply a comment and has no effect. We've placed #'s on these lines so that you can easily enable the custom error messages that you want by simply removing the # and making any necessary changes without having to write the whole line yourself.
The important lines in this section that you may want to modify are the lines that start with "ErrorDocument." Each of these lines contains an error code, which are listed in the table above, and a path from your document root (/www/htdocs directory) to the HTML file that contains your error page. Assume that you created a custom error page called "NotFound.html" for your new "File not Found" error message, and another file called "Forbidden.html" for your new "Access Forbidden" message. This example will assume also that you placed these two files in a directory called "errors" in your document root. To enable these two custom error pages, you would need to change the lines listed above so that they looked like this:
ErrorDocument 404 /errors/NotFound.html
#ErrorDocument 302 /cgi-bin/redirect.cgi
#ErrorDocument 500 /errors/server.html
ErrorDocument 403 /errors/Forbidden.html
To see your custom error pages in action, try pointing your Web browser to an address like http://www.yourdomain.com/blah.html to test your custom "File Not Found" error message, or go to http://www.yourdomain.com/cgi-bin/ to see your new "Access Forbidden" page.
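To double-check which custom error pages are actually enabled, it can help to list only the ErrorDocument lines that are not commented out. The Python sketch below does this for a piece of configuration text; the function name active_error_documents is hypothetical, and the sketch only handles the simple "ErrorDocument code path" form used in this example.

```python
def active_error_documents(config_text):
    """Return {error code: path} for uncommented ErrorDocument lines."""
    documents = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments have no effect
        fields = line.split()
        if len(fields) == 3 and fields[0] == "ErrorDocument":
            documents[int(fields[1])] = fields[2]
    return documents


SAMPLE_CONFIG = """\
ErrorDocument 404 /errors/NotFound.html
#ErrorDocument 302 /cgi-bin/redirect.cgi
#ErrorDocument 500 /errors/server.html
ErrorDocument 403 /errors/Forbidden.html
"""

print(active_error_documents(SAMPLE_CONFIG))
```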
Your DWH Server tirelessly keeps track of all the HTTP activity that occurs, logging detailed information about each and every request that comes in through its HTTP service. The log files your DWH Server generates are stored in the /www/logs directory. By default, there are four distinct records kept of your HTTP activity. You should get to know them.
At the command prompt, type:
cd /www/logs Enter
ls Enter
Unless absolutely no activity has occurred on your DWH Server yet, you should see the following four files:
access_log: This file keeps track of what information is being requested, who made the request, and how your DWH Server handled the request.
referer_log: This file keeps track of where your Web visitors are coming from. More precisely, it tells what your visitors were viewing just before they came to your site. This is helpful for tracking any external Web sites that have links pointing to your site. This can help you decide if any of your advertising efforts are paying off.
agent_log: This file keeps track of what Web browser software your visitors are using to view your site. It lists the browser name as well as the version. This information can be helpful in designing your Web site so that it looks its best to the majority of your visitors.
error_log: This file records any error messages that were returned by your Web server, and information about what caused the error. This information is very valuable in helping you discover incorrect links to your Web site, and in debugging any CGI programs that you are trying to use.
Information is added to these four log files constantly in real time as activity occurs. Each individual request for a file is listed on a separate line. People often talk about hits when discussing Web activity. A hit is simply a request for a file, so one line in a log file is equal to one hit. You can understand a hit in more detail by seeing what it looks like in each of these log files.
Here is an example of what's recorded in the access_log file for each request, or hit:
ip218.m4.nwlink.com - - [28/May/1997:02:24:19 -0700] "GET /index.html HTTP/1.0" 200 420
So what does all this mean? The first part identifies who made the request, or more specifically, the address of the computer of the person who requested the information. In this example, this address is "ip218.m4.nwlink.com." Depending on the way you have your DWH Server configured, this information could also be stored as a standard IP address consisting only of numbers.
Next, you'll see information about when the request was made. It's listed in [day/month/year:hour:minute:second GMT variance] format. In this example, this information was requested on May 28, 1997 at 2:24 am and 19 seconds, in a time zone that is -0700 variance from GMT, such as Mountain Standard Time.
Following the time information, you'll see the actual request that was issued by a Web browser to your Web server. The request is the information inside the quotes. The request is made up of the request method, the path to the file requested, and the version of the request. The two methods you will see most often are GET and POST, although others, such as HEAD, also exist. In this case, the Web browser tried to get a file. In this example, the path to the file requested was /index.html. All paths in your access_log start at the document root, so the file being requested here is actually the index.html file in the /www/htdocs/ directory. The final part of the request is the HTTP version that was used to request the file. Most of the time, the version used is HTTP/1.0.
The last piece of information that is recorded in the access_log file is the Web server's response to the request. The Web server's response code will vary according to the request. If the request was successful, and the requested file was delivered to the Web browser, then the response code is "200." If the request was successful, but the requested file did not have to be delivered to the Web browser because the visitor's Web browser already had a current version of the requested file in its cache, then the response code is "304." If the request was unsuccessful because the Web server couldn't find the requested file, the response code is "404." This "404" corresponds to the "404 File Not Found" error returned by your Web server. In this example, the response code was "200," which means the Web server found the requested file and delivered it to the Web browser.
Finally, following the Web server's response code, comes the size of the response, given in bytes. In this example, 420 bytes of information were sent, so you can figure out that the index.html file must be 420 bytes in size. If no information is actually sent in response to the request, as is the case with a cached "304" response, a dash (-) appears in place of a number. Even if the server's response code is "404," which means the file was not found, there can be a size to the response, which is the size of the error document that was returned to the Web browser.
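The fields of an access_log record described above can be separated with a regular expression. The following Python sketch parses a line shaped like the example; the pattern and the function name parse_access_line are written for this illustration and assume the layout discussed in this section, so records in other formats would need a different pattern.

```python
import re
from datetime import datetime

ACCESS_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_access_line(line):
    """Return a dictionary of fields, or None if the line does not match."""
    match = ACCESS_PATTERN.match(line.strip())
    if match is None:
        return None
    request_parts = match.group("request").split()
    method, path, version = (request_parts + [None, None, None])[:3]
    size = match.group("size")
    return {
        "host": match.group("host"),
        # [day/month/year:hour:minute:second GMT variance]
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": method,
        "path": path,
        "version": version,
        "status": int(match.group("status")),
        "size": None if size == "-" else int(size),  # "-" means nothing sent
    }


LINE = ('ip218.m4.nwlink.com - - [28/May/1997:02:24:19 -0700] '
        '"GET /index.html HTTP/1.0" 200 420')
print(parse_access_line(LINE))
```

Since one line equals one hit, counting the records that parse successfully gives a simple hit count.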
Now that you understand what's recorded in an access_log, take a look at your access_log on your DWH Server. Type the following at the command prompt:
cd /www/logs Enter
To take a look at the last few lines of any file, you can use the tail command, which displays the last ten lines by default. Type the following:
tail access_log Enter
You should see the last ten lines of this log file, which represent the last ten requests made to your Web server. Can you understand what they mean?
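The idea behind the tail command, keeping only the most recent lines while reading through a file, can be sketched in Python with a fixed-length queue. The standard tail command prints ten lines unless told otherwise, which is why the default below is 10. The function name last_lines is an illustrative choice.

```python
from collections import deque


def last_lines(lines, count=10):
    """Return the final `count` items, like tail's default of ten lines."""
    # A deque with maxlen silently drops the oldest item once it is full.
    return list(deque(lines, maxlen=count))


sample_log = [f"request {number}" for number in range(1, 26)]
for line in last_lines(sample_log):
    print(line)
```

To use this on a real log, you could pass an open file object instead of the list, for example last_lines(open("access_log")).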
Web browsers deliver what is called a User Agent String along with their requests. At the least, this string usually contains information about whether or not the browser is Netscape compatible, the name and version of the browser, and the platform which the browser is running on. The information listed in this log is determined by the browser manufacturer, and there are no set standards. Still, this information can be useful. Here's an example of a record in the agent_log:
Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)
Again, you should examine this record in pieces. The first part, "Mozilla/2.0," indicates that the browser is Netscape compatible. Mozilla was an early name used for the Netscape browser. Most of the widely used Web browsers, including Microsoft Internet Explorer and Netscape Navigator, will have this designation.
Next comes the name and version of Web browser being used, which in this case is MSIE 3.01 - meaning Microsoft Internet Explorer version 3.01.
Finally, the operating system of the visitor's computer is given. This example shows that the computer was running Windows 95.
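The pieces of a User Agent String like the one above can be separated with a short Python sketch. Because browser manufacturers decide what goes into this string, the pattern below is only an assumption that fits the example shown here; the names AGENT_PATTERN and parse_agent are hypothetical.

```python
import re

# A product token, optionally followed by details in parentheses.
AGENT_PATTERN = re.compile(r"^(?P<product>\S+)\s*(?:\((?P<details>[^)]*)\))?")


def parse_agent(record):
    """Split an agent_log record into its product token and detail list."""
    match = AGENT_PATTERN.match(record.strip())
    if match is None:
        return None
    raw_details = match.group("details") or ""
    details = [part.strip() for part in raw_details.split(";") if part.strip()]
    return {"product": match.group("product"), "details": details}


print(parse_agent("Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)"))
```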
Now that you understand what a record in the agent_log means, take a look at the actual agent_log on your DWH Server. At the command prompt, type:
cd /www/logs Enter
This will make sure that you're in the correct directory. To see the first few lines of any file, you can use the head command, which displays the first ten lines by default. Type:
head agent_log Enter
You should see information about the Web browsers that your first ten visitors were using. Does this log make sense to you now?
The referer_log file is particularly useful for determining how people are finding your Web site, and which Web sites are referring Web traffic to you. Here's an example of a record in a referer_log:
http://www.microsoft.com/products/download.html -> /promo/switch.html
This record is divided into two parts. The first half of this line tells the complete URL of the page that contained a link to your Web site, or in other words, the page that referred the Web browser to you. In this example, it's http://www.microsoft.com/products/download.html. The second half displays the file on your DWH Server that was requested, which in this case was /promo/switch.html.
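To find out which Web sites send you the most visitors, you can split each referer_log record at the arrow and count the referring hosts. The Python sketch below assumes records shaped like the example above; the function names parse_referer and count_referring_hosts are made up for this illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def parse_referer(record):
    """Return (referring URL, requested file), or None if no arrow found."""
    source, separator, target = record.partition(" -> ")
    if not separator:
        return None
    return source.strip(), target.strip()


def count_referring_hosts(records):
    """Count how many records came from each referring host name."""
    counts = Counter()
    for record in records:
        parsed = parse_referer(record)
        if parsed is not None:
            counts[urlsplit(parsed[0]).netloc] += 1
    return counts


records = [
    "http://www.microsoft.com/products/download.html -> /promo/switch.html",
    "http://www.microsoft.com/index.html -> /index.html",
]
print(parse_referer(records[0]))
print(count_referring_hosts(records))
```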
Now that you understand what is contained in the referer_log, take a look at the referer_log on your DWH Server. At the command prompt, type:
cd ~/www/logs Enter
This makes sure that you're in the correct place. Again, you can use the tail command to view the last records in the file (10 lines by default). This time, however, instead of just displaying the end of the log, you can use the -f option to tell the tail command to keep the file open on the screen and update it if any new information is added to it. This can be a lot of fun once your Web site starts getting some heavy traffic. There's not much that can compare to the feeling of watching hundreds of people coming to your well-designed Web site! Type:
tail -f referer_log Enter
Like the last time you used the tail command, you'll see the last lines of your referer_log appear on your screen. However, this time you'll notice that your command prompt does not come back. Your Telnet session is in a stand-by mode and will display any new information that is written to this file. Try pointing your Web browser to your own Web site, and navigate through it while watching the entries being added to this file. When you're done, you'll need to tell your DWH Server that you want to stop the tail command by pressing Ctrl+C.
Each time that your DWH Server returns an error message to a Web visitor, the message is also recorded in the error_log. This information is particularly useful for determining troublesome areas of your Web site, including broken links and malfunctioning CGI programs. It helps you to maintain your Web site and keep it error-free and professional. Here's an example of a record in an error_log:
[Thu Jun 26 22:25:27 1997] HTTPd: send aborted for 153.34.112.95, URL: /gifs/ie_anim.gif
In this example, an error_log entry contains the date and time that the error occurred, followed by a description of the error that occurred. In most cases, the IP address of the visitor who received the error is given, and the file that was requested is usually listed as well.
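As a rough illustration, an error_log entry in this shape can be split into its timestamp and its description with Python. The sketch assumes that the timestamp is always wrapped in square brackets and written like the example above, and that the script runs with English day and month names (the default locale). The names ERROR_PATTERN and parse_error are invented for this example.

```python
import re
from datetime import datetime

# "[timestamp] description", as in the example record above.
ERROR_PATTERN = re.compile(r"^\[(?P<when>[^\]]+)\] (?P<message>.*)$")


def parse_error(record):
    """Return (datetime, description) for an error_log entry, or None if it does not match."""
    match = ERROR_PATTERN.match(record.strip())
    if match is None:
        return None
    # Format of the bracketed part: weekday, month, day, time, year.
    when = datetime.strptime(match.group("when"), "%a %b %d %H:%M:%S %Y")
    return when, match.group("message")


if __name__ == "__main__":
    line = "[Thu Jun 26 22:25:27 1997] HTTPd: send aborted for 153.34.112.95, URL: /gifs/ie_anim.gif"
    print(parse_error(line))
```

The description is left as plain text on purpose, since its wording differs from one kind of error to another.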
Now that you understand what's recorded in the error_log file, take a look at the error_log on your DWH Server. At the command prompt, type:
cd ~/www/logs Enter
This will place you in the correct directory. Use the tail command again to view the end of the file, but this time, tell the tail command to show you the last 20 lines of the log file, rather than just the last few lines. Type:
tail -20 error_log Enter
You should see up to 20 error messages that were generated by your DWH Server. If no errors have occurred, then you won't see anything.
Because so much information is stored in your log files, it's recommended that you reset them regularly. This prevents them from taking up a large amount of your DWH Server's disk space. There is a special command, called vnukelog, that will reset all of your DWH Server's log files for you automatically. To reset your log files, type the following from the command prompt:
vnukelog Enter
Some people prefer to have their DWH Server record all their HTTP activity in a single log file that contains access, referer, and agent information in a combined log format. This format is useful if you want to be able to have all this information in a central location, rather than split into three different files. The combined log format may also be necessary when using certain log file analysis software packages.
If you would rather have your logs stored in this format, you need to make several changes to your DWH Server's main configuration file. Since corrupting this file can cause your DWH Server to stop responding entirely, you should always make a backup copy of this file before editing it. At the command prompt, type:
cd ~/www/conf Enter
cp httpd.conf httpd.bak Enter
Now, use an editor such as pico to edit the httpd.conf file and make the necessary changes.
This file is rather large, and contains many important configuration settings. For this exercise, you should only be interested in a particular section that deals with log files. There are four lines in this file that define your HTTP logging. They look like the following, although they are separated by several lines of comments:
TransferLog logs/access_log
ErrorLog logs/error_log
AgentLog logs/agent_log
RefererLog logs/referer_log
Suppose that you want referer, agent, and access information to be recorded in a combined log format in a single file called access_log located in the /www/logs directory. You would need to change the lines above so that they looked like:
TransferLog logs/access_log
ErrorLog logs/error_log
#AgentLog logs/agent_log
#RefererLog logs/referer_log
You would then need to add the following lines to this section of the httpd.conf file:
# Combine Log Files
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\""
That's certainly a confusing line to have to put in, but it is necessary. Enter it exactly as it appears, including all the quotes. When you're finished, close the pico editor by pressing Ctrl+X, Y, Enter.
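To see how the directives in the LogFormat line map onto a finished log entry, here is a minimal Python sketch that reads one combined-format record. The sample record is hypothetical: it was assembled from the values used in the earlier examples, and the time zone, status code, and byte count in it are made up. The names COMBINED_PATTERN and parse_combined are also invented for this illustration.

```python
import re

# One named group per LogFormat directive, in the same order:
# %h  %l  %u  %t  "%r"  %s  %b  "%{Referer}i"  "%{User-agent}i"
COMBINED_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_combined(record):
    """Return the fields of a combined-format record as a dict, or None if it does not match."""
    match = COMBINED_PATTERN.match(record.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical record; not taken from a real log.
    sample = (
        '153.34.112.95 - - [26/Jun/1997:22:25:27 -0700] '
        '"GET /promo/switch.html HTTP/1.0" 200 1504 '
        '"http://www.microsoft.com/products/download.html" '
        '"Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)"'
    )
    for name, value in parse_combined(sample).items():
        print(name, "=", value)
```

The quotes around the request, referer, and agent fields are what allow those fields to contain spaces, which is why the quotes in the LogFormat line must be entered exactly as shown.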
Before your log files will actually be useable in this combined format, you need to nuke your existing log files by typing:
vnukelog Enter
This will reset them so that no old entries remain.
As useful as the information in your log files is, you can't really make sense of what's going on with your Web server without using some type of statistical analysis software. There are a number of commercial software packages out on the market that you can buy. However, DWH Servers provides two great statistics tools as part of our service. One tool is the getstats command, and it's ready to use immediately. The other tool is a popular program called analog, which can easily be installed on your DWH Server by following the instructions in this section.
Analog is a great statistical analysis program, and is available free of charge from DWH Servers. A recent Internet survey showed that Analog is the most popular log file analysis program in the world, being used by over 20% of all Webmasters on the Internet. Analog runs on your DWH Server, quickly analyzes your server logs, and creates easy-to-read HTML reports that you can view from your Web browser. To use analog on your DWH Server, you first need to install it. After installing analog, you will need to run the program every time you want to create an updated report, or you can use your DWH Server's cron facility, explained later in this guide, to automatically run the program at a specified interval.
To install analog for the first time on your DWH Server, type the following from the command prompt:
vinstall analog Enter
This will begin the analog installation process. You may be asked for information as to how you would like analog to be set up on your DWH Server. When the installation process is completed, you will notice that your /www/ directory now contains a subdirectory called "analog." This directory contains the analog program files, as well as the source code files that can be modified if you desire to customize analog. Keep in mind that customizing analog requires some programming knowledge, and that it will be necessary to recompile the program if you choose to do so.
To run analog, simply type:
analog Enter
You don't have to be in any specific directory to run analog. By default, the report that the analog program creates is saved as /www/htdocs/analog/index.html. This means that once analog completes its work, you can open your Web browser to http://www.yourdomain.com/analog/ and view your statistics report. You may decide that you want to password protect this directory so that this report is not seen by anyone except those people that you authorize.
For more information on analog, visit http://statslab.cam.ac.uk/~sret1/analog/Readme.html.
copyright Internet Service Europe Ltd