Lab 5 - Apache Web Services¶
Welcome to the Web Services lab. Here's a quick overview of this week's tasks:
- Validation of previous week's tasks
- Introduction to web services
- HTTP protocol
- HTTP Flow
- HTTP Messages
- HTTP Headers
- Setting up a personal web server
- Understanding DNS CNAME records
- Virtual Web Hosts Using Apache Web Server
- Utilizing a web server as a proxy
- WordPress setup
- Configuring Apache modules
- Forensic logging
- Security-hardening our webserver using mod_security
- Ansible tips
- Tags
- Recommended modules
- Handlers
1. Validation of previous week's tasks¶
Please make sure the following tasks have been completed before starting this lab, as we are using services from the previous labs:
- Personal domains are configured. Machine is accessible over
<machine_name>.sa.cs.ut.ee
inside the University network. - All the tests on
scoring.sa.cs.ut.ee
are green.
For helping you to better debug and view the web services you'll be setting up this week, it's also ideal to validate your personal machine's DNS servers.
Danger
Your personal machine's (the computer you are using - NOT THE VM) DNS servers should be set to:
-
ns.ut.ee (IP: 193.40.5.39)
-
ns2.ut.ee (IP: 193.40.5.76)
The University VPN service should set these automatically, but please validate this. Without having the DNS servers set, you will have problems connecting to the websites you set up, and this will cause a lot of confusion.
You can validate by trying to resolve <machine_name>.sa.cs.ut.ee
from your personal machine.
2. Introduction to web services¶
Before we can host our own web services, we need to understand how the browsers work. As most of the web works by utilizing the HTTP protocol, we'll first have a look into that.
2.1 HTTP System Architecture¶
HTTP functions as a client-server protocol where requests originate from a user-agent (typically a web browser) and are sent to servers that process these requests and return responses. Between these endpoints exist various intermediaries called proxies.
The Client Side
The user-agent acts on the user's behalf, with web browsers being the most common example. Browsers initiate all requests in the communication flow. When loading a webpage, a browser first requests the HTML document, then makes subsequent requests for additional resources like scripts, CSS files, images, and videos. These components are assembled to render the complete webpage.
Websites function as hypertext documents, containing clickable links that, when activated, trigger new HTTP requests to fetch different pages. The browser handles the translation of user interactions into HTTP requests and interprets the resulting responses.
The Server Side
The server responds to client requests by providing the requested documents. While appearing as a single entity, a server may actually be:
- A cluster of machines sharing workload
- A collection of software components (caches, databases, e-commerce systems)
- Multiple server instances on a single physical machine (possible through HTTP/1.1's Host header)
Intermediary Proxies
Between browsers and servers, numerous computers relay HTTP messages. While many operate at lower network layers (transport, network, physical), those functioning at the application layer are called proxies. These can be:
- Transparent: forwarding requests without modifications
- Non-transparent: altering requests before forwarding
Proxies serve various purposes including:
- Caching (public or private)
- Content filtering
- Load balancing
- Authentication
- Request logging
The layered design of the web means that lower-level network components remain hidden within the network and transport layers, while HTTP operates at the application layer.
2.2 HTTP protocol¶
HTTP is a protocol for fetching resources such as HTML documents. It is the foundation of any data exchange on the Web and it is a client-server protocol, which means requests are initiated by the recipient, usually the Web browser.
HTTP, developed in the early 1990s, is a flexible protocol that has continuously evolved. Operating at the application layer, it typically runs over TCP connections, which may be secured with TLS encryption, although it could theoretically work with any reliable transport protocol. Its versatility extends beyond retrieving hypertext documents - HTTP handles images and videos, processes form submissions to servers, and can even fetch specific document segments for dynamic webpage updates.
Each individual request is sent to a server, which handles it and provides an answer called the response. Between the client and the server there are numerous entities, collectively called proxies, which perform different operations and act as gateways or caches.
In reality, there are more computers between a browser and the server handling the request: there are routers, modems, and more. Thanks to the layered design of the Web (OSI layers), these are hidden in the network and transport layers. HTTP is on top, as the application layer (7th layer). Although important for diagnosing network problems, the underlying layers are mostly irrelevant to the description of HTTP.
To display a Web page, the browser sends an original request to fetch the HTML document that represents the page. It then parses this file, making additional requests corresponding to execution scripts, layout information (CSS) to display, and sub-resources contained within the page (usually images and videos). The Web browser then combines these resources to present the complete document, the Web page. Scripts executed by the browser can fetch more resources in later phases and the browser updates the Web page accordingly.
A Web page is a hypertext document. This means some parts of the displayed content are links, which can be activated (usually by a click of the mouse) to fetch a new Web page, allowing the user to direct their user-agent and navigate through the Web. The browser translates these directions into HTTP requests and further interprets the HTTP responses to present the user with a clear response.
Also keep in mind that HTTP is stateless (but not sessionless): the server keeps no state between two requests, so it is up to the client to track whether its requests succeeded. To make this possible, every response sent by a server contains a status code, which indicates whether something went wrong and what.
Examples of status codes:
-
200 - OK
-
401 - Unauthorized (the client must authenticate itself to get the requested response)
-
403 - Forbidden (The client does not have access rights to the content; that is, it is unauthorized, so the server is refusing to give the requested resource.)
-
404 - Not found (Requested resource is not found on the server)
-
405 - Method Not Allowed (HTTP Method used is not allowed by the server)
-
418 - I'm a teapot (The server refuses the attempt to brew coffee with a teapot.)
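To get a feel for how status codes are grouped, here is a minimal Python sketch built on the standard library's `http.HTTPStatus` enum. The helper name `describe_status` and the `CLASSES` table are our own, chosen only for illustration.

```python
from http import HTTPStatus

# The class of a status code is determined by its first digit.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe_status(code: int) -> str:
    """Return a short description such as '404 Not Found (client error)'."""
    status = HTTPStatus(code)  # raises ValueError for codes Python does not know
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"

for code in (200, 401, 403, 404, 405):
    print(describe_status(code))
```

When debugging, the first digit alone already narrows things down: a 4xx code points at the request, a 5xx code at the server.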
2.3 HTTP Flow¶
When a client wants to communicate with a server, either the final server or an intermediate proxy, it performs the following steps:
-
Open a TCP connection: The TCP connection is used to send a request or several, and receive an answer. The client may open a new connection, reuse an existing connection, or open several TCP connections to the servers.
-
Send an HTTP message: HTTP messages (before HTTP/2) are human-readable. With HTTP/2, these simple messages are encapsulated in frames, making them impossible to read directly, but the principle remains the same. For example:
GET / HTTP/1.1
Host: developer.mozilla.org
Accept-Language: fr
-
Read the response sent by the server, such as:
HTTP/1.1 200 OK
Date: Sat, 09 Oct 2010 14:28:02 GMT
Server: Apache
Last-Modified: Tue, 01 Dec 2009 20:18:22 GMT
ETag: "51142bc1-7449-479b075b2891b"
Accept-Ranges: bytes
Content-Length: 29769
Content-Type: text/html

<!DOCTYPE html... (here come the 29769 bytes of the requested web page)
-
Close or reuse the connection for further requests.
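The four steps above can be sketched with nothing but Python's `socket` module. This is a simplified illustration, not a real HTTP client: the function names `build_request` and `fetch` are our own, and the sketch assumes plain HTTP/1.1 without TLS and asks the server to close the connection after one response.

```python
import socket

def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the connection after replying
        "",                   # the empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Open a TCP connection, send one request and read the whole response."""
    with socket.create_connection((host, port), timeout=5) as conn:  # step 1: open
        conn.sendall(build_request(host, path))                      # step 2: send
        chunks = []
        while True:                                                  # step 3: read
            data = conn.recv(4096)
            if not data:      # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)   # step 4: the 'with' block closed our side

print(build_request("developer.mozilla.org").decode("ascii"))
```

Once your own web server is running, calling `fetch("<vm_name>.sa.cs.ut.ee")` with your machine name filled in should return the raw response bytes, status line and headers included.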
2.4 HTTP Messages¶
There are two types of HTTP messages: requests and responses, each with its own format.
Requests¶
An example HTTP request
Requests consist of the following elements:
- An HTTP method, usually a verb like GET, POST, or a noun like OPTIONS or HEAD that defines the operation the client wants to perform. Typically, a client wants to fetch a resource (using GET) or post the value of an HTML form (using POST), though more operations may be needed in other cases.
- The path of the resource to fetch; the URL of the resource stripped from elements that are obvious from the context, for example without the protocol (http://), the domain (here, developer.mozilla.org), or the TCP port (here, 80).
- The version of the HTTP protocol.
- Optional headers that convey additional information for the servers.
- A body, for some methods like POST, similar to those in responses, which contain the resource sent.
Responses¶
An example HTTP response
Responses consist of the following elements:
-
The version of the HTTP protocol they follow.
-
A status code, indicating if the request was successful or not, and why.
-
A status message, a non-authoritative short description of the status code.
-
HTTP headers, like those for requests.
-
Optionally, a body containing the fetched resource.
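To make the structure of a response concrete, the following Python sketch splits a raw HTTP/1.x response into the elements listed above. The function name `parse_response` is hypothetical, and the sketch makes simplifying assumptions: the status line always contains a status message, and a repeated header simply overwrites the earlier value.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into version, code, message, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # headers end at the first empty line
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, message = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), message, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Server: Apache\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<!DOCTYPE html>")
version, code, message, headers, body = parse_response(raw)
print(version, code, message)
print(headers["content-type"])
```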
2.5 HTTP Headers¶
HTTP headers let the client and the server pass additional information with an HTTP request or response, on top of the actual body. An HTTP header consists of its case-insensitive name followed by a colon (:), then by its value. Whitespace before the value is ignored.
Headers can be grouped according to their contexts:
-
Request headers contain more information about the resource to be fetched, or about the client requesting the resource.
-
Response headers hold additional information about the response, like its location or about the server providing it.
-
Representation headers contain information about the body of the resource, like its MIME type, or encoding/compression applied.
-
Payload headers contain representation-independent information about payload data, including content length and the encoding used for transport.
Headers can also be categorized based on proxy handling:
- End-to-end headers must reach the final recipient (the server for requests or the client for responses). Intermediate proxies must forward these headers without modification, and caching systems must preserve them.
- Hop-by-hop headers apply only to a single transport-level connection and must not be forwarded by proxies or stored in caches. Only hop-by-hop headers can be specified using the Connection header.
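As an illustration of the hop-by-hop rule, here is a small Python sketch of the filtering a proxy has to perform before forwarding a message. It uses `wsgiref.util.is_hop_by_hop` from the standard library, which knows the classic HTTP/1.1 hop-by-hop header names. The function `forwardable_headers` is our own, and it assumes headers arrive as a plain dictionary with the `Connection` key spelled exactly like that.

```python
from wsgiref.util import is_hop_by_hop

def forwardable_headers(headers: dict) -> dict:
    """Return only the headers a proxy may pass on to the next hop."""
    # Any header named inside Connection is hop-by-hop for this connection too.
    listed = {
        token.strip().lower()
        for token in headers.get("Connection", "").split(",")
        if token.strip()
    }
    return {
        name: value
        for name, value in headers.items()
        if not is_hop_by_hop(name) and name.lower() not in listed
    }

incoming = {
    "Host": "www.example.org",
    "Connection": "keep-alive, X-Debug",
    "Keep-Alive": "timeout=5",
    "X-Debug": "1",
    "Accept": "text/html",
}
print(forwardable_headers(incoming))
```

Only `Host` and `Accept` survive here: `Connection` and `Keep-Alive` are hop-by-hop by definition, and `X-Debug` becomes hop-by-hop because the `Connection` header names it.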
Examples of more important headers:
- Authorization: Basic <credentials> - sends authorization information (credentials) with the request.
- Viewport-Width: <number> - specifies the width (in pixels) at which the browser will render the page when requested.
- Accept: <MIME_type>/<MIME_subtype> - tells the server which types of data the client can accept in the response.
- Set-Cookie: name=value - a response header that allows servers to send cookies to the user's browser. These cookies are stored and automatically sent back to the server in subsequent requests, enabling persistent sessions.
- Content-Type: text/html; charset=UTF-8 - indicates the media type and character encoding of the content in the message body. In this example, it specifies HTML content with UTF-8 character encoding.
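As a small example of how such a header value is produced, the sketch below builds the `<credentials>` part of a Basic `Authorization` header, which is the Base64 encoding of `username:password`. The helper name `basic_auth_header` and the credentials `student` / `secret` are made up for illustration.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an 'Authorization: Basic <credentials>' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

value = basic_auth_header("student", "secret")
print("Authorization:", value)

# Decoding shows that the credentials are only encoded, not encrypted.
print(base64.b64decode(value.split(" ", 1)[1]).decode("utf-8"))
```

Because anyone who sees the header can decode it, Basic authentication should only be used over an encrypted connection.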
There are many other widely used headers. Which headers browsers or web servers accept depends on the software used, and even though most headers are agreed upon nowadays, there are still differences between browsers. This makes a web developer's job a nightmare at times. If you want to read more, look up a reference on HTTP header types.
3. Setting up a personal web server¶
3.1 Understanding DNS CNAME records¶
Sometimes you need one system to respond to multiple domain names, all pointing to the same IP address. This is common when:
- Hosting multiple websites on a single server
- Running different services (www, ftp, mail) on the same machine
- Creating specialized subdomains for different applications
CNAME (Canonical Name) records solve this problem by creating "aliases" that point to your main domain name.
For example, let's consider a situation, where we have a single machine running a webpage, FTP (File Transfer Protocol) and an application, each has its own subdomain:
-
www.<vm_name>.sa.cs.ut.ee
-
ftp.<vm_name>.sa.cs.ut.ee
-
myservice.<vm_name>.sa.cs.ut.ee
For our main domain, we already have something like this:
-
student-test.sa.cs.ut.ee IN A 193.40.154.247
-
247.154.40.193 IN PTR student-test.sa.cs.ut.ee
... and for the rest of the hostnames which we are going to use, we will add the following:
-
www.student-test.sa.cs.ut.ee IN CNAME student-test.sa.cs.ut.ee
-
ftp.student-test.sa.cs.ut.ee IN CNAME student-test.sa.cs.ut.ee
-
myservice.student-test.sa.cs.ut.ee IN CNAME student-test.sa.cs.ut.ee
Important
Make sure you think through where you put a . at the end!
How CNAME Records Work¶
A CNAME record maps an alias (like www.student-test.sa.cs.ut.ee
) to a "real" domain name (like student-test.sa.cs.ut.ee
) that has an A
record. DNS resolution happens in two steps:
- The alias is resolved to the real domain name
- The real domain name is resolved to its IP address (via its
A
record)
Think of it like this:
Alias (CNAME) → Real Domain Name (A record) → IP Address
For example, if we have:
- student-test.sa.cs.ut.ee (A record) → 193.40.154.247
- www.student-test.sa.cs.ut.ee (CNAME) → student-test.sa.cs.ut.ee
When someone visits www.student-test.sa.cs.ut.ee
, DNS first resolves it to student-test.sa.cs.ut.ee
, then resolves that to 193.40.154.247
.
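The two-step resolution can be modelled in a few lines of Python. This is a toy sketch, not a DNS client: `RECORDS` is an in-memory table mirroring the example above, and the function name `resolve` and the `max_hops` limit are our own assumptions.

```python
# Toy record store mirroring the example above: name -> (record type, value).
RECORDS = {
    "student-test.sa.cs.ut.ee.": ("A", "193.40.154.247"),
    "www.student-test.sa.cs.ut.ee.": ("CNAME", "student-test.sa.cs.ut.ee."),
}

def resolve(name: str, records: dict = RECORDS, max_hops: int = 8) -> str:
    """Follow CNAME aliases until an A record is found and return its IP."""
    for _ in range(max_hops):          # guard against CNAME loops
        rtype, value = records[name]   # KeyError means the name does not exist
        if rtype == "A":
            return value
        name = value                   # CNAME: continue with the canonical name
    raise RuntimeError("too many CNAME hops")

print(resolve("www.student-test.sa.cs.ut.ee."))  # prints 193.40.154.247
```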
Benefits of CNAME Records¶
The main advantage is simplified management. When multiple CNAMEs point to the same domain, you only need to update that single A record to change where all the aliases resolve. For example, if www.student-test.sa.cs.ut.ee
, ftp.student-test.sa.cs.ut.ee
, and mail.student-test.sa.cs.ut.ee
all point to student-test.sa.cs.ut.ee
, changing student-test.sa.cs.ut.ee
's A
record will automatically update where all three subdomains resolve.
Limitations of CNAME Records¶
- No reverse records: CNAMEs cannot have PTR records - an IP address will always resolve to the hostname with the A record.
- Service records: MX and NS records cannot point to CNAMEs - they must point to names that have A or AAAA records.
- Root domains: A domain's root record typically cannot be a CNAME (though some DNS providers allow this through workarounds).
More details about Canonical Name Record (CNAME) with examples you can read here.
Complete
- Create a CNAME record for www inside the /etc/named/<vm_name>.sa.cs.ut.ee zone file.
www IN CNAME <vm_name>.sa.cs.ut.ee.
Note
Mind the dot at the end. It is important, because without it the DNS server treats the name as relative and appends the zone name "<vm_name>.sa.cs.ut.ee" to it, so your CNAME becomes:
<vm_name>.sa.cs.ut.ee.<vm_name>.sa.cs.ut.ee
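The effect of the trailing dot can be reproduced with a tiny Python sketch. The function `qualify` is a hypothetical helper that mimics how a name in a zone file is expanded; it ignores special cases such as `@`, and `student-test` stands in for your own machine name.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name the way a DNS server does."""
    if name.endswith("."):
        return name                # already fully qualified, used as-is
    return f"{name}.{origin}"      # relative name: the zone origin is appended

origin = "student-test.sa.cs.ut.ee."
print(qualify("www", origin))
print(qualify("student-test.sa.cs.ut.ee.", origin))
print(qualify("student-test.sa.cs.ut.ee", origin))   # missing dot: name is doubled
```

The first two results are what we want. The third shows the mistake from the note: without the final dot, the zone name is appended a second time.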
Verify
Reload the DNS and make sure the CNAME was added correctly.
journalctl -r -u named
dig www.<vm_name>.sa.cs.ut.ee
- To make sure everything new gets read in, did you update the Serial?
3.2 Virtual Web Hosts Using Apache Web Server¶
According to December 2024 statistics, Nginx leads the market with a 33.8% share, distinguished by its superior high-traffic handling capabilities. Apache follows closely at 27.6%, having established its reputation through exceptional reliability and extensive customization options. We are going to focus on Apache. Apache supports a wide variety of features and can be extended with modules.
When hosting multiple websites on a single server, you need a way to direct different domain requests to their appropriate content. This is where "virtual hosting" comes in: a method that allows one server to host multiple domains, each with its own separate web content and configuration.
3.2.0 What are Virtual Hosts?¶
Virtual hosts are a fundamental concept in web servers that enable a single physical server to host multiple websites, each with its own domain name. There are two primary types of virtual hosts:
-
Name-based virtual hosts: Multiple domain names share the same IP address (for example through CNAME records). The web server determines which site to serve based on the hostname the client sends in the HTTP Host header. This is the most common and efficient method.
-
IP-based virtual hosts: Each website uses a different IP address on the server. This approach is less common but useful in specific situations where unique IPs are required.
In this lab, we'll focus on name-based virtual hosting, which allows us to host multiple websites (www, wordpress, proxy) on our single VM without needing additional IP addresses.
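The idea behind name-based virtual hosting can be sketched in Python as a lookup on the Host header. This is only a model of the selection logic, not how Apache is implemented: the function `select_vhost` is our own, the `wordpress` document root is a made-up path, and IPv6 literals in the Host header are not handled. As in Apache, the first virtual host serves as the default when no name matches.

```python
# Ordered list of (ServerName, DocumentRoot); the first entry is the default.
VHOSTS = [
    ("www.student-test.sa.cs.ut.ee", "/var/www/html/www.student-test/public_html"),
    ("wordpress.student-test.sa.cs.ut.ee", "/var/www/html/wordpress"),  # hypothetical path
]

def select_vhost(host_header: str, vhosts=VHOSTS) -> str:
    """Pick the DocumentRoot for a request based on its Host header."""
    hostname = host_header.split(":")[0].lower()   # drop an optional :port
    for server_name, document_root in vhosts:
        if server_name == hostname:
            return document_root
    return vhosts[0][1]   # no match: fall back to the first virtual host

print(select_vhost("wordpress.student-test.sa.cs.ut.ee"))
print(select_vhost("unknown.example.org:80"))
```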
Benefits of Virtual Hosts
- Resource efficiency: One physical server can host dozens or hundreds of websites
- Simplified management: Central administration of multiple websites
- Cost-effective: No need for separate hardware or IP addresses for each site
- Flexibility: Each virtual host can have its own configuration, log files, and security settings
3.2.1 Installing and Starting Apache¶
Now let's set up Apache HTTP Server, one of the most widely used web servers, and configure it to support our virtual hosts.
Complete
- Install Apache Web Server
# dnf install httpd
- Add the default Security group in ETAIS called web with tcp port 80 to your Virtual machine, if necessary.
- Also, either add port 80 or service http as a rule to your firewalld service.
- Afterwards you can start HTTPD service with the default configuration
# systemctl start httpd
Verify
Check if you can ping your virtual machine from your personal computer (laptop):
$ ping 172.17.XX.XX
or
$ ping <vm_name>.sa.cs.ut.ee
The second option will work if your bind configuration is right AND you have set your personal computer's DNS servers to be 193.40.5.39 and 193.40.5.76.
If you can ping your machine successfully then go to your web browser and navigate to:
http://172.17.XX.XX
or
http://<vm_name>.sa.cs.ut.ee
You should see a web page.
3.2.2 Understanding Apache Configuration Files¶
Apache HTTP Server uses a modular configuration approach that makes it easy to manage complex setups. Instead of a single monolithic configuration file, Apache's settings are distributed across multiple files and directories. This kind of separation offers better flexibility when adding and removing different web applications, modules and domains. It also improves navigation of the setup and makes it easy to enable or disable blocks of instructions.
Main Configuration Files:
- /etc/httpd/conf/httpd.conf - The primary configuration file, containing only global settings
- /etc/httpd/conf.d/*.conf - Supplementary configuration files loaded automatically (you will mainly focus on these)
- /etc/httpd/conf.modules.d/*.conf - Module loading configuration files
Configuration Processing Order:
- Apache first processes the main httpd.conf file from top to bottom
- When it encounters Include or IncludeOptional directives, it processes those referenced files
- Files in conf.d/ are processed in alphabetical order (e.g., 00-base.conf before 10-vhosts.conf)
- Later directives can override earlier ones if they apply to the same scope
Apache Include Directive
For example, the directive Include mods-enabled/*.conf automatically loads all .conf files from the specified directory into the Apache configuration, while ignoring other files. This allows us to:
- Add functionality by creating a new my-functionality.conf file
- Remove functionality by changing the file extension (e.g., to my-functionality.conf.disabled)
As a result, we can modify the Apache setup without directly editing the core configuration files.
In addition, adding files with Ansible is much easier than modifying parts of an existing file.
Best Practices for Apache Configuration:
- Create separate .conf files for each website or application in /etc/httpd/conf.d/
- Use meaningful filenames like www.example.com.conf for better organization
- Add comments to explain non-obvious configuration choices
- Test configuration changes with apachectl configtest before reloading
The log files for the Apache webserver are located in the /var/log/httpd/
directory, where query (access_log) logs and error logs (error_log) are kept separately. It is recommended to have a separate log file for each virtual host as well.
3.2.3 Creating Your First Virtual Host¶
During this lab, we will configure Apache to use name-based virtual hosts. With name-based virtual hosting, the server relies on the client to report the hostname as part of the HTTP headers. Using this technique, many different hosts can share the same IP address.
First, configure the actual virtual host. We will create a virtual host for www.<vm_name>.sa.cs.ut.ee
.
Apache documentation for Virtual Host and more specifically Name-based Virtual Host.
Complete
-
Creating a virtual host webroot directory. The contents of this directory will be published to the web via HTTP. For www.<vm_name>.sa.cs.ut.ee, the /var/www/html directory is created during Apache installation for the default page (usually web pages are located in /var/www/directory or /var/www/html/directory).
- Create a directory called www.<vm_name> in the /var/www/html directory
- Create another one named public_html inside the newly created directory.
- The end result should be something like: /var/www/html/www.<vm_name>/public_html/.
-
Creating a configuration file for the said virtual host. The file must be in the /etc/httpd/conf.d/ directory. As the file name, use www.<vm_name>.conf. The name does not actually matter, it just makes it easier for you to keep track of things, just make sure that the file ends with .conf.
- Create a virtual host configuration file in /etc/httpd/conf.d/
- Name the file www.<vm_name>.conf (the exact name isn't critical, but it must end with .conf)
-
Populating the virtual host configuration file
- Add the configuration directives to define a new name-based virtual host. This is just an example (given below), you should insert correct values
<VirtualHost *:80>
    ServerName InsertRightValueHere
    DocumentRoot /var/www/html/www.<vm_name>/public_html
    # Possible values include: debug, info, notice, warn, error, crit, alert, emerg.
    LogLevel warn
    ErrorLog /var/log/httpd/<Insert_Right_Value_Here>-error.log
    CustomLog /var/log/httpd/<Insert_Right_Value_Here>-access.log combined
</VirtualHost>
Overview of the important parameters/directives for the virtual host you will need to set:
- ServerName - the full DNS name of the virtual host, e.g. www.student-test.sa.cs.ut.ee
- DocumentRoot - the directory from which Apache serves the domain's files; for www this would be /var/www/html/www.<vm_name>/public_html
- ErrorLog, CustomLog - error log and access (query) log for the virtual host. The name and location of the log files can be set to any value, but we recommend sticking to common values, e.g. <Apache log dir>/webmail-error.log and <Apache log dir>/webmail-access.log combined. In most cases <Apache log dir> is located at /var/log/httpd/
- For easier troubleshooting, change LogLevel to debug
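If you later want to generate such files instead of typing them (for example from a script), the directives can be filled into a template. The Python sketch below is only an illustration: the function `render_vhost`, its parameters, and the log prefix `www-example` are hypothetical, and the values you need for your own machine still have to be worked out by you.

```python
VHOST_TEMPLATE = """\
<VirtualHost *:80>
    ServerName {server_name}
    DocumentRoot {document_root}
    LogLevel {log_level}
    ErrorLog /var/log/httpd/{log_prefix}-error.log
    CustomLog /var/log/httpd/{log_prefix}-access.log combined
</VirtualHost>
"""

def render_vhost(server_name, document_root, log_prefix, log_level="warn"):
    """Fill in the virtual host template with concrete values."""
    return VHOST_TEMPLATE.format(
        server_name=server_name,
        document_root=document_root,
        log_prefix=log_prefix,
        log_level=log_level,
    )

config = render_vhost(
    server_name="www.student-test.sa.cs.ut.ee",
    document_root="/var/www/html/www.student-test/public_html",
    log_prefix="www-example",   # hypothetical log file prefix
    log_level="debug",          # easier troubleshooting while setting things up
)
print(config)
```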
Verify
If all was done correctly, your website default canvas should be ready. It should look something like this:
Complete
This default page contains the necessary information for you.
- Find the location of the default Apache HTTP server page.
- Follow the instructions inside it.
TIP: Read the page carefully.
Verify
Try accessing your web server now:
http://www.<yourdomain>.sa.cs.ut.ee
- You should see an empty page saying Index of /
You may have noticed that your page is empty. The next step is to create some content for the virtual hosts. To do that you need to be familiar with basic HTML tags and how a web page is constructed. If needed, refer to very good public tutorials here: W3Schools HTML Introduction and W3Schools HTML Basic Examples.
Complete
- In the root directory of your www.<your-domain>.sa.cs.ut.ee virtual host, create an index.html file. The content of this file can be freely chosen, but should also contain your machine's full hostname.
- This index.html should contain the string www.<vm_name>.sa.cs.ut.ee somewhere inside it for the scoring server to test.
Verify
Use # apachectl configtest to check the configuration syntax and to confirm that all the configuration files are visible to Apache and can be loaded by it.
The output should be Syntax OK.
Now it is time to restart the Apache httpd
server.
Complete
- Enable and start/restart the Apache webserver (the service name is httpd)
# apachectl restart
or
# systemctl restart httpd
Verify
You do not need to restart the Apache service every time as the pages will be re-read on each request. But since we just created a new index.html
file a restart to make sure everything is in order does not hurt at this point.
- If you visit your page again you should see your personal demo page now.
http://www.<yourdomain>.sa.cs.ut.ee
TIP: Sometimes modern web browsers do not notice that a page has changed and display the locally cached version instead. If in doubt, always refresh with <CTRL> + F5.
There are situations when you want to restart your Apache server, but can't interrupt its work. Imagine that you have a few hundred clients currently downloading files from your server and you need to avoid disconnecting them. In such situations you can use the following command:
# apachectl graceful
This will "gracefully" restart your Apache server with a new configuration without affecting your clients' connections.
Verify
Test by accessing the web pages:
-
View the web pages for the virtual host you created, make sure that you are getting the right content for the virtual host.
- try http://www.<your-domain>.sa.cs.ut.ee
-
In the /var/log/httpd/ directory, look at the access and error logs for your virtual host.
- Understanding error messages is an important part of system administration. Some future tasks (e.g. exam) may require you to get and understand the troubleshooting information from the Apache logs independently.
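When reading access logs, it helps to know what each field means. The Python sketch below parses one line in Apache's "combined" format with a regular expression. The sample line is invented for illustration, the function name `parse_access_line` is our own, and the pattern is simplified: it does not handle escaped quotes inside the request or User-Agent.

```python
import re

# Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_access_line(line: str) -> dict:
    """Turn one combined-format access log line into a dictionary."""
    match = COMBINED.match(line)
    if match is None:
        raise ValueError("not a combined-format log line")
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry

# Invented sample line, only to show the field layout.
sample = ('172.17.0.10 - - [10/Mar/2025:13:55:36 +0200] '
          '"GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0.1"')
entry = parse_access_line(sample)
print(entry["host"], entry["request"], entry["status"])
```

Filtering the parsed entries by `status` (for example everything that is 400 or above) is a quick way to spot failing requests.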
3.3 Utilizing a Web Server As a Proxy¶
3.3.0 Introduction to Web Proxying¶
In modern web architecture, it's common to have Apache (or other web servers) act as a "front door" to other web applications or services. This pattern, called reverse proxying, provides several benefits:
- Security: The application itself isn't directly exposed to the internet
- TLS termination: Apache can handle HTTPS, simplifying your application code (we will touch TLS in lab7)
- Port normalization: Internal services can run on non-standard ports while users access standard ports (80/443)
- Load balancing: Requests can be distributed across multiple application instances
- Unified logging and monitoring: All web traffic passes through Apache
In other words, it gives a systems administrator a single point of control over security settings and log unification, without having to delve into the application configuration or code.
A reverse proxy functions as a middleman between users and web servers. It receives client requests and forwards them to the right backend servers. Unlike forward proxies that hide client identities, reverse proxies conceal the backend servers.
Why Use a Proxy Instead of Direct File Serving?
Using a proxy server instead of direct file serving enhances security by hiding your backend infrastructure from potential attackers, while providing a central point for implementing access controls. Proxies improve performance through load balancing (distributing requests across multiple servers) and caching frequently accessed files. Additionally, proxies allow for consistent URL structures regardless of backend organization, and can handle protocol translation and content optimization. Though adding some complexity, these benefits make proxies the preferred choice for applications where security, scalability, and performance are priorities.
In this section, we'll set up Apache to proxy requests to a simple Python web application, a pattern commonly used for web applications written in Python (Flask, Tornado), Node.js, Ruby, Go, and other languages.
3.3.1 Setting Up a Small Application¶
In this example, we will use Python Flask, but you can use anything capable of serving the web. We will start a service on localhost
port 5000
, and proxy that to the internet.
Complete
- Install PIP (a tool to manage Python libraries).
# dnf install python3-pip
- Install the Flask libraries that are necessary to run a Flask program.
# pip3 install flask
- Create a file (the name is up to you), for example /root/website.py, and add the following code there:
#!/usr/bin/env python3
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run(port=5000)
Verify
The following steps require two terminals:
- Run this Python application in one of the terminals, by doing # python3 /root/website.py.
- After running the Python program, verify it works by executing # curl localhost:5000 in the other terminal. It should answer with whatever the program is programmed to do.
- Keep the second terminal open as we will use it to continue setting up appropriate settings for proxy.
3.3.2 Configuring a Virtual Host as a Reverse Proxy¶
Now we'll configure Apache to forward requests to our Flask application:
Complete
-
Create a DNS entry for the proxy:
- Set up a domain name proxy.<vm_name>.sa.cs.ut.ee, that points to your machine.
-
Configure SELinux to allow Apache to connect to network services. This allows HTTPD to connect to your Python program:
- Run the command # setsebool -P httpd_can_network_connect=1.
-
Create an Apache virtual host configuration:
- Add the following code into a new HTTPD config file, called proxy.conf.
<VirtualHost *:80>
ServerName proxy.<vm_name>.sa.cs.ut.ee
# ServerName sets the name to listen for with requests
ErrorLog /var/log/httpd/proxy-error_log
CustomLog /var/log/httpd/proxy-access_log common
ProxyPreserveHost On
ProxyPass / http://localhost:5000/
ProxyPassReverse / http://localhost:5000/
</VirtualHost>
Key directives explained:
- ProxyPass: tells Apache to forward incoming requests to the specified URL
- ProxyPassReverse: rewrites response headers (such as Location) coming from the backend server so that they point to the proxy
- ProxyPreserveHost: passes the original Host header to the backend
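To see what the first two directives amount to, here is a Python sketch of the URL mapping in each direction. It models only the string rewriting, not Apache's actual implementation. The function names `proxy_pass` and `proxy_pass_reverse` are our own, and `student-test` stands in for your machine name.

```python
FRONTEND = "http://proxy.student-test.sa.cs.ut.ee/"
BACKEND = "http://localhost:5000/"

def proxy_pass(path: str, prefix: str = "/", backend: str = BACKEND) -> str:
    """ProxyPass: map an incoming request path to the backend URL."""
    if not path.startswith(prefix):
        raise ValueError("path is not handled by this ProxyPass rule")
    return backend + path[len(prefix):]

def proxy_pass_reverse(location: str, frontend: str = FRONTEND,
                       backend: str = BACKEND) -> str:
    """ProxyPassReverse: rewrite a backend URL in a redirect header to the public one."""
    if location.startswith(backend):
        return frontend + location[len(backend):]
    return location   # URLs pointing elsewhere are left untouched

print(proxy_pass("/hello?x=1"))
print(proxy_pass_reverse("http://localhost:5000/login"))
```

Without the reverse mapping, a redirect issued by the backend would send the browser to `localhost:5000`, which the client cannot reach.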
Verify
You should be able to check whether your Python program is available from the name proxy.<vm_name>.sa.cs.ut.ee
. Try accessing your page either by:
- # curl proxy.<vm_name>.sa.cs.ut.ee
- Using a web browser.
3.3.3 Running the Flask Application as a systemd Service¶
Because you cannot keep a terminal open constantly to keep this web service up, and running things that way is not good practice anyway, we are going to use our service manager to keep the Python service up for us. Modern Linux systems use systemd as their init and service manager. You have already used systemd in lab 4 with the systemctl status|restart <service-name> command. Here we give a short introduction to consolidate what you practiced in lab 4. Linux services need to run reliably in the background, start automatically after system reboots, and restart if they crash.
Why use systemd?
Instead of manually running our Flask application in a terminal (which would stop when we log out), systemd allows us to:
- Run the application as a background service
- Start the service automatically when the system boots
- Restart the service automatically if it crashes
- Collect logs from standard output and error streams
- Run the service as a non-root user for better security
When working with systemd
, you'll primarily use the systemctl
command to manage services:
systemctl start|stop|restart|status|enable|disable <service-name>
Now let's create a systemd
service for our proxied application:
Complete
-
Stop the Python application that is still running in your terminal.
-
Create a dedicated user for security:
- Make a proxy user.
-
Move the application to a standard location:
- Move your python file to /usr/local/lib/server.py.
- Give ownership of your python file to the proxy user.
-
Create a systemd service file:
- Create a file /etc/systemd/system/proxy.service with the following contents:
# systemd unit file for the Python Proxy Service
[Unit]
# Human readable name of the unit
Description=Python Proxy Service
[Service]
# Command to execute when the service is started
ExecStart=/usr/bin/python3 /usr/local/lib/server.py
# Disable Python's buffering of STDOUT and STDERR, so that output from the
# service shows up immediately in systemd's logs
Environment=PYTHONUNBUFFERED=1
# Automatically restart the service if it crashes
Restart=on-failure
# With Type=simple, systemd considers the service started as soon as the
# process has been launched; the service does not send any notification
Type=simple
# Use a dedicated user to run our service
User=proxy
[Install]
# Tell systemd to automatically start this service when the system boots
# (assuming the service is enabled)
WantedBy=default.target
- Enable and start the service:
  - Reload the systemd unit files:
    systemctl daemon-reload
  - Start the service called proxy.
  - Enable the service called proxy so that it is started on boot.
Verify
- See if something is listening on port 5000:
  # ss -tulpn | grep 5000
  or
  # netstat -tupln | grep 5000
- Test if your website is working.
- Check if the service survives a machine reboot.
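A plain grep 5000 also matches unrelated text, such as port 15000 or a process ID that happens to contain 5000. As a minimal sketch, the helper below reads the output of ss -tln from standard input and succeeds only when a listening socket's local port is exactly the requested one. The name listening_on is our own invention for this example, not a standard tool.

```shell
# listening_on PORT: read `ss -tln` output on stdin and succeed only if
# some LISTEN line has exactly PORT as its local port.
# (Hypothetical helper for illustration; not part of ss or systemd.)
listening_on() {
    awk -v port="$1" '
        /LISTEN/ {
            # The first field containing ":" is the local address:port.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /:/) {
                    n = split($i, parts, ":")
                    if (parts[n] == port) found = 1
                    break
                }
            }
        }
        END { exit !found }
    '
}

# Usage on the VM:
#   ss -tln | listening_on 5000 && echo "port 5000 is listening"
```

Because the function only parses text, it can be tried on saved ss output as well as on a live system.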
4. Let's add a default WordPress setup to our current setup¶
WordPress is the world's most popular content management system (CMS), powering approximately 43% of all websites on the internet. It provides a flexible platform for creating blogs, business websites, e-commerce stores, and more without requiring extensive coding knowledge. In this section, we'll set up WordPress as another virtual host on our Apache server, which will demonstrate how to:
- Configure Apache to work with PHP applications
- Set up a database backend using MariaDB
- Install and configure a complex web application
- Manage application-specific security considerations
This practical implementation reinforces the virtual hosting concepts we've already covered while introducing database connectivity and dynamic content generation - essential components of modern web architecture.
4.1 Installing Prerequisites¶
Let's begin by installing the prerequisite packages needed for WordPress.
Complete
- Install the packages for PHP and MariaDB:
  # dnf install php-mysqlnd php-fpm mariadb-server tar curl php-json
- Start up the MariaDB service:
  # systemctl start mariadb
- And let's enable it, so it starts up every time the VM boots up:
  # systemctl enable mariadb
If you haven't already done so, do the same for the httpd service.
4.2 Configuring the Database¶
Let's Initialize and Secure the MariaDB Installation
Complete
The following command starts an interactive command-line procedure. Its purpose is to replace the default mariadb settings with more secure ones.
When you are asked 'Set root password?', write the password down for future use and put it somewhere you will know to look. The root password can be changed if you have root access to the machine, but then you would have to google for a solution yourself (there is a plethora of guides on the internet that can help you).
Otherwise, using the recommended settings is recommended, pun intended. You will recognize the recommended option by the capitalized letter, e.g. in [Y/n] the recommended answer is Y (yes).
- Run:
# mysql_secure_installation
NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!
In order to log into MariaDB to secure it, we'll need the current
password for the root user. If you've just installed MariaDB, and
haven't set the root password yet, you should just press enter here.
Enter current password for root (enter for none):
OK, successfully used password, moving on...
Setting the root password or using the unix_socket ensures that nobody
can log into the MariaDB root user without the proper authorisation.
You already have your root account protected, so you can safely answer 'n'.
Switch to unix_socket authentication [Y/n] n
... skipping.
You already have your root account protected, so you can safely answer 'n'.
Change the root password? [Y/n] n
... skipping.
By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them. This is intended only for testing, and to make the installation
go a bit smoother. You should remove them before moving into a
production environment.
Remove anonymous users? [Y/n] Y
... Success!
Normally, root should only be allowed to connect from 'localhost'. This
ensures that someone cannot guess at the root password from the network.
Disallow root login remotely? [Y/n] Y
... Success!
By default, MariaDB comes with a database named 'test' that anyone can
access. This is also intended only for testing, and should be removed
before moving into a production environment.
Remove test database and access to it? [Y/n] Y
- Dropping test database...
... Success!
- Removing privileges on test database...
... Success!
Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.
Reload privilege tables now? [Y/n] Y
... Success!
Cleaning up...
All done! If you've completed all of the above steps, your MariaDB
installation should now be secure.
Thanks for using MariaDB!
Let's create the WordPress Database and User
Complete
Log in to the database as user root, using the root password from the previous step.
# mysql -u root -p
Let's create a database for WordPress, naming it WordPress. Database names are case-sensitive on Linux by default, so use the same spelling in every command.
mysql> CREATE DATABASE <database name>;
Now we need to create a new user admin, with an insecure password pass.
mysql> CREATE USER `admin`@`localhost` IDENTIFIED BY 'pass';
Let's grant user admin access to the database we created.
mysql> GRANT ALL ON WordPress.* TO `admin`@`localhost`;
Now, since we have just granted user admin access to the new database WordPress, we need to reload the grant tables with the following command:
mysql> FLUSH PRIVILEGES;
The mariadb bit is done for now.
mysql> exit
4.3 Setting Up WordPress Files¶
Now let's download the wordpress tarball, unpack it and move it to a proper webroot.
Complete
- Download the latest version of wordpress:
  # curl https://wordpress.org/latest.tar.gz --output wordpress.tar.gz
- The tar command unpacks wordpress.tar.gz into a directory wordpress in the current working directory.
  # tar xf wordpress.tar.gz
- We can place the freshly unpacked wordpress directory into /var/www/html. We could just as easily create a new directory wordpress.<vm_name> for a more structured look (like we did for www previously), but it is not critical, as long as you know what you are doing.
  # cp -r wordpress /var/www/html
Let's grant user apache the right to work in the /var/www/html/wordpress directory, and change the SELinux security context of the files.
Complete
- # chown -R apache:apache /var/www/html/wordpress
- # chcon -t httpd_sys_rw_content_t /var/www/html/wordpress -R
  -> This changes the SELinux context to allow Apache to write to the WordPress directory, which is necessary for plugin installations and updates.
Now let's consolidate our WordPress PHP logs to one place. By default, PHP-FPM logs are scattered across the filesystem, making it difficult to troubleshoot issues that span both PHP and Apache. Consolidating these logs into the Apache log directory creates a single location for all web-related logs, simplifying monitoring and troubleshooting. This is especially important for complex applications like WordPress where problems can originate in different components of the stack.
Complete
- Edit both /etc/php-fpm.conf and /etc/php-fpm.d/www.conf
  - In both files, find the directive error_log and redirect the logfiles to /var/log/httpd/php-errors.log and /var/log/httpd/www-php-errors.log respectively
  - Don't forget to create the logfiles and assign the correct permissions (same as the Apache logs)
4.4 Configuring Apache Virtual Host¶
Now repeat the steps we did earlier for www.<vm_name>.sa.cs.ut.ee.
Complete
- Create a CNAME for wordpress.<vm_name>.sa.cs.ut.ee
- Create a virtual host pointing at wordpress.
- Restart the Apache service
- Also make sure to start the system service named php-fpm
4.5 Completing WordPress Installation¶
The final part of the WordPress setup is setting up the page itself; what we have done so far has been laying down the foundation, so to speak.
Complete
- Go to wordpress.<vm_name>.sa.cs.ut.ee and perform the actual WordPress installation.
- Follow the instructions provided there and complete the installation.
- Again, remember or write down all the passwords and credentials you set there. Otherwise, it is a one-way trip to google.
5. Configuring Apache modules¶
Apache HTTP Server is designed with modularity at its core. Rather than being a monolithic application, Apache functions as a collection of components that can be enabled or disabled as needed. This modular design provides several key advantages:
- Flexibility: Administrators can tailor Apache to specific requirements without unnecessary overhead
- Resource efficiency: Only needed functionality is loaded, minimizing memory usage
- Security: Reducing the active code surface helps minimize potential vulnerabilities
- Maintainability: Modules can be updated independently of the core server
Each module extends Apache's functionality in specific ways - from basic features like serving static files to complex capabilities like URL rewriting, proxy services, or security filtering. Modules are typically named with a mod_ prefix (e.g., mod_ssl, mod_rewrite, mod_security).
Viewing Currently Loaded Modules
You can see which modules are currently loaded in your Apache installation using:
# httpd -M
Module Configuration Files
In Red Hat-based systems like CentOS, module configuration files are typically stored in:
/etc/httpd/conf.modules.d/ - Module loading directives
Each module may introduce new directives that can be used in your virtual host configurations. For example, our proxy website is using the ProxyPass directive that comes from mod_proxy, which is loaded via /etc/httpd/conf.modules.d/00-proxy.conf.
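To find out which file loads a given module, the LoadModule lines in that directory can be searched directly. The following is a minimal sketch, assuming the default directory named above. module_loaded_from is a hypothetical helper name, and its optional second argument exists only so that the function can be pointed at a different directory.

```shell
# module_loaded_from MODULE [DIR]: print the config files whose LoadModule
# line loads MODULE (e.g. proxy_module). DIR defaults to the path used in
# this lab. (Hypothetical helper for illustration.)
module_loaded_from() {
    grep -rlE "^[[:space:]]*LoadModule[[:space:]]+$1[[:space:]]" \
        "${2:-/etc/httpd/conf.modules.d}"
}

# Usage on the VM:
#   module_loaded_from proxy_module
```

The pattern requires whitespace on both sides of the module name, so asking for proxy_module does not also report lines that load proxy_http_module.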
Categories of Apache Modules
Apache modules generally fall into several functional categories:
- Core functionality: Essential features like serving files and processing requests
- Security: Authentication, authorization, and protection against attacks
- Content handling: Processing different file types and content generation
- Performance optimization: Caching, compression, and connection management
- Integration: Connecting Apache with other software (like PHP, Python, etc.)
- Logging and monitoring: Tracking server activity and performance
In the following sections, we'll focus on two specific modules that enhance our server's security and diagnostic capabilities: forensic logging for detailed request tracking and ModSecurity for web application firewall protection.
5.1 Forensic logging¶
Standard Apache logging, configured through the ErrorLog and CustomLog directives we have already seen, provides basic information about requests and responses. However, complex debugging scenarios often require more detailed insights. This is where mod_log_forensic comes in; it is a specialized logging module designed for in-depth request analysis. For example, the access logs defined with CustomLog contain fairly simple information: the origin IP of the request, the date and time of the request, and some basic information about the request and response. This is fine in most cases, but developers chasing an elusive bug might require extra information about the nature of the request.
Forensic logging captures comprehensive details about HTTP requests, including:
- Complete header information
- Unique request identifiers for tracking requests across logs
- Before-and-after transaction markers
Unlike standard logging, which records a single line after a request completes, forensic logging creates two log entries:
- A pre-request entry containing all request headers (marked with a + prefix)
- A post-request entry containing just the request ID (marked with a - prefix)
This dual-entry approach allows administrators to identify incomplete requests (those with a pre-request entry but no matching post-request entry), which is particularly valuable when troubleshooting client disconnections or server crashes.
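The Apache distribution's support directory includes a check_forensic script for this kind of analysis. The sketch below shows the same idea in a few lines of awk so the matching logic is visible: it remembers every ID seen with a + prefix, forgets it again when the matching - entry appears, and prints whatever is left. incomplete_forensic_ids is our own simplified helper name, and the file path in the usage line is the one used later in this lab.

```shell
# incomplete_forensic_ids FILE: print the IDs of requests that have a
# "+" (pre-request) entry but no matching "-" (post-request) entry.
# (Our own simplified helper for illustration.)
incomplete_forensic_ids() {
    awk -F'|' '
        /^\+/ { started[substr($1, 2)] = 1 }     # "+ID|request line|headers..."
        /^-/  { delete started[substr($0, 2)] }  # "-ID"
        END   { for (id in started) print id }
    ' "$1" | sort
}

# Usage on the VM:
#   incomplete_forensic_ids /var/log/httpd/www-forensic.log
```

An empty result means every logged request was completed.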
When to Use Forensic Logging
Forensic logging is especially valuable when:
- Developers need complete request header information to debug application issues
- Security teams are investigating potential intrusion attempts
- Operations staff are troubleshooting intermittent server crashes or timeouts
- Support teams need to recreate and verify exact client request patterns
While powerful, forensic logging generates substantially larger log files than standard logging, so it's typically enabled selectively rather than as a permanent configuration.
Now let's implement forensic logging for your virtual hosts:
Complete
Fortunately, mod_log_forensic comes installed by default, so all we need to do is enable and configure it.
- Loading the module
  - edit the main Apache configuration file httpd.conf and add the line LoadModule log_forensic_module modules/mod_log_forensic.so
- Configuring Virtual Hosts
  - add a new logfile option ForensicLog /var/log/httpd/<virtualhost>-forensic.log
- Checking Virtual Hosts
  - Make sure to replace the <virtualhost> value appropriately for each site
  - Create the empty file with the correct permissions for each site
- Restart Apache
Verify
- Access one of your websites (e.g. www.<vm_name>.sa.cs.ut.ee)
- Examine the corresponding forensic log /var/log/httpd/www-forensic.log. You should now see similar log data in your forensic logfiles.
+YDvAdxfNz6bRQrk@IWUKUQAAAMA|GET / HTTP/1.1|Host:localhost|User-Agent:Mozilla/5.0 (X11; Linux x86_64; rv%3a84.0) Gecko/20100101 Firefox/84.0|Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8|Accept-Language:en-US,en;q=0.5|Accept-Encoding:gzip, deflate|DNT:1|Connection:keep-alive|Cookie:utbslang=et|Upgrade-Insecure-Requests:1|Cache-Control:max-age=0
-YDvAdxfNz6bRQrk@IWUKUQAAAMA
Interpreting Forensic Log Entries
Let's break down a typical forensic log entry:
- Pre-request entry (starts with +):
  - YDvAdxfNz6bRQrk@IWUKUQAAAMA - Unique request identifier
  - Pipe-separated (|) list containing the request line followed by all HTTP request headers
  - Fields include method, path, host, user-agent, accepted content types, cookies, etc.
- Post-request entry (starts with -):
  - Same unique identifier confirming request completion
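A long forensic entry is easier to read with one field per line. The sketch below keeps only the pre-request entries, drops the leading +, and turns every pipe into a line break. In the sample above, a colon inside a header value is percent-encoded (rv%3a84.0). The sketch assumes a literal pipe inside a value is encoded the same way, so that splitting on | is safe. forensic_fields is a hypothetical helper name.

```shell
# forensic_fields: read forensic log lines on stdin and print each "+" entry
# as one field per line: request ID first, then the request line, then the
# headers. (Hypothetical helper for illustration.)
forensic_fields() {
    grep '^+' | cut -c2- | tr '|' '\n'
}

# Usage on the VM:
#   tail -n 2 /var/log/httpd/www-forensic.log | forensic_fields
```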
Key Differences from Standard Logs:
| Feature | Standard Log | Forensic Log |
|---|---|---|
| Format | Human-readable | Machine-readable (pipe-delimited) |
| Header Detail | Limited (path, method) | Complete (all headers) |
| Entries per Request | One (after completion) | Two (before and after) |
| Request Tracking | No unique ID | Includes unique ID |
| Size | Compact | Verbose |
Forensic logs are particularly valuable when standard logs don't provide enough information to diagnose issues. Their detailed nature allows for exact reproduction of client requests during troubleshooting.
5.2 Security-hardening our webserver using mod_security¶
While basic website security should come from the developers' side, system administrators have a few tricks to make sure malicious requests are filtered at the webserver level. In Apache, mod_security is used for this: it filters requests that might not always be inherently malicious but can often be exploited.
While traditional firewalls protect networks at the transport layer (controlling which ports and protocols can be used), web applications need defense at the application layer where sophisticated attacks like SQL injection, cross-site scripting (XSS), and command injection occur. This is where Web Application Firewalls come in. ModSecurity is a Web Application Firewall (WAF) that matches a list of rules against each HTTP request. These rules try to determine whether the request is malicious in nature, for example an attempt to access system files like /etc/shadow; if a rule matches, the request gets blocked.
A WAF sits between web traffic and your web server, analyzing HTTP requests and responses to identify and block malicious patterns. Unlike network firewalls that rely on IP addresses and ports, WAFs examine the actual content of web requests, including URLs, query parameters, headers, and request bodies.
Not all websites are able to work behind a WAF out of the box. For example, courses.cs.ut.ee needed heavy custom WAF configuration, because every edit to a page looked like a bunch of code to the WAF, and it denied all of them.
Key Features of ModSecurity
- Request filtering: Inspects incoming requests for suspicious patterns
- Response filtering: Can inspect outgoing responses for data leaks
- Real-time monitoring: Logs all detected security threats
- Virtual patching: Can mitigate vulnerabilities before application code is fixed
- Flexible rule engine: Supports complex conditions and actions
- Integration with OWASP CRS: Can use community-maintained rule sets
Complete
- The Apache security module does not come installed by default.
  - Use # dnf search to search for the package and then install it.
- Configure custom security rules
  - In your Apache config directory you should see a directory named modsecurity.d and a subdirectory in it named local_rules
  - Inside you will find a placeholder file named modsecurity_localrules.conf, which we will open for editing
  - Now let's append the following rules
# default action when matching rules
SecDefaultAction "phase:2,deny,log,status:406"
# [etc/passwd] is included in request URI
SecRule REQUEST_URI "etc/passwd" "id:'500001'"
# [../] is included in request URI
SecRule REQUEST_URI "\.\./" "id:'500002'"
# [<SCRIPT] is included in arguments
SecRule ARGS "<[Ss][Cc][Rr][Ii][Pp][Tt]" "id:'500003'"
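Each SecRule above is a regular expression tested against part of the request. To get a feel for what the three patterns catch, they can be tried against sample strings with grep. This is only a rough sketch: matching_rule is a hypothetical helper, it ignores the difference between REQUEST_URI and ARGS, and it does not reproduce how ModSecurity itself parses, transforms or phases a request.

```shell
# matching_rule STRING: print the id of the first custom rule whose regular
# expression matches STRING, or "none". This only mimics the three patterns
# with grep; it is not how ModSecurity itself evaluates requests.
matching_rule() {
    if   printf '%s\n' "$1" | grep -Eq 'etc/passwd';                then echo 500001
    elif printf '%s\n' "$1" | grep -Eq '\.\./';                     then echo 500002
    elif printf '%s\n' "$1" | grep -Eq '<[Ss][Cc][Rr][Ii][Pp][Tt]'; then echo 500003
    else echo none
    fi
}

# Usage:
#   matching_rule '/images/../secret'
#   matching_rule '/index.html'
```

Note that the first pattern matches any URI containing etc/passwd, including harmless ones such as a blog post with that text in its address. This is one reason hand-written rules need care.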
Verify
We need to craft a request that matches our filters. Let's look at a common attack: trying to access the Linux /etc/passwd file through the webserver.
- Take one of your virtual hosts; we will use www in this example
  - Try to query the file /etc/passwd from your server by going to the URL www.<vm_name>.sa.cs.ut.ee/etc/passwd
  - You should see a classic Not Found error, with the return code 404 in the site error log.
- Now restart your web server if you haven't already
  - After retrying the query, we should now instead get Not Acceptable with the HTTP status code 406
  - We can also see an error line in our log containing ModSecurity: Access denied with code 406 (phase 2). Pattern match "etc/passwd" at REQUEST_URI, which confirms that our custom rules are in place
We can also test this from the command line, emulating an HTTP GET request
- We will use the tool netcat to manually write our request
  First, let's craft the request itself. We will be using some printf magic to make sure it's correctly formatted. The request is similar to the one shown at the beginning of the lab guide:
  GET / HTTP/1.1
  User-Agent: nc/0.0.1
  Host: www.<vmname>.sa.cs.ut.ee
  Accept: */*

  Notice the empty line at the end - it is necessary for the web server to understand where the request headers end.
- Next we need to craft this into a format printf can understand; for this we will replace all of the line breaks with \r\n
  "GET / HTTP/1.1\r\nUser-Agent: nc/0.0.1\r\nHost: www.<vmname>.sa.cs.ut.ee\r\nAccept: */*\r\n\r\n"
  Now we pass this string on to printf, a command-line tool to format strings, and pipe the output to netcat
  - printf "GET / HTTP/1.1\r\nUser-Agent: nc/0.0.1\r\nHost: www.<vmname>.sa.cs.ut.ee\r\nAccept: */*\r\n\r\n" | nc localhost 80
  - We should see a response from the server starting with HTTP/1.1 200 OK, followed by the HTML content of our index page
- Now replace the query path / with /etc/passwd
  - We now see that the webserver has rejected our request with the status line HTTP/1.1 406 Not Acceptable
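Retyping the whole printf string for every path is error-prone, so the request can be wrapped in a small function that takes the host and the path as arguments. This is a minimal sketch; http_get_request is a hypothetical name, and the headers are the same ones used in the printf call above.

```shell
# http_get_request HOST PATH: print a minimal HTTP/1.1 GET request with
# CRLF line endings and the terminating empty line.
# (Hypothetical helper; it wraps the printf call used in this section.)
http_get_request() {
    printf 'GET %s HTTP/1.1\r\nUser-Agent: nc/0.0.1\r\nHost: %s\r\nAccept: */*\r\n\r\n' "$2" "$1"
}

# Usage on the VM (replace <vm_name> with your machine name):
#   http_get_request www.<vm_name>.sa.cs.ut.ee /           | nc localhost 80
#   http_get_request www.<vm_name>.sa.cs.ut.ee /etc/passwd | nc localhost 80
```

Passing the path and host through %s keeps them out of the format string, so characters such as % in a test path are sent as-is instead of being interpreted by printf.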
Thankfully, with today's default webserver configuration, most attacks of this kind are impossible unless you have made a severe configuration error, but it is still good to have at least two layers of protection against these kinds of problems.
Complete
In reality, we would not use our own custom rules as we might miss potential attack vectors since we cannot account for every type of attack ourselves. Where could we find a preconfigured list of Apache security rules?
6. Ansible tips¶
6.1 Tags¶
Previously, we have asked you to define your roles with tags like - { role: <role name>, tags: <tag name> }. Tags are used for running specific parts of your playbook, instead of the whole thing.
For example, you have edited some zone files under your DNS role and want to apply them. Running ansible-playbook playbook.yml will run all of the roles that you have set up there, which gets cumbersome if you have tens of huge roles to execute. Running these extra roles won't break anything if your playbook is well written, but it can take a lot of time to run a whole playbook for a single configuration change. This can be avoided by using ansible-playbook --tags=dns playbook.yml, which will run only the roles that have the dns tag. The main purpose is to avoid the hassle of creating multiple playbooks for single roles or editing the main playbook to comment something out, and all in all to make using Ansible a better experience. More information about tags can be found on the tags page of the Ansible documentation.
6.2 Ansible modules¶
Ansible modules are discrete units of code that give Ansible the functionality that it has. You have already used a few modules in your playbook tasks, like user: for user management, file: for creating files and modifying their permissions, dnf: for installing packages, template: for rendering templates, etc. Ansible has a lot of modules built in that can greatly improve your playbook execution.
Verify
For this lab look up the following modules:
- pip
- seboolean
- copy
- mysql_user
- mysql_db
- unarchive
- sefcontext
and see how they could be used to automate lab 5.
The Ansible documentation shows specific examples of every module's usage, and most of them match what we are trying to do in this lab. For example, we can take them from seboolean and sefcontext to get:
- name: Seboolean | this equates to 'setsebool -P httpd_can_network_connect=1'
  seboolean:
    name: httpd_can_network_connect
    state: yes
    persistent: yes
# sefcontext records a persistent rule (like 'semanage fcontext -a'); unlike chcon,
# it relabels existing files only after restorecon is run on the directory
- name: Sefcontext | persistent counterpart of 'chcon -t httpd_sys_rw_content_t /var/www/html/wordpress -R'
  sefcontext:
    target: '/var/www/html/wordpress(/.*)?'
    setype: httpd_sys_rw_content_t
    state: present
6.3 Handlers¶
Ansible tasks have three main states of completion:
- ok - task ran successfully and nothing was changed
- changed - task ran successfully and something was changed
- failed - executing the task failed
There are some tasks that you may only want to run when a change is made. Mostly this is used to restart services only when a configuration file has been updated, not on every run. Just as it was said in the guide above, you want to avoid unnecessary restarts to Apache as much as you can.
Handlers have their own file in the role: roles/<rolename>/handlers/main.yml
An example use of a handler in a task declaration:
- name: Template | website config
  template:
    src: vhost.conf.j2
    dest: /etc/httpd/conf.d/www.{{ hostname }}.conf
  notify:
    - restart httpd
An example of an httpd handler in handlers/main.yml:
- name: restart httpd
  systemd:
    daemon_reload: yes
    name: httpd
    state: restarted
    enabled: yes
The Ansible documentation for handlers has many great examples of how to run handlers. When a handler is notified, it runs once at the end of the play, no matter how many tasks notified it, so all the changes you have made are applied in a single restart.
7. Automating lab 5¶
Complete
- Create a new role directory for apache and declare it in your main playbook with - { role: apache, tags: apache }
- Update the templates under your DNS role for www, WordPress, etc
- Create the necessary subdirectories for files, templates and handlers under roles/apache
- Utilize handlers and different modules for your playbook
- Use tags to speed up consecutive runs
- Don't be afraid to ask for help in Slack
8. Keeping your Ansible repository safe in Gitlab¶
And as always push your stuff to our Gitlab.
Complete
In your ansible home directory:
# git add .
# git commit -m "Web lab"
# git push -u origin main
Verify
Go to your page in the course's GitLab to see if all of your latest pushes reached the git repository. If you play around with your repository and make changes to your Ansible code that you want to keep using in the future, always remember to commit and push them.