Common mistakes in website deployment

Security during deployment is probably one of the most ignored aspects of website development. Many novice web developers deploy their websites through cPanel by uploading a zip file of the site's source and extracting it with the file manager. Worse, they sometimes forget to delete the zip file after extraction, leaving the source code available for anyone to download. Many developers also fail to secure their .git folder, which exposes the source in the same way. Some even commit API keys to the git repository, which makes the exposure more damaging. To see how common these mistakes are, I downloaded the Alexa list of the top one million websites and looked for signs of mistakes made during deployment.

The first thing to check is whether the developer forgot to delete the backup zip file of the website. A novice developer will often deploy a website as a compressed zip file of its source code. This is particularly easy for beginners: they can compress all the files into a single archive, upload it in the cPanel file manager, and extract everything through the online interface, which spares them from uploading each file individually. The zip filename is usually the name of the website itself. For example, for the website example.com, the zip filename is usually example.zip. Other common filenames are www.zip, htdocs.zip, etc.

Another mistake is exposing the .git repository of the website. This is common even among fairly adept developers. We can request the HEAD file inside the .git directory to check whether the directory is accessible. For example, for example.com, we can check the response of example.com/.git/HEAD.

Another file that can reveal information about a website is the error_log file generated by Apache. Although it doesn’t normally contain anything highly confidential, details such as the database name, the path of the current script, and the filenames of scripts may be exposed in it.

The first step in analyzing the websites is to prepare a list of URLs whose responses are to be checked. A simple script can generate this list; the HTTP response code of each URL then shows whether the file exists. The following script gets the HTTP response code of each URL in the file “urllist” and saves the results in “results.csv”. It uses curl to fetch the HTTP response headers and xargs to run the requests in parallel.

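#!/bin/bash
# Send a HEAD request to each URL in urllist, 10 at a time, following redirects.
# Each output line has the form "<final URL>;<HTTP status code>;".
xargs -n1 -P 10 curl -o /dev/null --silent --head --location --write-out '%{url_effective};%{http_code};\n' < urllist | tee results.csv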
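
The urllist file consumed by the script above has to be generated first. The post does not show how that was done, so the following is only a minimal Python sketch of one possible approach. The function name candidate_urls, the http:// scheme, and the exact set of paths are assumptions for illustration, based on the filenames mentioned earlier.

```python
def candidate_urls(domain, scheme="http"):
    """Return the URLs to probe for one domain (hypothetical helper)."""
    host = domain.strip().lower()
    # Drop a leading "www." before deriving the site name,
    # so www.example.com and example.com both yield "example".
    bare = host[4:] if host.startswith("www.") else host
    site_name = bare.split(".")[0]

    paths = [
        f"{site_name}.zip",  # zip named after the website itself
        "www.zip",           # other common backup names
        "htdocs.zip",
        ".git/HEAD",         # exposed git repository
        "error_log",         # exposed error log
    ]
    return [f"{scheme}://{host}/{path}" for path in paths]
```

Writing the output of candidate_urls for every domain in the list, one URL per line, gives a file in the format the xargs pipeline expects.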

Leaving this script running for a few days produced a few million URLs and their response codes. The URLs that returned a 200 code were then extracted, and the result was partitioned into “zip files”, “git repos”, and “error log files”.
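
The filtering and partitioning step could look roughly like the sketch below. It assumes the semicolon-separated format written by the curl command above; the function name partition_hits and the group labels are made up for this example. Because curl follows redirects and reports the final URL, a hit whose final URL no longer ends in one of the probed paths is simply dropped here.

```python
import csv
from collections import defaultdict


def partition_hits(lines):
    """Group URLs that returned 200 by the kind of file they point to."""
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter=";"):
        # Skip blank or malformed lines and anything that is not a 200.
        if len(row) < 2 or row[1] != "200":
            continue
        url = row[0]
        path = url.split("?", 1)[0].lower()
        if path.endswith(".zip"):
            groups["zip"].append(url)
        elif path.endswith("/.git/head"):
            groups["git"].append(url)
        elif path.endswith("/error_log"):
            groups["error_log"].append(url)
    return dict(groups)
```

Since csv.reader accepts any iterable of lines, an open file handle on results.csv can be passed in directly.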

This gave a list of around 8000 zip URLs. Not all of them point to zip files containing source code, because many websites respond with 200 for any URL. So, I wrote a script that checks the headers and content of each response to verify that the URL really serves a valid zip file. The final filtered list contains around 1600 zip files. I manually checked the contents of some of them. Most are WordPress backups, exposing the database name, username and password among other things. Some are harmless, containing only static files. Hence, roughly one out of every 625 websites in the list (1600 out of 1,000,000) exposes its complete or partial source code as a downloadable zip file.
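
The verification script itself is not shown in this post. As a rough illustration of what checking the header and content can mean, here is a sketch using Python's standard zipfile module. The function names are hypothetical, and the two-stage split (a cheap check on the first bytes, then a full check on the downloaded body) is an assumption rather than a description of the original script.

```python
import io
import zipfile

# Signature of the local file header that starts a non-empty zip archive.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_zip_header(content_type, first_bytes):
    """Cheap check using the Content-Type header and the first few bytes."""
    if "text/html" in (content_type or "").lower():
        return False  # typical of sites that answer 200 to everything
    return first_bytes.startswith(ZIP_MAGIC)


def is_valid_zip(data):
    """Full check on a downloaded body: is it a readable, non-empty archive?"""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return len(archive.namelist()) > 0
    except zipfile.BadZipFile:
        return False
```

The cheap check only needs the first four bytes of the body, so it can reject most false positives without downloading whole archives.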

Another weak point of website deployment is the git directory (or the repository of any other version control system). Website developers often use git as an easy-to-use deployment tool. Although placing secret keys in git is highly discouraged, many developers do it anyway, which increases the damage if an intruder gets hold of the repository. The script gave a list of around 10000 websites that returned a 200 HTTP response for /.git/HEAD. I wrote another program to verify that the git directory is really a valid repo and that the website isn’t just returning 200 to every request. That resulted in a list of 3000 exposed git repositories.
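
The verification program is not included here either. One simple signal, sketched below in Python, is whether the body returned for /.git/HEAD has the shape of a real HEAD file: either a symbolic ref such as "ref: refs/heads/master" or a bare 40-character commit hash. The function name is hypothetical, and a more thorough check would probably fetch other files from the repository as well.

```python
import re

# A normal HEAD points at a branch, e.g. "ref: refs/heads/master".
_SYMBOLIC_REF = re.compile(r"ref: refs/\S+")
# A detached HEAD contains a 40-character hexadecimal SHA-1 commit hash.
_DETACHED_SHA = re.compile(r"[0-9a-f]{40}")


def looks_like_git_head(body):
    """Return True if the response body has the shape of a git HEAD file."""
    text = body.strip()
    return bool(_SYMBOLIC_REF.fullmatch(text) or _DETACHED_SHA.fullmatch(text))
```

An HTML error page served with a 200 status fails both patterns, which is what separates real repositories from sites that answer 200 to every request.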

The error log URL list contained around 30000 URLs. I didn’t bother to filter it, as the file contains only a small amount of information that might not be very useful or interesting to intruders.

How to prevent these loopholes?

Small precautions taken during deployment can easily prevent these mistakes. First of all, don’t compress the content of your website to upload or download it. cPanel users can use FTP with a client such as FileZilla to transfer the source code instead. If possible, use a version control system for deployment. Either way, your website can’t be exposed as a downloadable zip file, because there is no zip file to forget to delete.

Also, never place secret API keys or other credentials in source code managed by a version control system. There are other places to keep them, such as environment variables or a separate file on the server. If you place secret keys in the repository, anyone who has access to the repository has access to all of them.
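
As a small illustration of the environment variable approach, the Python sketch below reads a secret at startup instead of hard-coding it. The helper name get_secret and the variable name PAYMENT_API_KEY are invented for the example.

```python
import os


def get_secret(name, environ=os.environ):
    """Fetch a required secret from the environment, failing loudly if absent."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value
```

The application would call get_secret("PAYMENT_API_KEY") once at startup; the key itself lives only in the server's environment and never enters the repository.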

It is also important to secure your .git directory. The easiest method is to point the domain root to a sub-directory of your repo. For example, consider a project mywebsite managed using git. The directory structure should be more or less like this:

mywebsite (managed by git)
--> .git
--> content (contains all the website content)

With the domain root pointing at content, the .git directory sits outside the web root, so an outsider has no way to request it directly.

If you already have a directory hierarchy where the .git directory is inside your website root, you can block access to it using Apache or nginx rules. If you use Apache, placing the following in the .htaccess file of the website root will prevent access to the .git folder:

RedirectMatch 404 /\.git

If you use nginx, place the following in your server block:

location ~ /\.git {
  deny all;
}

Your website's nginx configuration will then look like this:

server {
	listen 80 default_server;
	listen [::]:80 default_server ipv6only=on;

	root /usr/share/nginx/html;
	index index.html index.htm;

	
	server_name localhost;
	# Block .git directory
	location ~ /\.git {
		deny all;
	}
	# Make site accessible from http://localhost/
	location / {
		# First attempt to serve request as file, then
		# as directory, then fall back to displaying a 404.
		try_files $uri $uri/ =404;
		# Uncomment to enable naxsi on this location
		# include /etc/nginx/naxsi.rules
	}
}
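
Both the Apache and the nginx rule rely on the same unanchored regular expression, /\.git. To see which request paths it covers, the Python sketch below applies the pattern to a few sample paths. Python's re module stands in for the servers' own regex engines here, which is an assumption that should hold for a pattern this simple, and is_blocked is a hypothetical helper.

```python
import re

# Same pattern as in the RedirectMatch and location rules.
GIT_RULE = re.compile(r"/\.git")


def is_blocked(path):
    """Return True if the rule would match this request path."""
    return GIT_RULE.search(path) is not None


for sample in ["/.git/HEAD", "/blog/.git/config", "/.gitignore", "/index.html"]:
    print(sample, is_blocked(sample))
```

Because the pattern is not anchored, it also matches .git directories in sub-folders and files such as .gitignore, which is usually the desired behaviour for a public site.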