Security.txt – an Analysis of the Alexa Top 1 Million Domains
Since 2017, an IETF draft has proposed placing a security.txt file on web servers so that security researchers have an easy way to contact the company or website owner when they find a security issue. This is a great idea, so I wanted to check how widespread security.txt already is and reviewed the Alexa Top 1 Million domains.
So what is the security.txt in detail?
Many security researchers encounter situations where they are unable to responsibly disclose security issues to companies because there is no course of action laid out. security.txt is designed to help assist in this process by making it easier for companies to designate the preferred steps for researchers to take when trying to reach out.
Security.txt is a proposed standard which allows websites to define security policies. The security.txt file sets clear guidelines for security researchers on how to report security issues, and allows bug bounty programs to define a scope. Security.txt is the equivalent of robots.txt, but for security issues.
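To make the format a bit more concrete, here is a minimal sketch of what such a file can look like and how its "Field: value" lines could be read. The sample content, the example.com addresses and the helper name parse_security_txt are placeholders of mine, not taken from any real site or existing parser.

```python
# Hypothetical sample file; example.com and all values are placeholders.
SAMPLE = """\
# Our security contact details
Contact: mailto:security@example.com
Encryption: https://example.com/pgp-key.txt
Policy: https://example.com/security-policy
Hiring: https://example.com/jobs
"""


def parse_security_txt(text):
    """Collect 'Field: value' lines into a dict of lists, skipping comments."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Field: value' line
        # Field names are compared case-insensitively in this sketch.
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields
```

Calling parse_security_txt(SAMPLE)["contact"] gives a list with the single mailto address. A field can appear more than once, which is why each value is kept in a list.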
In a perfect world, at least every corporate website would have a security.txt file to refer to when needed. It is a win-win situation: the researcher gets the contact details faster, and the corporation is notified faster about potential security issues. As far as I know, there is basically no reason not to do it. It saves time and pain for all parties involved. Potential attack vectors for the security.txt can be submitted here.
There are already various parsers for security.txt out there, but so far none in Python, which is what I used for processing. So I used some primitive search functions on the files to detect potential security.txt files, which is of course error-prone.
My script checks whether /.well-known/security.txt exists on every Alexa domain and, if it does, downloads whatever file or response the server returns. I had over 300,000 hits here, most probably caused by false responses, WAF responses, redirects, 0-byte files etc.
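The probing step could be sketched roughly like this. It is not my original script: the function names, the https scheme, the User-Agent string, the timeout and the 64 KB read cap are all assumptions made for illustration, and only the standard library is used.

```python
import urllib.error
import urllib.request

MAX_BYTES = 65536  # assumed cap, so an oversized response is not read in full


def candidate_url(domain):
    """Build the well-known URL for a bare domain name (https is assumed)."""
    return "https://" + domain.strip().rstrip("/") + "/.well-known/security.txt"


def fetch(domain, timeout=10):
    """Return (status, body) for the domain, or None if the request fails."""
    req = urllib.request.Request(
        candidate_url(domain),
        headers={"User-Agent": "security-txt-survey"},  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read(MAX_BYTES)
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, HTTP error status, bad URL ...


def is_raw_hit(status, body):
    """A raw hit is a 200 response with a non-empty body; it may still be junk."""
    return status == 200 and len(body) > 0
```

A raw hit in this sense only says that the server answered with something. Redirects are followed by urlopen, so error pages and WAF responses still count as hits, which is why further filtering is needed.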
I removed all files containing HTML: according to the IETF draft, security.txt is text-only, so HTML should not be used. Additionally, I filtered for keywords like “Contact” to decide which files to push to the next stage.
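The HTML and keyword filters can be approximated with two regular expressions. This is a sketch under my own assumptions: the list of HTML markers and the requirement of a "Contact:" line at the start of a line are illustrative choices, and looks_like_security_txt is a hypothetical name.

```python
import re

# A few tags that indicate an HTML page rather than a plain-text file.
HTML_MARKERS = re.compile(
    r"<\s*(!doctype|html|head|body|script|div)\b", re.IGNORECASE
)
# A 'Contact:' field at the start of a line, case-insensitive.
CONTACT_FIELD = re.compile(r"^\s*contact\s*:", re.IGNORECASE | re.MULTILINE)


def looks_like_security_txt(text):
    """Reject anything with HTML markers, then require a Contact field."""
    if HTML_MARKERS.search(text):
        return False
    return bool(CONTACT_FIELD.search(text))
```

A check like this is deliberately primitive. It rejects a custom error page that happens to mention "Contact" inside HTML, but it will also let through plain-text junk that starts a line with "Contact:", so some manual review remains necessary.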
Finally, I had to manually review some files which slipped through my filters.
A Python script writes the results to results.txt, which is then used for further statistics in Google Sheets.
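The same kind of statistics could also be tallied directly in Python. The sketch below assumes a list with one domain per entry, which is a guess at a convenient layout and not necessarily the format of results.txt. The helper names and sample domains are made up.

```python
from collections import Counter


def tld_of(domain):
    """Return the last dot-separated label, e.g. 'com' for 'a.tumblr.com'."""
    return domain.strip().lower().rstrip(".").rsplit(".", 1)[-1]


def summarize(domains, top=20):
    """Return (top TLD counts, number of tumblr domains) for a domain list."""
    cleaned = [d.strip().lower() for d in domains if d.strip()]
    tlds = Counter(tld_of(d) for d in cleaned)
    tumblr = sum(
        1 for d in cleaned if d == "tumblr.com" or d.endswith(".tumblr.com")
    )
    return tlds.most_common(top), tumblr


# Hypothetical input, one finding per entry.
top_tlds, tumblr_count = summarize(
    ["a.tumblr.com", "b.tumblr.com", "example.ch", "example.rocks"]
)
```

Matching on the ".tumblr.com" suffix only catches subdomains, so blogs served under custom domains would not be counted as tumblr in this sketch.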
- Some of the admins do not seem to be very polite – as Scott already found out.
- Various websites returned fatal errors or other application-specific error messages. This should be prevented, since it could give further hints on how to compromise the website. Some websites even serve you a 25 MB video file containing an error message – wtf?
- My results are most probably not 100% correct due to parsing errors – additionally, some websites also do not stick to the IETF draft – for example: bitmex.com. It’s great that they use a security.txt in general, but they deviate from the proposal and hence make it hard for automated parsers to verify.
- The amount of information in the security.txt file varies heavily. In some cases it’s just a one-liner containing the contact details; in other cases PGP keys, hiring information etc. are added.
- A huge number of security.txt findings are caused by tumblr providing the security.txt automatically for its subdomains.
- CH domains pretty much suck with regard to the security.txt – 4 findings so far.
- xorz website is the only .rocks domain with a security.txt 🙂
# of URLs investigated:
# of security.txt findings: 17787 (1.7787% of total 1 Mil)
# of tumblr domains in findings: 16645 (93.58% of all findings)
# of google domains in findings: 108 (0.61% of all findings)
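As a quick sanity check, the percentages above can be recomputed from the raw counts. The counts are the ones reported here, with the total taken as the 1 million Alexa entries. The variable and function names are just for illustration.

```python
TOTAL_URLS = 1_000_000  # "total 1 Mil" from the figures above
FINDINGS = 17_787
TUMBLR = 16_645
GOOGLE = 108


def pct(part, whole):
    """Share of part in whole, as a percentage."""
    return 100.0 * part / whole


print(f"findings / total:    {pct(FINDINGS, TOTAL_URLS):.4f}%")
print(f"tumblr / findings:   {pct(TUMBLR, FINDINGS):.2f}%")
print(f"google / findings:   {pct(GOOGLE, FINDINGS):.2f}%")
print(f"non-tumblr findings: {FINDINGS - TUMBLR}")
```

The last line is the interesting one: without tumblr, 1142 findings remain, which is only about 0.11% of the 1 million entries.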
Top 20 – TLD
1.7% is still a pretty low number. However, it’s a start, and to be honest I thought the results would be worse. The tumblr domains help a lot here, and various websites most probably have a security.txt without even knowing it. WordPress isn’t there yet – bad – time to catch up, folks.
It would be great to see more domains, especially the bigger hosting enterprises, catch up and implement a security.txt file. Its importance will grow.
It seems like Scott published his Alexa analysis today as well including raw data – he also mentioned the broader coverage by tumblr.