
Web Information Gathering

Web reconnaissance is the careful, systematic collection of information about a target website or web application. It forms the foundation of the information-gathering phase of a penetration test.


Certificate Transparency Logs

Certificate Transparency (CT) logs are public, append-only records of TLS certificates issued by participating CAs. Searching them (for example via crt.sh) is a quick way to surface subdomains that never appear in wordlists.

root@kakarot$ curl -s "https://crt.sh/?q=tesla.com&output=json" | jq -r '.[] | select(.name_value | contains("dev")) | .name_value' | sort -u    #Coarse Filter
root@kakarot$ curl -s "https://crt.sh/?q=tesla.com&output=json" | jq -r '.[].name_value | split("\n")[] | select(contains("dev"))' | sort -u    #Hostname-level Extraction
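
The same hostname-level extraction can be scripted. A minimal Python sketch, assuming crt.sh keeps returning a JSON array whose name_value field holds newline-separated hostnames (tesla.com and "dev" are just the placeholders used above):

import json
import urllib.request

# Placeholder target domain and keyword, matching the curl examples above
domain, keyword = "tesla.com", "dev"
url = f"https://crt.sh/?q={domain}&output=json"

# Fetch every logged certificate entry for the domain
with urllib.request.urlopen(url) as resp:
    entries = json.load(resp)

hostnames = set()
for entry in entries:
    # name_value may contain several newline-separated hostnames (SAN entries)
    for name in entry["name_value"].split("\n"):
        if keyword in name:
            hostnames.add(name)

for name in sorted(hostnames):
    print(name)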

WHOIS

WHOIS is a query/response protocol for looking up the registration records of a domain: who owns or manages it, when it was registered and when it expires, and which name servers it uses.

root@kakarot$ whois domain.com

Some of the information returned by whois:
Domain Name
Registrar
Registrant Contact
Administrative Contact
Technical Contact
Creation and Expiration Dates
Name Servers
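
Under the hood, WHOIS is a plain TCP service on port 43 (RFC 3912): the client sends the query terminated by CRLF and reads the reply until the server closes the connection. A minimal sketch of a raw query; whois.verisign-grs.com is an assumption (it is the registry server for .com, so other TLDs need a different server):

import socket

def whois_query(domain, server="whois.verisign-grs.com", port=43):
    """Send a raw WHOIS query (RFC 3912) and return the text reply."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(f"{domain}\r\n".encode())   # the query is just the name plus CRLF
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                         # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

print(whois_query("domain.com"))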

DNS

Top Tools:

  • dig
  • nslookup
  • host
  • dnsenum
  • fierce
  • dnsrecon
  • theHarvester
  • Online DNS Lookup Services
root@kakarot$ dig domain.com                      # Default A record lookup
root@kakarot$ dig domain.com A                    # Get IPv4 address (A record)
root@kakarot$ dig domain.com AAAA                 # Get IPv6 address (AAAA record)
root@kakarot$ dig domain.com MX                   # Show mail servers (MX records)
root@kakarot$ dig domain.com NS                   # Show authoritative name servers (NS)
root@kakarot$ dig domain.com TXT                  # Show TXT records (SPF, verification, etc.)
root@kakarot$ dig domain.com CNAME                # Show CNAME (alias) record
root@kakarot$ dig domain.com SOA                  # Show SOA (zone authority info)
root@kakarot$ dig @1.1.1.1 domain.com             # Query using a specific DNS server (1.1.1.1)
root@kakarot$ dig +trace domain.com               # Trace full DNS resolution path
root@kakarot$ dig -x 192.168.1.1                  # Reverse lookup: IP -> hostname
root@kakarot$ dig +short domain.com               # Short, concise answer only
root@kakarot$ dig +noall +answer domain.com       # Show only the ANSWER section
root@kakarot$ dig domain.com ANY                  # Request all records (may be ignored by servers)
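
The same lookups can also be scripted. A minimal sketch, assuming the third-party dnspython package is installed (pip install dnspython) and using domain.com as a placeholder:

import dns.resolver  # third-party: dnspython

domain = "domain.com"  # placeholder target

# Query a few common record types, mirroring the dig commands above
for rtype in ("A", "AAAA", "MX", "NS", "TXT"):
    try:
        answers = dns.resolver.resolve(domain, rtype)
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN, dns.resolver.NoNameservers):
        continue  # record type not present, or the name does not resolve
    for rdata in answers:
        print(f"{rtype:5} {rdata.to_text()}")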

Subdomain Bruteforcing

Top Tools:

root@kakarot$ dnsenum --enum inlanefreight.com -f /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt -r

--enum: Shortcut option equivalent to --threads 5 -s 15 -w.
-r, --recursion: Recursion on subdomains; brute-force all discovered subdomains that have an NS record.
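
Conceptually, subdomain brute-forcing just resolves word.domain.com for every word in a wordlist and keeps the names that resolve. A minimal, single-threaded sketch using only the standard library (the domain and SecLists wordlist path mirror the dnsenum example above):

import socket

domain = "inlanefreight.com"  # placeholder target, as in the dnsenum example
wordlist = "/usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt"

with open(wordlist) as f:
    for word in f:
        sub = f"{word.strip()}.{domain}"
        try:
            ip = socket.gethostbyname(sub)   # raises socket.gaierror if the name does not resolve
        except socket.gaierror:
            continue
        print(f"{sub} -> {ip}")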

DNS Zone Transfers

+------------------+                           +------------------+
| secondaryServer  |                           |  primaryServer   |
+------------------+                           +------------------+
        |                                               |
        | --------- AXFR Request (Zone Transfer) ------>|
        |                                               |
        | <--------- SOA Record (Start of Authority) ---|
        |                                               |
        +---------------------- loop ------------------ +
        |                                               |
        | | <--------- DNS Record -------------------|  |
        | |                                          |  |
        | +------------------- end loop -------------+  |
        |                                               |
        | <---------- Zone Transfer Complete -----------|
        |                                               |
        | --------- ACK (Acknowledgement) ------------->|
        |                                               |
+------------------+                           +------------------+
| secondaryServer  |                           |  primaryServer   |
+------------------+                           +------------------+

Exploiting Zone Transfers Using Dig:

root@kakarot$ dig axfr domain.com
root@kakarot$ dig axfr @DnsServer domain.com
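
The same AXFR request can be issued from Python. A minimal sketch, assuming dnspython is installed, with a placeholder zone and a placeholder IP for an authoritative name server that has transfers left open:

import dns.query
import dns.zone

domain = "domain.com"         # placeholder zone
nameserver = "10.129.14.128"  # placeholder: IP of an authoritative name server

# Request the full zone (AXFR); this only succeeds if transfers are misconfigured/open
zone = dns.zone.from_xfr(dns.query.xfr(nameserver, domain))

# Print every record in the transferred zone
for name in zone.nodes.keys():
    print(zone[name].to_text(name))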

Virtual Hosts

Top Tools:

  • Gobuster (vhost mode)
  • ffuf (fuzzing the Host header)

There are three types of virtual hosting:

  • Name-Based Virtual Hosting
  • IP-Based Virtual Hosting
  • Port-Based Virtual Hosting


root@kakarot$ gobuster vhost -u http://<target-ip> -w <wordlist> --append-domain

-u: The target URL
-w: Path to the wordlist
--append-domain : Append main domain from URL to words from wordlist

-t: Number of concurrent threads (default: 10)
-k: Skip TLS certificate verification (default: false)
-o: Output file to write results to (defaults to stdout)
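
Name-based virtual hosting is resolved purely from the Host header, so vhost fuzzing boils down to sending the same request to the target IP with different Host values and flagging responses that differ from a known-bad baseline. A minimal sketch with the third-party requests library; the target IP, domain, and candidate words are placeholders:

import requests  # third-party: pip install requests

target = "http://10.129.14.128"               # placeholder target IP
domain = "inlanefreight.com"                  # placeholder main domain
candidates = ["www", "dev", "admin", "mail"]  # placeholder wordlist

# Baseline: response size for a Host value that should not exist
baseline = len(requests.get(target, headers={"Host": f"definitely-not-real.{domain}"}).content)

for word in candidates:
    vhost = f"{word}.{domain}"
    resp = requests.get(target, headers={"Host": vhost})
    if len(resp.content) != baseline:   # a different response usually means a live vhost
        print(f"[+] {vhost} ({resp.status_code}, {len(resp.content)} bytes)")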

Fingerprinting

Top Tools:

Nmap
Wappalyzer
BuiltWith
WhatWeb
wafw00f
Netcraft


Fingerprinting Techniques:

Banner Grabbing
Analysing HTTP Headers
Probing for Specific Responses
Analysing Page Content

root@kakarot$ curl -I domain.com
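
Header analysis can also be scripted. A minimal sketch with the third-party requests library that prints the response headers and cookie names that most often leak the server, framework, or proxy in use (domain.com is a placeholder):

import requests  # third-party: pip install requests

url = "http://domain.com"  # placeholder target

resp = requests.get(url, timeout=10)

# Headers that commonly reveal the web server / framework / proxy in use
for header in ("Server", "X-Powered-By", "X-AspNet-Version", "X-Generator", "Via"):
    if header in resp.headers:
        print(f"{header}: {resp.headers[header]}")

# Cookie names can also hint at the backend (PHPSESSID, JSESSIONID, ASP.NET_SessionId, ...)
for cookie in resp.cookies:
    print(f"Cookie: {cookie.name}")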

Wafw00f

root@kakarot$ wafw00f domain.com

Nikto

root@kakarot$ nikto -h domain.com -Tuning b

-h: Target host
-Tuning b: Software Identification

Crawling

Types of crawling strategies:

Breadth-First Crawling
Depth-First Crawling

Some Valuable Information:

Comments
Links (Internal and External)
Sensitive Files
MetaData

robots.txt Structure:

  • User-agent
  • Directives
    • Disallow
    • Allow
    • Crawl-delay
    • Sitemap

Some /.well-known URIs:

/.well-known/change-password
/.well-known/openid-configuration
/.well-known/security.txt
/.well-known/mta-sts.txt
/.well-known/assetlinks.json
...and more
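
Both robots.txt and the .well-known URIs above can be checked quickly with the standard library. A minimal sketch, with http://domain.com as a placeholder:

import urllib.error
import urllib.request
import urllib.robotparser

base = "http://domain.com"  # placeholder target

# Parse robots.txt and see what the site asks crawlers to avoid
rp = urllib.robotparser.RobotFileParser()
rp.set_url(f"{base}/robots.txt")
rp.read()
print("Sitemaps:", rp.site_maps())
print("Can fetch /admin/ ?", rp.can_fetch("*", f"{base}/admin/"))

# Probe a few well-known URIs and report which ones are published
for uri in ("security.txt", "change-password", "openid-configuration", "mta-sts.txt"):
    try:
        resp = urllib.request.urlopen(f"{base}/.well-known/{uri}", timeout=5)
        print(f"[+] /.well-known/{uri} -> HTTP {resp.status}")
    except urllib.error.URLError:
        pass  # 404s and connection errors just mean the URI is not published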


Top Web Crawlers:

Scrapy (Python Framework)
Apache Nutch (Scalable Crawler)
Burp Suite Spider
OWASP ZAP (Zed Attack Proxy)

Using ReconSpider:

import scrapy
import json
import re
from urllib.parse import urlparse
from scrapy.crawler import CrawlerProcess
from scrapy.downloadermiddlewares.offsite import OffsiteMiddleware

class CustomOffsiteMiddleware(OffsiteMiddleware):
    def should_follow(self, request, spider):
        if not self.host_regex:
            return True
        # This modification allows domains with ports
        host = urlparse(request.url).netloc.split(':')[0]
        return bool(self.host_regex.search(host))

class WebReconSpider(scrapy.Spider):
    name = 'ReconSpider'
    
    def __init__(self, start_url, *args, **kwargs):
        super(WebReconSpider, self).__init__(*args, **kwargs)
        self.start_urls = [start_url]
        self.allowed_domains = [urlparse(start_url).netloc.split(':')[0]]
        self.visited_urls = set()
        self.results = {
            'emails': set(),
            'links': set(),
            'external_files': set(),
            'js_files': set(),
            'form_fields': set(),
            'images': set(),
            'videos': set(),
            'audio': set(),
            'comments': set(),
        }
        
    def parse(self, response):
        self.visited_urls.add(response.url)

        # Only process text responses
        if response.headers.get('Content-Type', b'').decode('utf-8').startswith('text'):
            # Extract emails
            emails = set(re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', response.text))
            self.results['emails'].update(emails)
        
            # Extract links
            links = response.css('a::attr(href)').getall()
            for link in links:
                if link.startswith('mailto:'):
                    continue
                parsed_link = urlparse(link)
                if not parsed_link.scheme:
                    link = response.urljoin(link)
                if urlparse(link).netloc == urlparse(response.url).netloc:
                    if link not in self.visited_urls:
                        yield response.follow(link, callback=self.parse)
                self.results['links'].add(link)
        
            # Extract external files (CSS, PDFs, etc.)
            external_files = response.css('link::attr(href), a::attr(href)').re(r'.*\.(css|pdf|docx?|xlsx?)$')
            for ext_file in external_files:
                self.results['external_files'].add(response.urljoin(ext_file))
        
            # Extract JS files
            js_files = response.css('script::attr(src)').getall()
            for js_file in js_files:
                self.results['js_files'].add(response.urljoin(js_file))
        
            # Extract form fields
            form_fields = response.css('input::attr(name), textarea::attr(name), select::attr(name)').getall()
            self.results['form_fields'].update(form_fields)
        
            # Extract images
            images = response.css('img::attr(src)').getall()
            for img in images:
                self.results['images'].add(response.urljoin(img))
        
            # Extract videos
            videos = response.css('video::attr(src), source::attr(src)').getall()
            for video in videos:
                self.results['videos'].add(response.urljoin(video))
        
            # Extract audio
            audio = response.css('audio::attr(src), source::attr(src)').getall()
            for aud in audio:
                self.results['audio'].add(response.urljoin(aud))
            
            # Extract comments
            comments = response.xpath('//comment()').getall()
            self.results['comments'].update(comments)
        else:
            # For non-text responses, just collect the URL
            self.results['external_files'].add(response.url)
        
        self.log(f"Processed {response.url}")

    def closed(self, reason):
        self.log("Crawl finished, converting results to JSON.")
        # Convert sets to lists for JSON serialization
        for key in self.results:
            self.results[key] = list(self.results[key])
        
        with open('results.json', 'w') as f:
            json.dump(self.results, f, indent=4)

        self.log(f"Results saved to results.json")

def run_crawler(start_url):
    process = CrawlerProcess(settings={
        'LOG_LEVEL': 'INFO',
        'DOWNLOADER_MIDDLEWARES': {
            '__main__.CustomOffsiteMiddleware': 500,
        }
    })
    process.crawl(WebReconSpider, start_url=start_url)
    process.start()

if __name__ == "__main__":
    import argparse
    
    parser = argparse.ArgumentParser(description="ReconSpider")
    parser.add_argument("start_url", help="The starting URL for the web crawler")
    args = parser.parse_args()
    
    run_crawler(args.start_url)
root@kakarot$ python3 ReconSpider.py http://domain.com

Search Engine

Search Operators:

| Operator | What It Does | Example | Explanation |
| --- | --- | --- | --- |
| site: | Limit search to a single website or domain | site:kakarot.info | See only pages from kakarot.info |
| inurl: | Find pages that have a certain word in the URL | inurl:forum site:kakarot.info | Look for forum pages on kakarot.info |
| filetype: | Search for specific file types | filetype:pdf site:kakarot.info | Find PDF files on kakarot.info |
| intitle: | Look for a word in the page title | intitle:"guide" site:kakarot.info | Find pages with "guide" in the title |
| intext: | Search for a word in the main page content | intext:"Dragon Ball" site:kakarot.info | Find pages that mention "Dragon Ball" |
| cache: | See Google's stored copy of a page | cache:kakarot.info | View the cached version of kakarot.info |
| link: | Find pages that link to a specific site | link:kakarot.info | Discover who links to kakarot.info |
| related: | Find websites similar to a domain | related:kakarot.info | See sites related to kakarot.info |
| info: | Get basic info about a website | info:kakarot.info | Shows title, description, and summary of kakarot.info |
| define: | Get the meaning of a word | define:kakarot | Look up the definition of "kakarot" |
| numrange: | Search for numbers in a range | site:kakarot.info numrange:1-100 | Find pages mentioning numbers from 1 to 100 |
| allintext: | Find pages containing all words in the content | allintext:"character stats" site:kakarot.info | Pages containing both "character" and "stats" |
| allinurl: | Find pages containing all words in the URL | allinurl:forum topic site:kakarot.info | Look for URLs with both "forum" and "topic" |
| allintitle: | Find pages containing all words in the title | allintitle:"Dragon Ball guide" site:kakarot.info | Pages with "Dragon Ball" and "guide" in the title |
| AND | Make search results include all terms | site:kakarot.info AND intext:"Saiyan" | Only pages with both conditions |
| OR | Include any of multiple terms | site:kakarot.info "Goku" OR "Vegeta" | Pages mentioning either "Goku" or "Vegeta" |
| NOT | Exclude a term | site:kakarot.info NOT intext:"ads" | Exclude pages that mention ads |
| * (wildcard) | Match any word | site:kakarot.info "character * stats" | Search for any word between "character" and "stats" |
| .. (range) | Search between two numbers | site:kakarot.info "level" 1..50 | Find pages mentioning levels from 1 to 50 |
| " " (quotes) | Search exact phrase | "Dragon Ball Z guide" site:kakarot.info | Find pages that mention exactly "Dragon Ball Z guide" |
| - (minus) | Exclude words | site:kakarot.info -intext:"forum" | Exclude pages about forums |

Google Dorking:

  • Looking For Login Pages:
    • site:website.com inurl:login
    • site:website.com (inurl:login OR inurl:admin)
  • Looking For Exposed Files:
    • site:website.com filetype:pdf
    • site:website.com (filetype:xls OR filetype:docx)
  • Looking For Configuration Files:
    • site:website.com inurl:config.php
    • site:website.com (ext:cnf OR ext:conf)
  • Locating Database Backups:
    • site:website.com inurl:backup
    • site:website.com filetype:sql

Google Hacking Database (GHDB): a public archive of ready-made Google dorks.

Automating Recon

Top Tools:

FinalRecon
Recon-ng
theHarvester
SpiderFoot
OSINT Framework


Recon Using FinalRecon:

root@kakarot$ finalrecon -h                                              
usage: finalrecon [-h] [--url URL] [--headers] [--sslinfo] [--whois] [--crawl] [--dns] [--sub] [--dir] [--wayback] [--ps] [--full] [-nb] [-dt DT] [-pt PT] [-T T] [-w W] [-r] [-s] [-sp SP]
                  [-d D] [-e E] [-o O] [-cd CD] [-k K]

FinalRecon - All in One Web Recon | v1.1.7

options:
  -h, --help  show this help message and exit
  --url URL   Target URL
  --headers   Header Information
  --sslinfo   SSL Certificate Information
  --whois     Whois Lookup
  --crawl     Crawl Target
  --dns       DNS Enumeration
  --sub       Sub-Domain Enumeration
  --dir       Directory Search
  --wayback   Wayback URLs
  --ps        Fast Port Scan
  --full      Full Recon

Extra Options:
  -nb         Hide Banner
  -dt DT      Number of threads for directory enum [ Default : 30 ]
  -pt PT      Number of threads for port scan [ Default : 50 ]
  -T T        Request Timeout [ Default : 30.0 ]
  -w W        Path to Wordlist [ Default : wordlists/dirb_common.txt ]
  -r          Allow Redirect [ Default : False ]
  -s          Toggle SSL Verification [ Default : True ]
  -sp SP      Specify SSL Port [ Default : 443 ]
  -d D        Custom DNS Servers [ Default : 1.1.1.1 ]
  -e E        File Extensions [ Example : txt, xml, php ]
  -o O        Export Format [ Default : txt ]
  -cd CD      Change export directory [ Default : ~/.local/share/finalrecon ]
  -k K        Add API key [ Example : shodan@key ]

Hmmmmmmm… Done !


This post is licensed under CC BY 4.0 by the author.