328 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Markdown
		
	
	
		
			Executable File
		
	
	
	
	
			
		
		
	
	
			328 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Markdown
		
	
	
		
			Executable File
		
	
	
	
	
| # tldts - Blazing Fast URL Parsing
 | |
| 
 | |
| `tldts` is a JavaScript library to extract hostnames, domains, public suffixes, top-level domains and subdomains from URLs.
 | |
| 
 | |
| **Features**:
 | |
| 
 | |
| 1. Tuned for **performance** (order of 0.1 to 1 μs per input)
 | |
| 2. Handles both URLs and hostnames
 | |
| 3. Full Unicode/IDNA support
 | |
| 4. Support parsing email addresses
 | |
| 5. Detect IPv4 and IPv6 addresses
 | |
| 6. Continuously updated version of the public suffix list
 | |
| 7. **TypeScript**, ships with `umd`, `esm`, `cjs` bundles and _type definitions_
 | |
| 8. Small bundles and small memory footprint
 | |
| 9. Battle tested: full test coverage and production use
 | |
| 
 | |
| # Install
 | |
| 
 | |
| ```bash
 | |
| npm install --save tldts
 | |
| ```
 | |
| 
 | |
| # Usage
 | |
| 
 | |
| Using the command-line interface:
 | |
| 
 | |
| ```js
 | |
| $ npx tldts 'http://www.writethedocs.org/conf/eu/2017/'
 | |
| {
 | |
|   "domain": "writethedocs.org",
 | |
|   "domainWithoutSuffix": "writethedocs",
 | |
|   "hostname": "www.writethedocs.org",
 | |
|   "isIcann": true,
 | |
|   "isIp": false,
 | |
|   "isPrivate": false,
 | |
|   "publicSuffix": "org",
 | |
|   "subdomain": "www"
 | |
| }
 | |
| ```
 | |
| 
 | |
| Programmatically:
 | |
| 
 | |
| ```js
 | |
| const { parse } = require('tldts');
 | |
| 
 | |
| // Retrieving hostname related informations of a given URL
 | |
| parse('http://www.writethedocs.org/conf/eu/2017/');
 | |
| // { domain: 'writethedocs.org',
 | |
| //   domainWithoutSuffix: 'writethedocs',
 | |
| //   hostname: 'www.writethedocs.org',
 | |
| //   isIcann: true,
 | |
| //   isIp: false,
 | |
| //   isPrivate: false,
 | |
| //   publicSuffix: 'org',
 | |
| //   subdomain: 'www' }
 | |
| ```
 | |
| 
 | |
| Modern _ES6 modules import_ is also supported:
 | |
| 
 | |
| ```js
 | |
| import { parse } from 'tldts';
 | |
| ```
 | |
| 
 | |
| Alternatively, you can try it _directly in your browser_ here: https://npm.runkit.com/tldts
 | |
| 
 | |
| # API
 | |
| 
 | |
| - `tldts.parse(url | hostname, options)`
 | |
| - `tldts.getHostname(url | hostname, options)`
 | |
| - `tldts.getDomain(url | hostname, options)`
 | |
| - `tldts.getPublicSuffix(url | hostname, options)`
 | |
| - `tldts.getSubdomain(url, | hostname, options)`
 | |
| - `tldts.getDomainWithoutSuffix(url | hostname, options)`
 | |
| 
 | |
| The behavior of `tldts` can be customized using an `options` argument for all
 | |
| the functions exposed as part of the public API. This is useful to both change
 | |
| the behavior of the library as well as fine-tune the performance depending on
 | |
| your inputs.
 | |
| 
 | |
| ```js
 | |
| {
 | |
|   // Use suffixes from ICANN section (default: true)
 | |
|   allowIcannDomains: boolean;
 | |
|   // Use suffixes from Private section (default: false)
 | |
|   allowPrivateDomains: boolean;
 | |
|   // Extract and validate hostname (default: true)
 | |
|   // When set to `false`, inputs will be considered valid hostnames.
 | |
|   extractHostname: boolean;
 | |
|   // Validate hostnames after parsing (default: true)
 | |
|   // If a hostname is not valid, not further processing is performed. When set
 | |
|   // to `false`, inputs to the library will be considered valid and parsing will
 | |
|   // proceed regardless.
 | |
|   validateHostname: boolean;
 | |
|   // Perform IP address detection (default: true).
 | |
|   detectIp: boolean;
 | |
|   // Assume that both URLs and hostnames can be given as input (default: true)
 | |
|   // If set to `false` we assume only URLs will be given as input, which
 | |
|   // speed-ups processing.
 | |
|   mixedInputs: boolean;
 | |
|   // Specifies extra valid suffixes (default: null)
 | |
|   validHosts: string[] | null;
 | |
| }
 | |
| ```
 | |
| 
 | |
| The `parse` method returns handy **properties about a URL or a hostname**.
 | |
| 
 | |
| ```js
 | |
| const tldts = require('tldts');
 | |
| 
 | |
| tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
 | |
| // { domain: 'amazonaws.com',
 | |
| //   domainWithoutSuffix: 'amazonaws',
 | |
| //   hostname: 'spark-public.s3.amazonaws.com',
 | |
| //   isIcann: true,
 | |
| //   isIp: false,
 | |
| //   isPrivate: false,
 | |
| //   publicSuffix: 'com',
 | |
| //   subdomain: 'spark-public.s3' }
 | |
| 
 | |
| tldts.parse(
 | |
|   'https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv',
 | |
|   { allowPrivateDomains: true },
 | |
| );
 | |
| // { domain: 'spark-public.s3.amazonaws.com',
 | |
| //   domainWithoutSuffix: 'spark-public',
 | |
| //   hostname: 'spark-public.s3.amazonaws.com',
 | |
| //   isIcann: false,
 | |
| //   isIp: false,
 | |
| //   isPrivate: true,
 | |
| //   publicSuffix: 's3.amazonaws.com',
 | |
| //   subdomain: '' }
 | |
| 
 | |
| tldts.parse('gopher://domain.unknown/');
 | |
| // { domain: 'domain.unknown',
 | |
| //   domainWithoutSuffix: 'domain',
 | |
| //   hostname: 'domain.unknown',
 | |
| //   isIcann: false,
 | |
| //   isIp: false,
 | |
| //   isPrivate: false,
 | |
| //   publicSuffix: 'unknown',
 | |
| //   subdomain: '' }
 | |
| 
 | |
| tldts.parse('https://192.168.0.0'); // IPv4
 | |
| // { domain: null,
 | |
| //   domainWithoutSuffix: null,
 | |
| //   hostname: '192.168.0.0',
 | |
| //   isIcann: null,
 | |
| //   isIp: true,
 | |
| //   isPrivate: null,
 | |
| //   publicSuffix: null,
 | |
| //   subdomain: null }
 | |
| 
 | |
| tldts.parse('https://[::1]'); // IPv6
 | |
| // { domain: null,
 | |
| //   domainWithoutSuffix: null,
 | |
| //   hostname: '::1',
 | |
| //   isIcann: null,
 | |
| //   isIp: true,
 | |
| //   isPrivate: null,
 | |
| //   publicSuffix: null,
 | |
| //   subdomain: null }
 | |
| 
 | |
| tldts.parse('tldts@emailprovider.co.uk'); // email
 | |
| // { domain: 'emailprovider.co.uk',
 | |
| //   domainWithoutSuffix: 'emailprovider',
 | |
| //   hostname: 'emailprovider.co.uk',
 | |
| //   isIcann: true,
 | |
| //   isIp: false,
 | |
| //   isPrivate: false,
 | |
| //   publicSuffix: 'co.uk',
 | |
| //   subdomain: '' }
 | |
| ```
 | |
| 
 | |
| | Property Name         | Type   | Description                                     |
 | |
| | :-------------------- | :----- | :---------------------------------------------- |
 | |
| | `hostname`            | `str`  | `hostname` of the input extracted automatically |
 | |
| | `domain`              | `str`  | Domain (tld + sld)                              |
 | |
| | `domainWithoutSuffix` | `str`  | Domain without public suffix                    |
 | |
| | `subdomain`           | `str`  | Sub domain (what comes after `domain`)          |
 | |
| | `publicSuffix`        | `str`  | Public Suffix (tld) of `hostname`               |
 | |
| | `isIcann`             | `bool` | Does TLD come from ICANN part of the list       |
 | |
| | `isPrivate`           | `bool` | Does TLD come from Private part of the list     |
 | |
| | `isIP`                | `bool` | Is `hostname` an IP address?                    |
 | |
| 
 | |
| ## Single purpose methods
 | |
| 
 | |
| These methods are shorthands if you want to retrieve only a single value (and
 | |
| will perform better than `parse` because less work will be needed).
 | |
| 
 | |
| ### getHostname(url | hostname, options?)
 | |
| 
 | |
| Returns the hostname from a given string.
 | |
| 
 | |
| ```javascript
 | |
| const { getHostname } = require('tldts');
 | |
| 
 | |
| getHostname('google.com'); // returns `google.com`
 | |
| getHostname('fr.google.com'); // returns `fr.google.com`
 | |
| getHostname('fr.google.google'); // returns `fr.google.google`
 | |
| getHostname('foo.google.co.uk'); // returns `foo.google.co.uk`
 | |
| getHostname('t.co'); // returns `t.co`
 | |
| getHostname('fr.t.co'); // returns `fr.t.co`
 | |
| getHostname(
 | |
|   'https://user:password@example.co.uk:8080/some/path?and&query#hash',
 | |
| ); // returns `example.co.uk`
 | |
| ```
 | |
| 
 | |
| ### getDomain(url | hostname, options?)
 | |
| 
 | |
| Returns the fully qualified domain from a given string.
 | |
| 
 | |
| ```javascript
 | |
| const { getDomain } = require('tldts');
 | |
| 
 | |
| getDomain('google.com'); // returns `google.com`
 | |
| getDomain('fr.google.com'); // returns `google.com`
 | |
| getDomain('fr.google.google'); // returns `google.google`
 | |
| getDomain('foo.google.co.uk'); // returns `google.co.uk`
 | |
| getDomain('t.co'); // returns `t.co`
 | |
| getDomain('fr.t.co'); // returns `t.co`
 | |
| getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `example.co.uk`
 | |
| ```
 | |
| 
 | |
| ### getDomainWithoutSuffix(url | hostname, options?)
 | |
| 
 | |
| Returns the domain (as returned by `getDomain(...)`) without the public suffix part.
 | |
| 
 | |
| ```javascript
 | |
| const { getDomainWithoutSuffix } = require('tldts');
 | |
| 
 | |
| getDomainWithoutSuffix('google.com'); // returns `google`
 | |
| getDomainWithoutSuffix('fr.google.com'); // returns `google`
 | |
| getDomainWithoutSuffix('fr.google.google'); // returns `google`
 | |
| getDomainWithoutSuffix('foo.google.co.uk'); // returns `google`
 | |
| getDomainWithoutSuffix('t.co'); // returns `t`
 | |
| getDomainWithoutSuffix('fr.t.co'); // returns `t`
 | |
| getDomainWithoutSuffix(
 | |
|   'https://user:password@example.co.uk:8080/some/path?and&query#hash',
 | |
| ); // returns `example`
 | |
| ```
 | |
| 
 | |
| ### getSubdomain(url | hostname, options?)
 | |
| 
 | |
| Returns the complete subdomain for a given string.
 | |
| 
 | |
| ```javascript
 | |
| const { getSubdomain } = require('tldts');
 | |
| 
 | |
| getSubdomain('google.com'); // returns ``
 | |
| getSubdomain('fr.google.com'); // returns `fr`
 | |
| getSubdomain('google.co.uk'); // returns ``
 | |
| getSubdomain('foo.google.co.uk'); // returns `foo`
 | |
| getSubdomain('moar.foo.google.co.uk'); // returns `moar.foo`
 | |
| getSubdomain('t.co'); // returns ``
 | |
| getSubdomain('fr.t.co'); // returns `fr`
 | |
| getSubdomain(
 | |
|   'https://user:password@secure.example.co.uk:443/some/path?and&query#hash',
 | |
| ); // returns `secure`
 | |
| ```
 | |
| 
 | |
| ### getPublicSuffix(url | hostname, options?)
 | |
| 
 | |
| Returns the [public suffix][] for a given string.
 | |
| 
 | |
| ```javascript
 | |
| const { getPublicSuffix } = require('tldts');
 | |
| 
 | |
| getPublicSuffix('google.com'); // returns `com`
 | |
| getPublicSuffix('fr.google.com'); // returns `com`
 | |
| getPublicSuffix('google.co.uk'); // returns `co.uk`
 | |
| getPublicSuffix('s3.amazonaws.com'); // returns `com`
 | |
| getPublicSuffix('s3.amazonaws.com', { allowPrivateDomains: true }); // returns `s3.amazonaws.com`
 | |
| getPublicSuffix('tld.is.unknown'); // returns `unknown`
 | |
| ```
 | |
| 
 | |
| # Troubleshooting
 | |
| 
 | |
| ## Retrieving subdomain of `localhost` and custom hostnames
 | |
| 
 | |
| `tldts` methods `getDomain` and `getSubdomain` are designed to **work only with _known and valid_ TLDs**.
 | |
| This way, you can trust what a domain is.
 | |
| 
 | |
| `localhost` is a valid hostname but not a TLD. You can pass additional options to each method exposed by `tldts`:
 | |
| 
 | |
| ```js
 | |
| const tldts = require('tldts');
 | |
| 
 | |
| tldts.getDomain('localhost'); // returns null
 | |
| tldts.getSubdomain('vhost.localhost'); // returns null
 | |
| 
 | |
| tldts.getDomain('localhost', { validHosts: ['localhost'] }); // returns 'localhost'
 | |
| tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] }); // returns 'vhost'
 | |
| ```
 | |
| 
 | |
| ## Updating the TLDs List
 | |
| 
 | |
| `tldts` made the opinionated choice of shipping with a list of suffixes directly
 | |
| in its bundle. There is currently no mechanism to update the lists yourself, but
 | |
| we make sure that the version shipped is always up-to-date.
 | |
| 
 | |
| If you keep `tldts` updated, the lists should be up-to-date as well!
 | |
| 
 | |
| # Performance
 | |
| 
 | |
| `tldts` is the _fastest JavaScript library_ available for parsing hostnames. It is able to parse _millions of inputs per second_ (typically 2-3M depending on your hardware and inputs). It also offers granular options to fine-tune the behavior and performance of the library depending on the kind of inputs you are dealing with (e.g.: if you know you only manipulate valid hostnames you can disable the hostname extraction step with `{ extractHostname: false }`).
 | |
| 
 | |
| Please see [this detailed comparison](./comparison/comparison.md) with other available libraries.
 | |
| 
 | |
| ## Contributors
 | |
| 
 | |
| `tldts` is based upon the excellent `tld.js` library and would not exist without
 | |
| the many contributors who worked on the project:
 | |
| <a href="graphs/contributors"><img src="https://opencollective.com/tldjs/contributors.svg?width=890" /></a>
 | |
| 
 | |
| This project would not be possible without the amazing Mozilla's
 | |
| [public suffix list][]. Thank you for your hard work!
 | |
| 
 | |
| # License
 | |
| 
 | |
| [MIT License](LICENSE).
 | |
| 
 | |
| [badge-ci]: https://secure.travis-ci.org/remusao/tldts.svg?branch=master
 | |
| [badge-downloads]: https://img.shields.io/npm/dm/tldts.svg
 | |
| [public suffix list]: https://publicsuffix.org/list/
 | |
| [list the recent changes]: https://github.com/publicsuffix/list/commits/master
 | |
| [changes Atom Feed]: https://github.com/publicsuffix/list/commits/master.atom
 | |
| [public suffix]: https://publicsuffix.org/learn/
 |