Add the Extract Website Content Tool step to your Tool
You can add the Extract Website Content Tool step to your Tool as follows:
- Create a new Tool, then search for the ‘Extract Website Content’ Tool step
- Click ‘Expand’ to see the full Tool step
- Enter the URL of the website you want to scrape and extract content from in ‘Website URL’
- Under ‘Method’, select ‘Text’ or ‘HTML’, depending on the format in which you want the website content extracted
- Click ‘Run step’ to test out your Tool step with your inputs!
Advanced Settings
Model
You can choose between two models to use for this Tool step:
- Apify
- Browserless
Apify Advanced Settings
Scrape Type
You can select between two scrape types:
- Simple HTML (cheaper)
- Full Web Page (expensive)
Use proxies
You can use proxies to scrape the website - this is more expensive.
Max depth
The maximum number of links starting from the start URL that the crawler will recursively follow. The start URLs have a depth of 0. Capped at 10.
Max pages
The maximum number of pages to crawl. Capped at 100.
Browserless Advanced Settings
These Advanced Settings can only be used if you select ‘Text’ as your ‘Method’ in the Tool step.
Element selector
You can specify which HTML element to scrape. By default, it is set to body. Note that by using ‘+ New item’, you can specify a list of elements to be scraped.
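To illustrate what restricting a scrape to selected elements means, here is a minimal Python sketch using only the standard library. It is not how the Tool step is implemented: the names ElementTextExtractor and extract_text are hypothetical, and the sketch assumes each selector is a plain tag name such as body or h1, which may be narrower than what the Tool step accepts.

```python
from html.parser import HTMLParser


class ElementTextExtractor(HTMLParser):
    """Collect text found inside any of the selected tag names (hypothetical helper)."""

    def __init__(self, selectors=("body",)):
        super().__init__()
        # Assumption: selectors are plain tag names, not full CSS selectors.
        self.selectors = {s.lower() for s in selectors}
        self.inside = 0    # number of selected elements currently open
        self.skipping = 0  # nesting level inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping += 1
        if tag in self.selectors:
            self.inside += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skipping:
            self.skipping -= 1
        if tag in self.selectors and self.inside:
            self.inside -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when inside a selected element and outside script/style.
        if text and self.inside and not self.skipping:
            self.chunks.append(text)


def extract_text(html, selectors=("body",)):
    """Return the text inside the selected elements, joined by single spaces."""
    parser = ElementTextExtractor(selectors)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><title>T</title></head><body><h1>Title</h1>"
    "<p>Hello <b>world</b></p><script>var x = 1;</script></body></html>"
)
print(extract_text(page))          # default selector: body
print(extract_text(page, ["h1"]))  # a narrower selection
```

Passing a list such as ["h1", "b"] mirrors adding several entries with ‘+ New item’: text from every listed element is collected in document order.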
Extra headers
If you need to provide special information to be able to scrape a website, provide the data as a JSON object. For example, a website might require an authentication token called auth-token and a user-id.
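As a rough sketch of what such an object might look like, the Python snippet below builds and serializes one. The header names auth-token and user-id come from the text above; the values are placeholders, and the helper name build_extra_headers is hypothetical.

```python
import json


def build_extra_headers(auth_token, user_id):
    """Return the extra headers as a plain dict (hypothetical helper)."""
    return {"auth-token": auth_token, "user-id": user_id}


# Placeholder values: replace them with whatever the target website expects.
extra_headers_json = json.dumps(
    build_extra_headers("YOUR_AUTH_TOKEN", "YOUR_USER_ID"), indent=2
)
print(extra_headers_json)
```

The printed JSON is the kind of object you would paste into ‘Extra headers’.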
Common errors
Over max depth
This error indicates that the ‘Max depth’ value has been set below 0 or above 10.
Over max pages
This error indicates that the ‘Max pages’ value has been set above 100.
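To make the ‘Max depth’ and ‘Max pages’ limits concrete, here is a minimal Python sketch of a breadth-first crawl over an in-memory link graph. It is an illustration under stated assumptions, not the crawler the Tool step uses: the function names, the link graph, and the error wording are all hypothetical. Only the caps (depth 0 to 10, at most 100 pages) and the rule that the start URL has depth 0 come from the text above.

```python
from collections import deque

MAX_DEPTH_CAP = 10   # from the documentation: 'Max depth' is capped at 10
MAX_PAGES_CAP = 100  # from the documentation: 'Max pages' is capped at 100


def validate_limits(max_depth, max_pages):
    """Raise ValueError for out-of-range limits (error wording is hypothetical)."""
    if not 0 <= max_depth <= MAX_DEPTH_CAP:
        raise ValueError("Over max depth: value must be between 0 and 10")
    # The documentation only states the upper bound, so no lower bound is checked.
    if max_pages > MAX_PAGES_CAP:
        raise ValueError("Over max pages: value must be at most 100")


def crawl(links, start_url, max_depth, max_pages):
    """Return the pages visited breadth-first within the given limits.

    `links` maps each URL to the URLs it links to (a stand-in for real fetching).
    """
    validate_limits(max_depth, max_pages)
    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])  # the start URL has a depth of 0
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the maximum depth
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


site = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}
print(crawl(site, "a", max_depth=1, max_pages=100))
```

With a depth of 0 only the start page is visited, and lowering max_pages stops the crawl early even when more links remain within the depth limit.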