Extract Website Content Tool step

The Extract Website Content Tool step lets you scrape a website and convert its content into text or HTML for your Agents and Tools.

Add the Extract Website Content Tool step to your Tool

To add the Extract Website Content Tool step to your Tool:

Create a new Tool, then search for the ‘Extract Website Content’ Tool step
Click ‘Expand’ to see the full Tool step
In ‘Website URL’, enter the URL of the website you want to scrape
Under ‘Method’, select ‘Text’ or ‘HTML’ as the format for the extracted content
Click ‘Run step’ to test the Tool step with your inputs

Advanced Settings

Model

You can choose between two models to use for this Tool step:

Apify
Browserless

The Advanced Settings available depend on which Model you choose.

Apify Advanced Settings

Scrape Type

You can select between two scrape types:

Simple HTML (cheaper)
Full Web Page (expensive)

Use proxies

You can route the scrape through proxies. This is more expensive.

Max depth

The maximum number of link hops from the start URL that the crawler will follow recursively. The start URL has a depth of 0. Capped at 10.

Max pages

The maximum number of pages to crawl. Capped at 100.
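To make the two limits concrete, here is a minimal sketch of a breadth-first crawl over a small in-memory link map. It is not the crawler this Tool step uses; the `crawl` function, the `LINKS` map, and the example URLs are hypothetical and only illustrate how a depth limit and a page limit interact.

```python
from collections import deque

# Hypothetical in-memory "website": each URL maps to the links found on that page.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/b": ["https://example.com/a", "https://example.com/b/1"],
    "https://example.com/a/1": [],
    "https://example.com/b/1": [],
}


def crawl(start_url, max_depth, max_pages, links=LINKS):
    """Breadth-first crawl that respects a depth limit and a page limit."""
    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])  # the start URL has a depth of 0
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


print(crawl("https://example.com/", max_depth=1, max_pages=100))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

With a max depth of 0 only the start URL is visited. Lowering the page limit stops the crawl early even if deeper links remain within the depth limit.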

Browserless Advanced Settings

These Advanced Settings can only be used if you select ‘Text’ as your ‘Method’ in the Tool step.

Element selector

You can specify which HTML element to scrape. By default, this is set to body. Using ‘+ New item’, you can specify a list of elements to be scraped.
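The effect of an element selector can be sketched with Python's standard library alone. This is an illustration, not how Browserless works internally: the `SelectedText` class and the `extract_text` function are hypothetical, and the sketch matches plain tag names only, which is an assumption about the selector format.

```python
from html.parser import HTMLParser


class SelectedText(HTMLParser):
    """Collect text that appears inside any of the selected tag names."""

    def __init__(self, selectors):
        super().__init__()
        self.selectors = set(selectors)
        self.depth = 0  # number of selected elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.selectors:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.selectors and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_text(html, selectors=("body",)):
    """Return the text inside the selected elements, joined by spaces."""
    parser = SelectedText(selectors)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><body><h1>Pricing</h1><p>Plans start at $10.</p>"
    "<footer>Contact us</footer></body></html>"
)
print(extract_text(page))               # Pricing Plans start at $10. Contact us
print(extract_text(page, ["h1", "p"]))  # Pricing Plans start at $10.
```

Selecting body returns all visible text, while a narrower list such as h1 and p leaves out the footer.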

Extra headers

If a website requires additional information before it can be scraped, provide that data as a JSON object. The object below shows an example where an authentication token called auth-token and a user-id are required.

{
    "auth-token":"AUTHENTICATION-TOKEN",
    "user-id":"USER-ID"
}
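For context, a JSON object like the one above maps header names to values that accompany an HTTP request. The sketch below builds such a request with Python's standard library without sending it. How the Tool step itself attaches the headers is not described here, so `build_request` and the example URL are hypothetical.

```python
import json
import urllib.request

# The same JSON object as above, parsed into a Python dict.
extra_headers = json.loads("""
{
    "auth-token": "AUTHENTICATION-TOKEN",
    "user-id": "USER-ID"
}
""")


def build_request(url, headers):
    """Build (but do not send) a GET request that carries the extra headers."""
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_request("https://example.com/private", extra_headers)

# urllib stores header names with only the first letter capitalised.
print(request.get_header("Auth-token"))  # AUTHENTICATION-TOKEN
print(request.get_header("User-id"))     # USER-ID
```

Replace the placeholder values with real credentials only in the Tool step itself, and avoid committing them to shared files.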

Common errors

Over max depth

The error message below indicates that the ‘Max depth’ value is outside the allowed range of 0 to 10. The example shown is for a value over 10.

Studio transformation browserless_scrape input validation error: must be <= 10 {"comparison":"<=","limit":10} /model/max_depth

Over max pages

The error message below indicates that the ‘Max pages’ value has been set to over 100.

Studio transformation browserless_scrape input validation error: must be <= 100 {"comparison":"<=","limit":100} /model/max_pages
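If the values come from another step or a script, checking them against the documented caps first avoids both errors. The sketch below is a hypothetical helper, not part of the product; `check_crawl_settings` and its messages are made up for illustration, and only the limits themselves (0 to 10 for max depth, at most 100 for max pages) come from this page.

```python
MAX_DEPTH_LIMIT = 10   # cap stated for 'Max depth'
MAX_PAGES_LIMIT = 100  # cap stated for 'Max pages'


def check_crawl_settings(max_depth, max_pages):
    """Return a list of problems; an empty list means both values are within the caps."""
    problems = []
    if not 0 <= max_depth <= MAX_DEPTH_LIMIT:
        problems.append(
            f"max_depth must be between 0 and {MAX_DEPTH_LIMIT}, got {max_depth}"
        )
    if max_pages > MAX_PAGES_LIMIT:
        problems.append(f"max_pages must be <= {MAX_PAGES_LIMIT}, got {max_pages}")
    return problems


print(check_crawl_settings(max_depth=3, max_pages=50))    # []
print(check_crawl_settings(max_depth=12, max_pages=250))
# ['max_depth must be between 0 and 10, got 12', 'max_pages must be <= 100, got 250']
```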

What’s the difference between this Tool step and Firecrawl?

Firecrawl is another Tool step you can use to scrape and extract website content. Unlike this Tool step, it requires you to sign up for Firecrawl and bring your own Firecrawl API Key. In return, it offers more options for scraping outputs. You can use either Tool step, and we recommend trying both to see which one better suits your Agents and Tools.
