Javascript Scenario

Scrapfly's js_scenario parameter gives you full control of a headless web browser. A Javascript Scenario can issue browser commands such as clicking buttons, filling in forms, scrolling, and executing custom JavaScript code.

This feature requires Javascript Rendering to be enabled (render_js=true), and the target page must be an HTML page; otherwise the scenario is not executed.

Javascript Scenario details are available in the API response under result.browser_data.js_scenario, as well as on the monitoring dashboard:

[Image: Javascript Scenario view on the monitoring dashboard]

Usage

A Javascript Scenario consists of one or more browser actions, passed to Scrapfly as a base64-encoded JSON array. A typical scenario is a short sequence of steps such as fill, click and wait, like the login example under "Full example with API Player".

Each scenario step is a JSON object with a single key: the key names the action to perform, and its value holds the details of that action.

Once you have designed your Javascript Scenario, use Scrapfly's online base64 encoding tool to convert it to a base64-encoded string that can be passed to the API for execution.
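As an alternative to the online tool, the encoding can be done locally. The following is a minimal sketch, assuming the standard base64 alphabet is accepted (this page only says "base64 encoded"); the helper names encode_scenario and decode_scenario are hypothetical and not part of any Scrapfly SDK.

```python
import base64
import json


def encode_scenario(steps: list) -> str:
    """Serialize the steps to compact JSON, then base64-encode the result."""
    raw = json.dumps(steps, separators=(",", ":")).encode("utf-8")
    # Assumption: standard base64 alphabet; the docs do not say which variant is expected.
    return base64.b64encode(raw).decode("ascii")


def decode_scenario(encoded: str) -> list:
    """Inverse of encode_scenario, handy for inspecting an encoded value."""
    return json.loads(base64.b64decode(encoded))


# Each step is a single-key object: the key names the action to perform.
steps = [
    {"click": {"selector": "button[type='submit']"}},
    {"wait_for_navigation": {"timeout": 5000}},
]
encoded = encode_scenario(steps)
```

The resulting string is what would go into the js_scenario query parameter.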

Note on Timeouts

The entire Javascript Scenario has an execution budget of 25 seconds. Scrapfly roughly estimates the maximum execution time of a scenario before running it and rejects any scenario estimated to take more than 25 seconds.
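To get a feel for the budget before sending a request, the per-step "timeout budget" figures listed under Scenario Step Types can be summed locally. This is only a rough sketch and not Scrapfly's actual estimator: the function names are hypothetical, the step keys other than fill, click and wait_for_navigation are assumed to be the snake_case form of their headings, and the fill step's ${timeout} term is treated as 0 unless a timeout is given, which is a guess.

```python
BUDGET_LIMIT_MS = 25_000  # overall scenario budget stated in this document


def step_budget_ms(step: dict) -> int:
    """Return the budget (ms) one step adds, using the documented per-step figures."""
    ((action, details),) = step.items()  # each step has exactly one key
    if action == "click":
        return 2500
    if action == "fill":
        # Documented as "+${timeout} +500"; the timeout term is an assumption here.
        return details.get("timeout", 0) + 500
    if action == "wait":
        return details  # the value itself is the pause in milliseconds
    if action == "scroll":
        return 500
    if action == "execute":
        return details.get("timeout", 3000)
    if action == "wait_for_navigation":
        return details.get("timeout", 1000) + 1500
    if action == "wait_for_selector":
        return details.get("timeout", 5000)
    raise ValueError(f"no documented budget for step type: {action}")


def estimate_budget_ms(steps: list) -> int:
    """Sum the per-step budgets of a whole scenario."""
    return sum(step_budget_ms(step) for step in steps)


login = [
    {"fill": {"selector": "input[name=username]", "value": "user123"}},
    {"fill": {"selector": "input[name=password]", "value": "password"}},
    {"click": {"selector": "button[type='submit']"}},
    {"wait_for_navigation": {"timeout": 5000}},
]
total = estimate_budget_ms(login)
within_budget = total <= BUDGET_LIMIT_MS
```

Under these assumptions the login scenario adds up to 500 + 500 + 2500 + 6500 milliseconds, well inside the 25-second limit.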

For long-running Javascript Scenarios that need more than 25 seconds, see the documentation on how timeouts work.
TL;DR: with retry=false, the request times out after 90 seconds by default, and you can set a custom timeout with retry=false&timeout=120000.
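The query string for such a request could be assembled as in the sketch below. The parameter names come from this page; the js_scenario value is a placeholder, and reading timeout=120000 as milliseconds (120 seconds) is an inference from the 90-second default mentioned above.

```python
from urllib.parse import urlencode

params = {
    "key": "__API_KEY__",
    "url": "https://web-scraping.dev/login",
    "render_js": "true",
    "js_scenario": "BASE64_SCENARIO",  # placeholder for the encoded scenario
    "retry": "false",                  # required to customize the timeout
    "timeout": 120000,                 # assumed to be milliseconds
}
query = urlencode(params)
request_url = "https://api.scrapfly.io/scrape?" + query
```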

Full example with API Player

The best way to get familiar with Javascript Scenarios is to use the Scrapfly Web Player to design and test your scenario. To get you started, here is an example scenario that logs in to web-scraping.dev/login by performing these steps:

  • Select the username input box and fill in the value user123
  • Select the password input box and fill in the value password
  • Select and click the login button
  • Wait up to 5 seconds for the navigation triggered by the button click
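The four steps above can be written out as a scenario and encoded locally. The step contents below were obtained by decoding the base64 string used in the request that follows; the Python wrapper around them is just an illustrative sketch.

```python
import base64
import json

# The login scenario: fill both inputs, click submit, wait for navigation.
login_scenario = [
    {"fill": {"selector": "input[name=username]", "value": "user123"}},
    {"fill": {"selector": "input[name=password]", "value": "password"}},
    {"click": {"selector": "button[type='submit']"}},
    {"wait_for_navigation": {"timeout": 5000}},
]

# Compact JSON (no spaces) reproduces the exact string used in this example.
compact_json = json.dumps(login_scenario, separators=(",", ":"))
js_scenario = base64.b64encode(compact_json.encode("utf-8")).decode("ascii")
```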

Then, this scenario can be base64 encoded and passed to Scrapfly API for execution:

curl -G \
--request "GET" \
--url "https://api.scrapfly.io/scrape" \
--data-urlencode "render_js=true" \
--data-urlencode "js_scenario=W3siZmlsbCI6eyJzZWxlY3RvciI6ImlucHV0W25hbWU9dXNlcm5hbWVdIiwidmFsdWUiOiJ1c2VyMTIzIn19LHsiZmlsbCI6eyJzZWxlY3RvciI6ImlucHV0W25hbWU9cGFzc3dvcmRdIiwidmFsdWUiOiJwYXNzd29yZCJ9fSx7ImNsaWNrIjp7InNlbGVjdG9yIjoiYnV0dG9uW3R5cGU9J3N1Ym1pdCddIn19LHsid2FpdF9mb3JfbmF2aWdhdGlvbiI6eyJ0aW1lb3V0Ijo1MDAwfX1d" \
--data-urlencode "key=__API_KEY__" \
--data-urlencode "url=https://web-scraping.dev/login"

Equivalent request URL:

"https://api.scrapfly.io/scrape?render_js=true&js_scenario=W3siZmlsbCI6eyJzZWxlY3RvciI6ImlucHV0W25hbWU9dXNlcm5hbWVdIiwidmFsdWUiOiJ1c2VyMTIzIn19LHsiZmlsbCI6eyJzZWxlY3RvciI6ImlucHV0W25hbWU9cGFzc3dvcmRdIiwidmFsdWUiOiJwYXNzd29yZCJ9fSx7ImNsaWNrIjp7InNlbGVjdG9yIjoiYnV0dG9uW3R5cGU9J3N1Ym1pdCddIn19LHsid2FpdF9mb3JfbmF2aWdhdGlvbiI6eyJ0aW1lb3V0Ijo1MDAwfX1d&key=&url=https%3A%2F%2Fweb-scraping.dev%2Flogin"

Request breakdown:

host         = "api.scrapfly.io"
path         = "/scrape"

render_js    = "true"
js_scenario  = "W3siZmlsbCI6eyJzZWxlY3RvciI6ImlucHV0W25hbWU9dXNlcm5hbWVdIiwidmFsdWUiOiJ1c2VyMTIzIn19LHsiZmlsbCI6eyJzZWxlY3RvciI6ImlucHV0W25hbWU9cGFzc3dvcmRdIiwidmFsdWUiOiJwYXNzd29yZCJ9fSx7ImNsaWNrIjp7InNlbGVjdG9yIjoiYnV0dG9uW3R5cGU9J3N1Ym1pdCddIn19LHsid2FpdF9mb3JfbmF2aWdhdGlvbiI6eyJ0aW1lb3V0Ijo1MDAwfX1d"
key          = ""
url          = "https://web-scraping.dev/login"

Example of response with scenario

Scenario Step Types

The scenario step types below are currently supported. Each step type has a different set of mandatory and optional parameters, marked as follows:

  • [MANDATORY] param_name:type
  • [OPTIONAL] param_name:type

    Click

    selector:string ignore_if_not_visible:bool=false timeout budget (ms): +2500

    Click on a visible element. This is a native click that emits a trusted event; it is not simulated with JavaScript.

    Internal Workflow

    • Wait for the element to be visible
    • Move to the element (mouse and scroll) like a human
    • Trigger focus on the element
    • Left click

    Parameters

    • selector:string Accepts a CSS or XPath selector
    • ignore_if_not_visible:bool Wait for the element to become visible, and skip the step if it does not
    • multiple:bool If multiple elements match, click on all matched elements

    Usage
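A click step could be built as in the sketch below. The click key and its selector field match the encoded login example on this page; the helper function and the button.load-more selector are hypothetical, and multiple is only sent when set because its default is not stated.

```python
def click(selector: str, ignore_if_not_visible: bool = False, multiple=None) -> dict:
    """Build a click step; `multiple` is omitted unless explicitly set."""
    details = {"selector": selector, "ignore_if_not_visible": ignore_if_not_visible}
    if multiple is not None:
        details["multiple"] = multiple
    return {"click": details}


# Click every "load more" button, skipping the step if none becomes visible.
click_step = click("button.load-more", ignore_if_not_visible=True, multiple=True)
```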

    Fill

    selector:string value:string timeout budget (ms): +${timeout} +500

    Type the provided value into the targeted element. The typing is not simulated with JavaScript; it comes from real keyboard input.

    Internal Workflow

    • Wait for the element to be visible
    • Move to the element (mouse and scroll) like a human
    • Trigger focus on the element
    • Type the value into the input like a human

    Parameters

    • selector:string Any valid CSS or XPath selector
    • value:string Value to type into the element
    • clear:boolean Clear the input field before writing

    Usage
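A fill step might look like the following sketch. The fill key, selector and value fields match the encoded login example; the helper function is hypothetical, and clear is only included when requested since its default is not documented here.

```python
def fill(selector: str, value: str, clear: bool = False) -> dict:
    """Build a fill step; `clear` is only sent when explicitly requested."""
    details = {"selector": selector, "value": value}
    if clear:
        details["clear"] = True
    return {"fill": details}


# Replace whatever is already in the username field with user123.
fill_step = fill("input[name=username]", "user123", clear=True)
```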

    Condition

    A condition checks exactly one of the following:

    • status_code:int
    • selector:string
      • selector_state:string=existing
      • timeout:int=1000

    In both cases it also accepts action:string=continue.

    Parameters

    • selector:string Any valid CSS or XPath selector
    • selector_state:string Either existing or not_existing
    • action:string Action to take when the condition is met: continue, exit_success, or exit_failed

    Run the scenario only if the condition is met.

    Internal Workflow

    • Compare the given status code with the response status code

    Usage
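The "exactly one of" rule can be enforced when building the step, as in this sketch. The condition step key is an assumption (the snake_case form of the heading, following the pattern of fill and click in the encoded example), and the helper and the #login-error selector are hypothetical.

```python
from typing import Optional

VALID_ACTIONS = ("continue", "exit_success", "exit_failed")
VALID_STATES = ("existing", "not_existing")


def condition(
    *,
    status_code: Optional[int] = None,
    selector: Optional[str] = None,
    selector_state: str = "existing",
    timeout: int = 1000,
    action: str = "continue",
) -> dict:
    """Build a condition step that checks either a status code or a selector."""
    if (status_code is None) == (selector is None):
        raise ValueError("pass exactly one of status_code or selector")
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if status_code is not None:
        return {"condition": {"status_code": status_code, "action": action}}
    if selector_state not in VALID_STATES:
        raise ValueError(f"unknown selector_state: {selector_state}")
    return {
        "condition": {
            "selector": selector,
            "selector_state": selector_state,
            "timeout": timeout,
            "action": action,
        }
    }


on_ok = condition(status_code=200)
on_error_banner = condition(selector="#login-error", action="exit_failed")
```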

    Wait

    timeout budget (ms): +${wait}

    Pause the scenario. The full pause duration is added to the scenario budget.

    Parameters

    This step takes no named parameters; pass the pause duration in milliseconds directly as the value.

    Usage
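Because the value is passed directly rather than inside an object, a wait step is the simplest one to write. A small sketch, assuming the step key is wait (the lowercase form of the heading); the helper is hypothetical.

```python
def wait(milliseconds: int) -> dict:
    """Build a wait step; the value is the pause itself, not a nested object."""
    if milliseconds < 0:
        raise ValueError("pause duration must not be negative")
    return {"wait": milliseconds}


# Pause for two seconds, which also adds 2000 ms to the scenario budget.
wait_step = wait(2000)
```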

    Scroll

    element:string=body selector:string=bottom timeout budget (ms): +500

    Scroll to the selector (or to the bottom if no selector is given). If the element parameter is a valid selector, the scroll happens within that element. The scroll is not simulated with JavaScript; it uses real mouse input.

    Internal Workflow

    • Wait for the element to be visible
    • Wait for the selector to be visible
    • Scroll like a human

    Parameters

    • element:string=body A valid CSS selector, XPath, or "body"
    • selector:string A valid CSS selector, XPath, or "bottom"
    • infinite:int=0 Infinite scroll: the number of scroll iterations

    Usage
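The sketch below builds two scroll steps using the documented defaults. The scroll step key is assumed from the heading, and the helper and the div.feed selector are hypothetical.

```python
def scroll(selector: str = "bottom", element: str = "body", infinite: int = 0) -> dict:
    """Build a scroll step; `infinite` is only sent when non-zero."""
    details = {"element": element, "selector": selector}
    if infinite:
        details["infinite"] = infinite
    return {"scroll": details}


# Scroll the page body to the bottom (the documented defaults).
to_bottom = scroll()
# Scroll inside a feed container three times to trigger lazy loading.
feed_scroll = scroll(element="div.feed", infinite=3)
```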

    Execute

    timeout:int=3000 timeout budget (ms): +${timeout}

    Execute a JavaScript script and store its result, if it returns one.

    Internal Workflow

    • The JavaScript code is executed
    • If the code returns a value, it is stored and available in the API response under result.browser_data.js_scenario.steps; every "execute" step has a result entry
    • Async/await functions are supported

    Parameters

    • script:string Script to execute; it can return a serializable value
    • timeout:int Maximum time to wait once the script has started executing, in milliseconds

    Usage
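An execute step could be assembled as in this sketch. The execute step key is assumed from the heading, the helper is hypothetical, and writing the script with a top-level return statement is an assumption about how the script is wrapped; this page only says the script can return a serializable value.

```python
def execute(script: str, timeout: int = 3000) -> dict:
    """Build an execute step; the timeout is what gets added to the budget."""
    return {"execute": {"script": script, "timeout": timeout}}


# Return the page title so it shows up as this step's result entry.
title_step = execute("return document.title")
```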

    Wait For Navigation

    timeout:int=1000 timeout budget (ms): +${timeout} + 1500

    Time to wait for a navigation (page change) to be detected. The given timeout plus 1500 ms (1.5 s) is added to the scenario budget; this additional time represents the average duration of a standard page load (with assets, XHR, etc.). For example, if you set a timeout of 1000, 2500 is counted.

    Parameters

    • timeout:int Maximum time to wait for a navigation, in milliseconds

    Usage
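The step and the budget it consumes can be sketched together. The wait_for_navigation key and timeout field match the encoded login example on this page; the helper and constant names are hypothetical.

```python
PAGE_LOAD_ALLOWANCE_MS = 1500  # extra time counted for a standard page load


def wait_for_navigation(timeout: int = 1000) -> dict:
    """Build a wait_for_navigation step with the documented default timeout."""
    return {"wait_for_navigation": {"timeout": timeout}}


nav_step = wait_for_navigation()
# Budget counted for this step: the timeout plus the page load allowance.
counted_ms = nav_step["wait_for_navigation"]["timeout"] + PAGE_LOAD_ALLOWANCE_MS
```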

    Wait For Selector

    selector:string=body state:string=visible timeout budget (ms): +${timeout}

    Wait until the element is visible on the page (state=visible) or has disappeared (state=hidden). If the selector does not reach the desired state before the timeout, this step fails and the scenario is aborted. The timeout is added to the scenario budget.

    Parameters

    • selector:string=body A valid CSS selector, XPath, or "body"
    • state:string=visible Desired state of the element on the page: "visible" or "hidden"
    • timeout:int=5000 Time to wait before failing, in milliseconds

    Usage
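A wait_for_selector step might be built as follows. The step key is assumed to follow the same snake_case pattern as wait_for_navigation, and the helper and both selectors are hypothetical.

```python
def wait_for_selector(selector: str = "body", state: str = "visible", timeout: int = 5000) -> dict:
    """Build a wait_for_selector step, rejecting states the docs do not list."""
    if state not in ("visible", "hidden"):
        raise ValueError(f"state must be 'visible' or 'hidden', got: {state}")
    return {
        "wait_for_selector": {"selector": selector, "state": state, "timeout": timeout}
    }


# Wait up to 5 s for results to appear, then up to 3 s for a spinner to vanish.
results_visible = wait_for_selector("div.results")
spinner_gone = wait_for_selector("div.spinner", state="hidden", timeout=3000)
```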

    Navigate

    url:string timeout:int=5000 timeout budget (ms): +5500

    Navigate to the provided URL and wait for the page to load.

    Internal Workflow

    • Type the address into the address bar of the current tab
    • Load the page and wait for it to render fully

    Parameters

    • url:string The URL to navigate to
    • timeout:int Maximum time to wait for the page to load

    Usage

Summary