Data scraping lets you extract structured data from a browser, application, or document and save it to a database, a .csv file, or an Excel spreadsheet.
Note:
It is recommended to run your web automations on Internet Explorer 11 or above, Mozilla Firefox 50 or above, or the latest version of Google Chrome.
Structured data is a specific kind of information that is highly organized and presented in a predictable pattern. For example, all Google search results have the same structure: a link at the top, the URL as a text string, and a description of the web page. Because this structure repeats for every result, Studio always knows where to find each piece of information and can extract it easily.
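To make the idea of a repeating pattern concrete, here is a minimal Python sketch that is independent of Studio. It assumes a hypothetical page in which every result is a `div class="result"` holding a link, a `cite` element with the URL text, and a paragraph with the description. The markup, the `ResultParser` class, and the `extract_results` function are illustrative assumptions, not anything Studio generates.

```python
from html.parser import HTMLParser

# Hypothetical markup: every result repeats the same three parts.
PAGE = """
<div class="result"><a href="https://example.com/a">First title</a><cite>example.com/a</cite><p>First description.</p></div>
<div class="result"><a href="https://example.com/b">Second title</a><cite>example.com/b</cite><p>Second description.</p></div>
"""


class ResultParser(HTMLParser):
    """Collects one dict per <div class="result"> block."""

    # Maps each repeating tag to the column it fills.
    FIELDS = {"a": "title", "cite": "url", "p": "description"}

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # column currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "result") in attrs:
            self.rows.append({})  # a new record starts here
        elif tag in self.FIELDS and self.rows:
            self._field = self.FIELDS[tag]

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag in self.FIELDS:
            self._field = None


def extract_results(html):
    """Return a list of rows, one per repeated result block."""
    parser = ResultParser()
    parser.feed(html)
    return parser.rows


rows = extract_results(PAGE)
for row in rows:
    print(row)
```

Because each record has the same shape, the parser needs only one rule per field, which is the property that makes structured data straightforward to scrape.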
The scraping wizard can be opened from the Design tab, by clicking the Data Scraping button.
The main steps of the data scraping wizard are:
- Select the first and last fields in the web page, document or application that you want to extract data from, so that Studio can deduce the pattern of the information.
Note:
Studio automatically detects whether you indicated a table cell and asks whether you want to extract the entire table. If you click Yes, the Extract Wizard displays a preview of the selected table data.
- Customize column headers and choose whether or not to extract URLs.
- Preview the data, edit the maximum number of results to be extracted and change the order of the columns.
- Optionally click Extract Correlated Data. This enables you to go through the Extract Wizard again, to extract additional information and add it as a new column in the same table.
- Indicate the Next button in the web page, application or document (if the information you want to extract spans multiple pages).
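The combination of a maximum number of results and a Next button can be pictured as a simple loop. The Python sketch below is only an analogy, not how Studio implements it: the `PAGES` dictionary is a hypothetical in-memory stand-in for a multi-page site, and `scrape_all` is an illustrative helper name.

```python
# Hypothetical in-memory "site": each page holds rows and the key of the next page.
PAGES = {
    "page1": {"rows": ["a", "b", "c"], "next": "page2"},
    "page2": {"rows": ["d", "e", "f"], "next": "page3"},
    "page3": {"rows": ["g"], "next": None},
}


def scrape_all(pages, start, max_results):
    """Follow 'next' links, collecting rows until the limit or the last page."""
    collected = []
    current = start
    while current is not None and len(collected) < max_results:
        page = pages[current]
        remaining = max_results - len(collected)
        collected.extend(page["rows"][:remaining])  # never exceed the limit
        current = page["next"]  # stands in for clicking the Next button
    return collected


first_five = scrape_all(PAGES, "page1", max_results=5)
everything = scrape_all(PAGES, "page1", max_results=100)
print(first_five)
print(everything)
```

The loop stops on whichever condition comes first: the result limit is reached, or there is no further page to move to.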
After you finish the wizard, a sequence is generated in Studio.
Data scraping always generates a container (Attach Browser or Attach Window) with a selector for the top-level window and an Extract Structured Data activity with a partial selector. Together, these ensure that the app to be scraped is identified correctly.
The Extract Structured Data activity also comes with an automatically generated XML string (in the ExtractMetadata property) that indicates the data to be extracted.
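If you want to inspect such an XML string outside Studio, it can be read with any XML parser. The Python sketch below uses a simplified, hypothetical stand-in for the metadata: the element and attribute names (`extract`, `column`, `name`, `attr`, `webctrl`) are assumptions made for illustration, so check the ExtractMetadata property in your own workflow for the actual string. `describe_columns` is likewise an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for an extraction-metadata string.
# The real string generated by the wizard may differ in shape and detail.
METADATA = """
<extract>
  <column name="Title" attr="text"><webctrl tag="h3" /></column>
  <column name="Url" attr="href"><webctrl tag="a" /></column>
</extract>
"""


def describe_columns(xml_text):
    """Return (column name, extracted attribute) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [(col.get("name"), col.get("attr")) for col in root.iter("column")]


columns = describe_columns(METADATA)
for name, attr in columns:
    print(f"{name}: reads the '{attr}' attribute")
```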
Lastly, all the scraped information is stored in a DataTable variable, which you can later use to populate a database, a .csv file or an Excel spreadsheet.
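As a rough illustration of that last step, the Python sketch below writes a small table of rows to .csv text using the standard `csv` module. It is a language-neutral analogy rather than a Studio workflow: the sample rows, the column names, and the `to_csv` helper are hypothetical.

```python
import csv
import io

# Hypothetical scraped rows; in Studio these would live in the DataTable variable.
rows = [
    {"Title": "First title", "Url": "https://example.com/a"},
    {"Title": "Second, with comma", "Url": "https://example.com/b"},
]


def to_csv(rows, columns):
    """Serialize rows to CSV text with a header line, quoting where needed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows, ["Title", "Url"])
print(csv_text)
```

Using a CSV writer instead of joining values by hand means that values containing commas are quoted automatically, so the columns stay aligned when the file is opened in a spreadsheet.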