Oxylabs Real-Time Crawler

How to Use OxyLabs Real-Time Crawler [Part 1]: Real-Time Crawler for Google

This is the official Oxylabs introduction to using Real-Time Crawler for Google.

Quick Start

Real-Time Crawler is built for heavy-duty data retrieval operations. You can use Real-Time Crawler to access various Google pages, including regular search, hotel availability and Google Shopping. It enables effortless web data extraction from search engines without any delays or errors.

Real-Time Crawler for Google uses basic HTTP authentication, which requires sending a username and password.

This is by far the fastest way to start using Real-Time Crawler for Google. The request below sends the query adidas to google_search using the Realtime integration method. Don't forget to replace USERNAME and PASSWORD with your proxy user credentials.

curl --user "USERNAME:PASSWORD" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "google_search", "domain": "com", "query": "adidas"}'
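The same request can be sketched in Python using only the standard library. This is a minimal illustration, not official client code: the helper name build_realtime_request is hypothetical, and the request is only built here, not sent, so the snippet runs without credentials or network access.

```python
import base64
import json
import urllib.request

REALTIME_URL = "https://realtime.oxylabs.io/v1/queries"


def build_realtime_request(username, password, payload):
    """Build (but do not send) a POST request that uses HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        REALTIME_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_realtime_request(
    "USERNAME",
    "PASSWORD",
    {"source": "google_search", "domain": "com", "query": "adidas"},
)

# Sending it requires valid credentials and network access:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```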

If you have any questions not covered by this documentation, please contact your account manager or our support staff at [email protected].


Postman

Download and import this Postman collection to try out all of the Google crawler functions and data delivery methods documented on this page.


Integration Methods

Real-Time Crawler for Google supports three integration methods, each with its own benefits:

  • Push-Pull. With this method you do not need to maintain an active connection with our endpoint to retrieve the data. After you make a request, our system can automatically ping your server when the job is done (see Callback). This method saves computing resources and scales easily.
  • Realtime. This method requires you to maintain an active connection with our endpoint in order to receive the results when the job is completed. It can be implemented in a single service, whereas Push-Pull is a two-step process.
  • SuperAPI. This method is very similar to Realtime, but instead of posting data to our endpoint, you use the HTML Crawler as a proxy. To retrieve the data, set up a proxy endpoint and make a GET request to the desired URL. Additional parameters must be passed as headers.

Our recommended data extraction method is Push-Pull.


Push-Pull

This is the simplest yet most reliable and recommended data delivery method. In the Push-Pull scenario you send us a query, we return a job id, and once the job is done you can use that id to retrieve content from the /results endpoint. You can check job completion status yourself, or you can set up a simple listener that is able to accept POST queries. This way we will send you a callback message once the job is ready to be retrieved. In this particular example the results will be automatically uploaded to your S3 bucket named YOUR_BUCKET_NAME.

You can also try and see how Push-Pull method works via Postman. Download this file to get started.


Single Query

The following endpoint will handle single queries for one keyword or URL. The API will return a confirmation message containing job information, including job id. You can check job completion status using that id, or you can ask us to ping your callback endpoint once the scraping task is finished by adding callback_url in the query.

POST https://data.oxylabs.io/v1/queries

You need to post query parameters as data in JSON body.

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_search", "domain": "com", "query": "adidas", "callback_url": "https://your.callback.url", "storage_type": "s3", "storage_url": "YOUR_BUCKET_NAME"}'

The API will respond with query information in JSON format, by printing it in response body, similar to this:

{
  "callback_url": "https://your.callback.url",
  "client_id": 5,
  "context": [
    {
      "key": "results_language",
      "value": null
    },
    {
      "key": "safe_search",
      "value": null
    },
    {
      "key": "tbm",
      "value": null
    },
    {
      "key": "cr",
      "value": null
    },
    {
      "key": "filter",
      "value": null
    }
  ],
  "created_at": "2019-10-01 00:00:01",
  "domain": "com",
  "geo_location": null,
  "id": "12345678900987654321",
  "limit": 10,
  "locale": null,
  "pages": 1,
  "parse": false,
  "render": null,
  "query": "adidas",
  "source": "google_search",
  "start_page": 1,
  "status": "pending",
  "storage_type": "s3",
  "storage_url": "YOUR_BUCKET_NAME/12345678900987654321.json",
  "subdomain": "www",
  "updated_at": "2019-10-01 00:00:01",
  "user_agent_type": "desktop",
  "_links": [
    {
      "rel": "self",
      "href": "http://data.oxylabs.io/v1/queries/12345678900987654321",
      "method": "GET"
    },
    {
      "rel": "results",
      "href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
      "method": "GET"
    }
  ]
}
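The links needed for the next steps can be pulled out of this response by their rel value. The sketch below is illustrative only: link_for is a hypothetical helper name, and the job dictionary is a trimmed copy of the sample response above.

```python
def link_for(job, rel):
    """Return the href of the first _links entry whose rel matches, or None."""
    for link in job.get("_links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Trimmed copy of the sample response shown above.
job = {
    "id": "12345678900987654321",
    "status": "pending",
    "_links": [
        {
            "rel": "self",
            "href": "http://data.oxylabs.io/v1/queries/12345678900987654321",
            "method": "GET",
        },
        {
            "rel": "results",
            "href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
            "method": "GET",
        },
    ],
}

status_url = link_for(job, "self")      # URL for checking job status
results_url = link_for(job, "results")  # URL for retrieving job content
```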

Check Job Status

If your query had callback_url, we will send you a message containing a link to the content once the scraping task is done. However, if there was no callback_url in the query, you will need to check job status yourself. To do that, use the URL in href under rel:self in the response message you received after submitting your query to our API. It should look similar to this: http://data.oxylabs.io/v1/queries/12345678900987654321.

GET https://data.oxylabs.io/v1/queries/{id}

Querying this link will return job information, including its status. There are 3 possible status values:

  • pending. The job is still in the queue and has not been completed.
  • done. The job is completed. You may retrieve the result by querying the URL in href under rel:results: http://data.oxylabs.io/v1/queries/12345678900987654321/results
  • faulted. There was an issue with the job and we couldn't complete it, most likely due to a server error on the target site's side.

curl --user user:pass1 'http://data.oxylabs.io/v1/queries/12345678900987654321'

The API will respond with query information in JSON format, by printing it in response body. Notice that job status has been changed to done. You can now retrieve content by querying http://data.oxylabs.io/v1/queries/12345678900987654321/results.

You can also see that updated_at is now 2019-10-01 00:00:15, which means the query took 14 seconds to complete.

{
  "client_id": 5,
  "context": [
    {
      "key": "results_language",
      "value": null
    },
    {
      "key": "safe_search",
      "value": null
    },
    {
      "key": "tbm",
      "value": null
    },
    {
      "key": "cr",
      "value": null
    },
    {
      "key": "filter",
      "value": null
    }
  ],
  "created_at": "2019-10-01 00:00:01",
  "domain": "com",
  "geo_location": null,
  "id": "12345678900987654321",
  "limit": 10,
  "locale": null,
  "pages": 1,
  "parse": false,
  "render": null,
  "query": "adidas",
  "source": "google_search",
  "start_page": 1,
  "status": "done",
  "subdomain": "www",
  "updated_at": "2019-10-01 00:00:15",
  "user_agent_type": "desktop",
  "_links": [
    {
      "rel": "self",
      "href": "http://data.oxylabs.io/v1/queries/12345678900987654321",
      "method": "GET"
    },
    {
      "rel": "results",
      "href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
      "method": "GET"
    }
  ]
}
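If you check job status yourself, the three status values translate naturally into a polling loop. The following is a sketch under assumptions: wait_for_job is a hypothetical helper, fetch_status stands for whatever function you use to GET the rel:self URL and decode the JSON, and the polling interval and attempt limit are arbitrary illustrative values.

```python
import time


def wait_for_job(fetch_status, poll_interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until the job is 'done' (return it) or 'faulted' (raise).

    fetch_status is a caller-supplied callable that returns the job
    information dictionary, e.g. by querying the rel:self URL.
    """
    for attempt in range(max_attempts):
        job = fetch_status()
        status = job.get("status")
        if status == "done":
            return job
        if status == "faulted":
            raise RuntimeError(f"job {job.get('id')} faulted")
        # Still pending: wait before the next check, except after the last one.
        if attempt < max_attempts - 1:
            sleep(poll_interval)
    raise TimeoutError("job still pending after max_attempts polls")


# Demonstration with canned responses instead of real HTTP calls.
responses = iter([
    {"id": "12345678900987654321", "status": "pending"},
    {"id": "12345678900987654321", "status": "done"},
])
finished = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
```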

Retrieve Job Content

Once you know the job is ready to be retrieved, either by checking its status or by receiving a callback from us, you can GET it using the URL in href under rel:results in either our initial response or the callback message. It should look similar to this: http://data.oxylabs.io/v1/queries/12345678900987654321/results.

GET https://data.oxylabs.io/v1/queries/{id}/results

The results can be retrieved automatically, without periodically checking job status, by setting up a Callback service. You need to specify the IP or domain of the server where the Callback service is running. When our system completes a job, it will send a message to the provided IP or domain, and the Callback service will download the results as described in the Callback implementation example.

curl --user user:pass1 'http://data.oxylabs.io/v1/queries/12345678900987654321/results'

The API will return job content:

{
  "results": [
    {
      "content": "<!doctype html>
        CONTENT      
      ",
      "created_at": "2019-10-01 00:00:01",
      "updated_at": "2019-10-01 00:00:15",
      "page": 1,
      "url": "https://www.google.com/search?q=adidas&hl=en&gl=US",
      "job_id": "12345678900987654321",
      "status_code": 200
    }
  ]
}
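Once decoded, the results response can be reduced to a mapping from page number to content. This is a small sketch with a hypothetical helper name, and it assumes that a status_code of 200 marks a page that was scraped successfully.

```python
def pages_by_number(response):
    """Map page number to content for every result with status_code 200."""
    pages = {}
    for item in response.get("results", []):
        if item.get("status_code") == 200:
            pages[item["page"]] = item["content"]
    return pages


# Trimmed copy of the sample job content shown above.
sample = {
    "results": [
        {
            "content": "<!doctype html> CONTENT",
            "page": 1,
            "job_id": "12345678900987654321",
            "status_code": 200,
        }
    ]
}
pages = pages_by_number(sample)
```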

Callback

A callback is a POST request we send to your machine, informing that the data extraction task is completed and providing URL to download scraped content. This means that you no longer need to check job status manually. Once the data is here, we will let you know, and all you need to do now is retrieve it.

# Please see code samples in Python and PHP.

Sample callback output

{  
   "created_at":"2019-10-01 00:00:01",
   "updated_at":"2019-10-01 00:00:15",
   "locale":null,
   "client_id":163,
   "user_agent_type":"desktop",
   "source":"google_search",
   "pages":1,
   "subdomain":"www",
   "status":"done",
   "start_page":1,
   "parse":0,
   "render":null,
   "priority":0,
   "ttl":0,
   "origin":"api",
   "persist":true,
   "id":"12345678900987654321",
   "callback_url":"http://your.callback.url/",
   "query":"adidas",
   "domain":"com",
   "limit":10,
   "geo_location":null,
   {...}
   "_links":[
      {  
         "href":"https://data.oxylabs.io/v1/queries/12345678900987654321",
         "method":"GET",
         "rel":"self"
      },
      {  
         "href":"https://data.oxylabs.io/v1/queries/12345678900987654321/results",
         "method":"GET",
         "rel":"results"
      }
   ]
}
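A callback listener only has to accept the POST, read the JSON body, and pick out the results link. The sketch below uses Python's standard http.server module and is not the official code sample: the handler class and helper names are hypothetical, and it assumes the callback body has the shape shown in the sample output above.

```python
import json
from http.server import BaseHTTPRequestHandler


def results_url_from_callback(body):
    """Return the rel:results link from a callback body, or None if not done."""
    job = json.loads(body)
    if job.get("status") != "done":
        return None
    for link in job.get("_links", []):
        if link.get("rel") == "results":
            return link.get("href")
    return None


class CallbackHandler(BaseHTTPRequestHandler):
    """Minimal listener: acknowledge the callback and report the results URL."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        url = results_url_from_callback(self.rfile.read(length))
        print("results ready at:", url)
        self.send_response(200)
        self.end_headers()


# To run the listener (the port is an arbitrary choice):
# from http.server import HTTPServer
# HTTPServer(("0.0.0.0", 8080), CallbackHandler).serve_forever()
```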

Batch Query

Real-Time Crawler also supports submitting multiple keywords in one request, up to 1,000 keywords per batch. The following endpoint will submit multiple keywords to the extraction queue.

POST https://data.oxylabs.io/v1/queries/batch

You need to post query parameters as data in JSON body.

The system will handle every keyword as a separate request. If you provide a callback URL, you will get a separate call for each keyword. Otherwise, our initial response will contain job ids for all keywords. For example, if you send 50 keywords, we will return 50 unique job ids.

Important! query is the only parameter that can have multiple values. All other parameters are the same for that batch query.

curl --user user:pass1 'https://data.oxylabs.io/v1/queries/batch' -H 'Content-Type: application/json' \
 -d '@keywords.json'

keywords.json content:

{  
   "query":[  
      "adidas",
      "nike",
      "reebok"
   ],
   "source": "google_search",
   "domain": "com",
   "callback_url": "https://your.callback.url"
}

The API will respond with query information in JSON format, by printing it in response body, similar to this:

{
  "queries": [
    {
      "callback_url": "https://your.callback.url",
      {...}
      "created_at": "2019-10-01 00:00:01",
      "domain": "com",
      "id": "12345678900987654321",
      {...}
      "query": "adidas",
      "source": "google_search",
      {...}
          "rel": "results",
          "href": "http://data.oxylabs.io/v1/queries/12345678900987654321/results",
          "method": "GET"
        }
      ]
    },
    {
      "callback_url": "https://your.callback.url",
      {...}
      "created_at": "2019-10-01 00:00:01",
      "domain": "com",
      "id": "12345678901234567890",
      {...}
      "query": "nike",
      "source": "google_search",
      {...}
          "rel": "results",
          "href": "http://data.oxylabs.io/v1/queries/12345678901234567890/results",
          "method": "GET"
        }
      ]
    },
    {
      "callback_url": "https://your.callback.url",
      {...}
      "created_at": "2019-10-01 00:00:01",
      "domain": "com",
      "id": "01234567899876543210",
      {...}
      "query": "reebok",
      "source": "google_search",
      {...}
          "rel": "results",
          "href": "http://data.oxylabs.io/v1/queries/01234567899876543210/results",
          "method": "GET"
        }
      ]
    }
  ]
}
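Because a batch accepts at most 1,000 keywords and query is the only parameter that may hold multiple values, a longer keyword list has to be split into several batch payloads. A minimal sketch, with batch_payloads as a hypothetical helper name:

```python
def batch_payloads(keywords, shared_params, batch_size=1000):
    """Yield one batch payload per chunk of at most batch_size keywords.

    shared_params holds the parameters that are the same for every keyword
    (source, domain, callback_url and so on).
    """
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    for start in range(0, len(keywords), batch_size):
        payload = dict(shared_params)
        payload["query"] = keywords[start:start + batch_size]
        yield payload


# 2,500 keywords are split into batches of 1,000, 1,000 and 500.
payloads = list(batch_payloads(
    [f"keyword {n}" for n in range(2500)],
    {"source": "google_search", "domain": "com"},
))
```

Each resulting payload can then be posted to the batch endpoint in the same way as keywords.json above.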

Get Notifier IP Address List

You may want to whitelist the IPs sending you callback messages or get the list of these IPs for other purposes. This can be done by sending a GET request to this endpoint: https://data.oxylabs.io/v1/info/callbacker_ips.

curl --user user:pass1 'https://data.oxylabs.io/v1/info/callbacker_ips'

The API will return the list of IPs making callback requests to your system:

{
    "ips": [
        "x.x.x.x",
        "y.y.y.y"
    ]
}
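On the receiving side, the returned list can be used to reject callbacks from unknown senders. The sketch below is an illustration only: is_notifier_ip is a hypothetical helper, and the addresses are placeholders from the documentation-reserved ranges, not real notifier IPs.

```python
import ipaddress


def is_notifier_ip(remote_addr, allowed_ips):
    """Return True if remote_addr equals one of the allowed IP addresses."""
    remote = ipaddress.ip_address(remote_addr)
    return any(remote == ipaddress.ip_address(ip) for ip in allowed_ips)


# Placeholder values standing in for the "ips" list returned by the API.
allowed = ["192.0.2.10", "192.0.2.11"]

accepted = is_notifier_ip("192.0.2.10", allowed)
rejected = is_notifier_ip("198.51.100.7", allowed)
```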

Upload to Storage

By default RTC job results are stored in our databases. This means that you will need to query our results endpoint and retrieve content yourself. Custom storage feature allows you to store results in your own cloud storage. The advantage of this feature is that you don't have to make extra requests in order to fetch results – everything goes directly to your storage bucket.

We support Amazon S3 and Google Cloud Storage. If you would like to use a different type of storage, please contact your account manager to discuss the feature delivery timeline.

Amazon S3

To get your job results uploaded to your Amazon S3 bucket, please set up access permissions for our service. To do that, go to https://s3.console.aws.amazon.com/ > S3 > Storage > Bucket Name (if you don't have one, create a new one) > Permissions > Bucket Policy

You can find the bucket policy in the JSON sample at the end of this section. Don't forget to replace YOUR_BUCKET_NAME with your bucket name. This policy allows us to write to your bucket, give you access to the uploaded files, and determine the bucket location.

Google Cloud Storage

To get your job results uploaded to your Google Cloud Storage bucket, please set up special permissions for our service. To do that, please create a custom role with the storage.objects.create permission and assign it to the Oxylabs service account email oxyserps-storage[email protected].

Usage

To use this feature, please specify two additional parameters in your requests: storage_type and storage_url. Learn more here.

The upload path looks like this: YOUR_BUCKET_NAME/job_ID.json. You will find job ID in response body that you receive from us after submitting a request. In this example job ID is 12345678900987654321.

{
    "Version": "2012-10-17",
    "Id": "Policy1577442634787",
    "Statement": [
        {
            "Sid": "Stmt1577442633719",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::324311890426:user/oxylabs.s3.uploader"
            },
            "Action": "s3:GetBucketLocation",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
        },
        {
            "Sid": "Stmt1577442633719",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::324311890426:user/oxylabs.s3.uploader"
            },
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
        }
    ]
}
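To avoid editing YOUR_BUCKET_NAME by hand, the policy can be generated for a given bucket. This is a sketch with hypothetical helper names. It keeps the principal, actions and resources from the policy above but omits the optional Id and Sid fields, which is an assumption of this example rather than something the documentation prescribes.

```python
import json

# Principal taken from the bucket policy shown above.
OXYLABS_UPLOADER_ARN = "arn:aws:iam::324311890426:user/oxylabs.s3.uploader"


def bucket_policy(bucket_name):
    """Build the S3 bucket policy for bucket_name as a dictionary."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": OXYLABS_UPLOADER_ARN},
                "Action": "s3:GetBucketLocation",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": OXYLABS_UPLOADER_ARN},
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


def upload_path(bucket_name, job_id):
    """Return the path a job result is uploaded to: BUCKET/job_ID.json."""
    return f"{bucket_name}/{job_id}.json"


policy_json = json.dumps(bucket_policy("my-example-bucket"), indent=4)
path = upload_path("my-example-bucket", "12345678900987654321")
```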

Realtime

The data submission is the same as in the Push-Pull method, but in the Realtime case we return the content on the open connection. You send us a query, the connection remains open, we retrieve the content and bring it to you. The endpoint that handles this is:

POST https://realtime.oxylabs.io/v1/queries

There is a timeout limit of 150 seconds for open connections, therefore in rare cases of heavy load we may not be able to ensure the data gets to you.

You need to post query parameters as data in JSON body. Please see example for more details.

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_search", "domain": "com", "query": "adidas"}'

Example response body that will be returned on open connection:

{
  "results": [
    {
      "content": "
      CONTENT
      ",
      "created_at": "2019-10-01 00:00:01",
      "updated_at": "2019-10-01 00:00:15",
      "id": null,
      "page": 1,
      "url": "https://www.google.com/search?q=adidas&hl=en&gl=US",
      "job_id": "12345678900987654321",
      "status_code": 200
    }
  ]
}

SuperAPI

If you have ever used regular proxies for data scraping, integrating the SuperAPI delivery method will be a breeze. All you need to do is use our entry node as a proxy, authorize with your Real-Time Crawler credentials, and ignore certificates. In cURL, that is -k or --insecure. Your data will reach you on the open connection.

GET realtime.oxylabs.io:60000

SuperAPI only supports a handful of parameters since it only works with a Direct data source where full URL is provided. These parameters should be sent as headers. This is a list of accepted parameters:

  • X-OxySERPs-User-Agent-Type. There is no way to indicate a specific User-Agent, but you can let us know which browser and platform to use. A list of supported User-Agents can be found here.
  • X-OxySERPs-Geo-Location. In some cases you may need to indicate the geographical location that the result should be adapted for. This parameter corresponds to geo_location. Read about our suggested geo_location parameter structures here.

If you need help setting up SuperAPI, drop a line at [email protected].

curl -k -x realtime.oxylabs.io:60000 -U user:pass1 -H "X-OxySERPs-User-Agent-Type: desktop_chrome" -H "X-OxySERPs-Geo-Location: New York,New York,United States" "https://www.google.com/search?q=adidas"
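For clients other than cURL, the same settings boil down to a proxy URL and a set of request headers. The sketch below only builds those two pieces. The helper names are hypothetical, and the http:// proxy scheme is an assumption based on cURL treating a proxy given without a scheme as an HTTP proxy. Certificate verification still has to be disabled in whichever HTTP client you use.

```python
from urllib.parse import quote

SUPERAPI_NODE = "realtime.oxylabs.io:60000"


def superapi_proxy_url(username, password):
    """Build a proxy URL with percent-encoded credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{SUPERAPI_NODE}"


def superapi_headers(user_agent_type=None, geo_location=None):
    """Build the optional SuperAPI headers, skipping unset parameters."""
    headers = {}
    if user_agent_type is not None:
        headers["X-OxySERPs-User-Agent-Type"] = user_agent_type
    if geo_location is not None:
        headers["X-OxySERPs-Geo-Location"] = geo_location
    return headers


proxy_url = superapi_proxy_url("user", "pass1")
headers = superapi_headers(
    user_agent_type="desktop_chrome",
    geo_location="New York,New York,United States",
)
```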

Content Type

Real-Time Crawler can return either raw HTML, or structured (parsed) JSON. Bear in mind that not all data sources can be returned structured. An icon under each data source in this documentation will indicate whether we are able to parse it, or we can only return raw HTML.

Please see Parsed Data to see which fields we return with each Data Source.


Data Sources

There are several ways to retrieve data from Google using Real-Time Crawler. You can give us a full URL via Direct, or you can specify parameters via purpose-built data sources, such as Search, Shopping Product or Images.

Technically not a content type, but Real-Time Crawler is able to render JavaScript when scraping. This is necessary in some Google pages, such as Flights and Patents. A checkmark under Render JS will indicate whether a particular data source can be scraped with JavaScript enabled.

If you are unsure which way to choose, drop us a line at [email protected] or contact your account manager.


Direct

google source is designed to retrieve content of direct URLs of various Google pages. This means that instead of sending multiple parameters, you can provide us with a direct URL to required Google page. We do not strip any parameters or alter your URLs in any other way.

This data source also supports parsed data (Parsed JSON), as long as the URL submitted is for Google Search (SERP page). If we are unable to confirm this is a SERP page request, a failure message will be returned.

Query parameters

| Parameter | Description | Default Value |
| --- | --- | --- |
| source | Data source | google |
| url | Direct URL (link) to Google page | |
| user_agent_type | Device type and browser. The full list can be found here. | desktop |
| render | Enable JavaScript rendering. Use when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). | |
| callback_url | URL to your callback endpoint | |
| geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. | |
| parse | true will return parsed data, as long as the URL submitted is for Google Search. See Parsed Data for more information. | |
| storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. | |
| storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. | |
   – required parameter

In this example, the API will retrieve a Google Scholar search for the keyword newton using the Push-Pull method:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google", "url": "https://scholar.google.com/scholar?hl=en&q=newton&btnG=&as_sdt=1%2C5&as_sdtp="}'

Here is the same example in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google", "url": "https://scholar.google.com/scholar?hl=en&q=newton&btnG=&as_sdt=1%2C5&as_sdtp="}'

And via SuperAPI:

curl -k -x realtime.oxylabs.io:60000 -U user:pass1 "https://scholar.google.com/scholar?hl=en&q=newton&btnG=&as_sdt=1%2C5&as_sdtp="

Search

google_search source is designed to retrieve Google Search results (SERP).

Query parameters

| Parameter | Description | Default Value |
| --- | --- | --- |
| source | Data source | google_search |
| domain | Domain localization | com |
| query | UTF-encoded keyword | |
| start_page | Starting page number | 1 |
| pages | Number of pages to retrieve | 1 |
| limit | Number of results to retrieve in each page | 10 |
| locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here. | |
| geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. | |
| user_agent_type | Device type and browser. The full list can be found here. | desktop |
| render | Enable JavaScript rendering. Use when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). | |
| callback_url | URL to your callback endpoint | |
| parse | true will return parsed data. See Parsed Data for more information. | |
| parser_type | Leave blank to get the default layout, or set the value to v2 to use the updated Google Search parsed output schema and/or receive the result in CSV format (only works with Google Web Search). See Parsed Data for more information. | |
| context: fpstate | Setting the fpstate value to aig will make Google load more apps. This parameter is only useful if used together with the render parameter. | |
| context: nfpr | true will turn off spelling auto-correction. | false |
| context: results_language | Results language. A list of supported Google languages can be found here. | |
| context: tbm | To-be-matched or tbm parameter. Accepted values are: app, blg, bks, dsc, isch, nws, pts, plcs, rcp, lcl | |
| context: tbs | tbs parameter. This parameter is like a container for more obscure Google parameters, such as limiting/sorting results by date, as well as other filters, some of which depend on the tbm parameter (e.g. tbs=app_os:1 is only available with tbm value app). More info here. | |
| storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. | |
| storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. | |
   – required parameter

In this example, the API makes a request to google.nl to retrieve search result pages 11 through 20 for the keyword adidas. The results will be displayed in French, since the results_language parameter is passed via context. Once the data retrieval task has finished successfully, the API will post a JSON request to your.callback.url containing the URL to download the raw HTML output. This is Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_search", "domain": "nl", "query": "adidas", "start_page": 11, "pages": 10, "callback_url": "https://your.callback.url", "context": [{"key": "results_language", "value": "fr"}]}'

And here is the same example in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_search", "domain": "nl", "query": "adidas", "start_page": 11, "pages": 10, "callback_url": "https://your.callback.url", "context": [{"key": "results_language", "value": "fr"}]}'
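The context list and the page range in these requests can be produced programmatically. The following sketch uses hypothetical helper names. It shows how a plain dictionary becomes the key/value list expected under context, and how start_page and pages determine the last page retrieved (11 + 10 - 1 = 20).

```python
def search_payload(query, domain="com", start_page=1, pages=1, context=None, **extra):
    """Build a google_search payload; context is given as a plain dict."""
    payload = {
        "source": "google_search",
        "domain": domain,
        "query": query,
        "start_page": start_page,
        "pages": pages,
    }
    if context:
        # The API expects context as a list of {"key": ..., "value": ...} items.
        payload["context"] = [{"key": k, "value": v} for k, v in context.items()]
    payload.update(extra)
    return payload


def last_page(payload):
    """Return the number of the last page the request will retrieve."""
    return payload["start_page"] + payload["pages"] - 1


payload = search_payload(
    "adidas",
    domain="nl",
    start_page=11,
    pages=10,
    context={"results_language": "fr"},
    callback_url="https://your.callback.url",
)
final_page = last_page(payload)
```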

Ads

The google_ads source is optimized to retrieve the Google Search results page (SERP) with paid ads. This source returns only 10 results per page, ensuring the highest chance of paid results showing up. Other than that, it supports the same parameters as regular Search.

Query parameters

| Parameter | Description | Default Value |
| --- | --- | --- |
| source | Data source | google_ads |
| domain | Domain localization | com |
| query | UTF-encoded keyword | |
| start_page | Starting page number | 1 |
| pages | Number of pages to retrieve | 1 |
| locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here. | |
| geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. | |
| user_agent_type | Device type and browser. The full list can be found here. | desktop |
| callback_url | URL to your callback endpoint | |
| parse | true will return parsed data. See Parsed Data for more information. | |
| context: nfpr | true will turn off spelling auto-correction. | false |
| context: results_language | Results language. A list of supported Google languages can be found here. | |
| context: tbm | To-be-matched or tbm parameter. Accepted values are: app, blg, bks, dsc, isch, nws, pts, plcs, rcp, lcl | |
| context: tbs | tbs parameter. This parameter is like a container for more obscure Google parameters, such as limiting/sorting results by date, as well as other filters, some of which depend on the tbm parameter (e.g. tbs=app_os:1 is only available with tbm value app). More info here. | |
| storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. | |
| storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. | |
   – required parameter

In this example, the API makes a request to google.nl to retrieve search results for the keyword adidas. Once the data retrieval task has finished successfully, the API will post a JSON request to your.callback.url containing the URL to download the raw HTML output. This is Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_ads", "domain": "nl", "query": "adidas", "callback_url": "https://your.callback.url"}'

And here is the same example in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_ads", "domain": "nl", "query": "adidas"}'

Hotels

google_hotels data source is designed to retrieve Google Hotel search results.

Query parameters

| Parameter | Description | Default Value |
| --- | --- | --- |
| source | Data source | google_hotels |
| domain | Domain localization | com |
| query | UTF-encoded keyword | |
| start_page | Starting page number | 1 |
| pages | Number of pages to retrieve | 1 |
| limit | Number of results to retrieve in each page | 10 |
| locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here. | |
| results_language | Results language. A list of supported Google languages can be found here. | |
| geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. | |
| user_agent_type | Device type and browser. The full list can be found here. | desktop |
| render | Enable JavaScript rendering. Use when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). | |
| callback_url | URL to your callback endpoint | |
| context: nfpr | true will turn off spelling auto-correction. | false |
| context: hotel_occupancy | Number of guests | 2 |
| context: hotel_dates | Dates of the hotel stay, from – to. Example: 2017-07-12,2017-07-13 | |
| storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. | |
| storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. | |
   – required parameter

Please note that with Google hotels you always need to send a keyword containing the word 'hotels', for example 'hotels in Los Angeles', 'hotels in Paris, France', etc. Both 'hotel' and 'hotels' work. Google also supports local languages, so you can send 'Hotelli Helsingissä' for hotels in Helsinki or 'viešbučiai Vilnius' for hotels in Vilnius.

In this example, the API will retrieve the first 3 pages of hotel availability for 1 guest between 2019-10-01 and 2019-10-10 for hotels in Paris from google.com. This is the Push-Pull method:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_hotels", "domain": "com", "pages": 3, "query": "hotels in Paris", "context": [{"key": "hotel_occupancy", "value": 1}, {"key": "hotel_dates", "value": "2019-10-01,2019-10-10"}]}'

This is in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_hotels", "domain": "com", "pages": 3, "query": "hotels in Paris", "context": [{"key": "hotel_occupancy", "value": 1}, {"key": "hotel_dates", "value": "2019-10-01,2019-10-10"}]}'
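The hotel_dates value is two ISO dates joined by a comma, which is easy to get wrong when it is assembled by hand. A small sketch, with hotels_payload as a hypothetical helper name, that builds the payload from date objects and rejects a check-out that is not after the check-in:

```python
from datetime import date


def hotels_payload(query, check_in, check_out, guests=2, pages=1, domain="com"):
    """Build a google_hotels payload with occupancy and stay dates in context."""
    if check_out <= check_in:
        raise ValueError("check_out must be later than check_in")
    return {
        "source": "google_hotels",
        "domain": domain,
        "pages": pages,
        "query": query,
        "context": [
            {"key": "hotel_occupancy", "value": guests},
            {
                "key": "hotel_dates",
                "value": f"{check_in.isoformat()},{check_out.isoformat()}",
            },
        ],
    }


payload = hotels_payload(
    "hotels in Paris",
    check_in=date(2019, 10, 1),
    check_out=date(2019, 10, 10),
    guests=1,
    pages=3,
)
```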

Travel: Hotels

google_travel_hotels data source is designed to retrieve Google Travel service's hotel search results.

Query parameters

| Parameter | Description | Default Value |
| --- | --- | --- |
| source | Data source | google_travel_hotels |
| domain | Domain localization | com |
| query | UTF-encoded keyword | |
| start_page | Starting page number | 1 |
| locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com and locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching on the com domain with the browser UI set to German. If you don't use this parameter, we set the Accept-Language header according to the domain (e.g. en-US for com). A list of available Google locales can be found here. | |
| geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. Please note that this source can accept a limited number of geo_location values. Please check this file to see geo_location values that don't yield accurate results. | |
| user_agent_type | Device type and browser. The full list can be found here. | desktop |
| render | Enable JavaScript rendering. Use when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). Please note that without JavaScript rendering, Google Travel Hotels will not return any useful content. | |
| callback_url | URL to your callback endpoint | |
| context: hotel_occupancy | Number of guests | 2 |
| context: hotel_classes | Filter results by number of hotel stars. You may specify one or more values between 2 and 5. Example: [3,4] | |
| context: hotel_dates | Dates for staying at the hotel, from – to. Example: 2017-07-12,2017-07-13 | |
| storage_type | Storage service provider. At the moment only Amazon S3 is supported: s3. The full implementation can be found on the Upload to Storage page. | |
| storage_url | Your Amazon S3 bucket name | |
   – required parameter

Please note that with Google Hotels you always need to send a keyword containing the word 'hotels', for example 'hotels in Los Angeles' or 'hotels in Paris, France'. Both 'hotel' and 'hotels' work. Google also supports local languages, so you can send 'Hotelli Helsingissä' for hotels in Helsinki or 'viešbučiai Vilnius' for hotels in Vilnius.

In this example, the API will retrieve the 2nd page of hotel availability results for 2 guests between 2020-10-01 and 2020-10-10 for hotels in Paris from google.com. The results will be filtered to show only 2-star and 4-star hotels. This is the Push-Pull method:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_travel_hotels", "domain": "com", "start_page": 2, "query": "hotels in Paris", "callback_url": "https://your.callback.url", "context": [{"key": "hotel_occupancy", "value": 2}, {"key": "hotel_dates", "value": "2020-10-01,2020-10-10"}, {"key": "hotel_classes", "value": [2,4]}]}'

This is in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_travel_hotels", "domain": "com", "start_page": 2, "query": "hotels in Paris", "context": [{"key": "hotel_occupancy", "value": 2}, {"key": "hotel_dates", "value": "2020-10-01,2020-10-10"}, {"key": "hotel_classes", "value": [2,4]}]}'
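
The hotel_dates and hotel_classes values used above follow fixed shapes: a "from,to" pair of ISO dates and a list of star ratings between 2 and 5. A small sketch of producing them from Python values follows. Both helper names are hypothetical, and the check that check-out falls after check-in is an added sanity check, not a documented API rule.

```python
from datetime import date


def hotel_dates_value(check_in, check_out):
    """Format a stay as 'YYYY-MM-DD,YYYY-MM-DD' (hypothetical helper)."""
    # Assumption: a stay must end after it starts; the API docs do not say so.
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    return f"{check_in.isoformat()},{check_out.isoformat()}"


def hotel_classes_value(stars):
    """Return sorted, de-duplicated star ratings, rejecting values outside 2..5."""
    unique = sorted(set(stars))
    invalid = [s for s in unique if s not in (2, 3, 4, 5)]
    if not unique or invalid:
        raise ValueError("hotel_classes accepts one or more values between 2 and 5")
    return unique


# Context list matching the google_travel_hotels example above.
travel_context = [
    {"key": "hotel_occupancy", "value": 2},
    {"key": "hotel_dates",
     "value": hotel_dates_value(date(2020, 10, 1), date(2020, 10, 10))},
    {"key": "hotel_classes", "value": hotel_classes_value([4, 2])},
]
print(travel_context)
```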

Shopping Search

google_shopping_search source is designed to retrieve Google Shopping search results.

POST https://data.oxylabs.io/v1/queries

Query parameters

Parameter | Description | Default Value
source | Data source | google_shopping_search
domain | Domain localization | com
query | UTF-encoded keyword |
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com with locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching in the com domain whose browser UI is set to German. If you don't use this parameter, we will set Accept-Language according to the domain (e.g. en-US for com). A list of available Google locales can be found here. |
results_language | Results language. A list of supported Google languages can be found here. |
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. |
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enable JavaScript rendering. Use when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). |
callback_url | URL to your callback endpoint |
parse | true will return parsed data. See Parsed Data for more information. |
context: nfpr | true will turn off spelling auto-correction. | false
context: sort_by | Sort the product list by the given criteria: r applies default Google sorting, rv sorts by review score, p by price ascending, pd by price descending. | r
context: min_price | Minimum price of products to filter |
context: max_price | Maximum price of products to filter |
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
   – required parameter

The API will download the first 4 pages of Google Shopping search results for the keyword adidas, sorted by descending price, with a minimum price of $20. This is how it's done in Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_search", "domain": "com", "query": "adidas", "pages": 4, "context": [{"key": "sort_by", "value": "pd"}, {"key": "min_price", "value": 20}]}'

Here is the same example in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_search", "domain": "com", "query": "adidas", "pages": 4, "context": [{"key": "sort_by", "value": "pd"}, {"key": "min_price", "value": 20}]}'
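
The sort_by codes (r, rv, p, pd) are terse, so client code may prefer readable names. The sketch below maps descriptive names to the documented codes and builds the context list for a Shopping Search request. The names in SORT_BY and the helper shopping_context are hypothetical, and the min/max consistency check is an added assumption rather than documented API behavior.

```python
# Readable aliases (hypothetical) for the documented sort_by codes.
SORT_BY = {
    "default": "r",
    "review_score": "rv",
    "price_ascending": "p",
    "price_descending": "pd",
}


def shopping_context(sort="default", min_price=None, max_price=None):
    """Build the context list for google_shopping_search (hypothetical helper)."""
    if sort not in SORT_BY:
        raise ValueError(f"unknown sort order: {sort!r}")
    # Assumption: a minimum above the maximum is a caller mistake.
    if min_price is not None and max_price is not None and min_price > max_price:
        raise ValueError("min_price must not exceed max_price")
    context = [{"key": "sort_by", "value": SORT_BY[sort]}]
    if min_price is not None:
        context.append({"key": "min_price", "value": min_price})
    if max_price is not None:
        context.append({"key": "max_price", "value": max_price})
    return context


# Same context as the curl example above: descending price, minimum $20.
print(shopping_context("price_descending", min_price=20))
```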

Shopping Product

google_shopping_product source is designed to retrieve Google Shopping product page for specified product.

Query parameters

Parameter | Description | Default Value
source | Data source | google_shopping_product
domain | Domain localization | com
query | UTF-encoded product code |
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com with locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching in the com domain whose browser UI is set to German. If you don't use this parameter, we will set Accept-Language according to the domain (e.g. en-US for com). A list of available Google locales can be found here. |
results_language | Results language. A list of supported Google languages can be found here. |
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. |
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enable JavaScript rendering. Use when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). |
callback_url | URL to your callback endpoint |
parse | true will return parsed data. See Parsed Data for more information. |
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
   – required parameter

Here the API will download the product page for product ID 5007040952399054528 from Google Shopping on google.com. It will also get the first 4 pages with pricing information. This is how it looks in Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_product", "domain": "com", "query": "5007040952399054528"}'

The same in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_product", "domain": "com", "query": "5007040952399054528"}'

Shopping Product Pricing

google_shopping_pricing source is designed to retrieve Google Shopping product pricing page for specified product.

Query parameters

Parameter | Description | Default Value
source | Data source | google_shopping_pricing
domain | Domain localization | com
query | UTF-encoded product code |
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com with locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching in the com domain whose browser UI is set to German. If you don't use this parameter, we will set Accept-Language according to the domain (e.g. en-US for com). A list of available Google locales can be found here. |
results_language | Results language. A list of supported Google languages can be found here. |
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. |
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enable JavaScript rendering. Use when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). |
callback_url | URL to your callback endpoint |
parse | true will return parsed data. See Parsed Data for more information. |
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
   – required parameter

Here the API will download the product pricing page for product ID 5007040952399054528 from Google Shopping on google.com. Here is an example in Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_pricing", "domain": "com", "query": "5007040952399054528"}'

The same in Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_shopping_pricing", "domain": "com", "query": "5007040952399054528"}'

Images

google_images source is designed to retrieve Images search page for images that are similar to the one provided with query parameter, as well as websites containing those images.

Query parameters

Parameter | Description | Default Value
source | Data source | google_images
domain | Domain localization | com
query | URL to image |
start_page | Starting page number | 1
pages | Number of pages to retrieve | 1
locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com with locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching in the com domain whose browser UI is set to German. If you don't use this parameter, we will set Accept-Language according to the domain (e.g. en-US for com). A list of available Google locales can be found here. |
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. |
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enable JavaScript rendering. Use when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). |
callback_url | URL to your callback endpoint |
context: nfpr | true will turn off spelling auto-correction. | false
context: results_language | Results language. A list of supported Google languages can be found here. |
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
   – required parameter

In this example, the API will download the Images search page of similar images for the image https://newsneakernews-wpengine.netdna-ssl.com/wp-content/uploads/2017/03/adidas-boost-march-25-2017.jpg from google.com. This is the Push-Pull method:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_images", "domain": "com", "query": "https://newsneakernews-wpengine.netdna-ssl.com/wp-content/uploads/2017/03/adidas-boost-march-25-2017.jpg"}'

And this is an equivalent request in Realtime, here with a placeholder image URL:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_images", "domain": "com", "query": "https://www.example.com/img/image.jpg"}'

Suggestions

google_suggest source is designed to retrieve Google keyword suggestions.

Query parameters

Parameter | Description | Default Value
source | Data source | google_suggest
query | UTF-encoded keyword |
locale | Accept-Language header value. This changes the Google search page web interface language (not the results). For example, if you use domain com with locale de-DE, the results will still be American, but Accept-Language will be set to de-DE,de;q=0.8. This imitates a person in the US searching in the com domain whose browser UI is set to German. If you don't use this parameter, we will set Accept-Language according to the domain (e.g. en-US for com). A list of available Google locales can be found here. |
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. |
user_agent_type | Device type and browser. The full list can be found here. | desktop
render | Enable JavaScript rendering. Use when the target requires JavaScript to load content. Only works via the Push-Pull (a.k.a. Callback) method. There are two available values for this parameter: html (get raw output) and png (get a Base64-encoded screenshot). |
callback_url | URL to your callback endpoint |
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
   – required parameter

The API makes a request to the Google Suggestions page to retrieve suggestions for the keyword adidas. With Push-Pull, the API will post a JSON payload to your.callback.url containing the URL to download the result once the task is finished. Here is an example with Push-Pull:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_suggest", "query": "adidas", "callback_url": "https://your.callback.url"}'

The same request with Realtime:

curl --user user:pass1 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
 -d '{"source": "google_suggest", "query": "adidas"}'

Keyword Data

The google_msv data source retrieves Google keyword data for the specified keywords, as well as for suggested keywords (unless you pass ideas=false in context). Keywords are passed in the query parameter as a single string, separated by commas. Commas within a keyword are not supported, so the keyword “Water Bottle 5,0L” will actually be interpreted as 2 keywords: “Water Bottle 5” and “0L”. See the sample output below for more details.
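
Because every comma starts a new keyword, it can help to assemble the query string in code and reject keywords that would be split. The sketch below shows one way to do that. Both function names are hypothetical, and split_like_api only imitates the splitting rule described above (including the assumption that surrounding whitespace is dropped); it is not the service's actual implementation.

```python
def join_keywords(keywords):
    """Join keywords into one comma-separated query string (hypothetical helper)."""
    cleaned = []
    for keyword in keywords:
        keyword = keyword.strip()
        if "," in keyword:
            # A comma inside a keyword would be read as a separator.
            raise ValueError(f"keyword contains a comma: {keyword!r}")
        if keyword:
            cleaned.append(keyword)
    return ",".join(cleaned)


def split_like_api(query):
    """Imitate the documented rule: each comma starts a new keyword."""
    return [part.strip() for part in query.split(",")]


print(join_keywords(["meilleur restaurant", "restaurant paris"]))
print(split_like_api("Water Bottle 5,0L"))
```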

Query parameters

Parameter | Description | Default Value
source | Data source | google_msv
query | UTF-encoded keywords, separated by commas |
geo_location | The geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data. For more information, read about our suggested geo_location parameter structures here. |
context: language | Language, for example english or french. No parameter or an empty value will return results for all languages. |
context: currency | 3-symbol currency code | EUR
context: ideas | If true, returns keyword ideas; false will return only data for the requested keywords. | true
context: ideas_limit | When fetching ideas, limits the number of idea keywords to retrieve to the provided limit, rounded up to the nearest multiple of 50 (e.g. 20 -> 50, 123 -> 150). 0 means no limit. | 0
context: min_amsv | When fetching ideas, filters out idea keywords that have a lower average monthly search volume than the provided number. 0 means no filter. | 0
context: max_amsv | When fetching ideas, filters out idea keywords that have a higher average monthly search volume than the provided number. 0 means no filter. | 0
context: category | When fetching ideas, filters out idea keywords that do not fall into the provided category. Available categories in . | null
storage_type | Storage service provider. We support Amazon S3 and Google Cloud Storage. The storage_type parameter values for these storage providers are, correspondingly, s3 and gcs. The full implementation can be found on the Upload to Storage page. This feature only works via the Push-Pull (Callback) method. |
storage_url | Your storage bucket name. Only works via the Push-Pull (Callback) method. |
   – required parameter

In this example, the API will retrieve keyword data for meilleur restaurant and all suggested keywords. The keyword language is french, the geo location is Paris,Ile-de-France,France, and the currency is EUR.

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
-d '{"source": "google_msv", "query": "meilleur restaurant", "geo_location": "Paris,Ile-de-France,France", "context": [{"key": "language", "value": "french"},{"key": "currency", "value": "EUR"}, {"key": "ideas", "value": true}]}'

# OR if you don't want ideas:

curl --user user:pass1 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" \
-d '{"source": "google_msv", "query": "meilleur restaurant", "geo_location": "Paris,Ile-de-France,France", "context": [{"key": "language", "value": "french"},{"key": "currency", "value": "EUR"}, {"key": "ideas", "value": false}]}'
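
The ideas_limit parameter is described as rounding the requested limit up to the nearest multiple of 50. As a worked illustration of that arithmetic, the sketch below computes the limit that would effectively apply. The function name is hypothetical and the code only mirrors the documented rule on the client side.

```python
def effective_ideas_limit(limit):
    """Round a requested ideas_limit up to the nearest multiple of 50.

    0 is returned unchanged because the documentation defines it as "no limit".
    """
    if limit < 0:
        raise ValueError("ideas_limit must not be negative")
    if limit == 0:
        return 0
    # Ceiling division using integer arithmetic only.
    return -(-limit // 50) * 50


for requested in (0, 20, 50, 123):
    print(requested, "->", effective_ideas_limit(requested))
```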

Sample output (historicalSearchVolume entries and ideas entries cut):

{
    "results": [
    {
        "content":
        {
            "ideas": [
            {
                "cpc": 4.712038,
                "keyword": "meilleur restaurant a paris",
                "currency": "EUR",
                "competition": 0.3385383889238515,
                "averageSearchVolume": 1900,
                "historicalSearchVolume": [
                {
                    "date": "201803",
                    "searchVolume": 1600
                },
                {
                    "date": "201802",
                    "searchVolume": 1900
                }]
            }],
            "seeds": [
            {
                "cpc": 4.05351,
                "keyword": "meilleur restaurant",
                "currency": "EUR",
                "competition": 0.3385341239238515,
                "averageSearchVolume": 2900,
                "historicalSearchVolume": [
                {
                    "date": "201803",
                    "searchVolume": 3600
                },
                {
                    "date": "201802",
                    "searchVolume": 2900
                }]
            }]
        }
    }]
}
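
To show how the structure above can be consumed, here is a short sketch that walks a response and lists each seed and idea keyword with its average search volume. The function name summarize is hypothetical, and the embedded JSON is a trimmed copy of the sample output (historicalSearchVolume omitted).

```python
import json

# Trimmed copy of the sample output above.
SAMPLE = json.loads("""
{
  "results": [
    {
      "content": {
        "ideas": [
          {"cpc": 4.712038, "keyword": "meilleur restaurant a paris",
           "currency": "EUR", "averageSearchVolume": 1900}
        ],
        "seeds": [
          {"cpc": 4.05351, "keyword": "meilleur restaurant",
           "currency": "EUR", "averageSearchVolume": 2900}
        ]
      }
    }
  ]
}
""")


def summarize(response):
    """Return (kind, keyword, average search volume) rows, seeds first."""
    rows = []
    for result in response["results"]:
        content = result["content"]
        for kind in ("seeds", "ideas"):
            # "ideas" may be absent when the job was sent with ideas=false.
            for entry in content.get(kind, []):
                rows.append((kind, entry["keyword"], entry["averageSearchVolume"]))
    return rows


for row in summarize(SAMPLE):
    print(row)
```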

Parsed data

The Google Web Search (SERP) page is the only one that is extensively supported in parsed data delivery. Below you can find which particular SERP page fields we parse. Structured data is available with Search (all the time) and Direct (as long as a SERP page URL is submitted).

Google Web Search ("source": "google_search") supports CSV output. To access it, please include these parameters in your Google Web Search job {"source": "google_search", "parse": true, "parser_type": "v2"}. The result retrieval URL for a CSV job is structured like this: http://data.oxylabs.io/v1/queries/{job_id}/results/normalized?format=csv.
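
As a small illustration of the CSV option, the sketch below builds a job payload with the two parser parameters and fills a job ID into the result URL pattern quoted above. The helper names and the job ID value are hypothetical; how the job ID is obtained from the job submission response is not shown here.

```python
CSV_RESULTS_TEMPLATE = (
    "http://data.oxylabs.io/v1/queries/{job_id}/results/normalized?format=csv"
)


def csv_job_payload(query, **params):
    """Build a google_search payload with CSV-capable parsing (hypothetical helper)."""
    payload = {
        "source": "google_search",
        "query": query,
        "parse": True,
        "parser_type": "v2",
    }
    payload.update(params)
    return payload


def csv_results_url(job_id):
    """Fill a job ID into the documented CSV retrieval URL pattern."""
    return CSV_RESULTS_TEMPLATE.format(job_id=job_id)


print(csv_job_payload("adidas", domain="com"))
print(csv_results_url("1234567890"))  # placeholder job ID
```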


Search

Organic & Paid

"results": {
  "paid": [
    {
      "pos": 1,
      "url": "https://www.adidas.com/us",
      "desc": "New York · 10 locations nearby",
      "title": "adidas.com | adidas® Official Site | Official adidas® Online Store‎",
      "url_shown": "www.adidas.com/Official/Site",
      "pos_overall": 1
    }
  ],
  "organic": [
    {
      "pos": 1,
      "url": "https://www.adidas.com/us",
      "desc": "Welcome to adidas Shop for adidas shoes, clothing and view new collections for adidas Originals, running, football, training and much more.",
      "title": "adidas Official Website | adidas US",
      "url_shown": "https://www.adidas.com › ...",
      "pos_overall": 2
    },
    {
      "pos": 2,
      "url": "https://www.mena.adidas.com/",
      "desc": "Browse for adidas shoes, clothing and collections, adidas Originals, Running, Football, Training and more on the official adidas website.",
      "title": "adidas Official Website | adidas",
      "url_shown": "https://www.mena.adidas.com",
      "pos_overall": 6
    },
    {
      "pos": 3,
      "url": "https://www.adidas-group.com/",
      "desc": "adidas AG Supervisory Board announces candidates as shareholder ... adidas celebrates its 70th anniversary and the opening of the Arena building. August 9 ...",
      "title": "adidas - Home",
      "url_shown": "https://www.adidas-group.com",
      "pos_overall": 7
    },
    {
      "pos": 4,
      "url": "https://www.nycgo.com/shopping/the-adidas-store",
      "desc": "You don't so much shop in this flagship Adidas store as you experience it. With an interior modeled on a high school stadium, this four-story Midtown outlet—the  ...",
      "title": "The Adidas Store (Midtown) | NYCgo - NYCgo.com",
      "url_shown": "https://www.nycgo.com › shopping › the-adidas-store",
      "pos_overall": 8
    },
    {
      "pos": 5,
      "url": "https://www.yelp.com/search?find_desc=adidas+store&find_loc=Manhattan%2C+NY",
      "desc": "Reviews on Adidas Store in Manhattan, NY - Adidas, Adidas Originals New York SoHo, adidas Sport Performance, Upper 90 Soccer - Manhattan, Nike Soho, ...",
      "title": "Adidas Store Manhattan, NY - Last Updated August 2019 - Yelp",
      "url_shown": "https://www.yelp.com › search › find_desc=adidas+store",
      "pos_overall": 9
    },
    {
      "pos": 6,
      "url": "https://en.wikipedia.org/wiki/Adidas",
      "desc": "Adidas AG is a multinational corporation, founded and headquartered in Herzogenaurach, Germany, that designs and manufactures shoes, clothing and ...",
      "title": "Adidas - Wikipedia",
      "url_shown": "https://en.wikipedia.org › wiki › Adidas",
      "pos_overall": 10
    }
  ]
}
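
Each entry carries both pos (its rank within its own block) and pos_overall (its rank on the whole page). A brief sketch of merging the paid and organic lists into page order follows, using a trimmed copy of the sample above. The function name page_order is hypothetical. Gaps in pos_overall, such as the jump from 2 to 6, are presumably positions taken by other result blocks.

```python
# Trimmed copy of the sample above (only the fields used here).
results = {
    "paid": [
        {"pos": 1, "url": "https://www.adidas.com/us", "pos_overall": 1},
    ],
    "organic": [
        {"pos": 2, "url": "https://www.mena.adidas.com/", "pos_overall": 6},
        {"pos": 1, "url": "https://www.adidas.com/us", "pos_overall": 2},
    ],
}


def page_order(parsed):
    """Merge paid and organic entries, sorted by their position on the page."""
    merged = []
    for kind in ("paid", "organic"):
        for item in parsed.get(kind, []):
            merged.append((item["pos_overall"], kind, item["url"]))
    return sorted(merged)


for pos_overall, kind, url in page_order(results):
    print(pos_overall, kind, url)
```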

Product Listing Ads

"pla": [
  {
    "pos": 1,
    "url": "http://www.adidas.com/us/asweego-shoes/F37038.html?cm_mmc=AdieSEM_Feeds-_-GoogleProductAds-_-NA-_-F37038&cm_mmca1=US&cm_mmca2=NA&kpid=F37038&sourceid=543457011",
    "price": "$40.00",
    "title": "adidas Asweego Shoes Black 10.5 - Mens Running Shoes",
    "seller": "adidas",
    "source": ""
  },
  {
    "pos": 2,
    "url": "http://www.adidas.com/us/baseline-shoes/AW4299.html?cm_mmc=AdieSEM_Feeds-_-GoogleProductAds-_-NA-_-AW4299&cm_mmca1=US&cm_mmca2=NA&kpid=AW4299&sourceid=543457011",
    "price": "$50.00",
    "title": "adidas Baseline Shoes White 13K - Originals Shoes",
    "seller": "adidas",
    "source": ""
  },
  ...
  {
    "pos": 29,
    "url": "https://www.zappos.com/product/8466374/color/21766",
    "price": "$79.95",
    "title": "adidas Superstar W Originals Women's Classic Shoes White/Black/White : 9 B - Medium",
    "seller": "Zappos.com",
    "source": ""
  }
]
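
The price field arrives as a display string such as "$40.00", so numeric comparisons need a conversion step. The sketch below extracts a Decimal from such strings and picks the cheapest ad from a trimmed copy of the sample. The helper parse_price is hypothetical and assumes US-style formatting (comma as thousands separator, dot as decimal separator), as seen in the samples; other locales would need different handling.

```python
import re
from decimal import Decimal

# First number in the string, allowing thousands separators and decimals.
_PRICE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Extract a Decimal from a US-style price string (hypothetical helper)."""
    match = _PRICE.search(text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(match.group().replace(",", ""))


# Trimmed copy of the sample above.
pla = [
    {"pos": 1, "price": "$40.00", "seller": "adidas"},
    {"pos": 2, "price": "$50.00", "seller": "adidas"},
    {"pos": 29, "price": "$79.95", "seller": "Zappos.com"},
]

cheapest = min(pla, key=lambda ad: parse_price(ad["price"]))
print(cheapest["pos"], parse_price(cheapest["price"]))
```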

Top Stories

"top_stories": [
  {
    "url": "https://www.cnet.com/news/spacex-starhopper-prototype-takes-giant-leap-for-elon-musk/",
    "source": "Cnet",
    "headline": "SpaceX Starhopper rocket prototype takes giant leap for Elon Musk",
    "timeframe": "13 hours ago"
  },
  {
    "url": "https://electrek.co/2019/08/27/elon-musk-tesla-china-made-model-3-rumor/",
    "source": "Electrek",
    "headline": "Elon Musk is rumored to unveil first China-made Tesla Model 3 at event this \nweek",
    "timeframe": "16 hours ago"
  },
  {
    "url": "https://www.bloomberg.com/news/articles/2019-08-28/musk-to-join-china-ai-summit-despite-trump-ordering-firms-out",
    "source": "Bloomberg",
    "headline": "Elon Musk and Jack Ma Will Debate AI at China Summit",
    "timeframe": "4 hours ago"
  }
]
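
The timeframe field is a relative phrase such as "13 hours ago" rather than a timestamp. As a sketch of turning it into something sortable, the code below converts these phrases into timedelta values and orders a trimmed copy of the sample from newest to oldest. The helper parse_timeframe is hypothetical and only handles the English "N unit(s) ago" form seen in the samples; anything else yields None.

```python
import re
from datetime import timedelta

_TIMEFRAME = re.compile(r"(\d+)\s+(minute|hour|day|week)s?\s+ago")
_UNIT_TO_KWARG = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}


def parse_timeframe(text):
    """Convert 'N hours ago' style text to a timedelta, or None if unrecognized."""
    match = _TIMEFRAME.fullmatch(text.strip())
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNIT_TO_KWARG[unit]: amount})


# Trimmed copy of the sample above.
top_stories = [
    {"source": "Cnet", "timeframe": "13 hours ago"},
    {"source": "Electrek", "timeframe": "16 hours ago"},
    {"source": "Bloomberg", "timeframe": "4 hours ago"},
]

newest_first = sorted(top_stories, key=lambda s: parse_timeframe(s["timeframe"]))
print([story["source"] for story in newest_first])
```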

Featured Snippet

"featured_snippet": [
  {
    "url": "https://en.wikipedia.org/wiki/Contract_for_difference",
    "desc": "In finance, a contract for difference (CFD) is a contract between two parties, typically described as \"buyer\" and \"seller\", stipulating that the seller will pay to the buyer the difference between the current value of an asset and its value at contract time (if the difference is negative, then the buyer pays instead to ...",
    "title": "Contract for difference - Wikipedia",
    "url_shown": "https://en.wikipedia.org › wiki › Contract_for_difference",
    "pos_overall": 1
  }
]

Knowledge Base

"knowledge": {
  "title": "Adidas",
  "factoids": [
    {
      "title": "Stock price",
      "content": "ADDDF (OTCMKTS) $291.81 +2.74 (+0.95%)Aug 23, 4:00 PM EDT - Disclaimer"
    },
    {
      "title": "Founder",
      "content": "Adolf Dassler"
    },
    {
      "title": "Founded",
      "content": "August 18, 1949, Herzogenaurach, Germany"
    },
    {
      "title": "Headquarters",
      "content": "Herzogenaurach, Germany"
    },
    {
      "title": "Subsidiaries",
      "content": "Reebok, Five Ten Footwear, Runtastic, Ashworth, MORE"
    },
    {
      "title": "Website",
      "content": "https://www.adidas.com/us"
    }
  ],
  "subtitle": "Design company",
  "description": "DescriptionAdidas AG is a multinational corporation, founded and headquartered in Herzogenaurach, Germany, that designs and manufactures shoes, clothing and accessories. It is the largest sportswear manufacturer in Europe, and the second largest in the world, after Nike. Wikipedia"
}

Local Pack

"local_pack": [
  {
    "links": [
      {
        "href": "https://www.adidas.com/us?utm_source=gmb&utm_medium=organic&utm_campaign=US470198_local",
        "title": "Website"
      },
      {
        "href": "#",
        "title": "Directions"
      }
    ],
    "phone": "",
    "title": "adidas Originals Flagship Store",
    "rating": 0,
    "address": "Open ⋅ Closes 7PM",
    "subtitle": "(212) 966-0954",
    "pos_overall": 3,
    "rating_count": 0
  }
]

Twitter Feed

"twitter": [
  {
    "pos": 1,
    "url": "https://twitter.com/elonmusk",
    "title": "Elon Musk (@elonmusk) · Twitter",
    "tweets": [
      {
        "url": "https://twitter.com/elonmusk/status/1166081488648949760?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet",
        "content": "Starhopper flight currently tracking to 5pm Texas time for 150m / ~500ft hover test",
        "timeframe": "11 hours ago"
      },
      {
        "url": "https://twitter.com/elonmusk/status/1165377786338406400?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet",
        "content": "Looks like @SpaceX Starhopper flight may be as soon as Monday. FAA support is much appreciated!",
        "timeframe": "2 days ago"
      },
      {
        "url": "https://twitter.com/elonmusk/status/1165371975528640512?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet",
        "content": "If you’re a utility or public utilities commission, please consider using the Tesla Megapack. Better for the environment & usually lower cost than fossil fuel peaker plants! www.tesla.com/megapack",
        "timeframe": "2 days ago"
      }
    ],
    "pos_overall": 1
  }
]

Job Listings

"jobs": {
  "listings": [
    {
      "title": "SR SOFTWARE DEVELOPER",
      "source": "via LinkedIn",
      "employer": "Jobs @ TheJobNetwork",
      "location": "Tulsa, OK",
      "extra_details": [
        "1 day ago",
        "Full-time"
      ]
    },
    {
      "title": "Autonomous Vehicle Simulation Software Engineer",
      "source": "via Built In Colorado",
      "employer": "Azevtec",
      "location": "United States",
      "extra_details": [
        "17 hours ago",
        "Full-time"
      ]
    },
    {
      "title": "Senior Software Engineer - Oracle Transportation Management",
      "source": "via LinkedIn",
      "employer": "XPO Logistics, Inc.",
      "location": "United States",
      "extra_details": [
        "21 hours ago",
        "Full-time"
      ]
    }
  ],
  "location_header": "Near United States"
}

Carousel

"item_carousel": {
  "items": [
    {
      "title": "Chris Evans",
      "subtitle": "Captain America"
    },
    {
      "title": "Mark Ruffalo",
      "subtitle": "Hulk"
    },
    {
      "title": "Tom Holland",
      "subtitle": "Spider-Man"
    },
    {
      "title": "Stan Lee",
      "subtitle": "Old Man in TV Report, Bus Driver"
    },
    {
      "title": "Chris Pratt",
      "subtitle": "Star-Lord"
    }
  ],
  "title": "The Avengers/Cast"
}

Images

"images": [
  {
    "alt": "Image result for contemporary wall clock",
    "href": "/search?q=contemporary+wall+clock&safe=off&hl=en&gl=US&tbm=isch&source=iu&ictx=1&fir=Qspcw8WiAmXYzM%253A%252C-m-5575uWYilbM%252C_&vet=1&usg=AI4_-kTGLIU9LAzoCJxO8gp7kK322MV8Yg&sa=X&ved=2ahUKEwjFy8rSy7HkAhWkDrkGHck7A24Q9QEwAXoECAkQBg#imgrc=Qspcw8WiAmXYzM:",
    "source": "https://www.allmodern.com/decor-pillows/sb0/wall-clocks-c429917.html"
  },
  {
    "alt": "Image result for contemporary wall clock",
    "href": "/search?q=contemporary+wall+clock&safe=off&hl=en&gl=US&tbm=isch&source=iu&ictx=1&fir=G0pFK8TQ91ls6M%253A%252Cr5nLxZQfxnA3MM%252C_&vet=1&usg=AI4_-kStPZh1tpSdQ5vTAZUIXwW4zThzQg&sa=X&ved=2ahUKEwjFy8rSy7HkAhWkDrkGHck7A24Q9QEwAnoECAkQCQ#imgrc=G0pFK8TQ91ls6M:",
    "source": "https://www.wayfair.com/decor-pillows/cat/modern-wall-clocks-c1869680.html"
  },
  ...
  {
    "alt": "Image result for contemporary wall clock",
    "href": "/search?q=contemporary+wall+clock&safe=off&hl=en&gl=US&tbm=isch&source=iu&ictx=1&fir=o4ZXIngZyr9HAM%253A%252C-m-5575uWYilbM%252C_&vet=1&usg=AI4_-kTIJMWyTs07HFcVKHTfTd6otLL82w&sa=X&ved=2ahUKEwjFy8rSy7HkAhWkDrkGHck7A24Q9QEwCnoECAkQIQ#imgrc=o4ZXIngZyr9HAM:",
    "source": "https://www.allmodern.com/decor-pillows/sb0/wall-clocks-c429917.html"
  }
]

Real-Time Crawler for Google Related Questions

"related_questions": [
  {
    "pos": 1,
    "question": "What does Adidas stand for?"
  },
  {
    "pos": 2,
    "question": "Is Adidas German?"
  },
  {
    "pos": 3,
    "question": "Are Jordans Adidas?"
  },
  {
    "pos": 4,
    "question": "What shoe brands does adidas own?"
  }
]

Shopping Search

...
"organic": [
            {
              "pos": 1,
              "url": "/aclk?sa=l&ai=DChcSEwju8fmd84jpAhUPTxgKHQshDIcYABAHGgJsZQ&sig=AOD64_1BTHVcnNzI5775j9xNkILrCU2KYA&ctype=5&q=&ved=0ahUKEwjpr_Sd84jpAhVI2aYKHYn1CeMQvxMI4wQ&adurl=",
              "type": "grid",
              "price": 85,
              "title": "Adidas Women's Swift Run Casual Shoes in White ...",
              "merchant": {
                "url": "/aclk?sa=l&ai=DChcSEwju8fmd84jpAhUPTxgKHQshDIcYABAHGgJsZQ&sig=AOD64_1BTHVcnNzI5775j9xNkILrCU2KYA&ctype=5&q=&ved=0ahUKEwjpr_Sd84jpAhVI2aYKHYn1CeMQg-UECOoE&adurl=",
                "name": "Finish Line"
              },
              "price_str": "$85.00.",
              "pos_overall": 1
            },
            {
              "pos": 2,
              "url": "/shopping/product/4092922174439754197?uule=w+CAIQICIXQ29sb3JhZG8sIFVuaXRlZCBTdGF0ZXM&q=adidas&prds=epd:6096059639745774212,paur:ClkAsKraX5cxKGk1E_r15f66xbFqydL47KoF9cO04jau1Hw_EeaJnz0EV5mb_JEjRlE5_m7N_B5Vg-krR5766rvdESfkczSSBqkGVDV7A5Ts8BlTUCNfpUxgtxIZAFPVH73vXbe47J5qGlzkfYH83D9zVPSv8w,prmr:1&sa=X&ved=0ahUKEwjpr_Sd84jpAhVI2aYKHYn1CeMQvxMI7AQ",
              "type": "grid",
              "price": 139.97,
              "title": "adidas Mens Alphaboost Training Shoes White ...",
              "merchant": {
                "url": "/aclk?sa=l&ai=DChcSEwju8fmd84jpAhUPTxgKHQshDIcYABAEGgJsZQ&sig=AOD64_3S0xuLlA1GOzNxCvYQdpeTLZkRyQ&ctype=5&q=&ved=0ahUKEwjpr_Sd84jpAhVI2aYKHYn1CeMQg-UECPQE&adurl=",
                "name": "Baseball Savings.com"
              },
              "price_str": "$139.97.",
              "pos_overall": 2
            },
...

Shopping Product

...
{
  "type": "Bundle",
  "items": [
    {
      "value": "Console Only",
      "selected": true,
      "available": true,
      "product_id": "5007040952399054528"
    },
    {
      "value": "Splatoon 2 Bundle",
      "available": false,
      "product_id": "6767220879106424425"
    },
    {
      "value": "Super Mario Odyssey Edition",
      "available": false,
      "product_id": "11634753303078094444"
    }
  ]
}
...

Shopping Product Pricing

"content": {
  "url": "https://www.google.com/shopping/product/5007040952399054528/online",
  "title": "Nintendo Switch with Joy-Con - 32 GB - Gray/Black",
  "rating": 4.5,
  "pricing": [
    {
      "price": 319.99,
      "seller": "Electronic Express",
      "details": "Free shipping",
      "currency": "$",
      "price_tax": 0,
      "price_total": 319.99,
      "seller_link": "/aclk?sa=l&ai=DChcSEwi9t9HqoJ7mAhVCXw0KHdyPBEYYABABGgJxYg&sig=AOD64_2gaL_J1BQ5J5PR-JazDM86N23Nww&adurl=&ctype=5&q=",
      "price_shipping": 0
    },
    {
      "price": 334.99,
      "seller": "ShopZodys",
      "details": "Arrives Dec 9 – 13",
      "currency": "$",
      "price_tax": 27.69,
      "price_total": 412.67,
      "seller_link": "/aclk?sa=l&ai=DChcSEwi9t9HqoJ7mAhVCXw0KHdyPBEYYABADGgJxYg&sig=AOD64_1Rqy4wxKvZXAaoX9FNDBy379EAAA&adurl=&ctype=5&q=",
      "price_shipping": 49.99
    }
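
In the sample above, each price_total equals price + price_tax + price_shipping (for ShopZodys, 334.99 + 27.69 + 49.99 = 412.67). The following is a minimal Python sketch that checks this for parsed offers. It assumes the relationship holds beyond this one sample, which the documentation does not state; the function names are hypothetical.

```python
from decimal import Decimal


def expected_total(offer: dict) -> Decimal:
    """Add price, tax and shipping with Decimal to avoid float rounding noise."""
    parts = (offer["price"], offer["price_tax"], offer["price_shipping"])
    return sum((Decimal(str(part)) for part in parts), Decimal("0"))


def total_matches(offer: dict) -> bool:
    """True when the reported price_total equals the sum of its components."""
    return expected_total(offer) == Decimal(str(offer["price_total"]))


# The two offers from the sample response, reduced to the numeric fields.
pricing = [
    {"seller": "Electronic Express", "price": 319.99, "price_tax": 0,
     "price_shipping": 0, "price_total": 319.99},
    {"seller": "ShopZodys", "price": 334.99, "price_tax": 27.69,
     "price_shipping": 49.99, "price_total": 412.67},
]

for offer in pricing:
    print(offer["seller"], expected_total(offer), total_matches(offer))
```

A check like this can flag offers whose components were only partially parsed before they are compared across sellers.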

Parameter Values

User-Agent

Download full list of user_agent_type values in JSON here.

[
  {
    "user_agent_type": "desktop",
    "description": "Random desktop browser User-Agent"
  },
  {
    "user_agent_type": "desktop_firefox",
    "description": "Random User-Agent of one of the latest versions of desktop Firefox"
  },
  {
    "user_agent_type": "desktop_chrome",
    "description": "Random User-Agent of one of the latest versions of desktop Chrome"
  },
  {
    "user_agent_type": "desktop_opera",
    "description": "Random User-Agent of one of the latest versions of desktop Opera"
  },
  {
    "user_agent_type": "desktop_edge",
    "description": "Random User-Agent of one of the latest versions of desktop Edge"
  },
  {
    "user_agent_type": "desktop_safari",
    "description": "Random User-Agent of one of the latest versions of desktop Safari"
  },
  {
    "user_agent_type": "mobile",
    "description": "Random mobile browser User-Agent"
  },
  {
    "user_agent_type": "mobile_android",
    "description": "Random User-Agent of one of the latest versions of Android browser"
  },
  {
    "user_agent_type": "mobile_ios",
    "description": "Random User-Agent of one of the latest versions of iPhone browser"
  },
  {
    "user_agent_type": "tablet",
    "description": "Random tablet browser User-Agent"
  },
  {
    "user_agent_type": "tablet_android",
    "description": "Random User-Agent of one of the latest versions of Android tablet"
  },
  {
    "user_agent_type": "tablet_ios",
    "description": "Random User-Agent of one of the latest versions of iPad tablet"
  }
]
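
As an illustration, the values above can be validated client-side before a job is submitted. This Python sketch assumes user_agent_type is sent as a top-level key of the job payload, next to source and query as in the Quick Start example; the helper name is hypothetical.

```python
# Values copied from the user_agent_type list above.
USER_AGENT_TYPES = frozenset({
    "desktop", "desktop_firefox", "desktop_chrome", "desktop_opera",
    "desktop_edge", "desktop_safari",
    "mobile", "mobile_android", "mobile_ios",
    "tablet", "tablet_android", "tablet_ios",
})


def with_user_agent(payload: dict, user_agent_type: str) -> dict:
    """Return a copy of the payload with a validated user_agent_type added."""
    if user_agent_type not in USER_AGENT_TYPES:
        raise ValueError(f"Unsupported user_agent_type: {user_agent_type!r}")
    # Assumption: the parameter sits at the top level of the payload.
    return {**payload, "user_agent_type": user_agent_type}


payload = {"source": "google_search", "domain": "com", "query": "adidas"}
print(with_user_agent(payload, "mobile_ios"))
```

Rejecting an unknown value locally avoids spending a request on what would otherwise come back as a 400 response.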

Locale

Download full list of locale values in JSON here.

[  
   {  
      "locale":{  
         "en-ai":{  
            "description":"Anguilla - English",
            "domain":"com.ai"
         },
         "es-pr":{  
            "description":"Puerto Rico - Spanish",
            "domain":"com.pr"
         },
         ...
         "en-by":{  
            "description":"Belarus - English",
            "domain":"by"
         },
         "en-in":{  
            "description":"India - English",
            "domain":"co.in"
         }
      }
   }
]
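
The locale file pairs each locale code with a matching Google domain. A minimal Python sketch of a lookup over that structure follows; it assumes the downloaded file has the same nesting as the excerpt above (a list of objects, each with a "locale" mapping), and the function name is hypothetical.

```python
from typing import List, Optional


def domain_for_locale(locale_file: List[dict], locale: str) -> Optional[str]:
    """Return the domain listed for a locale code, or None if it is absent."""
    for entry in locale_file:
        info = entry.get("locale", {}).get(locale)
        if info is not None:
            return info["domain"]
    return None


# Only the entries visible in the excerpt above; the full file has more.
locale_file = [
    {
        "locale": {
            "en-ai": {"description": "Anguilla - English", "domain": "com.ai"},
            "es-pr": {"description": "Puerto Rico - Spanish", "domain": "com.pr"},
            "en-by": {"description": "Belarus - English", "domain": "by"},
            "en-in": {"description": "India - English", "domain": "co.in"},
        }
    }
]

print(domain_for_locale(locale_file, "en-in"))
```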

Results Language

Download full list of results_language values in JSON here.

[
 {
   "results_language": "af",
   "language": "Afrikaans"
 },
 {
   "results_language": "ar",
   "language": "Arabic"
 },
 ...
 {
   "results_language": "vi",
   "language": "Vietnamese"
 }
]

Geo_location

There are a few ways you can use the geo_location parameter to get correctly-localized Google results.

  • Using Google’s Canonical Location Name. This is the most straightforward option: pass one of the values found in the CSV download here. Example: "geo_location": "New York,New York,United States".
  • Using a state name. Strip the first part of Google's Canonical Location Name and pass a geo_location value in a "State,Country" format. This works with the United States, Australia, India and other countries with federated states. Example: "geo_location": "California,United States".
  • Using a country name. To get results localized for the geographical center point of a country, pass an official country name. Example: "geo_location": "United Kingdom".
  • Using coordinates and radius. To get hyperlocal search results (especially useful for searches such as “restaurants near me”), you can pass latitude, longitude and radius values. The following example passes the coordinates of the Space Needle in Seattle, WA: "geo_location": "lat: 47.6205, lng: -122.3493, rad: 25000".

If you pass a misspelled geo_location value, chances are that either we or Google will interpret and correct it for you. Nonetheless, we recommend using the parameter structures outlined above, combined with the locale and domain parameters, to get the most accurate results.
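
The formats above can be produced with small helpers. This is a Python sketch with hypothetical function names. The state-level helper keeps the last two comma-separated parts, which matches "strip the first part" for three-part canonical names; treating longer names the same way is an assumption.

```python
def geo_from_coordinates(lat: float, lng: float, radius: int) -> str:
    """Build the coordinate form, e.g. 'lat: 47.6205, lng: -122.3493, rad: 25000'."""
    return f"lat: {lat}, lng: {lng}, rad: {radius}"


def geo_state_level(canonical_name: str) -> str:
    """Reduce a canonical location name to the 'State,Country' form."""
    parts = canonical_name.split(",")
    if len(parts) < 3:
        raise ValueError("Expected at least 'City,State,Country'")
    # Assumption: the last two parts are always the state and the country.
    return ",".join(parts[-2:])


print(geo_from_coordinates(47.6205, -122.3493, 25000))
print(geo_state_level("New York,New York,United States"))
```

Either return value can be passed as the geo_location string in a job payload.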


Account Status

Usage Statistics

You can find your usage statistics by querying the following endpoint:

GET https://data.oxylabs.io/v1/stats

By default, the API returns all-time usage statistics. Adding ?group_by=month returns monthly stats, while ?group_by=day returns daily numbers.

curl --user user:pass1 'https://data.oxylabs.io/v1/stats'

Sample output:

{
    "data": {
        "sources": [
            {
                "realtime_results_count": "90",
                "results_count": "10",
                "title": "google_hotels"
            },
            {
                "realtime_results_count": "19",
                "results_count": "87",
                "title": "google_search"
            }
        ]
    },
    "meta": {
        "group_by": null
    }
}
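
Note that the counts in the sample output are strings, not numbers. The Python sketch below builds the stats URL with an optional group_by value and converts the per-source counts to integers. It assumes the ungrouped response shape shown above (grouped responses may differ), and the helper names are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

STATS_URL = "https://data.oxylabs.io/v1/stats"


def stats_url(group_by: Optional[str] = None) -> str:
    """Return the stats endpoint, optionally grouped by 'day' or 'month'."""
    if group_by is None:
        return STATS_URL
    if group_by not in ("day", "month"):
        raise ValueError("group_by must be 'day' or 'month'")
    return STATS_URL + "?" + urlencode({"group_by": group_by})


def counts_by_source(stats: dict) -> Dict[str, Dict[str, int]]:
    """Convert the string counts of each source into integers."""
    counts = {}
    for source in stats["data"]["sources"]:
        counts[source["title"]] = {
            "realtime_results_count": int(source["realtime_results_count"]),
            "results_count": int(source["results_count"]),
        }
    return counts


# The sample output from above.
sample = {
    "data": {
        "sources": [
            {"realtime_results_count": "90", "results_count": "10",
             "title": "google_hotels"},
            {"realtime_results_count": "19", "results_count": "87",
             "title": "google_search"},
        ]
    },
    "meta": {"group_by": None},
}

print(stats_url("month"))
print(counts_by_source(sample))
```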

Limits

The following endpoint returns your monthly commitment and how much of it has already been used:

GET https://data.oxylabs.io/v1/stats/limits

curl --user user:pass1 'https://data.oxylabs.io/v1/stats/limits'

Sample output:

{
    "monthly_requests_commitment": 4500000,
    "used_requests": 985000
}
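
As a small worked example, the remaining allowance can be derived from the two fields in the sample output. This Python sketch uses hypothetical helper names and assumes both fields count requests in the same unit.

```python
def remaining_requests(limits: dict) -> int:
    """Requests left in the monthly commitment."""
    return limits["monthly_requests_commitment"] - limits["used_requests"]


def used_fraction(limits: dict) -> float:
    """Share of the monthly commitment already used, between 0 and 1."""
    return limits["used_requests"] / limits["monthly_requests_commitment"]


# The sample output from above.
limits = {"monthly_requests_commitment": 4500000, "used_requests": 985000}

print(remaining_requests(limits))
print(f"{used_fraction(limits):.1%}")
```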

Response Codes

Code | Status | Description
204 | No Content | You are trying to retrieve a job that has not been completed yet.
400 | Multiple error messages | Bad request structure; this could be a misspelled parameter or an invalid value. The response body will contain a more specific error message.
401 | 'Authorization header not provided' / 'Invalid authorization header' / 'Client not found' | Missing authorization header or incorrect login credentials.
403 | Forbidden | Your account does not have access to this resource.
404 | Not Found | The job ID you are looking for is no longer available.
429 | Too many requests | Exceeded rate limit. Please contact your account manager to increase limits.
500 | Unknown Error | Service unavailable.
524 | Timeout | Service unavailable.
612 | Undefined Internal Error | Something went wrong and we failed the job you submitted. You can try again at no extra cost, as we don't charge you for faulted jobs. If that doesn't work, give us a shout.
613 | Faulted After Too Many Retries | We tried scraping the job you submitted, but gave up after reaching our retry limit. You can try again at no extra cost, as we don't charge you for faulted jobs. If that doesn't work, give us a shout.
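
One way a client might act on these codes is sketched below in Python. The grouping into "poll again", "retry" and "fix the request" is an illustrative assumption drawn from the descriptions above, not an official client policy. Treating other 2xx codes as success is also an assumption, since the table does not list them.

```python
# Grouping of the documented codes; an assumption for illustration.
NOT_READY = {204}                       # job exists but has not finished
RETRYABLE = {429, 500, 524, 612, 613}   # rate limit, outages, faulted jobs
CLIENT_ERRORS = {400, 401, 403, 404}    # request or account must change


def next_action(status_code: int) -> str:
    """Map a response code to a coarse next step for the caller."""
    if status_code in NOT_READY:
        return "poll_again"
    if status_code in RETRYABLE:
        return "retry"
    if status_code in CLIENT_ERRORS:
        return "fix_request"
    if 200 <= status_code < 300:
        return "done"
    return "unknown"


for code in (200, 204, 401, 429, 613):
    print(code, next_action(code))
```

Retries for 429 are best spaced out with a delay, since the code signals that the rate limit was exceeded.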

Parsed data response codes:

Code | Status | Description
12000 | Success | The parsed content returned is full and there should be no missing or broken fields.
12002 | Failure | We couldn't parse the page entirely. There may be an issue with the target website changing its HTML structure.
12003 | Not Supported | The web page you asked us to parse is not supported.
12004 | Partial Success | We were able to parse the majority of the page, but there are a few missing fields.
12005 | Partial Success | We were able to parse the majority of the page, but there might be some fields with default values because we could not find them in the HTML.
12006 | Failure | Unexpected error. Let us know you got this response and we'll check what went wrong.
12007 | Unknown | Unknown parsed data status. The actual result could range anywhere from a complete failure to a total success.
12008 | Failure | Parsed content is missing.
12009 | Failure | Product not found. Check the URL you submitted.
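
The parsed data codes can be mapped to their status labels so that downstream code knows whether to use a result. In this Python sketch the function names are hypothetical, and treating only Success and Partial Success as usable (so 12007 Unknown is not) is a conservative assumption.

```python
# Status labels copied from the parsed data response codes above.
PARSE_STATUS = {
    12000: "Success",
    12002: "Failure",
    12003: "Not Supported",
    12004: "Partial Success",
    12005: "Partial Success",
    12006: "Failure",
    12007: "Unknown",
    12008: "Failure",
    12009: "Failure",
}


def parse_outcome(code: int) -> str:
    """Return the documented status label, or 'Undocumented' for other codes."""
    return PARSE_STATUS.get(code, "Undocumented")


def has_usable_content(code: int) -> bool:
    """True for full or partial success; partial results may lack some fields."""
    return parse_outcome(code) in ("Success", "Partial Success")


for code in (12000, 12005, 12007, 12009):
    print(code, parse_outcome(code), has_usable_content(code))
```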

Cloud storage upload response codes:

Code | Status | Description
10001 | Unexpected Exception | Something terribly wrong happened. We probably know about this already and are fixing it. Let us know anyway.
13000 | Upload Success | All good!
13001 | Upload Failed | We couldn't upload job results to your bucket.
13102 | No Such Path | We couldn't find a bucket with such a name. Please double check.
13103 | Access Denied | The bucket doesn't have the required permissions. To find out how to give us the required access, see here.

Disclaimer: This part of the content is mainly from the merchant. If the merchant does not want it to be displayed on my website, please contact us to delete your content.

Last Updated on September 11, 2021

