urlopen(url, filename=None, save=None, headers=None, params=None, data=None, prefix='http', convert=True, die=False, response='text', verbose=False)

Download a single URL.

In its simplest form, equivalent to urllib.request.urlopen(url).read(). See also sc.download() for downloading multiple URLs. Note: sc.urlopen() and sc.wget() are aliases of each other.
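The standard-library call that this function wraps can be sketched as follows. This is a minimal illustration, not the Sciris implementation: fetch_text is a hypothetical helper, the handling of convert and die is an assumption based on the parameter descriptions on this page, and a data: URL is used only so that the example runs without network access.

```python
import urllib.request

def fetch_text(url, convert=True, die=False):
    """Hypothetical helper: read a URL and optionally decode the bytes to str."""
    raw = urllib.request.urlopen(url).read()  # urlopen(...).read() returns bytes
    if not convert:
        return raw
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        if die:
            raise  # die=True: propagate the decoding error
        return raw  # Assumption: fall back to the raw bytes if decoding fails

# A data: URL stands in for a real web address so no network is needed
demo_url = 'data:text/plain;charset=utf-8,hello%20world'
text = fetch_text(demo_url)                 # decoded string
raw = fetch_text(demo_url, convert=False)   # raw bytes
```

With a real site, the same call would be fetch_text('http://wikipedia.org').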

  • url (str) – the URL to open, either as GET or POST

  • filename (str) – if supplied, save to file instead of returning output

  • save (bool) – if True and no filename is supplied, save to the default filename

  • headers (dict) – a dictionary of headers to pass

  • params (dict) – a dictionary of parameters to pass to the GET request

  • data (dict) – a dictionary of parameters to pass to the POST request

  • prefix (str) – the string to ensure the URL starts with (else, add it)

  • convert (bool) – whether to convert from bytes to string

  • die (bool) – whether to raise an exception if converting to text fails

  • response (str) – what to return: ‘text’ (default), ‘json’ (dictionary version of the data), ‘status’ (the HTTP status), or ‘full’ (the full response object)

  • verbose (bool) – whether to print progress
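To make the prefix, params, data, and headers arguments more concrete, here is a rough sketch of how such arguments could be turned into a standard-library request. build_request is a hypothetical helper, not part of Sciris, and the exact prefixing and encoding rules shown are assumptions; no network request is made.

```python
import urllib.parse
import urllib.request

def build_request(url, headers=None, params=None, data=None, prefix='http'):
    """Hypothetical helper: assemble a urllib Request from urlopen-style arguments."""
    # Assumption: if the URL does not start with the prefix, prepend "<prefix>://"
    if prefix and not url.startswith(prefix):
        url = f'{prefix}://{url}'
    # GET parameters are URL-encoded and appended as a query string
    if params:
        url = f'{url}?{urllib.parse.urlencode(params)}'
    # Supplying a body turns the request into a POST
    body = urllib.parse.urlencode(data).encode('utf-8') if data else None
    return urllib.request.Request(url, data=body, headers=headers or {})

get_req = build_request('wikipedia.org',
                        params={'search': 'python'},
                        headers={'User-Agent': 'Custom agent'})
post_req = build_request('https://example.org/submit', data={'key': 'value'})

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

The Request objects could then be passed to urllib.request.urlopen() to perform the actual download.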


import sciris as sc

html = sc.urlopen('wikipedia.org') # Retrieve into variable html
sc.urlopen('http://wikipedia.org', filename='wikipedia.html') # Save to file wikipedia.html
sc.urlopen('https://wikipedia.org', save=True, headers={'User-Agent':'Custom agent'}) # Save to the default filename (here, wikipedia.org), with headers
sc.urlopen('wikipedia.org', response='status') # Only return the HTTP status of the site
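
The response options ('text', 'json', 'status', 'full') might map onto a standard-library response roughly as sketched here. This is an illustration under stated assumptions rather than the Sciris code: fetch is a hypothetical helper, and a data: URL replaces a real site so the example runs offline (such URLs carry no HTTP status, so the 'status' branch is shown but not exercised).

```python
import json
import urllib.parse
import urllib.request

def fetch(url, response='text'):
    """Hypothetical helper: choose what to return based on `response`."""
    if response not in ('text', 'json', 'status', 'full'):
        raise ValueError(f'Unknown response option: {response!r}')
    resp = urllib.request.urlopen(url)
    if response == 'full':
        return resp  # the caller receives the open response object
    with resp:
        if response == 'status':
            return resp.status  # HTTP status code for http(s) URLs
        body = resp.read().decode('utf-8')
    if response == 'json':
        return json.loads(body)  # parsed (dictionary) version of the data
    return body  # 'text'

# Offline stand-in for a JSON endpoint
payload = json.dumps({'ok': True})
demo_url = 'data:application/json,' + urllib.parse.quote(payload)

as_text = fetch(demo_url)
as_json = fetch(demo_url, response='json')
as_full = fetch(demo_url, response='full').read()
```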
New in version 2.0.0: renamed from wget to urlopen; new arguments
New in version 2.0.1: creates folders by default if they do not exist
New in version 2.0.4: “prefix” argument, e.g. prepend “http://” if not present
New in version 3.1.4: renamed “return_response” to “response”; additional options
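
The filename and save behaviour, including the folder creation noted for version 2.0.1, could look something like the following sketch. default_filename and save_url are hypothetical helpers; deriving the default name from the last component of the URL is an assumption that happens to match the wikipedia.org example, and a data: URL is used so the example runs offline.

```python
import os
import tempfile
import urllib.request

def default_filename(url):
    """Hypothetical helper. Assumption: use the last component of the URL."""
    return url.rstrip('/').split('/')[-1]

def save_url(url, filename, makedirs=True):
    """Hypothetical helper: download a URL to a file, creating folders if needed."""
    folder = os.path.dirname(filename)
    if makedirs and folder:
        os.makedirs(folder, exist_ok=True)  # create missing parent folders
    with urllib.request.urlopen(url) as resp, open(filename, 'wb') as f:
        f.write(resp.read())
    return filename

# Save into a nested folder that does not exist yet
tmpdir = tempfile.mkdtemp()
target = os.path.join(tmpdir, 'pages', 'demo.html')
saved = save_url('data:text/html,%3Cp%3Ehi%3C%2Fp%3E', target)

name = default_filename('https://wikipedia.org')
```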