How to download multiple files and images from a website using python

That's a problem that requires a coding solution. I can point you to some tools that will help, but not a full code solution.

Requests library: communicating with HTTP servers (websites)

http://docs.python-requests.org/en/latest/

BeautifulSoup: HTML parser (parsing website source code)

http://www.crummy.com/software/BeautifulSoup/bs4/doc/

Example:

>>> import requests
>>> from bs4 import BeautifulSoup as BS
>>> 
>>> response = requests.get('http://news.ycombinator.com')
>>> response.status_code # 200 == OK
200
>>> 
>>> soup = BS(response.text, 'html.parser') # Create an HTML parsing object
>>>
>>> soup.title # Here's the browser title tag
<title>Hacker News</title>
>>>
>>> soup.title.text # The contents of the tag
'Hacker News'
>>> 
>>> # Here are some article posts
... 
>>> post_containers = soup.find_all('tr', attrs={'class': 'athing'})
>>> 
>>> print('There are %d article posts.' % len(post_containers))
There are 30 article posts.
>>> 
>>> 
>>> # The article name is the 3rd and last object in a post_container
... 
>>> for container in post_containers:
...     title = container.contents[-1] # The last tag
...     title.a.text # Grab the `a` tag inside our title tag, show the text
... 
'Show HN: “Who is hiring?” Map'
'‘Flash Boys’ Programmer in Goldman Case Prevails Second Time'
'Forthcoming OpenSSL releases'
'Show HN: YouTube Filesystem – YTFS'
'Google launches Uber rival RideWith'
'Finish your stuff'
'The Plan to Feed the World by Hacking Photosynthesis'
'New electric engine improves safety of light aircraft'
'Hacking Team hacked, attackers claim 400GB in dumped data'
'Show HN: Proof of concept – Realtime single page apps'
'Berkeley CS 61AS – Structure and Interpretation of Computer Programs, Self-Paced'
'An evaluation of Erlang global process registries: meet Syn'
'Show HN: Nearby Buzz –\xa0Take control of your online reviews'
"The Grateful Dead's Wall of Sound"
'The Effects of Intermittent Fasting on Human and Animal Health'
'JsCoq'
'Taking stock of startup innovation in the Netherlands'
'Hangout: Becoming a freelance developer'
'Panning for Pangrams: The Search for the New Quick Brown Fox'
'Show HN: MUI – Lightweight CSS Framework for Material Design'
"Intel's 10nm Cannonlake delayed, replaced by 14nm Kaby Lake"
'VP of Logistics – EasyPost (YC S13) Hiring'
'Colorado’s Effort Against Teenage Pregnancies Is a Startling Success'
'Lexical Scanning in Go (2011)'
'Avoiding traps in software development with systems thinking'
"Apache Cordova: after 10 months, I won't using it anymore"
'An exercise in profiling a Go program'
"The Science of Pixar's ‘Inside Out’"
'Ask HN: What tech blogs, podcasts do you follow outside of HN?'
'NASA’s New Horizons Plans July 7 Return to Normal Science Operations'
>>> 
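
The same two libraries can be combined to cover the actual question of downloading several files and images from one page. What follows is only a rough sketch, not a finished tool: the names `find_file_urls`, `download_all`, and `EXTENSIONS`, the `downloads` folder, and the example URL are all made up for illustration, and it assumes the files are reachable through plain `<a href>` and `<img src>` attributes.

```python
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# File extensions treated as downloadable (an assumption; adjust as needed).
EXTENSIONS = ('.pdf', '.png', '.jpg', '.jpeg', '.gif')


def find_file_urls(html, base_url, extensions=EXTENSIONS):
    """Return absolute URLs of linked files and images found in html."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    # <a href="..."> covers linked documents, <img src="..."> covers images.
    for tag, attr in (('a', 'href'), ('img', 'src')):
        for element in soup.find_all(tag):
            target = element.get(attr)
            if not target:
                continue
            absolute = urljoin(base_url, target)  # resolve relative links
            path = urlparse(absolute).path.lower()
            if path.endswith(extensions) and absolute not in urls:
                urls.append(absolute)
    return urls


def download_all(page_url, dest_dir='downloads'):
    """Fetch page_url and save every matching file into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    page = requests.get(page_url, timeout=30)
    page.raise_for_status()
    for url in find_file_urls(page.text, page_url):
        filename = os.path.basename(urlparse(url).path)
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        with open(os.path.join(dest_dir, filename), 'wb') as fh:
            fh.write(response.content)


# Hypothetical usage (placeholder URL):
# download_all('http://example.com/reports/')
```

Keeping the link extraction in its own function means you can try it on a saved copy of the page before making any real requests. Files that share a name will overwrite each other in this sketch.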

A Python solution is to use urllib to download the PDFs. Please see the question "Download pdf using urllib?".
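
For reference, downloading a single file with urllib might look roughly like this in Python 3. This is a sketch under assumptions, not the code from the linked question: the helper names `filename_from_url` and `download` and the `download.pdf` fallback name are invented here.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url, default='download.pdf'):
    """Derive a local file name from the last path segment of url."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def download(url, dest_dir='.'):
    """Stream url to a file in dest_dir and return the local path."""
    path = os.path.join(dest_dir, filename_from_url(url))
    with urllib.request.urlopen(url, timeout=30) as response, \
            open(path, 'wb') as out:
        # Copy in chunks so large PDFs are not held in memory all at once.
        shutil.copyfileobj(response, out)
    return path
```

Calling `download(url)` once per link is then enough to batch download a list of PDFs.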

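To get a list of PDFs to download, use the xml module (xml.etree.ElementTree):

import urllib.request
import xml.etree.ElementTree as ET

url = 'http://www.wsdot.wa.gov/mapsdata/tools/InterchangeViewer/SR5.htm'
website = urllib.request.urlopen(url).read()
root = ET.fromstring(website)  # raises ParseError unless the page is well-formed XML
# Collect the href of every <a> nested anywhere inside a <table>
hrefs = [a.get('href') for table in root.iter('table') for a in table.iter('a')]
for href in hrefs:
    download(href)  # download() is a placeholder for your own function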
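
Many real pages are not well-formed XML, in which case an XML parser will refuse them. As a swapped-in alternative (not what the answer above uses), the standard library's html.parser is more forgiving. Here is a minimal sketch, assuming the PDFs are linked through `<a href="...">` values ending in `.pdf`; `PdfLinkCollector` and `pdf_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of <a> tags whose href ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href and href.lower().endswith('.pdf'):
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, href))


def pdf_links(html, base_url):
    """Return the PDF links found in an HTML string."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Pass it the decoded page source, for example `pdf_links(website.decode('utf-8', 'replace'), url)`, and loop over the result.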

Since your goal is to batch download PDF files, the simplest way is not to write a script but to use commercial software. Internet Download Manager can do what you need in two steps:

  • Copy all of the text, including the links, from the web browser.
  • Select Task > Add batch download from clipboard.
