`
SCRIPTS

Useful programs & scripts

IMAGEBOARD MEDIA DOWNLOADER

Ever been browsing a thread with hundreds of images, videos or GIFs worth downloading? This script extracts all the media links in a thread and formats them into a newline-separated list of direct links to the files. The list is then piped through xargs to wget, which downloads them.


The script is invoked by a bash function wrapper that clears the specified download folder before each run. To append to the folder instead, remove the rm line from the function.



Add the following environment variables and function to your shell config file of choice (e.g. ~/.bashrc):


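export SCRIPTS_DIR="/home/$USER/.scripts/"
export IMGBOARD_DL_DIR="/home/$USER/Downloads/imgdl"

imgdl() {
    # Create the download folder if it does not exist yet
    mkdir -p "$IMGBOARD_DL_DIR"
    # Clear previous downloads; ":?" aborts if the variable is unset or empty
    rm -rf "${IMGBOARD_DL_DIR:?}"/*
    cd "$IMGBOARD_DL_DIR" || return 1
    # Extract the media links from the thread and hand them to wget
    python3 "$SCRIPTS_DIR/download_images.py" "$1" | xargs wget
}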

                

Add the following script to the directory specified in the SCRIPTS_DIR environment variable.


from bs4 import BeautifulSoup
import urllib.request
import re
import sys

# Matches hrefs that point at a media file
MEDIA_RE = re.compile(r"[-\w]+\.(?:bmp|jpeg|jpg|gif|png|webm|pdf|epub)")

if __name__ == "__main__":

    if len(sys.argv) != 2:
        sys.exit("Invalid number of arguments")
    url = sys.argv[1]
    # Send a custom User-Agent header with the request
    con = urllib.request.urlopen(urllib.request.Request(
        url, headers={'User-Agent': "Magic Browser"}))
    html = BeautifulSoup(con.read(), "html.parser")
    img_list = []

    for a in html.find_all("a", {"href": MEDIA_RE}):
        # Remove only the initial '//' in the href (used by 4chan and some free
        # imageboard engines), leaving the '//' in 'https://' links intact
        m = re.sub(r"^//", "", a["href"])
        if m not in img_list:
            img_list.append(m)
    # Print the de-duplicated links in page order, one per line, for xargs/wget
    print("\n".join(img_list))
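
To see what the extraction step produces without fetching a live thread, here is a minimal sketch that runs the same idea on a made-up HTML fragment. It swaps BeautifulSoup for the standard library's html.parser so it needs no extra packages, and it strips only a leading '//' from each link. MediaLinkParser, extract_media_links, SAMPLE_HTML and the i.example.org links are illustrative names, not part of the script above.

```python
from html.parser import HTMLParser
import re

# Same extension list as the downloader script
MEDIA_RE = re.compile(r"[-\w]+\.(?:bmp|jpeg|jpg|gif|png|webm|pdf|epub)")


class MediaLinkParser(HTMLParser):
    """Collects unique media hrefs from <a> tags, in page order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and MEDIA_RE.search(href):
            link = re.sub(r"^//", "", href)  # drop a leading '//' only
            if link not in self.links:
                self.links.append(link)


def extract_media_links(html_text):
    """Return the unique media links found in an HTML string."""
    parser = MediaLinkParser()
    parser.feed(html_text)
    return parser.links


# Made-up thread fragment (assumption): one file linked twice,
# one video, and one link that is not a media file.
SAMPLE_HTML = """
<a href="//i.example.org/wg/1234.jpg">1234.jpg</a>
<a href="//i.example.org/wg/1234.jpg"><img src="//i.example.org/wg/1234s.jpg"></a>
<a href="//i.example.org/wg/5678.webm">5678.webm</a>
<a href="/wg/catalog">Catalog</a>
"""

if __name__ == "__main__":
    print("\n".join(extract_media_links(SAMPLE_HTML)))
```

Running it prints the two unique media links, one per line, with the duplicate and the catalog link left out. That is the same newline-separated format that wget receives through xargs.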

                

To use, pass the thread's URL to the imgdl function that wraps the Python script.


Downloading all the wallpapers in a /wg/ thread
`