We’ve welcomed a new friend to the dev team. Here’s a soft welcome for Peacock and a lovely card featuring Heath by Discord member @Corvid_Reaven.


Peacock

Our good friend peacock getting comfortable

Welcome!

We will have a grander welcome for peacock at the end of this series, but you may introduce yourself now if you’d like. We met peacock this past weekend and bonded immediately following the departure of aela.

Saying bye to aela

Bye.

Peacock is a wonderful addition to the team, helping out with some minor but important daily tasks. Task complexity will increase with a bit more time and experience, but for now, we’re keeping it simple so as to allow peacock to become familiar with the workflow. We want mastery first, in this case.

Task

Peacock is currently tasked with downloading images for us. If a link or a direct image is sent into a specific Discord channel, peacock will download it into the appropriate folder.
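
To make "the appropriate folder" concrete: in the script later in this post, peacock saves into a folder named after the channel, with a subfolder named after whoever sent the message. Here is a minimal sketch of that mapping on plain strings. The function name `folder_for` and the example names are ours, for illustration only, and aren't part of peacock.py.

```python
def folder_for(channel_name, author_tag):
    """Return the folder an image should be saved in: <channel>/<author>."""
    # the script assumes authors are rendered as "name#1234"; keep only the name part
    author_name = author_tag.split('#')[0]
    # join with "/" the same way the script does
    return channel_name + "/" + author_name


# hypothetical channel and author, just to show the shape of the result
print(folder_for("fanart", "somebody#1234"))
```

So every image someone posts in a given channel ends up together in one place, which keeps things easy to find later.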

We’ll break down how peacock operates over a couple of posts, as it can get a little complicated.

Tools

The following are requirements of peacock:

We’ll cover the python parts today. This tutorial was very helpful for peacock’s initial setup, and it steps through registering various Discord services. We highly recommend taking a look at it and its second part, particularly if you are interested in some more recreational functions.

Technical details

Follow along in the above-linked tutorial to create a Discord server to play in and register the proper accounts.

The following code block should have fairly clear comments, but let us know if you have any questions. To get peacock started locally, just run python peacock.py. Peacock will be hard at work as long as your host (local computer, in this case) is online.

# peacock.py
import datetime
import os

import discord
import requests
from bs4 import BeautifulSoup

TOKEN = '<DISCORD BOT TOKEN>'

client = discord.Client()

exts = ["png", "jpg", "webp", "gif"] # the types of image extensions we want to download
domains = ["imgur.com", "we.tl", "https://cdn.discordapp.com/"] # some common url domains we see

def is_dl_link(token):
    ''' Check if the token looks like a link we want to download '''

    # Handy dandy code snippet for checking if a substring (s) from a list is in the token
    # (we check extensions and domains, as sometimes URLs don't have direct links to files with extensions)
    return any(s in token for s in (exts + domains))


def get_links(message):
    ''' Find all the links in a single message object '''

    # assume all attachment links are images we want to download
    # (this code treats attachments and embeds as dicts, which matches older discord.py releases;
    # newer releases expose attributes such as `attachment.url` instead)
    attachments = [attachment['url'] for attachment in message.attachments]

    # assume any embed with a thumbnail preview is an image we want to download
    embed_dls = []
    for embed in message.embeds:
        # .get() avoids a KeyError for embeds that have no thumbnail
        thumbnail = embed.get('thumbnail')
        if thumbnail:
            embed_dls.append(thumbnail['url'])

    # we're working with English, which splits words up with a [space] character, so split the entire message content up into individual words
    tokens = message.content.split()

    # use a list comprehension to get a list of tokens that are URLs, and combine it with any attachments and embedded links
    all_links = [token for token in tokens if is_dl_link(token)] + attachments + embed_dls

    return all_links

def downloadImage(imageUrl, imageFolder, imgCount):
    ''' Download a link into a folder. imgCount is the zero-padded image number as a string, e.g. "01" '''

    # try to retrieve the URL's content with the wonderful `requests` library
    response = requests.get(imageUrl)

    # create a unique file name for the image, as it is possible for a message to have duplicate links
    # we're naming it: [Date][Message's i-th image][Original image name]
    localFileName = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S_") + imgCount + "-" + imageUrl.split('/')[-1]

    # define the full folder/path where you want to save the image, including the image name
    localPath = imageFolder + "/" + localFileName

    # check if the request to retrieve the URL's content was successful
    if response.status_code == 200:

        print('Downloading %s... to %s' % (imageUrl, localPath))

        # if the folder we want to save to doesn't exist, create it
        # (exist_ok=True means no error if the directory already exists)
        os.makedirs(imageFolder, exist_ok=True)

        # save the retrieved file to the intended location
        with open(localPath, 'wb') as fo:
            for chunk in response.iter_content(4096):
                fo.write(chunk)

    # request to retrieve the URL's content failed :(
    else:
        print("Can't download {}...error: {}".format(imageUrl, response.status_code))

    return localPath, localFileName

# @ is a special python syntax for decorators. You can read more here: https://wiki.python.org/moin/PythonDecorators, but we don't really understand them ourselves.
@client.event
# async lets your methods multitask...read more about async here: https://realpython.com/async-io-python/
async def on_message(message):
    ''' Run this function whenever a new message is received in a channel. The new message is the input. '''
    print("**BEEP BOOP NEW MESSAGE RECEIVED**\n")
    # get the name of the channel
    channel_dir = str(message.channel)
    # who wrote the message?
    author_dir = str(message.author).split('#')[0]
    # get all the links in the message
    all_links = get_links(message)

    print("Found {} images...".format(len(all_links)))

    # this block is for finding the download links of imgur links that aren't direct links to an image
    # they look like: https://imgur.com/gallery/U8Lgojr
    # and the direct link is: https://i.imgur.com/lYNmstv.png
    new_all_links = []
    for link in all_links:
        print("checking link: {}".format(link))
        if not any(s in link for s in exts):

            response = requests.get(link)

            soup = BeautifulSoup(response.text, "html.parser")
            image_src_tags = soup.find_all("link", rel="image_src")
            if not image_src_tags:
                # the page doesn't point at a direct image, so there's nothing to download
                print("no image found at {}, skipping".format(link))
                continue
            imageUrl = image_src_tags[0]["href"]
            # insert an "h" before the extension, e.g. lYNmstv.png becomes lYNmstvh.png
            imageUrl = imageUrl.rpartition('.')[0] + "h." + imageUrl.rpartition('.')[-1]
            new_all_links.append(imageUrl)
            print("replaced link")
        else:
            new_all_links.append(link)

    # get unique links only (dict.fromkeys drops duplicates while keeping the original order)
    new_all_links = list(dict.fromkeys(new_all_links))
    print("After getting rid of duplicates, found {} images...".format(len(new_all_links)))

    # not that a single message would ever have more than 100 links, but we pad the count to two digits so that files are ordered
    for count, link in enumerate(new_all_links, start=1):

        # "{:02d}" turns 1..9 into "01".."09" and leaves 10 and up alone
        imgCount = "{:02d}".format(count)

        localPath, fileName = downloadImage(link, channel_dir + "/" + author_dir, imgCount)

@client.event
async def on_ready():
    ''' Let us know when the bot is online and ready to work hard '''
    print('Logged in as: {}'.format(client.user.name))
    print('------')

if __name__ == '__main__':
    # turn on the bot
    client.run(TOKEN)
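
The file naming used in downloadImage ([Date][Message's i-th image][Original image name]) is easier to see in isolation. Below is a small sketch of the same scheme with no networking involved. It is an illustration rather than what peacock runs: the name `build_file_name` and the `now` parameter are ours, and stripping the query string from the URL is an extra step we've assumed would be useful, not something the script above does.

```python
import datetime
from urllib.parse import urlparse


def build_file_name(image_url, img_index, now=None):
    """Build '<timestamp>_<NN>-<original name>' for the img_index-th image."""
    # allow a fixed time to be passed in, so the result is predictable when testing
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    # keep only the path part of the URL so a "?width=..." suffix never ends up in the name
    original_name = urlparse(image_url).path.split('/')[-1]
    # {:02d} pads 1..9 to "01".."09" and leaves 10 and above untouched
    return "{}_{:02d}-{}".format(stamp, img_index, original_name)


# example with a fixed, made-up timestamp
print(build_file_name("https://i.imgur.com/lYNmstv.png", 3,
                      datetime.datetime(2000, 1, 2, 3, 4, 5)))
```

Padding the index to two digits means the files from one message sort in the order they were downloaded.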

There are a couple of known issues with the way we’ve trained peacock, particularly with handling indirect imgur links, but we strongly control for the content that peacock works with and do some post-processing later (which you’ll see in our post about Google Cloud). As a fledgling member, peacock may not be robust, but there’s plenty of time to wisen up.
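
One place indirect links can go wrong is the lookup of the `image_src` tag: a page without that tag gives peacock nothing to download. The sketch below shows one way to make that lookup return None instead of failing. To keep it runnable without extra installs, it swaps BeautifulSoup for Python's built-in `html.parser`; the names `ImageSrcFinder` and `find_image_src` are hypothetical, and the sketch says nothing about what imgur's pages actually contain today.

```python
from html.parser import HTMLParser


class ImageSrcFinder(HTMLParser):
    """Collect the href of every <link rel="image_src"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "image_src" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_image_src(html):
    """Return the first image_src href in the HTML, or None if there isn't one."""
    finder = ImageSrcFinder()
    finder.feed(html)
    return finder.hrefs[0] if finder.hrefs else None


# a made-up page that advertises a direct image link
page = '<html><head><link rel="image_src" href="https://i.imgur.com/lYNmstv.png"/></head></html>'
print(find_image_src(page))
```

A caller can then skip any link for which the result is None and carry on with the rest of the message.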

The lack of robustness is a low-priority issue. One very important issue we needed to solve immediately is: how do we ensure the office is open for peacock to work if we are asleep? The solution is to host peacock somewhere that is online 24/7. We’ll be using Google Cloud services for that and will describe the setup in a later article.

Fanart

We have a lovely Heath card from @Corvid_Reaven on our Discord server, created on Valentine’s Day.

Heath winks