We’ve welcomed a new friend to the dev team. Here’s a soft welcome for Peacock and a lovely card featuring Heath by Discord member @Corvid_Reaven.
Peacock
We will have a grander welcome for peacock at the end of this series, but you may introduce yourself now if you’d like. We met peacock this past weekend and bonded immediately following the departure of aela.
Peacock is a wonderful addition to the team, helping out with some minor but important daily tasks. Task complexity will increase with a bit more time and experience, but for now we’re keeping it simple so that peacock can become familiar with the workflow. In this case, we want mastery first.
Task
Peacock is currently tasked with downloading images for us. If a link or a direct image is sent into a specific Discord channel, peacock will download it into the appropriate folder.
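As a rough sketch of what "the appropriate folder" means in practice: the script further down saves each image under a folder named after the channel and the message author, with a timestamp and a two-digit counter in front of the original file name. The helper name `build_local_path`, the channel `fanart`, the user `artist#1234`, and the fixed date are all made up for illustration; only the naming pattern comes from the script.

```python
import datetime


def build_local_path(channel, author, image_url, index, now=None):
    """Return (folder, file name) for the index-th image of a message.

    Hypothetical helper: it mirrors the naming pattern of the script
    ([date]_[i-th image]-[original name]) but is not part of it.
    """
    now = now or datetime.datetime.now()
    # Discord names look like "name#1234"; keep only the part before '#'
    folder = channel + "/" + author.split("#")[0]
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    original_name = image_url.split("/")[-1]
    # {:02d} zero-pads the counter so 01, 02, ... 10 sort correctly
    file_name = "{}_{:02d}-{}".format(stamp, index, original_name)
    return folder, file_name


# Example values are assumptions, chosen only to show the resulting names
folder, file_name = build_local_path(
    "fanart",
    "artist#1234",
    "https://i.imgur.com/lYNmstv.png",
    1,
    now=datetime.datetime(2019, 2, 14, 9, 30, 0),
)
print(folder)     # fanart/artist
print(file_name)  # 2019-02-14_09-30-00_01-lYNmstv.png
```

Passing `now` explicitly is only there to make the example repeatable; the script itself always uses the current time.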
We’ll break down how peacock operates over a couple of posts, as it can get a little complicated.
Tools
The following are requirements of peacock:
- Python 3.6
- discord.py
- requests and beautifulsoup4 (both imported by the script below)
- Discord
- Google Cloud services
We’ll cover the Python parts today. This tutorial was very helpful for peacock’s initial setup, and it steps through registering various Discord services. We highly recommend taking a look at it and its second part, particularly if you are interested in some more recreational functions.
Technical details
Follow along in the above-linked tutorial to create a Discord server to play in and register the proper accounts.
The following code block should have fairly clear comments, but let us know if you have any questions. To get peacock started locally, just run python peacock.py. Peacock will be hard at work as long as your host (local computer, in this case) is online.
# peacock.py
import datetime
import os

import discord
import requests
from bs4 import BeautifulSoup

TOKEN = '<DISCORD BOT TOKEN>'
client = discord.Client()

exts = ["png", "jpg", "webp", "gif"]  # the types of image extensions we want to download
domains = ["imgur.com", "we.tl", "https://cdn.discordapp.com/"]  # some common url domains we see

def is_dl_link(token):
    ''' Check if the token is a URL '''
    # Handy dandy code snippet for checking if a substring (s) is in a list
    # (of extensions and domains, as sometimes URLs don't have direct links to files with extensions)
    return any(s in token for s in (exts + domains))

def get_links(message):
    ''' Find all the links in a single message object '''
    # assume all attachment links are images we want to download
    # (attachments and embeds are read as plain dicts, as in the rest of this script)
    attachments = [attachment['url'] for attachment in message.attachments]

    # assume any embed with a thumbnail preview is an image we want to download
    embed_dls = []
    for embed in message.embeds:
        # not every embed has a thumbnail, so look it up without raising a KeyError
        thumbnail = embed.get('thumbnail')
        if thumbnail and thumbnail.get('url'):
            embed_dls.append(thumbnail['url'])

    # we're working with English, which splits words up with a [space] character,
    # so split the entire message content up into individual words
    tokens = message.content.split()

    # use list comprehension to get a list of tokens that are URLs,
    # and combine it with any attachments and embedded links
    all_links = [token for token in tokens if is_dl_link(token)] + attachments + embed_dls
    return all_links

def downloadImage(imageUrl, imageFolder, imgCount):
    ''' Download a link into a folder '''
    # try to retrieve the URL's content with the wonderful `requests` library
    response = requests.get(imageUrl)

    # create a unique file name for the image, as it is possible for a message to have duplicate links
    # we're naming it: [Date][Message's i-th image][Original image name]
    localFileName = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S_") + imgCount + "-" + imageUrl.split('/')[-1]

    # define the full folder/path where you want to save the image, including the image name
    localPath = imageFolder + "/" + localFileName

    # check if the request to retrieve the URL's content was successful
    if response.status_code == 200:
        print('Downloading %s... to %s' % (imageUrl, localPath))

        # if the folder we want to save to doesn't exist, create it
        os.makedirs(imageFolder, exist_ok=True)

        # save the retrieved file to the intended location
        with open(localPath, 'wb') as fo:
            for chunk in response.iter_content(4096):
                fo.write(chunk)
    # request to retrieve the URL's content failed :(
    else:
        print("Can't download {}...error: {}".format(imageUrl, response.status_code))

    return localPath, localFileName

# @ is a special python syntax for decorators. You can read more here: https://wiki.python.org/moin/PythonDecorators, but we don't really understand them ourselves.
# async lets your methods multitask...read more about async here: https://realpython.com/async-io-python/
@client.event
async def on_message(message):
    ''' Run this function whenever a new message is received in a channel. The new message is the input. '''
    print("**BEEP BOOP NEW MESSAGE RECEIVED**\n")

    # get the name of the channel
    channel_dir = str(message.channel)

    # who wrote the message?
    author_dir = str(message.author).split('#')[0]

    # get all the links in the message
    all_links = get_links(message)
    print("Found {} images...".format(len(all_links)))

    # this block is for finding the download links of imgur links that aren't direct links to an image
    # they look like: https://imgur.com/gallery/U8Lgojr
    # and the direct link is: https://i.imgur.com/lYNmstv.png
    new_all_links = []
    for link in all_links:
        print("checking link: {}".format(link))
        if not any(s in link for s in exts):
            response = requests.get(link)
            soup = BeautifulSoup(response.text, "html.parser")
            image_tags = soup.find_all("link", rel="image_src")
            # pages without an image_src tag used to crash here with an IndexError; skip them instead
            if not image_tags:
                print("no image_src found, skipping link")
                continue
            imageUrl = image_tags[0]["href"]
            # insert an "h" before the extension, e.g. lYNmstv.png -> lYNmstvh.png
            imageUrl = imageUrl.rpartition('.')[0] + "h." + imageUrl.rpartition('.')[-1]
            new_all_links.append(imageUrl)
            print("replaced link")
        else:
            new_all_links.append(link)

    # get unique links only
    new_all_links = set(new_all_links)
    count = 1
    print("After getting rid of duplicates, found {} images...".format(len(new_all_links)))

    # not that a single message would ever have more than 100 links, but here's some hardcoded padding for filenaming
    for link in new_all_links:
        # zero-pad the count to two digits (01, 02, ... 10) so that files are ordered
        imgCount = str(count).zfill(2)
        localPath, fileName = downloadImage(link, channel_dir + "/" + author_dir, imgCount)
        count += 1

@client.event
async def on_ready():
    ''' Let us know when the bot is online and ready to work hard '''
    print('Logged in as: {}'.format(client.user.name))
    print('------')


if __name__ == '__main__':
    # turn on the bot
    client.run(TOKEN)

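One thing worth knowing about the link check in `is_dl_link`: it is a plain substring test, so an ordinary word such as "gift" counts as a link because it contains "gif". The sketch below contrasts that behaviour with a stricter variant built on the standard library's `urllib.parse`. The names `substring_check`, `looks_like_image_link`, `IMAGE_EXTS`, and `KNOWN_HOSTS` are ours, not part of peacock, and the stricter rules are only one possible choice.

```python
from urllib.parse import urlparse

IMAGE_EXTS = (".png", ".jpg", ".webp", ".gif")
KNOWN_HOSTS = ("imgur.com", "we.tl", "cdn.discordapp.com")


def substring_check(token):
    """Same idea as is_dl_link above: any substring match counts."""
    needles = ["png", "jpg", "webp", "gif",
               "imgur.com", "we.tl", "https://cdn.discordapp.com/"]
    return any(s in token for s in needles)


def looks_like_image_link(token):
    """Stricter, hypothetical variant: require an http(s) URL first."""
    try:
        parsed = urlparse(token)
    except ValueError:
        # urlparse can reject some malformed inputs
        return False
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname.lower()
    # match the host itself or any subdomain of it (i.imgur.com, etc.)
    known_host = any(host == h or host.endswith("." + h) for h in KNOWN_HOSTS)
    # str.endswith accepts a tuple of candidate suffixes
    has_image_ext = parsed.path.lower().endswith(IMAGE_EXTS)
    return known_host or has_image_ext


for token in ["gift", "https://i.imgur.com/lYNmstv.png", "https://example.com/about"]:
    print(token, substring_check(token), looks_like_image_link(token))
```

Whether the stricter check is worth it depends on how tightly the channel's content is controlled; in a channel that only ever receives image links, the substring test may be perfectly adequate.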
There are a couple of known issues with the way we’ve trained peacock, particularly with handling indirect imgur links, but we strongly control for the content that peacock works with and do some post-processing later (which you’ll see in our post about Google Cloud). As a fledgling member, peacock may not be robust, but there’s plenty of time to wise up.
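To make the indirect-link case a little more concrete, here is a minimal sketch of the two steps involved: finding the `<link rel="image_src">` tag in a page, and inserting the "h" before the file extension. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra installs, which is a substitution on our part. The class and function names and the sample HTML are invented for the example, and real imgur pages may well be laid out differently.

```python
from html.parser import HTMLParser


class ImageSrcFinder(HTMLParser):
    """Remember the href of the first <link rel="image_src"> tag seen."""

    def __init__(self):
        super().__init__()
        self.image_src = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "image_src" and self.image_src is None:
            self.image_src = attrs.get("href")


def find_image_src(html):
    """Return the image_src href, or None when the page has no such tag."""
    finder = ImageSrcFinder()
    finder.feed(html)
    return finder.image_src


def add_size_suffix(image_url, suffix="h"):
    """Insert a suffix before the extension: lYNmstv.png -> lYNmstvh.png."""
    stem, dot, ext = image_url.rpartition(".")
    if not dot:
        # no extension to split on, so leave the URL untouched
        return image_url
    return stem + suffix + dot + ext


# Made-up page content, standing in for the response body of a gallery link
sample_page = '<html><head><link rel="image_src" href="https://i.imgur.com/lYNmstv.png"/></head></html>'
direct = find_image_src(sample_page)
print(direct)                   # https://i.imgur.com/lYNmstv.png
print(add_size_suffix(direct))  # https://i.imgur.com/lYNmstvh.png
print(find_image_src("<html><head></head></html>"))  # None
```

Returning `None` for a page without the tag lets the caller decide whether to skip the link or log it for later.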
The lack of robustness is a low-priority issue. One very important issue we needed to solve immediately is: how do we ensure the office is open for peacock to work if we are asleep? The solution is to host peacock somewhere that is online 24/7. We’ll be using Google Cloud services for that and will describe it in a later article.
Fanart
We have a lovely Heath card from @Corvid_Reaven on our Discord server, created on Valentine’s Day.