In this guide, we will learn to retrieve and send scraped data into Slack. We’ll quickly set up a Slack app that scrapes websites for links using a slash command and posts the results inside a Slack channel like this:

Once you deploy this Slack app, you can return and modify the code to add logic or scrape different data.

Use Cases

📰 Scrape data from news websites, share, compare and discuss inside a Slack channel.

💌 Scrape domains for email addresses and apply enrichment with Clearbit APIs - share with your Sales team inside Slack

☎️ Quickly gather phone numbers and assist your SDR team with cold calling campaigns inside Slack

🔎 Assist your marketing team by pulling data from forums and social media to perform sentiment analysis, all within Slack.

Please remember to respect the policies around web crawlers of any sites you scrape.

Table of Contents:

  • Install from GitHub
  • Test Your Slack App Website Scraper
  • Making Changes
  • Support
  • Acknowledgments

Install from GitHub

Head on over to GitHub to fork my project’s code 👉🏼 https://github.com/JanethL/SlackAppWebscraper/blob/master/README.md

Click the Open in Autocode button. You will be prompted to sign in or create a FREE account. If you have a Standard Library account, click Already Registered and sign in using your Standard Library credentials.

Give your project a unique name and select Start API Project from Github

Autocode automatically sets up a project scaffold to save your project as an API endpoint, but the project hasn’t been deployed yet.

Before deploying, navigate through the functions/events/slack/command/ folders and select the scrape.js file.

Select the red 1 Account Required button, which will prompt you to link a Slack account.

If you’ve built Slack apps with Standard Library, you’ll see existing Slack accounts, or you can select Link New Resource to link a new Slack app.

Select Install Standard Library App.

You should see an OAuth popup that looks something like this:

Select Allow. You’ll have the option to customize your Slack app with a name and image.

Select Finish. The green checkmarks confirm that you’ve linked your accounts correctly. Click Finished Linking.

To deploy your API to the cloud, select Deploy API in the bottom-left of the file manager.

🙌 Test Your Slack App Website Scraper

You’re all done. Try it out! Your Slack App is now available for use in the Slack workspace you authorized it for.

Your Slack app should respond to: /cmd scrape <url> <selector> as I show in the screenshot:

I’ve included an additional command as a cheat sheet that lists a few websites and the selectors that retrieve their links.

Just type /cmd list and you should see your app respond with the following message.

Or review the previous tutorial to learn how to scrape using CSS selectors.

How It Works

When you submit /cmd scrape https://techcrunch.com/ a.post-block__title__link (or any URL followed by its respective selector) in Slack’s message box, a webhook will be triggered. The webhook, built and hosted on Standard Library, will first make a request to crawler.api, which will return a JSON payload with results from the query.

Our webhook will then create Slack messages for each event and post those to the channel where the command was invoked.

const lib = require('lib')({token: process.env.STDLIB_SECRET_TOKEN});
/**
 * An HTTP endpoint that acts as a webhook for Slack command event
 * @param {object} event
 * @returns {object} result Your return value
 */
module.exports = async (event) => {
  // Store API Responses
  const result = {slack: {}, crawler: {}};

  if ((event.text || '').split(/\s+/).length != 2) {
    return lib.slack.channels['@0.6.6'].messages.create({
      channel: `#${event.channel_id}`,
      text: `${event.text} has wrong format. `
    });
  }

  console.log(`Running [Slack → Retrieve Channel, DM, or Group DM by id]...`);
  result.slack.channel = await lib.slack.conversations['@0.2.5'].info({
    id: `${event.channel_id}`
  });
  console.log(`Running [Slack → Retrieve a User]...`);
  result.slack.user = await lib.slack.users['@0.3.32'].retrieve({
    user: `${event.user_id}`
  });

  console.log(`Running [Crawler → Query (scrape) a provided URL based on CSS selectors]...`);
  result.crawler.pageData = await lib.crawler.query['@0.0.1'].selectors({
    url: event.text.split(/\s+/)[0],
    userAgent: `stdlib/crawler/query`,
    includeMetadata: false,
    selectorQueries: [
      {
        'selector': event.text.split(/\s+/)[1],
        'resolver': `attr`,
        'attr': 'href'
      }
    ]
  });
  let text = `Here are the links that we found for ${event.text.split(/\s+/)[0]}\n \n ${result.crawler.pageData.queryResults[0].map((r) => {
    if (r.attr.startsWith('http://') || r.attr.startsWith('https://') || r.attr.startsWith('//')) {
      return r.attr;
    } else {
      return result.crawler.pageData.url + r.attr;
    }
  }).join(' \n ')}`;
  console.log(`Running [Slack → Send a Message from your Bot to a Channel]...`);
  result.slack.response = await lib.slack.channels['@0.6.6'].messages.create({
    channel: `#${event.channel_id}`,
    text: text
  });

  return result;
};

The first line of code imports an NPM package called “lib” to allow us to communicate with other APIs on top of Standard Library:

const lib = require('lib')({token: process.env.STDLIB_SECRET_TOKEN});

Lines 2–6 are a comment that serves as documentation and allows Standard Library to type check calls to our function. If a call does not supply a parameter of the expected type, it returns an error.

Line 7 exports a function (module.exports) that contains all of the code in lines 8–54. Once we deploy our code, this function is wrapped in an HTTP endpoint (API endpoint) and automatically registered with Slack, so that every time a Slack command event happens, Slack sends the event payload to our API endpoint to consume.

Lines 11–16 are an if statement that handles improper inputs: if the command text does not contain exactly two whitespace-separated parts, it posts an error message to Slack using lib.slack.channels['@0.6.6'].messages.create and stops.
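The same check can be pulled into a small helper, sketched below for illustration only. parseScrapeCommand is a hypothetical name that does not appear in the project, and trimming the input first is an added assumption: the original splits event.text directly, so stray leading or trailing whitespace would count as an extra part.

```javascript
// Hypothetical helper (not part of the original project): splits the text
// that follows "/cmd scrape" into a URL and a CSS selector.
function parseScrapeCommand(text) {
  // Assumption: trim first so leading/trailing spaces are tolerated.
  const parts = (text || '').trim().split(/\s+/);
  // Like the original check, require exactly two whitespace-separated parts.
  if (parts.length !== 2) {
    return null;
  }
  return {url: parts[0], selector: parts[1]};
}

console.log(parseScrapeCommand('https://techcrunch.com/ a.post-block__title__link'));
console.log(parseScrapeCommand('https://techcrunch.com/')); // selector missing
```

As in the original check, a selector that contains spaces (for example a descendant selector such as div a) is rejected, because it splits into more than two parts.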

Lines 18–21 make an HTTP GET request to the lib.slack.conversations['@0.2.5'] API and use the info method to retrieve the channel object, which has info about the channel (name, topic, purpose, etc.), and store it in result.slack.channel.

Lines 22–25 also make an HTTP GET request, this time to lib.slack.users['@0.3.32'], and use the retrieve method to get the user object, which has info about the user, and store it in result.slack.user.

Lines 27–39 make an HTTP GET request to lib.crawler.query['@0.0.1'] and pass in the inputs from the Slack command event. For the url we pass in the first input from our Slack event: event.text.split(/\s+/)[0].

userAgent is set to the default: stdlib/crawler/query

includeMetadata is false (if true, the response will include additional metadata in a meta field)

selectorQueries is an array with one object: {selector: event.text.split(/\s+/)[1], resolver: 'attr', attr: 'href'}

For selector we retrieve the second input from the Slack event using event.text.split(/\s+/)[1].
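Putting these parameters together, the object passed to the selectors method can be sketched as a plain function. buildCrawlerQuery is a hypothetical name used only for illustration; the field names and values mirror the call in the tutorial code.

```javascript
// Hypothetical helper: builds the parameter object that the tutorial code
// passes to lib.crawler.query['@0.0.1'].selectors(...).
function buildCrawlerQuery(commandText) {
  // First part is the page to scrape, second part is the CSS selector.
  const [url, selector] = commandText.split(/\s+/);
  return {
    url: url,
    userAgent: 'stdlib/crawler/query',
    includeMetadata: false,
    selectorQueries: [
      // Resolve each matched element to the value of its href attribute.
      {selector: selector, resolver: 'attr', attr: 'href'}
    ]
  };
}

console.log(JSON.stringify(
  buildCrawlerQuery('https://techcrunch.com/ a.post-block__title__link'),
  null,
  2
));
```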

Lines 40–53 build the message text from the scraped links and post it to Slack using the two parameters passed to messages.create: channel and text.
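The message text is built by turning each scraped href into a full URL. A minimal sketch of that step follows. formatLinks is a hypothetical helper, and the sample array is made-up data shaped like what the tutorial code reads from queryResults[0] (objects with an attr field).

```javascript
// Hypothetical helper mirroring the link handling in the tutorial code.
// pageUrl: the page that was scraped; results: objects with an `attr` field.
function formatLinks(pageUrl, results) {
  return results.map((r) => {
    const isAbsolute =
      r.attr.startsWith('http://') ||
      r.attr.startsWith('https://') ||
      r.attr.startsWith('//');
    // Absolute and protocol-relative links pass through unchanged;
    // anything else is appended to the page URL, as in the original.
    return isAbsolute ? r.attr : pageUrl + r.attr;
  }).join(' \n ');
}

// Made-up sample data for illustration only.
const sampleResults = [{attr: 'https://example.com/a'}, {attr: 'b'}];
console.log(formatLinks('https://example.com/', sampleResults));
```

Plain concatenation can produce a doubled slash when the page URL ends in / and the href starts with /. If that matters for the sites you scrape, the standard URL constructor, new URL(href, pageUrl).href, is one possible alternative for resolving relative links.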

You can read more about API specifications and parameters here: https://docs.stdlib.com/connector-apis/building-an-api/api-specification/

Making Changes

Now that your app is live, you can return at any time to add logic and scrape websites for other data with crawler.api.
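As one example of added logic, scraped pages often repeat the same link, and a long list can flood a channel. The sketch below removes duplicates and caps the list before the message is built. uniqueLinks and the default limit of 10 are hypothetical choices, not part of the original project.

```javascript
// Hypothetical post-processing step: drop duplicate links and cap the count.
function uniqueLinks(links, limit = 10) {
  // A Set keeps the first occurrence of each link in insertion order.
  return [...new Set(links)].slice(0, limit);
}

console.log(uniqueLinks(['/a', '/b', '/a', '/c'], 2));
```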

There are two ways to modify your application. The first is via our in-browser editor, Autocode. The second is via the Standard Library CLI.

via Web Browser

Simply visit Autocode.com and select your project. You can easily make updates and changes this way, save your changes and deploy directly from your browser.

Shipping to Production

Standard Library has easy dev/prod environment management. If you’d like to ship to production, visit build.stdlib.com, find your project and select Manage.

From the environment management screen, simply click Ship Release.

Link any necessary resources, specify the version of the release and click Create Release to proceed.

That’s all you need to do!

Support

Via Slack: libdev.slack.com

You can request an invitation by clicking Community > Slack in the top bar on https://stdlib.com.

Via Twitter: @Standard Library

Via E-mail: support@stdlib.com
