Introduction
- Most of us use several social media apps, and as content creators we often care about our subscriber or follower counts.
- In this article, our goal as developers is to build backend logic that extracts follower and subscriber counts from public pages and serves them through API endpoints that a UI can integrate with.
Tools Used
- We will use the jsoup Java library to extract the information and Spring Boot to build an API on top of it.
Medium Followers Count
- Let’s go to a Medium profile page and inspect it to find the follower count.
Followers page:
Inspect element
- Right-click the author’s follower count and choose Inspect to see the HTML structure of that element and check whether we can select it by class name, id, XPath, etc.
- Since the element has the class name “pw-follower-count”, we will extract it by class name.
- We will use jsoup to select the element by the class name “pw-follower-count”. This gives us an Element object, and its text() method returns the element’s text.
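public String getMediumFollowers(@PathVariable(value = "profilePath") String profilePath) throws IOException {
    // Fetch and parse the profile page; profilePath is the profile URL without the scheme
    Document document = Jsoup.connect("https://" + profilePath).get();
    // first() returns null when no element has this class, e.g. if the page markup changes
    Element first = document.getElementsByClass("pw-follower-count").first();
    if (first == null) {
        return "Could not get follower count";
    }
    return first.text();
}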
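The method above returns the follower count as display text. If the UI needs a number, the text can be converted with a small helper. This is a minimal sketch, assuming the text looks like "1.2K Followers" or "532 Followers"; that format is an assumption and not something the page guarantees, and the class name CountParser is hypothetical.

```java
import java.math.BigDecimal;
import java.util.Locale;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original endpoint code.
final class CountParser {

    // A number (with optional thousands separators and decimals), optionally
    // followed by a standalone K, M or B suffix. The \b stops "12 Books" from
    // being read as 12 billion.
    private static final Pattern COUNT = Pattern.compile(
            "([0-9][0-9,]*(?:\\.[0-9]+)?)(?:\\s*([KMB])\\b)?",
            Pattern.CASE_INSENSITIVE);

    // Returns the numeric count, or empty if the text contains no number.
    static OptionalLong parse(String text) {
        if (text == null) {
            return OptionalLong.empty();
        }
        Matcher m = COUNT.matcher(text);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        BigDecimal value = new BigDecimal(m.group(1).replace(",", ""));
        String suffix = m.group(2) == null ? "" : m.group(2).toUpperCase(Locale.ROOT);
        long multiplier;
        switch (suffix) {
            case "K": multiplier = 1_000L; break;
            case "M": multiplier = 1_000_000L; break;
            case "B": multiplier = 1_000_000_000L; break;
            default: multiplier = 1L;
        }
        return OptionalLong.of(value.multiply(BigDecimal.valueOf(multiplier)).longValue());
    }
}
```

For example, CountParser.parse("1.2K Followers") yields 1200, and text without any digits yields an empty OptionalLong. Abbreviated counts are rounded by the site, so the parsed number is only approximate.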
YouTube Subscriber Count
- To extract the YouTube subscriber count, we can again go to the channel page, right-click, and inspect the element to understand the HTML structure.
- Scraping YouTube was not as simple, so I had to spend some time understanding the HTML structure. I found a script variable (ytInitialData) that contains all the crucial data, so I read it and scraped the subscriber count from it.
Here is the code:
- The filterAndGetSubscriber method extracts the subscriber count. The value is deeply nested in the script data and needs some cleaning.
public String getYTSubscribers(@PathVariable(value = "channelId") String channelId) throws IOException {
    Document document = Jsoup.connect(YT_PREFIX + channelId).get();
    Elements script = document.getElementsByTag("script");
    // Find the inline script that assigns the ytInitialData variable
    Optional<Element> data = script.stream().filter(a -> a.html().contains("ytInitialData =")).findFirst();
    return filterAndGetSubscriber(data);
}
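/**
 * Extracts the subscriber count text from the script that holds ytInitialData.
 *
 * @param data the script element containing "ytInitialData =", if one was found
 * @return the subscriber count text, or a fallback message if it cannot be located
 */
private String filterAndGetSubscriber(Optional<Element> data) {
    if (data.isPresent()) {
        Element element = data.get();
        String html = element.data();
        // Narrow the script down to the text between the two marker keys;
        // indexOf returns -1 when a marker is missing, so check before substring
        int start = html.indexOf("subscriberCountText");
        int end = html.indexOf("tvBanner", Math.max(start, 0));
        if (start >= 0 && end > start) {
            String resultText = html.substring(start, end);
            int textStart = resultText.indexOf("\"simpleText\":\"");
            if (textStart >= 0) {
                String fResultText = resultText.substring(textStart);
                // Strip the JSON punctuation and the word "subscribers"
                return fResultText.replace("\"simpleText\"", "")
                        .replace(":", "")
                        .replace("},", "")
                        .replace("\"", "")
                        .replace("subscribers", "")
                        .trim();
            }
        }
    }
    return "Could not get subscriber count";
}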
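The chain of replace calls above can also be written as a single regular expression. The following is a minimal sketch rather than the method used in this article. The class name SubscriberTextExtractor is hypothetical, and the sample string in the usage note is an assumed shape of the script data, based only on the subscriberCountText and simpleText keys that the code above searches for.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original endpoint code.
final class SubscriberTextExtractor {

    // Finds "subscriberCountText", then the nearest following "simpleText":"<value>"
    // and captures <value>. DOTALL lets the gap between the two keys span line breaks.
    private static final Pattern SUBSCRIBER_TEXT = Pattern.compile(
            "\"subscriberCountText\".*?\"simpleText\"\\s*:\\s*\"([^\"]*)\"",
            Pattern.DOTALL);

    // Returns the count text without the word "subscribers", or empty if not found.
    static Optional<String> extract(String scriptData) {
        if (scriptData == null) {
            return Optional.empty();
        }
        Matcher m = SUBSCRIBER_TEXT.matcher(scriptData);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1).replace("subscribers", "").trim());
    }
}
```

With the assumed input {"subscriberCountText":{"simpleText":"1.23K subscribers"},"tvBanner":{}}, extract returns an Optional containing 1.23K. Returning Optional instead of a fallback message lets the caller decide how to report a missing count.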
Building API
- We now have two methods: one scrapes the Medium follower count and the other scrapes the YouTube subscriber count.
- As a next step, we will wrap these methods in API endpoints so that a client (app or web) can consume them and build a UI on top.
- Since I like Java and Spring Boot is my go-to framework for building an API layer, I will use it.
Dependency
- Let’s add the bare-minimum Spring Boot dependencies to the pom.xml file to get started.
API Endpoints
- We will create a SocialStatsEndpoint class and annotate it with @RestController and @RequestMapping.
- Additionally, I added @GetMapping and @CrossOrigin annotations at the method level; each annotated method acts as an endpoint resource for our use case.
- Once we deploy our Spring app, we will have two resources/endpoints.
Medium follower count: http://localhost:8080/resources/medium/followers/{profilePath}
YouTube subscriber count: http://localhost:8080/resources/youtube/subscribers/{channelId}
package com.socialstats.api;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.web.bind.annotation.*;

import java.io.IOException;
import java.util.Optional;

@RestController
@RequestMapping("/resources")
public class SocialStatsEndpoint {

    private static final String YT_PREFIX = "https://www.youtube.com/c/";

    @CrossOrigin(origins = "*")
    @GetMapping("/medium/followers/{profilePath}")
    public String getMediumFollowers(@PathVariable(value = "profilePath") String profilePath) throws IOException {
        Document document = Jsoup.connect("https://" + profilePath).get();
        // first() returns null when no element has this class
        Element first = document.getElementsByClass("pw-follower-count").first();
        if (first == null) {
            return "Could not get follower count";
        }
        return first.text();
    }

    @CrossOrigin(origins = "*")
    @GetMapping("/youtube/subscribers/{channelId}")
    public String getYTSubscribers(@PathVariable(value = "channelId") String channelId) throws IOException {
        Document document = Jsoup.connect(YT_PREFIX + channelId).get();
        Elements script = document.getElementsByTag("script");
        Optional<Element> data = script.stream().filter(a -> a.html().contains("ytInitialData =")).findFirst();
        return filterAndGetSubscriber(data);
    }

    // filterAndGetSubscriber(...) is the helper shown earlier; the full class is listed at the end
}
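The Medium endpoint above builds its URL by prepending https:// to profilePath, so a caller could make the server fetch any host. One way to limit that is to check the host before calling jsoup. This is a minimal sketch using only the JDK. The class name MediumProfileUrl and the rule "medium.com or a subdomain of it" are assumptions for illustration, not part of the original code.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical guard, intended to run before Jsoup.connect("https://" + profilePath).
final class MediumProfileUrl {

    // Returns true only if the URL built from profilePath points at medium.com
    // or one of its subdomains (for example someone.medium.com).
    static boolean isAllowed(String profilePath) {
        if (profilePath == null || profilePath.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI("https://" + profilePath);
            String host = uri.getHost();
            if (host == null) {
                return false;
            }
            host = host.toLowerCase(Locale.ROOT);
            // The leading dot in ".medium.com" rejects look-alikes such as evilmedium.com
            return host.equals("medium.com") || host.endsWith(".medium.com");
        } catch (URISyntaxException e) {
            // Anything that does not parse as a URI is rejected
            return false;
        }
    }
}
```

For example, isAllowed("someone.medium.com") is true, while isAllowed("medium.com@evil.example.com") is false because the part before the @ is user info and the real host is evil.example.com. The endpoint could return an error message when the check fails instead of fetching the page.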
Test
- Now let’s test our API by calling both endpoints.
Medium API
- Append the Medium profile URL (without the https:// prefix, which the code adds) to the endpoint.
YouTube API
- Append the YouTube channel name to the endpoint.
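The same check can be made from Java instead of a browser. Below is a minimal sketch using the JDK's java.net.http.HttpClient (Java 11 or later). The class name SocialStatsClient and its method names are hypothetical, and the base URL passed in is assumed to be http://localhost:8080, matching the endpoint URLs listed earlier.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the endpoints in this article; not part of the original code.
final class SocialStatsClient {

    // Builds the subscriber-count URL, percent-encoding the channel name as a path segment.
    static URI subscribersUri(String baseUrl, String channelName) {
        String encoded = URLEncoder.encode(channelName, StandardCharsets.UTF_8)
                .replace("+", "%20"); // URLEncoder targets form data, so spaces come back as '+'
        return URI.create(baseUrl + "/resources/youtube/subscribers/" + encoded);
    }

    // Sends a GET request and returns the response body (the plain-text count).
    static String fetchSubscribers(String baseUrl, String channelName)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(subscribersUri(baseUrl, channelName))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

With the Spring app running locally, SocialStatsClient.fetchSubscribers("http://localhost:8080", channelName) returns whatever text the endpoint responds with.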
It works, that’s it!
Here is the entire code used in this article.
Endpoint & Logic
package com.socialstats.api;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.web.bind.annotation.*;

import java.io.IOException;
import java.util.Optional;

@RestController
@RequestMapping("/resources")
public class SocialStatsEndpoint {

    private static final String YT_PREFIX = "https://www.youtube.com/c/";

    @CrossOrigin(origins = "*")
    @GetMapping("/medium/followers/{profilePath}")
    public String getMediumFollowers(@PathVariable(value = "profilePath") String profilePath) throws IOException {
        Document document = Jsoup.connect("https://" + profilePath).get();
        // first() returns null when no element has this class
        Element first = document.getElementsByClass("pw-follower-count").first();
        if (first == null) {
            return "Could not get follower count";
        }
        return first.text();
    }

    @CrossOrigin(origins = "*")
    @GetMapping("/youtube/subscribers/{channelId}")
    public String getYTSubscribers(@PathVariable(value = "channelId") String channelId) throws IOException {
        Document document = Jsoup.connect(YT_PREFIX + channelId).get();
        Elements script = document.getElementsByTag("script");
        // Find the inline script that assigns the ytInitialData variable
        Optional<Element> data = script.stream().filter(a -> a.html().contains("ytInitialData =")).findFirst();
        return filterAndGetSubscriber(data);
    }

    /**
     * Extracts the subscriber count text from the script that holds ytInitialData.
     *
     * @param data the script element containing "ytInitialData =", if one was found
     * @return the subscriber count text, or a fallback message if it cannot be located
     */
    private String filterAndGetSubscriber(Optional<Element> data) {
        if (data.isPresent()) {
            Element element = data.get();
            String html = element.data();
            // indexOf returns -1 when a marker is missing, so check before substring
            int start = html.indexOf("subscriberCountText");
            int end = html.indexOf("tvBanner", Math.max(start, 0));
            if (start >= 0 && end > start) {
                String resultText = html.substring(start, end);
                int textStart = resultText.indexOf("\"simpleText\":\"");
                if (textStart >= 0) {
                    String fResultText = resultText.substring(textStart);
                    // Strip the JSON punctuation and the word "subscribers"
                    return fResultText.replace("\"simpleText\"", "")
                            .replace(":", "")
                            .replace("},", "")
                            .replace("\"", "")
                            .replace("subscribers", "")
                            .trim();
                }
            }
        }
        return "Could not get subscriber count";
    }
}
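pom.xml
- The Spring dependencies below declare no version, so this fragment assumes the project inherits from spring-boot-starter-parent (or imports the Spring Boot BOM), which manages those versions.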
<dependencies>
    <!-- jsoup dependency -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.15.2</version>
    </dependency>
    <!-- Spring dependencies -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
    </plugins>
</build>
Conclusion
- In this article, we used the jsoup library to extract subscriber and follower counts and built the SocialStatsEndpoint class around it. We then wrapped that logic in a consumable API using a Spring Boot app.
- We can extend the same idea to Twitter, Facebook, or Instagram to broaden the scope of the API.