CalvinTorra
DOM Traversal for Fun and Profit
DIY-PHD


    During my time writing funny words in an IDE to make the computer do what I want, I dabbled in a little web scraping for cash.

    I kept forgetting how to target certain parts of the page that I wanted to scrape and organise within my program.

    So below, I’m putting together a few notes to share with my future self and you 🙂

    Let’s start with a little boilerplate HTML that we can work with.

    <div class="grandparent" id="grandparent-id">
    <!-- top level grandparent -->
        <div class="parent"> <!-- first parent -->
            <div class="child" id="child-one"></div> <!-- child 1 -->
            <div class="child"></div> <!-- child 2 -->
        </div>
        <div class="parent"> <!-- second parent -->
            <div class="child"></div> <!-- child 3 -->
            <div class="child" id="child-four"></div> <!-- child 4 -->
        </div>
    </div>
    

    Get Element by ID

    An ID should be unique within a page, so the method is getElement (singular). It returns a single element, or null if nothing matches.

    const grandparent = document.getElementById("grandparent-id")
    

    Get Elements by Class Name

    Calling getElements (plural) returns a live HTMLCollection of matching elements from the DOM (both the parents in the HTML above). An HTMLCollection is array-like but is not an Array, so calling Array methods such as map or forEach on it throws a TypeError.

    We can get around this by copying the returned collection into a real array with Array.from, then we’re able to use Array methods on that content.

    const parent = Array.from(document.getElementsByClassName("parent"))
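
    As a minimal sketch of that conversion, the helper below copies the collection with Array.from and then chains ordinary Array methods. The name idsByClass is made up for illustration (it is not a DOM API), and root is assumed to be document or any element.

```javascript
// Hypothetical helper: returns the non-empty ids of all elements with a class.
// `root` is assumed to be `document` or any element.
function idsByClass(root, className) {
  // getElementsByClassName returns an HTMLCollection, which has no map/filter.
  const collection = root.getElementsByClassName(className);
  // Array.from copies the array-like collection into a real Array.
  return Array.from(collection)
    .map((el) => el.id)
    .filter((id) => id !== "");
}
```

    In a browser, against the HTML above, idsByClass(document, "child") should give ["child-one", "child-four"], since only two of the four children carry an id.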
    

    Query Selector

    This gives us a single element (the first one that appears in the DOM tree) by targeting the DOM using CSS selectors.

    const grandparentById = document.querySelector("#grandparent-id") // id
    const grandparentByClass = document.querySelector(".grandparent") // class
    

    Query Selector All

    Similar to Get Elements by Class Name, this gives us all the elements that match our query. However, it returns a static NodeList, which supports forEach directly but still needs Array.from before we can use other Array methods such as map and filter.

    const grandparentsById = document.querySelectorAll("#grandparent-id") // id
    const grandparentsByClass = document.querySelectorAll(".grandparent") // class
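
    To make the NodeList behaviour concrete, here is a small sketch. Both function names are hypothetical, and root is assumed to be document or any element. The first helper relies on NodeList's own forEach; the second copies the list so that a mapping step is possible.

```javascript
// Hypothetical helper: counts matches using NodeList.forEach directly.
function countMatches(root, selector) {
  let count = 0;
  root.querySelectorAll(selector).forEach(() => {
    count += 1;
  });
  return count;
}

// Hypothetical helper: returns the className of every match.
// Array.from's second argument maps each item while copying the NodeList.
function classNamesOf(root, selector) {
  const nodes = root.querySelectorAll(selector);
  return Array.from(nodes, (el) => el.className);
}
```

    In a browser, against the HTML above, countMatches(document, ".parent") should be 2 and classNamesOf(document, ".parent") should be ["parent", "parent"].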
    

    Selecting Child Element

    First, we want to target the top grandparent node. From there we can grab all of the children underneath.

    Even though querySelector gives us a single Element, reading its children property gives back an HTMLCollection, not a NodeList or an Array!! Annoying.

    So we’ll need to create an Array from the returned children.

    const grandparent = document.querySelector(".grandparent")
    const parents = Array.from(grandparent.children)
    const parentOne = parents[0] // etc
    

    We can also drill down into the parent’s children.

    const children = parentOne.children
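
    Drilling down two levels in one go could look roughly like this. The helper name grandchildrenOf is made up for this sketch, and it assumes root is an element whose children each have their own children collection.

```javascript
// Hypothetical helper: collects the children of every child of `root`.
function grandchildrenOf(root) {
  // Each .children is an HTMLCollection, so copy it before using Array methods.
  return Array.from(root.children).flatMap((parent) =>
    Array.from(parent.children)
  );
}
```

    In a browser, against the HTML above, grandchildrenOf(document.querySelector(".grandparent")) should return the four child divs in document order. HTML comments are not elements, so they do not show up in children.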
    

    Selecting Parent Element

    Every element exposes a parentElement property that points one level up the DOM. Note that the property is parentElement (or parentNode); there is no plain parent property, so childFour.parent would be undefined.

    const childFour = document.querySelector("#child-four")
    const parent = childFour.parentElement
    

    Selecting Closest Grandparent Element

    This works very similarly to querySelector, but instead of going down the DOM it moves upwards.

    It takes a CSS selector and walks up from the element itself through its ancestors, returning the first one that matches the selector (or null if none does).

    const childFour = document.querySelector("#child-four")
    const grandparent = childFour.closest(".grandparent")
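
    To show what that upward walk amounts to, here is a rough sketch built from matches and parentElement. It is an illustration of the behaviour only, not how browsers implement closest, and the name closestSketch is made up.

```javascript
// Hypothetical re-creation of the upward walk that closest performs.
function closestSketch(start, selector) {
  let el = start;
  while (el !== null) {
    // The starting element itself is tested first, then each ancestor.
    if (el.matches(selector)) {
      return el;
    }
    el = el.parentElement;
  }
  return null; // ran out of ancestors without a match
}
```

    Because the starting element is checked first, asking a child for the closest ".child" returns that same child rather than an ancestor.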
    

    Skipping DOWN half the DOM

    We can call querySelector on an element we’ve already captured (not only on document) to go straight to the child level and skip the parents. The search is limited to that element’s descendants.

    const grandparent = document.querySelector(".grandparent")
    const childOne = grandparent.querySelector(".child")
    

    Selecting Siblings Previous + Next

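    nextElementSibling gets the next element along from where you currently are, and previousElementSibling gets the one before it. Both return null when there is no sibling in that direction. Instead of going up and down, it’s like we’re going sideways through the DOM.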

    const childOne = document.querySelector("#child-one")
    const childTwo = childOne.nextElementSibling
    
    const childFour = document.querySelector("#child-four")
    const childThree = childFour.previousElementSibling
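
    Since nextElementSibling returns null at the end of the row, we can keep stepping sideways in a loop. Here is a minimal sketch with a hypothetical helper name that collects every element sibling after the one we start from.

```javascript
// Hypothetical helper: returns all element siblings that come after `el`.
function followingSiblings(el) {
  const siblings = [];
  let next = el.nextElementSibling;
  // nextElementSibling is null once we reach the last sibling.
  while (next !== null) {
    siblings.push(next);
    next = next.nextElementSibling;
  }
  return siblings;
}
```

    In a browser, against the HTML above, followingSiblings(childOne) should return an array containing only child 2, because child 3 and child 4 live under a different parent.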