Crawler Module
This module runs standalone crawlers that scrape product data from external websites and feed it to the open4goods ingestion pipeline. It is packaged as a Spring Boot application so that crawler nodes can be deployed independently of the API.
Building
From this directory you can build and test the crawler with Maven:
mvn clean install # build and run tests
mvn test # run tests only
Alternatively, from the repository root, build only this module (-pl crawler) together with the modules it depends on (-am):
mvn -pl crawler -am clean install
(See AGENTS.md for details.)
Usage
The crawler can be started like any Spring Boot application. Example using the development profile:
java -Dspring.profiles.active=dev -jar target/bin/open4goods-crawler.jar
Once the crawler is running, its interface is available at http://localhost:8080.
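When the crawler is started from a script, it can help to wait until the interface actually answers before continuing. The following is a minimal sketch, not part of the module: wait_for_url is a hypothetical helper, it assumes curl is installed, and it assumes the interface answers plain HTTP requests at the URL given above.

```shell
# Hypothetical helper (not shipped with the crawler): poll a URL until it
# answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"   # default: 30 attempts
  delay="${3:-1}"       # default: 1 second between attempts
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Any HTTP response counts as "up"; connection errors and timeouts do not.
    if curl --silent --output /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait_for_url http://localhost:8080 60 && echo "crawler interface is up"
```

The function only checks that something answers on the port. It does not verify that the crawler has finished initialising, so treat it as a coarse readiness check.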
For more information about the embedded crawler4j library and our patches to it, refer to src/main/java/edu/uci/ics/crawler4j/README.md.