Published

Add a HTML datasource

Add a HTML datasource

A HTML datasource is also a yaml configuration file, whom conforms to HtmlDataSourceProperties. It bridges HTML pages containing datas on products into the platform, mapping HTML pages and associated resources to DataFragments. To handle this, the datasource configuration files are based on xpath and json expressions, and contains all classical websites crawling rules.

TODO : Syntax pointers
Thanks crawl4j

Please, refers to existing configurations for use as examples.

TODO : cdiscount.com.yml la conf d'exemple pas compatible avec (modele fermé / ouvert des confs). Heberger des pages de tests sur official websites pour crawling de test