Settings¶
Since this project depends both on Flask and Scrapy, you can mention the individual settings for each in the settings.py.
Global settings¶
You can set the global settings for Flask and Scrapy through these variables:
Field | Description |
---|---|
USER_AGENT |
Crawl responsibly. Set the USER_AGENT variable for each spider. Default set to Arachne (+http://github.com/kirankoduru/arachne) |
EXPORT_PATH |
Set the export path for your json and csv files. Default set to exports/ directory |
LOGS_PATH |
Set the logs path for your spiders. Default set to logs/ directory. Each day is logged in the datetime file %Y-%m-%d.scrapy.log |
EXPORT_JSON |
Turn json exporting for all spiders ON(True ) or OFF(False ). Default set to False |
EXPORT_CSV |
Turn csv exporting for all spiders ON(True ) or OFF(False ). Default set to False |
LOGS |
Turn ON(True ) or off(False ) HTTP logging for your flask app. Default set to False |
Spider settings¶
You can customize each spider with by modifying the SPIDER_SETTINGS variable in settings.py file. For the initial release you can set the following settings for each spider:
Field | Description |
---|---|
endpoint |
The URL endpoint that you would like to associate with the spider. |
location |
The spider location is usally the module location to the spider in a dot notation. Consider that your DmozSpider is in the spiders directory, then the location variable will be set to spiders.DmozSpider. |
spider |
The class name of the Spider. |
scrapy_settings |
[Coming Soon] This will let you override the individual settings for each spider in the scrapers. You can add scrapy pipelines or extensions through this variable. |