Settings

Since this project depends on both Flask and Scrapy, you can specify the settings for each of them in settings.py.

Global settings

You can set the global settings for Flask and Scrapy through these variables:

Field Description
USER_AGENT Crawl responsibly: set the user agent that your spiders send. Defaults to Arachne (+http://github.com/kirankoduru/arachne).
EXPORT_PATH The export path for your JSON and CSV files. Defaults to the exports/ directory.
LOGS_PATH The logs path for your spiders. Defaults to the logs/ directory. Each day's logs are written to a file named with the date pattern %Y-%m-%d.scrapy.log.
EXPORT_JSON Turn JSON exporting for all spiders on (True) or off (False). Defaults to False.
EXPORT_CSV Turn CSV exporting for all spiders on (True) or off (False). Defaults to False.
LOGS Turn HTTP logging for your Flask app on (True) or off (False). Defaults to False.
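
As a rough illustration, the global settings above might appear in settings.py as shown below. The values for EXPORT_JSON and LOGS are example choices rather than defaults, and the last two lines are not a setting at all: they only show how the %Y-%m-%d.scrapy.log pattern expands for one arbitrarily chosen date.

```python
# settings.py (illustrative values, not a recommended configuration)
from datetime import date

# Global settings named in the table above.
USER_AGENT = 'Arachne (+http://github.com/kirankoduru/arachne)'  # documented default
EXPORT_PATH = 'exports/'   # documented default
LOGS_PATH = 'logs/'        # documented default
EXPORT_JSON = True         # example choice; the documented default is False
EXPORT_CSV = False         # documented default
LOGS = True                # example choice; the documented default is False

# Not a setting: expands the daily log file pattern for an arbitrary example date.
example_log_name = date(2015, 1, 31).strftime('%Y-%m-%d.scrapy.log')
print(LOGS_PATH + example_log_name)  # logs/2015-01-31.scrapy.log
```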

Spider settings

You can customize each spider by modifying the SPIDER_SETTINGS variable in the settings.py file. In the initial release, you can set the following fields for each spider:

Field Description
endpoint The URL endpoint that you would like to associate with the spider.
location The location of the spider, usually the module path to the spider in dot notation. For example, if your DmozSpider is in the spiders directory, set location to spiders.DmozSpider.
spider The class name of the Spider.
scrapy_settings [Coming Soon] This will let you override individual Scrapy settings for each spider. You will be able to add Scrapy pipelines or extensions through this variable.
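
A minimal sketch of a SPIDER_SETTINGS entry for the DmozSpider example above. It assumes that SPIDER_SETTINGS is a list of dictionaries with one dictionary per spider, which this page does not state; the endpoint value dmoz is a hypothetical choice, and the check at the end is an illustrative helper, not part of the project.

```python
# settings.py (sketch; assumes SPIDER_SETTINGS is a list of dicts, one per spider)
SPIDER_SETTINGS = [
    {
        'endpoint': 'dmoz',                # hypothetical URL endpoint for this spider
        'location': 'spiders.DmozSpider',  # dot-notation location of the spider
        'spider': 'DmozSpider',            # class name of the spider
    },
]

# Hypothetical sanity check: every entry carries the three fields described above.
REQUIRED_FIELDS = {'endpoint', 'location', 'spider'}
for entry in SPIDER_SETTINGS:
    missing = REQUIRED_FIELDS - set(entry)
    assert not missing, 'missing fields: %s' % sorted(missing)
```

The scrapy_settings field is left out here because it is marked as coming soon.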

Note

You can override all the Flask settings from the settings.py file, but in this release only a limited set of Scrapy settings can be changed.