Logging¶
There are 2 types of logs that you can work with in this project. One will be the HTTP logs for your Flask application and the next one will be your Scrapy spider.
Flask logs¶
By default HTTP logging is disabled for the Flask application. To turn on logging to a file please set the LOGS
variable in settings.py to True
:
from datetime import datetime
from twisted.web import http
from twisted.web.wsgi import WSGIResource
from twisted.web.server import Site
from twisted.internet import reactor
from arachne import Arachne
app = Arachne(__name__)
resource = WSGIResource(reactor, reactor.getThreadPool(), app)
# log files in the `logs` directory
site = Site(resource,
logFormatter=http.combinedLogFormatter,
logPath="logs/"+datetime.now().strftime("%Y-%m-%d.web.log"))
reactor.listenTCP(8080, site)
if __name__ == '__main__':
reactor.run()
Scrapy logs¶
Scrapy logs for spiders are very important to know exactly what is going on. The Scrapy logs are turned on by default for stdout
but to log to a file set the DEBUG
variable in your settings.py to False
.