Monitoring task completion in IPython Notebook
I would like to share a simple but useful tool. When you work with data a lot, primitive but long-running operations come up all the time, for example: "download 10,000 URLs", "read a 2 GB file and do something with each line", "parse 10,000 HTML files and get the headers". Staring at a terminal that looks hung is unnerving, so for a long time I used the following ingenious code:

    import sys

    def log_progress(sequence, every=10):
        for index, item in enumerate(sequence):
            if index % every == 0:
                # Write the running index to stderr, all on one line
                print(index, end=' ', file=sys.stderr)
            yield item

This function served me well: for more than a year it wandered from task to task. But recently I noticed the IntProgress widget in the standard Jupyter package and realized that it was time for a change.
Logging to stderr has three small problems:
- It is ugly. Obviously.
- Sometimes it overflows the output buffer.
- Sometimes someone else writes to stderr or stdout as well.
Like many people who work with data, I am a fan of Jupyter and spend most of my time there. So I can afford the following solution, which does not work in other environments:
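
    def log_progress(sequence, every=10):
        # Imports live inside the function, so it can be pasted into a notebook as one piece
        from ipywidgets import IntProgress
        from IPython.display import display

        # len() requires a sized sequence; a bare iterator will not work here
        progress = IntProgress(min=0, max=len(sequence), value=0)
        display(progress)
        for index, record in enumerate(sequence):
            if index % every == 0:
                progress.value = index
            yield record
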
Everything works the same way, except that the counter is displayed in a dedicated widget instead of stderr. Very simple and convenient. For those who are also hooked on Jupyter, I posted a slightly improved version on GitHub: github.com/alexanderkuk/log-progress. The module is distributed by copy-paste. Enjoy.
Besides the bar, the improved version displays a counter. It also changes color depending on whether the operation completed successfully or not, and it supports iterators.
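The color change and the iterator support can be sketched roughly as follows. This is an illustration, not the code from the repository: the name `track_progress` and the `bar` parameter are my own assumptions. `bar` can be any object with `max`, `value` and `bar_style` attributes, so the logic also runs outside Jupyter; in a notebook you would pass an `IntProgress` widget, whose `bar_style` accepts values such as `'info'`, `'success'` and `'danger'`.

```python
from types import SimpleNamespace


def track_progress(sequence, bar, every=10):
    """Yield items from `sequence`, reporting progress on `bar` (hypothetical helper)."""
    try:
        size = len(sequence)
    except TypeError:
        size = None  # iterator or generator: the total is unknown

    if size is None:
        # Unknown total: show a full bar in a neutral color
        bar.max, bar.value, bar.bar_style = 1, 1, 'info'
    else:
        bar.max, bar.value = size, 0

    index = 0
    try:
        for index, item in enumerate(sequence, 1):
            if size is not None and index % every == 0:
                bar.value = index
            yield item
    except BaseException:
        # Reached when the sequence raises or the generator is closed early
        bar.bar_style = 'danger'
        raise
    else:
        bar.bar_style = 'success'
        if size is not None:
            bar.value = index


# Outside Jupyter a plain object can stand in for the widget.
# In a notebook: bar = IntProgress(); display(bar)
bar = SimpleNamespace(max=0, value=0, bar_style='')
for number in track_progress(iter(range(25)), bar):
    pass
```

The `except` branch catches `BaseException` on purpose, so the bar also turns red when the generator is closed before the sequence has been exhausted.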
Naturally, there can be several progress bars in one cell.
And they can even work from different threads.
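Several bars can coexist because each generator keeps its own counter and touches only its own bar. Here is a minimal sketch of that pattern with two threads. The names `track_progress` and `worker` are hypothetical, and plain objects stand in for the widgets so the example runs anywhere; in a notebook each `bar` would be an `IntProgress` passed to `display`.

```python
import threading
from types import SimpleNamespace


def track_progress(sequence, bar, every=10):
    """Yield items from a sized `sequence`, updating `bar.value` as it goes."""
    bar.max = len(sequence)
    for index, item in enumerate(sequence, 1):
        if index % every == 0 or index == bar.max:
            bar.value = index
        yield item


def worker(items, bar, results):
    # Each thread consumes its own generator and touches only its own bar
    for item in track_progress(items, bar):
        results.append(item * 2)


bars = [SimpleNamespace(max=0, value=0) for _ in range(2)]
results = [[] for _ in bars]
threads = [
    threading.Thread(target=worker, args=(range(100 * (n + 1)), bar, out))
    for n, (bar, out) in enumerate(zip(bars, results))
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Since no state is shared between the two generators, no locking is needed in this sketch.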
In short, here is the link to the code once more: github.com/alexanderkuk/log-progress.