Image Crawler Meets rm -f *

I wrote a simple web crawler that archived any images it found from a site with a large number of backgrounds. I wanted to have rolling backgrounds that almost never repeated.

I let my crawler go and stopped it after a 24 hour period. I ran ls on the directory the images were being saved in to see the results and my ssh session locked up… Or so I thought. I hit Ctrl-C and nothing happened… So I closed my window and opened a new one.

Maybe there was a file name that hit some weird glitch in ls causing it too lock up. Not a big deal I can start from scratch. I switched to the directory and ran an rm -f * and was greeted with this:

/bin/rm: Argument list too long.

Wait what? All I get was this irritating and slightly elusive error message. So the * was being expanded, and making the list too long? To be sure I started hunting in the man pages. man rm recommended me to info coreutils 'rm invocation'. I read through and couldn’t find any limitations or warnings that might relate to the problem. The only thing I could find in there was a line that said:

GNU ‘rm’, like every program that uses the ‘getopt’ function to parse its arguments…

Moving on… getopt is parsing it’s options… man and info pages on getopt don’t really reveal anything…

After some creative Googling I found the answer: getopt’s argument limit is 1024. How many files did I have? I wanted to give ls one more try… I typed it in and sure enough console froze… Or did it? I walked away and did other things. When I came back I had a list of files longer than my console buffer. I was smarter the second time around: ls | wc -l. After about five minutes it came back again with just the number.

177654

I needed to clean up that directory and rm wasn’t going to help me. `find . | xargs rm -f1 and…. Victory! The directory is clean and happy again. Now I’ll just have to figure out how to cut down the number of files, or at the very least organize them into more managable chunks…