How many lines?

I read Size of phpDocumentor on Joshua Eichorn’s blog and questioned the methodology of simply passing all the *.php and *.inc files through wc -l to count the lines. It does not seem reasonable to me to count whitespace and comments as lines of code.
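For reference, the raw count being questioned can be sketched roughly like this. The function name count_raw_lines is made up for illustration, and this is only a guess at the approach, not the exact command used in the original post:

```shell
# Hypothetical helper: total physical lines (blank lines and comments included)
# in every *.php and *.inc file under a directory (default: current directory).
count_raw_lines() {
  dir=${1:-.}
  # Concatenate all matching files and count newlines; tr strips the padding
  # that some wc implementations add around the number.
  find "$dir" -type f \( -name '*.php' -o -name '*.inc' \) -exec cat {} + \
    | wc -l | tr -d ' '
}
```

Running count_raw_lines . from the top of a source tree gives the "everything counts" number that the rest of this post tries to improve on.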

I thought perhaps I could use the whitespace- and comment-stripping ability of the PHP CLI binary (the php -w option) in the middle of a shell pipeline to get a more accurate count. Something like:

find . -type f | grep -e 'php$\|inc$' | xargs -n 1 php -w | wc -l

But it seems that this option is somewhat more aggressive about removing newlines than I would have preferred. My next thought was to convert every instance of ; to ;\n, which you can easily do with sed.

Here is the resulting command:
find . -type f | grep -e 'php$\|inc$' | xargs -n 1 php -w | sed -e 's/;/;\n/g' | wc -l

This is better, but it still leaves you vulnerable to a ; embedded in strings, and it counts the <?php and ?> delimiters, etc. Not perfect, but a reasonable shot for a one-liner shell command.
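A small filter at the end of the pipeline can take a little of the edge off the delimiter problem. This is only a sketch: count_code_lines is a made-up name, it only skips blank lines and delimiters that end up alone on a line, and it does nothing about a ; inside a string.

```shell
# Hypothetical helper: read stripped PHP source on stdin and count the lines
# that are neither blank nor a lone <?php or ?> delimiter.
count_code_lines() {
  # -v inverts the match, -c prints the number of remaining lines.
  grep -c -v -E '^[[:space:]]*(<\?php|\?>)?[[:space:]]*$'
}
```

It would replace the final wc -l, for example:

find . -type f | grep -e 'php$\|inc$' | xargs -n 1 php -w | sed -e 's/;/;\n/g' | count_code_lines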

A little googling turned up SLOCCount, which claims to count source lines in multiple languages while stripping comments, etc. It also passes those line counts through some interesting statistical manipulation and summarizes the results.

Here is the output of sloccount on some of my favorite PHP projects:

phpDocumentor:

SLOC    Directory       SLOC-by-Language (Sorted)
70488   phpDocumentor   php=70442,pascal=46
663     docbuilder      php=663
661     Documentation   php=661
457     scripts         php=446,sh=11
296     top_dir         php=248,sh=48
198     HTML_TreeMenu-1.1.2 php=198
102     tutorials       php=102
0       media           (none)
0       user            (none)


Totals grouped by language (dominant language first):
php:          72760 (99.86%)
sh:              59 (0.08%)
pascal:          46 (0.06%)




Total Physical Source Lines of Code (SLOC)                = 72,865
Development Effort Estimate, Person-Years (Person-Months) = 18.06 (216.70)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 1.61 (19.30)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 11.23
Total Estimated Cost to Develop                           = $ 2,439,420
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."

SimpleTest:

SLOC    Directory       SLOC-by-Language (Sorted)
7099    test            php=7099
5942    top_dir         php=5942
159     packages        php=136,sh=23
139     ui              php=139
93      extensions      php=93
0       CVS             (none)
0       docs            (none)
0       tutorials       (none)


Totals grouped by language (dominant language first):
php:          13409 (99.83%)
sh:              23 (0.17%)




Total Physical Source Lines of Code (SLOC)                = 13,432
Development Effort Estimate, Person-Years (Person-Months) = 3.06 (36.71)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 0.82 (9.83)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 3.73
Total Estimated Cost to Develop                           = $ 413,228
 (average salary = $56,286/year, overhead = 2.40).

WACT:

SLOC    Directory       SLOC-by-Language (Sorted)
12154   framework       php=12154
9971    tests           php=9949,sh=22
2176    examples        php=2176
530     external        php=530
249     benchmarks      php=195,sh=54
92      make            php=92
63      top_dir         sh=63
0       CVS             (none)


Totals grouped by language (dominant language first):
php:          25096 (99.45%)
sh:             139 (0.55%)




Total Physical Source Lines of Code (SLOC)                = 25,235
Development Effort Estimate, Person-Years (Person-Months) = 5.93 (71.17)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 1.05 (12.64)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 5.63
Total Estimated Cost to Develop                           = $ 801,208
 (average salary = $56,286/year, overhead = 2.40).

ADOdb:

SLOC    Directory       SLOC-by-Language (Sorted)
8193    drivers         php=8193
6203    top_dir         php=6203
2049    tests           php=2049
1512    session         php=1512
997     datadict        php=997
921     perf            php=921
482     lang            php=482
230     pear            php=230
102     contrib         php=102
0       cute_icons_for_site (none)
0       docs            (none)
0       xsl             (none)


Totals grouped by language (dominant language first):
php:          20689 (100.00%)




Total Physical Source Lines of Code (SLOC)                = 20,689
Development Effort Estimate, Person-Years (Person-Months) = 4.81 (57.77)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 0.97 (11.68)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 4.95
Total Estimated Cost to Develop                           = $ 650,381
 (average salary = $56,286/year, overhead = 2.40).
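The effort and schedule figures in these reports come straight from the Basic COCOMO formulas printed in the output, so they are easy to recompute for any SLOC total. Below is a rough sketch using awk. The function name cocomo is made up, and the cost line assumes that cost is person-years times salary times overhead, which is my reading of the report rather than something it states outright.

```shell
# Hypothetical helper: recompute the Basic COCOMO summary from a SLOC total.
# Usage: cocomo SLOC [SALARY] [OVERHEAD]
cocomo() {
  awk -v sloc="$1" -v salary="${2:-56286}" -v overhead="${3:-2.4}" 'BEGIN {
    ksloc    = sloc / 1000
    effort   = 2.4 * (ksloc ^ 1.05)    # person-months, as printed in the report
    schedule = 2.5 * (effort ^ 0.38)   # calendar months, as printed in the report
    printf "effort_months=%.2f\n", effort
    printf "schedule_months=%.2f\n", schedule
    printf "developers=%.2f\n", effort / schedule
    # Assumption: cost = person-years * average salary * overhead factor.
    printf "cost=%.0f\n", (effort / 12) * salary * overhead
  }'
}
```

For example, cocomo 72865 gives an effort of 216.70 person-months and a schedule of 19.30 months, in line with the phpDocumentor report, and a cost close to the figure shown there.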