How many lines?
I read “Size of phpDocumentor” on Joshua Eichorn’s blog and questioned the methodology of simply passing all the *.php and *.inc files through wc -l to count the lines. It does not seem reasonable to me to count whitespace and comments as lines of code.
I thought perhaps I could use the whitespace/comment stripping ability of the PHP CLI binary (the php -w option) in the middle of a shell command to get a more accurate count. Something like:
find . -type f | grep -e 'php$\|inc$' | xargs -n 1 php -w | wc -l
But it seems that this option is somewhat more aggressive about removing newlines than I would have preferred. My next thought was to convert every instance of a ; to ;\n, which you can easily do with sed.
Here was the resulting output from running
find . -type f | grep -e 'php$\|inc$' | xargs -n 1 php -w | sed -e 's/;/;\n/g' | wc -l
This is better, but it still leaves you vulnerable to a ; embedded in strings, and it counts the <?php and ?> delimiters, etc. Not perfect, but a reasonable shot for a one-liner shell command.
A little googling turned up SLOCCount, which professes to count multiple languages and strips comments, etc. It also passes those line counts through some interesting statistical manipulation and summarizes the results.
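One way around the ; problem is to count physical lines that contain code rather than statements. The following is only a rough sketch of that idea in Python, not what php -w or SLOCCount actually do. The function name count_php_sloc and the sample input are made up for illustration, and the scanner assumes the file is pure PHP: it does not handle heredocs, inline HTML, or a ?> that ends a one-line comment.

```python
def count_php_sloc(source: str) -> int:
    """Count physical lines that hold at least one character of code.

    Hypothetical helper: skips blank lines, // and # comments, and
    /* ... */ comments, while leaving quoted strings alone.
    """
    sloc = 0
    has_code = False   # does the current line contain anything but comments?
    state = "code"     # code | line_comment | block_comment | single | double
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if ch == "\n":
            if has_code:
                sloc += 1
            has_code = False
            if state == "line_comment":
                state = "code"          # one-line comments end with the line
            i += 1
            continue
        if state == "code":
            if ch == "#" or (ch == "/" and nxt == "/"):
                state = "line_comment"
            elif ch == "/" and nxt == "*":
                state = "block_comment"
                i += 1                  # skip the '*' so "/*/" does not close
            elif not ch.isspace():
                has_code = True
                if ch == "'":
                    state = "single"
                elif ch == '"':
                    state = "double"
        elif state == "block_comment":
            if ch == "*" and nxt == "/":
                state = "code"
                i += 1                  # skip the closing '/'
        elif state in ("single", "double"):
            has_code = True             # string contents count as code
            if ch == "\\" and nxt != "\n":
                i += 1                  # skip the escaped character
            elif (state == "single" and ch == "'") or (state == "double" and ch == '"'):
                state = "code"
        i += 1
    if has_code:                        # last line without a trailing newline
        sloc += 1
    return sloc


sample = """<?php
// a comment
/* block
   comment */
$a = 'x; // not a comment';  # trailing

echo $a; /* inline */
?>
"""
print(count_php_sloc(sample))
```

Like the one-liner, this still counts the <?php and ?> delimiter lines, so it is an approximation too.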
Here is output of sloccount on some of my favorite php projects:
phpDocumentor:
SLOC    Directory            SLOC-by-Language (Sorted)
70488   phpDocumentor        php=70442,pascal=46
663     docbuilder           php=663
661     Documentation        php=661
457     scripts              php=446,sh=11
296     top_dir              php=248,sh=48
198     HTML_TreeMenu-1.1.2  php=198
102     tutorials            php=102
0       media                (none)
0       user                 (none)

Totals grouped by language (dominant language first):
php:          72760 (99.86%)
sh:              59 (0.08%)
pascal:          46 (0.06%)

Total Physical Source Lines of Code (SLOC)                = 72,865
Development Effort Estimate, Person-Years (Person-Months) = 18.06 (216.70)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 1.61 (19.30)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 11.23
Total Estimated Cost to Develop                           = $ 2,439,420
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
SimpleTest:
SLOC    Directory    SLOC-by-Language (Sorted)
7099    test         php=7099
5942    top_dir      php=5942
159     packages     php=136,sh=23
139     ui           php=139
93      extensions   php=93
0       CVS          (none)
0       docs         (none)
0       tutorials    (none)

Totals grouped by language (dominant language first):
php:          13409 (99.83%)
sh:              23 (0.17%)

Total Physical Source Lines of Code (SLOC)                = 13,432
Development Effort Estimate, Person-Years (Person-Months) = 3.06 (36.71)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 0.82 (9.83)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 3.73
Total Estimated Cost to Develop                           = $ 413,228
 (average salary = $56,286/year, overhead = 2.40).
WACT:
SLOC    Directory    SLOC-by-Language (Sorted)
12154   framework    php=12154
9971    tests        php=9949,sh=22
2176    examples     php=2176
530     external     php=530
249     benchmarks   php=195,sh=54
92      make         php=92
63      top_dir      sh=63
0       CVS          (none)

Totals grouped by language (dominant language first):
php:          25096 (99.45%)
sh:             139 (0.55%)

Total Physical Source Lines of Code (SLOC)                = 25,235
Development Effort Estimate, Person-Years (Person-Months) = 5.93 (71.17)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 1.05 (12.64)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 5.63
Total Estimated Cost to Develop                           = $ 801,208
 (average salary = $56,286/year, overhead = 2.40).
ADOdb:
SLOC    Directory            SLOC-by-Language (Sorted)
8193    drivers              php=8193
6203    top_dir              php=6203
2049    tests                php=2049
1512    session              php=1512
997     datadict             php=997
921     perf                 php=921
482     lang                 php=482
230     pear                 php=230
102     contrib              php=102
0       cute_icons_for_site  (none)
0       docs                 (none)
0       xsl                  (none)

Totals grouped by language (dominant language first):
php:          20689 (100.00%)

Total Physical Source Lines of Code (SLOC)                = 20,689
Development Effort Estimate, Person-Years (Person-Months) = 4.81 (57.77)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 0.97 (11.68)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 4.95
Total Estimated Cost to Develop                           = $ 650,381
 (average salary = $56,286/year, overhead = 2.40).
$801,208! We’re rich!
I have actually been involved in a “benchmark” (as in value benchmark, not performance) that used this approach at work; the study was organised by Gartner.
Despite the fundamental silliness of counting lines of code to measure a project’s value (which penalises those that take advantage of re-use), Gartner compare the result to a large number of other software products, taking into account other factors like user base etc. The end result did seem relatively fair. Given a large project, I think the result is fairly accurate; their methodology basically works best with big numbers.
There is another way to measure value that I’ve heard of, where you measure by “function points” (http://ourworld.compuserve.com/homepages/softcomp/fpfaq.htm), but that requires a lot of human effort.
How did it come up with the number of years effort? In fact how did it come up with the schedule estimate, etc? Did you have to feed in any other figures?
It looks like he provides a number of the details behind it (and even tweaking factors if you have a project that differs from the standard) on the program’s site:
http://www.dwheeler.com/sloccount/sloccount.html#cocomo.
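For what it is worth, the formulas printed in the SLOCCount reports above are enough to reproduce the estimates from the line count alone. Here is a minimal sketch in Python, assuming the defaults shown in the output (salary $56,286/year, overhead 2.40) and assuming the cost is simply person-years times salary times overhead. The function name basic_cocomo and its parameters are mine, not part of SLOCCount.

```python
def basic_cocomo(sloc, salary=56286.0, overhead=2.40):
    """Basic COCOMO estimate using the formulas shown in the reports.

    salary and overhead default to the values printed in the output;
    the cost formula (person-years * salary * overhead) is an assumption.
    """
    ksloc = sloc / 1000.0
    person_months = 2.4 * ksloc ** 1.05             # development effort
    schedule_months = 2.5 * person_months ** 0.38   # calendar time
    developers = person_months / schedule_months    # effort / schedule
    cost = (person_months / 12.0) * salary * overhead
    return {
        "person_months": person_months,
        "schedule_months": schedule_months,
        "developers": developers,
        "cost": cost,
    }


# WACT: 25,235 physical source lines according to the report above
estimate = basic_cocomo(25235)
for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
```

Fed the WACT total of 25,235 lines, this should land close to the figures in the report (about 71 person-months, a schedule of roughly a year, and a cost around $800,000), so no other inputs seem to be needed beyond the line count and those two defaults.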
Basic COCOMO Assumptions and Definitions
The COCOMO estimating model is based on the assumptions and definitions discussed below.