Yet another place where intuitions about the behavior of distributions, derived from the normal distribution, screw people over.
An explanation I like, from michaelochurch:
Let's say that you have 20 tasks. Each involves rolling a 10-sided die.
If it's a 1 through 8, wait that number of minutes. If it's a 9, wait 15
minutes. If it's a 10, wait an hour.
How long is this string of tasks going to take? Summing the median time
expectancy, we get a sum of 110 minutes, because the median time for a task is
5.5 minutes. The actual expected time to completion is 222 minutes, with 5+
hours not being unreasonable if one rolls a lot of 9's and 10's.
This is an obvious example where summing the median expected time for the
tasks is ridiculous, but it's exactly what people do when they compute time
estimates, even though the reality on the field is that the time-cost
distribution has a lot more weight on the right. (That is, it's more common
for a "6-month" project to take 8 months than 4. In statistics-wonk terms,
the distribution is "log-normal".)
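The dice example above is easy to check with a quick Monte Carlo simulation (a hypothetical sketch, not from the original comment): the sum of per-task medians says 110 minutes, while the simulated mean lands near the true expectation of 222.

```python
import random

# 20 tasks, each a d10 roll: 1-8 means wait that many minutes,
# a 9 means 15 minutes, a 10 means an hour.
WAIT = {**{i: i for i in range(1, 9)}, 9: 15, 10: 60}

def run_project(n_tasks=20):
    return sum(WAIT[random.randint(1, 10)] for _ in range(n_tasks))

random.seed(0)
totals = [run_project() for _ in range(100_000)]
mean_total = sum(totals) / len(totals)

# Sum of per-task medians: median of {1..8, 15, 60} is 5.5, so 20 * 5.5 = 110.
# True expected total: 20 * (1+2+...+8+15+60)/10 = 20 * 11.1 = 222.
print(f"sum of medians: 110, simulated mean: {mean_total:.0f}")
```

The gap between 110 and 222 is exactly the skew the comment is pointing at: the rare 9s and 10s barely move the median but dominate the mean.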
When I put together plans and estimates, I take care to separate the things that are linear from the things that impact the rest of the schedule exponentially, and to note the likely inflection points. I may not know where I'm going to roll a 9 or 10, since they can crop up anywhere, but there are certainly areas where they are more likely and less likely.
In a sane world, at least. Can't do much with a black swan.
In a startup I've worked on, we did estimates based on hours, but also had the "no idea" tasks. We would then make sure to alternate sprints to work on well estimated tasks (as you say, "linear" tasks) and others to work on the hard ones, without a set deadline; or split the team effort in that manner.
I think that's a better strategy than making wild guesses and ultimately falling behind schedule; at the same time it maintains cadence, which buys you the power to sometimes say "hard things are hard, I don't know when it'll be ready to ship".
If, as the article says, "Clients’ Focus on Low Price Is a Major Reason for Effort Overruns", then probably a simpler theoretical explanation can just be the winner's curse.
No need to talk about distributions, gaussian or otherwise.
I think many, perhaps most, poor estimates happen because the initial estimate is viewed as too high for the project: instead of deciding the project isn't worth doing at its estimated cost, people decide the estimate must be wrong, so that expected project cost lines up with expected project value.
Perhaps in a twisted future where we estimate project cost before deciding which projects to take on, we might discover our estimates are much better.
A related pathology is trading technical debt for speed, every time, on every project. The debt will be paid.
Well, exactly. If I understand the paper, the only condition you need in order to arrive at accurate estimates is the absence of pressure to underestimate.
Software effort estimation methods fail because they ignore the margin of error. In mathematics, engineering, and statistics, a result means nothing without its margin of error: one month may be one month if the margin of error is one day, and one month may be one year if the margin of error is one year.
Classic estimation techniques like COCOMO or Albrecht Function Points ignore this fact. They have no mathematical rigor; presented with rigor they would be absurd, because their margin of error is between 100% and 600%. Classic software effort estimation techniques are harmful and dangerous, because by ignoring the margin of error they invite decisions that ignore existing risks.
No automatic method can replace human experience and wisdom. Human estimates have no bounded margin of error either, but at least they do not pretend to hide existing risks.
Many customers/clients don't even want accurate estimates. Given the choice between an accurate estimate of $x, and a competing estimate of $0.75x with later surprises and deadline stress and renegotiations to pay another $0.35x for "phase 2" which gets the product up to what they originally wanted, especially when the business relationship has "bonded" in a way where it's all rah-rah, go-team, we're-in-this-together... clients will go for the latter path way more often than they should.
Part of the reason estimates are inaccurate is because there's that business disincentive to be accurate.
Hofstadter's Law: It always takes longer than you expect,
even when you take into account Hofstadter's Law.
The best advice I ever got on project-time estimation (from a biology postdoc) was: make your best, most honest effort, and then double it.
When I make projections with a spreadsheet, I have a cell that copies my grand total of all costs and call that copy "unforeseen costs". I always hate bidding that high at the start, but the estimate ends up being close to right surprisingly often.
This article says 30% overruns are common, which is within my former boss' +100% bounds.
The other nice thing about doubling your cost estimate is it prevents you from catching the winner's curse and landing an overly-stingy client. Plus if you really can keep costs within your spec for the project, then you win extra profits. You'll never win that "game" if you don't leave room for error.
I think people should start using confidence intervals; then the upper bounds become more realistic. If you estimate with rough 90% confidence intervals, a developer can communicate the uncertainty: I think task A will take about a week. At least 2 days. And no more than 3 months.
You can immediately see that it's probably best to either a) work on this task for a few days and make a new estimate based on the acquired knowledge, or b) if that's not possible, try to split the task into smaller subtasks to identify which parts are the most uncertain.
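One way to act on interval estimates is to combine them by Monte Carlo rather than summing single numbers. A minimal sketch, with made-up task numbers and triangular sampling as a crude stand-in for whatever distribution the tasks really follow:

```python
import random

# Each task: (optimistic, likely, pessimistic) estimate in days.
# The first one is the "about a week, at least 2 days, at most
# 3 months" task from the comment above; the rest are invented.
tasks = [
    (2, 5, 60),
    (1, 3, 10),
    (5, 8, 30),
]

random.seed(1)
totals = sorted(
    sum(random.triangular(lo, hi, mode) for lo, mode, hi in tasks)
    for _ in range(10_000)
)
p5, p50, p95 = (totals[int(len(totals) * q)] for q in (0.05, 0.50, 0.95))
print(f"project total: {p50:.0f} days (90% interval {p5:.0f}-{p95:.0f})")
```

The resulting project-level interval is much wider than the sum of the "likely" values, which is the whole point: the skew of the individual tasks survives aggregation.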
Doubling your estimate is a common rule of thumb that goes way back, but over time, if you keep learning from your previous estimates, you should become more accurate on average. There's another subtle thing going on: many times when the estimate increases, so does the actual time, a self-fulfilling prophecy, hence Hofstadter's law...
I've been giving estimates for a decade, and still feel like I'm winging it every time. Planning poker definitely works, provided you understand the requirements, and by 'understand' I don't mean you have read a spec document, but that you understand the customer's business problem and have figured out how the proposed solution aims to solve it. Sadly, most large projects don't give the team that produces the estimate enough time in the pre-sales phase to build an understanding of the whole problem domain. Paradoxically, this low confidence in the estimate will tempt the sales team to cut it even further, since they interpret uncertainty as license to seek the low bound (or even lower).
As others have pointed out in those large projects it's better not to make up-front estimates and just build as much value as possible for a fixed cost, using agile principles. However, that's typically not how large software projects are sold (or bought). Fixed price almost always means fixed scope. I'd like to know of any large software project sold to a customer in truly agile fashion (no fixed scope determined in advance). To me it sounds like a software development unicorn: you hear about it, but you're never the one building it.
Often a realistic technical schedule is derived and presented, but the business side deems the project cost too high and asks for the schedule to be "optimized." The optimistic scenarios in the schedule are adopted and the schedule is revised. Of course reality sets in when the project goes forward, and it ends up taking as much time as originally predicted.
That's often what I see. An estimate roadmap is presented, management expresses that it wants it sooner, the roadmap is "shuffled" and "optimized", it is approved, yet reality still sets in during development :-)
IMHO it is quite easy to make a reliable estimate of a well-planned project. However, it is extremely difficult to plan the project more than one step ahead of what is already done... This is why agile development is so popular.
In general when under-estimating the project you can make it:
> IMHO it is quite easy to make a reliable estimate of a well-planned project.
Evidence suggests otherwise. Sure, you can estimate +/- 100-200% early on, but that isn't what anyone is aiming for in a software project. Even detailed plans of repeatable (non-trivial) software projects do not result in error bars that anyone really desires.
I don't know that I'd say it's easy, but it is certainly possible. The big takeaway from the article that I agree with is that historical data significantly improves estimates. If you know that e.g., the last 5 projects took an average of x weeks on the authentication layer, then it's likely that your project will take somewhere around the same time.
The problem is that most companies don't record this data. Start today!
My point was that it is the planning step that is extremely difficult, not the estimating one. With most real-world projects the project plan must follow changing requirements (based on external input or on things you have learned during development). It is extremely unlikely that the original plan will (or should) be followed to the end.
"A tendency toward underestimation of effort is particularly present in price-competitive situations, such as bidding rounds. In less price-competitive contexts, such as inhouse software development, there are no such tendencies - in fact, you might even see the opposite. This suggests that a main reason for effort overruns is that clients tend to focus on low price when selecting software providers - that is, the project proposals that underestimate effort are more likely to be started. "
Is that really correct? Are there studies that show that in-house projects (or non-fixed-price projects) do not systematically underestimate, as opposed to fixed-price client projects?
"Six to eight weeks" was the default estimate my project managers gave for anything above trivial. Long enough to make the task seem difficult, not too long to scare off the client.
Double it with each layer. Double your estimate for each individual task, then when you've got the whole Gantt chart built for the iteration, double that.
We have so much technical debt I can no longer estimate with any accuracy. Something that should take half a day takes a week. Most of the week is spent cleaning up all of the crap in the code; then a couple of hours go to writing the few lines of code that solve the business requirement.
That we don't know whether software development is subject to economies or diseconomies of scale stuck out to me.
Estimation cost (the cost of doing the estimation, not the consequences of the estimate) isn't mentioned. Is estimation itself significantly costly relative to the thing being estimated?
To the extent open source works relatively well as a development practice, how much of a role does suppression of estimation play (assuming there is suppression; harder to even pretend to hold anyone to an estimate without a contract, so why bother)?
The problem with estimates is that once there is an estimate, the team can always stick to it, regardless of quality.
It is feasible for the team to claim that it met the estimate, and to have all indicators green on the day the deadline arrives: simply do less design, less refactoring, less thinking, fewer tests, less collaborative work, less engineering...
Can Machine Learning (or NLP) ever help in estimating effort based on expected lines of code where the model would be trained upon similar applications/files that already exist? If so, is anyone researching this in academia or in any research lab?
> An implication of these observations is that clients can avoid effort overrun by being less price - and more competence - focused when selecting providers.
I never understood why software estimates are so bad when other fields do so well.
When a contractor gives you an estimate of how long and how much it's going to take to do a remodel, he's invariably on time and on schedule, right?
And when Boeing spends billions of dollars on a new plane, it's ready on time and on budget, right?
So why can't software people do the same?
Oh wait, complex, badly defined projects tend to run late and over budget. It's not that complicated. Spend 3-6 months defining all the details of your new web application, promise not to change anything on the fly, don't ask us to make it work on IE 7, and by the time we do 3-4 of these, we'll be able to give you a good estimate.