What's so special about standard deviation?












33












Equivalently, about variance?

I realize it measures the spread of a distribution, but many other metrics could do the same (e.g., the average absolute deviation). What is its deeper significance? Does it have

  • a particular geometric interpretation (in the sense, e.g., that the mean is the balancing point of a distribution)?

  • any other intuitive interpretation that differentiates it from other possible measures of spread?

What's so special about it that makes it act as a normalizing factor in all sorts of situations (for example, converting covariance to correlation)?








  • 6
    Have you heard the term "moment"? The variance is the second moment about the mean. See HERE
    – Mark Viola
    yesterday








  • 3
    Possible duplicate of Intuition behind Variance forumla
    – Michael Hoppe
    20 hours ago










  • @MarkViola And? Variance can be generalized, therefore it's meaningful?
    – Jack M
    3 hours ago






  • 1
    The absolute value deviation is a perfectly valid measure of deviation. However, absolute values are very hard to work with analytically; squares are much easier. That's one answer: calculability.
    – Winther
    2 hours ago












  • @Winther: That makes sense. What I don't understand is why it normalizes values (e.g. covariance), and what its geometric interpretation is in one dimension (I understand the vector space approach given by other answers).
    – blue_note
    1 hour ago
















statistics






edited 15 hours ago by amWhy

asked yesterday by blue_note


















5 Answers


















41












There's a very nice geometric interpretation.

Random variables of finite mean form a vector space. Covariance is a useful inner product on that space. Oh, wait, that's not quite right: constant variables are orthogonal to themselves in this product, so it's only positive semi-definite. So, let me be more precise - on the quotient space formed by the equivalence relation "differs by a constant", covariance is a true inner product. (If quotient spaces are an unfamiliar concept, just focus on the vector space of zero-mean variables; it gets you the same outcome in this context.)

Right, let's carry on. In the norm this inner product induces, standard deviation is a variable's length, while the correlation coefficient between two variables (their covariance divided by the product of their standard deviations) is the cosine of the "angle" between them. That the correlation coefficient is in $[-1,\,1]$ is then a restatement of the vector space's Cauchy-Schwarz inequality.

answered yesterday by J.G.
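For readers who want something concrete, here is a minimal numeric sketch of the picture above. It assumes a finite data set stands in for each random variable; the helper names (`center`, `cov`, `sd`, `corr`) and the sample values are made up for illustration and are not part of the answer.

```python
import math

def center(xs):
    """Subtract the mean: pick the zero-mean representative of the variable."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def cov(xs, ys):
    """Population covariance: dot product of the centered lists, divided by n."""
    cx, cy = center(xs), center(ys)
    return sum(a * b for a, b in zip(cx, cy)) / len(xs)

def sd(xs):
    """Standard deviation as the length induced by the covariance inner product."""
    return math.sqrt(cov(xs, xs))

def corr(xs, ys):
    """Correlation as the cosine of the angle between the centered lists."""
    return cov(xs, ys) / (sd(xs) * sd(ys))

# Hypothetical sample values, chosen only to keep the arithmetic easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

print(sd(x))                             # length of x, about sqrt(2)
print(corr(x, y))                        # cosine of the angle, about 0.8
print(corr(x, [3 * a + 7 for a in x]))   # same direction, so about 1
print(sd([a + 100 for a in x]))          # adding a constant leaves the length unchanged
```

The last line is the quotient-space remark in miniature: shifting a variable by a constant does not change its length.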









  • 8
    Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
    – blue_note
    yesterday






  • 2
    @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
    – J.G.
    yesterday








  • 2
    Can someone provide a concrete example or other similar dumbing down of this answer?
    – user1717828
    yesterday








  • 1
    A paragraph on Wikipedia about it @blue_note
    – WorldSEnder
    yesterday












  • For this inner product to be properly defined everywhere, we perhaps need to restrict to the space of finite-variance random variables, rather than just to those which have finite (or zero) mean?
    – James Martin
    19 hours ago
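J.G.'s regression remark in the comments above can be checked on a small data set. The sketch below is only an illustration under the assumption that finite samples stand in for the random variables; the names `center`, `inner`, `slope`, and `residual` and the sample values are hypothetical.

```python
import math

def center(xs):
    """Subtract the mean so the list represents a zero-mean variable."""
    m = sum(xs) / len(xs)
    return [v - m for v in xs]

def inner(us, vs):
    """Covariance-style inner product of two already-centered lists."""
    return sum(u * v for u, v in zip(us, vs)) / len(us)

# Hypothetical sample values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
xc, yc = center(x), center(y)

# Regressing y on x: y = slope * x + residual, with residual orthogonal to x.
slope = inner(xc, yc) / inner(xc, xc)
residual = [b - slope * a for a, b in zip(xc, yc)]

# Share of the variance of y carried by the multiple of x, and the correlation.
explained = slope ** 2 * inner(xc, xc) / inner(yc, yc)
r = inner(xc, yc) / math.sqrt(inner(xc, xc) * inner(yc, yc))

print(inner(xc, residual))   # about 0: the residual is orthogonal to x
print(explained, r ** 2)     # the two agree: proportion of variance explained
```

Because the two pieces are orthogonal, their variances add up to the variance of `y`, which is the Pythagorean theorem in this geometry.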



















6












I take it as unproblematic that the standard deviation is important in the normal distribution, since the standard deviation (or variance) is one of its parameters (though it could doubtless be reparameterized in various ways). By the Central Limit Theorem, the normal distribution is in turn relevant for understanding just about any distribution: if $X$ is a random variable with mean $\mu$ and finite standard deviation $\sigma$, and $\overline{X}$ is the mean of $n$ independent observations of $X$, then for large $n$

$$\frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}}$$

is approximately standard normal. No other measure of dispersion can so relate $X$ with the normal distribution. Said simply, the Central Limit Theorem in and of itself guarantees that the standard deviation plays a prominent role in statistics.
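A quick simulation can make the statement above tangible. This is a sketch under assumptions of my own choosing: the draws come from an exponential distribution with rate 1 (so the mean and the standard deviation are both 1), and the sample size, trial count, and seed are arbitrary.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

MU = 1.0       # mean of an Exponential(rate=1) variable
SIGMA = 1.0    # its standard deviation
N = 200        # observations per experiment (arbitrary choice)
TRIALS = 5000  # number of repeated experiments (arbitrary choice)

def standardized_mean(n):
    """Draw n exponential values and return (sample mean - mu) / (sigma / sqrt(n))."""
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    return (xbar - MU) / (SIGMA / math.sqrt(n))

zs = [standardized_mean(N) for _ in range(TRIALS)]

z_mean = statistics.fmean(zs)
z_sd = statistics.pstdev(zs)
within_one = sum(abs(z) <= 1 for z in zs) / TRIALS

print(f"mean of standardized means: {z_mean:.3f}")    # expected to be near 0
print(f"sd of standardized means:   {z_sd:.3f}")      # expected to be near 1
print(f"share within one unit:      {within_one:.3f}")  # a standard normal gives about 0.683
```

The exponential distribution is far from normal, yet dividing by $\sigma/\sqrt{n}$ is what brings the sample mean onto the standard normal scale.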













  • Related question to this: The role of variance in Central Limit Theorem
    – Winther
    1 hour ago



















3












An interesting feature of the standard deviation is its connection to the (root) mean square error. This measures how well a predictor does in predicting the values. The root mean square error of using the mean as a predictor is the standard deviation, and this is the least root mean square error that you can get with a constant predictor.

(This, of course, shifts the question to why the root mean squared error is interesting. I find it a bit more intuitive than the standard deviation, though: you can see it as the $L_2$ norm of the error vector, corrected for the number of points.)
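The claim about constant predictors is easy to check by brute force. The following sketch assumes a small made-up data set and a coarse grid of candidate constants; `rmse`, `data`, and `candidates` are illustrative names only.

```python
import math
import statistics

# Hypothetical data set with mean 5 and population standard deviation 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def rmse(values, guess):
    """Root mean square error of predicting every value with the constant `guess`."""
    return math.sqrt(sum((v - guess) ** 2 for v in values) / len(values))

mean = statistics.fmean(data)

# Try every constant from 0.00 to 10.00 in steps of 0.01 and keep the best one.
candidates = [i / 100 for i in range(0, 1001)]
best = min(candidates, key=lambda c: rmse(data, c))

print(best, mean)                                  # the best constant is the mean
print(rmse(data, mean), statistics.pstdev(data))   # its error is the standard deviation
```

Here the standard deviation is the population version (dividing by the number of points), which is the one that matches the root mean square error exactly.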









  • 1
    Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of (X, Y) pairs I can see what the variance is on, e.g., the horizontal axis.
    – blue_note
    yesterday



















1












When defining "standard deviation", we want some way to take a bunch of deviations from a mean and quantify how big they typically are using a single number in the same units as the deviations themselves. But any definition of "standard deviation" induces a corresponding definition of "mean" because we want our choice of "mean" to always minimize the value of our "standard deviation" (intuitively, we want to define "mean" to be the "middlemost" point as measured by "standard deviation"). Only by defining "standard deviation" in the usual way do we recover the arithmetic mean while still having a measure in the right units. (Without getting into details, the key point is that the quadratic becomes linear when we take the derivative to find its critical point.)

If we want to use some other mean, we can of course find a different "standard deviation" that will match that mean (the process is somewhat analogous to integration), but in practice it's just easier to transform the data so that the arithmetic mean is appropriate.

Qwerty is a new contributor to this site.
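The parenthetical about the quadratic becoming linear can be spelled out in one line. The notation here is my own, not the answer's: $x_1, \dots, x_n$ are the data points and $c$ is the candidate "middlemost" point.

$$\frac{d}{dc}\sum_{i=1}^{n}(x_i - c)^2 = -2\sum_{i=1}^{n}(x_i - c) = 0 \quad\Longrightarrow\quad c = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The derivative is linear in $c$, so it has exactly one root, and that root is the arithmetic mean. For comparison, the derivative of $\sum_{i}|x_i - c|$ is $-\sum_{i}\operatorname{sign}(x_i - c)$ wherever it exists, which changes sign where as many points lie above $c$ as below it; that choice of spread is matched by the median rather than the mean.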













  • If all you want is minimization at the mean and the right units, why not sum/integrate the magnitude of the deviations?
    – mephistolotl
    yesterday



















1












The normal distribution has maximum entropy among real distributions supported on $(-\infty, \infty)$ with specified standard deviation (equivalently, variance). (Reference.) Consequently, if the only thing you know about a real distribution supported on $\mathbb{R}$ is its mean and variance, the distribution that presumes the least prior information is the normal distribution.

I don't tend to think of the statement above as the important fact. It's more: normal distributions appear frequently and knowing the location parameter (mean) is reasonable. So what else do I have to know to make the least presumptive model be the normal distribution? The dispersion (variance).
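To see the maximum-entropy statement in numbers, one can compare closed-form differential entropies of a few distributions on the whole real line that share the same standard deviation. This is only a sketch: the choice of comparison distributions (Laplace and logistic) and the function names are mine, and the formulas used are the standard closed forms in nats.

```python
import math

def entropy_normal(sigma):
    """Differential entropy of a normal distribution with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def entropy_laplace(sigma):
    """Laplace with scale b has variance 2 b^2 and entropy 1 + ln(2 b)."""
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)

def entropy_logistic(sigma):
    """Logistic with scale s has variance s^2 pi^2 / 3 and entropy ln(s) + 2."""
    s = sigma * math.sqrt(3) / math.pi
    return math.log(s) + 2

for sigma in (0.5, 1.0, 3.0):
    print(
        f"sigma={sigma}: "
        f"normal={entropy_normal(sigma):.4f}, "
        f"logistic={entropy_logistic(sigma):.4f}, "
        f"laplace={entropy_laplace(sigma):.4f}"
    )
```

For every value of `sigma` the normal entry comes out largest, which is what the maximum-entropy property predicts for these two competitors; it is a spot check, not a proof.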





















    • 8




      $begingroup$
      Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
      $endgroup$
      – blue_note
      yesterday






    • 2




      $begingroup$
      @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
      $endgroup$
      – J.G.
      yesterday








    • 2




      $begingroup$
      Can someone provide a concrete example or other similar dumbing down of this answer?
      $endgroup$
      – user1717828
      yesterday








    • 1




      $begingroup$
      A paragraph on wikipedia about it @blue_note
      $endgroup$
      – WorldSEnder
      yesterday












    • $begingroup$
      For this inner product to be properly defined everywhere, we perhaps need to restrict to the space of finite-variance random variables, rather than just to those which have finite (or zero) mean?
      $endgroup$
      – James Martin
      19 hours ago














    • 8




      $begingroup$
      Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
      $endgroup$
      – blue_note
      yesterday






    • 2




      $begingroup$
      @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
      $endgroup$
      – J.G.
      yesterday








    • 2




      $begingroup$
      Can someone provide a concrete example or other similar dumbing down of this answer?
      $endgroup$
      – user1717828
      yesterday








    • 1




      $begingroup$
      A paragraph on wikipedia about it @blue_note
      $endgroup$
      – WorldSEnder
      yesterday












    • $begingroup$
      For this inner product to be properly defined everywhere, we perhaps need to restrict to the space of finite-variance random variables, rather than just to those which have finite (or zero) mean?
      $endgroup$
      – James Martin
      19 hours ago








    8




    8




    $begingroup$
    Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
    $endgroup$
    – blue_note
    yesterday




    $begingroup$
    Interesting approach. Is it a personal interpretation or a standard one? If it's standard, are there any resources you can provide? I haven't seen it in any book...
    $endgroup$
    – blue_note
    yesterday




    2




    2




    $begingroup$
    @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
    $endgroup$
    – J.G.
    yesterday






    $begingroup$
    @blue_note You're most likely to encounter it in a discussion of regression, since regressing $Y$ against $X$ writes $Y$ as a multiple of $X$, plus a variable orthogonal to $X$ in this sense. In fact, the coefficients involved in such an expression square to the proportion of variance explained. This has a well-understood connection to probability in quantum mechanics. But really, any source that explains why there's a $^2$ in $R^2$ will at least hint at these ideas.
    $endgroup$
    – J.G.
    yesterday






    2




    2




    $begingroup$
    Can someone provide a concrete example or other similar dumbing down of this answer?
    $endgroup$
    – user1717828
    yesterday






    $begingroup$
    Can someone provide a concrete example or other similar dumbing down of this answer?
    $endgroup$
    – user1717828
    yesterday






    1




    1




    $begingroup$
    A paragraph on wikipedia about it @blue_note
    $endgroup$
    – WorldSEnder
    yesterday






    $begingroup$
    A paragraph on wikipedia about it @blue_note
    $endgroup$
    – WorldSEnder
    yesterday














    $begingroup$
    For this inner product to be properly defined everywhere, we perhaps need to restrict to the space of finite-variance random variables, rather than just to those which have finite (or zero) mean?
    $endgroup$
    – James Martin
    19 hours ago




    $begingroup$
    For this inner product to be properly defined everywhere, we perhaps need to restrict to the space of finite-variance random variables, rather than just to those which have finite (or zero) mean?
    $endgroup$
    – James Martin
    19 hours ago











    6












    $begingroup$

    I take it as unproblematic that the standard deviation is important in the normal distribution since the standard deviation (or variance) is one of its parameters (though it could doubtless be reparameterized in various ways). By the Central Limit Theorem, the normal distribution is in turn relevant for understanding just about any distribution: If $X$ is a normal variable with mean $mu$ and standard deviation $sigma$, then for large $n$



    $$frac{overline{X} - mu}{frac{sigma}{sqrt{n}}}$$



    is approximately standard normal. No other measure of dispersion can so relate $X$ with the normal distribution. Said simply, the Central Limit Theorem in and of itself guarantees that the standard deviation plays a prominent role in statistics.






    $endgroup$













    answered 23 hours ago









    John Coleman












    • $begingroup$
      Related question to this: The role of variance in Central Limit Theorem
      $endgroup$
      – Winther
      1 hour ago


















    3












    $begingroup$

    An interesting feature of the standard deviation is its connection to the (root) mean square error. This measures how well a predictor does in predicting the values. The root mean square error of using the mean as a predictor is the standard deviation, and this is the least root mean square error that you can get with a constant predictor.



    (This, of course, shifts the question to why the root mean squared error is interesting. I find it a bit more intuitive than the standard deviation, though: you can see it as the $L_2$ norm of the error vector, divided by $\sqrt{n}$ to correct for the number of points.)
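Here is a minimal check of that statement on a small made-up data set. The values in `data` and the helper `rmse` are illustrative assumptions; the standard deviation used is the population form (dividing by $n$), which is what `statistics.pstdev` computes.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values only


def rmse(values, constant):
    """Root mean square error of predicting every value by one constant."""
    return math.sqrt(sum((v - constant) ** 2 for v in values) / len(values))


mean = statistics.fmean(data)

# Using the mean as the constant predictor gives exactly the standard deviation.
print(rmse(data, mean), statistics.pstdev(data))  # both print 2.0

# Any other constant does worse, including the median (4.5 here).
for guess in (3.0, 4.5, 6.0):
    print(guess, rmse(data, guess))
```

For this data the squared error at a constant $c$ works out to $4 + (c - 5)^2$, so the penalty for missing the mean grows with the square of the miss.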






    $endgroup$









    answered yesterday









    Anton Golov








    • 1




      $begingroup$
      Good point. However, it indeed shifts the question. Although I can see that in a vector space, in a standard 2D plot of (X, Y) pairs I can't see what the variance is on, e.g., the horizontal axis
      $endgroup$
      – blue_note
      yesterday














    1












    $begingroup$

    When defining "standard deviation", we want some way to take a bunch of deviations from a mean and quantify how big they typically are using a single number in the same units as the deviations themselves. But any definition of "standard deviation" induces a corresponding definition of "mean" because we want our choice of "mean" to always minimize the value of our "standard deviation" (intuitively, we want to define "mean" to be the "middlemost" point as measured by "standard deviation"). Only by defining "standard deviation" in the usual way do we recover the arithmetic mean while still having a measure in the right units. (Without getting into details, the key point is that the quadratic becomes linear when we take the derivative to find its critical point.)



    If we want to use some other mean, we can of course find a different "standard deviation" that will match that mean (the process is somewhat analogous to integration), but in practice it's just easier to transform the data so that the arithmetic mean is appropriate.
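The derivative argument can be spelled out in a few lines. The notation here ($x_1, \dots, x_n$ for the data, $c$ for a candidate center, $f$ and $g$ for the two spread functions) is introduced only for this sketch. Minimizing the mean squared deviation gives

$$f(c) = \frac{1}{n}\sum_{i=1}^{n}(x_i - c)^2, \qquad f'(c) = -\frac{2}{n}\sum_{i=1}^{n}(x_i - c) = 0 \iff c = \frac{1}{n}\sum_{i=1}^{n} x_i = \overline{x},$$

and since $f''(c) = 2 > 0$ this critical point is the unique minimum. The derivative is linear in $c$, which is why the minimizer is a plain average. By contrast, for the mean absolute deviation,

$$g(c) = \frac{1}{n}\sum_{i=1}^{n}|x_i - c|, \qquad g'(c) = -\frac{1}{n}\sum_{i=1}^{n}\operatorname{sign}(x_i - c) \quad \text{(for } c \text{ not equal to any } x_i\text{)},$$

which changes sign where equally many data points lie on each side of $c$, so the center induced by that measure of spread is a median rather than the arithmetic mean.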






    $endgroup$













    answered yesterday









    Qwerty












    • $begingroup$
      If all you want is minimization at the mean and the right units, why not sum/integrate the magnitude of the deviations?
      $endgroup$
      – mephistolotl
      yesterday


















    1












    $begingroup$

    The normal distribution has maximum entropy among real distributions supported on $(-\infty, \infty)$ with specified standard deviation (equivalently, variance). (Reference.) Consequently, if the only thing you know about a real distribution supported on $\mathbb{R}$ is its mean and variance, the distribution that presumes the least prior information is the normal distribution.



    I don't tend to think of the statement above as the important fact. It's more: normal distributions appear frequently and knowing the location parameter (mean) is reasonable. So what else do I have to know to make the least presumptive model be the normal distribution? The dispersion (variance).
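To make the maximum-entropy statement tangible, the sketch below compares the differential entropy (in nats) of three distributions supported on all of $\mathbb{R}$, each scaled to the same standard deviation. The Laplace and logistic distributions are merely convenient comparison cases chosen for this example, and the function names are hypothetical; the closed-form entropies are the standard textbook ones.

```python
import math


def normal_entropy(sigma):
    """Differential entropy of a normal distribution with std dev sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def laplace_entropy(sigma):
    """Laplace with scale b has variance 2 * b**2 and entropy 1 + ln(2 * b)."""
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)


def logistic_entropy(sigma):
    """Logistic with scale s has variance (s * pi)**2 / 3 and entropy ln(s) + 2."""
    s = sigma * math.sqrt(3) / math.pi
    return math.log(s) + 2


for sigma in (0.5, 1.0, 3.0):
    print(
        f"sigma={sigma}: normal={normal_entropy(sigma):.4f}, "
        f"logistic={logistic_entropy(sigma):.4f}, "
        f"laplace={laplace_entropy(sigma):.4f}"
    )
```

At every value of `sigma` the normal entropy comes out largest, and changing `sigma` shifts all three entropies by the same amount, $\ln \sigma$, so the ordering does not depend on the scale.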






    $endgroup$


















        answered 7 hours ago









    Eric Towers





























