Среда, 08.05.2024, 13:44
Информационные технологии и телекоммуникации.
Приветствую Вас Гость | RSS
Главная Statistics in SQL Регистрация Вход
Меню сайта

Статистика

Онлайн всего: 1
Гостей: 1
Пользователей: 0

Форма входа

Statistics in SQL

 

Statistics in SQL

 

SQL isn’t a statistical programming language, but you can use built-in functions and a few

tricks to calculate simple descriptive statistics such as the sum, mean, and standard deviation.

For more-sophisticated analyses you should use your DBMS’s OLAP (online analytical

processing) component or export your data to a dedicated statistical environment such as

Excel, R, SAS, or SPSS.

What you should not do is write statistical routines yourself in SQL or a host language.

Implementing statistical algorithms correctly—even simple ones—means understanding

trade-offs in efficiency (the space needed for arithmetic operations), stability (cancellation

of significant digits), and accuracy (handling pathologic sets of values). See, for example,

Ronald Thisted’s Elements of Statistical Computing (Chapman & Hall/CRC) or John

Monahan’s Numerical Methods of Statistics (Cambridge University Press).

You can get away with using small combinations of built-in SQL functions, such as

STDEV()/(SQRT(COUNT()) for the standard error of the mean, but don’t use complex SQL

expressions for correlations, regression, ANOVA (analysis of variance), or matrix arithmetic,

for example. Check your DBMS’s SQL and OLAP documentation to see which functions it

offers. Built-in functions aren’t portable, but they run far faster and more accurately than

equivalent query expressions.

The functions MIN() and MAX() calculate order statistics, which are values derived from a

dataset that’s been sorted (ordered) by size. Well-known order statistics include the trimmed

mean, rank, range, mode, and median. Chapter 15 covers the trimmed mean, rank, and median.

The range is the difference between the largest and smallest values: MAX(expr)-MIN(expr). The

mode is the value that appears most frequently. A dataset can have more than one mode. The

mode is a weak descriptive statistic because it’s not robust, meaning that it can be affected by

adding a small number or unusual or incorrect values to the dataset. This query finds the

mode of book prices in the sample database:

SELECT price, COUNT(*) AS frequency

FROM titles

GROUP BY price

HAVING COUNT(*) >= ALL(SELECT COUNT(*) FROM titles GROUP BY price);

price has two modes:

price frequency

————— —————————

12.99 2

19.95 2

Поиск

Календарь
«  Май 2024  »
ПнВтСрЧтПтСбВс
  12345
6789101112
13141516171819
20212223242526
2728293031

Друзья сайта
  • Официальный блог
  • Сообщество uCoz
  • FAQ по системе
  • Инструкции для uCoz

  • Copyright Vitalie Bancu - vitallybankou © 2024 Бесплатный конструктор сайтов - uCoz
     
     
    Besucherzahler senior people meet.com
    счетчик посещений