Difference between revisions of "Statistics/Content size"
(Created page with "TODO") |
(5 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | + | == Results == | |
+ | |||
+ | Last run on <code>2017-10-19 12:38:24 +0000</code>, querying the replica DB on somerset.internal.softwareheritage.org. | ||
+ | |||
+ | {| class="wikitable" | ||
+ | ! Min. | ||
+ | ! 1st Qu. | ||
+ | ! Median | ||
+ | ! Mean | ||
+ | ! 3rd Qu. | ||
+ | ! Max. | ||
+ | |- | ||
+ | | 2.000e+00 | ||
+ | | 1.052e+03 | ||
+ | | 3.387e+03 | ||
+ | | 8.011e+04 | ||
+ | | 1.247e+04 | ||
+ | | 1.314e+09 | ||
+ | |} | ||
+ | |||
+ | Unit is length in bytes of ''uncompressed'' contents. | ||
+ | |||
+ | Sample size: 3,837,249 contents. | ||
+ | |||
+ | [[File:Content_sizes.png]] | ||
+ | |||
+ | == Reproducibility == | ||
+ | |||
+ | SQL: | ||
+ | <pre> | ||
+ | SELECT length | ||
+ | FROM content TABLESAMPLE BERNOULLI (0.1); -- 0.1% of known blobs | ||
+ | </pre> | ||
+ | |||
+ | Shell: | ||
+ | <pre> | ||
+ | psql service=swh-replica -At -f content-size-1pct.sql | gzip -9c > content-size-1pct.txt.gz | ||
+ | </pre> | ||
+ | |||
+ | R: | ||
+ | <pre> | ||
+ | sizes <- scan(gzfile('content-size-1pct.txt.gz')) | ||
+ | |||
+ | length(sizes) | ||
+ | summary(sizes) | ||
+ | |||
+ | png('content_sizes.png') | ||
+ | hist(log10(sizes)) | ||
+ | dev.off() | ||
+ | </pre> |
Latest revision as of 15:13, 19 October 2017
Results
Last run on 2017-10-19 12:38:24 +0000, querying the replica DB on somerset.internal.softwareheritage.org.
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 2.000e+00 | 1.052e+03 | 3.387e+03 | 8.011e+04 | 1.247e+04 | 1.314e+09 |
Unit is length in bytes of uncompressed contents.
Sample size: 3,837,249 contents.
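Since the query under Reproducibility draws a Bernoulli sample of 0.1% of the content table, the sample size gives a rough estimate of the total number of known contents at the time of the run. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the result is only an estimate because Bernoulli sampling yields a slightly different row count on each run.

```shell
# Sample size reported above.
sample_size=3837249

# TABLESAMPLE BERNOULLI (0.1) keeps each row with probability 0.1%,
# i.e. 1 in 1000, so the scale-up factor is 1000.
inverse_fraction=1000

# Rough estimate of the number of rows in the content table.
estimated_total=$((sample_size * inverse_fraction))
echo "$estimated_total"
```

This puts the content table at roughly 3.8 billion rows, assuming the sample is representative of the whole table.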
Reproducibility
SQL:
<pre>
SELECT length
FROM content TABLESAMPLE BERNOULLI (0.1); -- 0.1% of known blobs
</pre>
Shell:
<pre>
psql service=swh-replica -At -f content-size-1pct.sql | gzip -9c > content-size-1pct.txt.gz
</pre>
R:
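<pre>
# Read one content length per line from the compressed dump
sizes <- scan(gzfile('content-size-1pct.txt.gz'))

# Sample size and five-number summary plus mean
length(sizes)
summary(sizes)

# Histogram of log10(size), written to a PNG file
png('content_sizes.png')
hist(log10(sizes))
dev.off()
</pre>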
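As a quick cross-check of the R summary without leaving the shell, the minimum, maximum and mean can be recomputed with awk. This is only a sketch: <code>size_summary</code> is a hypothetical helper name, not part of the original procedure, and it assumes one integer length per line on standard input, as produced by the <code>psql -At</code> step.

```shell
# Hypothetical helper: print min, max and mean of one integer per line on stdin.
size_summary() {
  awk 'NR == 1 { min = $1; max = $1 }
       {
         if ($1 < min) min = $1
         if ($1 > max) max = $1
         sum += $1
       }
       END { if (NR > 0) printf "min=%d max=%d mean=%.1f\n", min, max, sum / NR }'
}

# Usage on the dump produced by the shell step (file name taken from that step):
#   gzip -dc content-size-1pct.txt.gz | size_summary
```

The values should agree with the Min., Max. and Mean columns of the results table, up to the rounding R applies when printing.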