https://wiki.softwareheritage.org/index.php?title=Ceph_Cluster&feed=atom&action=historyCeph Cluster - Revision history2024-03-29T15:28:05ZRevision history for this page on the wikiMediaWiki 1.31.16https://wiki.softwareheritage.org/index.php?title=Ceph_Cluster&diff=821&oldid=prevArdumont: /* OSDs */2018-05-24T06:59:16Z<p><span dir="auto"><span class="autocomment">OSDs</span></span></p>
Ardumonthttps://wiki.softwareheritage.org/index.php?title=Ceph_Cluster&diff=161&oldid=prevStefanoZacchiroli: 1 revision: import public pages from the intranet wiki2016-07-20T13:02:34Z<p>1 revision: import public pages from the intranet wiki</p>
StefanoZacchirolihttps://wiki.softwareheritage.org/index.php?title=Ceph_Cluster&diff=160&oldid=prevStefanoZacchiroli: /* OSD sizing */2016-05-04T10:25:08Z<p><span dir="auto"><span class="autocomment">OSD sizing</span></span></p>
<p><b>New page</b></p><div><br />
# TODO: expand this section to be less handwavey.<br />
<br />
[http://ceph.com/ Ceph] is a massively scale-out distributed object storage system. It provides transparent replication of objects across disks and storage nodes, load balancing for reads, and a set of features that would be very useful in the context of Software Heritage (for instance, the possibility to federate several clusters distributed on several data centers).<br />
<br />
Ceph provides a low-level object storage layer through the RADOS library, as well as higher-level APIs: a Swift/S3 compatibility layer, a block device storage layer, and a filesystem layer. Considering the Software Heritage abstraction level, the RADOS object storage layer should be sufficient for our needs.<br />
<br />
We're investigating using Ceph as an object store for the Software Heritage data.<br />
<br />
= Dimensioning a Ceph cluster =<br />
<br />
Ceph clusters use two components: monitors and OSDs.<br />
== Machine types and jargon ==<br />
<br />
=== node ===<br />
<br />
A node is a physical machine that can run one or more of the Ceph storage daemons.<br />
<br />
=== rack ===<br />
<br />
A rack is a set of nodes interconnected with an access network and a replication network.<br />
<br />
=== datacenter ===<br />
A datacenter is a set of racks interconnected with their access and replication networks.<br />
<br />
=== OSDs ===<br />
OSDs (Object Storage Daemons) are the daemons that store the actual data; by extension, the term also designates the machines that run them. You would typically set up a machine with several disks, create one partition on each disk and run one OSD on each of those partitions. You do not need to set up OS-level redundancy, as the Ceph redundancy settings handle everything, from disk failure up to datacenter failure, in a transparent way, rebalancing and re-replicating data as needed to keep the replication constraints satisfied.<br />
<br />
=== Monitors ===<br />
<br />
Monitors keep the state of the cluster and orchestrate the OSDs. There can (and should!) be several monitors per cluster, to tolerate failures at all levels of node distribution. However, monitors are not on the data path: reads and writes are dispatched directly from the clients to the OSDs.<br />
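<br />
As a rough illustration of why several monitors are needed, the sketch below assumes the usual majority-quorum rule for Ceph monitors (the rule itself is not stated on this page, and the function name is made up for the example): the cluster state stays available as long as more than half of the monitors are up.<br />
<br />
```python
def monitor_failures_tolerated(monitor_count: int) -> int:
    """Number of monitors that can fail while a strict majority stays up.

    Assumes a majority quorum: more than half of the monitors must be alive.
    """
    if monitor_count < 1:
        raise ValueError("a cluster needs at least one monitor")
    quorum = monitor_count // 2 + 1  # smallest strict majority
    return monitor_count - quorum


for count in (1, 3, 4, 5):
    print(f"{count} monitor(s): tolerates {monitor_failures_tolerated(count)} failure(s)")
```
<br />
Under that assumption, an even number of monitors tolerates no more failures than the odd number just below it, which is why odd counts are the common choice.<br />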
<br />
== Assumptions ==<br />
<br />
For this cluster dimensioning, we aim for 800TB of usable storage, with every object replicated 3 times, so that the cluster can withstand the simultaneous catastrophic loss of two nodes. The best practice is to keep the cluster at around 70% utilization, and to scale it out by adding nodes regularly as usage increases.<br />
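<br />
The raw disk capacity implied by these assumptions can be sketched as follows. This is a back-of-the-envelope calculation, not a Ceph tool: the function name and the way the 70% utilization target is applied are assumptions made for the example.<br />
<br />
```python
def raw_capacity_tb(usable_tb: float, replicas: int, target_utilization: float = 1.0) -> float:
    """Raw disk capacity needed to store `usable_tb` of data `replicas` times,
    while keeping the disks at or below `target_utilization` (between 0 and 1)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return usable_tb * replicas / target_utilization


# 800 TB usable, 3 copies, disks filled completely
print(f"raw capacity at 100% fill: {raw_capacity_tb(800, 3):.0f} TB")
# Same data, but keeping the cluster at around 70% utilization
print(f"raw capacity at 70% fill:  {raw_capacity_tb(800, 3, 0.70):.0f} TB")
```
<br />
With the figures above this gives 2400 TB of raw capacity if the disks are filled completely, and roughly 3400 TB if the 70% utilization target is applied.<br />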
<br />
Note that this approach also works while bootstrapping the cluster: you only really need three nodes (one monitor and two OSDs) to bootstrap the cluster, and then you can add machines as you go.<br />
<br />
== OSD sizing ==<br />
<br />
OSDs need about 100MB of RAM per TB of storage in typical cases, but extreme situations can require up to 2GB per TB. We should therefore plan for 2GB of RAM per TB of storage. CPU is not a constraint on those machines (well, they need one).<br />
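<br />
The RAM rule of thumb above can be turned into a small helper. This is only a sketch of the arithmetic; the function and constant names are invented for the example.<br />
<br />
```python
WORST_CASE_GB_PER_TB = 2.0  # planning figure: 2GB of RAM per TB of storage


def osd_ram_gb(storage_tb: float, gb_per_tb: float = WORST_CASE_GB_PER_TB) -> float:
    """RAM (in GB) to provision on a node hosting `storage_tb` of OSD storage."""
    return storage_tb * gb_per_tb


# Storage ceilings of 96 TB and 64 TB per node
for storage_tb in (96, 64):
    print(f"{storage_tb} TB of disks -> {osd_ram_gb(storage_tb):.0f} GB of RAM")
```
<br />
Read the other way around, a node with 192 GB of RAM should host at most 96 TB of disks, and a node with 128 GB at most 64 TB.<br />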
<br />
Here are some reasonable options considering Dell offerings, and the disks we already have available:<br />
<br />
{|<br />
|-<br />
! Base server<br />
! Chassis settings<br />
! RAM<br />
! Number of disks bought<br />
! Max storage capacity (with recycling)<br />
! Price<br />
! Number needed for 800TB capacity<br />
|-<br />
| PowerEdge R730xd<br />
| 12 x 3.5" disks<br />
| 192 GB (max 96TB disks)<br />
| 6 x 8TB 3.5" (= 48 TB)<br />
| 6 x 8TB + 6 x 6TB (= 84 TB)<br />
| 6994 EUR excl. tax<br />
| 25<br />
|-<br />
| PowerEdge R730xd<br />
| 24 x 2.5" disks<br />
| 128 GB (max 64TB disks)<br />
| 12 x 2TB 2.5" (= 24 TB)<br />
| add 12 x 2TB (= 48 TB)<br />
| 5500 EUR excl. tax<br />
| 50<br />
|}<br />
<br />
Total: '''~450 KEUR'''<br />
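<br />
As a sanity check on the total, the sketch below multiplies the unit prices by the server counts from the table and adds both rows. That the total covers both rows is an inference from the arithmetic, not something the page states explicitly.<br />
<br />
```python
# (chassis, unit price in EUR before tax, number of servers), copied from the table
options = [
    ('PowerEdge R730xd, 12 x 3.5" disks', 6994, 25),
    ('PowerEdge R730xd, 24 x 2.5" disks', 5500, 50),
]

total_eur = 0
for chassis, unit_price_eur, count in options:
    subtotal = unit_price_eur * count
    total_eur += subtotal
    print(f"{chassis}: {count} x {unit_price_eur} EUR = {subtotal} EUR")

print(f"total: {total_eur} EUR (~{round(total_eur / 1000)} KEUR)")
```
<br />
The two subtotals come to 174850 EUR and 275000 EUR, which add up to the ~450 KEUR quoted above.<br />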
<br />
Assuming linear scaling in both redundancy and size ('''TO BE VERIFIED''', including disk reuse considerations):<br />
<br />
* 800 TB with 2 copies: ~300 KEUR<br />
* 300 TB with 3 copies: ~168 KEUR<br />
* 300 TB with 2 copies: ~110 KEUR<br />
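<br />
The estimates above follow from scaling the ~450 KEUR baseline (800 TB, 3 copies) proportionally to both capacity and number of copies. A minimal sketch of that calculation, under the same unverified linear-scaling assumption and with made-up names:<br />
<br />
```python
BASELINE_KEUR = 450   # estimated total for the baseline cluster
BASELINE_TB = 800     # usable capacity of the baseline cluster
BASELINE_COPIES = 3   # replication factor of the baseline cluster


def scaled_cost_keur(usable_tb: float, copies: int) -> float:
    """Cost estimate assuming cost is proportional to usable_tb * copies."""
    return BASELINE_KEUR * usable_tb * copies / (BASELINE_TB * BASELINE_COPIES)


for usable_tb, copies in [(800, 2), (300, 3), (300, 2)]:
    cost = scaled_cost_keur(usable_tb, copies)
    print(f"{usable_tb} TB with {copies} copies: {cost} KEUR")
```
<br />
This yields 300, 168.75 and 112.5 KEUR, which the list above rounds to ~300, ~168 and ~110 KEUR.<br />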
<br />
== Monitor sizing ==<br />
<br />
TBD<br />
<br />
<br />
[[Category:Infrastructure]]<br />
[[Category:System administration]]</div>StefanoZacchiroli