New machine setup

From Software Heritage Wiki

Latest revision as of 08:27, 14 June 2018

Setting up a new Software Heritage desktop machine

Debian install

  • Stable
  • root w/temporary password; no regular user (after setting up root password, cancel twice and jump forward to clock settings)
  • full disk with LVM; reduce home LV to leave half of the disk free
  • Standard system utilities, ssh server, no desktop environment (puppet will install that)

Base system setup (from console)

  • Login as root
  • Enable password root access in ssh (/etc/ssh/sshd_config, PermitRootLogin yes)
  • Write down IP configuration and add the machine to the Gandi DNS
  • Test SSH login as root from your workstation
  • Stay at your desk :)
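The temporary root ssh access above boils down to one line in the sshd configuration; a sketch, assuming a stock Debian /etc/ssh/sshd_config:

```
# /etc/ssh/sshd_config -- temporary change for the initial setup;
# puppet will install the real configuration later
PermitRootLogin yes
```

Restart the ssh daemon afterwards (systemctl restart ssh.service) so the change takes effect.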

Full system setup (from your desk)

  • SSH login as root
  • Edit sources.list to add testing
  • apt-get update, dist-upgrade, autoremove --purge
    • While you wait, create Vpn certificates for the new machine
    • add the machine to the puppet configuration, in the swh_desktop role
  • apt-get install puppet openvpn
  • configure openvpn per Vpn
    • add pergamon IP address to /etc/resolv.conf
    • add louvre.softwareheritage.org to /etc/hosts
  • configure puppet
    • systemctl disable puppet
    • server=pergamon.internal.softwareheritage.org in /etc/puppet/puppet.conf
    • puppet agent --enable
    • puppet agent -t
    • run puppet on pergamon to update munin server config
  • set proper root password, add it to password store
  • reboot
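The "configure puppet" step above amounts to pointing the agent at the puppet master; a sketch of the resulting /etc/puppet/puppet.conf fragment:

```
# /etc/puppet/puppet.conf
[main]
server=pergamon.internal.softwareheritage.org
```

The agent service stays disabled (systemctl disable puppet) so that puppet runs are always triggered by hand.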

Setting up a new Virtual Machine (semi-manual process)

As a prerequisite, clone the sysadm-provisioning repository.

Naming scheme

<machine-name>.(<zone>.<hoster>.)internal.softwareheritage.org.

The parentheses denote the optional part of the scheme, e.g. worker01.euwest.azure.internal.softwareheritage.org.
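As an illustration (not part of the official tooling), a small shell check of a candidate name against this scheme; the valid_name helper and its regex are assumptions derived from the scheme above, with the zone/hoster part optional:

```shell
# Hypothetical checker for the naming scheme; the <zone>.<hoster>
# part is optional, as denoted by the parentheses above.
valid_name() {
    echo "$1" | grep -Eq \
        '^[a-z0-9-]+(\.[a-z0-9-]+\.[a-z0-9-]+)?\.internal\.softwareheritage\.org$'
}

valid_name worker01.euwest.azure.internal.softwareheritage.org && echo accepted
valid_name worker01.internal.softwareheritage.org && echo accepted
valid_name worker01.euwest.internal.softwareheritage.org || echo rejected
```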

Modus operandi

The modus operandi is as follows:

  • Install the infrastructure's prerequisite dependencies (these differ per infrastructure/cloud provider)
  • Provision a vm from the dedicated infrastructure cloud provider
  • Bootstrap the puppet package dependencies on that vm
  • Run the puppet agent on that vm
  • Run the puppet agent on the dns node

Example with azure

First, install Azure's requirements.

   cd /path/to/sysadm-provisioning
   # Historic implementation detail: really use the following user.
   # Using this user (uid 1000) simplifies the steps further down
   # the line.
   ADMIN_USER=zack
   # Create the vm 'worker01' with type 'worker'
   # (other possible types: db, replica, <whatever>)
   ./azure/create-vm.sh worker01 worker
   # Retrieve the IP of the new vm, then copy the provision-vm.sh
   # script there and run it (as mentioned earlier, this bootstraps
   # puppet and runs it)
   scp ./azure/provision-vm.sh $ADMIN_USER@<ip>:/tmp
   ssh $ADMIN_USER@<ip> chmod +x /tmp/provision-vm.sh
   ssh $ADMIN_USER@<ip> /tmp/provision-vm.sh public

   # Note: you could also connect to the node, install tmux, start a
   # tmux session, then trigger the script from within it.

After this, run the puppet agent on the dns server:

   ssh <your-user>@pergamon.internal.softwareheritage.org sudo puppet agent --test

As always, the truth lies within the source code; the details are explained in the comments of:

  • create-vm.sh
  • provision-vm.sh

Troubleshooting

Recreating a machine with the exact same configuration

It can happen that a machine is scratched and recreated identically. The old certificate (keyed on the machine's fqdn) then needs to be cleaned up on the puppet master:

   puppet cert clean <fqdn>

Duplicate resource found error

After a wrong manipulation (a wrong hostname setup, for example), you can end up with stale data in the puppet master (in puppetdb).

The puppet agent then complains about duplicate resources, for example:

   A duplicate resource was found while collecting exported resources

This means stale data exists in the master (puppetdb). On the puppet master host (pergamon), run the following command to clean it up:

   puppet node deactivate <wrong-fqdn>
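Putting the two troubleshooting commands above together, a sketch of the full cleanup to run on the puppet master (pergamon); the fqdn is an example, and DRY_RUN=echo only prints the commands (drop it to actually run them):

```shell
# Full cleanup for a recreated machine, to run on the puppet master.
# DRY_RUN=echo prints the commands instead of executing them.
FQDN=worker01.euwest.azure.internal.softwareheritage.org
DRY_RUN=echo

$DRY_RUN puppet node deactivate "$FQDN"   # drop stale exported resources
$DRY_RUN puppet cert clean "$FQDN"        # drop the old certificate
```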