add manpage for mscp

doc/mscp.rst is generated from mscp.1 by make update-mscp-rst.
README is also updated to reference doc/mscp.rst.
Ryo Nakamura
2024-01-13 19:06:56 +09:00
parent 6f4038a480
commit 1479607efe
5 changed files with 569 additions and 107 deletions


@@ -142,6 +142,21 @@ target_compile_options(mscp PRIVATE ${MSCP_COMPILE_OPTS})
install(TARGETS mscp RUNTIME DESTINATION bin)
# mscp manpage and document
configure_file(
${mscp_SOURCE_DIR}/doc/mscp.1.in
${PROJECT_BINARY_DIR}/mscp.1)
add_custom_target(update-mscp-rst
COMMENT "Update doc/mscp.rst from mscp.1.in"
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
COMMAND
pandoc -s -f man mscp.1 -t rst -o ${PROJECT_SOURCE_DIR}/doc/mscp.rst)
install(FILES ${PROJECT_BINARY_DIR}/mscp.1
DESTINATION ${CMAKE_INSTALL_MANDIR}/man1)
# Test
add_test(NAME pytest
COMMAND python3 -m pytest -v

124
README.md

@@ -3,17 +3,20 @@
[![build on ubuntu](https://github.com/upa/mscp/actions/workflows/build-ubuntu.yml/badge.svg)](https://github.com/upa/mscp/actions/workflows/build-ubuntu.yml) [![build on macOS](https://github.com/upa/mscp/actions/workflows/build-macos.yml/badge.svg)](https://github.com/upa/mscp/actions/workflows/build-macos.yml) [![test](https://github.com/upa/mscp/actions/workflows/test.yml/badge.svg)](https://github.com/upa/mscp/actions/workflows/test.yml)
`mscp`, a variant of `scp`, copies files over multiple ssh (SFTP)
connections. Multiple threads and connections in mscp transfer (1)
multiple files simultaneously and (2) a large file in parallel. It
would shorten the waiting time for transferring a lot of/large files
over networks.
`mscp`, a variant of `scp`, copies files over multiple SSH (SFTP)
connections by multiple threads. It enables transferring (1) multiple
files simultaneously and (2) a large file in parallel, reducing the
transfer time for many files or large files over networks.
You can use `mscp` like `scp`, for example, `mscp
user@example.com:srcfile /tmp/dstfile`. Remote hosts only need to run
standard `sshd` supporting the SFTP subsystem (e.g. openssh-server),
and you need to be able to ssh to the hosts as usual. `mscp` does not
require anything else.
You can use `mscp` like `scp`, for example:
```shell-session
$ mscp user@example.com:srcfile /tmp/dstfile
```
Remote hosts only need to run standard `sshd` supporting the SFTP
subsystem (e.g. openssh-server), and you need to be able to ssh to the
hosts as usual. `mscp` does not require anything else.
https://github.com/upa/mscp/assets/184632/19230f57-be7f-4ef0-98dd-cb4c460f570d
@@ -62,7 +65,7 @@ chmod 755 /usr/local/bin/mscp
## Build
mscp depends on a patched [libssh](https://www.libssh.org/). The
mscp depends on a patched [libssh](https://www.libssh.org/). The
patch introduces asynchronous SFTP Write, which is derived from
https://github.com/limes-datentechnik-gmbh/libssh (see [Re: SFTP Write
async](https://archive.libssh.org/libssh/2020-06/0000004.html)).
@@ -94,105 +97,12 @@ make
# install the mscp binary to CMAKE_INSTALL_PREFIX/bin (usually /usr/local/bin)
make install
```
Source tarballs (`mscp-X.X.X.tar.gz`, not `Source code`) on the
[Releases page](https://github.com/upa/mscp/releases) contain the patched version
of libssh, so you can start from cmake with them.
## Run
- Usage
## Documentation
```console
$ mscp
mscp v0.0.8: copy files over multiple ssh connections
Usage: mscp [vqDHdNh] [-n nr_conns] [-m coremask] [-u max_startups]
[-s min_chunk_sz] [-S max_chunk_sz] [-a nr_ahead] [-b buf_sz]
[-l login_name] [-p port] [-i identity_file]
[-c cipher_spec] [-M hmac_spec] [-C compress] source ... target
```
- Example: copy a 15GB file on memory over a 100Gbps link
- Two Intel Xeon Gold 6130 machines directly connected with Intel E810 100Gbps NICs.
- Default `openssh-server` runs on the remote host.
```console
$ mscp /var/ram/test.img 10.0.0.1:/var/ram/
[======================================] 100% 15GB/15GB 1.7GB/s 00:00 ETA
```
```console
# with some optimizations. top speed reaches 3.0GB/s.
$ mscp -n 5 -m 0x1f -c aes128-gcm@openssh.com /var/ram/test.img 10.0.0.1:/var/ram/
[======================================] 100% 15GB/15GB 2.4GB/s 00:00 ETA
```
- `-v` option increments verbose output level.
```console
$ mscp test 10.0.0.1:
[=======================================] 100% 49B /49B 198.8B/s 00:00 ETA
```
```console
$ mscp -vv test 10.0.0.1:
file: test/test1 -> ./test/test1
file: test/testdir/asdf -> ./test/testdir/asdf
file: test/testdir/qwer -> ./test/testdir/qwer
file: test/test2 -> ./test/test2
we have only 4 chunk(s). set number of connections to 4
connecting to localhost for a copy thread...
connecting to localhost for a copy thread...
connecting to localhost for a copy thread...
copy start: test/test1
copy start: test/test2
copy start: test/testdir/asdf
copy start: test/testdir/qwer
copy done: test/test1
copy done: test/test2
copy done: test/testdir/qwer
copy done: test/testdir/asdf
[=======================================] 100% 49B /49B 198.1B/s 00:00 ETA
```
- Full usage
```console
$ mscp -h
mscp v0.0.9-11-g5802679: copy files over multiple ssh connections
Usage: mscp [vqDHdNh] [-n nr_conns] [-m coremask] [-u max_startups]
[-s min_chunk_sz] [-S max_chunk_sz] [-a nr_ahead] [-b buf_sz]
[-l login_name] [-p port] [-F ssh_config] [-i identity_file]
[-c cipher_spec] [-M hmac_spec] [-C compress] source ... target
-n NR_CONNECTIONS number of connections (default: floor(log(cores)*2)+1)
-m COREMASK hex value to specify cores where threads pinned
-u MAX_STARTUPS number of concurrent outgoing connections (default: 8)
-s MIN_CHUNK_SIZE min chunk size (default: 64MB)
-S MAX_CHUNK_SIZE max chunk size (default: filesize/nr_conn)
-a NR_AHEAD number of inflight SFTP commands (default: 32)
-b BUF_SZ buffer size for i/o and transfer
-v increment verbose output level
-q disable output
-D dry run. check copy destinations with -vvv
-r no effect
-l LOGIN_NAME login name
-p PORT port number
-F CONFIG path to user ssh config (default ~/.ssh/config)
-i IDENTITY identity file for public key authentication
-c CIPHER cipher spec
-M HMAC hmac spec
-C COMPRESS enable compression: yes, no, zlib, zlib@openssh.com
-H disable hostkey check
-d increment ssh debug output level
-N enable Nagle's algorithm (default disabled)
-h print this help
```
Note: mscp is still under development, and the author is not
responsible for any accidents due to mscp.
A [manpage](/doc/mscp.rst) is available.

11
doc/README.md Normal file

@@ -0,0 +1,11 @@
# Document
The base file of documents is `mscp.1.in`. The manpage of mscp and
`doc/mscp.rst` are generated from `mscp.1.in`.
When `mscp.1.in` is changed, update `doc/mscp.rst` by:
1. `cd build`
2. `cmake ..`
3. `make update-mscp-rst`
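
The steps above can be wrapped in a small shell function. This is only a sketch: the function name `update_mscp_rst` is hypothetical, and it assumes that it is run from the repository root, that `build` is the CMake build directory, and that `pandoc` is installed (the `update-mscp-rst` target invokes it).

```shell
# Regenerate doc/mscp.rst from doc/mscp.1.in; run from the repository root.
# Assumptions: "build" is the CMake build directory and pandoc is on PATH.
update_mscp_rst() {
    # The update-mscp-rst target calls pandoc, so fail early without it.
    command -v pandoc >/dev/null 2>&1 || {
        echo "pandoc not found" >&2
        return 1
    }
    mkdir -p build &&
        (cd build && cmake .. && make update-mscp-rst)
}
```

Calling `update_mscp_rst` after editing `mscp.1.in` should leave the refreshed `doc/mscp.rst` in the source tree, ready to be committed.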

316
doc/mscp.1.in Normal file

@@ -0,0 +1,316 @@
.TH MSCP 1 "@MSCP_BUILD_VERSION@" "mscp" "User Commands"
.SH NAME
mscp \- copy files over multiple SSH connections
.SH SYNOPSIS
.B mscp
.RB [ \-vqDHdNh ]
[\c
.BI \-n \ NR_CONNECTIONS\c
]
[\c
.BI \-m \ COREMASK\c
]
[\c
.BI \-u \ MAX_STARTUPS\c
]
[\c
.BI \-I \ INTERVAL\c
]
[\c
.BI \-s \ MIN_CHUNK_SIZE\c
]
[\c
.BI \-S \ MAX_CHUNK_SIZE\c
]
[\c
.BI \-a \ NR_AHEAD\c
]
[\c
.BI \-b \ BUF_SIZE\c
]
[\c
.BI \-l \ LOGIN_NAME\c
]
[\c
.BR \-p |\c
.BI \-P \ PORT\c
]
[\c
.BI \-F \ CONFIG\c
]
[\c
.BI \-i \ IDENTITY\c
]
[\c
.BI \-c \ CIPHER\c
]
[\c
.BI \-M \ HMAC\c
]
[\c
.BI \-C \ COMPRESS\c
]
.I source ... target
.SH DESCRIPTION
.PP
.B mscp
copies files over multiple SSH (SFTP) connections by multiple
threads. It enables transferring (1) multiple files simultaneously and
(2) a large file in parallel, reducing the transfer time for many
files or large files over networks.
.PP
The usage of
.B mscp
imitates the
.B scp
command of
.I OpenSSH,
for example:
.nf
$ mscp srcfile user@example.com:dstfile
.fi
Remote hosts only need to run standard
.B sshd
supporting the SFTP subsystem, and users need to be able to
.B ssh
to the hosts as usual.
.B mscp
does not require anything else.
.PP
.B mscp
uses
.UR https://\:www\:.libssh\:.org
libssh
.UE
as its SSH implementation. Thus, supported SSH features, for example,
authentication, encryption, and various options in ssh_config, follow
what
.I libssh
supports.
.SH OPTIONS
.TP
.B \-n \fINR_CONNECTIONS\fR
Specifies the number of SSH connections. The default value is
calculated from the number of CPU cores on the host with the following
formula: floor(log(nr_cores)*2)+1.
.TP
.B \-m \fICOREMASK\fR
Configures CPU cores to be used by the hexadecimal bitmask. All CPU
cores are used by default.
.TP
.B \-u \fIMAX_STARTUPS\fR
Specifies the number of concurrent outgoing SSH connections.
.B sshd
limits the number of simultaneous SSH connection attempts by
.I MaxStartups
in
.I sshd_config.
The default
.I MaxStartups
is 10; thus, the default MAX_STARTUPS is 8.
.TP
.B \-I \fIINTERVAL\fR
Specifies the interval (in seconds) between SSH connection
attempts. Some firewall products treat SSH connection attempts from a
single source IP address for a short period as a brute force attack.
This option inserts intervals between the attempts to avoid being
identified as an attack. The default value is 0.
.TP
.B \-s \fIMIN_CHUNK_SIZE\fR
Specifies the minimum chunk size.
.B mscp
divides a file into chunks and copies the chunks in parallel.
.TP
.B \-S \fIMAX_CHUNK_SIZE\fR
Specifies the maximum chunk size. The default is file size divided by
the number of connections.
.TP
.B \-a \fINR_AHEAD\fR
Specifies the number of inflight SFTP commands. The default value is
32.
.TP
.B \-b \fIBUF_SIZE\fR
Specifies the buffer size for I/O and transfer over SFTP. The default
value is 16384. Note that the SSH specification restricts buffer size
delivered over SSH. Changing this value is not recommended at present.
.TP
.B \-v
Increments the verbose output level.
.TP
.B \-q
Quiet mode: turns off all outputs.
.TP
.B \-D
Dry-run mode: it scans source files to be copied, calculates chunks,
and resolves destination file paths. Dry-run mode with
.B -vv
option enables confirming files to be copied and their destination
paths.
.TP
.B \-r
No effect.
.B mscp
copies recursively if a source path is a directory. This option exists
just for compatibility.
.TP
.B \-l \fILOGIN_NAME\fR
Specifies the username to log in on the remote machine as with
.I ssh(1).
.TP
.B \-p,\-P \fIPORT\fR
Specifies the port number to connect to on the remote machine as with
ssh(1) and scp(1).
.TP
.B \-F \fICONFIG\fR
Specifies an alternative per-user ssh configuration file. Note that
acceptable options in the configuration file are what
.I libssh
supports.
.TP
.B \-i \fIIDENTITY\fR
Specifies the identity file for public key authentication.
.TP
.B \-c \fICIPHER\fR
Selects the cipher to use for encrypting the data transfer. See
.UR https://\:www\:.libssh\:.org/\:features/
libssh features
.UE .
.TP
.B \-M \fIHMAC\fR
Specifies MAC hash algorithms. See
.UR https://\:www\:.libssh\:.org/\:features/
libssh features
.UE .
.TP
.B \-C \fICOMPRESS\fR
Enables compression: yes, no, zlib, zlib@openssh.com. The default is
none. See
.UR https://\:www\:.libssh\:.org/\:features/
libssh features
.UE .
.TP
.B \-H
Disables hostkey checking.
.TP
.B \-d
Increments the ssh debug output level.
.TP
.B \-N
Enables Nagle's algorithm. It is disabled by default.
.TP
.B \-h
Prints help.
.SH EXIT STATUS
Exit status is 0 on success, and >0 if an error occurs.
.SH NOTES
.PP
.B mscp
uses glob(3) for globbing pathnames, including matching patterns for
local and remote paths. However, globbing on the
.I remote
side does not work with musl libc (used in Alpine Linux and the
single-binary version of mscp) because musl libc does not support
GLOB_ALTDIRFUNC.
.PP
.B mscp
does not support remote-to-remote copy, which
.B scp
supports.
.SH EXAMPLES
.PP
Copy a local file to a remote host with a different name:
.nf
$ mscp ~/src-file 10.0.0.1:copied-file
.fi
.PP
Copy a local file and a directory to /tmp at a remote host:
.nf
$ mscp ~/src-file dir1 10.0.0.1:/tmp
.fi
.PP
In a long fat network, the following options might improve performance:
.nf
$ mscp -n 64 -m 0xffff -a 64 -c aes128-gcm@openssh.com src 10.0.0.1:
.fi
.B -n
increases the number of SSH connections beyond the default,
.B -m
pins threads to specific CPU cores,
.B -a
increases asynchronous inflight SFTP WRITE/READ commands, and
.B -c aes128-gcm@openssh.com
will be faster than the default chacha20-poly1305 cipher, particularly
on hosts that support AES-NI.
.SH "SEE ALSO"
.BR scp (1),
.BR ssh (1),
.BR sshd (8).
.SH "PAPER REFERENCE"
Ryo Nakamura and Yohei Kuga. 2023. Multi-threaded scp: Easy and Fast
File Transfer over SSH. In Practice and Experience in Advanced
Research Computing (PEARC '23). Association for Computing Machinery,
New York, NY, USA, 320\(en323.
.UR https://\:doi\:.org/\:10.1145/\:3569951.3597582
DOI
.UE .
.SH "CONTACT INFORMATION"
.PP
For patches, bug reports, or feature requests, please open an issue on
.UR https://\:github\:.com/\:upa/\:mscp
GitHub
.UE .
.SH AUTHORS
Ryo Nakamura <upa@haeena.net>

210
doc/mscp.rst Normal file

@@ -0,0 +1,210 @@
====
MSCP
====
:Date: v0.1.2-14-g24617d2
NAME
====
mscp - copy files over multiple SSH connections
SYNOPSIS
========
**mscp** [**-vqDHdNh**] [ **-n**\ *NR_CONNECTIONS* ] [
**-m**\ *COREMASK* ] [ **-u**\ *MAX_STARTUPS* ] [ **-I**\ *INTERVAL* ] [
**-s**\ *MIN_CHUNK_SIZE* ] [ **-S**\ *MAX_CHUNK_SIZE* ] [
**-a**\ *NR_AHEAD* ] [ **-b**\ *BUF_SIZE* ] [ **-l**\ *LOGIN_NAME* ] [
**-p**\ \| **-P**\ *PORT* ] [ **-F**\ *CONFIG* ] [ **-i**\ *IDENTITY* ]
[ **-c**\ *CIPHER* ] [ **-M**\ *HMAC* ] [ **-C**\ *COMPRESS* ] *source
... target*
DESCRIPTION
===========
**mscp** copies files over multiple SSH (SFTP) connections by multiple
threads. It enables transferring (1) multiple files simultaneously and
(2) a large file in parallel, reducing the transfer time for many
files or large files over networks.
The usage of **mscp** imitates the **scp** command of *OpenSSH,* for
example:
::
$ mscp srcfile user@example.com:dstfile
Remote hosts only need to run standard **sshd** supporting the SFTP
subsystem, and users need to be able to **ssh** to the hosts as usual.
**mscp** does not require anything else.
**mscp** uses `libssh <https://www.libssh.org>`__ as its SSH
implementation. Thus, supported SSH features, for example,
authentication, encryption, and various options in ssh_config, follow
what *libssh* supports.
OPTIONS
=======
**-n NR_CONNECTIONS**
Specifies the number of SSH connections. The default value is
calculated from the number of CPU cores on the host with the
following formula: floor(log(nr_cores)*2)+1.
**-m COREMASK**
Configures CPU cores to be used by the hexadecimal bitmask. All CPU
cores are used by default.
**-u MAX_STARTUPS**
Specifies the number of concurrent outgoing SSH connections. **sshd**
limits the number of simultaneous SSH connection attempts by
*MaxStartups* in *sshd_config.* The default *MaxStartups* is 10;
thus, the default MAX_STARTUPS is 8.
**-I INTERVAL**
Specifies the interval (in seconds) between SSH connection attempts.
Some firewall products treat SSH connection attempts from a single
source IP address for a short period as a brute force attack. This
option inserts intervals between the attempts to avoid being
identified as an attack. The default value is 0.
**-s MIN_CHUNK_SIZE**
Specifies the minimum chunk size. **mscp** divides a file into chunks
and copies the chunks in parallel.
**-S MAX_CHUNK_SIZE**
Specifies the maximum chunk size. The default is file size divided by
the number of connections.
**-a NR_AHEAD**
Specifies the number of inflight SFTP commands. The default value is
32.
**-b BUF_SIZE**
Specifies the buffer size for I/O and transfer over SFTP. The default
value is 16384. Note that the SSH specification restricts buffer size
delivered over SSH. Changing this value is not recommended at
present.
**-v**
Increments the verbose output level.
**-q**
Quiet mode: turns off all outputs.
**-D**
Dry-run mode: it scans source files to be copied, calculates chunks,
and resolves destination file paths. Dry-run mode with **-vv** option
enables confirming files to be copied and their destination paths.
**-r**
No effect. **mscp** copies recursively if a source path is a
directory. This option exists just for compatibility.
**-l LOGIN_NAME**
Specifies the username to log in on the remote machine as with
*ssh(1).*
**-p,-P PORT**
Specifies the port number to connect to on the remote machine as with
ssh(1) and scp(1).
**-F CONFIG**
Specifies an alternative per-user ssh configuration file. Note that
acceptable options in the configuration file are what *libssh*
supports.
**-i IDENTITY**
Specifies the identity file for public key authentication.
**-c CIPHER**
Selects the cipher to use for encrypting the data transfer. See
`libssh features <https://www.libssh.org/features/>`__.
**-M HMAC**
Specifies MAC hash algorithms. See `libssh
features <https://www.libssh.org/features/>`__.
**-C COMPRESS**
Enables compression: yes, no, zlib, zlib@openssh.com. The default is
none. See `libssh features <https://www.libssh.org/features/>`__.
**-H**
Disables hostkey checking.
**-d**
Increments the ssh debug output level.
**-N**
Enables Nagle's algorithm. It is disabled by default.
**-h**
Prints help.
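
As a rough illustration of the default for **-n** described above, the formula floor(log(nr_cores)*2)+1 can be evaluated with awk. This is a sketch rather than mscp's own code: the function name ``default_nr_conns`` is hypothetical, and it assumes that "log" means the natural logarithm, as in C's log(3) and awk's ``log()``.

```shell
# Evaluate floor(log(nr_cores)*2)+1 for a given number of CPU cores.
# Assumption: "log" is the natural logarithm, as in C's log(3) and awk's log().
# awk's int() truncates toward zero, which equals floor() here because
# log(n) is non-negative for n >= 1.
default_nr_conns() {
    awk -v n="$1" 'BEGIN { print int(log(n) * 2) + 1 }'
}

for cores in 1 4 8 16 64; do
    printf '%2d cores -> %d connections\n' "$cores" "$(default_nr_conns "$cores")"
done
```

Under that assumption, 8 cores give 5 connections and 64 cores give 9, so the default grows slowly with the core count and large hosts may still benefit from an explicit **-n**.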
EXIT STATUS
===========
Exit status is 0 on success, and >0 if an error occurs.
NOTES
=====
**mscp** uses glob(3) for globbing pathnames, including matching
patterns for local and remote paths. However, globbing on the *remote*
side does not work with musl libc (used in Alpine Linux and the
single-binary version of mscp) because musl libc does not support
GLOB_ALTDIRFUNC.
**mscp** does not support remote-to-remote copy, which **scp** supports.
EXAMPLES
========
Copy a local file to a remote host with a different name:
::
$ mscp ~/src-file 10.0.0.1:copied-file
Copy a local file and a directory to /tmp at a remote host:
::
$ mscp ~/src-file dir1 10.0.0.1:/tmp
In a long fat network, the following options might improve performance:
::
$ mscp -n 64 -m 0xffff -a 64 -c aes128-gcm@openssh.com src 10.0.0.1:
**-n** increases the number of SSH connections beyond the default, **-m** pins
threads to specific CPU cores, **-a** increases asynchronous inflight
SFTP WRITE/READ commands, and **-c aes128-gcm@openssh.com** will be
faster than the default chacha20-poly1305 cipher, particularly on hosts
that support AES-NI.
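
A mask such as the 0xffff passed to **-m** above can be derived from a list of CPU core indices. The helper below is a minimal sketch: the name ``coremask`` is hypothetical, and it assumes the usual convention that bit i of the mask selects CPU core i.

```shell
# Build a hexadecimal core mask from a list of CPU core indices.
# Assumption: bit i of the mask corresponds to CPU core i.
coremask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}

coremask 0 1 2 3 4    # cores 0 to 4
```

Under that assumption, cores 0 to 4 yield 0x1f and cores 0 to 15 yield 0xffff, so the result can be passed directly as ``-m "$(coremask 0 1 2 3 4)"``.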
SEE ALSO
========
**scp**\ (1), **ssh**\ (1), **sshd**\ (8).
PAPER REFERENCE
===============
Ryo Nakamura and Yohei Kuga. 2023. Multi-threaded scp: Easy and Fast
File Transfer over SSH. In Practice and Experience in Advanced Research
Computing (PEARC '23). Association for Computing Machinery, New York,
NY, USA, 320–323. `DOI <https://doi.org/10.1145/3569951.3597582>`__.
CONTACT INFORMATION
===================
For patches, bug reports, or feature requests, please open an issue on
`GitHub <https://github.com/upa/mscp>`__.
AUTHORS
=======
Ryo Nakamura <upa@haeena.net>