Usage of the Unix getaddrinfo C function to set up the server


Question


I am building a client-server application in C with the source code taken from the book Advanced Programming in the UNIX Environment.

In the server, the code does the following:

struct addrinfo hint;
memset(&hint, 0, sizeof(hint));
hint.ai_flags = AI_CANONNAME;
hint.ai_socktype = SOCK_STREAM;
hint.ai_addr = NULL;
hint.ai_next = NULL;
....
if ((n = sysconf(_SC_HOST_NAME_MAX))<0)
{
    n = HOST_NAME_MAX;
}
if((host = malloc(n)) == NULL)
{
    printf("malloc error\n");
    exit(1);
}
if (gethostname(host, n)<0)
{
    printf("gethostname error\n");
    exit(1);
}
...
if((err = getaddrinfo(host, "ruptime", &hint, &ailist))!=0)
{
    syslog(LOG_ERR, "ruptimed: getaddrinfo error %s", gai_strerror(err));
    exit(1);
}
for (aip = ailist; aip!=NULL; aip = aip->ai_next)
{
    if ((sockfd = initserver(SOCK_STREAM, aip->ai_addr, aip->ai_addrlen, QLEN))>=0)
    {
        //printf("starting to serve\n");
        serve(sockfd);
        exit(0);
    }
}

As far as I understand, getaddrinfo is used here to obtain, for this host, the socket address structures for the service named ruptime with socket type SOCK_STREAM.

Although it was not specified in the book, to make this work I had to add a new entry to /etc/services with an unused port and the service name ruptime:

ruptime         49152/tcp #ruptime Unix System Programming
ruptime         49152/udp #ruptime Unix System Programming

where, although unused, it was suggested to also add the UDP entry.
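
As a sanity check (not from the book), the entry can be verified with getservbyname(3); a minimal sketch:

#include <stdio.h>
#include <netdb.h>
#include <arpa/inet.h>

int main(void)
{
    /* look up the ruptime/tcp entry added to /etc/services above */
    struct servent *sv = getservbyname("ruptime", "tcp");
    if (sv == NULL)
    {
        fprintf(stderr, "ruptime/tcp not found in the services database\n");
        return 1;
    }
    /* s_port is stored in network byte order */
    printf("%s is registered on port %d/tcp\n", sv->s_name, ntohs(sv->s_port));
    return 0;
}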

However, the documentation says:

If the AI_PASSIVE flag is specified in hints.ai_flags, and node is NULL, then the returned socket addresses will be suitable for bind(2)ing a socket that will accept(2) connections. The returned socket address will contain the "wildcard address" (INADDR_ANY for IPv4 addresses, IN6ADDR_ANY_INIT for IPv6 address). The wildcard address is used by applications (typically servers) that intend to accept connections on any of the host's network addresses.

So from here and from other discussions on SO something like:

hint.ai_flags |= AI_PASSIVE;
...
getaddrinfo(NULL, myserviceport, &hint, &aihint);

seems more suitable.

Exactly what is the difference between these two methods? Does the second one also look for SOCK_DGRAM addresses? Is there any reason why the first method was chosen in the book? With the second method, since I am specifying the port in the code, does it allow me to avoid adding a new entry to /etc/services?

Another question. I have to pass the host name to the client. I thought the loopback address would be OK (client and server are running on the same machine); instead I have to pass the host name, something like ./client MBPdiPippo.lan. What determines that the connection can be created with the hostname but not with the loopback address? Is it that I am passing host as the first parameter to getaddrinfo in the server?
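
For context, my client essentially does the following (a simplified sketch, not the exact code; the real client follows the book's example):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

#define BUFLEN 128

int main(int argc, char *argv[])
{
    struct addrinfo hint, *ailist, *aip;
    char buf[BUFLEN];
    int sockfd = -1, err;
    ssize_t n;
    if (argc != 2)
    {
        fprintf(stderr, "usage: client hostname\n");
        exit(1);
    }
    memset(&hint, 0, sizeof(hint));
    hint.ai_socktype = SOCK_STREAM;
    /* resolve the host given on the command line for the ruptime service */
    if ((err = getaddrinfo(argv[1], "ruptime", &hint, &ailist)) != 0)
    {
        fprintf(stderr, "getaddrinfo error: %s\n", gai_strerror(err));
        exit(1);
    }
    for (aip = ailist; aip != NULL; aip = aip->ai_next)
    {
        sockfd = socket(aip->ai_family, aip->ai_socktype, aip->ai_protocol);
        if (sockfd < 0)
        {
            continue;
        }
        if (connect(sockfd, aip->ai_addr, aip->ai_addrlen) == 0)
        {
            break; /* connected */
        }
        close(sockfd);
        sockfd = -1;
    }
    freeaddrinfo(ailist);
    if (sockfd < 0)
    {
        fprintf(stderr, "cannot connect to %s\n", argv[1]);
        exit(1);
    }
    /* print whatever the server sends back (the uptime output) */
    while ((n = recv(sockfd, buf, BUFLEN, 0)) > 0)
    {
        fwrite(buf, 1, (size_t)n, stdout);
    }
    close(sockfd);
    return 0;
}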

FULL CODE

server.c

#include<stdio.h>
#include<stdlib.h>
#include<unistd.h> //_SC_HOST_NAME_MAX
#include<string.h>
#include<netdb.h> //Here are defined AF_INET and the others of the family
#include<syslog.h> //LOG_ERR
#include<errno.h> //errno
#include <sys/types.h>

#include"utilities.h"
#include "error.h"

#define BUFLEN 128
#define QLEN 10

#ifndef HOST_NAME_MAX
#define HOST_NAME_MAX 156
#endif

int initserver(int type, const struct sockaddr *addr, socklen_t alen, int qlen);
void serve(int sockfd);

int main(int argc, char* argv[])
{
    printf("entered main\n");
    struct addrinfo *ailist, *aip, hint;
    int sockfd, err, n;
    char *host;
    if (argc != 1)
    {
        printf("usage: ruptimed\n");
        exit(1);
    }
    if ((n=sysconf(_SC_HOST_NAME_MAX))<0)
    {
        n = HOST_NAME_MAX;
    }
    if((host = malloc(n)) == NULL)
    {
        printf("malloc error\n");
        exit(1);
    }
    if (gethostname(host, n)<0)
    {
        printf("gethostname error\n");
        exit(1);
    }
    printf("host: %s\n", host);
    printf("Daemonizing\n");
    int res = daemonize("ruptimed");
    printf("%d\n", res);
    printf("Daemonized\n");
    memset(&hint, 0, sizeof(hint)); //set to 0 all bytes
    printf("hint initialized\n");
    hint.ai_flags = AI_CANONNAME;
    hint.ai_socktype = SOCK_STREAM;
    hint.ai_canonname = NULL;
    hint.ai_addr = NULL;
    hint.ai_next = NULL;
    printf("getting addresses\n");
    if((err = getaddrinfo(host, "ruptime", &hint, &ailist))!=0)
    {
        printf("error %s\n", gai_strerror(err));
        syslog(LOG_ERR, "ruptimed: getaddrinfo error %s", gai_strerror(err));
        exit(1);
    }
    printf("Got addresses\n");
    for (aip = ailist; aip!=NULL; aip = aip->ai_next)
    {
        if ((sockfd = initserver(SOCK_STREAM, aip->ai_addr, aip->ai_addrlen, QLEN))>=0)
        {
            printf("starting to serve\n");
            serve(sockfd);
            exit(0);
        }
    }
    exit(1);
}

void serve(int sockfd)
{
    int clfd;
    FILE *fp;
    char buf[BUFLEN];
    set_cloexec(sockfd);
    for(;;)
    {
        /*After listen, the socket can receive connect requests. accept
        retrieves a connect request and converts it into a connection.
        The descriptor returned by accept is a socket descriptor connected to the
        client that called connect, having the same socket type and address family.
        The original socket remains available to receive other connection requests.
        If we don't care about the client's identity we can set the second
        (struct sockaddr *addr) and third parameter (socklen_t *len) to NULL*/
        if((clfd = accept(sockfd, NULL, NULL))<0)
        {
            /*This generates a log message.
            syslog(int priority, const char *format, ...)
            priority is a combination of facility and level. Levels are ordered from highest to lowest:
            LOG_EMERG: emergency, system unusable
            LOG_ALERT: condition that must be fixed immediately
            LOG_CRIT: critical condition
            LOG_ERR: error condition
            LOG_WARNING
            LOG_NOTICE
            LOG_INFO
            LOG_DEBUG
            format and the other arguments are passed to vsprintf for formatting.*/
            syslog(LOG_ERR, "ruptimed: accept error: %s", strerror(errno));
            exit(1);
        }
        /* set the FD_CLOEXEC file descriptor flag */
        /*it causes the file descriptor to be automatically and atomically closed
         when any of the exec family of functions is called*/
        set_cloexec(clfd);
        /**pg. 542 Since a common operation is to create a pipe to another process,
        to either read its output or write its input, stdio provides popen and
        pclose: popen creates a pipe, closes the unused ends of the pipe,
        forks a child and calls exec to execute cmdstr, and
        returns a file pointer (connected to stdout if "r", to stdin if "w").
        pclose closes the stream and waits for the command to terminate*/
        if ((fp = popen("/usr/bin/uptime", "r")) == NULL)
        {
            /*sprintf writes the formatted string into buf*/
            sprintf(buf, "error: %s\n", strerror(errno));
            /*pg. 610. send is similar to write. send(int sockfd, const void *buf, size_t nbytes, int flags)*/
            send(clfd, buf, strlen(buf),0);
        }
        else
        {
            /*read data from the pipe created by popen to run /usr/bin/uptime */
            while(fgets(buf, BUFLEN, fp)!=NULL)
            {
                /* clfd is returned by accept and it is a socket descriptor
                connected to the client that called connect*/
                send(clfd, buf, strlen(buf), 0);
            }
            /*see popen pag. 542*/
            pclose(fp);
        }
        close(clfd);
    }
}


int initserver(int type, const struct sockaddr *addr, socklen_t alen, int qlen)
{
    int fd, err;
    int reuse = 1;
    if ((fd = socket(addr->sa_family, type, 0))<0)
    {
        return (-1);
    }
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(int))<0)
    {
        goto errout;
    }
    if(bind(fd, addr, alen)<0)
    {
        goto errout;
    }
    if (type == SOCK_STREAM || type == SOCK_SEQPACKET)
    {
        if(listen(fd, qlen)<0)
        {
            goto errout;
        }
    }
    return fd;
    errout:
        err = errno;
        close (fd);
        errno = err;
        return(-1);
}

utilities.c, containing the daemonize and set_cloexec functions. In the daemonize function I did not close the file descriptors, to make debugging easier.

#include "utilities.h"
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <syslog.h>
#include <sys/time.h>//getrlimit
#include <sys/resource.h>//getrlimit
#include <signal.h> //sigemptyset, sigaction (umask?)
#include <sys/resource.h>
#include <fcntl.h> //O_RDWR
#include <stdarg.h>

#include "error.h"
int daemonize(const char *cmd)
{
    int fd0, fd1, fd2;
    unsigned int i;
    pid_t pid;
    struct rlimit       rl;
    struct sigaction    sa;
    /* *Clear file creation mask.*/
    umask(0);
    /* *Get maximum number of file descriptors. */
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
    {
        err_quit("%s: can’t get file limit", cmd);
    }
    /* *Become a session leader to lose controlling TTY. */
    if ((pid = fork()) < 0)
    {
        err_quit("%s: can’t fork", cmd);
    }
    else if (pid != 0) /* parent */
    {
        exit(0); //the parent will exit
    }
    setsid();
    /* *Ensure future opens won't allocate controlling TTYs. */
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGHUP, &sa, NULL) < 0)
    {
        err_quit("%s: can’t ignore SIGHUP", cmd);
    }
    if ((pid = fork()) < 0)
    {
        err_quit("%s: can’t fork", cmd);
    }
    else if (pid != 0) /* parent */
    {
        exit(0);
    }
    /*
    *Change the current working directory to the root so
    * we won't prevent file systems from being unmounted.
    */
    if (chdir("/") < 0)
    {
        err_quit("%s: can’t change directory to /", cmd);
    }
    /* Close all open file descriptors. */
    if (rl.rlim_max == RLIM_INFINITY)
    {
        rl.rlim_max = 1024;
    }
    printf("closing file descriptors\n");
    /*for (i = 0; i < rl.rlim_max; i++)
    {
        close(i);
    }*/
    /* *Attach file descriptors 0, 1, and 2 to /dev/null.*/
    //printf not working
    /*printf("closed all file descriptors for daemonizing\n");*/
    /*fd0 = open("/dev/null", O_RDWR);
    fd1 = dup(0);
    fd2 = dup(0);*/
    /* *Initialize the log file. Daemons do not have a controlling terminal so
    they can't write to stderr. We don't want them to write to the console device
    because on many workstations the console device runs a windowing system. They can't
    write to separate files either. A central daemon error-logging facility is required.
    This is the BSD one. There are 3 ways to generate log messages:
    1) kernel routines call the log function. These messages can be read from /dev/klog
    2) Most user processes (daemons) call syslog to generate log messages. This causes
    messages to be sent to the UNIX domain datagram socket /dev/log
    3) A user process on this host, or on another host connected to this one with TCP/IP,
    can send log messages to UDP port 514. Explicit network programming is required
    (it is not managed by syslog).
    The syslogd daemon reads all three kinds of log messages.

    openlog is optional since, if it is not called, syslog calls it. closelog is also optional.
    openlog(const char *ident, int option, int facility)
    It lets us specify an ident that is added to each log message. option is a bitmask:
        LOG_CONS tells that if the log message can't be sent to syslogd via the UNIX
        domain datagram socket, the message is written to the console instead.
    facility lets the configuration file specify that messages from different
    facilities are to be handled differently. It can be specified also in the 'priority'
    argument of syslog. LOG_DAEMON is for system daemons
    */
    /*
    openlog(cmd, LOG_CONS, LOG_DAEMON);
    if (fd0 != 0 || fd1 != 1 || fd2 != 2)
    {*/
        /*This generates a log message.
        syslog(int priority, const char *format, ...)
        priority is a combination of facility and level. Levels are ordered from highest to lowest:
        LOG_EMERG: emergency, system unusable
        LOG_ALERT: condition that must be fixed immediately
        LOG_CRIT: critical condition
        LOG_ERR: error condition
        LOG_WARNING
        LOG_NOTICE
        LOG_INFO
        LOG_DEBUG

        format and the other arguments are passed to vsprintf for formatting.*/
        /*syslog(LOG_ERR, "unexpected file descriptors %d %d %d", fd0, fd1, fd2);
        exit(1);
    }*/
    return 0;
}

/*The function sets the FD_CLOEXEC flag of the already-open file descriptor that
is passed as a parameter. FD_CLOEXEC causes the file descriptor to be
automatically and atomically closed when any of the exec family of functions is
called*/
int set_cloexec(int fd)
{
    int val;
    /* retrieve the flags of the file descriptor */
    if((val = fcntl(fd, F_GETFD, 0))<0)
    {
        return -1;
    }
    /* set the FD_CLOEXEC file descriptor flag */
    /*it causes the file descriptor to be automatically and atomically closed
     when any of the exec family of functions is called*/
    val |= FD_CLOEXEC;
    return (fcntl(fd, F_SETFD, val));
}

The error functions I used:

/* Fatal error unrelated to a system call.
* Print a message and terminate*/
void err_quit (const char *fmt, ...)
{
    va_list ap;
    va_start (ap, fmt);
    err_doit (0, 0, fmt, ap);
    va_end (ap);
    exit(1);
}

/*Print a message and return to caller.
*Caller specifies "errnoflag"*/
static void err_doit(int errnoflag, int error, const char *fmt, va_list ap)
{
    char buf [MAXLINE];
    vsnprintf (buf, MAXLINE-1, fmt, ap);
    if (errnoflag)
    {
        snprintf (buf+strlen(buf), MAXLINE-strlen(buf)-1, ": %s",
            strerror (error));
    }
    strcat(buf, "\n");
    fflush(stdout); /*in case stdout and stderr are the same*/
    fputs (buf, stderr);
    fflush(NULL); /* flushes all stdio output streams*/
}

Answer 1:


First, a nitpick. The getaddrinfo() code should be incorporated into the initserver() function, and the linked list of socket structures freed (using freeaddrinfo()) after the loop. This makes the code much more maintainable; you want to keep tightly coupled implementations close together.

Exactly what is the difference between these two methods?

Binding to the wildcard address (i.e., using NULL node and AI_PASSIVE flag when obtaining suitable socket descriptions using getaddrinfo()) means the socket is bound to all network interfaces as a set, not to a specific network interface. When you bind to a specific node name, you bind to a specific network interface.

In practice, it means that if additional network interfaces become available at run time, the kernel will consider them when routing packets to/from sockets bound to the wildcard address.

It really should be a choice made by each system administrator, as there are use cases where the service (your application) should listen for incoming connections on all network interfaces, but also other use cases where the service should listen for incoming connections on a specific or some specific interfaces only. A typical case is when a machine is connected to multiple networks. It is surprisingly common for servers. For practical cases, see e.g. how the Apache web server can be configured.
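
To make the contrast concrete, here is a minimal sketch of the two lookups (only the hints and the node argument differ; error handling is omitted, and the port and host name are just the values from the question):

#include <string.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    /* Wildcard: addresses suitable for binding to all interfaces as a set
       (INADDR_ANY / in6addr_any). */
    hints.ai_flags = AI_PASSIVE;
    if (getaddrinfo(NULL, "49152", &hints, &res) == 0)
        freeaddrinfo(res);

    /* Specific interface: only the address(es) this host name resolves to. */
    hints.ai_flags = 0;
    if (getaddrinfo("MBPdiPippo.lan", "49152", &hints, &res) == 0)
        freeaddrinfo(res);

    return 0;
}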

Personally, I would rewrite OP's initServer() function to look something like the following:

enum {
    /* TCP=1, UDP=2, IPv4=4, IPv6=8 */
    SERVER_TCPv4 = 5,   /* IPv4 | TCP */
    SERVER_UDPv4 = 6,   /* IPv4 | UDP */
    SERVER_TCPv6 = 9,   /* IPv6 | TCP */
    SERVER_UDPv6 = 10,  /* IPv6 | UDP */
    SERVER_TCP   = 13,  /* Any  | TCP */
    SERVER_UDP   = 14   /* Any  | UDP */
};

int initServer(const char *host, const char *port,
               const int type, const int backlog)
{
    struct addrinfo  hints, *list, *curr;
    const char      *node;
    int              family, socktype, result, fd;

    if (!host || !*host || !strcmp(host, "*"))
        node = NULL;
    else
        node = host;

    switch (type) {
    case SERVER_TCPv4: family = AF_INET;   socktype = SOCK_STREAM; break;
    case SERVER_TCPv6: family = AF_INET6;  socktype = SOCK_STREAM; break;
    case SERVER_TCP:   family = AF_UNSPEC; socktype = SOCK_STREAM; break;
    case SERVER_UDPv4: family = AF_INET;   socktype = SOCK_DGRAM;  break;
    case SERVER_UDPv6: family = AF_INET6;  socktype = SOCK_DGRAM;  break;
    case SERVER_UDP:   family = AF_UNSPEC; socktype = SOCK_DGRAM;  break;
    default:
        fprintf(stderr, "initServer(): Invalid server type.\n");
        return -1;
    }
    memset(&hints, 0, sizeof hints);
    hints.ai_flags = AI_PASSIVE;
    hints.ai_family = family;
    hints.ai_socktype = socktype;
    hints.ai_protocol = 0;
    hints.ai_canonname = NULL;
    hints.ai_addr = NULL;
    hints.ai_next = NULL;
    result = getaddrinfo(node, port, &hints, &list);
    if (result) {
        /* Fail. Output error message to standard error. */
        fprintf(stderr, "initServer(): %s.\n", gai_strerror(result));
        return -1;
    }

    fd = -1;
    for (curr = list; curr != NULL; curr = curr->ai_next) {
        int  reuse = 1;

        fd = socket(curr->ai_family, curr->ai_socktype, curr->ai_protocol);
        if (fd == -1)
            continue;

        /* SO_REUSEADDR must be set before bind() for it to take effect. */
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                        &reuse, sizeof (int)) == -1) {
            close(fd);
            fd = -1;
            continue;
        }

        if (bind(fd, curr->ai_addr, curr->ai_addrlen) == -1) {
            close(fd);
            fd = -1;
            continue;
        }

        if (listen(fd, backlog) == -1) {
            close(fd);
            fd = -1;
            continue;
        }

        break;
    }
    freeaddrinfo(list);
    if (fd == -1) {
        fprintf(stderr, "initServer(): Cannot bind to a valid socket.\n");
        return -1;
    }

    return fd;
}

(Note: code is untested, not even compiled; but the underlying logic is sound. If you find any issues or errors, let me know in a comment, so I can review, check, and fix if necessary.)

This way, you can read the host and port from a configuration file. If host is "*", empty, or NULL, the function will attempt to bind to the wildcard address. (This should be the default, by the way; if the server administrator wants to limit to a specific interface, they can supply either the IP address, or the host name corresponding to that interface.)

Similarly, the system admin can use the configuration file to specify port as any string defined in the services database (getent services), or as a decimal number string; in OP's case, "49152" and "ruptime" would both work.
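
For example, a hedged usage sketch (the values shown would normally come from the configuration file; serve() is OP's function, unchanged):

/* host and port would normally be read from a configuration file;
   "*" (or an empty/NULL host) selects the wildcard address */
int listenfd = initServer("*", "49152", SERVER_TCP, 10);
if (listenfd == -1)
    exit(1);    /* initServer() already wrote a diagnostic to stderr */
serve(listenfd);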

Since I am specifying the port in the code, does it allow to avoid adding a new entry in the /etc/services/?

The services database (run getent services to see it on your machine) contains only the mapping between service names and port numbers for TCP (SOCK_STREAM) and/or UDP (SOCK_DGRAM) protocols.

The only way you can avoid having to add the ruptime 49152/tcp entry to your services database is to specify the port as a decimal number string, "49152", instead of the name "ruptime". This affects both servers and clients. (That is, even if your server knows ruptime is port 49152 for TCP sockets, the clients won't know that unless they have it in their own services database.)

Usually, most admins do not bother editing the services database, and use the explicit port numbers instead. When you have a firewall installed (and related utilities like fail2ban, which I recommend even on workstations and laptops), it is easier to maintain the rules if the port numbers are clearly shown in the service configuration files.

I'd use the port number, myself.

To the client running on the same machine I had to pass the host name. I thought the loopback address would work. What defines the fact that the connection can be created with the hostname but not with the loopback address? Is it that I am passing host as first parameter to the getaddrinfo in the server?

Yes. If you bind the service to the wildcard address, it will respond to requests on all network interfaces, including the loopback address.

If you bind to a specific host name, it will only respond to requests to that specific network interface.

(This is done by the OS kernel, and is part of how network packets are routed to userspace applications.)
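
If in doubt, you can print the address a listening socket actually ended up bound to; a small diagnostic sketch (print_bound_address is just an illustrative helper, not part of OP's code):

#include <stdio.h>
#include <netdb.h>
#include <sys/socket.h>

/* Print the local address a listening socket is bound to, so you can see
   whether it is the wildcard address or a specific interface address. */
static void print_bound_address(int sockfd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof ss;
    char host[64], port[16];

    if (getsockname(sockfd, (struct sockaddr *)&ss, &len) == -1) {
        perror("getsockname");
        return;
    }
    if (getnameinfo((struct sockaddr *)&ss, len, host, sizeof host,
                    port, sizeof port, NI_NUMERICHOST | NI_NUMERICSERV) == 0)
        printf("listening on %s port %s\n", host, port);
}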

This also means that a "proper" internet-enabled service that binds to specific host names (rather than the wildcard address) should really be able to listen for incoming connections on several sockets, rather than only one. It may not be absolutely necessary, or even needed in most use cases, but I can tell you it sure comes in handy when the service is run on a machine straddling several different networks, and you want to provide the service to only some of them. Fortunately, you can make the listening sockets nonblocking (using fcntl(fd, F_SETFL, O_NONBLOCK) -- I also recommend setting the close-on-exec flag with fcntl(fd, F_SETFD, FD_CLOEXEC), so that the listening sockets are not accidentally passed on to child processes that execute external binaries), and then use select() or poll() to wait for accept()able connections; each socket becomes readable when a connection arrives.
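
A minimal sketch of that pattern (listenfds and handle_client are illustrative names; the sockets are assumed to be already bound, listening, and nonblocking):

#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait on several listening sockets at once and accept wherever a
   connection arrives. */
void accept_loop(const int *listenfds, int nfds,
                 void (*handle_client)(int clfd))
{
    struct pollfd pfds[16];
    int i;

    if (nfds > 16)
        nfds = 16;      /* keep the sketch simple */

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = listenfds[i];
        pfds[i].events = POLLIN;    /* readable == accept()able */
    }

    while (1) {
        if (poll(pfds, nfds, -1) < 0) {
            if (errno == EINTR)
                continue;
            perror("poll");
            return;
        }
        for (i = 0; i < nfds; i++) {
            if (pfds[i].revents & POLLIN) {
                int clfd = accept(pfds[i].fd, NULL, NULL);
                if (clfd >= 0)
                    handle_client(clfd);
                /* with nonblocking listening sockets, accept() may fail
                   with EAGAIN/EWOULDBLOCK if the connection went away;
                   just move on in that case */
            }
        }
    }
}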



Source: https://stackoverflow.com/questions/53972934/usage-of-unix-getaddrinfo-c-function-to-start-set-the-server
