Prevent loading from remote source if file is larger than a given size

孤街浪徒 提交于 2019-12-04 16:51:20

Ok, finally working. The headers solution was obviously not going to work broadly. In this solution, we open a file handle and read the XML line by line until it hits the threshold of $max_B. If the file is too big, we still have the overhead of reading it up until the 10MB mark, but it's working as expected. If the file is less than $max_B, it proceeds...

$xml_file = "http://www.dailymotion.com/rss/user/dialhainaut/";
//$xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";

$fh = fopen($xml_file, "r");  

if($fh){
    $file_string = '';
    $total_B = 0;
    $max_B = 10485760;
    //run through lines of the file, concatenating them into a string
    while (!feof($fh)){
        if($line = fgets($fh)){
            $total_B += strlen($line);
            if($total_B < $max_B){
                $file_string .= $line;
            } else {
                break;
            }
        }
    } 

    if($total_B < $max_B){
        echo 'File ok. Total size = '.$total_B.' bytes. Proceeding...';
        //proceed
        $dom = new DOMDocument();
        $dom->loadXML($file_string); //NOTE the method change because we're loading from a string   

    } else {
        //reject
        echo 'File too big! Max size = '.$max_B.' bytes.';  
    }

    fclose($fh);

} else {
    echo '404 file not found!';
}

10MB is equal to 10485760 B. If content-length is not specified, it will use curl which is available since php5. I got this source from somewhere in SO but could not remember it.:

function get_filesize($url) {
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length'])) return $headers['Content-Length'];
    if (isset($headers['Content-length'])) return $headers['Content-length'];
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 
         (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) 
          Gecko/20090824 Firefox/3.5.3'),
        ));
    curl_exec($c);
    return curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    }
}
    $filesize = get_filesize("http://www.dailymotion.com/rss/user/dialhainaut/");
    if($filesize<=10485760){
        echo 'Fine';
    }else{
       echo $filesize.'File is too big';
    }    

.

Check demo here

Edit: New Answer a bit workaroundish:
You can't check the Dom Elements Length, BUT, you can make a header request and get the filesize from the URL:

<?php

function i_hope_this_works( $XmlUrl ) {
    //lets assume we fk up so we set size to -1  
    $size = -1;

      $request = curl_init( $XmlUrl );

      // Go for a head request, so the body of a 1 gb file will take the same as 1 kb
      curl_setopt( $request, CURLOPT_NOBODY, true );
      curl_setopt( $request, CURLOPT_HEADER, true );
      curl_setopt( $request, CURLOPT_RETURNTRANSFER, true );
      curl_setopt( $request, CURLOPT_FOLLOWLOCATION, true );
      curl_setopt( $request, CURLOPT_USERAGENT, get_user_agent_string() );

      $requesteddata = curl_exec( $request );
      curl_close( $request );

      if( $requesteddata ) {
        $content_length = "unknown";
        $status = "unknown";

        if( preg_match( "/^HTTP\/1\.[01] (\d\d\d)/", $requesteddata, $matches ) ) {
          $status = (int)$matches[1];
        }

        if( preg_match( "/Content-Length: (\d+)/", $requesteddata, $matches ) ) {
          $content_length = (int)$matches[1];
        }

        // you can google status qoutes 200 is Ok for example
        if( $status == 200 || ($status > 300 && $status <= 308) ) {
          $result = $content_length;
        }
      }

      return $result;
    }
    ?>

You should now be able to get every Filesize you want by URL just with

$file_size = i_hope_this_works('yourURLasString')
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!