HDFS is not necessary but recommendations appear in some places.
To help evaluate the effort spent in getting HDFS running:
What are the benefits of
HDFS (or any distributed Filesystems) makes distributing your data much simpler. Using a local filesystem you would have to partition/copy the data by hand to the individual nodes and be aware of the data distribution when running your jobs. In addition HDFS also handles failing nodes failures. From an integration between Spark and HDFS, you can imagine spark knowing about the data distribution so it will try to schedule tasks to the same nodes where the required data resides.
Second: which problems did you face exactly with the instruction?
BTW: if you are just looking for an easy setup on AWS, DCOS allows you to install HDFS with a single command...