Do Large Language Models with Long Context Windows Work Well?