Skip to content

PA2: Genetic Algorithm

Goal

In this assignment, you will implement a simple genetic algorithm (GA) to evolve a string that matches a given target string. You will use the following string as an example (to test your GA and to answer the question in the README.txt described below).

TARGET_STRING = "Everything Apple announced today: Apple Watch 6 and SE, Apple One, new iPad Air.

The Apple Store went down this morning, heralding another Apple launch day. At the company's virtual event, Tim Cook started out by telling us that the Apple Watch and new additions to the iPad family of tablets would be the highlights -- and we got an Apple Watch Series 6, Apple Watch SE, a redesigned iPad Air debuting the A14 Bionic chip and a new eighth-gen iPad."

The target string is everything within the "" quotations, and your goal is to evolve a string that matches the target string exactly (every character, even the invisible newline one).

As with the previous assignment, you can use any language for this assignment, but your program must run on the CSE machine (where I will evaluate your submission). So do not use external libraries or any extra tools etc. You are also strongly encouraged to use Python as it makes various string manipulation tasks easy.

For this assignment, you must work alone. However, you are allowed to talk to others (e.g., discussing problems, work on the ideas together), but the implementation and submission must be done individually. If you are stuck, you can post your question as a discussion on Canvas or approach me.

Specification

High level algorithm: You will implement a GA algorithm that we discussed in class. Recall that a GA works something like below

  1. Define what the chromosome (candidate solution) would be. In this assignment, a chromosome is represented as a string, i.e., a list of chars, and the size of the list will be exactly the length of the target string
  2. Create a population of random chromosomes
  3. Evaluate chromosomes (you need to define the fitness evaluation such that the one closer to the given target string will be assigned a "better" (lower) fitness, 0 means perfect fitness matching the target string)
  4. Check if we get the solution (or satisfy any termination criteria that you defined)
  5. Select chromosomes
  6. Perform crossover and mutation over selected chromosomes
  7. Repeat

You can be as creative as you want, but your GA must not run for too long and most important, must show progress (e.g., each generation or iteration the avg fitness should improve or the fitness of the best fit chromosome must improve) and must not take up too much space (e.g., do not generate over 50MB of files).

If you use Python, then name your program ga.exe and make it executable so that all I need to do is execute the command ./ga.exe to run it (no input parameters). Thus, the TARGET_STRING variable and its contents above should be built in directly into your program (e.g., a global constant variable). Also, explicitly say whether you use Python2 or Python3 in the README file (below). If you use a different language (e.g., C), then you still need to include TARGET_STRING directly in the code and provide instructions in the README for me to compile your code to obtain a ga.exe that can be executed without any input parameters (e.g., I will just run ./ga.exe).

At the end, the program will output

  1. done:, follows by
  2. the number of chromesomes evaluated,
  3. the number of generation
  4. the best fit string
  5. the fitness score of the best string. If you achieve a perfect fitness 0 (i.e., fitting the target string exactly), say so
  6. the total runtime in seconds

Also, the program will output at every X generations some statistic such as generation number, average fitness score, and the best fitness score and individual (the number X is something you defined, e.g., if you see that it usually takes 1000 iterations then perhaps print out the stat at every 50 iterations).

$ ./ga.exe
generation 0, average fit X, best fit Y
'your best chromosome at this generation 0'
generation 50, average fit X, best fit Y 
'your best chromosome at this generation 50'
generation 100, average fit X, best fit Y 
'your best chromosome at this generation 100'
generation 150, average fit X, best fit Y
'your best chromosome at this generation 150'
....
done: X evaluated chromosomes, Y generations, best fit 0 (perfect sol)
"Everything Apple announced today: Apple Watch 6 and SE, Apple One, new iPad Air.

The Apple Store went down this morning, heralding another Apple launch day. At the company's virtual event, Tim Cook started out by telling us that the Apple Watch and new additions to the iPad family of tablets would be the highlights -- and we got an Apple Watch Series 6, Apple Watch SE, a redesigned iPad Air debuting the A14 Bionic chip and a new eighth-gen iPad."

Important: a proper GA should improve the average fitness every X generation. Thus by outputting these information, you should see that your chromosomes are "evolving" and getting closer to the target string. If your GA does not have this kind of gradual improvement, it is not correct !!

What to Turn In

You must turn in a zip file containing the following files. Use the following names:

  1. the zip file must be named pa2-yourlast-firstname.zip
  2. One single main source file ga.ext, where ext is the extension of the source file of the language of your choice (e.g., .py, .c). Note if you use Python, you can submit either ga.py or rename and submit it as ga.exe. However, as usual, indicate in the README file on how I can run (e.g., do I just do ./ga.exe ? and if that doesn't work then can I do python3 ga.exe ?).
  3. a readme.txt file
  4. 2 screen shots showing how (1) you start your program (e.g., the commands used to compile and run your program) and (2) the end of the run (i.e., the output described above)
  5. Nothing else, do not include exe, binary file, or anything else.

  6. Be sure to test your code extensively the CSE machine. If your code does not build, you will simply get a 0 for the assignment.

The readme.txt file should be a plain ASCII text file (not a Word file, not an RTF file, not an HTML file) consisting of the following:

  1. a brief description how to run your program, e.g.,
$ gcc ga.c -o ga.exe
$ ./ga.exe
....
  1. Answer the following questions about your GA

  2. What is the population size?

  3. How do you evaluate an individual? I.e., give the formula that produces the fitness score
  4. What are the stopping criteria? (the GA loop stops when ...)
  5. How do you select the chromosomes? (using tournament selection or what other methods, what are the tournament sizes etc)
  6. How often do you perform crossover and mutation? Are the crossover and mutation rates fixed or adaptive?
  7. How many chromosomes do you need to evaluate to achieve a chromosome with a perfect fitness for the given target string ?
  8. Additional observations and GA designs

Grading Rubric

PA1 Grading (out of 20 points):

  • 12 points — for a complete GA as described (your points will be deducted if your GA does not gradually evolve the solution)
  • 8 points — for the answers in the README
  • 8 — clear explanations on running the program and thorough answers for the given questions
  • 2 — vague or hard to understand; omits important details
  • 0 — little to no effort, or submitted an RTF/DOC/PDF file instead of plain TXT